Bob Zumbrunnen

03/27/04 7:40 PM

#38181 RE: Condor #38180

An SI global PM should have/should be utilized to inform SI members.

I'm sorry, but ROTFLMAO!!!

Global PM's were at the core of the problem and that's currently the one part of the system most capable of bringing it all crashing down again.

And there's not really a whole lot to inform people of that they don't already know. The site was dead as a doornail and now it's not. Anyone who is following the Welcome thread over there has all the details. I figure if people badly want to know more than what they already know (was dead; now alive), they can come to that thread to find out. I don't need to spoonfeed the info to a lot of people who likely don't care beyond the fact that it's now operational, especially if said spoonfeeding were done via the most vulnerable part of the system.

And one other thing that I learned in the corporate world I left years ago: If I spend too much of my time telling people what I'm doing, I don't really get a chance to do any of it. Programming is my most important job duty right now and everything else, even things having to do with making money, is a distant 2nd.

But, since I'm here and I'm typing about what happened to the site, I'll basically repeat here what I've said on SI's Welcome thread so that more people will see it.

The site crashed while I was trimming down the table that contains links to messages. It's used for identifying what messages are in your inbox, which are in the "trash", which you've written, and which you've moved to folders.

It had 19.2 million rows in it. By removing all the rows that were in trash folders, I knocked it down to about 15 million rows.

The trouble started when I noticed that there were probably more than 10 million system-generated PM's (like the one you suggest I should send) that'd never been read and likely never would be because not all the "members" of SI really are members. At least a couple hundred thousand of them (an educated guess) are never-used accounts that were created when people signed up for other Go2Net or InfoSpace properties.

Since that particular table is one that has to be imported from scratch every time I bring it over to the development system (meaning I can't just bring over the newest records because then I'd miss changes made to earlier records, like moving them to folders or trashing them), I decided I wanted to get rid of the 10 million or so rows that weren't needed.

I wrote a program to do so. In batches of about 300k rows at a time.

And the system promptly crashed.

To the best of my knowledge at this point, what happened is that Oracle was keeping track of every deletion (in its "redo" logs) just in case I wanted to undo the deletions.

The redo logs filled up or caused the hard drives themselves to fill up. Not sure which yet, but suspect the former.
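
For anyone wondering what a batched delete of this kind looks like, here is a rough sketch. It is not the actual program and it is not Oracle: it uses Python with SQLite purely for illustration, and the table and column names (msg_links, is_system, is_read) are made up. The one design choice worth noting is the commit after every batch, which keeps each transaction small. Whether the original program committed between batches isn't stated above.

```python
import sqlite3


def delete_in_batches(conn, batch_size):
    """Delete unread system-generated rows a batch at a time.

    Table and column names are hypothetical, chosen only for this sketch.
    Returns the total number of rows deleted.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM msg_links WHERE rowid IN ("
            "  SELECT rowid FROM msg_links"
            "  WHERE is_system = 1 AND is_read = 0 LIMIT ?)",
            (batch_size,),
        )
        # Commit after each batch so no single transaction grows too large.
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


def build_demo_db():
    """Build a small in-memory table: 1000 rows, half of them system-generated."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE msg_links ("
        "id INTEGER PRIMARY KEY, is_system INTEGER, is_read INTEGER)"
    )
    rows = [(i, 1 if i % 2 == 0 else 0, 0) for i in range(1, 1001)]
    conn.executemany("INSERT INTO msg_links VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    print(delete_in_batches(demo, 300))
```

On a real database the batch size and commit frequency would have to be tuned against whatever log and disk space is actually available.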

An Oracle DBA who used to be responsible for the site rolled up his sleeves and about 12 hours after he started, he had it running again, and it appears that this morning or late last night he fixed the remaining bugs the site was exhibiting.

The plan now is to hurry up on my development of the new version of the site and hopefully very soon (a week or two; not months now) have everyone moved over to it, and leave the old site alive just for things like portfolios. All messaging will be moved to the new version. Which is not only more stable but is also in an environment I know and understand pretty intimately, unlike the existing version.

Once everyone's moved over to the new site for messaging, I'll start getting other things migrated (portfolios, for example), and when we finally no longer need anything on the old system, it'll be shut down, the drives erased, and all of the pieces will go up on eBay. Of course, everything on it, no matter how trivial it might seem, will be copied onto hard drives and tapes we'll keep, just in case they're ever needed.

So, that's the status and the plan.

As I'm typing this, I'm on Dell's website spec'ing out SI's new webserver and it's hoped that when it arrives, I'll have enough of the software written that we can plug it in, copy the software to it, and redirect everyone to it. We'll do this on a weekend since there're bound to be problems with things like indexes that're best addressed while the traffic load is smallish.