[lojban] Downtime Post-Mortem



Short version:

    The box is now at my house, where it has roughly four times as
    much upstream bandwidth (1.5Mbps/768Kbps ADSL, up from a
    rate-shaped 200Kbps connection) and where I can deal with
    problems quickly without needing someone else to give me
    physical access to the machine.

This is a Good Thing.  Most of the downtime was *NOT*, in any way,
my fault, for the record.


There will be more downtime later in the week, not sure when, but
not for more than an hour or two.


Detailed version:

Here's the chain of events:

Jun 14 10:51	Machine crashed.  No idea why; possibly bad memory.

Jun 14 10:55	I mailed my co-lo, but to the *wrong* *address*.

Jun 14 17:15	Realizing my error, I re-send the mail.  The mail
states that if the machine doesn't reboot cleanly, I'll be pulling
the machine out of the co-lo, because I apparently can't make a 1U
machine stable enough to stay in a co-lo where I can't get access
24/7.

Jun 14 17:45	Co-lo guy uses a remote power setup to power off and
then power on the machine's power jack.  Machine fails to come up.

Jun 15		I mail the co-lo to find out when I can take it out,
discover it's not until tomorrow (seriously, I have to wait a full
day to get in, no joke).  I start working on getting the DNS
pointing to the secondary IP at my home.

Jun 16 10:30	I phone co-lo guy to say when Ruth Anne (who
graciously helped with all this; thank you again!) will be by to get
the machine.  Scheduled pickup is 12:30.

Jun 16 12:15	Co-lo guy calls back and says that the reason the
machine never came up is that it was off.  My machines are
*always* set to stay powered off after a power outage, and guess
what having the power dropped on the machine's power jack looked
like?  Yep, a power outage.  He hadn't bothered to check that the
machine was actually receiving power when he power cycled it 43
hours earlier.

Jun 16 12:20	Machine completes its successful boot process.
Since there's no way I'm leaving the machine in a co-lo that can
pull this shit, I shut it down cleanly for the removal.

Jun 16 13:15	Machine is installed and up at my house.  Machine
shows signs of memory weirdness on boot up; I'll be checking that
later.  No sign of the reason for the crash in the logs.

-Robin

-- 
Me: http://www.digitalkingdom.org/~rlpowell/  ***   I'm a *male* Robin.
"but I'm not stupid and people are not stupid who think samely with me"
-- from an actual, real, non-spam mail sent to webmaster@lojban.org
http://www.lojban.org/             ***              .i cimo'o prali .ui