On Sat, 11 Mar 2006 03:42:32 +0000, Roy Schestowitz wrote:
> I was initially going to suggest that you consider hardware failures, before
> assuming a software-related and thus easily-fixable error.
Considering the relative scope of the software, not so much in complexity
as in sheer rate of processing data, it wasn't unreasonable to consider
it a possibility. However, it's been quite conclusively determined to be
hardware - just not, quite yet, the specifics.
>>> the same exact hardware - not, say, drop the drives into another box -
>>> read the data off the drive, dump it out over the NICs and see what
> That can indeed be an issue. You should make a habit of copying your
> entire hard-drive (if not just important 'lumps' from the tree) onto a
> second hard-drive for exactly that reason.
Well, for one thing, this is a SCSI-based RAID setup. Short of a
controller failure which pooches both drives at the same time, this
shouldn't be much of an issue.
That said, the important bits - the code - were developed on a different
machine, and deployed to that one, so complete meltdown, while annoying,
> Backups can easily be cronned (GUI if you prefer), or at least be made
I do that now, for the "live" data. At most we'll lose 5 minutes' worth.
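
A minimal sketch of what a cron-driven backup like that could look like, in Python. This is not the actual setup described here; the function name backup_changed and both directory arguments are hypothetical, and it assumes a plain "copy anything new or modified" policy is good enough.

```python
import os
import shutil


def backup_changed(src_root, dst_root):
    """Copy files from src_root into dst_root when they are missing
    there or the source has a newer modification time.

    Returns the sorted list of relative paths that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            if (not os.path.exists(dst)
                    or os.path.getmtime(src) > os.path.getmtime(dst)):
                # copy2 keeps the source timestamp, so an unchanged
                # file is skipped on the next run.
                shutil.copy2(src, dst)
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)
```

Run from a crontab entry with a */5 minute field, something along these lines would cap the loss at roughly five minutes of changes, assuming the destination sits on a different disk or machine.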
>>> NICs, the video card, everything in the machine. Works like a charm.
>>> It also lets me mount the drives read-only, so I can access them but
>>> not write to them.
> Excellent. So you should be able to SCP all the data to another machine,
> at least until the hardware issue is resolved.
Yup. However, I'm not too worried about the data at this point; virtually
all of it is already backed up. It's more the extra time in reinstalling.
Then again, chances are I can't trust the current state anyways, because
if there was a controller failure and a disk scrambling, as there seems
to have been, who knows the current state? Sigh.
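
One way to get a handle on that, sketched in Python under the assumption that the suspect drive is mounted read-only and a known-good backup tree is reachable: hash every file in the backup and compare it with the copy on the suspect drive. The names sha256_of and find_mismatches and both directory arguments are made up for illustration.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(suspect_root, backup_root):
    """List relative paths that exist in backup_root but are missing
    from suspect_root or have different content there."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(backup_root):
        rel_dir = os.path.relpath(dirpath, backup_root)
        for name in filenames:
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            good = os.path.join(dirpath, name)
            suspect = os.path.join(suspect_root, rel_path)
            if (not os.path.isfile(suspect)
                    or sha256_of(suspect) != sha256_of(good)):
                bad.append(rel_path)
    return sorted(bad)
```

Files that changed legitimately after the last backup will show up as mismatches too, so the output is a list of things to look at, not proof of corruption.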
> Linux saves the day. Less ironically, it saves itself from itself or
> *rather*, it saves the computer from its own hardware issues.
Allows itself to be saved by itself, if you will... despite failing
> I believe
> the best one could ever do with Windows is 'ghost' it occasionally. For
> large heaps of data, this can consume many CDs or require a variety of
> pricey peripherals.
Indeed. Now where, exactly, is the configuration file and lease data for
the DHCP server, so I can simply copy the files to another machine, fire
up DHCP, and have the exact same state? Without trying to boot the flaky
machine's GUI and pull the data out of the registry that way?
> When I was younger (and still using Windows), I dreaded the day when the
> O/S would refuse to boot, let alone the day when a hardware failure
> would lead to data loss. Frankly, with Linux I have full confidence.
> Data is resilient.
What kills me is when Windows won't even boot properly off the CD in
recovery mode. Fire up recovery console, tell it which partition to try
to recover... complete system hang. Exactly the same thing as booting off
the HD directly. What's the point in a recovery mode, if it won't let you
At least with Linux, I know I can always boot a "recovery CD" - a LiveCD -
which will let me mount the drives and get the data off them, unless the
drives themselves are completely fried - even if it means dropping the
drives into a different machine. I have no such expectation in Windows.
> Speaking of the need to back up, this machine that I currently use has
> been up and running uninterruptedly for two and a half years. From the
> point of view of the O/S alone, it is less prone to breakage than its
> counterparts, which I used in the past.
My systems don't tend to stay up that long, but that's due to my fiddling,
not the OS. That said, one of our servers here just recently got
rebooted; it had been up something around a year plus. Got another one
that's been chugging away merrily, handling something on the order of
several gigs' worth of traffic every day, for what, 279 days? Something
like that; I posted the actual figure a little earlier in another thread.
Set 'em up, lock 'em down, apply the occasional security fix to the actual
services being used, and otherwise just forget about 'em. :)
>> Good luck with your NIC. At least with Linux (or Unix) you should have
>> a fighting chance. :-)
Fighting is right, but as I said, seems to be fighting more with the
hardware than anything.
MS, because work should be measured by effort, rather than result.