F10+dmraid eats puppies! (and ate my system too)

Bill Davidsen davidsen at tmr.com
Sat Dec 13 18:16:04 UTC 2008


Graham TerMarsch wrote:
> I ran into this earlier in the week and after finally getting my machine back 
> online am surprised to see that people aren't making a big stink about 
> this... it's got subtle nuances that make it nearly impossible to fix without 
> loss of data.
> 
> I've found the following threads/bugs that appear related:
> 
>  https://bugzilla.redhat.com/show_bug.cgi?id=474697
>  http://forums.fedoraforum.org/showthread.php?t=206206
>  http://forums.fedoraforum.org/showthread.php?t=206284
> 
> Here's what happened to me...
> 
> I upgraded from F9 to F10 back on Nov 29th, and things seemed fine.  I 
> upgraded the kernel last Wednesday, rebooted, and started seeing all sorts of 
> crazy weirdness.  At first the system wouldn't boot at all, dying on errors 
> of "killing init" and "corrupted libraries".  I thought it sounded like FS 
> corruption, so I booted the rescue CD, ran fsck (which came back clean), and 
> then proceeded to re-install some of the packages with the corrupted 
> libraries, so I could at least get the machine up and running again.
> 
> After several cycles of "rescue CD, install packages, reboot, fail", I 
> decided that even if I could get it running I wasn't going to trust it.  Went 
> back to the rescue CD, and started backing up files onto other machines on 
> the network here.
> 
> I then re-installed the machine, leaving my "/home" and "/usr/local" 
> partitions as they were; reformatted everything else, but left those alone.  
> Got the system up, but was then presented with the most shocking thing... it 
> looked like my machine had basically done time-travel and was now *exactly* 
> as it was on November 29th.  Files I know I'd edited were missing changes, 
> e-mails were lost, databases were missing data.
> 
> Took me a while to figure it out, but here's what happened...
> 
> When I upgraded from F9 to F10, Anaconda detected my nvidia dmraid mirror and 
> installed F10 onto both halves of the mirror.  When I rebooted, though, it 
> only picked up *ONE HALF* of the mirror... /dev/sda.  It had the UUIDs right, 
> but it didn't mount /dev/mapper/nvidia_xxxx but mounted sda instead.  When 
> I did the kernel upgrade this week, *that* mounted sdb.  When I reinstalled, 
> it *also* mounted sdb, not sda or dmraid.
> 
> When I looked at sda directly, I saw all of my recent changes to files that 
> I'd made since the 29th.  When I looked at sdb directly, it was a snapshot of 
> what my machine looked like on the 29th.
> 
> When we actually manage to get the bug fixed that caused this, anyone who's 
> had this problem is potentially going to be in for a bigger world of hurt 
> when applying the fix... I don't even think we can (with confidence) just 
> nuke one half of the mirror and rebuild based on what's on the other half; how 
> do we know which half they've been using?  In my case, I'd made ~2wks of 
> changes to sda not knowing that I was only using half the mirror, and then 
> after updating the kernel got bumped over to sdb and made changes there while 
> trying to fix it.  Neither one was a mirror of the other, and each one had 
> something on it that needed to be preserved.  YUCK.
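
For anyone stuck sorting out two halves that have drifted apart like this, here is a minimal sketch (not a tested recovery tool) that reports which side holds the newer copy of each file. It assumes both halves are mounted read-only somewhere; the mount points /mnt/sda and /mnt/sdb in the usage note and the function name newer_files are hypothetical. Modification times are only a hint, so treat the report as a list of files to inspect by hand, not as something to feed straight into a copy.

```python
import os


def newer_files(tree_a, tree_b):
    """Report, per relative path, which tree has the only or the newer copy."""

    def scan(root):
        # Map each file's path (relative to root) to its modification time.
        found = {}
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                full = os.path.join(dirpath, name)
                try:
                    found[os.path.relpath(full, root)] = os.lstat(full).st_mtime
                except OSError:
                    continue  # unreadable entry: skip it rather than abort
        return found

    a, b = scan(tree_a), scan(tree_b)
    report = {}
    for path in sorted(set(a) | set(b)):
        if path not in b:
            report[path] = "only-a"
        elif path not in a:
            report[path] = "only-b"
        elif a[path] > b[path]:
            report[path] = "newer-a"
        elif b[path] > a[path]:
            report[path] = "newer-b"
        # Equal timestamps are left out; the contents may still differ.
    return report
```

Usage would be something like newer_files("/mnt/sda/home", "/mnt/sdb/home"). Files with identical timestamps are not reported, so a content comparison is still needed before wiping either disk.
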
> 
> Once I realized what'd happened to my machine I went into the BIOS and turned 
> off the nvidia fakeraid and re-installed directly onto the two drives.  Isn't 
> what I want as I'd at least like to have _some_ mirror of my data somewhere, 
> but it was the only way I could get this machine running again.
> 
> Be forewarned.... F10+dmraid is *DANGEROUS* right now...
> 
My perception is that using mdadm is a more reliable technology at the moment.
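
One way to spot the symptom described above before it costs you data is to look for filesystems mounted straight from a raw /dev/sdX device. The following is a rough sketch, assuming every filesystem on the box is supposed to live on the dmraid set under /dev/mapper; the function name raw_disk_mounts is made up for illustration. It will also flag legitimate raw-disk mounts such as USB drives, and it cannot see through an entry like /dev/root, so a clean result is not proof that the mirror is in use.

```python
import re

# Matches whole disks and partitions such as /dev/sda or /dev/sdb3.
RAW_DISK = re.compile(r"^/dev/sd[a-z]+\d*$")


def raw_disk_mounts(mounts_text):
    """Return (device, mountpoint) pairs mounted directly from /dev/sdX.

    mounts_text is text in the format of /proc/mounts.
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and RAW_DISK.match(fields[0]):
            hits.append((fields[0], fields[1]))
    return hits
```

On a running system you would pass it the contents of /proc/mounts, for example raw_disk_mounts(open("/proc/mounts").read()). Any pair it returns for /, /home or similar means that filesystem is running off one disk rather than the mirror.
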

-- 
Bill Davidsen <davidsen at tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot



