Software RAID problem

Fri Jul 14 18:00:24 UTC 2006

On Thu, 13 Jul 2006 15:06:03 -0700, "Rick Stevens"
<rstevens at vitalstream.com> said:
> On Thu, 2006-07-13 at 14:07 -0600, redhat at buglecreek.com wrote:
> > We have a critical system that has Red Hat 8.0 installed.  The system
> > uses the older raidtools, not mdadm.  We are in the process of rebuilding
> > a new box, but in the meantime we have a software raid issue.  The
> > system had to be rebooted and we ended up with the following raid
> > problem: 
> > cat /proc/mdstat shows: 
> > 
> > Personalities : [raid0] [raid1]
> > read_ahead 1024 sectors
> > md1 : active raid1 hda2[0]
> >       119684160 blocks [2/1] [U_]
> > 
> > md2 : active raid0 hda3[0] hdb2[1]
> >       208640 blocks 64k chunks
> > 
> > md0 : active raid1 hda1[0] hdb1[1]
> >       264960 blocks [2/2] [UU]
> > 
> > Looks like we have a problem with the md1 device, which is the / partition.
> > lsraid -A -a /dev/md1 shows:
> > 
> > [dev   9,   1] /dev/md1         C27DAE7E.7C02AF01.5143DCC8.62FD07C3 online
> > [dev   3,   2] /dev/hda2        C27DAE7E.7C02AF01.5143DCC8.62FD07C3 good
> > [dev   ?,   ?] (unknown)        00000000.00000000.00000000.00000000 missing
> > 
> > The applicable section of /etc/raidtab is:
> > 
> > raiddev             /dev/md1
> > raid-level                  1
> > nr-raid-disks               2
> > chunk-size                  64k
> > persistent-superblock       1
> > nr-spare-disks              0
> >     device          /dev/hda2
> >     raid-disk     0
> >     device          /dev/hdb3
> >     raid-disk     
> > 
> > It seems that /dev/hdb3 has issues.  Is there a way to get /dev/hdb3
> > back online?  Could we do it with raidhotadd, for example:
> > raidhotadd /dev/md1 /dev/hdb3
> > 
> > This is a very critical system and I want to make sure we don't do
> > anything that would totally bring the system down, at least until we can
> > build a new system.  Any help would be appreciated.
> 
> The FIRST thing you do is back up /dev/md1 (or what's left of it) in
> case the remediation doesn't work or does something evil (it shouldn't).
> And you can continue to run in the degraded state.
> 
> You can use raidhotadd to try to bring the drive back into the fold, but
> it may not join if the drive is indeed defective.  Try the raidhotadd,
> then check /proc/mdstat again.  If you see a "(F)" following the
> "hdb3[1]" bit, the drive failed.  That doesn't mean the drive is fried,
> but SOMETHING is wrong.
> 
> Try to raidhotremove the drive from the RAID, then run badblocks on the
> partition in question (/dev/hdb3).  When it completes, try the
> raidhotadd again and see if it joins and starts the resync.
> 
> Probably none of my business, but why is such a critical machine still
> running RH8?  RH8.0 is farking ancient and, IMHO, the absolute worst
> release of RH ever...which is why RH9 came out so quickly after it.
> 
> ----------------------------------------------------------------------
> - Rick Stevens, Senior Systems Engineer     rstevens at vitalstream.com -
> - VitalStream, Inc.                       http://www.vitalstream.com -
> -                                                                    -
> -      A day for firm decisions!!!   Well, then again, maybe not!    -
> ----------------------------------------------------------------------

Thanks Rick,

I knew that running RH8.0 would raise a few eyebrows, but due to personnel
changes, etc., it slipped through the cracks.  Anyway, does raidhotremove
take the array and the partition the same way raidhotadd does, like this:
raidhotremove /dev/md1 /dev/hdb3 ?  And for badblocks, do I simply run
badblocks /dev/hdb3?
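
To make the sequence concrete, here is a rough sketch of the steps
discussed above.  It is not a tested procedure.  The names MD, PART,
DRY_RUN, run, degraded_arrays and repair_sequence are made up for
illustration and are not part of raidtools.  It assumes the raidtools
syntax "raidhotremove <md device> <partition>" (the same argument
order as raidhotadd) and that badblocks is run in its default
read-only mode (no -w, which would overwrite data).  By default it
only prints the commands, so nothing touches the array until DRY_RUN
is set to 0.

```shell
#!/bin/sh
# Sketch only: prints commands unless DRY_RUN=0 is set explicitly.
# MD, PART and DRY_RUN are illustrative names, not raidtools settings.
MD=${MD:-/dev/md1}
PART=${PART:-/dev/hdb3}
DRY_RUN=${DRY_RUN:-1}

# Run a command, or just show it when in dry-run mode.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Print the name of every array whose status field contains "_",
# e.g. [U_], meaning a member is missing.  Reads /proc/mdstat unless
# another file is given as the first argument.
degraded_arrays() {
    awk '/^md[0-9]+ :/ { name = $1 }
         /\[[U_]*_[U_]*\]/ { print name }' "${1:-/proc/mdstat}"
}

# The order suggested in the thread: remove, scan, re-add, re-check.
repair_sequence() {
    run raidhotremove "$MD" "$PART"   # only relevant if the member shows (F)
    run badblocks -sv "$PART"         # read-only scan with progress output
    run raidhotadd "$MD" "$PART"      # should start a resync if it joins
    run cat /proc/mdstat              # look for resync progress or (F)
}

repair_sequence
```

Run as-is, this just lists the four commands.  Calling degraded_arrays
against the /proc/mdstat output quoted above should report md1 only,
since md0 shows [UU] and md2 is raid0 and has no status field.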

Thanks