Now what do you do when you blow a RAID 1 disk?

Tue Nov 18 08:58:16 UTC 2008

Neal Rhodes wrote:
> So, software Raid 1 in Fedora is just the bee's knees.   Until a drive
> actually fails.    Then it's not so much.   How do you get out of this
> swamp? 
> 
> What are the steps to take out a dead drive, stuff in a brand new
> identical disk drive, and get the Raid back going again?   In my case,
> the system still boots, and has /,  /boot, and /u as three Raid1
> filesystems.    Each filesystem is running degraded. 

Use "cat /proc/mdstat" or "mdadm --detail /dev/mdN" to determine which 
drive failed, where mdN is the wonky array. Then run:

   mdadm /dev/mdN --remove /dev/that/has/failed

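This removes the failed drive from the array. If mdadm complains that 
the device is busy, the kernel has not marked it faulty yet; mark it 
with "mdadm /dev/mdN --fail /dev/that/has/failed" first and then repeat 
the remove. Replace the drive with one of the same size or larger and 
run: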

   mdadm /dev/mdN --add /dev/shiny/new/replacement/drive

Then check that the resync has begun, using /proc/mdstat or 
"mdadm --detail /dev/mdN". You can also monitor its progress with 
something like:

   watch -n5 cat /proc/mdstat
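
If it helps, here is a rough sketch of the steps above as a shell 
snippet. It only lists degraded arrays and prints the commands; it does 
not run mdadm. The function names (degraded_arrays, 
replacement_commands) are made up for illustration, the awk pattern 
assumes arrays named mdN, and the --fail line is an extra step that is 
only needed when the member has not already been marked faulty.

```shell
#!/bin/sh
# Sketch only: find degraded md arrays and print (not run) the
# commands for swapping out one member.

# Print the name of every array whose status line shows a missing
# member, e.g. "[2/1] [U_]". Reads /proc/mdstat unless a file is given.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # start of an array stanza
        /\[[U_]*_[U_]*\]/ && name != "" {    # "_" marks a missing member
            print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}

# Print the command sequence for one array. Nothing is executed.
# $1 = array, $2 = failed member, $3 = replacement member
replacement_commands() {
    array=$1 failed=$2 new=$3
    # Assumption: --fail is only needed if the member is not yet faulty.
    echo "mdadm $array --fail $failed"
    echo "mdadm $array --remove $failed"
    echo "mdadm $array --add $new"
    echo "cat /proc/mdstat"
}
```

For example, "replacement_commands /dev/md0 /dev/sdb1 /dev/sdb1" prints 
the four commands for a hypothetical /dev/md0 whose second member sits 
on /dev/sdb1, so you can read them over before running anything. With 
three arrays on the same pair of disks, the sequence has to be repeated 
once per array.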

> Let's presume my 2nd drive is toast, and I've got a replacement.  What
> are the steps?    Seems like I can't do anything with mdadm while it's
> up, because the drives are busy.   

Why not? The mdadm command is designed for use in this situation. You 
might want to take a look at the Software RAID HOWTO, which provides a 
lot more detail on manipulating running MD RAID arrays:

http://tldp.org/HOWTO/Software-RAID-HOWTO.html

Although it's not been updated in a while, the basic commands (create, 
assemble, add, remove etc.) really haven't changed in a long time.

Regards,
Bryn.