raid1 mystery & workaround
Paul Johnson
pauljohn32 at gmail.com
Sat Oct 31 17:04:17 UTC 2009
Hi, everybody.
I've found a workaround for a problem with a raid 1
array. I'm posting to share the "solution" and to
ask why this went wrong in the first place.
On an old test machine, I have two 4-year-old
Seagate IDE drives in a raid1 mirror for my home
partition. One failed, and Seagate was very, very
pleasant about replacing it. I didn't even need a receipt!
They just went by the serial number to confirm it
was under warranty.
I followed "the usual" procedure for failed
drives: mdadm marked the drive as failed and it
was removed from the array. Then I tried to
add the replacement drive into the raid 1.
One of the HOWTOs I relied on was this one:
http://www.howtoforge.com/replacing_hard_disks_in_a_raid1_array
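For anyone following along, the sequence was roughly what
that HOWTO describes. In my case mdadm had already flagged
sdb1 as faulty on its own, so the manual --fail step was a
formality, and the sfdisk step copies the survivor's
partition table onto the blank replacement. A sketch, not
an exact transcript:

# mdadm --manage /dev/md0 --fail /dev/sdb1
# mdadm --manage /dev/md0 --remove /dev/sdb1
   (power down, swap in the replacement disk, reboot)
# sfdisk -d /dev/sdc | sfdisk /dev/sdb
# mdadm --manage /dev/md0 --add /dev/sdb1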
Here's what went wrong. After being added, the new drive
went through a long "recovery" process--2 hours--but when
it finished, the new drive was marked as "spare" and the
raid 1 array continued to show only one active drive.
Every time the system restarts, the new drive tries
to resync itself; it copies for 2 hours, but it never
enters the array. It is always a spare.
In the end, I gave up trying to fix /dev/md0.
I "guessed" a solution--create a new /dev/md1
device and refit the system to use that. I explain that
fix below, in case the same problem hits other
people.
But I'm still curious to know why it did not work.
Now the details:
The raid1 array was /dev/md0; it used disks sdb1
and sdc1, and the one that failed was sdb1.
Here's what I saw while the new drive was being added:
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdc1[1] sdb1[2]
244195904 blocks [2/1] [_U]
[==================>..] recovery = 94.4% (230658240/244195904) finish=6.9min speed=32396K/sec
unused devices: <none>
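(In that mdstat line, "[2/1]" means the array wants 2 devices
but only 1 is active, and "[_U]" shows slot 0 missing while
slot 1 is up. Those markers never improve in what follows.)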
# mdadm --examine /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 0.90.00
UUID : 37e6e9b6:34cdfcb2:63afba50:8b88d6fc
Creation Time : Sat Aug 18 19:10:40 2007
Raid Level : raid1
Used Dev Size : 244195904 (232.88 GiB 250.06 GB)
Array Size : 244195904 (232.88 GiB 250.06 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Update Time : Thu Oct 29 00:35:50 2009
State : clean
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Checksum : a557d3b3 - correct
Events : 6874
      Number   Major   Minor   RaidDevice State
this     2       8       17        2      spare   /dev/sdb1

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       17        2      spare   /dev/sdb1
Near the end of the rebuild (97% in the --detail output
below), the new drive still shows as a spare:
# mdadm --examine /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 0.90.00
UUID : 37e6e9b6:34cdfcb2:63afba50:8b88d6fc
Creation Time : Sat Aug 18 19:10:40 2007
Raid Level : raid1
Used Dev Size : 244195904 (232.88 GiB 250.06 GB)
Array Size : 244195904 (232.88 GiB 250.06 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Update Time : Thu Oct 29 00:35:50 2009
State : clean
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Checksum : a557d3c7 - correct
Events : 6874
      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       17        2      spare   /dev/sdb1
# mdadm --query /dev/md0
/dev/md0: 232.88GiB raid1 2 devices, 1 spare. Use mdadm --detail for
more detail.
# mdadm --detail /dev/md0
/dev/md0:
Version : 0.90
Creation Time : Sat Aug 18 19:10:40 2007
Raid Level : raid1
Array Size : 244195904 (232.88 GiB 250.06 GB)
Used Dev Size : 244195904 (232.88 GiB 250.06 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Thu Oct 29 00:35:50 2009
State : clean, degraded, recovering
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Rebuild Status : 97% complete
UUID : 37e6e9b6:34cdfcb2:63afba50:8b88d6fc
Events : 0.6874
    Number   Major   Minor   RaidDevice State
       2       8       17        0      spare rebuilding   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
After that, rebuilding seems finished:
# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdc1[1] sdb1[2]
244195904 blocks [2/1] [_U]
But only one drive is active in the array:
# mdadm --detail /dev/md0
/dev/md0:
Version : 0.90
Creation Time : Sat Aug 18 19:10:40 2007
Raid Level : raid1
Array Size : 244195904 (232.88 GiB 250.06 GB)
Used Dev Size : 244195904 (232.88 GiB 250.06 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Thu Oct 29 00:43:21 2009
State : clean, degraded
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
UUID : 37e6e9b6:34cdfcb2:63afba50:8b88d6fc
Events : 0.6880
    Number   Major   Minor   RaidDevice State
       2       8       17        0      spare rebuilding   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
# mdadm --examine /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 0.90.00
UUID : 37e6e9b6:34cdfcb2:63afba50:8b88d6fc
Creation Time : Sat Aug 18 19:10:40 2007
Raid Level : raid1
Used Dev Size : 244195904 (232.88 GiB 250.06 GB)
Array Size : 244195904 (232.88 GiB 250.06 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Update Time : Thu Oct 29 00:44:02 2009
State : clean
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Checksum : a557d5af - correct
Events : 6882
Number Major Minor RaidDevice State
this 2 8 17 2 spare /dev/sdb1
0 0 0 0 0 removed
1 1 8 33 1 active sync /dev/sdc1
2 2 8 17 2 spare /dev/sdb1
I tried a lot of ways to set this right.
I tried to "grow" the array, to set the number of spares
to 0, and so forth. No success.
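To give a flavor of what I mean, these are the kinds of
commands I tried--a sketch from memory, not an exact
transcript:

# mdadm --grow /dev/md0 --raid-devices=2
# mdadm /dev/md0 --remove /dev/sdb1
# mdadm /dev/md0 --re-add /dev/sdb1

Each attempt just kicked off another 2-hour resync that
ended with sdb1 as a spare again.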
After a lot of tries, I gave up trying to get /dev/md0 to work.
So I stopped it, and then used the "--assume-clean" option to
create a new array on md1. I found that suggestion here:
http://neverusethisfont.com/blog/tags/mdadm/
# mdadm -S /dev/md0
# mdadm --create --assume-clean --level=1 --raid-devices=2 /dev/md1 \
    /dev/sdc1 /dev/sdb1
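For what it's worth, "--assume-clean" tells mdadm to skip
the initial resync and trust that the members already hold
identical data. That seemed safe here, since sdb1 had just
spent 2 hours copying from sdc1. To confirm both members
came up active:

# cat /proc/mdstat
# mdadm --detail /dev/md1

A healthy two-disk raid1 shows "[2/2] [UU]" in mdstat.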
That works! So I just needed to reset the configuration
to use that. First, grab the metadata:
# mdadm --detail --scan
ARRAY /dev/md1 metadata=0.90 UUID=6a408f8b:515f605f:bfe78010:bc810f04
And revise the mdadm.conf file (the indented line is a
continuation of the ARRAY line):
# cat /etc/mdadm.conf
DEVICE /dev/sdb1 /dev/sdc1
ARRAY /dev/md1 level=raid1 num-devices=2
   UUID=6a408f8b:515f605f:bfe78010:bc810f04 devices=/dev/sdc1,/dev/sdb1
And I changed /etc/fstab to point at md1, not md0.
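The fstab edit is just the device name. The line ends up
something like this (ext3 here is just an example; use
whatever filesystem the mirror actually holds):

/dev/md1   /home   ext3   defaults   1 2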
But why did /dev/md0 hate me in the first place?
I wonder if it was personal :(
--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas