botched RAID, now e2fsck or what?

Thu Dec 10 20:41:32 UTC 2009

On 2009-12-10, at 13:30, Lucian Șandor wrote:
> 2009/12/10 Andreas Dilger <adilger at sun.com>:
>>
>> Using "od -Ax -tx4" on a regular ext3 filesystem you can see the  
>> group descriptor table starting at offset 0x1000, and the block  
>> numbers basically just "count" up.  This may in fact be the easiest  
>> way to order the disks, if the group descriptor table is large  
>> enough to cover all of the disks:
>>
>> # od -Ax -tx4 /dev/hda1 | more
>> :
>> :
>> 001000 0000012c 0000012d 0000012e 02430000
>> 001010 000001f2 00000000 00000000 00000000
>> 001020 0000812c 0000812d 0000812e 2e422b21
>> 001030 0000000d 00000000 00000000 00000000
>> 001040 00010000 00010001 00010002 27630074
>> 001050 000000b8 00000000 00000000 00000000
>> 001060 0001812c 0001812d 0001812e 27a70b8a
>> 001070 00000231 00000000 00000000 00000000
>> 001080 00020000 00020001 00020002 2cc10000
>> 001090 00000008 00000000 00000000 00000000
>> 0010a0 0002812c 0002812d 0002812e 25660134
>> 0010b0 00000255 00000000 00000000 00000000
>> 0010c0 00030000 00030001 00030002 17a50003
>> 0010d0 000001c6 00000000 00000000 00000000
>> 0010e0 0003812c 0003812d 0003812e 27a70000
>> 0010f0 00000048 00000000 00000000 00000000
>> 001100 00040000 00040001 00040002 2f8b0000
>>
>> Note the nearly regular incrementing sequence every 0x20 bytes:
>>
>> 0000012c, 0000812c, 00010000, 0001812c, 00020000, 0002812c, 00030000,
>> 0003812c
>>
>>
>> Each group descriptor block (4kB = 0x1000) covers 16GB of
>> filesystem space, so there are 64 such blocks per 1TB of
>> filesystem size.  If your RAID chunk size is not too large, and
>> the filesystem IS large, you will be able to fully order your
>> disks in the RAID set.  You
>> can also verify the RAID chunk size by determining how many blocks  
>> of consecutive group descriptors are present before there is a  
>> "jump" where the group descriptor blocks were written to other  
>> disks before returning to the current disk.  Remember that one of  
>> the disks in the set will also need to store parity, so there will  
>> be some number of "garbage" blocks before the proper data resumes.
>
> This seems like a great idea. The 4.5 TB array is huge (should have a
> 1100 kB table), and likely its group descriptor table extends across
> all partitions. I already found the pattern, but the job requires
> programming, since it would be tedious to read megabytes of data for
> each of the hundreds of permutations. I will try coding it, but I hope
> that somebody else has written it before. Isn't there any utility that
> will take a group descriptor table and verify its integrity without
> modifying it?

I think you are going about this incorrectly...  Run the "od" command  
on the raw component drives (e.g. /dev/sda, /dev/sdb, /dev/sdc, etc),  
not on the assembled MD RAID array (e.g. NOT /dev/md0).

The data blocks on the raw devices will be correct, with one chunk in
every N on each disk being used for parity information (so it will
look like garbage).  That won't prevent you from seeing the data in
the group descriptor table, and from that you can work out the order
in which the disks are supposed to be AND the chunk size.
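
If you want the same view as "od" but with one value per group, the
following is a minimal sketch, not a tested recovery tool.  It assumes
32-byte little-endian ext3 group descriptors, 4kB blocks, and a table
starting at offset 0x1000 on the device you point it at.  The function
names are made up for illustration.

```python
import struct

GD_SIZE = 32  # assumed size of one ext3 group descriptor (the 0x20 stride)

def block_bitmaps(gdt_bytes):
    """Return the first 32-bit field (block bitmap location) of each
    32-byte descriptor in the given bytes."""
    count = len(gdt_bytes) // GD_SIZE
    return [struct.unpack_from("<I", gdt_bytes, i * GD_SIZE)[0]
            for i in range(count)]

def read_gdt(path, offset=0x1000, length=0x1000):
    """Read `length` bytes at `offset` from a raw device or image,
    opened read-only."""
    with open(path, "rb") as dev:
        dev.seek(offset)
        return dev.read(length)

# Hypothetical usage on one component disk (read-only):
#   for i, bb in enumerate(block_bitmaps(read_gdt("/dev/sda"))):
#       print(f"{i:4d} {bb:08x}")
```

On a disk holding the start of the table the values should count up
the way they do in the dump quoted above; a chunk holding parity will
show no such pattern.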

Since the group descriptor table is only a few kB from the start of  
the disk (I'm assuming you used whole-disk devices for the MD array,  
instead of DOS partitions) you can just use "od ... | less" and your  
eyes to see what is there.  No programming needed.
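
To judge how far you would need to page, here is a sketch of the
sizing arithmetic (16GB per descriptor block, 64 blocks per 1TB).  It
assumes 4kB blocks, 32-byte descriptors, 8 * 4096 blocks per group,
and that "TB" means 2^40 bytes; adjust the constants if your
filesystem differs.

```python
BLOCK_SIZE = 4096                    # assumed filesystem block size
GD_SIZE = 32                         # assumed descriptor size in bytes
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE    # one bitmap block tracks 32768 blocks

def gdt_blocks(fs_bytes):
    """Number of 4kB group descriptor blocks needed for a filesystem
    of the given size in bytes (rounding up)."""
    group_bytes = BLOCKS_PER_GROUP * BLOCK_SIZE   # 128MB per group
    groups = -(-fs_bytes // group_bytes)          # ceiling division
    per_block = BLOCK_SIZE // GD_SIZE             # 128 descriptors per block
    return -(-groups // per_block)

TB = 2 ** 40
print(gdt_blocks(TB))             # 64
print(gdt_blocks(9 * TB // 2))    # 288 blocks, i.e. 1152 kB for 4.5 TB
```

Dividing that table size by the RAID chunk size gives a rough idea of
how many chunks, and so how many disks, the table is spread over.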

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.