ext3 + fs > 2Tbyte

Andreas Dilger adilger at clusterfs.com
Fri Nov 4 02:35:47 UTC 2005


On Nov 04, 2005  12:17 +1100, Vincent McIntyre wrote:
> * boot with xraid device plugged in, kernel 2.6.7-1-686-smp
>     (packaged as 2.6.7-1.backports.org.1)
> * install a gpt disklabel with parted (-1.6.24 rather than 1.6.19)
> * make an ext2 filesystem as big as the disk with parted
> * mount - it mounts ok
> * umount
> * tune2fs -j (-1.38)
> * mount - it mounts ok (-2.12)
> * umount (-2.12)
> * reboot
> * try to mount - it fails.
>     (the filesystem is not mentioned in /etc/fstab, the system should
>      not be attempting to mount it or fsck it at boot time)
> 
> No files were written to the filesystem during the test sequence.

Hmm, I would expect that something needs to be written to the filesystem
before this fails, unless you are unlucky enough that the last group(s)
alias exactly over the first superblock on disk, and the superblock stays
in the cache long enough for the remount to succeed before you reboot.
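
A rough sketch of the aliasing arithmetic, under an assumption that is not
stated in this message: that some layer truncates sector numbers to 32 bits,
so with 512-byte sectors byte offsets wrap at 2 TiB.  The names WRAP_BYTES
and aliased_offset are illustrative only; the block count is the fs_blk_sz
value from the findsuper output quoted further down.

```python
# Assumption (not confirmed in this thread): sector numbers are truncated
# to 32 bits, so with 512-byte sectors byte offsets wrap at 2 TiB.
SECTOR_SIZE = 512
WRAP_BYTES = (2 ** 32) * SECTOR_SIZE

# Filesystem size taken from the findsuper output (fs_blk_sz * blksz).
FS_BLOCKS = 586057719
BLOCK_SIZE = 4096
FS_BYTES = FS_BLOCKS * BLOCK_SIZE


def aliased_offset(byte_offset):
    """Return where an access to byte_offset would land after wrapping."""
    return byte_offset % WRAP_BYTES


if __name__ == "__main__":
    print("filesystem extends past the wrap point:", FS_BYTES > WRAP_BYTES)
    # An access 1024 bytes past the wrap point would land on byte 1024,
    # which is where a primary superblock normally lives.
    print("wrap point + 1024 lands on byte", aliased_offset(WRAP_BYTES + 1024))
```

If the assumption holds, only writes to the part of the filesystem beyond
the wrap point could damage data near the start of the device, which is
why a test sequence with no writes would be expected to survive.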

If you just do the mke2fs + reboot + mount, does that fail?  Same with
just the tune2fs -j + reboot + remount?  Do you only use the parted
"mkfs", or do you actually use the mke2fs from e2fsprogs?

> I have not yet tried filesystems smaller than 2Tb across reboots.
> I expect it will work, but I will try that shortly to check.
> 
> 
> findsuper tells me there are superblocks, but fs_blk_sz changes (!?)

These are remnants of previous filesystems on the device, each with
slightly different offsets (maybe with and without a partition table,
or with different partition types).  In one case there was a small
1kB block filesystem on the disk in the past.

> # /root/e2fsprogs-1.38/misc/findsuper /dev/sdb1
> starting at 0, with 512 byte increments
>        thisoff     block fs_blk_sz  blksz grp last_mount
>          17920        17 586057719  4096    0 Thu Jan  1 10:00:00 1970

What is missing is the superblock at offset "1024".  What this tool
_should_ also print out is part of the superblock UUID so it is possible
to say which superblocks belong to a single filesystem.

With an ext3 filesystem you will also find copies of the superblock in
the journal; they will all be marked "grp 0" and are not valid backups.
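
To illustrate how a findsuper-like scan could also show part of the UUID,
here is a minimal Python sketch.  It is not the findsuper source.  It
relies on the ext2 superblock layout (little-endian): s_blocks_count at
byte 4, s_log_block_size at 24, s_magic at 56, s_block_group_nr at 90 and
s_uuid at 104.  The function names are hypothetical, and the scan works
on an in-memory buffer rather than on a device.

```python
import struct
import uuid

EXT2_MAGIC = 0xEF53
SB_SIZE = 1024


def parse_superblock(buf):
    """Decode a few superblock fields, or return None if it looks invalid."""
    if len(buf) < 120:
        return None
    (magic,) = struct.unpack_from("<H", buf, 56)
    if magic != EXT2_MAGIC:
        return None
    (blocks,) = struct.unpack_from("<I", buf, 4)
    (log_block_size,) = struct.unpack_from("<I", buf, 24)
    if log_block_size > 6:
        # Implausible block size; treat it as a false hit.
        return None
    (group,) = struct.unpack_from("<H", buf, 90)
    fs_uuid = uuid.UUID(bytes=bytes(buf[104:120]))
    return {
        "blocks": blocks,
        "block_size": 1024 << log_block_size,
        "group": group,
        # The first 8 hex digits are enough to tell filesystems apart.
        "uuid_prefix": str(fs_uuid)[:8],
    }


def find_superblocks(image, step=512):
    """Yield (offset, fields) for each candidate superblock in a bytes image."""
    for off in range(0, len(image) - SB_SIZE + 1, step):
        fields = parse_superblock(image[off:off + SB_SIZE])
        if fields is not None:
            yield off, fields
```

Rows that share a uuid_prefix would belong to the same filesystem, which
makes leftovers from earlier mkfs runs easy to separate.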

>      134234624    131088 586057719  4096    1 Thu Jan  1 10:00:00 1970
>      134235648    131089 586057719  4096    1 Thu Jan  1 10:00:00 1970
>      209733120    204817   1023983  1024   25 Thu Jan  1 10:00:00 1970
>      226510336    221201   1023983  1024   27 Thu Jan  1 10:00:00 1970
>      402670080    393232 586057719  4096    3 Thu Jan  1 10:00:00 1970
>      402671104    393233 586057719  4096    3 Thu Jan  1 10:00:00 1970
>      411059712    401425   1023983  1024   49 Thu Jan  1 10:00:00 1970
>      671105536    655376 586057719  4096    5 Thu Jan  1 10:00:00 1970
>      671106560    655377 586057719  4096    5 Thu Jan  1 10:00:00 1970
>      679495168    663569   1023983  1024   81 Thu Jan  1 10:00:00 1970
>      939540992    917520 586057719  4096    7 Thu Jan  1 10:00:00 1970
>      939542016    917521 586057719  4096    7 Thu Jan  1 10:00:00 1970
>     1207976448   1179664 586057719  4096    9 Thu Jan  1 10:00:00 1970
>     1207977472   1179665 586057719  4096    9 Thu Jan  1 10:00:00 1970
>     3355460096   3276816 586057719  4096   25 Thu Jan  1 10:00:00 1970
>     3355461120   3276817 586057719  4096   25 Thu Jan  1 10:00:00 1970
>     3623895552   3538960 586057719  4096   27 Thu Jan  1 10:00:00 1970
>     3623896576   3538961 586057719  4096   27 Thu Jan  1 10:00:00 1970
>     6576685568   6422544 586057719  4096   49 Thu Jan  1 10:00:00 1970
>     6576686592   6422545 586057719  4096   49 Thu Jan  1 10:00:00 1970
>    10871652864  10616848 586057719  4096   81 Thu Jan  1 10:00:00 1970
>    10871653888  10616849 586057719  4096   81 Thu Jan  1 10:00:00 1970
>    16777232896  16384016 586057719  4096  125 Thu Jan  1 10:00:00 1970
>    16777233920  16384017 586057719  4096  125 Thu Jan  1 10:00:00 1970
> ^C
> This is not looking good...

There appear to be 2 filesystems of interest.  One has offset 0x4200 = 16896,
but is missing the primary superblock.  The other has offset 0x4600 = 17920.
Neither of these would allow you to mount the filesystem as-is, because the
superblock is not aligned at 1024 bytes from the start of the device.

I would suspect something wacky with the partitioning and/or the way that
parted is making the filesystem.
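
The two offsets can be recovered from the backup rows of the findsuper
output with a little arithmetic.  This is only a sketch: it assumes
32768 blocks per group (the usual mke2fs default for 4kB blocks, not
shown in the output) and that each backup superblock sits at the first
byte of its group.  The helper name implied_fs_start is made up for the
example.

```python
BLOCK_SIZE = 4096
# Assumption: default blocks per group for a 4kB-block filesystem.
BLOCKS_PER_GROUP = 32768

# (thisoff, grp) pairs copied from the 4kB-block rows of the findsuper output.
ROWS = [
    (134234624, 1),
    (134235648, 1),
    (402670080, 3),
    (402671104, 3),
    (16777232896, 125),
    (16777233920, 125),
]


def implied_fs_start(thisoff, group):
    """Byte offset at which the filesystem would have to start."""
    return thisoff - group * BLOCKS_PER_GROUP * BLOCK_SIZE


starts = sorted({implied_fs_start(off, grp) for off, grp in ROWS})

if __name__ == "__main__":
    for start in starts:
        print(start, hex(start), "remainder mod 1024:", start % 1024)
```

Every pair of rows collapses onto the same two starting offsets, which
is what points to two separate filesystems rather than one.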

> Your nice od trick tells me slightly different locations for the
> superblock signatures -
> # od -Ax -tx4 /dev/sdb1 | \
>   grep "^[0-9a-f]*30 [0-9a-f]* [0-9a-f]* 000[1-3]ef53 "
> 004630 436a93dd 001e0000 0001ef53 00000001
> 8004630 00000000 001e0000 0001ef53 00000001
> c804630 00000000 001e0000 0001ef53 00000001
> d804630 00000000 001e0000 0001ef53 00000001
> 18004630 00000000 001e0000 0001ef53 00000001
> ^C
> 
> 0x004630 corresponds to byte offset 17968, 48 bytes away.
> Is this explainable by the position of the superblock signature within
> the disk block?

Yes, this hack is only looking for the ext[23] magic number, which is not
at the start of the superblock (0x30 = 48 bytes offset).
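
A small sketch of the conversion from the od hits back to superblock
offsets.  The od line that contains the magic starts 0x30 bytes into the
superblock; the 16-bit magic itself is the low half of the third word on
that line (byte 56 in the ext2 superblock layout).  The helper name
superblock_start is made up for the example.

```python
# The od output line holding the magic begins this far into the superblock.
LINE_OFFSET_IN_SB = 0x30

# Addresses copied from the od output quoted above.
OD_HITS = ["004630", "8004630", "c804630", "d804630", "18004630"]


def superblock_start(od_address_hex):
    """Convert an od line address (hex string) to the superblock's byte offset."""
    return int(od_address_hex, 16) - LINE_OFFSET_IN_SB


if __name__ == "__main__":
    for hit in OD_HITS:
        print(hit, "->", superblock_start(hit))
```

The resulting offsets can be compared directly with the thisoff column
printed by findsuper.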

> So I tried a few e2fsck runs. I know I'm probably being dense but none
> of these worked:
> e2fsck -n -b 16        -B 4096 /dev/sdb1
> e2fsck -n -b 17        -B 4096 /dev/sdb1
> e2fsck -n -b 18        -B 4096 /dev/sdb1
> e2fsck -n -b 204816    -B 1024 /dev/sdb1
> e2fsck -n -b 204817    -B 1024 /dev/sdb1
> e2fsck -n -b 204818    -B 1024 /dev/sdb1
> e2fsck -n -b 221200    -B 1024 /dev/sdb1
> e2fsck -n -b 221201    -B 1024 /dev/sdb1
> e2fsck -n -b 221202    -B 1024 /dev/sdb1
> e2fsck -n -b 1179664   -B 4096 /dev/sdb1
> e2fsck -n -b 1179665   -B 4096 /dev/sdb1
> e2fsck -n -b 6422544   -B 4096 /dev/sdb1
> e2fsck -n -b 6422545   -B 4096 /dev/sdb1
> e2fsck -n -b 10616848  -B 4096 /dev/sdb1
> e2fsck -n -b 10616849  -B 4096 /dev/sdb1

No, I'd expect you need to do something with the device partitioning
to get the filesystem aligned properly.  The superblocks aren't even
aligned on a block boundary; there is a 512-byte offset.
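
The alignment problem can be checked numerically.  The sketch below
assumes that "e2fsck -b N -B S" reads the superblock at byte N * S; that
is my understanding of the option rather than something stated in this
thread.  The helper names are hypothetical.

```python
# A few thisoff values copied from the findsuper output.
SB_OFFSETS = (17920, 134234624, 134235648)


def e2fsck_read_offset(block, block_size):
    """Byte offset assumed to be read for 'e2fsck -b block -B block_size'."""
    return block * block_size


def reachable(offset, block_size):
    """True if some whole block number lands exactly on offset."""
    return offset % block_size == 0


if __name__ == "__main__":
    for off in SB_OFFSETS:
        for bs in (1024, 2048, 4096):
            print(off, bs, "reachable:", reachable(off, bs),
                  "remainder:", off % bs)
```

Under that assumption no combination of -b and -B can hit these
superblocks, because every offset leaves a 512-byte remainder.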

> (The e2fsck manpage could be a tiny bit clearer in that - I think -
>  it means you to use -b <blocknumber>, not -b <offset_to_superblock>)

Send a patch to Ted.

I would recommend doing the following:
- make a partition
- reboot the system
- use mke2fs -j to make the filesystem
- test mount, unmount, reboot at this point

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



