ext3 + fs > 2Tbyte

Fri Nov 4 05:19:00 UTC 2005

>> No files were written to the filesystem during the test sequence.
>
> Hmm, I would expect at least the need to write something to the filesystem,
> unless you are unlucky enough that the last group(s) aliases exactly over
> the first superblock on disk, but is kept in the cache enough to remount
> it before you reboot.

ok, I can add that to the scripts in my next round of tests.

> Do you only use the parted "mkfs" or do you actually use the mke2fs 
> from e2fsprogs? 
The script does this:
   parted -s /dev/sdb1 print
   parted -s /dev/sdb1 mklabel gpt
   parted -s /dev/sdb1 print
   parted -s /dev/sdb1 mkpart primary 0 10
   parted -s /dev/sdb1 print
   parted -s /dev/sdb1 mke2fs 1 ext2
   parted -s /dev/sdb1 print

I did not try mke2fs before now because I don't think it worked when
I was trying to make a filesystem larger than 2 Tbyte, though I can't
recall for certain now.

> If you just do the mke2fs + reboot + mount does that fail?

Yes. While you were typing,
  * I made a teeny 10 Mbyte filesystem (using parted, as above)
  * mounted
  * umounted
  * ran findsuper and od
  * reboot
  * ran parted /dev/sdb1 print
    (repeated, using strace)
  * ran an straced e2fsck /dev/sdb1
and got the same error.

I couldn't quite believe this so I tried it again. Same result.
Post reboot, I did things in slightly different order:

  * strace e2fsck -n /dev/sdb1
  e2fsck 1.38 (30-Jun-2005)
  Couldn't find ext2 superblock, trying backup blocks...
  /local/sbin/e2fsck: Bad magic number in super-block while trying to open /dev/sdb1

  The superblock could not be read or does not describe a correct ext2
  filesystem.  If the device is valid and it really contains an ext2
  filesystem (and not swap or ufs or something else), then the superblock
  is corrupt, and you might try running e2fsck with an alternate
  superblock:
     e2fsck -b 8193 <device>

  * /local/sbin/parted /dev/sdb print
  Disk geometry for /dev/sdb: 0.000-2289288.000 megabytes
  Disk label type: gpt
  Minor    Start       End     Filesystem  Name                  Flags
  1          0.017     10.000  ext2
  Information: Don't forget to update /etc/fstab, if necessary.

> Same with just the tune2fs -j + reboot + remount?

I switched to using mke2fs to create the filesystem, ie
  * I made a teeny 10 Mbyte partition (using parted)
  * mke2fs /dev/sdb1
  * mounted
  * umounted
  * ran findsuper and od
  * reboot
  * strace -o strace.e2fsck.postboot /local/sbin/e2fsck -n /dev/sdb1
  e2fsck 1.38 (30-Jun-2005)
  Couldn't find ext2 superblock, trying backup blocks...
  /local/sbin/e2fsck: Bad magic number in super-block while trying to open /dev/sdb1

  The superblock could not be read or does not describe a correct ext2
  filesystem.  If the device is valid and it really contains an ext2
  filesystem (and not swap or ufs or something else), then the superblock
  is corrupt, and you might try running e2fsck with an alternate
  superblock:
     e2fsck -b 8193 <device>

So it is starting to look like the GPT disklabel is causing a problem.

I switched to having parted make a msdos disklabel but kept everything
else the same - it worked fine.
  # strace -o strace.e2fsck.postboot /local/sbin/e2fsck -n /dev/sdb1
  e2fsck 1.38 (30-Jun-2005)
  /dev/sdb1: clean, 11/2000 files, 268/8000 blocks
  #

>> findsuper tells me there are superblocks, but fs_blk_sz changes (!?)
>
> These are remnants of previous filesystems on the device, each with
> slightly different offsets (maybe with and without a partition table,
> or with different partition types).  In one case there was a small
> 1kB block filesystem on the disk in the past.

ah, of course. I thought findsuper would respect the partition boundaries
and stop at the "end" of the filesystem. It did that pre-reboot, e.g. my
10Mbyte test above
   starting at 0, with 512 byte increments
        thisoff     block fs_blk_sz  blksz grp last_mount
           1024         1     10223  1024    0 Thu Jan  1 10:00:00 1970
        8389632      8193     10223  1024    1 Thu Jan  1 10:00:00 1970

       10468864: finished with errno 0

Post-reboot, I get this:
   starting at 0, with 512 byte increments
        thisoff     block fs_blk_sz  blksz grp last_mount
          17920        17     10223  1024    0 Thu Jan  1 10:00:00 1970
        8406528      8209     10223  1024    1 Thu Jan  1 10:00:00 1970
      134235648    131089 511999995  4096    1 Thu Jan  1 10:00:00 1970
      209733120    204817   1023983  1024   25 Thu Jan  1 10:00:00 1970
      226510336    221201   1023983  1024   27 Thu Jan  1 10:00:00 1970
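
The first two rows of each listing belong to the same 10 Mbyte test
filesystem, so they can be compared directly. The sketch below is only
arithmetic on the thisoff values printed above; the names are illustrative
and the 512-byte sector size is an assumption.

```python
SECTOR = 512  # assumed sector size in bytes

# thisoff of the group 0 and group 1 superblocks, copied from findsuper
pre_reboot = [1024, 8389632]
post_reboot = [17920, 8406528]

def common_shift(before, after):
    """Return the single byte shift shared by all superblocks,
    or raise if they did not all move by the same amount."""
    shifts = {b - a for a, b in zip(before, after)}
    if len(shifts) != 1:
        raise ValueError("superblocks moved by different amounts")
    return shifts.pop()

shift_bytes = common_shift(pre_reboot, post_reboot)
shift_sectors, leftover = divmod(shift_bytes, SECTOR)
print(shift_bytes, hex(shift_bytes), shift_sectors, leftover)
```

Both superblocks moved by the same 16896 bytes (0x4200), which is exactly
33 sectors. That would be consistent with the data being intact and
/dev/sdb1 simply starting 33 sectors earlier on the disk after the reboot
than before it. That reading is a guess, not something the output proves.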

To clean things up, I suppose I could dd /dev/zero into /dev/sdb?
It'll only take about 10 hours..
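
As a rough check on that estimate, here is a small sketch using the disk
size parted reports (2289288 megabytes). The throughput figures in the loop
are hypothetical, not measurements from this hardware.

```python
# Disk size as reported by parted ("0.000-2289288.000 megabytes").
DISK_MEGABYTES = 2289288

def hours_to_zero(megabytes, mb_per_second):
    """Hours needed to write `megabytes` at a sustained rate."""
    return megabytes / mb_per_second / 3600

def implied_rate(megabytes, hours):
    """Sustained rate in megabytes/s needed to finish in `hours`."""
    return megabytes / (hours * 3600)

if __name__ == "__main__":
    # Rate that a 10 hour run over the whole disk would require.
    print(round(implied_rate(DISK_MEGABYTES, 10), 1))
    # Hypothetical sustained write rates, for comparison.
    for rate in (30, 60, 100):
        print(rate, round(hours_to_zero(DISK_MEGABYTES, rate), 1))
```

A 10 hour run over the whole device corresponds to a sustained write rate
of a little under 64 megabytes per second.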

>> # /root/e2fsprogs-1.38/misc/findsuper /dev/sdb1
>> starting at 0, with 512 byte increments
>>        thisoff     block fs_blk_sz  blksz grp last_mount
>>          17920        17 586057719  4096    0 Thu Jan  1 10:00:00 1970
>
> What is missing is the superblock at offset "1024".  What this tool
> _should_ also print out is part of the superblock UUID so it is possible
> to say which superblocks belong to a single filesystem.
>
> With an ext3 filesystem you will also find copies of the superblock in
> the journal, they will all be marked "grp 0" and are not valid backups.

ok, thanks for explaining this.

> There appear to be 2 filesystems of interest.  One has offset 0x4200 = 16896,
> but is missing the primary superblock.  The other has offset 0x4600 = 17920.
> Neither of these would allow you to mount the filesystem as-is, because the
> superblock is not aligned at 1024 bytes from the start of the device.
>
> I would suspect something wacky with the partitioning and/or the way that
> parted is making the filesystem.
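
The alignment point can be tested directly on a device or image. This is a
minimal sketch, not part of e2fsprogs: it relies on the ext2 on-disk layout
(primary superblock 1024 bytes into the filesystem, 16-bit little-endian
magic 0xEF53 at byte 56 of the superblock). The function name and the
`fs_start` parameter are made up for illustration.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # primary superblock starts 1024 bytes in
MAGIC_OFFSET = 56          # position of s_magic inside the superblock
EXT2_MAGIC = 0xEF53        # same magic for ext2 and ext3

def has_ext2_magic(path, fs_start=0):
    """True if an ext2/ext3 magic number sits where a filesystem
    starting at byte `fs_start` of `path` would keep it."""
    with open(path, "rb") as dev:
        dev.seek(fs_start + SUPERBLOCK_OFFSET + MAGIC_OFFSET)
        raw = dev.read(2)
    if len(raw) < 2:           # offset is past the end of the device
        return False
    (magic,) = struct.unpack("<H", raw)
    return magic == EXT2_MAGIC
```

Run read-only as root, `has_ext2_magic("/dev/sdb1")` asks whether the
filesystem starts where the kernel thinks the partition starts, and
`has_ext2_magic("/dev/sdb1", fs_start=16896)` asks the same question for a
filesystem shifted by 0x4200 bytes.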

Most of this is just the history of the fs creation tests I did, I guess.
Remember, all these are just test filesystems on separate hardware.
I have not dared to run findsuper on the filesystem of interest yet,
I want to make sure I can actually recover a test FS first.

>> So I tried a few e2fsck runs. I know I'm probably being dense but none
>> of these worked:
>> e2fsck -n -b 16        -B 4096 /dev/sdb1
>> e2fsck -n -b 17        -B 4096 /dev/sdb1
....
>
> No, I'd expect you need to do something with the device partitioning
> to get the filesystem aligned properly.  They aren't even aligned on
> a block boundary, there is a 512-byte offset.

I noticed that when computing thisoff/blksz, but didn't make much of it.
Thanks for clearing that up.
I'll take a look at the manuals to see if I can force things to be
on a block boundary.
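
The arithmetic behind the failed e2fsck runs can be sketched as follows. It
assumes e2fsck -b N -B SIZE looks for the superblock at byte N * SIZE,
which is my understanding of those options rather than something shown in
this thread. The function name is illustrative.

```python
def superblock_block(byte_offset, block_size):
    """Block number to pass to e2fsck -b for a superblock found at
    `byte_offset`, or None if no whole block number reaches it."""
    block, remainder = divmod(byte_offset, block_size)
    return block if remainder == 0 else None

if __name__ == "__main__":
    # thisoff values from the findsuper listings, 1 kbyte blocks
    for offset in (1024, 8389632, 17920, 8406528):
        print(offset, superblock_block(offset, 1024))
    # the superblock at 17920 with the 4 kbyte block size that was tried
    print(17920, superblock_block(17920, 4096))
    # where -b 16 and -b 17 with -B 4096 would actually look
    print(16 * 4096, 17 * 4096)
```

The pre-reboot offsets map to blocks 1 and 8193, the usual primary and
first backup for 1 kbyte blocks. The post-reboot offsets leave a 512-byte
remainder, so under that assumption no -b value can land on them, with
either block size.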

> I would recommend to do the following:
> - make a partition
> - reboot the system
> - use mke2fs -j to make the filesystem
> - test mount, unmount, reboot at this point

This reboot-after-partition thing is foreign to me (coming from Solaris);
it seems quite a poor design to need this. But let's run with it.

   parted -s /dev/sdb1 print
   parted -s /dev/sdb1 mklabel gpt
   parted -s /dev/sdb1 print
   parted -s /dev/sdb1 mkpart primary 0 10
   parted -s /dev/sdb1 print
   sleep 60
   reboot
   parted -s /dev/sdb1 print
   mke2fs -n -v /dev/sdb1
   mke2fs -q /dev/sdb1
     mke2fs gets stuck...
     I have to ^C it.

   # fdisk -l /dev/sdb
   You must set cylinders.
   You can do this from the extra functions menu.

   Disk /dev/sdb: 0 MB, 0 bytes
   255 heads, 63 sectors/track, 0 cylinders
   Units = cylinders of 16065 * 512 = 8225280 bytes

      Device Boot      Start         End      Blocks   Id  System
   /dev/sdb1               1      267350  2147483647+  ee  EFI GPT
   Partition 1 has different physical/logical beginnings (non-Linux?):
      phys=(0, 0, 1) logical=(0, 0, 2)
   Partition 1 has different physical/logical endings:
      phys=(1023, 254, 63) logical=(267349, 89, 4)
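
The "logical" beginning and ending that fdisk prints can be turned into
sector numbers with the geometry it reports (255 heads, 63 sectors/track).
This is a sketch using the standard CHS to LBA conversion; the variable
names are illustrative.

```python
HEADS = 255              # from "255 heads"
SECTORS_PER_TRACK = 63   # from "63 sectors/track"

def chs_to_lba(cylinder, head, sector):
    """Convert cylinder/head/sector to a 0-based sector number.
    CHS sector numbers start at 1, hence the `sector - 1`."""
    return (cylinder * HEADS + head) * SECTORS_PER_TRACK + (sector - 1)

start = chs_to_lba(0, 0, 2)        # logical beginning of partition 1
end = chs_to_lba(267349, 89, 4)    # logical ending of partition 1
sectors = end - start + 1
print(start, end, sectors, sectors == 2**32 - 1)
```

The type ee entry starts at sector 1 and ends at sector 4294967295, so it
covers 2^32 - 1 sectors, the largest count a 32-bit field can hold. Half
of that is 2147483647 with one sector left over, which would match the
"2147483647+" in the Blocks column if fdisk counts 1 kbyte blocks and
marks an odd sector with "+". With 512-byte sectors, 2^32 sectors is
2 Tbyte, so this entry cannot describe the whole 2289288 megabyte disk.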

   # /local/sbin/parted /dev/sdb print
   Error: The primary GPT table is corrupt, but the backup appears ok, so
   that will be used.
   OK/Cancel? C
   Information: Don't forget to update /etc/fstab, if necessary.

   # /local/sbin/parted /dev/sdb print
   Error: The primary GPT table is corrupt, but the backup appears ok, so
   that will be used.
   OK/Cancel? OK
   Disk geometry for /dev/sdb: 0.000-2289288.000 megabytes
   Disk label type: gpt
   Minor    Start       End     Filesystem  Name                  Flags
   1          0.017     10.000  ext2
   Information: Don't forget to update /etc/fstab, if necessary.

   # strace -o strace.e2fsck.post-parted /local/sbin/e2fsck -n /dev/sdb1
   e2fsck 1.38 (30-Jun-2005)
   Couldn't find ext2 superblock, trying backup blocks...
   /local/sbin/e2fsck: Bad magic number in super-block while trying to open
   /dev/sdb1

   The superblock could not be read or does not describe a correct ext2
   filesystem.  If the device is valid and it really contains an ext2
   filesystem (and not swap or ufs or something else), then the superblock
   is corrupt, and you might try running e2fsck with an alternate
   superblock:
     e2fsck -b 8193 <device>

So it appears that support is lacking for GPT disklabels in e2fsprogs
and possibly the kernel as well.

I ran one more time,
   partition with parted, gpt label.
   reboot
   make 10Mbyte ext2 fs with parted
   mount, umount, findsuper, od - all this seems to work ok.
   reboot
   attempt to mount
    mount -text2 /dev/sdb1 /tmp/a
    mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
        or too many mounted file systems
        (aren't you trying to mount an extended partition,
        instead of some logical partition inside?)

I think this says there is something funky with the GPT disklabelling.

Thanks for your help,
Vince