How are alternate superblocks repaired?

Thu Sep 20 21:36:17 UTC 2007

Hi,

Using dumpe2fs, I have determined that all of my alternate ext3 superblocks are corrupted (not clean) and that only the primary superblock is valid, i.e. mount works and the ordered journal is applied.  Sometimes the primary superblock goes flaky, i.e. the ext_attr filesystem feature goes missing; I am not sure why this occurs.  When that happens, mount does not apply the journal using the primary superblock and completes without it.  I then usually resort to booting the FC3 OS from the hard drive on which the ext3 filesystem resides, so that fsck fixes at least the primary superblock.

This situation is the reverse of what the kernel and filesystem design normally assume, i.e. that the alternate superblocks remain intact when the primary is hosed.  It is not a good place to be, and it is evidence that the situation can occur.  I do not think that this is a kernel bug, but possibly an omission: the kernel never spawns a process during idle time to check the consistency of all of the superblocks in the filesystem, i.e. self-diagnosis and repair.  Surely this would improve the reliability of the filesystem.  Just a random thought I had while thinking about the problem.

When I run fsck at boot of the FC3 OS, it seems to repair the primary superblock, but the alternates are never made consistent with the primary; that is all fsck ever seems to do.  Why does fsck not repair the alternate superblocks when it has the opportunity to do so?  Shouldn't fsck at least detect that the alternates contradict the kernel's assumptions, namely that the alternate superblocks are valid and that only the primary superblock needs to be repaired after something catastrophic occurs?  Shouldn't the inconsistency be reported, at the very least?  Or shouldn't there be an option to direct fsck to fix alternate superblock inconsistency, if so desired?  One would think so.

The Linux disk in question is an 80GB SATA drive with an ext3 filesystem.  With the good primary superblock, the filesystem features are: has_journal, ext_attr, filetype, sparse_super.
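
Since sparse_super is set, only some block groups carry a backup superblock.  As a rough sketch, assuming the usual sparse_super rule (groups 0 and 1, plus groups that are powers of 3, 5 or 7), a 4096-byte block size with 32768 blocks per group, and a first data block of 0, the backup locations can be listed like this.  The function names are made up for illustration.

```shell
#!/bin/sh
# is_power_of N BASE: succeed when N equals BASE raised to some power >= 0.
is_power_of() {
    n=$1
    base=$2
    while [ "$n" -gt 1 ] && [ $(( n % base )) -eq 0 ]; do
        n=$(( n / base ))
    done
    [ "$n" -eq 1 ]
}

# group_has_superblock G: succeed when block group G holds a superblock copy
# under the sparse_super rule (assumed: groups 0, 1, powers of 3, 5, 7).
group_has_superblock() {
    g=$1
    [ "$g" -le 1 ] && return 0
    is_power_of "$g" 3 || is_power_of "$g" 5 || is_power_of "$g" 7
}

# backup_sb_blocks GROUPS BLOCKS_PER_GROUP: print the starting block of every
# backup superblock (group 0 holds the primary, so it is skipped).
backup_sb_blocks() {
    groups=$1
    per_group=$2
    g=1
    while [ "$g" -lt "$groups" ]; do
        if group_has_superblock "$g"; then
            echo $(( g * per_group ))
        fi
        g=$(( g + 1 ))
    done
}

# First ten groups only; the real group count comes from dumpe2fs.
backup_sb_blocks 10 32768
```

For the first ten groups this prints 32768, 98304, 163840, 229376 and 294912, one per line.  The authoritative list for a given filesystem is whatever dumpe2fs reports.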

The alternate superblocks all lack the ext_attr feature.  Also, when the primary superblock goes flaky, its maximum mount count is -1.
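
For reference, this kind of comparison can be made one superblock copy at a time with the -o superblock= and -o blocksize= options of dumpe2fs.  The sketch below only prints the commands (a dry run), so it touches no device.  The helper name dumpe2fs_cmd is hypothetical, and the block numbers assume a 4096-byte block size.

```shell
#!/bin/sh
# dumpe2fs_cmd DEVICE BLOCK BLOCKSIZE: print the dumpe2fs command that shows
# the header of one superblock copy.  Block 0 means the primary copy, which
# needs no -o options.
dumpe2fs_cmd() {
    dev=$1
    block=$2
    blocksize=$3
    if [ "$block" -eq 0 ]; then
        echo "dumpe2fs -h $dev"
    else
        echo "dumpe2fs -o superblock=$block -o blocksize=$blocksize -h $dev"
    fi
}

# Dry run: list the commands for the primary and the first two backups.
for b in 0 32768 98304; do
    dumpe2fs_cmd /dev/sdb2 "$b" 4096
done
```

Each printed command can then be run by hand and its "Filesystem features", "Filesystem state" and "Maximum mount count" lines compared across copies.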

Normally I do not boot the FC3 OS; instead I mount the disk from a Live CD to move data into the Live CD environment.  The FC3 kernel is version 2.6.10-1.  The Live CD kernels are either 2.6.15.6 or 2.6.20-16-generic, and the Live CD e2fsprogs are 1.38 for the 2.6.15.6 kernel vs. 1.40 WIP for the 2.6.20-16-generic kernel (neither of these 2.6 kernels is from an FCn OS).  Interestingly, the problem (a flaky primary superblock where the journal is not applied) does not manifest with the 2.6.15.6 kernel Live CD, only with the 2.6.20-16-generic kernel Live CD, which is the one I usually run now.

Recently, because I do not know the origin of the problem, I have resorted to issuing three sync commands from the Live CD environment after moving data to the FC3 ext3 journal filesystem (mounted with -o sync) and before issuing the umount command.  At least the filesystem buffers will be flushed.  I do not know whether skipping this step previously contributed to the initial problem of the primary superblock going flaky.

Will the command e2fsck -fp /dev/sdb2 repair the alternate superblocks, and if so, should it only be run from the Live CD environment?  Or do I need to boot the FC3 OS into runlevel 1 (single user) and unmount the hard drive before running it?

Or, will a dd command using skip and seek for the primary and alternate superblocks correct their corruption, as in the following example:

For the purpose of this example, here is a truncated list of the primary and 1st alternate superblocks from the output of the dumpe2fs command:

  Primary superblock at 0, Group descriptors at 1-5
  Backup superblock at 32768, Group descriptors at 32769-32773

Given: FS blocksize=4096; primary superblock at=0; 1st alternative superblock at=32768 and size of superblock=1024 <=== Is this correct???
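
The "Given" values can be checked with a little arithmetic.  The sketch below assumes the usual ext2/ext3 layout: the primary superblock starts 1024 bytes into the partition whatever the block size, and a backup copy starts at the first byte of its block.  The function names sb_byte_offset and sb_dd_units are made up for illustration.

```shell
#!/bin/sh
# sb_byte_offset BLOCK BLOCKSIZE: byte offset of a superblock copy, given the
# filesystem block number reported by dumpe2fs and the filesystem block size.
# Assumption: block 0 denotes the primary copy, which starts at byte 1024;
# any other block number is a copy that starts at the beginning of its block.
sb_byte_offset() {
    block=$1
    blocksize=$2
    if [ "$block" -eq 0 ]; then
        echo 1024
    else
        echo $(( block * blocksize ))
    fi
}

# sb_dd_units BLOCK BLOCKSIZE: the same position in 1024-byte units, i.e. the
# value dd would need for skip= or seek= when run with bs=1024.
sb_dd_units() {
    echo $(( $(sb_byte_offset "$1" "$2") / 1024 ))
}

sb_dd_units 0 4096       # primary copy:      prints 1
sb_dd_units 32768 4096   # first backup copy: prints 131072
```

Under those assumptions, bs=1024 combined with skip=32768 or seek=32768 addresses byte 33554432 rather than the backup at byte 134217728, and a dd write with no seek= lands at byte 0 rather than on the primary superblock at byte 1024.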

To copy the 1st backup superblock (assuming it is clean) to fix primary superblock:
# dd if=/dev/sdb2 of=/dev/sdbn bs=1024 skip=32768 count=1

To copy the primary superblock (assuming it is clean) to fix the 1st backup superblock:
# dd if=/dev/sdb2 of=/dev/sdbn bs=1024 seek=32768 count=1

I am leery of using the dd commands to effect the repairs to the alternate superblocks: will they work, or will they hose the filesystem completely?  My guess is that they will hose the filesystem completely.  Is this correct, and why?
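
Before any dd write, a read-only check can at least confirm that a given byte offset really holds a superblock.  This is a minimal sketch, assuming the ext2/ext3 magic number 0xEF53 is stored little-endian 56 bytes into each superblock copy.  The helper name sb_magic is hypothetical, and the demo runs against a scratch file rather than a real device.

```shell
#!/bin/sh
# sb_magic IMAGE BYTE_OFFSET: print, as hex, the two magic bytes of the
# superblock copy that starts BYTE_OFFSET bytes into IMAGE.  Read-only.
# A valid ext2/ext3 superblock stores 0xEF53 little-endian, so the bytes
# read "53ef".
sb_magic() {
    image=$1
    offset=$2
    dd if="$image" bs=1 skip=$(( offset + 56 )) count=2 2>/dev/null |
        od -An -tx1 | tr -d ' \n'
}

# Demo on a scratch file standing in for a device, so nothing real is touched.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1024 count=4 2>/dev/null
# Plant the magic bytes 0x53 0xEF where a primary superblock would keep them.
printf '\123\357' | dd of="$img" bs=1 seek=$(( 1024 + 56 )) conv=notrunc 2>/dev/null
sb_magic "$img" 1024     # prints 53ef
echo
rm -f "$img"
```

Against the real partition the same function would be called as sb_magic /dev/sdb2 1024 for the primary copy, and with the backup's byte offset for an alternate.  Anything other than 53ef suggests the offset does not point at a superblock.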

Also, if the e2fsck -fp /dev/sdb2 command will not repair the alternate superblocks, what tool will - debugfs?  And how do I use it to make the repairs?

In the event that no tool will repair the alternate superblocks, what process can I use to effect the repairs to the alternate superblocks so that they can finally be in accord with the original design assumptions for the kernel and ext3 filesystem (consistent with the primary superblock)?

-- Tom