From shweta.vichare at tcs.com Mon Feb 2 14:56:25 2009
From: shweta.vichare at tcs.com (Shweta Vichare)
Date: Mon, 2 Feb 2009 20:26:25 +0530
Subject: Query on EXT3 Online Resize
Message-ID: 

Hello,

Do we have any handy patch for online resize of EXT3 filesystems using e2fsprogs 1.41 (resize2fs) with a 32-bit user space and a 64-bit kernel?

Shweta

=====-----=====-----=====
Notice: The information contained in this e-mail message and/or attachments to it may contain confidential or privileged information. If you are not the intended recipient, any dissemination, use, review, distribution, printing or copying of the information contained in this e-mail message and/or attachments to it are strictly prohibited. If you have received this communication in error, please notify us by reply e-mail or telephone and immediately and permanently delete the message and any attachments. Thank you

From tytso at mit.edu Mon Feb 2 15:01:27 2009
From: tytso at mit.edu (Theodore Tso)
Date: Mon, 2 Feb 2009 10:01:27 -0500
Subject: Query on EXT3 Online Resize
In-Reply-To: 
References: 
Message-ID: <20090202150127.GA14762@mit.edu>

On Mon, Feb 02, 2009 at 08:26:25PM +0530, Shweta Vichare wrote:
>
> Do we have any handy patch for online resize of EXT3 filesystems using
> e2fsprogs 1.41 (resize2fs) with a 32-bit user space and a 64-bit kernel?

What specific version of e2fsprogs and (more importantly) the kernel are you using? It should Just Work, although there were some compatibility bugs that were fixed sometime around 2.6.26 or 2.6.27 if memory serves correctly (hmm... although only for ext4 if memory serves correctly; I should double-check and see if the bug was fixed for ext3). It's not something which gets a lot of testing, though, so it's possible it got broken and no one noticed.

- Ted

From Mike.Miller at hp.com Mon Feb 2 15:55:50 2009
From: Mike.Miller at hp.com (Miller, Mike (OS Dev))
Date: Mon, 2 Feb 2009 15:55:50 +0000
Subject: barrier and commit options?
In-Reply-To: <20090130220245.GA27950@mit.edu>
References: <20090130135329.GW20896@petole.demisel.net> <49831B46.5080202@redhat.com> <0F5B06BAB751E047AB5C87D1F77A778859F9DD0800@GVW0547EXC.americas.hpqcorp.net> <49831F5E.6000506@redhat.com> <0F5B06BAB751E047AB5C87D1F77A778859F9DD0835@GVW0547EXC.americas.hpqcorp.net> <498324E7.3000705@redhat.com> <20090130220245.GA27950@mit.edu>
Message-ID: <0F5B06BAB751E047AB5C87D1F77A778859F9E41D5F@GVW0547EXC.americas.hpqcorp.net>

Theodore Tso wrote:
>
> Well, we still need the barrier on the block I/O elevator
> side to make sure that requests don't get reordered in the
> block layer. But what you're saying is that once the write
> is posted to the array, it is guaranteed that it is on
> "stable storage" (even if it is BBWC) such that if someone
> hits the Big Red Switch at the exit to the data center, and
> power is forcibly cut from the entire data center in case of
> a fire, the battery will still keep the cache alive, at least
> until the sprinklers go off, anyway, right? :-)

That's an accurate assessment. ;-)

>
> In that case, I suspect the right thing for the cciss array
> to do is to ignore the barrier, but not to return an error.

We agree and will fix the IO error.

> If you return an error, and refuse the write with barrier
> operation (which is what the cciss driver seems to be doing
> starting in 2.6.29-rcX), ext4 will retry the write without
> the barrier, at which point we are vulnerable to the block
> layer reordering things at the I/O scheduler layer. In
> effect, you're claiming that every single write to cciss is
> implicitly a "barrier write" in that once it is received by
> the device, it is guaranteed not to be lost even if the power
> to the entire system is forcibly removed.

Of course, we can't cover all possible scenarios like the data center exploding or something crazy. But under _most_ circumstances the data will remain in cache for up to 72 hours of no power. So if there is a complete power outage the controller will write any cached data (in order) to the disks on the next power up.

-- mikem

> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users
>
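A note for readers without battery-backed write cache: ext3 in this era did not enable write barriers by default, so they usually had to be requested per mount. A minimal sketch; the device and mount point are illustrative, not from this thread:

mount -o barrier=1 /dev/sda1 /mnt       # enable barriers at mount time
mount -o remount,barrier=1 /mnt         # or on an already-mounted filesystem

If the underlying device cannot honor barriers, the kernel logs something along the lines of "JBD: barrier-based sync failed ... disabling barriers". ext4 enables barriers by default, which is part of why the cciss behavior discussed above surfaced there first.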
In > effect, you're claiming that every single write to cciss is > implicitly a "barrier write" in that once it is received by > the device, it is guaranteed not to be lost even if the power > to the entire system is forcibly removed. Of course, we can't cover all possible scenarios like the data center exploding or something crazy. But under _most_ circumstances the data will remain in cache for up to 72 hours of no power. So if there is a complete power outage the controller will write any cached data (in order) to the disks on the next power up. -- mikem > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users > From Curtis at GreenKey.net Wed Feb 4 02:23:23 2009 From: Curtis at GreenKey.net (Curtis Doty) Date: Tue, 3 Feb 2009 18:23:23 -0800 (PST) Subject: ext4 resize/fsck Message-ID: <20090204022324.07F876F06C@alopias.GreenKey.net> Horsing around with ext4 again...on F-10. This time a fsck was required after both an offline shrink and an online grow. Why? ----8<---- 13:22]stratus~# resize2fs -M -p /dev/foo/bar resize2fs 1.41.3 (12-Oct-2008) Please run 'e2fsck -f /dev/foo/bar' first. 13:22]stratus~# e2fsck -C0 -f /dev/foo/bar e2fsck 1.41.3 (12-Oct-2008) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information bar: 43186/172800 files (0.2% non-contiguous), 295511/1753088 blocks 13:24]stratus~# resize2fs -M -p /dev/foo/bar resize2fs 1.41.3 (12-Oct-2008) Resizing the filesystem on /dev/foo/bar to 426236 (4k) blocks. Begin pass 2 (max = 101383) Relocating blocks XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Begin pass 3 (max = 54) Scanning inode table XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Begin pass 4 (max = 5278) Updating inode references XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX The filesystem on /dev/foo/bar is now 426236 blocks long. 13:25]stratus~# fsck.ext4 -C0 -f /dev/foo/bar e2fsck 1.41.3 (12-Oct-2008) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information Block bitmap differences: -(17--18) -(33--34) -(2835--3234) Fix? yes Free blocks count wrong for group #0 (0, counted=404). Fix? yes Free blocks count wrong (138405, counted=138809). Fix? yes bar: ***** FILE SYSTEM WAS MODIFIED ***** bar: 43186/44800 files (0.2% non-contiguous), 287427/426236 blocks ----8<---- Then a bit later, I shrunk the lv to 4G, and then mounted the filesystem (just for fun), and finally online expanded the fs into it. ----8<---- 15:54]stratus~# lvreduce -L4G foo/bar WARNING: Reducing active and open logical volume to 4.00 GB THIS MAY DESTROY YOUR DATA (filesystem etc.) Do you really want to reduce bar? [y/n]: y Reducing logical volume bar to 4.00 GB Logical volume bar successfully resized 15:54]stratus~# resize2fs -p /dev/foo/bar resize2fs 1.41.3 (12-Oct-2008) Filesystem at /dev/foo/bar is mounted on /home; on-line resizing required old desc_blocks = 1, new_desc_blocks = 1 Performing an on-line resize of /dev/foo/bar to 1048576 (4k) blocks. The filesystem on /dev/foo/bar is now 1048576 blocks long. 
15:55]stratus~# umount /home
15:55]stratus~# fsck.ext4 -C0 -f /dev/foo/bar
e2fsck 1.41.3 (12-Oct-2008)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Directories count wrong for group #16 (62, counted=0).
Fix? yes
Directories count wrong for group #19 (12, counted=0).
Fix? yes
Directories count wrong for group #25 (1, counted=0).
Fix? yes
Directories count wrong for group #26 (2, counted=0).
Fix? yes
Directories count wrong for group #29 (38, counted=0).
Fix? yes

bar: ***** FILE SYSTEM WAS MODIFIED *****
bar: 43186/102400 files (0.2% non-contiguous), 291069/1048576 blocks
----8<----

Is this all normal? I can suppose a fsck is required after a shrink. But after an online expand, it seems odd.

Prior to this little experiment, the lv/ext4fs were at 6.7G. Shrinking brought it to 1.7G (with 1.1G in use). And obviously, I ended with 4G even.

../C

From sandeen at redhat.com Wed Feb 4 03:09:59 2009
From: sandeen at redhat.com (Eric Sandeen)
Date: Tue, 03 Feb 2009 21:09:59 -0600
Subject: ext4 resize/fsck
In-Reply-To: <20090204022324.07F876F06C@alopias.GreenKey.net>
References: <20090204022324.07F876F06C@alopias.GreenKey.net>
Message-ID: <49890707.4020007@redhat.com>

Curtis Doty wrote:
> Horsing around with ext4 again...on F-10.
>
> This time a fsck was required after both an offline shrink and an online
> grow. Why?

Could you please try again with 1.41.4 from rawhide (or koji: http://kojipkgs.fedoraproject.org/packages/e2fsprogs/1.41.4/2.fc11/ - might need to rebuild if there are any library dependency problems) and see if this persists or is fixed? Several resize fixes went into 1.41.4 that should take care of this.

I'll probably push 1.41.4 to f10 testing soon, if people are hitting these problems.

Thanks,
-Eric

From Curtis at GreenKey.net Wed Feb 4 03:38:27 2009
From: Curtis at GreenKey.net (Curtis Doty)
Date: Tue, 3 Feb 2009 19:38:27 -0800 (PST)
Subject: ext4 resize/fsck
In-Reply-To: <49890707.4020007@redhat.com>
References: <20090204022324.07F876F06C@alopias.GreenKey.net> <49890707.4020007@redhat.com>
Message-ID: <20090204033827.1E6646F06C@alopias.GreenKey.net>

9:09pm Eric Sandeen said:

> Curtis Doty wrote:
>> Horsing around with ext4 again...on F-10.
>>
>> This time a fsck was required after both an offline shrink and an online
>> grow. Why?
>
> Could you please try again with 1.41.4 from rawhide (or koji:
> http://kojipkgs.fedoraproject.org/packages/e2fsprogs/1.41.4/2.fc11/ -
> might need to rebuild if there are any library dependency problems) and
> see if this persists or is fixed? Several resize fixes went into 1.41.4
> that should take care of this.
>
> I'll probably push 1.41.4 to f10 testing soon, if people are hitting
> these problems.
>

Is this an improvement or luck? Fewer issues, but still a ghost dir.

19:30]stratus~# lvextend -L6G foo/bar
Extending logical volume bar to 6.00 GB
Logical volume bar successfully resized

19:31]stratus~# resize2fs -p /dev/foo/bar
resize2fs 1.41.4 (27-Jan-2009)
Filesystem at /dev/foo/bar is mounted on /home; on-line resizing required
old desc_blocks = 1, new_desc_blocks = 1
Performing an on-line resize of /dev/foo/bar to 1572864 (4k) blocks.
The filesystem on /dev/foo/bar is now 1572864 blocks long.
19:31]stratus~# umount /home 19:32]stratus~# fsck.ext4 -C0 -f /dev/foo/bar e2fsck 1.41.4 (27-Jan-2009) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information Directories count wrong for group #37 (1, counted=0). Fix? yes bar: ***** FILE SYSTEM WAS MODIFIED ***** bar: 43188/153600 files (0.1% non-contiguous), 294524/1572864 blocks ../C From sandeen at redhat.com Wed Feb 4 03:45:06 2009 From: sandeen at redhat.com (Eric Sandeen) Date: Tue, 03 Feb 2009 21:45:06 -0600 Subject: ext4 resize/fsck In-Reply-To: <20090204033827.1E6646F06C@alopias.GreenKey.net> References: <20090204022324.07F876F06C@alopias.GreenKey.net> <49890707.4020007@redhat.com> <20090204033827.1E6646F06C@alopias.GreenKey.net> Message-ID: <49890F42.1060007@redhat.com> Curtis Doty wrote: > 9:09pm Eric Sandeen said: > >> Curtis Doty wrote: >>> Horsing around with ext4 again...on F-10. >>> >>> This time a fsck was required after both an offline shrink and an online >>> grow. Why? >> Could you please try again with 1.41.4 from rawhide (or koji: >> http://kojipkgs.fedoraproject.org/packages/e2fsprogs/1.41.4/2.fc11/ - >> might need to rebuild if there are any library dependency problems) and >> see if this persists or is fixed? Several resize fixes went into 1.41.4 >> that should take care of this. >> >> I'll probably push 1.41.4 to f10 testing soon, if people are hitting >> these problems. >> > > Is this an improvement or luck? Fewer issues, but still a ghost dir. > > 19:30]stratus~# lvextend -L6G foo/bar > Extending logical volume bar to 6.00 GB > Logical volume bar successfully resized > > 19:31]stratus~# resize2fs -p /dev/foo/bar > resize2fs 1.41.4 (27-Jan-2009) > Filesystem at /dev/foo/bar is mounted on /home; on-line resizing required > old desc_blocks = 1, new_desc_blocks = 1 > Performing an on-line resize of /dev/foo/bar to 1572864 (4k) blocks. > The filesystem on /dev/foo/bar is now 1572864 blocks long. > > 19:31]stratus~# umount /home > 19:32]stratus~# fsck.ext4 -C0 -f /dev/foo/bar > e2fsck 1.41.4 (27-Jan-2009) > Pass 1: Checking inodes, blocks, and sizes > Pass 2: Checking directory structure > Pass 3: Checking directory connectivity > Pass 4: Checking reference counts > Pass 5: Checking group summary information > Directories count wrong for group #37 (1, counted=0). > Fix? yes > > bar: ***** FILE SYSTEM WAS MODIFIED ***** > bar: 43188/153600 files (0.1% non-contiguous), 294524/1572864 blocks > > ../C > I hope it's an improvement ;) If you can reproduce it, you might capture an e2image of the fs prior to resize, and we could probably investigate the issue pretty easily... -Eric From Curtis at GreenKey.net Wed Feb 4 04:11:29 2009 From: Curtis at GreenKey.net (Curtis Doty) Date: Tue, 3 Feb 2009 20:11:29 -0800 (PST) Subject: ext4 resize/fsck In-Reply-To: <49890F42.1060007@redhat.com> References: <20090204022324.07F876F06C@alopias.GreenKey.net> <49890707.4020007@redhat.com> <20090204033827.1E6646F06C@alopias.GreenKey.net> <49890F42.1060007@redhat.com> Message-ID: <20090204041129.CC0046F06C@alopias.GreenKey.net> 9:45pm Eric Sandeen said: > Curtis Doty wrote: >> 9:09pm Eric Sandeen said: >> >>> Curtis Doty wrote: >>>> Horsing around with ext4 again...on F-10. >>>> >>>> This time a fsck was required after both an offline shrink and an online >>>> grow. Why? 
>>> Could you please try again with 1.41.4 from rawhide (or koji: >>> http://kojipkgs.fedoraproject.org/packages/e2fsprogs/1.41.4/2.fc11/ - >>> might need to rebuild if there are any library dependency problems) and >>> see if this persists or is fixed? Several resize fixes went into 1.41.4 >>> that should take care of this. >>> >>> I'll probably push 1.41.4 to f10 testing soon, if people are hitting >>> these problems. >>> >> >> Is this an improvement or luck? Fewer issues, but still a ghost dir. >> >> 19:30]stratus~# lvextend -L6G foo/bar >> Extending logical volume bar to 6.00 GB >> Logical volume bar successfully resized >> >> 19:31]stratus~# resize2fs -p /dev/foo/bar >> resize2fs 1.41.4 (27-Jan-2009) >> Filesystem at /dev/foo/bar is mounted on /home; on-line resizing required >> old desc_blocks = 1, new_desc_blocks = 1 >> Performing an on-line resize of /dev/foo/bar to 1572864 (4k) blocks. >> The filesystem on /dev/foo/bar is now 1572864 blocks long. >> >> 19:31]stratus~# umount /home >> 19:32]stratus~# fsck.ext4 -C0 -f /dev/foo/bar >> e2fsck 1.41.4 (27-Jan-2009) >> Pass 1: Checking inodes, blocks, and sizes >> Pass 2: Checking directory structure >> Pass 3: Checking directory connectivity >> Pass 4: Checking reference counts >> Pass 5: Checking group summary information >> Directories count wrong for group #37 (1, counted=0). >> Fix? yes >> >> bar: ***** FILE SYSTEM WAS MODIFIED ***** >> bar: 43188/153600 files (0.1% non-contiguous), 294524/1572864 blocks >> >> ../C >> > > I hope it's an improvement ;) > > If you can reproduce it, you might capture an e2image of the fs prior to > resize, and we could probably investigate the issue pretty easily... > Ak. Just re-shrunk offline and then re-grew online. With e2images in between each time. However, nothing was inconsistent these times! Could it be that one ghost dir was indeed missed by 1.41.3 and caught/cleaned by 1.41.4? The symptom appears gone here now. ../C From tytso at mit.edu Wed Feb 4 06:26:14 2009 From: tytso at mit.edu (Theodore Tso) Date: Wed, 4 Feb 2009 01:26:14 -0500 Subject: ext4 resize/fsck In-Reply-To: <20090204041129.CC0046F06C@alopias.GreenKey.net> References: <20090204022324.07F876F06C@alopias.GreenKey.net> <49890707.4020007@redhat.com> <20090204033827.1E6646F06C@alopias.GreenKey.net> <49890F42.1060007@redhat.com> <20090204041129.CC0046F06C@alopias.GreenKey.net> Message-ID: <20090204062614.GA14762@mit.edu> It might be fixed with this commit: commit fdff73f094e7220602cc3f8959c7230517976412 Author: Theodore Ts'o Date: Mon Jan 26 19:06:41 2009 -0500 ext4: Initialize the new group descriptor when resizing the filesystem Make sure all of the fields of the group descriptor are properly initialized. Previously, we allowed bg_flags field to be contain random garbage, which could trigger non-deterministic behavior, including a kernel OOPS. http://bugzilla.kernel.org/show_bug.cgi?id=12433 Signed-off-by: "Theodore Ts'o" Cc: stable at kernel.org The patch was merged with mainline shortly after 2.6.29-rc3. - Ted From puhuri at iki.fi Wed Feb 4 08:41:32 2009 From: puhuri at iki.fi (Markus Peuhkuri) Date: Wed, 04 Feb 2009 10:41:32 +0200 Subject: ext4 and unexpected eh_depth Message-ID: <498954BC.9050805@iki.fi> Hi, I'm running Debian lenny with linux-image-2.6.26-1-amd64 (deb 2.6.26-11). 
I have an LVM stripe over three SATA disks (3.5TB total) that is shared over NFS, and I am getting the following errors:

EXT4-fs error (device dm-0): ext4_ext_search_right: bad header in inode #269200: unexpected eh_depth - magic f30a, entries 18, max 340(0), depth 1(2)

A user is having errors concatenating large files (100+GB): basically the resulting file seems to be the right size and ends with the right data, but he nevertheless gets the following error:

cat: write error: Input/output error

on a system that has imported the partition over NFS. I'm not sure if the file he was accessing had the same inode. And once I got the BUG below. I cannot upgrade the system right now as there are some long-running analyses going, but I can do some tests at some point, and upgrade in a few days.

------------[ cut here ]------------
kernel BUG at fs/jbd2/transaction.c:1161!
invalid opcode: 0000 [1] SMP
CPU 0
Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs ext4dev jbd2 crc16 dag(P) ipv6 dm_mod dagmem(P) loop snd_pcm snd_timer snd soundcore snd_page_alloc intel_rng i2c_i801 rng_core i2c_core parport_pc parport pcspkr iTCO_wdt container shpchp pci_hotplug i5000_edac button edac_core evdev ext3 jbd mbcache sd_mod ahci libata scsi_mod dock floppy ehci_hcd uhci_hcd e1000e thermal processor fan thermal_sys
Pid: 3501, comm: nfsd Tainted: P 2.6.26-1-amd64 #1
RIP: 0010:[] [] :jbd2:jbd2_journal_dirty_metadata+0x5f/0xe3
RSP: 0018:ffff810009543c90 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff81007cd38880 RCX: 00000000ffffffc0
RDX: ffff81001f9e74c0 RSI: ffff81007cd38880 RDI: ffff8100425833a8
RBP: ffff810045b7c490 R08: ffff810034a5a4d8 R09: ffffffffa024be70
R10: 000000000000005c R11: ffff81007cd38880 R12: ffff81003790c000
R13: ffff8100425833a8 R14: 00000000000020dc R15: 0000000000000000
FS: 00007f13c048e6e0(0000) GS:ffffffff8053b000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f8404606210 CR3: 00000000049a0000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process nfsd (pid: 3501, threadinfo ffff810009542000, task ffff81007f05e850)
Stack: 0000000000000000 ffff81000948af80 ffff8100425833a8 ffff81007cd38880
ffffffffa024be70 ffffffffa0242122 ffff810034a5a4d8 ffff81000948af80
ffff810005a34b80 ffff810004ca4c90 ffff81007d5c8c00 ffffffffa0230dab
Call Trace:
[] ? :ext4dev:__ext4_journal_dirty_metadata+0x1e/0x46
[] ? :ext4dev:ext4_free_inode+0x2b7/0x324
[] ? :ext4dev:ext4_delete_inode+0xb7/0xd5
[] ? :ext4dev:ext4_delete_inode+0x0/0xd5
[] ? generic_delete_inode+0xab/0x11f
[] ? d_delete+0x49/0xb1
[] ? vfs_unlink+0xe3/0x102
[] ? :nfsd:nfsd_unlink+0x1e9/0x267
[] ? :nfsd:nfsd3_proc_remove+0x9d/0xaa
[] ? :nfsd:nfsd_dispatch+0xde/0x1b6
[] ? :sunrpc:svc_process+0x408/0x6e9
[] ? __down_read+0x12/0xa1
[] ? :nfsd:nfsd+0x0/0x2a4
[] ? :nfsd:nfsd+0x194/0x2a4
[] ? schedule_tail+0x27/0x5c
[] ? child_rip+0xa/0x12
[] ? :nfsd:nfsd+0x0/0x2a4
[] ?
child_rip+0x0/0x12
Code: 03 25 00 00 20 00 48 85 c0 75 f1 f0 0f ba 2b 15 19 c0 85 c0 75 e8 83 7d 10 00 75 19 c7 45 10 01 00 00 00 41 8b 45 08 85 c0 7f 04 <0f> 0b eb fe ff c8 41 89 45 08 48 39 55 28 75 11 83 7d 0c 02 75
RIP [] :jbd2:jbd2_journal_dirty_metadata+0x5f/0xe3
RSP
---[ end trace 1336f55a961cc4ae ]---

# dumpe2fs /dev/work/wdata
dumpe2fs 1.41.3 (12-Oct-2008)
Filesystem volume name: 
Last mounted on: 
Filesystem UUID: 05867e70-54e2-48bf-8c67-5439e98c5982
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash test_filesystem
Default mount options: (none)
Filesystem state: clean with errors
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 917536
Block count: 939525120
Reserved block count: 46976256
Free blocks: 507628950
Free inodes: 619199
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 799
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 32
Inode blocks per group: 2
Flex block group size: 16
Filesystem created: Thu Dec 11 10:36:48 2008
Last mount time: Mon Jan 26 13:05:07 2009
Last write time: Sat Jan 31 19:53:25 2009
Mount count: 1
Maximum mount count: 26
Last checked: Mon Jan 26 12:53:17 2009
Check interval: 15552000 (6 months)
Next check after: Sat Jul 25 13:53:17 2009
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: 2714a303-cb2a-4bbc-8159-29bf52c617ca
Journal backup: inode blocks
Journal size: 128M

(rest of dumpe2fs output omitted: 32MiB; I can put it somewhere if needed)

t. Markus

From tytso at mit.edu Wed Feb 4 15:55:10 2009
From: tytso at mit.edu (Theodore Tso)
Date: Wed, 4 Feb 2009 10:55:10 -0500
Subject: ext4 and unexpected eh_depth
In-Reply-To: <498954BC.9050805@iki.fi>
References: <498954BC.9050805@iki.fi>
Message-ID: <20090204155510.GG14762@mit.edu>

On Wed, Feb 04, 2009 at 10:41:32AM +0200, Markus Peuhkuri wrote:
> Hi, I'm running Debian lenny with linux-image-2.6.26-1-amd64 (deb
> 2.6.26-11). I have an LVM stripe over three SATA disks (3.5TB total)
> that is shared over NFS, and I am getting the following errors:
>
> EXT4-fs error (device dm-0): ext4_ext_search_right: bad header in inode #269200: unexpected eh_depth - magic f30a, entries 18, max 340(0), depth 1(2)

I can't recall the patch which fixed this, but I'm 95% certain we've seen this before, and it's been fixed since 2.6.26; I think in 2.6.27 or 2.6.28. Note that there have been a *huge* number of bug fixes for ext4 since 2.6.26 and 2.6.27. If you must use such an old kernel I'd suggest moving to at least 2.6.27.x or 2.6.28.x after Greg pulls in the latest set of bug fixes. Critical bug fixes are still being back-ported to 2.6.27.x, although you won't see various performance improvements unless you track a much newer kernel. I'd suggest at least 2.6.28.y at this point.

- Ted

From Ralf.Hildebrandt at charite.de Thu Feb 5 12:58:48 2009
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Thu, 5 Feb 2009 13:58:48 +0100
Subject: Questions regarding journal replay
Message-ID: <20090205125847.GR23918@charite.de>

Today I had to uncleanly shut down one of our machines due to an error in 2.6.28.3. During the boot sequence, the ext4 partition /home experienced a journal replay.
/home looks like this:

/dev/mapper/volg1-logv1 on /home type ext4 (rw,noexec,nodev,noatime,errors=remount-ro)

Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/volg1-logv1  2,4T  1,4T 1022G  58% /home

Filesystem               Inodes    IUsed    IFree    IUse% Mounted on
/dev/mapper/volg1-logv1  19519488  8793310  10726178   46% /home

The journal replay took quite a while. About 800 seconds.

# dumpe2fs -h /dev/mapper/volg1-logv1
dumpe2fs 1.41.3 (12-Oct-2008)
Filesystem volume name: 
Last mounted on: 
Filesystem UUID: 032613d3-6035-4872-bc0a-11db92feec5e
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal resize_inode dir_index filetype needs_recovery extent sparse_super large_file uninit_bg
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 19519488
Block count: 624605184
Reserved block count: 0
Free blocks: 267655114
Free inodes: 10726118
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 875
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 1024
Inode blocks per group: 32
Filesystem created: Tue May 8 21:04:31 2007
Last mount time: Thu Feb 5 11:08:27 2009
Last write time: Thu Feb 5 11:08:27 2009
Mount count: 12
Maximum mount count: -1
Last checked: Sat Dec 27 23:16:47 2008
Check interval: 0 ()
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal inode: 8
First orphan inode: 17529831
Default directory hash: tea
Directory Hash Seed: 44337061-e542-44bb-afb9-40597ccf1c6d
Journal backup: inode blocks
Journal size: 128M

Questions:
==========

* Why does it take so long?
* What happens during that time?
* Is my journal maybe too big?

-- 
Ralf Hildebrandt  Ralf.Hildebrandt at charite.de
Charite - Universitätsmedizin Berlin  Tel. +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk   Fax. +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin

From Ralf.Hildebrandt at charite.de Thu Feb 5 13:23:02 2009
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Thu, 5 Feb 2009 14:23:02 +0100
Subject: External journal with ext4
Message-ID: <20090205132301.GS23918@charite.de>

Can I still use:

mke2fs -O journal_dev /dev/journaldevice
tune2fs -O ^has_journal /dev/sda1
tune2fs -o journal_data -j -J device=/dev/journaldevice /dev/sda1

-- 
Ralf Hildebrandt  Ralf.Hildebrandt at charite.de
Charite - Universitätsmedizin Berlin  Tel. +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk   Fax. +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin

From Ralf.Hildebrandt at charite.de Thu Feb 5 16:26:32 2009
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Thu, 5 Feb 2009 17:26:32 +0100
Subject: External journal with ext4
In-Reply-To: <20090205132301.GS23918@charite.de>
References: <20090205132301.GS23918@charite.de>
Message-ID: <20090205162632.GF9737@charite.de>

* Ralf Hildebrandt :
> Can I still use:
>
> mke2fs -O journal_dev /dev/journaldevice
> tune2fs -O ^has_journal /dev/sda1
> tune2fs -j -J device=/dev/journaldevice /dev/sda1

Yes, one can.

-- 
Ralf Hildebrandt  Ralf.Hildebrandt at charite.de
Charite - Universitätsmedizin Berlin  Tel. +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk   Fax. +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin
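On the "is my journal maybe too big" question above: dumpe2fs -h already reports the size (128M here), and the internal journal can be recreated at a different size offline. A rough sketch, assuming the filesystem can be unmounted and that 32 MB is the desired size (the size value is illustrative):

umount /home
tune2fs -O ^has_journal /dev/mapper/volg1-logv1   # remove the old internal journal
e2fsck -f /dev/mapper/volg1-logv1                 # journal removal wants a clean fs
tune2fs -J size=32 /dev/mapper/volg1-logv1        # create a new 32 MB journal
mount /home

Note that replay time depends on how much of the journal held uncheckpointed transactions at crash time, so a smaller journal bounds the worst case rather than guaranteeing a fast replay.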
From Curtis at GreenKey.net Fri Feb 6 14:26:41 2009
From: Curtis at GreenKey.net (Curtis Doty)
Date: Fri, 6 Feb 2009 06:26:41 -0800 (PST)
Subject: Questions regarding journal replay
In-Reply-To: <20090205125847.GR23918@charite.de>
References: <20090205125847.GR23918@charite.de>
Message-ID: <20090206142641.9FE446F064@alopias.GreenKey.net>

Yesterday Ralf Hildebrandt said:

> The journal replay took quite a while. About 800 seconds.
>

Were there any other background iops on the underlying volume devices? Like maybe raid reconstruction?

../C

From Ralf.Hildebrandt at charite.de Fri Feb 6 14:28:22 2009
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Fri, 6 Feb 2009 15:28:22 +0100
Subject: Questions regarding journal replay
In-Reply-To: <20090206142641.9FE446F064@alopias.GreenKey.net>
References: <20090205125847.GR23918@charite.de> <20090206142641.9FE446F064@alopias.GreenKey.net>
Message-ID: <20090206142822.GE31519@charite.de>

* Curtis Doty :
> Yesterday Ralf Hildebrandt said:
>
>> The journal replay took quite a while.
>>
>
> Were there any other background iops on the underlying volume
> devices? Like maybe raid reconstruction?

I don't think so. The machine never powered off...

-- 
Ralf Hildebrandt  Ralf.Hildebrandt at charite.de
Charite - Universitätsmedizin Berlin  Tel. +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk   Fax. +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin

From joschi at fliegergruppe-donzdorf.de Mon Feb 9 12:47:02 2009
From: joschi at fliegergruppe-donzdorf.de (Jochen Rueter)
Date: Mon, 09 Feb 2009 13:47:02 +0100
Subject: un'stat'able files - fs corruption?
Message-ID: <499025C6.6070301@fliegergruppe-donzdorf.de>

Hello list,

I have some serious problems on my ext3 filesystem. Several folders contain files, which cannot be accessed in any way, not even a stat() on these files is possible:

[~]$ ls -l
-rwxrwxr-x 1 yvonne users 30208 2007-09-16 12:49 Stoffverteilungsplan tw kl4 07.doc
?--------- ? ? ? ? ? Teddyb?r.docx
?--------- ? ? ? ? ? Termine f?r montag kiga.doc
-rwxrwxr-x 1 yvonne users 28672 2001-11-18 17:29 tiere bei den indios.doc
[~]$ rm Teddy*
Teddyb?r.docx: No such file or directory

In this example, e.g. the file named Teddyb?r.docx has these problems. I know it has some non-printable characters in its filename, which seems to be somehow related to the problem; however, I have other files containing such characters which can be accessed fine when escaping the characters correctly. Also, those other files show up correctly in the output of 'ls', and stat() works on these files. Also, e2fsck does not find any errors on this filesystem.

Can anybody help me get rid of these files?

Thanks a lot,
Jochen

From forest at alittletooquiet.net Mon Feb 9 13:33:05 2009
From: forest at alittletooquiet.net (Forest Bond)
Date: Mon, 9 Feb 2009 08:33:05 -0500
Subject: un'stat'able files - fs corruption?
In-Reply-To: <499025C6.6070301@fliegergruppe-donzdorf.de>
References: <499025C6.6070301@fliegergruppe-donzdorf.de>
Message-ID: <20090209133304.GJ12167@storm.local.network>

Hi,

On Mon, Feb 09, 2009 at 01:47:02PM +0100, Jochen Rueter wrote:
> Hello list,
>
> I have some serious problems on my ext3 filesystem. Several folders
> contain files, which cannot be accessed in any way,
> not even a stat() on these files is possible:
>
> [~]$ ls -l
> -rwxrwxr-x 1 yvonne users 30208 2007-09-16 12:49 Stoffverteilungsplan
> tw kl4 07.doc
> ?--------- ? ? ? ? ? Teddyb?r.docx
>
> ?--------- ? ? ? ? ?
Termine f?r montag > kiga.doc > -rwxrwxr-x 1 yvonne users 28672 2001-11-18 17:29 tiere bei den indios.doc > [~]$ rm Teddy* > Teddyb?r.docx: No such file or directory [...] I recall having a similar problem with UTF-8 filenames when I took an external drive from a x86 machine and plugged it into a powerpc machine. After spending hours trying to fix this mysterious filesystem "corruption," I got home and plugged it into the original machine. Everything was back to normal. I speculated at the time that the byte order of the machine was somehow affecting filename encoding. In hindsight, this doesn't make a lot of sense to me (UTF-8 defines byte order). Some bug in userspace? I don't know. -Forest -- Forest Bond http://www.alittletooquiet.net http://www.pytagsfs.org -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: Digital signature URL: From lists at nerdbynature.de Thu Feb 12 06:35:36 2009 From: lists at nerdbynature.de (Christian Kujau) Date: Wed, 11 Feb 2009 22:35:36 -0800 (PST) Subject: un'stat'able files - fs corruption? In-Reply-To: <499025C6.6070301@fliegergruppe-donzdorf.de> References: <499025C6.6070301@fliegergruppe-donzdorf.de> Message-ID: On Mon, 9 Feb 2009, Jochen Rueter wrote: > ?--------- ? ? ? ? ? Teddyb?r.docx > ?--------- ? ? ? ? ? Termine f?r montag [...] > Also e2fsck does not find any errors on this filesystem. Hm, strange that e2fsck (current version?) does not report any errors, because I somehow find it hard to believe that it should be related to the umlauts in the filename. Did you try moving the files via its inode#? Something like: # ls -li 1234 ?--------- ? ? ? ? ? Teddyb?r.docx # find . -inum 1234 -exec mv '{}' Teddybaer.docx \; ...does that work? Christian. -- BOFH excuse #102: Power company testing new voltage spike (creation) equipment From vegard at svanberg.no Thu Feb 12 09:54:40 2009 From: vegard at svanberg.no (Vegard Svanberg) Date: Thu, 12 Feb 2009 10:54:40 +0100 Subject: Fsck takes too long on multiply-claimed blocks Message-ID: <20090212095440.GG20749@svanberg.no> After a power failure, a ~500G filesystem crashed. Fsck has been running for days. The problem seems to be multiply-claimed blocks. Example: File /directory/file.name/foo (inode #1234567, mod time Tue Feb 10 08:14:40 2008) has 1800000 multiply-claimed block(s), shared with 1 file(s): /directory/file.name/bar (inode #1234567, mod time Wed Dec 1 15:30:00 2008) Clone multiply-claimed blocks? y This takes like forever, probably due to the large number of multiply-claimed blocks. This number can be from 6-2000000, where the slow ones are fixed quicky and the large ones takes hours/days. I was wondering if: - I can get a list of the impacted files/inodes - Wipe them with debugfs Is this safe? How do I do it? Fsck says it's 538 inodes with this problem. If I could get a file list and be able to wipe the inodes, I could restore the missing files from backup and get the machine online again quickly. Hints/tips? Thanks! -- Vegard Svanberg [*Takapa at IRC (EFnet)] From joschi at fliegergruppe-donzdorf.de Thu Feb 12 13:41:41 2009 From: joschi at fliegergruppe-donzdorf.de (Jochen Rueter) Date: Thu, 12 Feb 2009 14:41:41 +0100 Subject: un'stat'able files - fs corruption? In-Reply-To: References: <499025C6.6070301@fliegergruppe-donzdorf.de> Message-ID: <49942715.1040401@fliegergruppe-donzdorf.de> My e2fsck is version e2fsck 1.40-WIP (14-Nov-2006), which is included in debian lenny. 
Actually, ls seems even not to be able to determine the inode number: 21022170 -rwxrwxr-x 1 yvonne users 30208 2007-09-16 12:49 Stoffverteilungsplan tw kl4 07.doc ? ?--------- ? ? ? ? ? Termine f?r Dienstag kiga.doc Maybe it's worth noting that this is running on arm: 2.6.21 #1 PREEMPT Tue May 8 21:05:53 CEST 2007 armv5tel GNU/Linux Jochen Christian Kujau schrieb: > On Mon, 9 Feb 2009, Jochen Rueter wrote: > >> ?--------- ? ? ? ? ? Teddyb?r.docx >> ?--------- ? ? ? ? ? Termine f?r montag >> > [...] > >> Also e2fsck does not find any errors on this filesystem. >> > > Hm, strange that e2fsck (current version?) does not report any errors, > because I somehow find it hard to believe that it should be related to the > umlauts in the filename. Did you try moving the files via its inode#? > Something like: > > # ls -li > 1234 ?--------- ? ? ? ? ? Teddyb?r.docx > # find . -inum 1234 -exec mv '{}' Teddybaer.docx \; > > ...does that work? > > Christian. > From tytso at mit.edu Thu Feb 12 14:19:49 2009 From: tytso at mit.edu (Theodore Tso) Date: Thu, 12 Feb 2009 09:19:49 -0500 Subject: Fsck takes too long on multiply-claimed blocks In-Reply-To: <20090212095440.GG20749@svanberg.no> References: <20090212095440.GG20749@svanberg.no> Message-ID: <20090212141948.GB13040@mini-me.lan> On Thu, Feb 12, 2009 at 10:54:40AM +0100, Vegard Svanberg wrote: > After a power failure, a ~500G filesystem crashed. Fsck has been running > for days. The problem seems to be multiply-claimed blocks. Example: > > File /directory/file.name/foo (inode #1234567, mod time Tue Feb > 10 08:14:40 2008) > has 1800000 multiply-claimed block(s), shared with 1 file(s): > > /directory/file.name/bar > (inode #1234567, mod time Wed Dec 1 15:30:00 2008) > Clone multiply-claimed blocks? y > > This takes like forever, probably due to the large number of > multiply-claimed blocks. You are using a version of e2fsprogs/e2fsck newer than 1.28, right? If not, there's your problem; upgrade to something newer. Older e2fsck's had O(n**2) algorithms that made this very slow, causing this pass to be CPU-bound. It could be slow because of memory pressure issues; the data structures for keeping track of all of those blocks aren't small. >I was wondering if: > > - I can get a list of the impacted files/inodes Yes; you can; they were listed by e2fsck during pass 1B, actually: Look for entries like this: Pass 1B: Rescanning for multiply-claimed blocks Multiply-claimed block(s) in inode 12: 25 26 Multiply-claimed block(s) in inode 13: 25 26 57 58 Multiply-claimed block(s) in inode 14: 57 58 > - Wipe them with debugfs You could wipe them all out via debugfs's clri function, like this: debugfs -R "clri <12> <13> <14>" /dev/sdXX The angle brackets indicate that you are passing in an inode number, instead of a pathname; and I've left it as an exercise to the reader how to use your choice of tools (emacs, grep/awk, perl) to pull out the necessary inode numbers from e2fsck's Pass1B output. Then run e2fsck, and it will clear the resulting inodes. To get the filenames, do this first, before the clri command: debugfs -R "ncheck 12 13 14" /dev/sdXX (No angle brackets are needed because ncheck only takes inode numbers and converts them to pathnames.) > Is this safe? How do I do it? Fsck says it's 538 inodes with this > problem. If I could get a file list and be able to wipe the inodes, I > could restore the missing files from backup and get the machine online > again quickly. However, it's not strictly necessary to wipe all 538 inodes. 
It's likely that you only need to wipe approximately half of them. What happened is that somehow, the disk drive got confused and wrote data to the wrong location on disk. Or, the journal was corrupted (one of the reasons why ext4 has journal checksums) so inode table blocks got written to the wrong place on disk. So that means what you'll see is something like this:

Multiply-claimed block(s) in inode 32: 200 201 203
Multiply-claimed block(s) in inode 33: 210 211 212 213 214
Multiply-claimed block(s) in inode 34: 215 216 217 218
...
Multiply-claimed block(s) in inode 128: 200 201 203
Multiply-claimed block(s) in inode 129: 210 211 212 213 214
Multiply-claimed block(s) in inode 130: 215 216 217 218

You may not see 16 or 32 inodes in each group of duplicate inodes (there are 32 inodes in each 4k block, 16 inodes per 4k block if you are using 256 byte inodes), since some inodes may have been deleted or never allocated before. In any case, only one set of inodes will be correct; after you determine which one set seems correct given the mapping between pathnames and file contents, you can clri the other set. Or if that's too much effort, you can clri them all and recover them from backups....

- Ted
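For the "exercise to the reader" above, something like this is a workable starting point, assuming the e2fsck output was saved to a file named fsck.log (a hypothetical name) and that the Pass 1B lines look like Ted's examples:

# pull the inode numbers out of the Pass 1B report
awk '/Multiply-claimed block\(s\) in inode/ { gsub(":",""); print $5 }' fsck.log | sort -un > inodes

# map them to pathnames first
debugfs -R "ncheck $(tr '\n' ' ' < inodes)" /dev/sdXX

# then clear the chosen ones (angle brackets mean "inode number" to debugfs)
debugfs -R "clri $(sed 's/.*/<&>/' inodes | tr '\n' ' ')" /dev/sdXX

This is a sketch, not a tested recipe: review the generated list by hand before clearing anything, and note that a single command line will only comfortably hold a few hundred inode numbers.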
From ross at biostat.ucsf.edu Fri Feb 13 02:09:26 2009
From: ross at biostat.ucsf.edu (Ross Boylan)
Date: Thu, 12 Feb 2009 18:09:26 -0800
Subject: ext2_check_mount_point: No such file ...
Message-ID: <1234490966.4722.73.camel@iron.psg.net>

I am trying to shrink an ext3 filesystem mounted on top of software RAID. The ultimate goal is to shrink the RAID to make room for a new installation that will use LVM over RAID. I get the error in the subject, and wonder what I need to do to avoid it. Details follow:

Shrinking needs to be done offline, right?

After being unable to get Knoppix started I used the break=bottom to stop the boot process while I was still in the initrd. When I dismounted the filesystem I discovered my initrd didn't have the ext3 utilities. I remounted the file system and copied /sbin, /lib, and /etc onto my ramdisk. Without /etc, fsck complained about being unable to find fstab; afterwards, it ran.

However, both fsck and resize2fs complained
ext2_check_mount_point: No such file or directory while determining whether /dev/md1 is mounted.

Adding the -f flag did not help.

There is no /etc/mtab file.

I realize this is all pretty dodgy, but is there a way I can deal with this problem? What file or directory is it looking for?

-- 
Ross Boylan  wk: (415) 514-8146
185 Berry St #5700  ross at biostat.ucsf.edu
Dept of Epidemiology and Biostatistics  fax: (415) 514-8150
University of California, San Francisco
San Francisco, CA 94107-1739  hm: (415) 550-1062

From darkonc at gmail.com Fri Feb 13 08:10:19 2009
From: darkonc at gmail.com (Stephen Samuel)
Date: Fri, 13 Feb 2009 00:10:19 -0800
Subject: un'stat'able files - fs corruption?
In-Reply-To: <49942715.1040401@fliegergruppe-donzdorf.de>
References: <499025C6.6070301@fliegergruppe-donzdorf.de> <49942715.1040401@fliegergruppe-donzdorf.de>
Message-ID: <6cd50f9f0902130010n2b8b4485w41c476430acdef12@mail.gmail.com>

When you ran the fsck, did you force the check (-f)?
Usually fsck will refuse to run a full test if it thinks that the filesystem was unmounted cleanly or if it thinks that running the log will be sufficient cleanup. Also, have you moved this filesystem between machines (and, most notably, between architectures)? On Thu, Feb 12, 2009 at 5:41 AM, Jochen Rueter < joschi at fliegergruppe-donzdorf.de> wrote: > My e2fsck is version e2fsck 1.40-WIP (14-Nov-2006), which is included in > debian lenny. > Actually, ls seems even not to be able to determine the inode number: > > 21022170 -rwxrwxr-x 1 yvonne users 30208 2007-09-16 12:49 > Stoffverteilungsplan tw kl4 07.doc > ? ?--------- ? ? ? ? ? Termine f?r > Dienstag kiga.doc > > Maybe it's worth noting that this is running on arm: 2.6.21 #1 PREEMPT Tue > May 8 21:05:53 CEST 2007 armv5tel GNU/Linux > > Jochen > > Christian Kujau schrieb: > >> On Mon, 9 Feb 2009, Jochen Rueter wrote: >> >> >>> ?--------- ? ? ? ? ? Teddyb?r.docx >>> ?--------- ? ? ? ? ? Termine f?r montag >>> >>> >> [...] >> >> >>> Also e2fsck does not find any errors on this filesystem. >>> >>> >> -- Stephen Samuel http://www.bcgreen.com 778-861-7641 -------------- next part -------------- An HTML attachment was scrubbed... URL: From sandeen at redhat.com Fri Feb 13 16:39:59 2009 From: sandeen at redhat.com (Eric Sandeen) Date: Fri, 13 Feb 2009 10:39:59 -0600 Subject: ext2_check_mount_point: No such file ... In-Reply-To: <1234490966.4722.73.camel@iron.psg.net> References: <1234490966.4722.73.camel@iron.psg.net> Message-ID: <4995A25F.5020709@redhat.com> Ross Boylan wrote: > I am trying to shrink an ext3 filesystem mounted on top of software > RAID. The ultimate goal is to shrink the RAID to make room for a new > installation that will use LVM over RAID. I get the error in the > subject, and wonder what I need to do to avoid it. Details follow: > > Shrinking needs to be done offline, right? > > After being unable to get Knoppix started I used the break=bottom to > stop the boot process while I was still in the initrd. When I > dismounted the filesystem I discovered my initrd didn't have the ext3 > utilities. I remount the file system and copied /sbin, /lib, and /etc > onto my ramdisk. Without /etc fsck complained about being unable to > find fstab; afterwords, it ran. > > However, both fsck and resize2fs complained > ext2_check_mount_point: No such file or directory while determining > whether /dev/md1 is mounted. > > Adding the -f flag did not help. > > There is no /etc/mtab file. > > I realize this is all pretty dodgy, but is there a way I can deal with > this problem? What file or directory is it looking for? try "cp /proc/mounts to /etc/mtab" perhaps. Maybe the tools should check both (if they don't already, I haven't actually checked yet) :) -eric From tytso at mit.edu Sat Feb 14 14:36:42 2009 From: tytso at mit.edu (Theodore Tso) Date: Sat, 14 Feb 2009 09:36:42 -0500 Subject: ext2_check_mount_point: No such file ... In-Reply-To: <4995A25F.5020709@redhat.com> References: <1234490966.4722.73.camel@iron.psg.net> <4995A25F.5020709@redhat.com> Message-ID: <20090214143642.GF26628@mini-me.lan> On Fri, Feb 13, 2009 at 10:39:59AM -0600, Eric Sandeen wrote: > > However, both fsck and resize2fs complained > > ext2_check_mount_point: No such file or directory while determining > > whether /dev/md1 is mounted. > > > > Adding the -f flag did not help. > > > > There is no /etc/mtab file. > > > > I realize this is all pretty dodgy, but is there a way I can deal with > > this problem? What file or directory is it looking for? 
> > try "cp /proc/mounts to /etc/mtab" perhaps. Maybe the tools should > check both (if they don't already, I haven't actually checked yet) :) The tools do check both already. So the easist solution is mount -t proc proc /proc - Ted From ross at biostat.ucsf.edu Sat Feb 14 18:40:14 2009 From: ross at biostat.ucsf.edu (Ross Boylan) Date: Sat, 14 Feb 2009 10:40:14 -0800 Subject: ext2_check_mount_point: No such file ... In-Reply-To: <4995A25F.5020709@redhat.com> References: <1234490966.4722.73.camel@iron.psg.net> <4995A25F.5020709@redhat.com> Message-ID: <1234636814.19527.17.camel@corn.betterworld.us> On Fri, 2009-02-13 at 10:39 -0600, Eric Sandeen wrote: > Ross Boylan wrote: > > I am trying to shrink an ext3 filesystem mounted on top of software > > RAID. The ultimate goal is to shrink the RAID to make room for a new > > installation that will use LVM over RAID. I get the error in the > > subject, and wonder what I need to do to avoid it. Details follow: > > > > Shrinking needs to be done offline, right? > > > > After being unable to get Knoppix started I used the break=bottom to > > stop the boot process while I was still in the initrd. When I > > dismounted the filesystem I discovered my initrd didn't have the ext3 > > utilities. I remount the file system and copied /sbin, /lib, and /etc > > onto my ramdisk. Without /etc fsck complained about being unable to > > find fstab; afterwords, it ran. > > > > However, both fsck and resize2fs complained > > ext2_check_mount_point: No such file or directory while determining > > whether /dev/md1 is mounted. > > > > Adding the -f flag did not help. > > > > There is no /etc/mtab file. > > > > I realize this is all pretty dodgy, but is there a way I can deal with > > this problem? What file or directory is it looking for? > > try "cp /proc/mounts to /etc/mtab" perhaps. Maybe the tools should > check both (if they don't already, I haven't actually checked yet) :) Thanks; that worked. Unfortunately, after resizing the filesystem I had trouble resizing the underlying partitions. I backed up, and now I'm going to do a fresh install. From sandeen at redhat.com Sat Feb 14 20:09:30 2009 From: sandeen at redhat.com (Eric Sandeen) Date: Sat, 14 Feb 2009 14:09:30 -0600 Subject: ext2_check_mount_point: No such file ... In-Reply-To: <1234636814.19527.17.camel@corn.betterworld.us> References: <1234490966.4722.73.camel@iron.psg.net> <4995A25F.5020709@redhat.com> <1234636814.19527.17.camel@corn.betterworld.us> Message-ID: <499724FA.3050703@redhat.com> Ross Boylan wrote: > On Fri, 2009-02-13 at 10:39 -0600, Eric Sandeen wrote: >> Ross Boylan wrote: ... >>> There is no /etc/mtab file. >>> >>> I realize this is all pretty dodgy, but is there a way I can deal with >>> this problem? What file or directory is it looking for? >> try "cp /proc/mounts to /etc/mtab" perhaps. Maybe the tools should >> check both (if they don't already, I haven't actually checked yet) :) > > Thanks; that worked. Unfortunately, after resizing the filesystem I had > trouble resizing the underlying partitions. I backed up, and now I'm > going to do a fresh install. Just to double check; did you have to mount /proc first? 
From adilger at sun.com Tue Feb 17 20:47:35 2009
From: adilger at sun.com (Andreas Dilger)
Date: Tue, 17 Feb 2009 13:47:35 -0700
Subject: Fsck takes too long on multiply-claimed blocks
In-Reply-To: <20090212141948.GB13040@mini-me.lan>
References: <20090212095440.GG20749@svanberg.no> <20090212141948.GB13040@mini-me.lan>
Message-ID: <20090217204735.GC3199@webber.adilger.int>

On Feb 12, 2009 09:19 -0500, Theodore Ts'o wrote:
> On Thu, Feb 12, 2009 at 10:54:40AM +0100, Vegard Svanberg wrote:
> > After a power failure, a ~500G filesystem crashed. Fsck has been running
> > for days. The problem seems to be multiply-claimed blocks. Example:
> >
> > File /directory/file.name/foo (inode #1234567, mod time Tue Feb
> > 10 08:14:40 2008)
> > has 1800000 multiply-claimed block(s), shared with 1 file(s):
> >
> > /directory/file.name/bar
> > (inode #1234567, mod time Wed Dec 1 15:30:00 2008)
> > Clone multiply-claimed blocks? y
> >
> > This takes like forever, probably due to the large number of
> > multiply-claimed blocks.
>
> You are using a version of e2fsprogs/e2fsck newer than 1.28, right?
> If not, there's your problem; upgrade to something newer. Older
> e2fsck's had O(n**2) algorithms that made this very slow, causing this
> pass to be CPU-bound. It could be slow because of memory pressure
> issues; the data structures for keeping track of all of those blocks
> aren't small.

The "inode badness" patch in the Lustre e2fsprogs does a reasonably good job at handling this. It will automatically mark one/both of these inodes as "fatally corrupted" and delete it/them. That will not happen if only a handful of blocks are shared, so it would not delete files in cases with, e.g., simple bitflips and such.

Cheers, Andreas
-- 
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

From darkonc at gmail.com Fri Feb 20 08:29:37 2009
From: darkonc at gmail.com (Stephen Samuel)
Date: Fri, 20 Feb 2009 00:29:37 -0800
Subject: fast builds for ext3 filesystems.
Message-ID: <6cd50f9f0902200029m3e56d030w6b8e1f06fae2b02d@mail.gmail.com>

I'm investigating ways of doing fast builds on a system. The machines that we're building are essentially identical, but the hardware is just short of random. (We rebuild systems from donated machines for donation to non-profits, and thrift-store sales.) Currently we use the OEM install process, but I'm having problems with the current system, so I decided to implement my idea for a fast build process.

I've got a build that was copied onto a 6GB partition, then I made a partimage backup of the system. On new systems, I restore the 6GB partition onto the (almost always larger) partition on the new disk (might be between 15GB and 80GB), then use resize2fs to fit the filesystem into the new partition. The last thing I do is run a script to reset the UUIDs for fstab and grub.

Question is: what are the disadvantages of using partimage to install the new system? I'm thinking that the only real disadvantage would be performance problems associated with the placement of OS data on the expanded filesystem. How bad would that be, and are there other issues to look at?

My script to reset the UUIDs on the new system is below. Am I missing any critical locations for changing the UUID?

=================
# presumes that the mounted filesystem for /dev/sdXX is at /tmp/sdXX
rootfs=/dev/sda8
swapfs=/dev/sda6
rootdev=${rootfs/%[0-9]/}
rootdev=${rootdev/%[0-9]/}  # strip a second digit for 2-digit partition numbers
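# Caveat not in the original posting: partimage clones the swap signature
# too, so at this point $swapfs still carries the image's UUID and the
# new_sw_uuid read below would match the old one. Re-initializing swap
# first -- e.g.:
#   mkswap $swapfs
# -- regenerates the UUID before vol_id reads it (untested sketch).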
grub-install --root-directory=/tmp/${rootfs#/dev/} $rootdev
tune2fs -U random $rootfs

fs_uuid=05ea19df-a029-4fb3-9ef7-2c497e641a60
sw_uuid=675bf141-9964-4593-9a29-2c0d40c129d5

cd /tmp/${rootfs#/dev/}
new_fs_uuid=`vol_id --uuid $rootfs`
new_sw_uuid=`vol_id --uuid $swapfs`
sed -i "s/$fs_uuid/$new_fs_uuid/g;s/$sw_uuid/$new_sw_uuid/g" /tmp/${rootfs#/dev/}/etc/fstab
sed -i "s/$fs_uuid/$new_fs_uuid/g;s/$sw_uuid/$new_sw_uuid/g" /tmp/${rootfs#/dev/}/boot/grub/menu.lst

# clear out ethN udev cache
sed -i '/^# PCI device /,$d' /tmp/${rootfs#/dev/}/etc/udev/rules.d/70-persistent-net.rules
=========================

-- 
Stephen Samuel  http://www.bcgreen.com  778-861-7641

From ross at biostat.ucsf.edu Fri Feb 20 19:13:09 2009
From: ross at biostat.ucsf.edu (Ross Boylan)
Date: Fri, 20 Feb 2009 11:13:09 -0800
Subject: advice on partitioning
Message-ID: <1235157189.18050.13.camel@iron.psg.net>

I use Cyrus, which writes each email message to a separate file on disk. I have a lot of mail. Is it OK to put this on the general /var partition, or are the optimal parameters so different that it really should be separate?

I've encrypted /var, so making another encrypted partition means another prompt to deal with on startup. I'm hoping to avoid that. Either way, everything would be on the same physical disks (software RAID1).

I'd also appreciate any hints about which parameters I should tune.

Thanks.

-- 
Ross Boylan  wk: (415) 514-8146
185 Berry St #5700  ross at biostat.ucsf.edu
Dept of Epidemiology and Biostatistics  fax: (415) 514-8150
University of California, San Francisco
San Francisco, CA 94107-1739  hm: (415) 550-1062

From lakshmipathi.g at gmail.com Sun Feb 22 11:19:33 2009
From: lakshmipathi.g at gmail.com (lakshmi pathi)
Date: Sun, 22 Feb 2009 16:49:33 +0530
Subject: new features in giis4.4
Message-ID: 

Hi,

giis 4.4 has the following features included:

1) Can recover files deleted on a specific date, or deleted before or after a specific date, or even within a specific date range.
2) Files can be recovered with their original access permission types and file owner and group details.
3) A user-friendly configuration file was added, which supports adding new directories even after installation.
4) Large directories are supported.

If you have any issues/comments, please let me know.

Download url: www.giis.co.in/

Cheers,
Lakshmipathi.G

From magawake at gmail.com Tue Feb 24 04:36:24 2009
From: magawake at gmail.com (Mag Gam)
Date: Mon, 23 Feb 2009 23:36:24 -0500
Subject: newbie filesystem question
Message-ID: <1cbd6f830902232036k30612e7bw3262f81134fcd4b2@mail.gmail.com>

Since there are experts here, I thought this would be the best place to ask the question:

As I understand it, ext2 and ext3 preallocate inodes when a filesystem is being created. It basically writes "zeros" to the volume. (Please correct me if I am wrong.) Once the filesystem is created it creates an inode table which keeps all the inode information. The inode table changes when there are changes on the filesystem (I/O).

I was wondering: how come some other filesystems have a dynamic inode table, where you can have an infinite number of inodes?

Sorry if this is a dumb question. I am trying to learn some Unix basics.

TIA
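Since ext2/3 really do size the inode table once at mkfs time, the premise above is easy to check from user space. A small sketch; /dev/sdb1 is a purely illustrative scratch device:

mke2fs -j -N 500000 /dev/sdb1                    # choose the inode count at creation
dumpe2fs -h /dev/sdb1 | grep -i 'inode count'    # this number never changes afterwards
df -i /mnt                                       # once mounted: IUsed grows, Inodes does not

mke2fs does initialize those inode table blocks up front (one reason mkfs of a big ext3 filesystem takes a while; ext4's uninit_bg feature later relaxed this), while filesystems such as XFS allocate inode clusters on demand, so their inode count is bounded only by free space rather than fixed at creation.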
From jjneely at ncsu.edu  Wed Feb 25 16:13:37 2009
From: jjneely at ncsu.edu (Jack Neely)
Date: Wed, 25 Feb 2009 11:13:37 -0500
Subject: ext3 kernel panic
Message-ID: <20090225161337.GJ14821@virge.linuxczar.net>

Folks,

I have a RHEL 3 production Cyrus IMAP server that has kernel panicked
twice in as many weeks with something similar to the below. The /imap
partition (where the IO was in question) is about 98% full with about
6.2G free and is hosted on a fiber-connected EMC Clariion LUN.
Currently running kernel 2.4.21-15.0.4.ELsmp.

I know things are old here, and the machine is on the upgrade list, but
I need to do due diligence to figure out how serious this error is.
Does anyone have any advice why this is happening?

Thanks,
Jack Neely

Unable to handle kernel NULL pointer dereference at virtual address 00000004
 printing eip:
f88d2863
*pde = 299d3001
*pte = 00000000
Oops: 0002
audit autofs openafs e1000 iptable_filter ip_tables floppy microcode emcphr emcpmpap emcpmpaa emcpmpc emcpmp sg emcp emcpsf loop lvm-mod keybdev mousedev hid
CPU:    0
EIP:    0060:[]    Tainted: P
EFLAGS: 00010213

EIP is at ext3_orphan_del [ext3] 0x73 (2.4.21-15.0.4.ELsmp/i686)
eax: c520d360   ebx: e3fc8900   ecx: e3fc8aac   edx: 00000000
esi: 00000000   edi: c520d000   ebp: c520d360   esp: eed8fddc
ds: 0068   es: 0068   ss: 0068
Process imapd (pid: 7628, stackpage=eed8f000)
Stack: e4de9940 eed8e000 c69b6200 e9ffde00 f88b8445 000f8094 c520d0fc e1ee4e80
       00000007 00000292 e3fc8900 00000000 eed8e000 f88cc72c c69b6200 e3fc8900
       e4de9940 eed8e000 e9ffde00 f88cc84c e4de9940 e3fc8900 ee6d7500 e3fc8900
Call Trace:
[] journal_start_Rsmp_25661df5 [jbd] 0xa5 (0xeed8fdec)
[] start_transaction [ext3] 0x8c (0xeed8fe10)
[] ext3_delete_inode [ext3] 0x8c (0xeed8fe28)
[] ext3_delete_inode [ext3] 0x0 (0xeed8fe3c)
[] iput [kernel] 0x150 (0xeed8fe44)
[] dput [kernel] 0xca (0xeed8fe60)
[] __fput [kernel] 0xbb (0xeed8fe74)
[] filp_close [kernel] 0x8e (0xeed8fe90)
[] put_files_struct [kernel] 0x6c (0xeed8feac)
[] do_exit [kernel] 0x1ba (0xeed8fec8)
[] do_group_exit [kernel] 0x8b (0xeed8fee4)
[] get_signal_to_deliver [kernel] 0x20b (0xeed8fef8)
[] do_signal [kernel] 0x64 (0xeed8ff20)
[] sys_select [kernel] 0x296 (0xeed8ff60)

Code: 89 42 04 89 10 c7 41 04 00 00 00 00 89 8b ac 01 00 00 89 8b

Kernel panic: Fatal exception
--
Jack Neely
Linux Czar, OIT Campus Linux Services
Office of Information Technology, NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89

From sandeen at redhat.com  Wed Feb 25 16:22:46 2009
From: sandeen at redhat.com (Eric Sandeen)
Date: Wed, 25 Feb 2009 10:22:46 -0600
Subject: ext3 kernel panic
In-Reply-To: <20090225161337.GJ14821@virge.linuxczar.net>
References: <20090225161337.GJ14821@virge.linuxczar.net>
Message-ID: <49A57056.4050506@redhat.com>

Jack Neely wrote:
> Folks,
>
> I have a RHEL 3 production Cyrus IMAP server that has kernel panicked
> twice in as many weeks with something similar to the below. The /imap
> partition (where the IO was in question) is about 98% full with about
> 6.2G free and is hosted on a fiber-connected EMC Clariion LUN.
> Currently running kernel 2.4.21-15.0.4.ELsmp.
>
> I know things are old here, and the machine is on the upgrade list, but
> I need to do due diligence to figure out how serious this error is.
> Does anyone have any advice why this is happening?
>
> Thanks,
> Jack Neely

That's not only an old distro, but an un-updated installation.
I'd get the latest RHEL3 kernel and peruse the changelog, for starters,
to see if it looks like this might have been fixed (although I don't see
anything offhand).

-Eric

From Ralf.Hildebrandt at charite.de  Wed Feb 25 16:24:26 2009
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Wed, 25 Feb 2009 17:24:26 +0100
Subject: Questions regarding journal replay
In-Reply-To: <20090206142822.GE31519@charite.de>
References: <20090205125847.GR23918@charite.de> <20090206142641.9FE446F064@alopias.GreenKey.net> <20090206142822.GE31519@charite.de>
Message-ID: <20090225162426.GA26291@charite.de>

* Ralf Hildebrandt :
> * Curtis Doty :
> > Yesterday Ralf Hildebrandt said:
> >
> >> The journal replay took quite a while. About 800 seconds.
> >
> > Were there any other background iops on the underlying volume
> > devices? Like maybe raid reconstruction?
>
> I don't think so. The machine never powered off...

Again, 2.6.28.7 failed us and now we're encountering another journal
replay. Taking ages. This sucks.

Questions:

How can I find out (during normal operation) HOW MUCH of the
journal is actually in use?

How can I resize the journal to be smaller, thus making a journal
replay faster?

--
Ralf Hildebrandt                           Ralf.Hildebrandt at charite.de
Charite - Universitätsmedizin Berlin       Tel.  +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk        Fax.  +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin
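Neither question has a perfect answer from userspace, as far as I know:
there is no interface that reports how full the journal currently is
during normal operation, but the journal's size is easy to inspect, and
the journal can be recreated at a different size on an unmounted, clean
filesystem. A rough sketch (the device name is a placeholder, and whether
a smaller journal actually shortens replay is exactly what this thread
goes on to question):

----8<----
# show the journal parameters the filesystem was created with
dumpe2fs -h /dev/XXX | grep -i journal

# recreate the journal at a smaller size: remove, fsck, re-add
umount /dev/XXX
tune2fs -O ^has_journal /dev/XXX
e2fsck -f /dev/XXX
tune2fs -j -J size=32 /dev/XXX
----8<----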
From sandeen at redhat.com  Wed Feb 25 16:31:42 2009
From: sandeen at redhat.com (Eric Sandeen)
Date: Wed, 25 Feb 2009 10:31:42 -0600
Subject: Questions regarding journal replay
In-Reply-To: <20090225162426.GA26291@charite.de>
References: <20090205125847.GR23918@charite.de> <20090206142641.9FE446F064@alopias.GreenKey.net> <20090206142822.GE31519@charite.de> <20090225162426.GA26291@charite.de>
Message-ID: <49A5726E.6030703@redhat.com>

Ralf Hildebrandt wrote:
> * Ralf Hildebrandt :
>> * Curtis Doty :
>>> Yesterday Ralf Hildebrandt said:
>>>
>>>> The journal replay took quite a while. About 800 seconds.
>>>
>>> Were there any other background iops on the underlying volume
>>> devices? Like maybe raid reconstruction?
>>
>> I don't think so. The machine never powered off...
>
> Again, 2.6.28.7 failed us and now we're encountering another journal
> replay. Taking ages. This sucks.
>
> Questions:
>
> How can I find out (during normal operation) HOW MUCH of the
> journal is actually in use?
>
> How can I resize the journal to be smaller, thus making a journal
> replay faster?

It'd be better to get to the bottom of the problem ... maybe iostat
while it's happening to see if IO is actually happening; run blktrace to
see where IO is going, do a few sysrq-t's to see where threads are at, etc.

Can you find a way to reproduce this at will?

Journal replay should *never* take this long, AFAIK.

-Eric

From Ralf.Hildebrandt at charite.de  Wed Feb 25 17:20:36 2009
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Wed, 25 Feb 2009 18:20:36 +0100
Subject: dumpe2fs and external journal: Illegal inode number while reading journal inode
Message-ID: <20090225172036.GD26291@charite.de>

I created an ext4 fs with an external journal.
I wanted to check how big the journal was, and tried:

# dumpe2fs -h /dev/mapper/volg1-logv1
dumpe2fs 1.41.3 (12-Oct-2008)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          032613d3-6035-4872-bc0a-11db92feec5e
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal resize_inode dir_index filetype needs_recovery extent sparse_super large_file uninit_bg
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              19519488
Block count:              624605184
Reserved block count:     0
Free blocks:              257817321
Free inodes:              10481629
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      875
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         1024
Inode blocks per group:   32
Filesystem created:       Tue May  8 21:04:31 2007
Last mount time:          Wed Feb 25 18:01:47 2009
Last write time:          Wed Feb 25 18:01:47 2009
Mount count:              19
Maximum mount count:      -1
Last checked:             Sat Dec 27 23:16:47 2008
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal UUID:             1a7063f5-8965-40f2-9feb-e37d6ac467e9
Journal device:           0x6806
First orphan inode:       622943
Default directory hash:   tea
Directory Hash Seed:      44337061-e542-44bb-afb9-40597ccf1c6d
Journal backup:           inode blocks
dumpe2fs: Illegal inode number while reading journal inode

--
Ralf Hildebrandt                           Ralf.Hildebrandt at charite.de
Charite - Universitätsmedizin Berlin       Tel.  +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk        Fax.  +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin
From Ralf.Hildebrandt at charite.de  Wed Feb 25 17:23:34 2009
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Wed, 25 Feb 2009 18:23:34 +0100
Subject: Questions regarding journal replay
In-Reply-To: <49A5726E.6030703@redhat.com>
References: <20090205125847.GR23918@charite.de> <20090206142641.9FE446F064@alopias.GreenKey.net> <20090206142822.GE31519@charite.de> <20090225162426.GA26291@charite.de> <49A5726E.6030703@redhat.com>
Message-ID: <20090225172334.GF26291@charite.de>

* Eric Sandeen :
> Ralf Hildebrandt wrote:
> > * Ralf Hildebrandt :
> >> * Curtis Doty :
> >>> Yesterday Ralf Hildebrandt said:
> >>>
> >>>> The journal replay took quite a while. About 800 seconds.
> >>>
> >>> Were there any other background iops on the underlying volume
> >>> devices? Like maybe raid reconstruction?
> >> I don't think so. The machine never powered off...
> >
> > Again, 2.6.28.7 failed us and now we're encountering another journal
> > replay. Taking ages. This sucks.
> >
> > Questions:
> >
> > How can I find out (during normal operation) HOW MUCH of the
> > journal is actually in use?
> >
> > How can I resize the journal to be smaller, thus making a journal
> > replay faster?
>
> It'd be better to get to the bottom of the problem ... maybe iostat
> while it's happening to see if IO is actually happening; run blktrace to
> see where IO is going, do a few sysrq-t's to see where threads are at, etc.

We had 24GB of reading from the journal device (or 12GB if it's
512-byte blocks). I wonder why?

> Can you find a way to reproduce this at will?

Yes. My users will kill me, though.

> Journal replay should *never* take this long, AFAIK.

Amen

--
Ralf Hildebrandt                           Ralf.Hildebrandt at charite.de
Charite - Universitätsmedizin Berlin       Tel.  +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk        Fax.  +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin

From tytso at mit.edu  Wed Feb 25 17:34:59 2009
From: tytso at mit.edu (Theodore Tso)
Date: Wed, 25 Feb 2009 12:34:59 -0500
Subject: Questions regarding journal replay
In-Reply-To: <49A5726E.6030703@redhat.com>
References: <20090205125847.GR23918@charite.de> <20090206142641.9FE446F064@alopias.GreenKey.net> <20090206142822.GE31519@charite.de> <20090225162426.GA26291@charite.de> <49A5726E.6030703@redhat.com>
Message-ID: <20090225173459.GO7064@mit.edu>

On Wed, Feb 25, 2009 at 10:31:42AM -0600, Eric Sandeen wrote:
>
> It'd be better to get to the bottom of the problem ... maybe iostat
> while it's happening to see if IO is actually happening; run blktrace to
> see where IO is going, do a few sysrq-t's to see where threads are at, etc.
>
> Can you find a way to reproduce this at will?
>
> Journal replay should *never* take this long, AFAIK.

Indeed. The journal is 128 megs, as I recall. So even if the journal
was completely full, if it's taking 800 seconds, that's a write rate
of 0.16 MB/s (164 KB/second). That is indeed way too slow.

I assume this wasn't your boot partition, so the journal replay was
being done by e2fsck, right? Or are you guys skipping e2fsck and the
journal replay was happening when you mounted the partition?

If the journal replay is happening via e2fsck, is fsck running any
other filesystem checks in parallel?

Also, what is the geometry of your raid? How many disks, what RAID
level, and what is the chunk size?
The journal replay is done a filesystem block at a time, so it could be
that it's turning into a large number of read-modify-writes, which is
trashing your performance if the chunk size is really large.

The other thing that might explain the performance problem is if somehow
the number of multiple outstanding requests allowed by the hard drive
has been clamped down to a very small number, and so a large number of
small read/write requests is really killing performance. The system
dmesg log might have some hidden clues about that.

- Ted

From sandeen at redhat.com  Wed Feb 25 17:36:25 2009
From: sandeen at redhat.com (Eric Sandeen)
Date: Wed, 25 Feb 2009 11:36:25 -0600
Subject: Questions regarding journal replay
In-Reply-To: <20090225172334.GF26291@charite.de>
References: <20090205125847.GR23918@charite.de> <20090206142641.9FE446F064@alopias.GreenKey.net> <20090206142822.GE31519@charite.de> <20090225162426.GA26291@charite.de> <49A5726E.6030703@redhat.com> <20090225172334.GF26291@charite.de>
Message-ID: <49A58199.2060101@redhat.com>

Ralf Hildebrandt wrote:
> * Eric Sandeen :
>> Ralf Hildebrandt wrote:
>>> * Ralf Hildebrandt :
>>>> * Curtis Doty :
>>>>> Yesterday Ralf Hildebrandt said:
>>>>>
>>>>>> The journal replay took quite a while. About 800 seconds.
>>>>>
>>>>> Were there any other background iops on the underlying volume
>>>>> devices? Like maybe raid reconstruction?
>>>> I don't think so. The machine never powered off...
>>> Again, 2.6.28.7 failed us and now we're encountering another journal
>>> replay. Taking ages. This sucks.
>>>
>>> Questions:
>>>
>>> How can I find out (during normal operation) HOW MUCH of the
>>> journal is actually in use?
>>>
>>> How can I resize the journal to be smaller, thus making a journal
>>> replay faster?
>>>
>> It'd be better to get to the bottom of the problem ... maybe iostat
>> while it's happening to see if IO is actually happening; run blktrace to
>> see where IO is going, do a few sysrq-t's to see where threads are at, etc.
>
> We had 24GB of reading from the journal device (or 12GB if it's
> 512-byte blocks). I wonder why?

24GB of reading from the journal device (during that 800s of replay
during mount?), and your journal is 128M ... well that's odd.

You say journal device; is this an external journal? I didn't think so
from your first email, but is it?

>> Can you find a way to reproduce this at will?
>
> Yes. My users will kill me, though.

No spare box, eh :(

>> Journal replay should *never* take this long, AFAIK.
>
> Amen

so let's figure it out :)

-Eric

From Ralf.Hildebrandt at charite.de  Wed Feb 25 17:39:07 2009
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Wed, 25 Feb 2009 18:39:07 +0100
Subject: Questions regarding journal replay
In-Reply-To: <20090225173459.GO7064@mit.edu>
References: <20090205125847.GR23918@charite.de> <20090206142641.9FE446F064@alopias.GreenKey.net> <20090206142822.GE31519@charite.de> <20090225162426.GA26291@charite.de> <49A5726E.6030703@redhat.com> <20090225173459.GO7064@mit.edu>
Message-ID: <20090225173907.GG26291@charite.de>

* Theodore Tso :
> On Wed, Feb 25, 2009 at 10:31:42AM -0600, Eric Sandeen wrote:
> >
> > It'd be better to get to the bottom of the problem ... maybe iostat
> > while it's happening to see if IO is actually happening; run blktrace to
> > see where IO is going, do a few sysrq-t's to see where threads are at, etc.
> >
> > Can you find a way to reproduce this at will?
> >
> > Journal replay should *never* take this long, AFAIK.
>
> Indeed. The journal is 128 megs, as I recall.
> So even if the journal was completely full, if it's taking 800 seconds,
> that's a write rate of 0.16 MB/s (164 KB/second). That is indeed way
> too slow.

The problem seems to be with the external journal which I recently
changed to. It's a 32GB partition. My timings seem to indicate that
ALL OF IT was being replayed.

> I assume this wasn't your boot partition, so the journal replay was
> being done by e2fsck, right?

Yes

> Or are you guys skipping e2fsck and the journal replay was happening
> when you mounted the partition?

Both. We tried both ways :)

> If the journal replay is happening via e2fsck, is fsck running any
> other filesystem checks in parallel?

No, it's running alone.

> Also, what is the geometry of your raid? How many disks, what RAID
> level, and what is the chunk size? The journal replay is done a
> filesystem block at a time, so it could be that it's turning into a
> large number of read-modify-writes, which is trashing your performance
> if the chunk size is really large.

The RAID is made up from one logical volume, consisting of two drives
sda and sdb, each containing 6 disks in a hardware RAID5 setup.

> The other thing that might explain the performance problem is if
> somehow the number of multiple outstanding requests allowed by the
> hard drive has been clamped down to a very small number, and so a
> large number of small read/write requests is really killing
> performance. The system dmesg log might have some hidden clues about
> that.

dmesg is silent

--
Ralf Hildebrandt                           Ralf.Hildebrandt at charite.de
Charite - Universitätsmedizin Berlin       Tel.  +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk        Fax.  +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin

From Ralf.Hildebrandt at charite.de  Wed Feb 25 17:40:38 2009
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Wed, 25 Feb 2009 18:40:38 +0100
Subject: Questions regarding journal replay
In-Reply-To: <49A58199.2060101@redhat.com>
References: <20090205125847.GR23918@charite.de> <20090206142641.9FE446F064@alopias.GreenKey.net> <20090206142822.GE31519@charite.de> <20090225162426.GA26291@charite.de> <49A5726E.6030703@redhat.com> <20090225172334.GF26291@charite.de> <49A58199.2060101@redhat.com>
Message-ID: <20090225174038.GH26291@charite.de>

* Eric Sandeen :
> >> It'd be better to get to the bottom of the problem ... maybe iostat
> >> while it's happening to see if IO is actually happening; run blktrace to
> >> see where IO is going, do a few sysrq-t's to see where threads are at, etc.
> >
> > We had 24GB of reading from the journal device (or 12GB if it's
> > 512-byte blocks). I wonder why?
>
> 24GB of reading from the journal device (during that 800s of replay
> during mount?), and your journal is 128M ... well that's odd.

After my initial report I removed the journal and created an external
journal on a 32GB partition. Hoping it would be faster, since
accoriding to the docs. the journal size is limited to 128MB.

> You say journal device; is this an external journal? I didn't think so
> from your first email, but is it?

It is now.
# dumpe2fs -h /dev/cciss/c0d0p6
dumpe2fs 1.41.3 (12-Oct-2008)
Filesystem volume name:   journal_device
Last mounted on:          <not available>
Filesystem UUID:          1a7063f5-8965-40f2-9feb-e37d6ac467e9
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      journal_dev
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              0
Block count:              8488436
Reserved block count:     0
Free blocks:              0
Free inodes:              0
First block:              0
Block size:               4096
Fragment size:            4096
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         0
Inode blocks per group:   0
Filesystem created:       Thu Feb  5 14:05:36 2009
Last mount time:          n/a
Last write time:          Thu Feb  5 14:15:26 2009
Mount count:              0
Maximum mount count:      30
Last checked:             Thu Feb  5 14:05:36 2009
Check interval:           15552000 (6 months)
Next check after:         Tue Aug  4 15:05:36 2009
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Default directory hash:   half_md4
Directory Hash Seed:      fddb247a-97df-4582-bfcd-816ef8c17ab2
Journal block size:       4096
Journal length:           8488436
Journal first block:      2
Journal sequence:         0x0027c611
Journal start:            2
Journal number of users:  1
Journal users:            032613d3-6035-4872-bc0a-11db92feec5e

--
Ralf Hildebrandt                           Ralf.Hildebrandt at charite.de
Charite - Universitätsmedizin Berlin       Tel.  +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk        Fax.  +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin

From Ralf.Hildebrandt at charite.de  Wed Feb 25 17:42:14 2009
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Wed, 25 Feb 2009 18:42:14 +0100
Subject: Questions regarding journal replay
In-Reply-To: <20090225174038.GH26291@charite.de>
References: <20090205125847.GR23918@charite.de> <20090206142641.9FE446F064@alopias.GreenKey.net> <20090206142822.GE31519@charite.de> <20090225162426.GA26291@charite.de> <49A5726E.6030703@redhat.com> <20090225172334.GF26291@charite.de> <49A58199.2060101@redhat.com> <20090225174038.GH26291@charite.de>
Message-ID: <20090225174214.GI26291@charite.de>

* Ralf Hildebrandt :

> After my initial report I removed the journal and created an external
> journal on a 32GB partition. Hoping it would be faster, since
> accoriding to the docs. the journal size is limited to 128MB.

That should read:

Hoping it would be faster, since -- according to the docs -- the journal
size is limited to 128MB.

--
Ralf Hildebrandt                           Ralf.Hildebrandt at charite.de
Charite - Universitätsmedizin Berlin       Tel.  +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk        Fax.  +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin

From sandeen at redhat.com  Wed Feb 25 17:44:10 2009
From: sandeen at redhat.com (Eric Sandeen)
Date: Wed, 25 Feb 2009 11:44:10 -0600
Subject: Questions regarding journal replay
In-Reply-To: <20090225173907.GG26291@charite.de>
References: <20090205125847.GR23918@charite.de> <20090206142641.9FE446F064@alopias.GreenKey.net> <20090206142822.GE31519@charite.de> <20090225162426.GA26291@charite.de> <49A5726E.6030703@redhat.com> <20090225173459.GO7064@mit.edu> <20090225173907.GG26291@charite.de>
Message-ID: <49A5836A.7050508@redhat.com>

Ralf Hildebrandt wrote:
> * Theodore Tso :
>> On Wed, Feb 25, 2009 at 10:31:42AM -0600, Eric Sandeen wrote:
>>> It'd be better to get to the bottom of the problem ... maybe iostat
>>> while it's happening to see if IO is actually happening; run blktrace to
>>> see where IO is going, do a few sysrq-t's to see where threads are at, etc.
>>> Can you find a way to reproduce this at will?
>>>
>>> Journal replay should *never* take this long, AFAIK.
>>
>> Indeed. The journal is 128 megs, as I recall. So even if the journal
>> was completely full, if it's taking 800 seconds, that's a write rate
>> of 0.16 MB/s (164 KB/second). That is indeed way too slow.
>
> The problem seems to be with the external journal which I recently
> changed to. It's a 32GB partition. My timings seem to indicate that
> ALL OF IT was being replayed.

But you also saw this with an internal journal?

Perhaps you have uncovered 2 bugs ... :)

TBH external journals probably aren't tested that much (though they
certainly should work)

I'll give it a quick sanity test on ext4.

-Eric

From sandeen at redhat.com  Wed Feb 25 18:08:17 2009
From: sandeen at redhat.com (Eric Sandeen)
Date: Wed, 25 Feb 2009 12:08:17 -0600
Subject: dumpe2fs and external journal: Illegal inode number while reading journal inode
In-Reply-To: <20090225172036.GD26291@charite.de>
References: <20090225172036.GD26291@charite.de>
Message-ID: <49A58911.5030805@redhat.com>

Ralf Hildebrandt wrote:
> I created an ext4 fs with an external journal.
> I wanted to check how big the journal was, and tried:
>
> # dumpe2fs -h /dev/mapper/volg1-logv1
> dumpe2fs 1.41.3 (12-Oct-2008)
...
> dumpe2fs: Illegal inode number while reading journal inode

this should be fixed by:

commit a11d0746b4fb2ac41dcb5e7acf31942b1e8925e2
Author: Theodore Ts'o
Date:   Sat Nov 15 15:05:51 2008 -0500

    dumpe2fs: Only print inline journal information if the journal is internal

    Currently dumpe2fs displays an error if run on a filesystem with an
    external journal.

    Signed-off-by: "Theodore Ts'o"

in e2fsprogs-1.41.4

-Eric

From Ralf.Hildebrandt at charite.de  Wed Feb 25 18:11:08 2009
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Wed, 25 Feb 2009 19:11:08 +0100
Subject: Questions regarding journal replay
In-Reply-To: <49A5836A.7050508@redhat.com>
References: <20090205125847.GR23918@charite.de> <20090206142641.9FE446F064@alopias.GreenKey.net> <20090206142822.GE31519@charite.de> <20090225162426.GA26291@charite.de> <49A5726E.6030703@redhat.com> <20090225173459.GO7064@mit.edu> <20090225173907.GG26291@charite.de> <49A5836A.7050508@redhat.com>
Message-ID: <20090225181108.GA8554@charite.de>

* Eric Sandeen :
> > The problem seems to be with the external journal which I recently
> > changed to. It's a 32GB partition. My timings seem to indicate that
> > ALL OF IT was being replayed.
>
> But you also saw this with an internal journal?

Yes.

> Perhaps you have uncovered 2 bugs ... :)
>
> TBH external journals probably aren't tested that much (though they
> certainly should work)
>
> I'll give it a quick sanity test on ext4.

They DO work, but apparently the docs are wrong! I mean, no sane
person needs 32GB of journal.

--
Ralf Hildebrandt                           Ralf.Hildebrandt at charite.de
Charite - Universitätsmedizin Berlin       Tel.  +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk        Fax.  +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin
From jjneely at ncsu.edu  Wed Feb 25 18:22:34 2009
From: jjneely at ncsu.edu (Jack Neely)
Date: Wed, 25 Feb 2009 13:22:34 -0500
Subject: ext3 kernel panic
In-Reply-To: <49A57056.4050506@redhat.com>
References: <20090225161337.GJ14821@virge.linuxczar.net> <49A57056.4050506@redhat.com>
Message-ID: <20090225182234.GK14821@virge.linuxczar.net>

On Wed, Feb 25, 2009 at 10:22:46AM -0600, Eric Sandeen wrote:
> Jack Neely wrote:
> > Folks,
> >
> > I have a RHEL 3 production Cyrus IMAP server that has kernel panicked
> > twice in as many weeks with something similar to the below. The /imap
> > partition (where the IO was in question) is about 98% full with about
> > 6.2G free and is hosted on a fiber-connected EMC Clariion LUN.
> > Currently running kernel 2.4.21-15.0.4.ELsmp.
> >
> > I know things are old here, and the machine is on the upgrade list, but
> > I need to do due diligence to figure out how serious this error is.
> > Does anyone have any advice why this is happening?
> >
> > Thanks,
> > Jack Neely
>
> That's not only an old distro, but an un-updated installation. I'd get
> the latest RHEL3 kernel and peruse the changelog, for starters, to see
> if it looks like this might have been fixed (although I don't see
> anything offhand).
>
> -Eric

I'm caught between a rock and a hard place due to the EMC PowerPath
binary-only kernel crack, which makes it painful for both me and my
customers to regularly upgrade the kernel. Not to mention the EMC
supportability matrix of doom.

I have 11 other imap servers configured identically that are not
regularly panicking. I'm trying to figure out what specifically could
be affecting this one machine that isn't affecting the others. The only
changelog entry that seems close is:

- fix O_SYNC EIO error propagation through ext3/jbd (Stephen Tweedie)

from kernel-2.4.21-34.EL. Is that anywhere close?

Jack
--
Jack Neely
Linux Czar, OIT Campus Linux Services
Office of Information Technology, NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89

From tytso at mit.edu  Wed Feb 25 18:44:48 2009
From: tytso at mit.edu (Theodore Tso)
Date: Wed, 25 Feb 2009 13:44:48 -0500
Subject: Questions regarding journal replay
In-Reply-To: <20090225174214.GI26291@charite.de>
References: <20090205125847.GR23918@charite.de> <20090206142641.9FE446F064@alopias.GreenKey.net> <20090206142822.GE31519@charite.de> <20090225162426.GA26291@charite.de> <49A5726E.6030703@redhat.com> <20090225172334.GF26291@charite.de> <49A58199.2060101@redhat.com> <20090225174038.GH26291@charite.de> <20090225174214.GI26291@charite.de>
Message-ID: <20090225184448.GP7064@mit.edu>

On Wed, Feb 25, 2009 at 06:42:14PM +0100, Ralf Hildebrandt wrote:
> * Ralf Hildebrandt :
>
> > After my initial report I removed the journal and created an external
> > journal on a 32GB partition. Hoping it would be faster, since
> > accoriding to the docs. the journal size is limited to 128MB.
>
> That should read:
>
> Hoping it would be faster, since -- according to the docs -- the journal
> size is limited to 128MB.

Increasing the journal size may speed up certain filesystem workloads
which are causing the journal to wrap very frequently. However,
increasing the journal *will* increase the time to replay the journal....

How long did the journal replay take when you were using the 128MB
internal journal? Was the 800 seconds to replay for the case when you
were using the internal journal or the external journal?

- Ted
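For reference, the external-journal arrangement being discussed here is
built with mke2fs's journal_dev feature and attached with tune2fs, and
the journal takes up the whole block device it is created on -- which is
how a 32GB partition becomes a 32GB journal. A sketch (device names are
placeholders):

----8<----
# create a journal device on its own partition; the block size must
# match the filesystem that will use it
mke2fs -O journal_dev -b 4096 /dev/sdc1

# point an existing (unmounted) filesystem at it
tune2fs -O ^has_journal /dev/sdb1
tune2fs -j -J device=/dev/sdc1 /dev/sdb1
----8<----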
From tytso at mit.edu  Wed Feb 25 18:46:17 2009
From: tytso at mit.edu (Theodore Tso)
Date: Wed, 25 Feb 2009 13:46:17 -0500
Subject: Questions regarding journal replay
In-Reply-To: <20090225173907.GG26291@charite.de>
References: <20090205125847.GR23918@charite.de> <20090206142641.9FE446F064@alopias.GreenKey.net> <20090206142822.GE31519@charite.de> <20090225162426.GA26291@charite.de> <49A5726E.6030703@redhat.com> <20090225173459.GO7064@mit.edu> <20090225173907.GG26291@charite.de>
Message-ID: <20090225184617.GQ7064@mit.edu>

On Wed, Feb 25, 2009 at 06:39:07PM +0100, Ralf Hildebrandt wrote:
>
> The RAID is made up from one logical volume, consisting of two drives
> sda and sdb, each containing 6 disks in a hardware RAID5 setup.

Do you know what the chunk size or strip size is for your hardware RAID5?

- Ted

From tytso at mit.edu  Wed Feb 25 18:48:16 2009
From: tytso at mit.edu (Theodore Tso)
Date: Wed, 25 Feb 2009 13:48:16 -0500
Subject: dumpe2fs and external journal: Illegal inode number while reading journal inode
In-Reply-To: <20090225172036.GD26291@charite.de>
References: <20090225172036.GD26291@charite.de>
Message-ID: <20090225184816.GR7064@mit.edu>

On Wed, Feb 25, 2009 at 06:20:36PM +0100, Ralf Hildebrandt wrote:
> I created an ext4 fs with an external journal.
> I wanted to check how big the journal was, and tried:
>
> # dumpe2fs -h /dev/mapper/volg1-logv1
> Journal backup:           inode blocks
> dumpe2fs: Illegal inode number while reading journal inode

This bug was fixed in e2fsprogs 1.41.4. (By commenting out the code
that printed the journal size; I was in a hurry to get 1.41.4 out the
door.)

You can get the size of an external journal by running dumpe2fs on the
external journal.

- Ted

From Ralf.Hildebrandt at charite.de  Wed Feb 25 18:50:57 2009
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Wed, 25 Feb 2009 19:50:57 +0100
Subject: dumpe2fs and external journal: Illegal inode number while reading journal inode
In-Reply-To: <20090225184816.GR7064@mit.edu>
References: <20090225172036.GD26291@charite.de> <20090225184816.GR7064@mit.edu>
Message-ID: <20090225185057.GB8554@charite.de>

* Theodore Tso :
> This bug was fixed in e2fsprogs 1.41.4. (By commenting out the code
> that printed the journal size; I was in a hurry to get 1.41.4 out the
> door.)
>
> You can get the size of an external journal by running dumpe2fs on the
> external journal.

Yes, I found out. Anyway, I'm back to a 128MB internal journal now.

--
Ralf Hildebrandt                           Ralf.Hildebrandt at charite.de
Charite - Universitätsmedizin Berlin       Tel.  +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk        Fax.  +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin

From tytso at mit.edu  Wed Feb 25 18:52:48 2009
From: tytso at mit.edu (Theodore Tso)
Date: Wed, 25 Feb 2009 13:52:48 -0500
Subject: Questions regarding journal replay
In-Reply-To: <20090225181108.GA8554@charite.de>
References: <20090205125847.GR23918@charite.de> <20090206142641.9FE446F064@alopias.GreenKey.net> <20090206142822.GE31519@charite.de> <20090225162426.GA26291@charite.de> <49A5726E.6030703@redhat.com> <20090225173459.GO7064@mit.edu> <20090225173907.GG26291@charite.de> <49A5836A.7050508@redhat.com> <20090225181108.GA8554@charite.de>
Message-ID: <20090225185248.GA1363@mit.edu>

On Wed, Feb 25, 2009 at 07:11:08PM +0100, Ralf Hildebrandt wrote:
> > TBH external journals probably aren't tested that much (though they
> > certainly should work)
> >
> > I'll give it a quick sanity test on ext4.
>
> They DO work, but apparently the docs are wrong!
> I mean, no sane
> person needs 32GB of journal.

The docs don't warn against needing that large a journal, yes. One of
the things which never got finished (although it was in the original
design of the jbd layer) was the ability to share the journal across
multiple filesystems. This would mean that it might make more sense to
have a single large journal. Probably not 32GB in size, though.

Did you find some documentation that actually recommends that large an
external journal?

- Ted

From Ralf.Hildebrandt at charite.de  Wed Feb 25 19:02:15 2009
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Wed, 25 Feb 2009 20:02:15 +0100
Subject: Questions regarding journal replay
In-Reply-To: <20090225185248.GA1363@mit.edu>
References: <20090205125847.GR23918@charite.de> <20090206142641.9FE446F064@alopias.GreenKey.net> <20090206142822.GE31519@charite.de> <20090225162426.GA26291@charite.de> <49A5726E.6030703@redhat.com> <20090225173459.GO7064@mit.edu> <20090225173907.GG26291@charite.de> <49A5836A.7050508@redhat.com> <20090225181108.GA8554@charite.de> <20090225185248.GA1363@mit.edu>
Message-ID: <20090225190215.GD8554@charite.de>

* Theodore Tso :
> The docs don't warn against needing that large a journal, yes. One of
> the things which never got finished (although it was in the original
> design of the jbd layer) was the ability to share the journal across
> multiple filesystems. This would mean that it might make more sense to
> have a single large journal. Probably not 32GB in size, though.
>
> Did you find some documentation that actually recommends that large an
> external journal?

It says the "journal has a maximum size of 128M"

--
Ralf Hildebrandt                           Ralf.Hildebrandt at charite.de
Charite - Universitätsmedizin Berlin       Tel.  +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk        Fax.  +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin

From Ralf.Hildebrandt at charite.de  Wed Feb 25 19:01:31 2009
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Wed, 25 Feb 2009 20:01:31 +0100
Subject: Questions regarding journal replay
In-Reply-To: <20090225184448.GP7064@mit.edu>
References: <20090205125847.GR23918@charite.de> <20090206142641.9FE446F064@alopias.GreenKey.net> <20090206142822.GE31519@charite.de> <20090225162426.GA26291@charite.de> <49A5726E.6030703@redhat.com> <20090225172334.GF26291@charite.de> <49A58199.2060101@redhat.com> <20090225174038.GH26291@charite.de> <20090225174214.GI26291@charite.de> <20090225184448.GP7064@mit.edu>
Message-ID: <20090225190131.GC8554@charite.de>

* Theodore Tso :
> Increasing the journal size may speed up certain filesystem workloads
> which are causing the journal to wrap very frequently. However,
> increasing the journal *will* increase the time to replay the journal....

Indeed. This is a Maildir-style mailbox server. Many small writes,
reads and deletes.

> How long did the journal replay take when you were using the 128MB
> internal journal?

800s

--
Ralf Hildebrandt                           Ralf.Hildebrandt at charite.de
Charite - Universitätsmedizin Berlin       Tel.  +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk        Fax.  +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin
From sandeen at redhat.com  Wed Feb 25 19:21:06 2009
From: sandeen at redhat.com (Eric Sandeen)
Date: Wed, 25 Feb 2009 13:21:06 -0600
Subject: ext3 kernel panic
In-Reply-To: <20090225182234.GK14821@virge.linuxczar.net>
References: <20090225161337.GJ14821@virge.linuxczar.net> <49A57056.4050506@redhat.com> <20090225182234.GK14821@virge.linuxczar.net>
Message-ID: <49A59A22.6020509@redhat.com>

Jack Neely wrote:
> I'm caught between a rock and a hard place due to the EMC PowerPath
> binary-only kernel crack, which makes it painful for both me and my
> customers to regularly upgrade the kernel. Not to mention the EMC
> supportability matrix of doom.
>
> I have 11 other imap servers configured identically that are not
> regularly panicking. I'm trying to figure out what specifically could
> be affecting this one machine that isn't affecting the others. The only
> changelog entry that seems close is:
>
> - fix O_SYNC EIO error propagation through ext3/jbd (Stephen Tweedie)
>
> from kernel-2.4.21-34.EL. Is that anywhere close?

I kind of doubt it; as I said, I don't see anything in the changelogs
that looks immediately relevant...

-Eric

From sandeen at redhat.com  Wed Feb 25 19:28:35 2009
From: sandeen at redhat.com (Eric Sandeen)
Date: Wed, 25 Feb 2009 13:28:35 -0600
Subject: Questions regarding journal replay
In-Reply-To: <20090225174038.GH26291@charite.de>
References: <20090205125847.GR23918@charite.de> <20090206142641.9FE446F064@alopias.GreenKey.net> <20090206142822.GE31519@charite.de> <20090225162426.GA26291@charite.de> <49A5726E.6030703@redhat.com> <20090225172334.GF26291@charite.de> <49A58199.2060101@redhat.com> <20090225174038.GH26291@charite.de>
Message-ID: <49A59BE3.6070906@redhat.com>

Ralf Hildebrandt wrote:
> * Eric Sandeen :
>
>>>> It'd be better to get to the bottom of the problem ... maybe iostat
>>>> while it's happening to see if IO is actually happening; run blktrace to
>>>> see where IO is going, do a few sysrq-t's to see where threads are at, etc.
>>> We had 24GB of reading from the journal device (or 12GB if it's
>>> 512-byte blocks). I wonder why?
>> 24GB of reading from the journal device (during that 800s of replay
>> during mount?), and your journal is 128M ... well that's odd.
>
> After my initial report I removed the journal and created an external
> journal on a 32GB partition. Hoping it would be faster, since
> accoriding to the docs. the journal size is limited to 128MB.
>
>> You say journal device; is this an external journal? I didn't think so
>> from your first email, but is it?
>
> It is now.
...
> Journal block size:       4096
> Journal length:           8488436
> Journal first block:      2
> Journal sequence:         0x0027c611
> Journal start:            2
> Journal number of users:  1
> Journal users:            032613d3-6035-4872-bc0a-11db92feec5e

Ok, we might be getting a little off-track here. Your journal is indeed
32G in size. But you also saw this with an internal journal, which
should be limited to 128M, and yet you still saw a very long replay,
right?

-Eric
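If the slow replay can be reproduced, the blktrace capture Eric
suggested earlier in the thread would show exactly what the replay is
reading and in what pattern. A minimal sketch (the mount point is a
placeholder, and blktrace needs debugfs mounted):

----8<----
mount -t debugfs debugfs /sys/kernel/debug

# trace the journal device while the replay runs
blktrace -d /dev/cciss/c0d0p6 -o replay &
mount /dev/mapper/volg1-logv1 /mnt    # triggers the replay
kill %1

# summarize the captured IO
blkparse -i replay | less
----8<----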
From Ralf.Hildebrandt at charite.de  Wed Feb 25 19:31:08 2009
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Wed, 25 Feb 2009 20:31:08 +0100
Subject: Questions regarding journal replay
In-Reply-To: <49A59BE3.6070906@redhat.com>
References: <20090205125847.GR23918@charite.de> <20090206142641.9FE446F064@alopias.GreenKey.net> <20090206142822.GE31519@charite.de> <20090225162426.GA26291@charite.de> <49A5726E.6030703@redhat.com> <20090225172334.GF26291@charite.de> <49A58199.2060101@redhat.com> <20090225174038.GH26291@charite.de> <49A59BE3.6070906@redhat.com>
Message-ID: <20090225193108.GE8554@charite.de>

* Eric Sandeen :
> > Journal block size:       4096
> > Journal length:           8488436
> > Journal first block:      2
> > Journal sequence:         0x0027c611
> > Journal start:            2
> > Journal number of users:  1
> > Journal users:            032613d3-6035-4872-bc0a-11db92feec5e
>
> Ok, we might be getting a little off-track here. Your journal is indeed
> 32G in size. But you also saw this with an internal journal, which
> should be limited to 128M, and yet you still saw a very long replay,
> right?

800s for 128M, yes

--
Ralf Hildebrandt                           Ralf.Hildebrandt at charite.de
Charite - Universitätsmedizin Berlin       Tel.  +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk        Fax.  +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin

From tytso at mit.edu  Wed Feb 25 21:11:18 2009
From: tytso at mit.edu (Theodore Tso)
Date: Wed, 25 Feb 2009 16:11:18 -0500
Subject: Questions regarding journal replay
In-Reply-To: <20090225190215.GD8554@charite.de>
References: <20090206142641.9FE446F064@alopias.GreenKey.net> <20090206142822.GE31519@charite.de> <20090225162426.GA26291@charite.de> <49A5726E.6030703@redhat.com> <20090225173459.GO7064@mit.edu> <20090225173907.GG26291@charite.de> <49A5836A.7050508@redhat.com> <20090225181108.GA8554@charite.de> <20090225185248.GA1363@mit.edu> <20090225190215.GD8554@charite.de>
Message-ID: <20090225211118.GC1363@mit.edu>

On Wed, Feb 25, 2009 at 08:02:15PM +0100, Ralf Hildebrandt wrote:
> >
> > Did you find some documentation that actually recommends that large an
> > external journal?
>
> It says the "journal has a maximum size of 128M"

That's clearly not right. Where did you see that? We should make
sure it gets fixed...

- Ted

From tytso at mit.edu  Wed Feb 25 21:15:31 2009
From: tytso at mit.edu (Theodore Tso)
Date: Wed, 25 Feb 2009 16:15:31 -0500
Subject: Questions regarding journal replay
In-Reply-To: <20090225190131.GC8554@charite.de>
References: <20090206142641.9FE446F064@alopias.GreenKey.net> <20090206142822.GE31519@charite.de> <20090225162426.GA26291@charite.de> <49A5726E.6030703@redhat.com> <20090225172334.GF26291@charite.de> <49A58199.2060101@redhat.com> <20090225174038.GH26291@charite.de> <20090225174214.GI26291@charite.de> <20090225184448.GP7064@mit.edu> <20090225190131.GC8554@charite.de>
Message-ID: <20090225211531.GE1363@mit.edu>

On Wed, Feb 25, 2009 at 08:01:31PM +0100, Ralf Hildebrandt wrote:
> * Theodore Tso :
>
> > Increasing the journal size may speed up certain filesystem workloads
> > which are causing the journal to wrap very frequently. However,
> > increasing the journal *will* increase the time to replay the journal....
>
> Indeed. This is a Maildir-style mailbox server. Many small writes,
> reads and deletes.
>
> > How long did the journal replay take when you were using the 128MB
> > internal journal?
>
> 800s

So maybe I missed it, but about how long did it take with your 32GB
external journal?
- Ted

From Ralf.Hildebrandt at charite.de  Wed Feb 25 21:47:14 2009
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Wed, 25 Feb 2009 22:47:14 +0100
Subject: Questions regarding journal replay
In-Reply-To: <20090225211531.GE1363@mit.edu>
References: <20090206142822.GE31519@charite.de> <20090225162426.GA26291@charite.de> <49A5726E.6030703@redhat.com> <20090225172334.GF26291@charite.de> <49A58199.2060101@redhat.com> <20090225174038.GH26291@charite.de> <20090225174214.GI26291@charite.de> <20090225184448.GP7064@mit.edu> <20090225190131.GC8554@charite.de> <20090225211531.GE1363@mit.edu>
Message-ID: <20090225214714.GK8554@charite.de>

* Theodore Tso :
> On Wed, Feb 25, 2009 at 08:01:31PM +0100, Ralf Hildebrandt wrote:
> > * Theodore Tso :
> >
> > > Increasing the journal size may speed up certain filesystem workloads
> > > which are causing the journal to wrap very frequently. However,
> > > increasing the journal *will* increase the time to replay the journal....
> >
> > Indeed. This is a Maildir-style mailbox server. Many small writes,
> > reads and deletes.
> >
> > > How long did the journal replay take when you were using the 128MB
> > > internal journal?
> >
> > 800s
>
> So maybe I missed it, but about how long did it take with your 32GB
> external journal?

One hour :(

--
Ralf Hildebrandt                           Ralf.Hildebrandt at charite.de
Charite - Universitätsmedizin Berlin       Tel.  +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk        Fax.  +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin

From Ralf.Hildebrandt at charite.de  Fri Feb 27 16:40:41 2009
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Fri, 27 Feb 2009 17:40:41 +0100
Subject: Questions regarding journal replay
In-Reply-To: <20090225211118.GC1363@mit.edu>
References: <20090206142822.GE31519@charite.de> <20090225162426.GA26291@charite.de> <49A5726E.6030703@redhat.com> <20090225173459.GO7064@mit.edu> <20090225173907.GG26291@charite.de> <49A5836A.7050508@redhat.com> <20090225181108.GA8554@charite.de> <20090225185248.GA1363@mit.edu> <20090225190215.GD8554@charite.de> <20090225211118.GC1363@mit.edu>
Message-ID: <20090227164041.GE7136@charite.de>

* Theodore Tso :
> On Wed, Feb 25, 2009 at 08:02:15PM +0100, Ralf Hildebrandt wrote:
> > >
> > > Did you find some documentation that actually recommends that large an
> > > external journal?
> >
> > It says the "journal has a maximum size of 128M"
>
> That's clearly not right. Where did you see that? We should make
> sure it gets fixed...

The journal options in the tune2fs man page say:

****** CITE *********
size=journal-size
       Create a journal stored in the filesystem of size journal-size
       megabytes. The size of the journal must be at least 1024
       filesystem blocks (i.e., 1MB if using 1k blocks, 4MB if using
       4k blocks, etc.) and may be no more than 102,400 filesystem
       blocks. There must be enough free space in the filesystem to
       create a journal of that size.

device=external-journal
       Attach the filesystem to the journal block device located on
       external-journal. The external journal must have been already
       created using the command

              mke2fs -O journal_dev external-journal

       Note that external-journal must be formatted with the same
       block size as filesystems which will be using it. In addition,
       while there is support for attaching multiple filesystems to a
       single external journal, the Linux kernel and e2fsck(8) do not
       currently support shared external journals yet.
****** CITE *********

It would be nice if the manpage included a sentence like:

    If an external journal is used, the whole journal block device will
    be used as the journal.

The sentence "... be no more than 102,400 filesystem blocks." gives the
impression (to me, that is) that the same restriction applies to an
external journal as well! Yes, I know *NOW* that's not the case.

--
Ralf Hildebrandt                           Ralf.Hildebrandt at charite.de
Charite - Universitätsmedizin Berlin       Tel.  +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk        Fax.  +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin

From Ralf.Hildebrandt at charite.de  Fri Feb 27 16:44:28 2009
From: Ralf.Hildebrandt at charite.de (Ralf Hildebrandt)
Date: Fri, 27 Feb 2009 17:44:28 +0100
Subject: tune2fs options
Message-ID: <20090227164428.GG7136@charite.de>

Is there any way of making operations like dropping and adding a journal:

tune2fs -O ^has_journal /dev/local/my_dev

and

tune2fs -o journal_data -j -J device=LABEL=my-journal-device /dev/local/my_dev

more verbose? It would be nice to know what's going on, since just
sitting there can be quite unnerving! In the end, it all worked OK, but
actually seeing progress can be soothing.

--
Ralf Hildebrandt                           Ralf.Hildebrandt at charite.de
Charite - Universitätsmedizin Berlin       Tel.  +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk        Fax.  +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin
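As far as I know, neither tune2fs operation has a verbosity or progress
flag in e2fsprogs 1.41.x. One workaround is to watch the affected
devices from a second terminal while tune2fs runs -- a rough sketch
(iostat comes from the sysstat package):

----8<----
# terminal 1: drop and re-add the journal
tune2fs -O ^has_journal /dev/local/my_dev
tune2fs -o journal_data -j -J device=LABEL=my-journal-device /dev/local/my_dev

# terminal 2: per-device throughput every 5 seconds; journal creation
# shows up as a sustained stream of writes to the journal device
iostat -x 5
----8<----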