From subscribe at sydes.la  Tue May  3 02:20:30 2005
From: subscribe at sydes.la (Jason Sydes)
Date: Mon, 2 May 2005 19:20:30 -0700
Subject: several ext3 and mysql kernel crashes
Message-ID: <20050503022030.GH23016@hq.newdream.net>

Hi Ext3!

I'm running about 30 dedicated MySQL machines under quite decent loads,
and they are occasionally crashing.  I've been logging console messages
recently in an effort to find the cause, and some appear to be related
to 

I perused your lists and found the message I'm replying to.

If you don't mind, I've included messages and ksymoops from two crashes
that I had recently.  Both were different.  I'm not sure if you have
fixes for them in the new kernel, so I'll be upgrading a few machines
tonight.

I'm running 2.6.10 with the "data=journal" mount option.
Is that the best / safest option for running with MySQL?

In any case, I'm logging all console messages now, so hopefully I can
have more ksymoops output for you soon enough.

I've included the output for each below.

Thank you for your time!
Jason



First Machine ("Scratchy")
==========================

Assertion failure in __journal_drop_transaction() at 
fs/jbd/checkpoint.c:613: "transaction->t_forget == NULL"
------------[ cut here ]------------
kernel BUG at fs/jbd/checkpoint.c:613!
invalid operand: 0000 [#1]
SMP 
CPU:    2
EIP:    0060:[<c01f8404>]    Not tainted VLI
EFLAGS: 00010282
(2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189) 
EIP is at __journal_drop_transaction+0x128/0x290
eax: 00000071   ebx: d1abf680   ecx: c04ea524   edx: 00000286
esi: f6877400   edi: 00000013   ebp: d1abf680   esp: f5e59dc0
ds: 007b   es: 007b   ss: 0068
Process kjournald (pid: 1086, threadinfo=f5e58000 task=f6801a60)
Stack: c04295a0 c0429764 c0429567 00000265 c0429805 f6877400 f188cf8c   
c01f71c1 
       f6877400 d1abf680 f6877400 f6877414 f6877414 00000000 f68774c0   
f6877454 
       f687743c d1abf6b8 f6877414 f6877414 ed3021b8 f6877478 f5e58000   
00000000 
Call Trace:
 [<c01f71c1>] journal_commit_transaction+0xf09/0xf68
 [<c0163281>] rcu_check_quiescent_state+0x55/0x64
 [<c016328b>] rcu_check_quiescent_state+0x5f/0x64
 [<c0165f6c>] autoremove_wake_function+0x0/0x40
 [<c0165f6c>] autoremove_wake_function+0x0/0x40
 [<c015014b>] find_busiest_group+0xeb/0x2d8
 [<c0150c3b>] scheduler_tick+0x443/0x450
 [<c015c30c>] run_timer_softirq+0x150/0x158
 [<c015882a>] __do_softirq+0x6a/0xd4
 [<c01687b5>] irq_exit+0x2d/0x30
 [<c014b5b2>] smp_apic_timer_interrupt+0xce/0xd4
 [<c01411ec>] apic_timer_interrupt+0x1c/0x30
 [<c015bcc3>] del_timer_sync+0xa3/0xdc
 [<c01f9013>] kjournald+0xd3/0x228
 [<c01f8f40>] kjournald+0x0/0x228
 [<c0165f6c>] autoremove_wake_function+0x0/0x40
 [<c0165f6c>] autoremove_wake_function+0x0/0x40
 [<c01f8f30>] commit_timeout+0x0/0xc
 [<c013e849>] kernel_thread_helper+0x5/0xc
Code: 95 42 c0 83 c4 14 90 83 7b 24 00 74 2a 68 05 98 42 c0 68 65 02 00 
00 68 67 95 42 c0 68 64 97 42 c0 68 a0 95 42 c0 e8 50 c4 f5 ff <0f> 0b  
65 02 67 95 42 c0 83 c4 14 90 83 7b 2c 00 74 2a 68 40 98 


scratchy: 07:17pm# ksymoops -m
/boot/System.map-2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189 
/tmp/scratchy.apr30th.or.something.edited 
ksymoops 2.4.5 on i686
2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189.
Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o
/lib/modules/2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189/ 
(default)
     -m
/boot/System.map-2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189 
(specified)

Error (regular_file): read_ksyms stat /proc/ksyms failed
ksymoops: No such file or directory
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
kernel BUG at fs/jbd/checkpoint.c:613!
invalid operand: 0000 [#1]
CPU:    2
EIP:    0060:[<c01f8404>]    Not tainted VLI
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010282
(2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189) 
eax: 00000071   ebx: d1abf680   ecx: c04ea524   edx: 00000286
esi: f6877400   edi: 00000013   ebp: d1abf680   esp: f5e59dc0
ds: 007b   es: 007b   ss: 0068
Stack: c04295a0 c0429764 c0429567 00000265 c0429805 f6877400 f188cf8c
c01f71c1 
       f6877400 d1abf680 f6877400 f6877414 f6877414 00000000 f68774c0
f6877454 
       f687743c d1abf6b8 f6877414 f6877414 ed3021b8 f6877478 f5e58000
00000000 
 [<c01f71c1>] journal_commit_transaction+0xf09/0xf68
 [<c0163281>] rcu_check_quiescent_state+0x55/0x64
 [<c016328b>] rcu_check_quiescent_state+0x5f/0x64
 [<c0165f6c>] autoremove_wake_function+0x0/0x40
 [<c0165f6c>] autoremove_wake_function+0x0/0x40
 [<c015014b>] find_busiest_group+0xeb/0x2d8
 [<c0150c3b>] scheduler_tick+0x443/0x450
 [<c015c30c>] run_timer_softirq+0x150/0x158
 [<c015882a>] __do_softirq+0x6a/0xd4
 [<c01687b5>] irq_exit+0x2d/0x30
 [<c014b5b2>] smp_apic_timer_interrupt+0xce/0xd4
 [<c01411ec>] apic_timer_interrupt+0x1c/0x30
 [<c015bcc3>] del_timer_sync+0xa3/0xdc
 [<c01f9013>] kjournald+0xd3/0x228
 [<c01f8f40>] kjournald+0x0/0x228
 [<c0165f6c>] autoremove_wake_function+0x0/0x40
 [<c0165f6c>] autoremove_wake_function+0x0/0x40
 [<c01f8f30>] commit_timeout+0x0/0xc
 [<c013e849>] kernel_thread_helper+0x5/0xc
Code: 95 42 c0 83 c4 14 90 83 7b 24 00 74 2a 68 05 98 42 c0 68 65 02 00
00 68 67 95 42 c0 68 64 97 42 c0 68 a0 95 42 c0 e8 50 c4 f5 ff <0f> 0b
65 02 67 95 42 c0 83 c4 14 90 83 7b 2c 00 74 2a 68 40 98 


>>EIP; c01f8404 <__journal_drop_transaction+128/290>   <=====

>>ebx; d1abf680 <pg0+1147f680/3f9be400>
>>ecx; c04ea524 <log_wait+4/c>
>>esi; f6877400 <pg0+36237400/3f9be400>
>>ebp; d1abf680 <pg0+1147f680/3f9be400>
>>esp; f5e59dc0 <pg0+35819dc0/3f9be400>

Code;  c01f83d9 <__journal_drop_transaction+fd/290>
00000000 <_EIP>:
Code;  c01f83d9 <__journal_drop_transaction+fd/290>
   0:   95                        xchg   %eax,%ebp
Code;  c01f83da <__journal_drop_transaction+fe/290>
   1:   42                        inc    %edx
Code;  c01f83db <__journal_drop_transaction+ff/290>
   2:   c0 83 c4 14 90 83 7b      rolb   $0x7b,0x839014c4(%ebx)
Code;  c01f83e2 <__journal_drop_transaction+106/290>
   9:   24 00                     and    $0x0,%al
Code;  c01f83e4 <__journal_drop_transaction+108/290>
   b:   74 2a                     je     37 <_EIP+0x37> c01f8410
<__journal_drop_transaction+134/290>
Code;  c01f83e6 <__journal_drop_transaction+10a/290>
   d:   68 05 98 42 c0            push   $0xc0429805
Code;  c01f83eb <__journal_drop_transaction+10f/290>
  12:   68 65 02 00 00            push   $0x265
Code;  c01f83f0 <__journal_drop_transaction+114/290>
  17:   68 67 95 42 c0            push   $0xc0429567
Code;  c01f83f5 <__journal_drop_transaction+119/290>
  1c:   68 64 97 42 c0            push   $0xc0429764
Code;  c01f83fa <__journal_drop_transaction+11e/290>
  21:   68 a0 95 42 c0            push   $0xc04295a0
Code;  c01f83ff <__journal_drop_transaction+123/290>
  26:   e8 50 c4 f5 ff            call   fff5c47b <_EIP+0xfff5c47b>
c0154854 <printk+0/14>
Code;  c01f8404 <__journal_drop_transaction+128/290>   <=====
  2b:   0f 0b                     ud2a      <=====
Code;  c01f8406 <__journal_drop_transaction+12a/290>
  2d:   65 02 67 95               add    %gs:0xffffff95(%edi),%ah
Code;  c01f840a <__journal_drop_transaction+12e/290>
  31:   42                        inc    %edx
Code;  c01f840b <__journal_drop_transaction+12f/290>
  32:   c0 83 c4 14 90 83 7b      rolb   $0x7b,0x839014c4(%ebx)
Code;  c01f8412 <__journal_drop_transaction+136/290>
  39:   2c 00                     sub    $0x0,%al
Code;  c01f8414 <__journal_drop_transaction+138/290>
  3b:   74 2a                     je     67 <_EIP+0x67> c01f8440
<__journal_drop_transaction+164/290>
Code;  c01f8416 <__journal_drop_transaction+13a/290>
  3d:   68                        .byte 0x68
Code;  c01f8417 <__journal_drop_transaction+13b/290>
  3e:   40                        inc    %eax
Code;  c01f8418 <__journal_drop_transaction+13c/290>
  3f:   98                        cwtl   


1 error issued.  Results may not be reliable.
scratchy: 07:17pm# 
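
One detail in the disassembly above can be checked by hand.  Just
before the ud2a trap, the bytes "68 65 02 00 00" are a push of a
32-bit immediate, and that immediate is the source line number
reported in "kernel BUG at fs/jbd/checkpoint.c:613".  The short Python
sketch below decodes it.  The helper name push_imm32 is made up for
illustration; it is not part of ksymoops or any kernel tool.

```python
import struct


def push_imm32(hex_bytes: str) -> int:
    """Decode the immediate of an x86 `push imm32` (opcode 0x68).

    The four bytes after the opcode are a little-endian 32-bit value.
    """
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    if len(raw) != 5 or raw[0] != 0x68:
        raise ValueError("not a 5-byte push imm32 instruction")
    return struct.unpack("<I", raw[1:])[0]


# Bytes copied from the "Code:" line of the first oops.
print(push_imm32("68 65 02 00 00"))        # 613, the line in checkpoint.c
print(hex(push_imm32("68 05 98 42 c0")))   # 0xc0429805, a kernel address
```

The same value shows up as 00000265 in the "Stack:" dump, since the
arguments pushed for printk() were still on the stack when the trap
fired.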




Second Machine ("Tib")
======================

Unable to handle kernel NULL pointer dereference at virtual address
00000004
 printing eip:
c01fab35
*pgd = c040fa1800000000
*pmd = 0000000000000000
Oops: 0000 [#1]
SMP
CPU:    2
EIP:    0060:[<c01fab35>]    Not tainted VLI
EFLAGS: 00010246
(2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189)
EIP is at __journal_remove_journal_head+0x9/0x130
eax: 00000000   ebx: 00000000   ecx: f7d4f200   edx: 00000014
esi: f1920320   edi: 00000013   ebp: c0d6f280   esp: f46d5dcc
ds: 007b   es: 007b   ss: 0068
Process kjournald (pid: 1091, threadinfo=f46d4000 task=f597ca60)
Stack: f1920320 da3ee14c c01fac83 f1920320 f1920320 c01f70fb f1920320
f7d4f400
       f7d4f414 f7d4f414 00000000 f7d4f4c0 f7d4f454 f7d4f43c c0d6f2b8
f7d4f414
       f7d4f414 eba83db8 f7d4f478 f46d4000 00000000 00000ebc d7c53144
00000000
Call Trace:
 [<c01fac83>] journal_remove_journal_head+0x27/0x44
 [<c01f70fb>] journal_commit_transaction+0xe43/0xf68
 [<c0199b57>] d_callback+0x27/0x2c
 [<c0165f6c>] autoremove_wake_function+0x0/0x40
 [<c0165f6c>] autoremove_wake_function+0x0/0x40
 [<c01f9013>] kjournald+0xd3/0x228
 [<c01f8f40>] kjournald+0x0/0x228
 [<c0165f6c>] autoremove_wake_function+0x0/0x40
 [<c0165f6c>] autoremove_wake_function+0x0/0x40
 [<c01f8f30>] commit_timeout+0x0/0xc
 [<c013e849>] kernel_thread_helper+0x5/0xc
Code: 74 06 8b 5a 28 ff 43 04 8b 02 a9 00 00 10 00 75 08 0f 0b 19 02
c0 9b 42 c0 f0 0f ba 32 14 89 d8 5b c3 56 53 8b 74 24 0c 8b 5e 28 <83>
7b 04 00 7d 29 68 e0 a4 42 c0 68 e3 06 00 00 68 cc 9c 42 c0




tib: 07:00pm# ksymoops -m
/boot/System.map-2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189 /root/oops
ksymoops 2.4.5 on i686
2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189.
Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o
/lib/modules/2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189/
(default)
     -m
/boot/System.map-2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189 (specified)

Error (regular_file): read_ksyms stat /proc/ksyms failed
ksymoops: No such file or directory  
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Unable to handle kernel NULL pointer dereference at virtual address
00000004
c01fab35
*pgd = c040fa1800000000
Oops: 0000 [#1]
CPU:    2
EIP:    0060:[<c01fab35>]    Not tainted VLI
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
(2.6.10-grsec+gg3+e+fhs6b+nfs+gr0501+++p4+c4a+gr6b-reslog-v6.189)
eax: 00000000   ebx: 00000000   ecx: f7d4f200   edx: 00000014
esi: f1920320   edi: 00000013   ebp: c0d6f280   esp: f46d5dcc
ds: 007b   es: 007b   ss: 0068
Stack: f1920320 da3ee14c c01fac83 f1920320 f1920320 c01f70fb f1920320
f7d4f400
       f7d4f414 f7d4f414 00000000 f7d4f4c0 f7d4f454 f7d4f43c c0d6f2b8
f7d4f414
       f7d4f414 eba83db8 f7d4f478 f46d4000 00000000 00000ebc d7c53144
00000000
 [<c01fac83>] journal_remove_journal_head+0x27/0x44
 [<c01f70fb>] journal_commit_transaction+0xe43/0xf68
 [<c0199b57>] d_callback+0x27/0x2c
 [<c0165f6c>] autoremove_wake_function+0x0/0x40
 [<c0165f6c>] autoremove_wake_function+0x0/0x40
 [<c01f9013>] kjournald+0xd3/0x228
 [<c01f8f40>] kjournald+0x0/0x228
 [<c0165f6c>] autoremove_wake_function+0x0/0x40
 [<c0165f6c>] autoremove_wake_function+0x0/0x40
 [<c01f8f30>] commit_timeout+0x0/0xc
 [<c013e849>] kernel_thread_helper+0x5/0xc
Code: 74 06 8b 5a 28 ff 43 04 8b 02 a9 00 00 10 00 75 08 0f 0b 19 02 c0
9b 42 c0 f0 0f ba 32 14 89 d8 5b c3 56 53 8b 74 24 0c 8b 5e 28 <83> 7b
04 00 7d 29 68 e0 a4 42 c0 68 e3 06 00 00 68 cc 9c 42 c0


>>EIP; c01fab35 <__journal_remove_journal_head+9/130>   <=====

>>ecx; f7d4f200 <pg0+3770f200/3f9be400>
>>esi; f1920320 <pg0+312e0320/3f9be400>
>>ebp; c0d6f280 <pg0+72f280/3f9be400>
>>esp; f46d5dcc <pg0+34095dcc/3f9be400>

Code;  c01fab0a <journal_grab_journal_head+2e/50>
00000000 <_EIP>:
Code;  c01fab0a <journal_grab_journal_head+2e/50>
   0:   74 06                     je     8 <_EIP+0x8> c01fab12
<journal_grab_journal_head+36/50>
Code;  c01fab0c <journal_grab_journal_head+30/50>
   2:   8b 5a 28                  mov    0x28(%edx),%ebx
Code;  c01fab0f <journal_grab_journal_head+33/50>
   5:   ff 43 04                  incl   0x4(%ebx)
Code;  c01fab12 <journal_grab_journal_head+36/50>
   8:   8b 02                     mov    (%edx),%eax
Code;  c01fab14 <journal_grab_journal_head+38/50>
   a:   a9 00 00 10 00            test   $0x100000,%eax
Code;  c01fab19 <journal_grab_journal_head+3d/50>
   f:   75 08                     jne    19 <_EIP+0x19> c01fab23
<journal_grab_journal_head+47/50>    
Code;  c01fab1b <journal_grab_journal_head+3f/50>
  11:   0f 0b                     ud2a
Code;  c01fab1d <journal_grab_journal_head+41/50>
  13:   19 02                     sbb    %eax,(%edx)
Code;  c01fab1f <journal_grab_journal_head+43/50>
  15:   c0 9b 42 c0 f0 0f ba      rcrb   $0xba,0xff0c042(%ebx)
Code;  c01fab26 <journal_grab_journal_head+4a/50>
  1c:   32 14 89                  xor    (%ecx,%ecx,4),%dl
Code;  c01fab29 <journal_grab_journal_head+4d/50>
  1f:   d8 5b c3                  fcomps 0xffffffc3(%ebx)
Code;  c01fab2c <__journal_remove_journal_head+0/130>
  22:   56                        push   %esi
Code;  c01fab2d <__journal_remove_journal_head+1/130>
  23:   53                        push   %ebx
Code;  c01fab2e <__journal_remove_journal_head+2/130>
  24:   8b 74 24 0c               mov    0xc(%esp,1),%esi
Code;  c01fab32 <__journal_remove_journal_head+6/130>
  28:   8b 5e 28                  mov    0x28(%esi),%ebx
Code;  c01fab35 <__journal_remove_journal_head+9/130>   <=====
  2b:   83 7b 04 00               cmpl   $0x0,0x4(%ebx)   <=====
Code;  c01fab39 <__journal_remove_journal_head+d/130>
  2f:   7d 29                     jge    5a <_EIP+0x5a> c01fab64
<__journal_remove_journal_head+38/130>
Code;  c01fab3b <__journal_remove_journal_head+f/130>
  31:   68 e0 a4 42 c0            push   $0xc042a4e0
Code;  c01fab40 <__journal_remove_journal_head+14/130>
  36:   68 e3 06 00 00            push   $0x6e3
Code;  c01fab45 <__journal_remove_journal_head+19/130>
  3b:   68 cc 9c 42 c0            push   $0xc0429ccc
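
The faulting address in this second oops follows directly from the
register dump and the marked instruction: "cmpl $0x0,0x4(%ebx)" reads
memory at ebx + 4, and ebx is 00000000.  The sketch below redoes that
arithmetic in Python.  The function name is hypothetical, and the
reading that ebx held a NULL pointer loaded by the preceding
"mov 0x28(%esi),%ebx" is an interpretation of the dump, not something
the oops text states.

```python
def effective_address(base: int, displacement: int) -> int:
    """Address touched by an operand of the form disp(%base) on 32-bit x86."""
    return (base + displacement) & 0xFFFFFFFF


# Register values copied from the second oops.
registers = {"ebx": 0x00000000, "esi": 0xF1920320}

# cmpl $0x0,0x4(%ebx): the access that faulted.
fault = effective_address(registers["ebx"], 0x4)
print(f"{fault:08x}")   # 00000004

# mov 0x28(%esi),%ebx: where the zero in ebx was loaded from.
source = effective_address(registers["esi"], 0x28)
print(f"{source:08x}")  # f1920348
```

The first result matches the "NULL pointer dereference at virtual
address 00000004" line at the top of the report.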












On Mon, May 02, 2005 at 06:17:35PM -0700, Jason Sydes wrote:
> Mike Fedyk <mfedyk at matchmail.com> writes:
> 
> > Nicolas Kowalski wrote:
> >> Mike Fedyk <mfedyk at matchmail.com> writes:
> >>
> >>>Nicolas Kowalski wrote:
> >>>
> >>>>I will try to reproduce these errors on a non-production server now.
> >>>
> >>>Beautiful.
> >>>
> >>>It might be good if you put a stack_dump() call just after the
> >>>printk() call in the ext3 source.
> >> I apologize, (I am not familiar with kernel debugging), but when
> >> compiling the kernel with this call inserted after the printk in the
> >> sources, it fails with an unresolved symbol error. ...
> >> fs/fs.o: In function `__jbd_unexpected_dirty_buffer':
> >> fs/fs.o(.text+0x3ab8a): undefined reference to `stack_dump'
> >> ...
> >> I must be missing an option, but which one ?
> >
> > Oh crap.  It's called dump_stack().
> 
> Ok. I had another similar error this morning:
> 
> Unexpected dirty buffer encountered at do_get_write_access:618 (08:11
> blocknr 920701)
> dba1fddc dba1fe04 c017565e c03054a0 c0305483 c030373b 0000026a c03fc5e0 
>        000e0c7d d1072580 dba1fe4c c016f76b c030373b 0000026a d34f1d80
> d1072580 
>        df4c1e94 d34f1d80 c01701dd 00000000 00000000 00000003 df4c1e00
> d3615430 
> Call Trace:    [<c017565e>] [<c016f76b>] [<c01701dd>] [<c016fc10>]
> [<c0167c88>]
>   [<c0167c4e>] [<c0167e21>] [<c0167c74>] [<c012f67a>] [<c012fb14>]
> [<c01657e2>]
>   [<c013c807>] [<c0108be3>]
> 
> 
> ksymoops gives me:
> 
> Trace; c017565e <__jbd_unexpected_dirty_buffer+3a/74>
> Trace; c016f76b <do_get_write_access+10f/570>
> Trace; c01701dd <journal_dirty_metadata+15d/188>
> Trace; c016fc10 <journal_get_write_access+44/68>
> Trace; c0167c88 <do_journal_get_write_access+14/58>
> Trace; c0167c4e <walk_page_buffers+52/78>
> Trace; c0167e21 <ext3_prepare_write+155/208>
> Trace; c0167c74 <do_journal_get_write_access+0/58>
> Trace; c012f67a <do_generic_file_write+226/408>
> Trace; c012fb14 <generic_file_write+f4/10c>
> Trace; c01657e2 <ext3_file_write+1e/9c>
> Trace; c013c807 <sys_write+93/108>
> Trace; c0108be3 <system_call+33/38>
> 
> 
> Does this help ?
> 
> -- 
> Nicolas
> 
> 
> _______________________________________________
> Ext3-users mailing list
> Ext3-users at redhat.com
> https://www.redhat.com/mailman/listinfo/ext3-users
> 



From evilninja at gmx.net  Sun May  8 01:20:33 2005
From: evilninja at gmx.net (Christian)
Date: Sun, 08 May 2005 03:20:33 +0200
Subject: 2.6.12-rc3-mm2 benchmarks
Message-ID: <427D6961.6080000@gmx.net>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

[!! i've Cc'ed several fs lists, please remove them when replying !!]

hi all,

from time to time i do some benchmarks for several filesystems and several
crypto-algorithms too, details here:

http://nerdbynature.de/bench/

latest results here:

http://nerdbynature.de/bench/prinz/2.6.12-rc3-mm2/bonnie.html
http://nerdbynature.de/bench/prinz/2.6.12-rc3-mm2/tiobench.txt

Christian.
- --
BOFH excuse #173:

Recursive traversal of loopback mount points
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCfWlhC/PVm5+NVoYRAmCBAJ9D+UrpvNJ+AoJijJwCN3DVs1Da/QCgkMoC
Ea5VVCQ1Q2XrJNahJQoif1c=
=m8tN
-----END PGP SIGNATURE-----



From hans.yperman at gmail.com  Thu May 12 22:35:16 2005
From: hans.yperman at gmail.com (Hans Yperman)
Date: Fri, 13 May 2005 00:35:16 +0200
Subject: Smashing EXT3 for fun and profit (or: how to loose all your data)
Message-ID: <ad981dd70505121535583720c8@mail.gmail.com>

Hello everyone,

I've just lost my whole EXT3 Linux partition to what was probably a
bug.  For your reading pleasure, and in the hope there is enough
information to fix this problem in the future, here is the story of a
violent ending:

This tragic history actually starts on Windows: MS Word had wiped out
an important file on a floppy, and I got the task of retrieving what
was possible.  Using Linux, I made an image with dd, and put it on the
now extinct EXT3 partition. I used an undelete program, and then
mounted the image with a loopback device:
mount -o loop /tmp/image.img /floppy
  As it turns out, the undeleter managed to screw up the FAT, and the
loopback device complains about reading past the end of the device.
After fixing the floppy on another computer, I come back to the Linux
computer.  The console is full of error messages.

What happened?  A first bug: Linux remounted the loopback device
read-only because of the bad FAT on the image.  BUT this did not work
out right: not only the loopback device, but the whole EXT3 partition
was now read-only. Every little write action results in an error,
hence all the messages.  I did not really think much of it at that
point, and just did a
mount -o remount,rw /

At this point, I am already screwed, but I don't realize it yet:  The
computer works completely normally from here on.  The problem happens
the next time I boot: fsck complains about problems (weird, fsck is
not supposed to run for EXT3).  Specifically, fsck complains about
double-allocated blocks, does a pass 1B and 1C (I'd never seen these
before either), dumps pages and pages and pages of block numbers,
gets very very veeeeryyy slow, and crashes.  I restart fsck.  This
time it starts asking me tons of yes/no questions because it wants to
know what to do with the double-allocated blocks.  I yes them all
(there is no real right answer anyhow) and reboot.

And that was it:  init starts, and complains about not having an
/etc/inittab (and asks me which runlevel to start.  Never seen that
before either). Then it crashes.   Booting with knoppix reveals lots
and lots of damaged files.  Everything that was cached seems to be
damaged, and some random files are also dead (my guess is ext3 screwed
up while updating atimes or something like that).  Game over.

I guess these 2 facts need fixing:
1) loopback devices should not pass errors over to their underlying filesystems.
2) ext3 suicidally allows remounting read-write when parts of its data
are invalid.

Now I don't complain much.  I have a 1 day old backup of my home
directory (thanks, unison). I lost all my tweaks to /etc, but, well,
the hard drive image was copied/resized from computer to computer to
computer, and initially started its life under Linux 2.0.35 on a
Pentium 133 MHz.  A rewrite was probably a good idea anyway.  I lost
all my MP3s, but a very nice girl promised to help me re-rip them
all from my CDs. (Thanks to ext3 I get to spend some time with a very
sexy girl.  Lots of it by talking and laughing while we wait for lame
to end.  I actually start to think my hard drive should get erased
more often ;-)  ).

Other people might not like losing a whole partition, so I mail this
sad story to you all.  A bit of advice: if you ever see ext3
complaining about being read-only, press the reset button.  It might
save your partition.

I did not test my claim of the loopback being the bug, as I am busy
reinstalling right now (on EXT2 this time).

Have a nice day, everyone,

Hans.



From theman at josephdwagner.info  Fri May 13 19:55:55 2005
From: theman at josephdwagner.info (Joseph D. Wagner)
Date: Fri, 13 May 2005 14:55:55 -0500
Subject: Smashing EXT3 for fun and profit (or: how to loose all your
	data)
In-Reply-To: <ad981dd70505121535583720c8@mail.gmail.com>
Message-ID: <200505131955.j4DJtWaB003721@josephdwagner.info>

> I guess these 2 facts need fixing:
> 1) loopback devices should not pass errors over
> to their underlying filesystems.

I have a test partition set up for these circumstances.  I'll try to reproduce the read-write/read-only error spreading to an underlying file system when the loopback file system has the error.  However, I will have to double check with the file system designers.  There may be a good reason it behaves this way.

> 2) ext3 suicidally allows remounting read-write
> when parts of its data are invalid.

When you are logged in as root, it will let you do whatever suicidal -- or imho stupid -- things you tell it to do.  That is not going to change.

It actually takes something serious to bring down a file system mid-stride, not just an atime update.  In other words, by the time Linux is remounting your file system as read-only, something is already fubar.  The remount as read-only is really only a stop-gap measure to prevent further damage while you save your work -- on other partitions -- and reboot.

If all you have is one honkin' / (root) partition, you may just want to change that behavior to panic.  After all, if you only have 1 partition, there's nowhere else to save your work.

So long as you're redoing your partitions, be sure to separate out /tmp, /var, and just to be safe /home too, so next time all you lose is the one bad partition.

Joseph D. Wagner




From tytso at mit.edu  Sat May 14 02:28:03 2005
From: tytso at mit.edu (Theodore Ts'o)
Date: Fri, 13 May 2005 22:28:03 -0400
Subject: Smashing EXT3 for fun and profit (or: how to loose all your
	data)
In-Reply-To: <ad981dd70505121535583720c8@mail.gmail.com>
References: <ad981dd70505121535583720c8@mail.gmail.com>
Message-ID: <20050514022803.GA26057@thunk.org>

On Fri, May 13, 2005 at 12:35:16AM +0200, Hans Yperman wrote:
> This tragic history actually starts on Windows: MS Word had wiped out
> an important file on a floppy, and I got the task of retrieving what
> was possible.  Using Linux, I made an image with dd, and put it on the
> now extinct EXT3 partition. I used an undelete program, and then
> mounted the image with a loopback device:
> mount -o loop /tmp/image.img /floppy
>   As it turns out, the undeleter managed to screw up the FAT, and the
> loopback device complains about reading past the end of the device.
> After fixing the floppy on another computer, I come back to the Linux
> computer.  The console is full of error messages.

What version of the kernel are you using?  What undelete program were
you using?  Most undelete programs don't require that you mount the
filesystem; in fact, they often require that you *don't* mount them.

> What happened?  A first bug: Linux remounted the loopback device
> read-only because of the bad FAT on the image.  BUT this did not work
> out right: not only the loopback device, but the whole EXT3 partition
> was now read-only. Every little write action results in an error,
> hence all the messages.  I did not really think much of it at that
> point, and just did a
> mount -o remount,rw /

Without the logs, it sounds like the ext3 filesystem got corrupted,
and so it was remounted read-only.  How this happened is not
clear, and you didn't give us enough information to determine that;
but it's consistent with e2fsck displaying errors.

> At this point, I am already screwed, but I don't realize it yet:  The
> computer works completely normally from here on.  The problem happens
> the next time I boot: fsck complains about problems (weird, fsck is
> not supposed to run for EXT3).

When the kernel discovers filesystem corruption, it marks the
filesystem as containing errors, and remounts it read-only.  When fsck
runs, it notes the fact that the filesystem has problems, and tries
to fix it.
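
As a rough illustration of the "marks the filesystem as containing
errors" step: ext2/ext3 keep a state field in the on-disk superblock,
and the error bit in that field is what tells e2fsck a check is
needed.  The Python sketch below decodes that field.  The offsets and
flag values are the standard ext2 superblock layout (superblock at
byte 1024 of the device; s_magic, s_state and s_errors as consecutive
16-bit little-endian fields starting at byte 56); the function names
and the synthetic buffer are assumptions made up for this example, and
read_superblock() would need a real device or image path and root
privileges.

```python
import struct

SUPERBLOCK_OFFSET = 1024      # primary superblock starts 1024 bytes in
EXT2_SUPER_MAGIC = 0xEF53
EXT2_VALID_FS = 0x0001        # s_state bit: cleanly unmounted
EXT2_ERROR_FS = 0x0002        # s_state bit: kernel detected errors
ERROR_POLICY = {1: "continue", 2: "remount-ro", 3: "panic"}  # s_errors


def parse_superblock(sb: bytes) -> dict:
    """Decode s_magic, s_state and s_errors from a raw superblock."""
    magic, state, errors = struct.unpack_from("<HHH", sb, 56)
    if magic != EXT2_SUPER_MAGIC:
        raise ValueError(f"bad magic 0x{magic:04x}, not ext2/ext3")
    return {
        "clean": bool(state & EXT2_VALID_FS),
        "has_errors": bool(state & EXT2_ERROR_FS),
        "on_error": ERROR_POLICY.get(errors, "unknown"),
    }


def read_superblock(path: str) -> bytes:
    """Read the raw primary superblock from a device or image file."""
    with open(path, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET)
        return f.read(1024)


# Synthetic superblock: error bit set, policy "remount read-only".
fake = bytearray(1024)
struct.pack_into("<HHH", fake, 56, EXT2_SUPER_MAGIC, EXT2_ERROR_FS, 2)
print(parse_superblock(bytes(fake)))
```

On a real system, "dumpe2fs -h" on the device reports the same
information as "Filesystem state" and "Errors behavior".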

> Specifically, fsck complains about
> double-allocated blocks, does a pass 1B and 1C (I'd never seen these
> before either), dumps pages and pages and pages of block numbers,
> gets very very veeeeryyy slow, and crashes.  I restart fsck.  This
> time it starts asking me tons of yes/no questions because it wants to
> know what to do with the double-allocated blocks.  I yes them all
> (there is no real right answer anyhow) and reboot.

What version of e2fsck are you running?  It must be an ancient one if
it got really slow like that.  You wouldn't be running Debian
Obsolete^H^H^H^H^H^H^H Stable, would you?

> And that was it:  init starts, and complains about not having an
> /etc/inittab (and asks me which runlevel to start.  Never seen that
> before either). Then it crashes.   Booting with knoppix reveals lots
> and lots of damaged files.  Everything that was cached seems to be
> damaged, and some random files are also dead (my guess is ext3 screwed
> up while updating atimes or something like that).  Game over.

The filesystem was probably screwed up much earlier than that.
Probably something went wrong when the undelete program was run, or
perhaps it happened because you remounted the filesystem read-write
after errors were uncovered, but it's going to be hard to reconstruct
without a lot more details.  (What specific messages were printed by
the kernel describing the errors, exactly what version of the kernel,
e2fsprogs, and undelete program you were using, etc.)

I will say that while remounting a filesystem read/write after errors
is dangerous, the fact that e2fsck displayed pages and pages of block
numbers tends to indicate that there was something more that went
wrong.  Merely remounting a filesystem read/write might result in
some multiply claimed blocks, which passes 1b/1c/1d are designed to
resolve, but how many you have depends on how many files are written
and how badly corrupted the block allocation bitmaps were.
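
To make "multiply claimed blocks" concrete, here is a toy Python
sketch of the condition itself: a block number that appears in the
block list of more than one inode.  This is only an illustration under
made-up data; it is not how e2fsck is implemented, and the inode and
block numbers below are invented.

```python
from collections import defaultdict


def multiply_claimed(inode_blocks: dict) -> dict:
    """Map each block claimed by more than one inode to its claimants."""
    owners = defaultdict(list)
    for inode, blocks in inode_blocks.items():
        for block in blocks:
            owners[block].append(inode)
    return {blk: inos for blk, inos in owners.items() if len(inos) > 1}


# Hypothetical inode -> data block lists; block 102 is claimed twice.
example = {
    12: [100, 101, 102],
    13: [102, 103],
    14: [200],
}
print(multiply_claimed(example))   # {102: [12, 13]}
```

A stale allocation bitmap can produce this state: the allocator hands
out a block that is marked free but still belongs to an existing file,
so the more files are written afterwards, the more duplicates appear.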

Assuming that you didn't run the system for very long before you
rebooted, or didn't write a lot of files during this interim, it seems
somewhat unlikely that it would have resulted in "pages and pages and
pages" of block numbers.  That would tend to argue that portions of
the inode table got written to the wrong location, which is generally
caused by a hardware error.  It might have been caused by the undelete
program, but that seems hard to believe.  But then again, I don't know
which undelete program you used, and it does seem very surprising that
the undelete program would work with a mounted filesystem, so that
part sounds like another user error (but not one that would be
expected to cause major filesystem corruption).  So the bottom line is
I can't really tell you what could have happened with the limited
facts that you've given me.

> I guess these 2 facts need fixing:
> 1) loopback devices should not pass errors over to their underlying filesystems.

Loopback devices don't pass errors back over to their underlying
filesystems.  

> 2) ext3 suicidally allows remounting read-write when parts of its data
> are invalid.

Linux will allow you to do many things that might be, well,
ill-advised.  When the kernel printed all of the warnings, it warned
you that the filesystem had errors.  Remounting it read/write was a
really bad idea --- but then again, so is running the command
"dd if=/dev/zero of=/dev/hda1" as root.


> Other people might not like losing a whole partition, so I mail this
> sad story to you all.  A bit of advice: if you ever see ext3
> complaining about being read-only, press the reset button.  It might
> save your partition.

Or run e2fsck manually yourself; there are a number of things that you
can do.  Blindly remounting the filesystem read/write is certainly not
one of them.  Saving all of the error messages from the kernel
describing the filesystem corruption is a really good idea.  As is
saving the messages from e2fsck, so people can figure out what
happened after the fact.

The one good thing is that you kept good backups, so you didn't lose
that much; I definitely commend that.  :-)

						- Ted



From dclunie at dclunie.com  Sun May 15 13:56:53 2005
From: dclunie at dclunie.com (David Clunie)
Date: Sun, 15 May 2005 09:56:53 -0400
Subject: Intermittent ext3 corruption on external firewire Micronet 1.5Tb
 RAID on FC3
Message-ID: <42875525.8010202@dclunie.com>

Hi

I have a Firewire connected Micronet 1.5TB RAID with a single
large ext3 filesystem on one partition on a dual Xeon system.

I am checking out from an extremely large cvs repository
(don't ask) to this drive over the course of many days, and
intermittently I get bad blocks and the filesystem goes
read-only. This is not related to any power failure or
anything similar. The RAID is currently about 40% full;
this started to happen around the 15% mark as I recall.

I checked the RAID firmware setup, found that caching was
set to write-back, and changed it to write-through to
see if that would help (since I gather the Linux kernel
presumes write-through, though why it should make a
difference in the absence of a reboot or power failure
I don't understand).

This reduced the frequency of the error from once a night
to once every couple of nights; interestingly mostly at
about 04:03 AM or so. Looking at cron.daily, only mrtg
and sa seem to be starting up at about that time.

I suspect the timing is related to a change in the pattern
of disk activity rather than anything else.

I have no reason to suspect that there is anything actually
wrong with the RAID itself, which just appears as a really
big firewire external disk. It is new however, so this
can't be ruled out.

My next step is to just turn off journaling and see if
doing this with just ext2 works OK. Journaling doesn't
seem to be doing much good as I am stuck regularly running
ordinary fscks with all these errors anyway!

I just thought I would ask if anyone else has had a similar
experience, and whether such issues are known to be with ext3,
or the firewire interface, or both together.

PS. I did actually create the partition and did the mkfs on
an AMD64 FC3 system at a different site, though that is not the
system to which the RAID is currently connected. Just mention
that in case this makes a difference, but I presume an fsck
would have noticed and fixed anything fundamentally wrong in
this regard.

David

May 15 04:03:30 localhost kernel: Aborting journal on device sdd1.
May 15 04:03:30 localhost kernel: EXT3-fs error (device sdd1): ext3_journal_start_sb: Detected aborted journal
May 15 04:03:30 localhost kernel: EXT3-fs error (device sdd1): ext3_xattr_get: inode 63343526: bad block 165510584
May 15 04:03:30 localhost kernel: EXT3-fs error (device sdd1) in start_transaction: Journal has aborted
May 15 04:03:30 localhost kernel: EXT3-fs error (device sdd1) in start_transaction: Journal has aborted
May 15 04:03:30 localhost kernel: inode_doinit_with_dentry:  getxattr returned 5 for dev=sdd1 ino=63343526
May 15 04:03:34 localhost kernel: EXT3-fs error (device sdd1): ext3_xattr_get: inode 63343381: bad block 141623810
May 15 04:03:34 localhost kernel: EXT3-fs error (device sdd1): ext3_xattr_get: inode 63947123: bad block 203323361

Linux localhost.localdomain 2.6.9-1.667smp #1 SMP Tue Nov 2 14:59:52 EST 2004 i686 i686 i386 GNU/Linux



From theman at josephdwagner.info  Sun May 15 22:48:44 2005
From: theman at josephdwagner.info (Joseph D. Wagner)
Date: Sun, 15 May 2005 17:48:44 -0500
Subject: Intermittent ext3 corruption on external firewire Micronet
	1.5Tb RAID on FC3
In-Reply-To: <42875525.8010202@dclunie.com>
Message-ID: <200505152248.j4FMmI7b031303@josephdwagner.info>

> May 15 04:03:30 localhost kernel: EXT3-fs error (device sdd1):
> ext3_xattr_get: inode 63343526: bad block 165510584
> May 15 04:03:34 localhost kernel: EXT3-fs error (device sdd1):
> ext3_xattr_get: inode 63343381: bad block 141623810
> May 15 04:03:34 localhost kernel: EXT3-fs error (device sdd1):
> ext3_xattr_get: inode 63947123: bad block 203323361

These errors cannot be caused by a bug in the file system.  It is possible, although highly unlikely, that a bug in the device driver could generate these errors.

The most likely cause is that there actually are bad blocks on your new 1.5TB file system.

Do us all a favor and run:

badblocks -v -b block_size /dev/device

And let us know about the results.

Joseph D. Wagner




From anandtiwari at softhome.net  Mon May 16 23:39:00 2005
From: anandtiwari at softhome.net (anandtiwari at softhome.net)
Date: Mon, 16 May 2005 17:39:00 -0600
Subject: Ext3 journal corruption
In-Reply-To: <42875525.8010202@dclunie.com> 
References: <42875525.8010202@dclunie.com>
Message-ID: <courier.42892F14.00007E5F@softhome.net>

Hi all, 

I had an ext3 filesystem mounted with writeback. Yesterday my system crashed 
and now when I try to mount it, it gives me "Invalid argument". Following is 
the command line
#mount -t ext3 /dev/hda1 /mnt/home 

I tried debugging it and later found out it was complaining about the 
journal inode. Is there any way to recover my files? I did clone the disk 
and mounted it as ext2 after a few tries, but there was nothing in it.
Any help or pointers will be appreciated, 

Thanks
anand



From tytso at mit.edu  Tue May 17 01:08:26 2005
From: tytso at mit.edu (Theodore Ts'o)
Date: Mon, 16 May 2005 21:08:26 -0400
Subject: Ext3 journal corruption
In-Reply-To: <courier.42892F14.00007E5F@softhome.net>
References: <42875525.8010202@dclunie.com>
	<courier.42892F14.00007E5F@softhome.net>
Message-ID: <20050517010826.GC11282@thunk.org>

On Mon, May 16, 2005 at 05:39:00PM -0600, anandtiwari at softhome.net wrote:
> Hi all, 
> 
> I was having a ext3 filesystem with writeback. yesterday my system crashed 
> and now when i try to mount it, it gives me "Invalid argument". Following 
> is the command line
> #mount -t ext3 /dev/hda1 /mnt/home 
> 
> i tried debugging it and later i found out, its was complaining about 
> journaling inode. Is there any way to recover my files, i did clone the 
> disk and mounted it as ext2 after few tries but there was nothing in it.
> any help or pointers will be appreciated, 

1)  Run e2fsck to correct any filesystem errors.  This may remove the journal inode.

2)  If it didn't, to be safe, remove the journal: 
	"tune2fs -O ^has_journal /dev/hdXX" 

3)  Then recreate the journal:  "tune2fs -j /dev/hdXX"
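
The three steps above can be sketched as a small Python wrapper.  This
is only an illustration: the function names and the dry-run default are
assumptions, not part of e2fsprogs; the commands themselves are the ones
listed above.  It should only ever be pointed at an unmounted device.

```python
import subprocess

def journal_rebuild_commands(device):
    """Return the commands from steps 1-3 above, in order."""
    return [
        ["e2fsck", device],                         # 1) fix filesystem errors
        ["tune2fs", "-O", "^has_journal", device],  # 2) drop the old journal
        ["tune2fs", "-j", device],                  # 3) create a fresh journal
    ]

def rebuild_journal(device, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for cmd in journal_rebuild_commands(device):
        print(" ".join(cmd))
        if not dry_run:
            # e2fsck exits non-zero when it has corrected errors, so the
            # exit status is deliberately not treated as fatal here.
            subprocess.run(cmd, check=False)
```

Calling rebuild_journal("/dev/hdXX") with the default dry_run=True only
prints the three command lines, which is a cheap way to check the device
name before running anything for real.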

					 Ted



From anandtiwari at softhome.net  Tue May 17 02:05:16 2005
From: anandtiwari at softhome.net (Anand Tiwari)
Date: Mon, 16 May 2005 20:05:16 -0600
Subject: Ext3 journal corruption
References: <42875525.8010202@dclunie.com>
	<courier.42892F14.00007E5F@softhome.net>
	<20050517010826.GC11282@thunk.org>
Message-ID: <001e01c55a84$dff2ee70$fa00a8c0@darkstar>

OK, but just curious: if it was not cleanly unmounted, mount shouldn't be
able to mount it as ext2.

----- Original Message ----- 
From: "Theodore Ts'o" <tytso at mit.edu>
To: <anandtiwari at softhome.net>
Cc: <ext3-users at redhat.com>
Sent: Monday, May 16, 2005 7:08 PM
Subject: Re: Ext3 journal corruption


> On Mon, May 16, 2005 at 05:39:00PM -0600, anandtiwari at softhome.net wrote:
> > Hi all,
> >
> > I was having a ext3 filesystem with writeback. yesterday my system crashed
> > and now when i try to mount it, it gives me "Invalid argument". Following
> > is the command line
> > #mount -t ext3 /dev/hda1 /mnt/home
> >
> > i tried debugging it and later i found out, its was complaining about
> > journaling inode. Is there any way to recover my files, i did clone the
> > disk and mounted it as ext2 after few tries but there was nothing in it.
> > any help or pointers will be appreciated,
>
> 1)  Run e2fsck to correct any filesystem errors.  This may remove the
> journal inode.
>
> 2)  If it didn't, to be safe, remove the journal:
> "tune2fs -O ^has_journal /dev/hdXX"
>
> 3)  Then recreate the journal:  "tune2fs -j /dev/hdXX"
>
> Ted



From adilger at clusterfs.com  Tue May 17 06:04:44 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Tue, 17 May 2005 00:04:44 -0600
Subject: Intermittent ext3 corruption on external firewire Micronet
	1.5Tb RAID on FC3
In-Reply-To: <42875525.8010202@dclunie.com>
References: <42875525.8010202@dclunie.com>
Message-ID: <20050517060444.GJ1499@schnapps.adilger.int>

On May 15, 2005  09:56 -0400, David Clunie wrote:
> I have a Firewire connected Micronet 1.5TB RAID with a single
> large ext3 filesystem on one partition on a dual Xeon system.

For some kernels (maybe even current ones) it is possible that
there is a problem with IO beyond 1 TB.

What I would do (if you don't mind overwriting the disk, presumably
not if it is just new and doesn't contain important data) is to
write a small test program to write the byte offset at the start of
every 4kB block on the disk, then read them all back and verify it
is correct.

This will tell you if there is aliasing in the block device (possibly
e.g. an int used instead of __u32 or sector_t).
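
A minimal sketch of such a test program, in Python.  It is DESTRUCTIVE
(it overwrites the first 8 bytes of every 4kB block), and the function
names and the 8-byte little-endian stamp are assumptions made for this
example.  On a real device the read pass should happen after the page
cache has been dropped, otherwise the reads may never reach the disk.

```python
import os
import struct

BLOCK_SIZE = 4096  # the 4kB block size suggested above

def write_offsets(path, num_blocks):
    """Stamp the start of each block with its own byte offset."""
    with open(path, "r+b") as dev:
        for block in range(num_blocks):
            offset = block * BLOCK_SIZE
            dev.seek(offset)
            dev.write(struct.pack("<Q", offset))  # 8-byte little-endian
        dev.flush()
        os.fsync(dev.fileno())  # push the data out of the page cache

def verify_offsets(path, num_blocks):
    """Return the offsets whose stamp does not match what was written."""
    mismatches = []
    with open(path, "rb") as dev:
        for block in range(num_blocks):
            offset = block * BLOCK_SIZE
            dev.seek(offset)
            data = dev.read(8)
            if len(data) != 8 or struct.unpack("<Q", data)[0] != offset:
                mismatches.append(offset)
    return mismatches
```

If the device aliases addresses, a block written late in the first pass
lands on top of an earlier one, so the earlier block reads back with the
wrong offset and shows up in the list returned by verify_offsets().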
 
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



From theman at josephdwagner.info  Tue May 17 07:42:15 2005
From: theman at josephdwagner.info (Joseph D. Wagner)
Date: Tue, 17 May 2005 02:42:15 -0500
Subject: Intermittent ext3 corruption on external firewire Micronet1.5Tb
	RAID on FC3
In-Reply-To: <20050517060444.GJ1499@schnapps.adilger.int>
Message-ID: <200505170741.j4H7fnl2031520@josephdwagner.info>

> What I would do (if you don't mind overwriting the disk, presumably
> not if it is just new and doesn't contain important data) is to
> write a small test program to write the byte offset at the start of
> every 4kB block on the disk, then read them all back and verify it
> is correct.

That's what badblocks is for when doing a destructive write test.

Joseph D. Wagner



From adilger at clusterfs.com  Tue May 17 08:44:37 2005
From: adilger at clusterfs.com ('Andreas Dilger')
Date: Tue, 17 May 2005 02:44:37 -0600
Subject: Intermittent ext3 corruption on external firewire Micronet1.5Tb
	RAID on FC3
In-Reply-To: <200505170741.j4H7fnl2031520@josephdwagner.info>
References: <20050517060444.GJ1499@schnapps.adilger.int>
	<200505170741.j4H7fnl2031520@josephdwagner.info>
Message-ID: <20050517084437.GN1499@schnapps.adilger.int>

On May 17, 2005  02:42 -0500, Joseph D. Wagner wrote:
> > What I would do (if you don't mind overwriting the disk, presumably
> > not if it is just new and doesn't contain important data) is to
> > write a small test program to write the byte offset at the start of
> > every 4kB block on the disk, then read them all back and verify it
> > is correct.
> 
> That's what badblocks is for when doing a destructive write test.

Looking at the badblocks man page, I don't think this is true (though
I could be wrong).  If badblocks is only writing out a repetitive
pattern, and only verifying in 64-block chunks, this will not detect
device block address aliasing because (a) the pattern doesn't depend
on the offset so will verify correctly, and (b) 64 blocks is likely
aligned to the same offset as where the device would conceivably wrap.

Adding this feature to badblocks (e.g. a "-t offset" pattern) is
probably the best way to do this, because badblocks is widely available
and already has most of the framework for it.
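
Point (a) can be illustrated with a toy model in Python.  Everything
here is hypothetical: AliasingDevice is an in-memory stand-in for a
device whose block addresses wrap, not a model of any real driver, and
the wrap point of 64 blocks is picked only for the example.  It does not
model the chunked verification in point (b).

```python
class AliasingDevice:
    """Toy block device whose block addresses wrap after `wrap` blocks."""

    def __init__(self, wrap):
        self.wrap = wrap
        self.blocks = {}

    def write(self, block, data):
        self.blocks[block % self.wrap] = data

    def read(self, block):
        return self.blocks.get(block % self.wrap)

def failed_blocks(dev, num_blocks, pattern):
    """Write pattern(block) to every block, then read them all back."""
    for block in range(num_blocks):
        dev.write(block, pattern(block))
    return [b for b in range(num_blocks) if dev.read(b) != pattern(b)]

# Same value everywhere: aliasing goes unnoticed.
fixed = failed_blocks(AliasingDevice(wrap=64), 128, lambda block: 0xAA)
# Value depends on the block number: the overwritten blocks are caught.
stamped = failed_blocks(AliasingDevice(wrap=64), 128, lambda block: block)
print(len(fixed), len(stamped))  # prints "0 64"
```

With the fixed pattern every read returns the expected value even though
half the writes landed on the wrong blocks; with the offset-dependent
pattern the first 64 blocks fail verification.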

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



From cchan at outblaze.com  Thu May 19 11:05:34 2005
From: cchan at outblaze.com (Christopher Chan)
Date: Thu, 19 May 2005 19:05:34 +0800
Subject: ext3 journal problems
Message-ID: <428C72FE.3080506@outblaze.com>

This caused a crash on a 2.6.10-1.12_FC2smp kernel:

May 19 09:56:35 spf1 kernel: Assertion failure in log_do_checkpoint() at fs/jbd/checkpoint.c:361: "drop_count != 0 || cleanup_ret != 0"
May 19 09:56:35 spf1 kernel: ------------[ cut here ]------------
May 19 09:56:37 spf1 kernel: kernel BUG at fs/jbd/checkpoint.c:361!
May 19 09:56:37 spf1 kernel: invalid operand: 0000 [#1]
May 19 09:56:37 spf1 kernel: SMP
May 19 09:56:37 spf1 kernel: Modules linked in: md5 ipv6 autofs4 e100 mii ipt_REJECT iptable_filter ip_tables microcode dm_mod ohci_hcd ext3 jbd raid1 raid0
May 19 09:56:37 spf1 kernel: CPU:    0
May 19 09:56:37 spf1 kernel: EIP:    0060:[<f8839cd9>]    Not tainted VLI
May 19 09:56:37 spf1 kernel: EFLAGS: 00010202   (2.6.10-1.12_FC2smp)
May 19 09:56:37 spf1 kernel: EIP is at log_do_checkpoint+0x106/0x146 [jbd]
May 19 09:56:37 spf1 kernel: eax: 0000006e   ebx: eaadccbc   ecx: e453ab90   edx: f883d756
May 19 09:56:37 spf1 kernel: esi: f6a11a00   edi: 00000000   ebp: c091d5e0   esp: e453ab8c
May 19 09:56:37 spf1 kernel: ds: 007b   es: 007b   ss: 0068
May 19 09:56:37 spf1 kernel: Process cleanup (pid: 29696, threadinfo=e453a000 task=e907b060)
May 19 09:56:37 spf1 kernel: Stack: f883d756 f883c91d f883d742 00000169 f883d811 034aa511 dd06f92c eaadccbc
May 19 09:56:38 spf1 kernel:        00000000 00000000 c628f3bc f5be8764 c0154c62 00001000 f6a11c00 f14c3498
May 19 09:56:38 spf1 kernel:        f5d8c360 00000001 f14c3498 f5c87480 f5d8c360 f5d8c290 f14c3498 f8870c79
May 19 09:56:38 spf1 kernel: Call Trace:
May 19 09:56:38 spf1 kernel:  [<c0154c62>] __getblk+0x24/0x42
May 19 09:56:38 spf1 kernel:  [<f8870c79>] ext3_do_update_inode+0x2fb/0x322 [ext3]
May 19 09:56:38 spf1 kernel:  [<f8836d2d>] journal_get_write_access+0x25/0x2c [jbd]
May 19 09:56:38 spf1 kernel:  [<f8870f28>] ext3_mark_iloc_dirty+0x10/0x18 [ext3]
May 19 09:56:38 spf1 kernel:  [<f8870fe4>] ext3_mark_inode_dirty+0x33/0x3a [ext3]
May 19 09:56:38 spf1 kernel:  [<f886e658>] ext3_splice_branch+0xeb/0x18c [ext3]
May 19 09:56:38 spf1 kernel:  [<f8836cec>] do_get_write_access+0x54f/0x56b [jbd]
May 19 09:56:38 spf1 kernel:  [<c0154c35>] __find_get_block+0xb5/0xbe
May 19 09:56:38 spf1 kernel:  [<c0125009>] __mod_timer+0xf1/0xfb
May 19 09:56:38 spf1 kernel:  [<f8839899>] __log_wait_for_space+0xa4/0xc7 [jbd]
May 19 09:56:38 spf1 kernel:  [<f8836383>] start_this_handle+0x2f8/0x33e [jbd]
May 19 09:56:38 spf1 kernel:  [<c011a545>] __wake_up+0x29/0x3c
May 19 09:56:38 spf1 kernel:  [<f8836478>] journal_start+0x78/0x9e [jbd]
May 19 09:56:38 spf1 kernel:  [<f886edea>] ext3_prepare_write+0x32/0xf4 [ext3]
May 19 09:56:38 spf1 kernel:  [<c013a424>] generic_file_buffered_write+0x1a3/0x499
May 19 09:56:38 spf1 kernel:  [<c01685d0>] inode_update_time+0x6e/0x96
May 19 09:56:38 spf1 kernel:  [<c013aaa8>] __generic_file_aio_write_nolock+0x38e/0x3bc
May 19 09:56:38 spf1 kernel:  [<c013ab0f>] generic_file_aio_write_nolock+0x39/0x7f
May 19 09:56:38 spf1 kernel:  [<c013acf5>] generic_file_aio_write+0x6e/0xbe
May 19 09:56:38 spf1 kernel:  [<f886cdab>] ext3_file_write+0x19/0x8a [ext3]
May 19 09:56:38 spf1 kernel:  [<c0152bea>] do_sync_write+0x97/0xc9
May 19 09:56:38 spf1 kernel:  [<c01622e8>] poll_freewait+0x33/0x3a
May 19 09:56:38 spf1 kernel:  [<c012ec8e>] autoremove_wake_function+0x0/0x2d
May 19 09:56:38 spf1 kernel:  [<c011a49f>] scheduler_tick+0x3b3/0x3c9
May 19 09:56:38 spf1 kernel:  [<c0152cd4>] vfs_write+0xb8/0xe4
May 19 09:56:38 spf1 kernel:  [<c0152d9e>] sys_write+0x3c/0x62
May 19 09:56:38 spf1 kernel:  [<c0103ccb>] syscall_call+0x7/0xb
May 19 09:56:38 spf1 kernel: Code: ff ff 83 7c 24 10 00 75 2d 85 c0 75 29 68 11 d8 83 f8 68 69 01 00 00 68 42 d7 83 f8 68 1d c9 83 f8 68 56 d7 83 f8 e8 63 45 8e c7 <0f> 0b 69 01 42 d7 83 f8 83 c4 14 39 6e 40 75 0a 83 7e 40 00 0f



From theman at josephdwagner.info  Thu May 19 17:19:11 2005
From: theman at josephdwagner.info (Joseph D. Wagner)
Date: Thu, 19 May 2005 12:19:11 -0500
Subject: ext3 journal problems
In-Reply-To: <428C72FE.3080506@outblaze.com>
Message-ID: <200505191718.j4JHIhmh016701@josephdwagner.info>

> May 19 09:56:37 spf1 kernel: kernel BUG at fs/jbd/checkpoint.c:361!

fs/jbd is not ext3.

Please direct this to the jbd people.

Joseph D. Wagner



From adilger at clusterfs.com  Thu May 19 17:36:21 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 19 May 2005 11:36:21 -0600
Subject: ext3 journal problems
In-Reply-To: <200505191718.j4JHIhmh016701@josephdwagner.info>
References: <428C72FE.3080506@outblaze.com>
	<200505191718.j4JHIhmh016701@josephdwagner.info>
Message-ID: <20050519173621.GG1499@schnapps.adilger.int>

On May 19, 2005  12:19 -0500, Joseph D. Wagner wrote:
> > May 19 09:56:37 spf1 kernel: kernel BUG at fs/jbd/checkpoint.c:361!
> 
> fs/jbd is not ext3.
> 
> Please direct this to the jbd people.

??? Maybe you are thinking of "jfs", but jbd is developed by Stephen
explicitly for ext3.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.



From mvolaski at aecom.yu.edu  Thu May 19 17:40:34 2005
From: mvolaski at aecom.yu.edu (Maurice Volaski)
Date: Thu, 19 May 2005 13:40:34 -0400
Subject: mke2fs options for very large filesystems
In-Reply-To: <20050208170005.3F34E72E1E@hormel.redhat.com>
References: <20050208170005.3F34E72E1E@hormel.redhat.com>
Message-ID: <a06210201beb277a42ac3@[129.98.90.227]>

>Yes, if you are creating larger files.  By default e2fsck assumes the average
>file size is 8kB and allocates a corresponding number of inodes there.  If,
>for example, you are storing lots of larger files there (digital photos, MP3s,
>etc) that are in the MB range you can use "-t largefile" or "-t largefile4"
>to specify an average file size of 1MB or 4MB respectively.  You can also
>use -i or -N (see man page) to override the default bytes-per-inode value.

Wouldn't -T largefile already be making choices about the default 
bytes-per-inode?

How could I make my own determination about what values are most 
appropriate for -i and -N? My filesystems are generally several 
hundreds of gigabytes, filled with files that average about one 
megabyte in size.
-- 

Maurice Volaski, mvolaski at aecom.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University



From mvolaski at aecom.yu.edu  Thu May 19 17:49:28 2005
From: mvolaski at aecom.yu.edu (Maurice Volaski)
Date: Thu, 19 May 2005 13:49:28 -0400
Subject: [Q] Where does all the space go?
Message-ID: <a06210203beb27fdd1806@[129.98.90.227]>

I created a filesystem as follows:

mke2fs -j -O dir_index -O sparse_super -T largefile /dev/drbd/6

Here's the output from df

Filesystem            Size  Used Avail Use%
/dev/drbd/6           475G   33M  452G   1%

It seems that ext3 has taken 23 GB, which is about 5% of the total 
disk size, for itself. Is that right?

If that is, indeed, the case, why does df just list 33M as being used?
-- 

Maurice Volaski, mvolaski at aecom.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University



From menscher at uiuc.edu  Thu May 19 17:55:04 2005
From: menscher at uiuc.edu (Damian Menscher)
Date: Thu, 19 May 2005 12:55:04 -0500 (CDT)
Subject: [Q] Where does all the space go?
In-Reply-To: <a06210203beb27fdd1806@[129.98.90.227]>
References: <a06210203beb27fdd1806@[129.98.90.227]>
Message-ID: <Pine.LNX.4.62.0505191253100.1637@lx2.physics.uiuc.edu>

On Thu, 19 May 2005, Maurice Volaski wrote:

> mke2fs -j -O dir_index -O sparse_super -T largefile /dev/drbd/6
>
> Filesystem            Size  Used Avail Use%
> /dev/drbd/6           475G   33M  452G   1%
>
> It seems that ext3 has taken 23 GB, which is about 5% of the total disk size, 
> for itself. Is that right?

It's not reserved for the filesystem, but rather for root.  Read about 
the -m option in the manpage to adjust that 5%.

> If that is, indeed, the case, why does df just list 33M as being used?

I think the 33M is the space used by the journal, or the filesystem 
itself.
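
The gap can be checked directly, since df's "Avail" column comes from
the f_bavail field of statvfs while the blocks that are free but
reserved only show up in f_bfree.  A small Python sketch (the helper
names are assumptions for this example; os.statvfs is the standard
library call, available on Unix):

```python
import os

def reserved_bytes(st):
    """Bytes that are free but not available to ordinary users."""
    return (st.f_bfree - st.f_bavail) * st.f_frsize

def report(path):
    """Print how much of the filesystem holding `path` is reserved."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    reserved = reserved_bytes(st)
    pct = 100.0 * reserved / total if total else 0.0
    print("%s: %d bytes reserved (%.1f%% of %d)" % (path, reserved, pct, total))
```

Running report() on the mount point of a filesystem created with the
default settings should show a figure close to 5%.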

Damian Menscher
-- 
-=#| Physics Grad Student & SysAdmin @ U Illinois Urbana-Champaign |#=-
-=#| 488 LLP, 1110 W. Green St, Urbana, IL 61801 Ofc:(217)333-0038 |#=-
-=#| 4602 Beckman, VMIL/MS, Imaging Technology Group:(217)244-3074 |#=-
-=#| <menscher at uiuc.edu> www.uiuc.edu/~menscher/ Fax:(217)333-9819 |#=-
-=#| The above opinions are not necessarily those of my employers. |#=-



From kwijibo at zianet.com  Thu May 19 17:55:02 2005
From: kwijibo at zianet.com (kwijibo at zianet.com)
Date: Thu, 19 May 2005 11:55:02 -0600
Subject: [Q] Where does all the space go?
In-Reply-To: <a06210203beb27fdd1806@[129.98.90.227]>
References: <a06210203beb27fdd1806@[129.98.90.227]>
Message-ID: <428CD2F6.8030105@zianet.com>

Investigate the -m option of mkfs.ext2/3 or tune2fs.
The default is 5%.

Maurice Volaski wrote:
> I created a filesystem as follows:
> 
> mke2fs -j -O dir_index -O sparse_super -T largefile /dev/drbd/6
> 
> Here's the output from df
> 
> Filesystem            Size  Used Avail Use%
> /dev/drbd/6           475G   33M  452G   1%
> 
> It seems that ext3 has taken 23 GB, which is about 5% of the total disk 
> size, for itself. Is that right?
> 
> If that is, indeed, the case, why does df just list 33M as being used?



From theman at josephdwagner.info  Thu May 19 23:52:01 2005
From: theman at josephdwagner.info (Joseph D. Wagner)
Date: Thu, 19 May 2005 18:52:01 -0500
Subject: ext3 journal problems
In-Reply-To: <20050519173621.GG1499@schnapps.adilger.int>
References: <428C72FE.3080506@outblaze.com>
	<200505191718.j4JHIhmh016701@josephdwagner.info>
	<20050519173621.GG1499@schnapps.adilger.int>
Message-ID: <20050519234943.M23142@josephdwagner.info>

> ??? Maybe you are thinking of "jfs", but jbd is developed by Stephen
> explicitly for ext3.

Oops.  My bad.  Sorry, I'm new to this file system development thing.  I'm 
finding it to be quite a steep learning curve.

Joseph D. Wagner



From cchan at outblaze.com  Fri May 20 02:00:35 2005
From: cchan at outblaze.com (Christopher Chan)
Date: Fri, 20 May 2005 10:00:35 +0800
Subject: ext3 journal problems
In-Reply-To: <20050519234943.M23142@josephdwagner.info>
References: <428C72FE.3080506@outblaze.com>
	<200505191718.j4JHIhmh016701@josephdwagner.info>
	<20050519173621.GG1499@schnapps.adilger.int>
	<20050519234943.M23142@josephdwagner.info>
Message-ID: <428D44C3.4040206@outblaze.com>

Joseph D. Wagner wrote:
>>??? Maybe you are thinking of "jfs", but jbd is developed by Stephen
>>explicitly for ext3.
> 
> 
> Oops.  My bad.  Sorry, I'm new to this file system development thing.  I'm 
> finding it to be quite a steep learning curve.
> 
> Joseph D. Wagner
> 

No problem. Please make sure of your homework. I don't run anything but 
ext3 on the box with the problem.



From mvolaski at aecom.yu.edu  Fri May 20 17:14:23 2005
From: mvolaski at aecom.yu.edu (Maurice Volaski)
Date: Fri, 20 May 2005 13:14:23 -0400
Subject: [Q] Where does all the space go?
In-Reply-To: <20050520160006.C8245736F1@hormel.redhat.com>
References: <20050520160006.C8245736F1@hormel.redhat.com>
Message-ID: <a06210202beb3bd0c6f28@[129.98.90.227]>

>It's not reserved for the filesystem, but rather for root.  Read about
>the -m option in the manpage to adjust that 5%.


>Investigate the -m option of mkfs.ext2/3 or tune2fs.
>The default is 5%.

Thanks for the info.

I found a post previously that claims it is required to prevent high 
levels of fragmentation as well as "other, very important" reasons. I 
wonder how accurate this statement is.

>Ummm, the 5% reservation is to prevent the high levels of
>fragmentation that occur when the filesystem is near full (something
>that I wish Windows would adopt as standard too ;-).  It is also to
>keep your system from "hanging" if system/root processes need to
>write to the filesystem, so they aren't at the "mercy" of users
>filling it up.  And there are a few other, very important reasons
>too.

-- 

Maurice Volaski, mvolaski at aecom.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University



From theman at josephdwagner.info  Fri May 20 17:50:14 2005
From: theman at josephdwagner.info (Joseph D. Wagner)
Date: Fri, 20 May 2005 12:50:14 -0500
Subject: [Q] Where does all the space go?
In-Reply-To: <a06210202beb3bd0c6f28@[129.98.90.227]>
Message-ID: <200505201749.j4KHnisY013617@josephdwagner.info>

> I found a post previously that claims it is required to prevent high
> levels of fragmentation as well as "other, very important" reasons. I
> wonder how accurate this statement is.

Very accurate.  Fragmentation increases exponentially.  The harder it is for the file system to find contiguous space for a file (as the file system gets more and more full), the exponentially worse fragmentation gets.  There is some argument about exactly what the cut-off point should be -- 90%, 95%, etc. -- but by the time your TB file system is that full, you've got more serious problems anyway.

There are several studies on this out there on the web, somewhere.

Joseph D. Wagner




From tytso at mit.edu  Sat May 21 02:40:45 2005
From: tytso at mit.edu (Theodore Ts'o)
Date: Fri, 20 May 2005 22:40:45 -0400
Subject: mke2fs options for very large filesystems
In-Reply-To: <a06210201beb277a42ac3@[129.98.90.227]>
References: <20050208170005.3F34E72E1E@hormel.redhat.com>
	<a06210201beb277a42ac3@[129.98.90.227]>
Message-ID: <20050521024045.GC6708@thunk.org>

On Thu, May 19, 2005 at 01:40:34PM -0400, Maurice Volaski wrote:
> Wouldn't -T largefile already be making choices about the default 
> bytes-per-inode?
> 
> How could I make my own determination about what values are most 
> appropriate for -i and -N? My filesystems are generally several 
> hundreds of gigabytes, filled with files that average about one 
> megabyte in size.

Well, "mke2fs -i 1048576" will create an inode for every megabyte
(1,048,576 bytes) of space on the filesystem.  However, once you create
a filesystem, it's not possible to increase the number of inodes in
that filesystem afterwards.  Also, symbolic links take up inodes,
as do block and character devices.  So in general you want to
overallocate inodes somewhat.  For example, if you specify "mke2fs -i
524288" then you will be creating twice as many inodes, since you are
asking mke2fs to create an inode for every 512k of space.
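
The arithmetic can be sketched in a few lines of Python.  The 475 GB
filesystem size is only an assumed example, and the result is
approximate: mke2fs rounds the inode count per block group, so the real
number will differ slightly.

```python
def approx_inode_count(fs_bytes, bytes_per_inode):
    """Rough inode count for 'mke2fs -i bytes_per_inode'."""
    return fs_bytes // bytes_per_inode

fs_bytes = 475 * 1024**3  # assumed example: a 475 GB filesystem

for ratio in (1048576, 524288):
    print(ratio, approx_inode_count(fs_bytes, ratio))
# prints:
# 1048576 486400
# 524288 972800
```

Halving the bytes-per-inode value doubles the number of inodes, which is
the overallocation described above.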

						- Ted



From dbond at nrggos.com.au  Sun May 22 22:53:28 2005
From: dbond at nrggos.com.au (Darryl Bond)
Date: Mon, 23 May 2005 08:53:28 +1000
Subject: FSCK of corrupted ext3 filesystem
Message-ID: <42910D68.4090303@nrggos.com.au>

Hello,
I have a 1.3TB ext3 filesystem that has been in service for about 3 months.
About 6 days ago the Emulex fibrechannel controller logged a SCSI error 
and the filesystem changed to RO.
It appears that the filesystem instantly changes to RO and prevents the 
journal from working, therefore invalidating the filesystem.
The filesystem was unmounted and a remount was attempted. The mount 
failed due to errors and an fsck came up with errors.

Top output looks like this:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4562 root          25   0  780m   214m  236 R 99.9         42.6   
6211:44 fsck.ext3

The fsck has been running for 6 days without printing anything to the 
screen. It seems to be working as an strace produces the following.

Process 4562 attached - interrupt to quit
_llseek(5, 5979127808, [5979127808], SEEK_SET) = 0
read(5, "\377\276\340oY\\i\17\346N\231\370\216\v\276\361\255\245"..., 4096) = 4096
_llseek(5, 299281825792, [299281825792], SEEK_SET) = 0
write(5, "\323\265-Q\33<\331\216\325\304U\3V\221\213\301e\32Q\220"..., 4096) = 4096
_llseek(5, 5979131904, [5979131904], SEEK_SET) = 0
read(5, "\327\347\2435\210\253^\222H\253\302\331\360\245\323\352"..., 4096) = 4096
_llseek(5, 299281829888, [299281829888], SEEK_SET) = 0
write(5, "\242\355\370A\2759Q\251\31>\254\240\301\34\320\226J5\22"..., 4096) = 4096
_llseek(5, 5979136000, [5979136000], SEEK_SET) = 0
read(5, "X\220ik\266\312\306\\ \266\32\220A\362\3319\250\27&\f\357"..., 4096) = 4096
_llseek(5, 299281833984, [299281833984], SEEK_SET) = 0
write(5, "U\352\255\303`\262\372h\242\275\312\333_\352\3\322\313"..., 4096) = 4096
_llseek(5, 5979140096, [5979140096], SEEK_SET) = 0
read(5, "\33\265#\367\332{\250Bj\215\277[\313\201\23\340\223\216"..., 4096) = 4096
_llseek(5, 299281838080, [299281838080], SEEK_SET) = 0
write(5, "\313-\234z\236\253/\3\360\232\222\237p\t5L\353\v\363t%"..., 4096) = 4096
Process 4562 detached
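
The offsets in the trace can be turned into block numbers with a short
Python sketch.  The 4096-byte block size is an assumption, chosen
because it matches the size of every read and write in the trace.

```python
BLOCK_SIZE = 4096  # assumed; matches the 4096-byte reads and writes

def to_block(offset):
    """Convert a byte offset to (block number, remainder in the block)."""
    return divmod(offset, BLOCK_SIZE)

read_offsets = [5979127808, 5979131904, 5979136000, 5979140096]
write_offsets = [299281825792, 299281829888, 299281833984, 299281838080]

for r, w in zip(read_offsets, write_offsets):
    print(to_block(r), "->", to_block(w))
```

Every offset is block-aligned, and both the reads and the writes advance
one block at a time (blocks 1459748 onwards being read, blocks 73066852
onwards being written), so the process is at least making steady
progress through the disk rather than looping over the same blocks.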

How long should I let the fsck run?

Regards
Darryl Bond





From tytso at mit.edu  Mon May 23 17:40:37 2005
From: tytso at mit.edu (Theodore Ts'o)
Date: Mon, 23 May 2005 13:40:37 -0400
Subject: FSCK of corrupted ext3 filesystem
In-Reply-To: <42910D68.4090303@nrggos.com.au>
References: <42910D68.4090303@nrggos.com.au>
Message-ID: <20050523174037.GA30505@thunk.org>

On Mon, May 23, 2005 at 08:53:28AM +1000, Darryl Bond wrote:
> Hello,
> I have a 1.3TB ext3 filesystem that has been in service for about 3 months.
> About 6 days ago the Emulex fibrechannel controller logged a SCSI error 
> and the filesystem changed to RO.
> It appears that the filesystem instantly changes to RO and prevents the 
> journal from working, therefore invalidating the filesystem.
> The filesystem was unmounted and a remount was attempted. The mount 
> failed due to errors and an fsck came up with errors.

What version of e2fsck are you using, and what kernel messages were
displayed when the filesystem was remounted read-only?  What version
of the kernel/distribution are you using?  What messages were printed
by e2fsck?

						 - Ted



From dbond at nrggos.com.au  Wed May 25 10:57:25 2005
From: dbond at nrggos.com.au (Darryl Bond)
Date: Wed, 25 May 2005 20:57:25 +1000
Subject: FSCK of corrupted ext3 filesystem
In-Reply-To: <PERBUMSGID-ul6psvf4sq7.fsf@false.linpro.no>
References: <42910D68.4090303@nrggos.com.au>
	<PERBUMSGID-ul6psvf4sq7.fsf@false.linpro.no>
Message-ID: <42945A15.1080206@nrggos.com.au>

Perhaps, but should I stop it?
It doesn't seem to be thrashing. The box is still quite responsive.

After 8 days it is still working. If I stop it, will I have a mountable 
filesystem that I can get as much as possible off?
I have ordered some 400G disks to try to get as much as possible.

Regards



Per Andreas Buer wrote:

>Darryl Bond <dbond at nrggos.com.au> writes:
>
>  
>
>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 4562 root          25   0  780m   214m  236 R 99.9         42.6
>>6211:44 fsck.ext3
>>    
>>
>
>It looks like fsck.ext3 has eaten all of your memory. Your system is
>probably thrashing. Buy more memory.
>
>  
>

From perbu at linpro.no  Wed May 25 08:25:52 2005
From: perbu at linpro.no (Per Andreas Buer)
Date: 25 May 2005 10:25:52 +0200
Subject: FSCK of corrupted ext3 filesystem
In-Reply-To: <42910D68.4090303@nrggos.com.au>
References: <42910D68.4090303@nrggos.com.au>
Message-ID: <PERBUMSGID-ul6psvf4sq7.fsf@false.linpro.no>

Darryl Bond <dbond at nrggos.com.au> writes:

>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  4562 root          25   0  780m   214m  236 R 99.9         42.6
> 6211:44 fsck.ext3

It looks like fsck.ext3 has eaten all of your memory. Your system is
probably thrashing. Buy more memory.

-- 
Per Andreas Buer



From menscher at uiuc.edu  Wed May 25 14:29:49 2005
From: menscher at uiuc.edu (Damian Menscher)
Date: Wed, 25 May 2005 09:29:49 -0500 (CDT)
Subject: FSCK of corrupted ext3 filesystem
In-Reply-To: <PERBUMSGID-ul6psvf4sq7.fsf@false.linpro.no>
References: <42910D68.4090303@nrggos.com.au>
	<PERBUMSGID-ul6psvf4sq7.fsf@false.linpro.no>
Message-ID: <Pine.LNX.4.62.0505250927020.8089@lx2.physics.uiuc.edu>

On Wed, 25 May 2005, Per Andreas Buer wrote:

> Darryl Bond <dbond at nrggos.com.au> writes:
>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>  4562 root          25   0  780m   214m  236 R 99.9         42.6
>> 6211:44 fsck.ext3
>
> It looks like fsck.ext3 has eaten all of your memory. Your system is
> probably thrashing. Buy more memory.

No.  Look at the columns again, reformatted properly:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
4562 root      25   0  780m  214m 236 R 99.9 42.6  6211:44  fsck.ext3
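
The same reading can be checked mechanically by splitting the line on
whitespace and pairing the fields with the header.  A small Python
sketch (parse_top_line is a name made up for this example):

```python
HEADER = "PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND".split()

def parse_top_line(line):
    """Map the whitespace-separated fields of one top line to the header."""
    # Split at most 11 times so a command containing spaces stays whole.
    fields = line.strip().split(None, len(HEADER) - 1)
    if len(fields) != len(HEADER):
        raise ValueError("expected %d fields, got %d" % (len(HEADER), len(fields)))
    return dict(zip(HEADER, fields))

# The line as originally posted, including the wrap added by the mailer.
raw = """ 4562 root          25   0  780m   214m  236 R 99.9         42.6
6211:44 fsck.ext3"""

row = parse_top_line(raw)
print(row["RES"], row["%MEM"], row["TIME+"])  # prints "214m 42.6 6211:44"
```

So the resident size is 214m, which is 42.6% of memory, and the 99.9 is
the CPU usage.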

My _uninformed_ suggestion would be to kill it and run it again.  It 
might help.  Or not.  At least it's unlikely to make matters worse.

Damian Menscher
-- 
-=#| Physics Grad Student & SysAdmin @ U Illinois Urbana-Champaign |#=-
-=#| 488 LLP, 1110 W. Green St, Urbana, IL 61801 Ofc:(217)333-0038 |#=-
-=#| 4602 Beckman, VMIL/MS, Imaging Technology Group:(217)244-3074 |#=-
-=#| <menscher at uiuc.edu> www.uiuc.edu/~menscher/ Fax:(217)333-9819 |#=-
-=#| The above opinions are not necessarily those of my employers. |#=-



From mvolaski at aecom.yu.edu  Thu May 26 23:23:49 2005
From: mvolaski at aecom.yu.edu (Maurice Volaski)
Date: Thu, 26 May 2005 19:23:49 -0400
Subject: Confusing -t for -T causes bad block count error
In-Reply-To: <20050208170005.3F34E72E1E@hormel.redhat.com>
References: <20050208170005.3F34E72E1E@hormel.redhat.com>
Message-ID: <a06210200bebc096453b4@[129.98.90.227]>

Just in case anyone ever reads this old post below and tries making a 
file system with the little, lower case letter "t" below, it results 
in a baffling bad block count error. The correct option is the upper 
case, capital letter "T" :)

>Yes, if you are creating larger files.  By default e2fsck assumes the average
>file size is 8kB and allocates a corresponding number of inodes there.  If,
>for example, you are storing lots of larger files there (digital photos, MP3s,
>etc) that are in the MB range you can use "-t largefile" or "-t largefile4"
>to specify an average file size of 1MB or 4MB respectively.  You can also
>use -i or -N (see man page) to override the default bytes-per-inode value.
>This will also speed up e2fsck noticeably.
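
To put rough numbers on the quoted advice, here is a small sketch of how
the bytes-per-inode ratio translates into an inode count.  The 8kB, 1MB
and 4MB ratios are the ones given in the quote (selected with
"-T largefile" or "-T largefile4"); the 100GB file system size is an
arbitrary assumption, and the real inode count is rounded per block
group, so treat the results as estimates only.

```python
# Rough estimate only: the real tool rounds inode counts per block group,
# so actual numbers will differ slightly.

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Bytes-per-inode ratios as stated in the quoted message.
RATIOS = {
    "default": 8 * KIB,
    "largefile": 1 * MIB,
    "largefile4": 4 * MIB,
}


def estimate_inodes(fs_bytes, bytes_per_inode):
    """Approximate number of inodes for a file system of fs_bytes."""
    return fs_bytes // bytes_per_inode


if __name__ == "__main__":
    fs_bytes = 100 * GIB  # hypothetical file system size
    for name, ratio in RATIOS.items():
        print(f"{name:11s} {estimate_inodes(fs_bytes, ratio):>10d} inodes")
```

Fewer inodes means less inode table for e2fsck to scan, which is where
the speed-up mentioned in the quote comes from.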

-- 

Maurice Volaski, mvolaski at aecom.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University



From sct at redhat.com  Fri May 27 15:13:35 2005
From: sct at redhat.com (Stephen C. Tweedie)
Date: Fri, 27 May 2005 16:13:35 +0100
Subject: Intermittent ext3 corruption on external firewire Micronet
	1.5Tb RAID on FC3
In-Reply-To: <200505152248.j4FMmI7b031303@josephdwagner.info>
References: <200505152248.j4FMmI7b031303@josephdwagner.info>
Message-ID: <1117206814.1957.42.camel@sisko.sctweedie.blueyonder.co.uk>

Hi,

On Sun, 2005-05-15 at 23:48, Joseph D. Wagner wrote:
> > May 15 04:03:30 localhost kernel: EXT3-fs error (device sdd1):
> > ext3_xattr_get: inode 63343526: bad block 165510584
> > May 15 04:03:34 localhost kernel: EXT3-fs error (device sdd1):
> > ext3_xattr_get: inode 63343381: bad block 141623810
> > May 15 04:03:34 localhost kernel: EXT3-fs error (device sdd1):
> > ext3_xattr_get: inode 63947123: bad block 203323361
> 
> These errors cannot be caused by a bug in the file system.

Yes they can, and almost certainly were: I'm not sure why you'd assert
otherwise.  These messages are coming straight back from ext3 when it
doesn't find the right magic number in an xattr block.  Looking at the
kernel version in the initial error:

> Linux localhost.localdomain 2.6.9-1.667smp #1 SMP Tue Nov 2 14:59:52
> EST 2004 i686 i686 i386 GNU/Linux

Andreas and I found and fixed an xattr sharing bug in December 2004,
about five months ago.  It's a race when one process is deleting an
unshared xattr block while another process is simultaneously trying to
share it, and it seems to be particularly visible when you've got
SELinux on.  The fix is in the core mbcache.c code, but directly affects
ext3 xattrs.

This was fixed both upstream and in Fedora updates quite some time ago. 
"yum update" is your friend in this case. :-)

Cheers,
 Stephen




From sct at redhat.com  Fri May 27 15:24:24 2005
From: sct at redhat.com (Stephen C. Tweedie)
Date: Fri, 27 May 2005 16:24:24 +0100
Subject: Intermittent ext3 corruption on external firewire Micronet
	1.5Tb RAID on FC3
In-Reply-To: <200505170741.j4H7fnl2031520@josephdwagner.info>
References: <200505170741.j4H7fnl2031520@josephdwagner.info>
Message-ID: <1117207464.1957.53.camel@sisko.sctweedie.blueyonder.co.uk>

Hi,

On Tue, 2005-05-17 at 08:42, Joseph D. Wagner wrote:
> > What I would do (if you don't mind overwriting the disk, presumably
> > not if it is just new and doesn't contain important data) is to
> > write a small test program to write the byte offset at the start of
> > every 4kB block on the disk, then read them all back and verify it
> > is correct.
> 
> That's what badblocks is for when doing a destructive write test.

No, badblocks just tells you if an IO succeeded.  It's really not
designed to make sure that the IO went to the correct disk block in the
presence of block aliasing, which is what you need to detect wraps.

I wrote a program to test such things a couple of months ago, and have
recently been polishing it up and writing documentation for it for
public consumption.  It's called "verify-data", and it does a
write-then-read verify scan designed for large block devices.  It uses
1MB IOs by default, with the buffer carefully constructed to be easily
recognisable: buffers contain a repeating pattern of block offset, byte
offset, magic number and pass number, so any IOs going astray are
instantly recognisable.  Everything should be 64-bit safe, and I've used
it on block devices up to 13TB in size.
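
As a rough illustration of the self-describing buffer idea (this is not
the verify-data source, which is not shown here), the sketch below fills
each 1MB chunk with records that encode where they belong.  The record
layout (four 64-bit fields), the MAGIC value and all function names are
assumptions made up for this example, and an in-memory buffer stands in
for the block device.

```python
import io
import struct

CHUNK = 1024 * 1024               # 1MB IOs, as in the description above
RECORD = struct.Struct("<QQQQ")   # assumed layout: block, byte offset, magic, pass
MAGIC = 0x7665726966794441        # arbitrary made-up value


def build_chunk(chunk_index, pass_no):
    """Fill one chunk with records that encode their own location."""
    base = chunk_index * CHUNK
    buf = bytearray(CHUNK)
    for off in range(0, CHUNK, RECORD.size):
        RECORD.pack_into(buf, off, chunk_index, base + off, MAGIC, pass_no)
    return bytes(buf)


def write_chunk(dev, chunk_index, pass_no):
    """Write the pattern for chunk_index at its own offset."""
    dev.seek(chunk_index * CHUNK)
    dev.write(build_chunk(chunk_index, pass_no))


def verify_chunk(dev, chunk_index, pass_no):
    """Return the device offset of the first bad record, or None if clean."""
    base = chunk_index * CHUNK
    dev.seek(base)
    data = dev.read(CHUNK)
    expected = build_chunk(chunk_index, pass_no)
    for off in range(0, CHUNK, RECORD.size):
        if data[off:off + RECORD.size] != expected[off:off + RECORD.size]:
            return base + off
    return None


if __name__ == "__main__":
    dev = io.BytesIO(bytes(4 * CHUNK))   # stand-in for a block device
    for i in range(4):
        write_chunk(dev, i, pass_no=1)
    # Simulate aliasing: data meant for chunk 3 also lands on chunk 1.
    dev.seek(1 * CHUNK)
    dev.write(build_chunk(3, 1))
    for i in range(4):
        print(i, verify_chunk(dev, i, pass_no=1))
```

Because every record carries its intended offset, a write that wrapped
onto the wrong block reads back successfully but fails the comparison,
which is the case a plain success/failure IO test cannot catch.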

By default it just writes and verifies one chunk every 128GB throughout
the device, but you can tell it to walk the whole device (MUCH
slower!).  I've found it very good for detecting edge-conditions, wraps
etc. on large block devices.  (It also includes a query mode, -Q, to
interrogate the BLKGETSIZE[64] ioctls.)
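
To give a feel for why the full walk is so much slower than the default
sampling, here is a back-of-the-envelope sketch.  The function names are
made up for illustration, "13TB" is taken as a binary 13 * 2^40 bytes,
and the exact chunk placement the real tool uses is not specified here,
so this simply assumes one 1MB chunk at the start of each 128GB stride.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4


def sample_offsets(device_bytes, stride=128 * GIB, chunk=MIB):
    """Byte offsets of one chunk per stride that fit fully on the device."""
    return [off for off in range(0, device_bytes, stride)
            if off + chunk <= device_bytes]


def full_walk_chunks(device_bytes, chunk=MIB):
    """Number of whole chunks touched when walking the entire device."""
    return device_bytes // chunk


if __name__ == "__main__":
    size = 13 * TIB  # hypothetical device, the largest size mentioned above
    print(len(sample_offsets(size)), "sampled chunks")
    print(full_walk_chunks(size), "chunks for a full walk")
```

Under these assumptions a 13TB device needs only around a hundred
sampled chunks, against millions of chunks for a full walk.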

It can be found at

        http://people.redhat.com/sct/src/verify-data/

I've got it in git locally, and can push the git repo to http too if
people find it useful.

Cheers, 
 Stephen