[dm-devel] LVM2 stripe problems on AoE volumes

Jure Pečar pegasus at nerv.eu.org
Tue Mar 28 18:10:42 UTC 2006


Hi all,

I'm currently playing with ATA-over-Ethernet storage from
www.coraid.com. It works very well, but I found some strange
problems when LVM2 striping is used on the AoE disks.

The problem is actually mentioned in their FAQ at
http://www.coraid.com/support/linux/EtherDrive-2.6-HOWTO-5.html#ss5.5
but I would like to figure out why this is happening and how to fix
it.

The situation:

Create a volume group from two AoE disks and create a striped LV on
them (lvcreate -i 2). Format it with ext2 or ext3. You can write to it
as much as you want. You can read either the disks or the LV device
with dd as much as you want. However, reading from a file on the volume
oopses as soon as you read more than one stripe size, which is 64k by
default.
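
To illustrate why the 64k boundary matters, here is a rough Python sketch
of how a striped mapping divides a read into per-disk pieces. It is only a
model of round-robin striping, not dm code: the function name
split_striped_read and its parameters are made up for this example, and it
ignores LVM metadata and physical extent offsets. It assumes two stripes
and the default 64k stripe size described above.

```python
def split_striped_read(offset, length, stripes=2, chunk_size=64 * 1024):
    """Split a byte range on a striped volume into per-disk pieces.

    Hypothetical helper for illustration only. Returns a list of
    (stripe_index, offset_on_that_disk, byte_count) tuples.
    """
    pieces = []
    end = offset + length
    while offset < end:
        chunk = offset // chunk_size          # which chunk of the volume
        within = offset % chunk_size          # position inside that chunk
        count = min(chunk_size - within, end - offset)
        stripe = chunk % stripes              # chunks go round-robin over disks
        disk_offset = (chunk // stripes) * chunk_size + within
        pieces.append((stripe, disk_offset, count))
        offset += count
    return pieces


if __name__ == "__main__":
    # A read of exactly one stripe size stays on one disk ...
    print(split_striped_read(0, 64 * 1024))
    # ... one byte more has to be split across both disks.
    print(split_striped_read(0, 64 * 1024 + 1))
```

In this model any read larger than one stripe size has to be split and sent
to more than one disk, which is the point at which the oops shows up here.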

At least that is the situation on the rhel4u2 kernel (2.6.9-22.0.2.ELsmp).
Vanilla 2.6.16 behaves a little differently: writing also crashes, but not
always, and reading crashes immediately, so hard that I wasn't able to
get any complete oops output at all.

So all I have right now is this oops from the rhel4u2 kernel:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c014aaa0
*pde = 364bf001
Oops: 0000 [#1]
SMP
Modules linked in: aoe(U) dm_mod e1000 ext3 jbd raid1 qla2300(U) qla2xxx(U) qla2xxx_conf(U) mptscsih mptbase sd_mod d
CPU:    3
EIP:    0060:[<c014aaa0>]    Not tainted VLI
EFLAGS: 00010246   (2.6.9-22.0.2.ELsmp)
EIP is at page_address+0x6/0x6e
eax: 00000000   ebx: 00000000   ecx: f6a6da80   edx: 00000000
esi: f7531400   edi: f7531680   ebp: 00000000   esp: f643fb68
ds: 007b   es: 007b   ss: 0068
Process cat (pid: 968, threadinfo=f643f000 task=f7575930)
Stack: f735f5fc f7531400 f7531680 00000000 f896427f f6a6da80 f6a43580 f7531464
       f6a6dc00 00000000 c0223630 3a385f85 00000000 00000078 f6a6dc00 4c908780
       00000000 f6a6dc00 f7ce0880 f6ac667c f88040b8 f6a6dc00 f899a2de 00000002
Call Trace:
 [<f896427f>] aoeblk_make_request+0xa6/0x14b [aoe]
 [<c0223630>] generic_make_request+0x18e/0x19e
 [<f899a2de>] __map_bio+0x35/0xb2 [dm_mod]
 [<f899a4e1>] __clone_and_map+0xc0/0x2c3 [dm_mod]
 [<f8915944>] ext3_get_block+0x64/0x6c [ext3]
 [<f899a77c>] __split_bio+0x98/0xfe [dm_mod]
 [<f899a859>] dm_request+0x77/0x8b [dm_mod]
 [<c0223630>] generic_make_request+0x18e/0x19e
 [<c022370a>] submit_bio+0xca/0xd2
 [<c01bf9ba>] radix_tree_insert+0x6e/0xe7
 [<c01766cc>] mpage_end_io_read+0x0/0x61
 [<c017679b>] mpage_bio_submit+0x19/0x1d
 [<c0176cf6>] mpage_readpages+0xef/0xf9
 [<f8916480>] ext3_readpages+0x12/0x14 [ext3]
 [<f89158e0>] ext3_get_block+0x0/0x6c [ext3]
 [<c0145535>] read_pages+0x33/0xdd
 [<c0143148>] buffered_rmqueue+0x17d/0x1a5
 [<c0143245>] __alloc_pages+0xd5/0x2f7
 [<c01458bd>] do_page_cache_readahead+0x138/0x158
 [<c0145a0e>] page_cache_readahead+0x131/0x19e
 [<c013fe37>] do_generic_mapping_read+0xfa/0x3ae
 [<c0140353>] __generic_file_aio_read+0x19f/0x1bd
 [<c01400eb>] file_read_actor+0x0/0xc9
 [<c01403b1>] generic_file_aio_read+0x40/0x47
 [<c0159c95>] do_sync_read+0x97/0xc9
 [<c01ab790>] selinux_file_permission+0x117/0x120
 [<c011fee1>] autoremove_wake_function+0x0/0x2d
 [<c0159d7d>] vfs_read+0xb6/0xe2
 [<c0159f90>] sys_read+0x3c/0x62
 [<c02d137f>] syscall_call+0x7/0xb
Code: 08 0f 0b de 01 7e 28 2e c0 89 d8 5b e9 c7 fd ff ff 5b c3 69 c0 01 00 37 9e c1 e8 19 c1 e0 07 05 00 30 43 c0 c3
 <0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception

My theory is that dm is doing something when striping that the aoe driver
cannot digest. Can anyone familiar with dm internals comment on that?


-- 

Jure Pečar
http://jure.pecar.org



