[dm-devel] Re: [PATCHES] new solution for dm_any_congested crash
jvrao
jvrao at linux.vnet.ibm.com
Wed Nov 26 21:41:32 UTC 2008
Hi,
I was the one who initially stumbled onto this problem, and when I
realized that it was an ABBA deadlock, I approached Chandra and Mike.
Chandra came up with the initial fix, and it worked fine.
Later Chandra pointed me to this patch, and when I tried it I ran
into a system hang.
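
For readers unfamiliar with the term, an ABBA deadlock arises when two threads take the same two locks in opposite order. The following is a generic user-space sketch in Python, not the dm code path discussed in this thread. The names `abba_demo`, `lock_a` and `lock_b` are made up for illustration, and a timeout is used so the demonstration reports the stall instead of hanging.

```python
import threading

def abba_demo(timeout=0.2):
    """Two threads take locks A and B in opposite order.

    Returns a dict saying whether each thread obtained its second lock.
    """
    lock_a, lock_b = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)    # both threads own their first lock
    both_tried_second = threading.Barrier(2)  # both threads finished their attempt
    results = {}

    def worker(name, first, second):
        with first:
            both_hold_first.wait()
            # Without the timeout this call would block forever: the other
            # thread holds 'second' and is waiting for 'first'.
            got = second.acquire(timeout=timeout)
            results[name] = got
            both_tried_second.wait()
            if got:
                second.release()

    t1 = threading.Thread(target=worker, args=("A-then-B", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("B-then-A", lock_b, lock_a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results

if __name__ == "__main__":
    print(abba_demo())
```

Neither thread obtains its second lock, because each is waiting for a lock the other already holds. The usual cure is to make every code path take the two locks in the same order.
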
Please note that I am running it on an -rt kernel based on 2.6.24. I could
not apply the patch directly, so I ported it to my kernel.
I am attaching the ported version (new_dm_patch). I have already run this
ported patch past Chandra.
I ran an IO stress test with this patch while constantly bouncing one of
the paths (the same path every time, with 20 minutes between bounces).
The system hung a few hours into the test, and I forced a dump. I am still
analyzing the dump.
If you want the dump, please let me know where I can upload it.
The core is around 8G.
Here are a few things from the dump that might be interesting.
crash> ps |grep udev | wc -l
425    <<< 425 udevd threads at the time of the hang.
crash> ps | wc -l
782
crash> foreach bt| grep rt_mutex_slowlock | wc -l
416    <<< 416 of the total 782 threads are waiting for a lock.
crash>
crash> struct rt_mutex ffff81024ec6cca0
struct rt_mutex {
  wait_lock = {
    raw_lock = {
      slock = 49858
    },
    break_lock = 0
  },
  wait_list = {
    prio_list = {
      next = 0xffff8101007f5ae0,
      prev = 0xffff8101007f5ae0
    },
    node_list = {
      next = 0xffff8101007f5af0,
      prev = 0xffff81007cb51af0
    }
  },
  owner = 0xffff8102400c2b22
}
The following task is holding the lock that many other udevd threads are waiting for.
PID: 21896 TASK: ffff8102400c2b20 CPU: 6 COMMAND: "udevd" << Holding
#0 [ffff810100667848] schedule at ffffffff8128531c
#1 [ffff810100667900] io_schedule at ffffffff81285859
#2 [ffff810100667920] sync_buffer at ffffffff810d1fc1
#3 [ffff810100667930] __wait_on_bit at ffffffff81285ad1
#4 [ffff810100667970] out_of_line_wait_on_bit at ffffffff81285b71
#5 [ffff8101006679e0] __wait_on_buffer at ffffffff810d1f41
#6 [ffff8101006679f0] ext3_find_entry at ffffffff8803c1d2
#7 [ffff810100667b60] ext3_lookup at ffffffff8803dbae
#8 [ffff810100667ba0] do_lookup at ffffffff810b78cf
#9 [ffff810100667bf0] __link_path_walk at ffffffff810b94ef
#10 [ffff810100667c90] link_path_walk at ffffffff810b9f99
#11 [ffff810100667d60] path_walk at ffffffff810ba04b
#12 [ffff810100667d70] do_path_lookup at ffffffff810ba352
#13 [ffff810100667dc0] __path_lookup_intent_open at ffffffff810bae88
#14 [ffff810100667e10] path_lookup_open at ffffffff810baf38
#15 [ffff810100667e20] open_exec at ffffffff810b41e3
#16 [ffff810100667ed0] do_execve at ffffffff810b53a2
#17 [ffff810100667f20] sys_execve at ffffffff8100ac30
#18 [ffff810100667f50] stub_execve at ffffffff8100c5c7
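
The owner field in the rt_mutex dump above (0xffff8102400c2b22) differs from the TASK address of PID 21896 (ffff8102400c2b20) only in its low bits. A minimal sketch of how the two values relate, assuming (as in rt_mutex implementations of this era) that the low two bits of the owner word carry flag bits rather than address bits. The mask value 0x3 and the helper name `decode_owner` are assumptions for illustration; which flag each bit encodes depends on the kernel version.

```python
# Assumed: the low two bits of rt_mutex.owner are flag bits, not address bits.
RT_MUTEX_OWNER_MASKALL = 0x3

def decode_owner(owner):
    """Split an rt_mutex owner word into (task_struct address, flag bits)."""
    task = owner & ~RT_MUTEX_OWNER_MASKALL
    flags = owner & RT_MUTEX_OWNER_MASKALL
    return task, flags

task, flags = decode_owner(0xffff8102400c2b22)
print(hex(task), flags)  # prints: 0xffff8102400c2b20 2
```

With the flag bits masked off, the owner word matches the TASK address of PID 21896, which is consistent with that udevd being the lock holder.
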
PID: 21946 TASK: ffff81007f090b20 CPU: 4 COMMAND: "udevd" << One of the udevds waiting for the lock.
#0 [ffff81007f0e59f8] schedule at ffffffff8128531c
#1 [ffff81007f0e5ab0] rt_mutex_slowlock at ffffffff81286a95
#2 [ffff81007f0e5b80] rt_mutex_lock at ffffffff81285f84
#3 [ffff81007f0e5b90] _mutex_lock at ffffffff812873f9
#4 [ffff81007f0e5ba0] do_lookup at ffffffff810b788b
#5 [ffff81007f0e5bf0] __link_path_walk at ffffffff810b94ef
#6 [ffff81007f0e5c90] link_path_walk at ffffffff810b9f99
#7 [ffff81007f0e5d60] path_walk at ffffffff810ba04b
#8 [ffff81007f0e5d70] do_path_lookup at ffffffff810ba352
#9 [ffff81007f0e5dc0] __path_lookup_intent_open at ffffffff810bae88
#10 [ffff81007f0e5e10] path_lookup_open at ffffffff810baf38
#11 [ffff81007f0e5e20] open_exec at ffffffff810b41e3
#12 [ffff81007f0e5ed0] do_execve at ffffffff810b53a2
#13 [ffff81007f0e5f20] sys_execve at ffffffff8100ac30
#14 [ffff81007f0e5f50] stub_execve at ffffffff8100c5c7
Thanks,
Venkateswararao Jujjuri (JV)
Realtime Team, LTC,
Beaverton, OR 97006
>
> ------------------------------------------------------------------------
>
> Subject:
> [PATCHES] new solution for dm_any_congested crash
> From:
> Mikulas Patocka <mpatocka at redhat.com>
> Date:
> Thu, 13 Nov 2008 20:55:27 -0500 (EST)
> To:
> Alasdair G Kergon <agk at redhat.com>, Chandra Seetharaman
> <sekharan at us.ibm.com>
> CC:
> dm-devel at redhat.com, Milan Broz <mbroz at redhat.com>
>
>
> Hi
>
> Chandra's patch was correct, but the problem is more serious (the same
> crash could happen in dm_merge_bvec, dm_unplug_all or at some other dm
> places), so I had to rework reference counting.
>
> These are three patches.
> 1. reverts Chandra's changes
> 2. just a little swap of two calls, to prepare for the third
> 3. the reference counting rework
>
> Chandra, please test the patches on your system (without your original
> patch) and verify that they avoid the crashes as well as your patch does.
>
> Mikulas
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: new_dm_patch
URL: <http://listman.redhat.com/archives/dm-devel/attachments/20081126/d5cfdf7d/attachment.ksh>