RHEL4 Sun Java Messaging Server deadlock
John Dalbec
jpdalbec at ysu.edu
Tue Apr 26 20:52:42 UTC 2011
> Date: Fri, 4 Feb 2011 20:47:27 +0100
> From: Imed Chihi <imed.chihi at gmail.com>
> To: redhat-list at redhat.com
> Subject: Re: RHEL4 Sun Java Messaging Server deadlock (was:
> redhat-list Digest, Vol 84, Issue 3)
> Message-ID:
> <AANLkTikZOdvX-tTkd0Z133bJTh+ooqWQp50A+eaHKgX+ at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> all_unreclaimable is a flag which, when set, tells the virtual memory
> daemons not to bother scanning pages in the zone in question in order
> to try to free memory. Anyway, the DMA zone is so tiny (16MB) that it
> cannot possibly have any effect on a 32GB machine.
>
> By the way, there seems to be plenty of free HighMem memory, so the
> problem cannot possibly be due to overcommit.
>
> Based on the above, I could suggest two theories to explain what's happening:
>
> 1. you have a Normal zone starvation
> Try to set vm.lower_zone_protection to something large enough like 100 MB:
> sysctl -w vm.lower_zone_protection=100
> If this theory is correct, then the setting should fix the issue.
>
> 2. you have a pagecache flushing storm
> A large volume of dirty pages from IO on large data sets would stall
> the system while being synced to disk. This typically occurs once
> the pagecache has grown to a significant size. Mounting the
> filesystem in sync mode (mount -o remount,sync /dev/device) would "fix"
> the issue. Synchronous IO is painfully slow, but the test
> would at least tell where the problem is. If this turns out to be the
> problem, then we could think of other less annoying options for a
> bearable fix.
>
> Good luck,
>
> -Imed
>
It's baack...
/proc/sys/vm/lower_zone_protection:100
I don't think running for two months with synchronous I/O is an option.
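A less invasive way to test the flushing-storm theory than a sync mount would be to record the Dirty and Writeback counters from /proc/meminfo at intervals and look at them after the next hang. What follows is only a minimal sketch: the function name is made up for illustration, the example input is derived from the dirty:6213 and writeback:1 page counts in the dump below (assuming 4 kB pages), and the code targets a current Python 3, which is much newer than what RHEL4 ships, so the captured text would need to be parsed on another machine or the code adapted.

```python
import re


def dirty_and_writeback_kb(meminfo_text):
    """Return (Dirty, Writeback) in kB from /proc/meminfo-style text, or None if absent."""
    values = {}
    pattern = r"^(Dirty|Writeback):\s+(\d+) kB"
    for name, kb in re.findall(pattern, meminfo_text, re.MULTILINE):
        values[name] = int(kb)
    return values.get("Dirty"), values.get("Writeback")


# Illustrative input only: 6213 and 1 pages from the dump, converted at 4 kB per page.
EXAMPLE = "Dirty:           24852 kB\nWriteback:           4 kB\n"

print(dirty_and_writeback_kb(EXAMPLE))
```

Feeding it snapshots saved by something like a cron job running `cat /proc/meminfo` would show whether the dirty total climbs steeply before a stall.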
Apr 26 11:08:26 myysumail kernel: cpu 23 hot: low 32, high 96, batch 16
Apr 26 11:08:26 myysumail kernel: cpu 23 cold: low 0, high 32, batch 16
Apr 26 11:08:26 myysumail kernel:
Apr 26 11:08:26 myysumail kernel: Free pages: 20938624kB (20896192kB HighMem)
Apr 26 11:08:26 myysumail kernel: Active:1158012 inactive:1741131 dirty:6213 writeback:1 unstable:0 free:5234656 slab:162944 mapped:345466 pagetables:6398
Apr 26 11:08:26 myysumail kernel: DMA free:12528kB min:32kB low:64kB high:96kB active:0kB inactive:0kB present:16384kB pages_scanned:0 all_unreclaimable? yes
Apr 26 11:08:26 myysumail kernel: protections[]: 0 398800 424400
Apr 26 11:08:26 myysumail kernel: Normal free:29904kB min:7976kB low:15952kB high:23928kB active:520188kB inactive:565376kB present:4014080kB pages_scanned:0 all_unreclaimable? no
Apr 26 11:08:26 myysumail kernel: protections[]: 0 0 25600
Apr 26 11:08:26 myysumail kernel: HighMem free:20896192kB min:512kB low:1024kB high:1536kB active:4111860kB inactive:6399148kB present:31621120kB pages_scanned:0 all_unreclaimable? no
Apr 26 11:08:26 myysumail kernel: protections[]: 0 0 0
Apr 26 11:08:26 myysumail kernel: DMA: 4*4kB 6*8kB 3*16kB 2*32kB 3*64kB 3*128kB 2*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB = 12528kB
Apr 26 11:08:26 myysumail kernel: Normal: 3830*4kB 717*8kB 85*16kB 8*32kB 11*64kB 7*128kB 6*256kB 4*512kB 0*1024kB 1*2048kB 0*4096kB = 29904kB
Apr 26 11:08:26 myysumail kernel: HighMem: 2904*4kB 1080*8kB 482*16kB 76*32kB 23694*64kB 7005*128kB 9663*256kB 3151*512kB 705*1024kB 86*2048kB 3288*4096kB = 20896192kB
Apr 26 11:08:26 myysumail kernel: 2632760 pagecache pages
Apr 26 11:08:26 myysumail kernel: Swap cache: add 0, delete 0, find 0/0, race 0+0
Apr 26 11:08:26 myysumail kernel: 0 bounce buffer pages
Apr 26 11:08:26 myysumail kernel: Free swap: 16777208kB
Apr 26 11:08:26 myysumail kernel: 8912896 pages of RAM
Apr 26 11:08:26 myysumail kernel: 7864320 pages of HIGHMEM
Apr 26 11:08:26 myysumail kernel: 597583 reserved pages
Apr 26 11:08:26 myysumail kernel: 1548546 pages shared
Apr 26 11:08:26 myysumail kernel: 0 pages swap cached
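The figures in the dump above can be cross-checked mechanically. The following is a rough Python sketch, not part of the original diagnostics: the function names are invented for illustration, the sample text is the DMA and Normal lines from the dump with the hard line wraps rejoined and trailing fields trimmed, and the 4 kB page size is an assumption (it is consistent with free:5234656 pages matching "Free pages: 20938624kB").

```python
import re

# DMA and Normal zone lines from the dump, wraps rejoined, trailing fields trimmed.
SAMPLE = """\
DMA free:12528kB min:32kB low:64kB high:96kB active:0kB inactive:0kB present:16384kB
Normal free:29904kB min:7976kB low:15952kB high:23928kB active:520188kB inactive:565376kB present:4014080kB
DMA: 4*4kB 6*8kB 3*16kB 2*32kB 3*64kB 3*128kB 2*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB = 12528kB
Normal: 3830*4kB 717*8kB 85*16kB 8*32kB 11*64kB 7*128kB 6*256kB 4*512kB 0*1024kB 1*2048kB 0*4096kB = 29904kB
"""

ZONE_RE = re.compile(r"(\w+) free:(\d+)kB min:(\d+)kB low:(\d+)kB high:(\d+)kB")
BUDDY_RE = re.compile(r"(\w+): ((?:\d+\*\d+kB ?)+)= (\d+)kB")


def zone_watermarks(text):
    """Map zone name -> free/min/low/high in kB, taken from the zone summary lines."""
    keys = ("free", "min", "low", "high")
    return {
        m.group(1): dict(zip(keys, map(int, m.groups()[1:])))
        for m in ZONE_RE.finditer(text)
    }


def buddy_totals(text):
    """Map zone name -> (sum of the per-order free chunks in kB, total the kernel printed)."""
    totals = {}
    for zone, chunks, printed in BUDDY_RE.findall(text):
        summed = sum(int(n) * int(kb) for n, kb in re.findall(r"(\d+)\*(\d+)kB", chunks))
        totals[zone] = (summed, int(printed))
    return totals


for zone, w in zone_watermarks(SAMPLE).items():
    print(zone, "free above high watermark:", w["free"] > w["high"])
for zone, (summed, printed) in buddy_totals(SAMPLE).items():
    print(zone, "buddy lists add up:", summed == printed)

PAGE_KB = 4  # assumption: 4 kB pages
# The Normal zone's protections[] entry of 25600, if it is in pages, is 100 MB.
print("Normal protection in MB:", 25600 * PAGE_KB // 1024)
```

As far as this single snapshot goes, the DMA and Normal zones are both above their high watermarks, and the 25600 entry lines up with lower_zone_protection being set to 100.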
205 205 TS - 0 24 22 0.0 D start_this_handle pdflush
5021 5021 TS - 0 24 8 0.0 D journal_commit_trans kjournald
PID: 205 TASK: 81515830 CPU: 22 COMMAND: "pdflush"
#0 [814f3d44] rwsem_down_read_failed at 22d49de
#1 [814f3d98] add_wait_queue_exclusive at 2120dbb
#2 [814f3e5c] dio_bio_end_io at 217b638
#3 [814f3ed0] __pdflush at 21461be
#4 [814f3ee4] sync_sb_inodes at 2179f67
#5 [814f3f28] mpage_end_io_read at 217a485
#6 [814f3f38] dirty_writeback_centisecs_handler at 2145ab1
#7 [814f3f40] do_IRQ at 2107e0a
#8 [814f3f9c] __pdflush at 21462b5
#9 [814f3fd0] kthread_create at 2134227
#10 [814f3ff0] kernel_thread_helper at 21041f3
PID: 5021 TASK: 7fd2adf0 CPU: 8 COMMAND: "kjournald"
#0 [7e85fd84] rwsem_down_read_failed at 22d49de
#1 [7e85fd90] finish_wait at 2120ea5
#2 [7e85fda0] scheduler_tick at 211f273
#3 [7e85fdd8] add_wait_queue_exclusive at 2120dbb
#4 [7e85fe98] find_busiest_group at 211e8f2
#5 [7e85ff04] rwsem_down_read_failed at 22d4a09
#6 [7e85ff0c] finish_wait at 2120ea5
#7 [7e85ff1c] scheduler_tick at 211f273
#8 [7e85ff54] del_timer_sync at 212a271
#9 [7e85ffa4] schedule_tail at 211e12c
#10 [7e85fff0] kernel_thread_helper at 21041f3
If one of the user threads is involved, how can I identify it?
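One possible starting point is to list every thread in uninterruptible sleep (state D) along with its wait channel, and not only the kernel threads. The sketch below is a hypothetical helper, not an established tool. It assumes the listing format of the two ps lines above, with the column order PID TID CLS RTPRIO NI PRI PSR %CPU STAT WCHAN COMMAND, which is what `ps -eLo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:20,comm` would print. That command is a guess at how the listing was produced.

```python
# Assumed column order, matching the two ps lines quoted in this message.
FIELDS = ("pid", "tid", "cls", "rtprio", "ni", "pri", "psr", "pcpu", "stat", "wchan", "comm")

# The two lines from this message, used here as sample input.
SAMPLE = """\
205 205 TS - 0 24 22 0.0 D start_this_handle pdflush
5021 5021 TS - 0 24 8 0.0 D journal_commit_trans kjournald
"""


def blocked_tasks(ps_output):
    """Return one dict per thread whose STAT column starts with D."""
    tasks = []
    for line in ps_output.splitlines():
        # Split into at most len(FIELDS) parts so a command with spaces stays whole.
        parts = line.strip().split(None, len(FIELDS) - 1)
        if len(parts) != len(FIELDS) or not parts[0].isdigit():
            continue  # skip the header line and anything malformed
        task = dict(zip(FIELDS, parts))
        if task["stat"].startswith("D"):
            tasks.append(task)
    return tasks


for task in blocked_tasks(SAMPLE):
    print(task["pid"], task["tid"], task["wchan"], task["comm"])
```

Run against a full thread listing taken during a hang, any user thread that shows up in state D with a journal-related wait channel would be a candidate for a closer look with bt in crash.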
Thanks,
John