
oom-killer keeps going postal



For the past few weeks the kernel's oom-killer on my 64-bit FC5 box has been going nuts under certain conditions, sometimes merely killing off most processes and often causing a kernel panic.  This happens during the daily 4am cron jobs and whenever I try to compile nVidia's GPU drivers.  Upgrading from 1GB to 2GB of RAM failed to help (predictably).  Reverting to the FC5 2096 kernel build failed to solve the problem, as did updating to 2123.

I see that someone is having similar problems on his Xeon server:
http://www.spinics.net/lists/kernel/msg470359.html

Other messages on the LKML show that the kernel developers have been monkeying around with oom-killer.  I'm guessing something broke on SMP systems (I have a dual-core CPU).  Is there a way to force uniprocessor mode without installing a uniprocessor kernel?  (For x86_64 there aren't separate uniprocessor and multiprocessor kernel builds.)

I have noticed that those attempts to install the nVidia driver will send memory usage skyrocketing, suggesting a memory leak or infinite loop or some such thing.  The swap partition (2GB) appears to be ignored.  Switching to a swap file made no difference. 

Any ideas on how to diagnose exactly what's going wrong? 
Please include my personal email address in any replies. 

ASUS A8N-SLI Premium motherboard (BIOS 1009)
2GB RAM (formerly 1GB RAM)
Athlon 64 X2 3800+ CPU, not overclocked
My PHD PCI2 card (http://www.uxd.com/phdpci2.shtml) gave the system a clean bill of health

/var/log/messages carnage begins with:
May 25 04:09:17 rifle kernel: oom-killer: gfp_mask=0x200d2, order=0
May 25 04:09:18 rifle kernel: 
May 25 04:09:18 rifle kernel: Call Trace: <ffffffff8015e599>{out_of_memory+53} <ffffffff801604c0>{__alloc_pages+544}
May 25 04:09:18 rifle kernel:        <ffffffff80170b53>{read_swap_cache_async+69} <ffffffff80166b2d>{swapin_readahead+98}
May 25 04:09:18 rifle kernel:        <ffffffff8033b34c>{_read_unlock_irq+9} <ffffffff801689a7>{__handle_mm_fault+1519}
May 25 04:09:18 rifle kernel:        <ffffffff8010b6f9>{error_exit+0} <ffffffff8033d11a>{do_page_fault+982}
May 25 04:09:18 rifle kernel:        <ffffffff80190c5c>{sys_select+795} <ffffffff8010b6f9>{error_exit+0}
May 25 04:09:18 rifle kernel: Mem-info:
May 25 04:09:18 rifle kernel: Node 0 DMA per-cpu:
May 25 04:09:18 rifle kernel: cpu 0 hot: high 0, batch 1 used:0
May 25 04:09:18 rifle kernel: cpu 0 cold: high 0, batch 1 used:0
May 25 04:09:18 rifle kernel: cpu 1 hot: high 0, batch 1 used:0
May 25 04:09:18 rifle kernel: cpu 1 cold: high 0, batch 1 used:0
May 25 04:09:18 rifle hcid[3117]: Got disconnected from the system message bus
May 25 04:09:19 rifle kernel: Node 0 DMA32 per-cpu:
May 25 04:09:19 rifle kernel: cpu 0 hot: high 186, batch 31 used:30
May 25 04:09:19 rifle kernel: cpu 0 cold: high 62, batch 15 used:52
May 25 04:09:20 rifle kernel: cpu 1 hot: high 186, batch 31 used:54
May 25 04:09:20 rifle kernel: cpu 1 cold: high 62, batch 15 used:39
May 25 04:09:20 rifle kernel: Node 0 Normal per-cpu: empty
May 25 04:09:20 rifle kernel: Node 0 HighMem per-cpu: empty
May 25 04:09:20 rifle kernel: Free pages:       14244kB (0kB HighMem)
May 25 04:09:20 rifle kernel: Active:1002 inactive:549 dirty:0 writeback:32 unstable:0 free:3561 slab:500885 mapped:1233 pagetables:1571
May 25 04:09:20 rifle kernel: Node 0 DMA free:8028kB min:28kB low:32kB high:40kB active:0kB inactive:0kB present:11140kB pages_scanned:188 all_unreclaimable? yes
May 25 04:09:20 rifle kernel: lowmem_reserve[]: 0 2000 2000 2000
May 25 04:09:20 rifle kernel: Node 0 DMA32 free:6216kB min:5708kB low:7132kB high:8560kB active:4008kB inactive:2196kB present:2048196kB pages_scanned:1681 all_unreclaimable? no
May 25 04:09:20 rifle kernel: lowmem_reserve[]: 0 0 0 0
May 25 04:09:20 rifle kernel: Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
May 25 04:09:20 rifle ainit: 
May 25 04:09:20 rifle kernel: lowmem_reserve[]: 0 0 0 0
May 25 04:09:20 rifle ainit: 
May 25 04:09:20 rifle kernel: Node 0 HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
May 25 04:09:20 rifle kernel: lowmem_reserve[]: 0 0 0 0
May 25 04:09:20 rifle kernel: Node 0 DMA: 1*4kB 1*8kB 1*16kB 0*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 1*4096kB = 8028kB
May 25 04:09:20 rifle kernel: Node 0 DMA32: 148*4kB 1*8kB 3*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 1*4096kB = 6216kB
May 25 04:09:20 rifle kernel: Node 0 Normal: empty
May 25 04:09:20 rifle kernel: Node 0 HighMem: empty
May 25 04:09:20 rifle kernel: Swap cache: add 16394, delete 16057, find 186/199, race 0+0
May 25 04:09:20 rifle kernel: Free swap  = 1966412kB
May 25 04:09:20 rifle kernel: Total swap = 2031608kB
May 25 04:09:20 rifle kernel: Free swap:       1966412kB
May 25 04:09:20 rifle kernel: 524272 pages of RAM
May 25 04:09:20 rifle kernel: 10582 reserved pages
May 25 04:09:20 rifle kernel: 73370 pages shared
May 25 04:09:20 rifle kernel: 345 pages swap cached
May 25 04:09:20 rifle kernel: Out of Memory: Kill process 3506 (mysqld) score 23606 and children.
May 25 04:09:20 rifle kernel: Out of memory: Killed process 3506 (mysqld).
May 25 04:09:20 rifle kernel: oom-killer: gfp_mask=0x201d2, order=0

And it's downhill from there. 
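
In case it helps anyone read the dump above, here is a rough sketch that converts the page counters from the "Active:... slab:..." line into kB. It assumes 4 kB pages (consistent with the 4kB minimum block in the per-zone free lists, but still an assumption), and the helper names parse_counters and to_kb are made up for illustration. On these numbers the slab counter appears to account for most of RAM, which may be a place to start looking.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages on this x86_64 kernel

# Copied from the "Active:..." line and the "pages of RAM" line in the log above.
SUMMARY = ("Active:1002 inactive:549 dirty:0 writeback:32 unstable:0 "
           "free:3561 slab:500885 mapped:1233 pagetables:1571")
TOTAL_RAM_PAGES = 524272


def parse_counters(line):
    """Return {counter_name: pages} for every 'name:number' pair in the line."""
    return {name.lower(): int(value)
            for name, value in re.findall(r"(\w+):(\d+)", line)}


def to_kb(pages, page_kb=PAGE_KB):
    """Convert a page count to kB under the assumed page size."""
    return pages * page_kb


counters = parse_counters(SUMMARY)
slab_kb = to_kb(counters["slab"])
slab_share = counters["slab"] / TOTAL_RAM_PAGES

if __name__ == "__main__":
    # Largest consumers first.
    for name, pages in sorted(counters.items(), key=lambda kv: -kv[1]):
        print(f"{name:<12}{pages:>8} pages {to_kb(pages):>10} kB")
    print(f"slab share of RAM: {slab_share:.1%}")
```

The same parse_counters helper should work on the equivalent line from later oom-killer dumps, so the slab figure can be compared across incidents.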

