[Linux-cluster] Out of Memory Problem

Raj Kumar rajkum2002 at rediffmail.com
Mon Apr 18 14:36:42 UTC 2005


Hi everyone,

One of our GFS Linux servers crashed twice yesterday. The log messages indicate that the server ran out of memory and started killing processes:

Out of Memory: Killed process 21188 (sshd).
Out of Memory: Killed process 5215 (xfs).

The server is an HP DL380 with dual 3.06 GHz Xeon processors, 1 GB of RAM, and 2 GB of swap space, running RHEL 3.0 with kernel 2.4.21-27.0.1.ELsmp. It runs NIS, GFS, sshd, and Samba services. After the first crash the server didn't start because of file system corruption. That problem was corrected and the server returned to operation yesterday evening. Today's log indicates that the server ran out of memory and killed processes again this morning. The out-of-memory problem recurs especially when users are accessing the storage mounted using GFS.

Where can I start to debug the problem? 

free -m output:

             total       used       free     shared    buffers     cached
Mem:          1001        986         14          0          1         79
-/+ buffers/cache:        905         95
Swap:         1996         49       1946

I don't understand what is happening to the 1 GB of memory. The free output above was captured seconds before the crash (I had swatch set up to log the statistics the moment it sees OOM messages). The ps output doesn't show any process taking a significant portion of memory either. Since this happens only when users are using GFS heavily, I suspect GFS is the problem. But how do I verify that? Is 1 GB too small for a GFS server?
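As a rough aid for reading the numbers, the page counts in the Mem-info dump in the log below can be converted to megabytes. The sketch assumes 4 kB pages (the usual i386 page size, and the smallest block size in the dump's free lists) and treats everything that is not HighMem as low memory. The variable names are only illustrative. Under those assumptions the slab cache alone accounts for most of the RAM, which would explain why no single process shows up in ps.

```python
# Minimal sketch: convert page counts from the first Mem-info dump to MB.
# Assumption: 4 kB pages (i386 default); not stated explicitly in the log.
PAGE_KB = 4

def pages_to_mb(pages, page_kb=PAGE_KB):
    """Convert a page count to megabytes."""
    return pages * page_kb / 1024.0

# Figures copied from the log (Apr 7 10:49:15-16 dump).
ram_pages = 262138      # "262138 pages of RAM"
highmem_pages = 32762   # "32762 pages of HIGHMEM"
slab_pages = 218499     # "218499 pages of slabcache"

ram_mb = pages_to_mb(ram_pages)
highmem_mb = pages_to_mb(highmem_pages)
lowmem_mb = ram_mb - highmem_mb          # assumption: lowmem = RAM - HighMem
slab_mb = pages_to_mb(slab_pages)

slab_share_of_ram = slab_pages / float(ram_pages)
slab_share_of_lowmem = slab_pages / float(ram_pages - highmem_pages)

print("RAM:     %.0f MB" % ram_mb)
print("HighMem: %.0f MB" % highmem_mb)
print("LowMem:  %.0f MB" % lowmem_mb)
print("Slab:    %.0f MB (%.0f%% of RAM, %.0f%% of lowmem)"
      % (slab_mb, 100 * slab_share_of_ram, 100 * slab_share_of_lowmem))
```

If the slab share really is that high, looking at /proc/slabinfo while the GFS load is running should show which cache is growing.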

I found that another user has seen the same problem before:
https://www.redhat.com/archives/linux-cluster/2005-January/msg00099.html

The GFS setup was fine and all our tests passed. We then moved it to production, and it failed after running for only two days. Your help is very much appreciated! The problem seems to be reproducible, so if you need any logs I can rerun what our users did at the time of the crash.

Thanks,
Raj

================== Log ============================

Apr  7 10:49:15 server1 kernel: Mem-info:
Apr  7 10:49:15 server1 kernel: Zone:DMA freepages:  2792 min:     0 low:     0 high:     0
Apr  7 10:49:15 server1 kernel: Zone:Normal freepages:   382 min:   766 low:  4031 high:  5791
Apr  7 10:49:15 server1 kernel: Zone:HighMem freepages:   287 min:   255 low:   510 high:   765
Apr  7 10:49:15 server1 kernel: Free pages:        3461 (   287 HighMem)
Apr  7 10:49:15 server1 kernel: ( Active: 22389/6071, inactive_laundry: 889, inactive_clean: 943, free: 3461 )
Apr  7 10:49:15 server1 kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:2792
Apr  7 10:49:15 server1 kernel:   aa:6 ac:13 id:292 il:43 ic:0 fr:382
Apr  7 10:49:15 server1 kernel:   aa:15159 ac:7211 id:5769 il:856 ic:943 fr:287
Apr  7 10:49:15 server1 kernel: 2*4kB 1*8kB 3*16kB 3*32kB 0*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 2*4096kB = 11168kB)
Apr  7 10:49:15 server1 kernel: 0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1528kB)
Apr  7 10:49:15 server1 kernel: 33*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1148kB)
Apr  7 10:49:15 server1 kernel: Swap cache: add 1777, delete 1292, find 20425/20536, race 0+0
Apr  7 10:49:15 server1 kernel: 218499 pages of slabcache
Apr  7 10:49:15 server1 kernel: 216 pages of kernel stacks
Apr  7 10:49:16 server1 kernel: 0 lowmem pagetables, 489 highmem pagetables
Apr  7 10:49:16 server1 kernel: Free swap:       2038872kB
Apr  7 10:49:16 server1 kernel: 262138 pages of RAM
Apr  7 10:49:16 server1 kernel: 32762 pages of HIGHMEM
Apr  7 10:49:16 server1 kernel: 5780 reserved pages
Apr  7 10:49:16 server1 kernel: 16752 pages shared
Apr  7 10:49:16 server1 kernel: 485 pages swap cached
Apr  7 10:49:16 server1 kernel: Out of Memory: Killed process 21188 (sshd).
Apr  7 10:49:16 server1 kernel: Out of Memory: Killed process 21188 (sshd).
Apr  7 10:49:20 server1 kernel: Mem-info:
Apr  7 10:49:20 server1 kernel: Zone:DMA freepages:  2792 min:     0 low:     0 high:     0
Apr  7 10:49:20 server1 kernel: Zone:Normal freepages:   382 min:   766 low:  4031 high:  5791
Apr  7 10:49:20 server1 kernel: Zone:HighMem freepages:   291 min:   255 low:   510 high:   765
Apr  7 10:49:20 server1 kernel: Free pages:        3465 (   291 HighMem)
Apr  7 10:49:20 server1 kernel: ( Active: 21743/6636, inactive_laundry: 896, inactive_clean: 1049, free: 3465 )
Apr  7 10:49:20 server1 kernel:   aa:0 ac:0 id:0 il:0 ic:0 fr:2792
Apr  7 10:49:20 server1 kernel:   aa:6 ac:36 id:265 il:40 ic:0 fr:382
Apr  7 10:49:20 server1 kernel:   aa:14479 ac:7222 id:6365 il:862 ic:1049 fr:291
Apr  7 10:49:20 server1 kernel: 2*4kB 1*8kB 3*16kB 3*32kB 0*64kB 0*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 2*4096kB = 11168kB)
Apr  7 10:49:20 server1 kernel: 28*4kB 1*8kB 2*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1528kB)
Apr  7 10:49:20 server1 kernel: 37*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1164kB)
Apr  7 10:49:20 server1 kernel: Swap cache: add 1777, delete 1292, find 20425/20536, race 0+0
Apr  7 10:49:21 server1 kernel: 218570 pages of slabcache
Apr  7 10:49:21 server1 kernel: 196 pages of kernel stacks
Apr  7 10:49:21 server1 kernel: 0 lowmem pagetables, 404 highmem pagetables
Apr  7 10:49:21 server1 kernel: Free swap:       2038872kB
Apr  7 10:49:21 server1 kernel: 262138 pages of RAM
Apr  7 10:49:21 server1 kernel: 32762 pages of HIGHMEM
Apr  7 10:49:21 server1 kernel: 5780 reserved pages
Apr  7 10:49:22 server1 kernel: 13904 pages shared
Apr  7 10:49:22 server1 kernel: 485 pages swap cached
Apr  7 10:49:22 server1 kernel: Out of Memory: Killed process 5215 (xfs).
Apr  7 10:49:22 server1 kernel: Out of Memory: Killed process 5215 (xfs).
.........
............
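
The zone lines in the dumps above can also be checked mechanically. The following is only a sketch: the regular expression and the helper names (`parse_zones`, `below_min`) are made up for illustration, and the sample text is the three zone lines from the first dump. It reports which zones have fewer free pages than their `min` watermark.

```python
import re

# Zone lines copied from the first Mem-info dump (Apr 7 10:49:15).
LOG = """\
Apr  7 10:49:15 server1 kernel: Zone:DMA freepages:  2792 min:     0 low:     0 high:     0
Apr  7 10:49:15 server1 kernel: Zone:Normal freepages:   382 min:   766 low:  4031 high:  5791
Apr  7 10:49:15 server1 kernel: Zone:HighMem freepages:   287 min:   255 low:   510 high:   765
"""

# Hypothetical pattern written for this log layout only.
ZONE_RE = re.compile(
    r"Zone:(\w+)\s+freepages:\s*(\d+)\s+min:\s*(\d+)\s+low:\s*(\d+)\s+high:\s*(\d+)"
)

def parse_zones(text):
    """Return one dict per zone line found in the text."""
    zones = []
    for m in ZONE_RE.finditer(text):
        free, zmin, low, high = [int(g) for g in m.groups()[1:]]
        zones.append({"zone": m.group(1), "free": free,
                      "min": zmin, "low": low, "high": high})
    return zones

def below_min(zones):
    """Names of zones whose free page count is under the min watermark."""
    return [z["zone"] for z in zones if z["free"] < z["min"]]

zones = parse_zones(LOG)
for z in zones:
    print("%-8s free=%5d min=%5d low=%5d high=%5d"
          % (z["zone"], z["free"], z["min"], z["low"], z["high"]))
print("below min:", below_min(zones))
```

For the lines above only the Normal zone is under its minimum (382 free against a min of 766), while swap is almost entirely free, which suggests the shortage is in low memory rather than in total memory.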