[rhelv6-list] Kernel memory leak?

Mon Aug 29 23:39:27 UTC 2011

On Mon, Aug 29, 2011 at 6:18 PM, Abdussamad Abdurrazzaq
<abdussamad at abdussamad.com> wrote:
> Hello
>
> Ok please ignore my previous email (if you've seen it). It's quite confused
> because I posted using gmane.org.
>
> I know about how Linux reports memory usage. My problem is very much real.
> Memory usage keeps increasing because of a memory leak in the kernel dentry
> cache. This is the same problem as outlined by others here:
>
> https://www.redhat.com/archives/rhelv6-list/2011-February/msg00001.html
>
> So I was wondering whether this problem was fixed? I am using centos 6 with
> the following kernel:
>
> Linux serve3.websitetheme.com. 2.6.32-71.29.1.el6.x86_64 #1 SMP Mon Jun 27
> 19:49:27 BST 2011 x86_64 x86_64 x86_64 GNU/Linux
>
> At one point dentry was using 3GB plus on  my 8GB system!
>
> I am currently using a cron job to clear the cache every so often:
>
> sync && echo 2 >/proc/sys/vm/drop_caches
>
> The above works but I am looking for a more permanent solution. To that end
> I tried increasing:
>
> echo 10000 > /proc/sys/vm/vfs_cache_pressure
>
> and in /etc/sysctl.conf, but to no effect.
>
> So any idea how to fix this?
>
> Regards,
> Abdussamad

There are some things you can try. You can collect a vmcore from a
period during which the system has exhausted nearly all of its memory
due to this leak. Then you could analyze it with kmem (in the crash
utility) to see whether there is a problem in the kernel.

You could even go as simple as just looking at top, or at the contents
of /proc/<pid>/status periodically for the set of apps you suspect. If
you do have a single app that is leaking memory, you should be able to
record and graph a consistently increasing trend in the amount of
memory that app is using. That would at least give you a starting
point for where to use a tool like valgrind.
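
As a rough sketch of the periodic /proc/<pid>/status check, something
like the following could work. The function names vmrss_kb and
watch_rss are made up for illustration, and the 60-second default
interval is an arbitrary choice:

```shell
# Print the VmRSS value (in kB) from a /proc/<pid>/status style file.
# Kernel threads have no VmRSS line, so nothing is printed for them.
vmrss_kb() {
    awk '$1 == "VmRSS:" { print $2 }' "$1"
}

# Hypothetical polling loop: print "epoch-seconds rss-kB" for one PID
# every N seconds (default 60) until the process goes away.
watch_rss() {
    pid=$1
    while [ -r "/proc/$pid/status" ]; do
        printf '%s %s\n' "$(date +%s)" "$(vmrss_kb "/proc/$pid/status")"
        sleep "${2:-60}"
    done
}
```

Running "watch_rss <pid> >> somefile" for a few hours should give you
two columns that are easy to graph.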

Also, if you don't know which process is at fault, I'd start with an
entry like this in /etc/crontab:
  */5 * * * * root ps axo comm,vsize,rss | tail -n +2 >> /tmp/rawdata

Left running for a while, this collects the same per-process sizes
that /proc/<pid>/statm reports (ps prints them in KiB rather than in
pages).

This should work on an SELinux-enforcing machine, based on the output of:
 # sesearch -As crond_t | grep tmp

Then use awk to find the low- and high-water marks for each comm.
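
A minimal sketch of that awk step, assuming the "comm vsize rss"
record layout written by the cron entry above. The function name
rss_watermarks is hypothetical. Note that it lumps together all
processes sharing a comm, so a large spread for something like httpd
may just reflect different workers rather than growth:

```shell
# Summarize "comm vsize rss" records from the named files (or stdin).
# Prints tab-separated: comm, lowest RSS, highest RSS, difference (KiB),
# sorted with the largest difference first.
rss_watermarks() {
    awk '
    NF >= 3 {
        rss = $NF + 0
        # comm may contain spaces; rebuild it from all but the last two fields
        name = $1
        for (i = 2; i <= NF - 2; i++) name = name " " $i
        if (!(name in lo) || rss < lo[name]) lo[name] = rss
        if (!(name in hi) || rss > hi[name]) hi[name] = rss
    }
    END {
        for (n in lo) printf "%s\t%d\t%d\t%d\n", n, lo[n], hi[n], hi[n] - lo[n]
    }' "$@" | sort -t "$(printf '\t')" -k4,4nr
}
```

Usage would be something like "rss_watermarks /tmp/rawdata | head".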

You could get fancy and add timestamps to the records; maybe track
current, low-water, and high-water marks, then gnuplot with error
bars.
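
For the timestamps, one possible approach is to prefix every record
with the epoch time before appending it. The helper name stamp_records
is invented here. If you put a date format directly into a crontab
line, remember that cron treats "%" specially, so it has to be written
as "\%" there; wrapping the collection in a small script avoids that:

```shell
# Prefix each record read on stdin with a timestamp.
# The timestamp defaults to the current epoch time; passing one
# explicitly as $1 is mainly useful for testing.
stamp_records() {
    ts=${1:-$(date +%s)}
    awk -v ts="$ts" '{ print ts, $0 }'
}

# Possible use from a collector script called by cron:
#   ps axo comm,vsize,rss | tail -n +2 | stamp_records >> /tmp/rawdata
```

Any awk summary that expects comm in the first field will need to skip
the new leading column.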

As for your drop_caches workaround, be aware that drop_caches may
degrade performance, because cached data is discarded and the system
has to load it from disk again if it is needed.

Use the ps output collected by the cron job above to locate which
program has a growing RSS.
sysstat (/var/log/sa/sar*) may also provide some historical memory
information that you may be interested in.

Hope this gives you somewhere to look.

I will follow up on the thread you mentioned.

~rp