[rhelv6-list] Kernel memory leak?

Tue Aug 30 20:13:54 UTC 2011

On 08/30/2011 05:08 PM, robinprice at gmail.com wrote:
> I did some more searching this morning, as I mentioned last night
> I would.  I have not found anything specific to your situation.
> The only suggestions I have would be:
>
> 1) Try getting a core during the memory consumption as I mentioned and
> do a RCA on the vmcore.
> 2) Write a stap script to trace d_alloc in the kernel (or one of the
> d_cache functions) to see who is allocating dentries, and correlate
> that to a process (a perf script would help you do that pretty
> easily).
> 3) Use lsof to see who has tons of open files.  Presumably if you're
> swapping with 100% of ram holding dentries, someone is using those
> dentries which means lots of open files.
>
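On suggestion 3, here is a rough Python sketch of the same idea without lsof: count the entries under /proc/<pid>/fd for every process and list the biggest holders. The function names are hypothetical, and it assumes you run it as root (otherwise other users' fd directories are unreadable and those processes are silently skipped).

```python
import os


def open_fd_counts(proc_root="/proc"):
    """Return {pid: number of open file descriptors} for readable processes."""
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            counts[int(entry)] = len(os.listdir(os.path.join(proc_root, entry, "fd")))
        except OSError:
            continue  # process exited, or we lack permission to look
    return counts


def top_fd_holders(counts, n=10):
    """Return the n (pid, fd_count) pairs with the most open descriptors."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Calling top_fd_holders(open_fd_counts()) on the affected box should show quickly whether any single process stands out. If nothing does, that would point away from the "lots of open files" theory.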
> Good luck.  Sorry I couldn't find anything.
>
> If anyone has a valid RHEL subscription, I would encourage you to try
> with the latest RHEL6 kernel to see if the leak is still there, and if
> it is, allow GSS to help you find root cause.
>
Or at least try the newer kernel from RHEL 6.1 as available in
Scientific Linux: 
http://ftp.scientificlinux.org/linux/scientific/6.1/x86_64/updates/security/kernel-2.6.32-131.12.1.el6.x86_64.rpm

> ~rp
>
>
> On Mon, Aug 29, 2011 at 7:52 PM, Abdussamad Abdurrazzaq
> <abdussamad at abdussamad.com>  wrote:
>> On 08/30/2011 04:39 AM, robinprice at gmail.com wrote:
>>> On Mon, Aug 29, 2011 at 6:18 PM, Abdussamad Abdurrazzaq
>>> <abdussamad at abdussamad.com>    wrote:
>>>> Hello
>>>>
>>>> Ok please ignore my previous email (if you've seen it). It's quite
>>>> confused because I posted using gmane.org.
>>>>
>>>> I know about how Linux reports memory usage. My problem is very much
>>>> real. Memory usage keeps increasing because of a memory leak in the
>>>> kernel dentry cache. This is the same problem as outlined by others here:
>>>>
>>>> https://www.redhat.com/archives/rhelv6-list/2011-February/msg00001.html
>>>>
>>>> So I was wondering whether this problem was fixed? I am using CentOS 6
>>>> with the following kernel:
>>>>
>>>> Linux serve3.websitetheme.com. 2.6.32-71.29.1.el6.x86_64 #1 SMP Mon Jun 27 19:49:27 BST 2011 x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>> At one point dentry was using 3GB plus on my 8GB system!
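To put a number on the dentry usage over time, something like the following Python sketch could be logged from cron. It estimates a slab cache's size from the text of /proc/slabinfo as num_objs * objsize. The function name is hypothetical, the column layout assumed is the "slabinfo - version: 2.1" format, and the result is only an approximation because per-slab overhead is ignored.

```python
def slab_bytes(slabinfo_text, cache="dentry"):
    """Approximate the memory used by one slab cache, in bytes.

    Expects the contents of /proc/slabinfo (version 2.1 layout), where each
    data line starts with: name active_objs num_objs objsize ...
    Returns None if the cache is not listed.
    """
    for line in slabinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == cache:
            num_objs = int(fields[2])
            objsize = int(fields[3])
            return num_objs * objsize
    return None
```

Reading /proc/slabinfo normally requires root. Feeding the file's contents to slab_bytes() every few minutes, with a timestamp, would show whether the cache grows without bound or levels off.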
>>>>
>>>> I am currently using a cron job to clear the cache every so often:
>>>>
>>>> sync && echo 2 > /proc/sys/vm/drop_caches
>>>>
>>>> The above works but I am looking for a more permanent solution. To that
>>>> end I tried increasing vfs_cache_pressure:
>>>>
>>>> echo 10000 > /proc/sys/vm/vfs_cache_pressure
>>>>
>>>> and setting it in /etc/sysctl.conf, but to no effect.
>>>>
>>>> So any idea how to fix this?
>>>>
>>>> Regards,
>>>> Abdussamad
>>>>
>>>> _______________________________________________
>>>> rhelv6-list mailing list
>>>> rhelv6-list at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/rhelv6-list
>>>>
>>> There are some things you can try. You can collect a vmcore from
>>> a period during which the system has exhausted nearly all of its
>>> memory due to this leak.  Then you could try to analyze kmem to
>>> determine whether there is a problem with the kernel.
>>>
>>> You could even go as simple as just looking at top, or the contents of
>>> /proc/<pid>/status periodically for the set of apps you suspect.  If
>>> you do have a single app that is leaking memory, you should be able to
>>> record and graph a consistent increasing trend in the amount of memory
>>> the faulty app is leaking.  That would at least give you a starting
>>> point for where to use an app like valgrind.
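For the /proc/<pid>/status approach, the field of interest is VmRSS. A small Python sketch (the function name is hypothetical) that pulls it out of the file's text:

```python
def vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text.

    Returns None when there is no VmRSS line, which is the case for
    kernel threads.
    """
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:      1904 kB"
            return int(line.split()[1])
    return None
```

Sampling this for each suspect PID at a fixed interval gives the series you would graph to look for a steady upward trend.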
>>>
>>> Also, if you don't know which proc is at fault, I'd start with
>>> /etc/crontab:
>>>    */5 * * * * root ps axo comm,vsize,rss | tail -n +2 >> /tmp/rawdata
>>>
>>> This would collect per-process memory data (as in /proc/<pid>/statm) for a while.
>>>
>>> This should work on an selinux-enforcing machine based on the output of:
>>>   # sesearch -As crond_t | grep tmp
>>>
>>> Then use awk to find the low- and high-water marks for each comm.
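The low/high-water step could equally be done in Python instead of awk. A minimal sketch, assuming /tmp/rawdata holds lines of "comm vsize rss" as produced by the cron entry above; the function name is hypothetical. Note that it groups by command name, so several processes sharing a name are folded into one entry.

```python
def rss_water_marks(lines):
    """Return {comm: (lowest_rss_kb, highest_rss_kb)} from 'comm vsize rss' lines."""
    marks = {}
    for line in lines:
        # Split from the right so a command name containing spaces stays intact.
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            continue  # blank or malformed line
        comm = parts[0].strip()
        try:
            rss_kb = int(parts[2])
        except ValueError:
            continue  # e.g. a stray header line
        low, high = marks.get(comm, (rss_kb, rss_kb))
        marks[comm] = (min(low, rss_kb), max(high, rss_kb))
    return marks
```

Passing it an open file, for example rss_water_marks(open("/tmp/rawdata")), and sorting the result by high minus low would put the fastest-growing commands at the top.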
>>>
>>> You could get fancy and add timestamps to the records; maybe track
>>> current, low-water, and high-water marks, then gnuplot with error
>>> bars.
>>>
>>> As for your drop_caches workaround, be aware that drop_caches may
>>> degrade performance, because cached data is discarded and the
>>> system has to reload it from disk if it is needed again.
>>>
>>> Use the output of the "ps" cron job above to locate which
>>> program has a growing RSS.
>>> sysstat (/var/log/sa/sar*) may also provide some historical memory
>>> information that you may be interested in.
>>>
>>> Hope this gives you somewhere to look.
>>>
>>> I will follow-up on the thread you mentioned.
>>>
>>> ~rp
>>>
>> I don't understand. Isn't the dentry cache managed by the kernel? So why would I
>> look at applications for possible leaks when it's obviously the kernel that's
>> at fault here? Please read my post again, including the thread I linked to.
>> It seems to me you've misunderstood my problem.
>>