Problem with auditd/SnareLinux on RHEL 5.3 - auditd glomming memory

Smith, Gary R gary.smith at pnl.gov
Wed Feb 4 22:15:55 UTC 2009


Hello All,

I have some systems that have just been updated from RHEL 5.2 to RHEL
5.3. The version of auditd is 1.7.7 and SnareLinux is 1.5.0.

Some time after the update ran, I noticed that the amount of free memory
on the systems had gone down dramatically. Running top, I saw that
auditd had sucked up lots of memory:

top - 09:25:59 up 2 days, 2 min,  1 user,  load average: 0.00, 0.03, 0.00
Tasks: 111 total,   1 running, 110 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us,  0.1%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   4043368k total,  3994488k used,    48880k free,    64656k buffers
Swap:  8385920k total,      352k used,  8385568k free,   449884k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
31136 root      12  -3 6568m 3.1g  568 S  0.0 81.0   0:08.84 auditd
 3861 root      15   0 26152  16m  364 S  0.0  0.4   0:00.01 restorecond
29474 root      16   0 97.9m  14m 2852 S  0.0  0.4   0:00.16 sshd
29382 root      17   0  124m  13m 1464 S  0.0  0.3   0:00.09 crond
29478 gsmith    15   0 97.9m  12m  976 S  0.0  0.3   0:00.01 sshd
 5611 root      15   0 58912 9048 4208 S  0.0  0.2   3:50.38 dsmc
 3937 root      15   0 96556 7588  916 S  0.0  0.2   3:54.00 sec.pl

When I first started looking at the problem, I figured the easiest fix
was a reboot, since a new version of auditd had been installed along
with a new kernel and there was a mismatch between the auditd daemon and
the running kernel. After rebooting, I watched a top display and did not
see the auditd daemon percolate up. It turns out that it does percolate
up to the top of the memory usage; I just didn't watch it long enough.
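
In hindsight, the growth is easy to catch without babysitting top; a
trivial loop along these lines logs auditd's resident set size once a
minute (the log file name is arbitrary):

while true; do
    date >> /tmp/auditd-rss.log
    ps -o rss= -p $(pidof auditd) >> /tmp/auditd-rss.log
    sleep 60
done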

I wondered if there was a difference between 32-bit and 64-bit systems.
As it turns out, there isn't one; the leak just seems to grow more
slowly on 32-bit systems, probably due to slower CPUs.

Next, I wondered if this is due to the version of Snare running on the
systems. Just to be sure, I rebuilt Snare for the 64-bit system,
uninstalled the old Snare, installed the newly rebuilt one, and started
auditd. Again, the auditd daemon started to percolate to the top of the
memory usage.

So, does this have anything to do with the number of rules that auditd
has to deal with? I started trimming back the rules one at a time,
restarting auditd and watching a top display. The number of rules
doesn't have any effect until you get down to zero rules; with no rules
loaded, auditd doesn't move up at all.
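
Each iteration was essentially the following (auditctl -l just confirms
which rules the kernel actually has loaded):

# comment out one rule in the Snare configuration, then:
service auditd restart
auditctl -l
ps -o vsz=,rss= -p $(pidof auditd)   # check again after a while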

Next, I ran auditd under valgrind on a RHEL 5.3 system twice: once with
no rules and once with the standard set of rules we put on our systems:

criticality=0   event=execve    uid=*,root
criticality=1   event=(mount,umount,umount2,settimeofday,swapon,swapoff,reboot,setdomainname,create_module,delete_module,quotactl)   uid=*,root
criticality=2   event=execve    exe=*passwd*
criticality=3   event=(login_auth,login_start,logout)
criticality=4   event=acct_mgmt
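
For anyone who wants to reproduce this: auditd's -f flag keeps it in the
foreground, so something along these lines should work (the log path is
arbitrary):

service auditd stop
valgrind --leak-check=full --log-file=/tmp/auditd-vg /sbin/auditd -f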

After two hours of running, the former showed no malloc/free problems in
the leak summary. The latter, on the other hand:
==30021==
==30021== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 1)
==30021== malloc/free: in use at exit: 147,024,501 bytes in 16,262 blocks.
==30021== malloc/free: 27,612 allocs, 11,350 frees, 249,669,818 bytes allocated.
==30021== For counts of detected errors, rerun with: -v
==30021== searching for pointers to 16,262 not-freed blocks.
==30021== checked 10,651,128 bytes.
==30021==
==30021== LEAK SUMMARY:
==30021==    definitely lost: 146,976,704 bytes in 16,246 blocks.
==30021==      possibly lost: 36,464 bytes in 5 blocks.
==30021==    still reachable: 11,333 bytes in 11 blocks.
==30021==         suppressed: 0 bytes in 0 blocks.
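
A bit of back-of-the-envelope arithmetic on the definitely-lost figure:

$ echo $((146976704 / 16246))
9046

That's roughly 9 KB per lost block, which is in the same ballpark as the
8,988-byte receive buffer that shows up in the strace below.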

Next, I ran strace on auditd to see if I could spot where it was doing
all the mallocs. Starting strace on a fresh instance of auditd,
eventually auditd settled into this loop:

clock_gettime(CLOCK_MONOTONIC, {432334, 891462998}) = 0
epoll_wait(0, {{EPOLLIN, {u32=3, u64=3}}}, 64, 59743) = 1
clock_gettime(CLOCK_MONOTONIC, {432334, 891681998}) = 0
recvfrom(3, "&\0\0\0\33\5\0\0\0\0\0\0\0\0\0\0audit(1233595754"..., 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 54
writev(5, [{"\0\0\0\0\20\0\0\0\33\5\0\0&\0\0\0", 16}, {"audit(1233595754.282:836040):  c"..., 38}], 2) = 54

And so it goes. Nothing in this tight syscall sequence leaps out as the
place where it's glomming memory.
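
That's expected, though: strace only sees system calls, and glibc
satisfies most small allocations from the existing heap, so a steady
leak would only show up as an occasional brk() or mmap() as the heap
grows. Watching the library calls themselves needs something like ltrace
instead:

ltrace -c -p $(pidof auditd)   # per-call summary, including malloc/free counts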

I wondered if this odd behavior was exhibited back in RHEL 5.2 and we
just never noticed. So, I installed RHEL 5.2 in a virtual machine,
installed all the patches, and installed Snare. After several hours of
going at it, auditd has not moved up the chart at all. 

So, it appears to be an auditd/Snare interaction on RHEL 5.3. I have a
RHEL 5.3 system on which I'm experimenting with auditd, rulesets, and
prelude (no Snare). That ruleset is much larger by comparison (> 200
rules), and everybody is well-behaved.


Any thoughts on what may be happening?

Best Regards,

Gary Smith
