need debug suggestions on system freeze

LC Bruzenak lenny at magitekltd.com
Fri May 9 19:58:45 UTC 2008


I need some suggestions for debugging an issue I'm having.
I have a Dell Vostro laptop I've been using successfully for a while
(details below). It has some user apps running but doesn't seem
overburdened. I am running mls policy in permissive mode.

However, recently the following happens:

PART 1 (prelude relay disabled):
* audit is enabled, with 2 audisp plugins (prelude and af_unix).
* In audispd.conf, q_depth = 128

* I go to our project source directory and start an "svn up"

* In another window, as root, I ran "tail -f /var/log/messages":
...
May  9 13:51:40 comms audispd: queue is full - dropping event
May  9 13:51:40 comms audispd: queue is full - dropping event
May  9 13:51:40 comms audispd: queue is full - dropping event
May  9 13:51:40 comms audispd: queue is full - dropping event
May  9 13:51:40 comms audispd: queue is full - dropping event
May  9 13:51:40 comms audispd: queue is full - dropping event
May  9 13:51:40 comms audispd: queue is full - dropping event
May  9 13:51:40 comms audispd: queue is full - dropping event
May  9 13:51:40 comms audispd: queue is full - dropping event
May  9 13:51:40 comms audispd: queue is full - dropping event
May  9 13:51:40 comms audispd: queue is full - dropping event
May  9 13:51:40 comms audispd: queue is full - dropping event
May  9 13:51:40 comms audispd: queue is full - dropping event
May  9 13:51:40 comms audispd: queue is full - dropping event
May  9 13:51:40 comms audispd: queue is full - dropping event
May  9 13:51:42 comms auditd[3629]: Audit daemon rotating log files with keep option
May  9 13:52:10 comms prelude-manager: WARNING: Failover enabled: connection error with 192.168.31.120:4690: Connection timed out
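
To put a number on how fast events are being dropped, the "queue is
full" lines can be tallied per second. Below is a minimal Python sketch
of that idea. The name count_drops and the regular expression are
illustrative assumptions, not part of the audit tooling, and the sample
lines are copied from the excerpt above.

```python
import re
from collections import Counter

# Matches syslog lines such as:
#   May  9 13:51:40 comms audispd: queue is full - dropping event
DROP_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+audispd: "
    r"queue is full - dropping event"
)


def count_drops(lines):
    """Return a Counter mapping syslog timestamp -> dropped-event count."""
    drops = Counter()
    for line in lines:
        match = DROP_RE.match(line)
        if match:
            drops[match.group("stamp")] += 1
    return drops


# Sample input taken from the log excerpt; in practice, pass an open file.
sample = [
    "May  9 13:51:40 comms audispd: queue is full - dropping event",
    "May  9 13:51:40 comms audispd: queue is full - dropping event",
    "May  9 13:51:42 comms auditd[3629]: Audit daemon rotating log files with keep option",
]

for stamp, n in sorted(count_drops(sample).items()):
    print(stamp, n)
```

Feeding it the whole messages file would show whether the drop rate
climbs steadily or spikes right before the freeze.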

* Very soon after this the machine locks up. The above is the last entry
in the messages log. The "caps lock" indicator and one other "lock"
indicator on the keyboard (not scroll lock) flash, the screen is blank,
and the machine does not respond to inbound network connections. I
cannot get to a terminal with <ALT><F4>. The only option is to power
cycle.

* After reboot, if I run "service auditd stop" and then repeat the svn
update, there is no freeze and there are no messages. I suspect it has
something to do with file traversals and the audit dispatcher/prelude.
It also happened once when doing an "rm -rf" on a directory with many
files under my home directory.

* I purposely have a lot of audit logs left in the directory:
[root@hugo ~]# ls -1 /var/log/audit | wc -l
90
* I purposely have the prelude parent manager (relay-to machine)
disabled.
* The machine was not exceptionally busy in userland according to the
"top" I had running in another window. Here is the header from that (the
"top" process was running, all others sleeping):
top - 13:52:40 up 19 min,  3 users,  load average: 0.14, 0.16, 0.12
Tasks: 156 total,   1 running, 155 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2%us,  0.3%sy,  0.0%ni, 99.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2060944k total,   912324k used,  1148620k free,   210276k buffers
Swap:  6835000k total,        0k used,  6835000k free,   240760k cached

* The freeze-up happens faster (I believe) if I leave q_depth in
audispd.conf at 80 (the default).
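
When retesting with different queue sizes, it helps to record which
q_depth was in effect for each run. The Python sketch below parses
"key = value" lines of the kind audispd.conf uses. read_settings is a
hypothetical helper, and the sample text is an illustrative fragment
standing in for the real file, not a complete configuration.

```python
def read_settings(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' at all
            settings[key.strip()] = value.strip()
    return settings


# Illustrative fragment only; the value matches the PART 1 test run.
sample_conf = """
# illustrative fragment, not a complete audispd.conf
q_depth = 128
"""

settings = read_settings(sample_conf)
print("q_depth =", int(settings["q_depth"]))
```

Logging this value next to the time-to-freeze for each run would turn
the "happens faster (I believe)" impression into something measurable.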

Details:

[root@hugo ~]# uname -a
Linux hugo 2.6.25-14.fc9.x86_64 #1 SMP Thu May 1 06:06:21 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

[root@hugo ~]# rpm -qa | grep audit-
audit-libs-1.7.2-6.fc9.i386
audit-1.7.2-6.fc9.x86_64
audit-libs-1.7.2-6.fc9.x86_64
...

* I have a lot of audit rules and plan to add more:
[root@hugo ~]# auditctl -l | wc -l
84

* Disk is not full:
[root@hugo ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      108G  7.2G  101G   7% /
/dev/sda1             190M   20M  161M  11% /boot
tmpfs                1007M   48K 1007M   1% /dev/shm


PART 2:
Next I enabled the relaying prelude-manager. The svn update got
farther, so I thought the disabled relay might have been the cause of
the original problem. However, I first saw this in the messages log:
...
May  9 15:31:36 comms audispd: queue is full - dropping event
May  9 15:31:36 comms audispd: queue is full - dropping event
May  9 15:31:36 comms audispd: queue is full - dropping event
May  9 15:31:36 comms audispd: queue is full - dropping event
May  9 15:31:38 comms audispd: queue is full - dropping event
May  9 15:31:38 comms audispd: queue is full - dropping event
May  9 15:31:38 comms audispd: queue is full - dropping event
May  9 15:31:38 comms audispd: queue is full - dropping event
May  9 15:31:38 comms auditd[3682]: Audit daemon rotating log files with keep option
May  9 15:31:43 comms auditd[3682]: Audit daemon rotating log files with keep option
May  9 15:31:48 comms auditd[3682]: Audit daemon rotating log files with keep option
May  9 15:31:53 comms auditd[3682]: Audit daemon rotating log files with keep option

Then the same freeze-up happens as described above.
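
The rotation messages in this excerpt arrive at short, regular
intervals, which may itself be a useful data point. As a rough Python
sketch, the gaps between consecutive rotations can be computed from the
timestamps. rotation_gaps is a hypothetical helper; it assumes each log
entry is on a single line and that all entries fall on the same day.

```python
import re
from datetime import datetime

# Matches lines such as:
#   May  9 15:31:38 comms auditd[3682]: Audit daemon rotating log files ...
ROTATE_RE = re.compile(
    r"^\w{3}\s+\d+\s+(?P<time>\d\d:\d\d:\d\d)\s+\S+\s+auditd\[\d+\]: "
    r"Audit daemon rotating log files"
)


def rotation_gaps(lines):
    """Return seconds between consecutive rotation messages (same day)."""
    times = [
        datetime.strptime(m.group("time"), "%H:%M:%S")
        for m in map(ROTATE_RE.match, lines)
        if m
    ]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# The four rotation entries from the PART 2 excerpt, one per line.
sample = [
    "May  9 15:31:38 comms auditd[3682]: Audit daemon rotating log files with keep option",
    "May  9 15:31:43 comms auditd[3682]: Audit daemon rotating log files with keep option",
    "May  9 15:31:48 comms auditd[3682]: Audit daemon rotating log files with keep option",
    "May  9 15:31:53 comms auditd[3682]: Audit daemon rotating log files with keep option",
]

print(rotation_gaps(sample))  # [5, 5, 5]
```

For the excerpt above this gives a rotation every 5 seconds.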

Any suggestions, or is there other data I can provide to help debug?

In the meantime I will increase the audispd.conf q_depth and retest.

Thx,
LCB.

-- 
LC (Lenny) Bruzenak
lenny at magitekltd.com



