System hangs using audit-0.9.9 (and few versions before)

Fri Jun 24 22:20:37 UTC 2005

On Fri, 2005-06-24 at 17:58 -0400, Amy Griffis wrote:
> I have a little more info on the long pathname hang that Loulwa found.
> I was able to reproduce it several times on a 2 CPU x86_64 box this
> afternoon.  I'm running with the 0.70 kernel and the 0.9.13 tools.

Thanks. The 'long pathname' part of it really ought to be a red herring
-- if you look at the strace you'll see that auditctl aborts without ever
trying to send the watch to the kernel. It makes the decision about
length on its own. Is the long pathname _really_ necessary to reproduce
the problem?
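
One way to check whether the length matters would be to run the same
command with a short and a long watch path and compare. The sketch below
is only an illustration: watch_path is a hypothetical helper (not part of
the audit tools), and the strace invocation in the comments is an
assumption about how one might confirm whether auditctl ever talks to the
kernel.

```shell
# Hypothetical helper: print /tmp/ followed by N 'x' characters, so the
# same test can be run with a short and a long watch path.
watch_path() {
    n="$1"
    # Pad an empty string to N spaces, then turn the spaces into 'x'.
    name=$(printf '%*s' "$n" '' | tr ' ' 'x')
    printf '/tmp/%s\n' "$name"
}

# Show the path lengths that would be tried. The values 8 and 300 are
# arbitrary choices for illustration.
for n in 8 300; do
    p=$(watch_path "$n")
    echo "watch path length: ${#p}"
done

# As root, compare what each case actually sends to the kernel, e.g.:
#   strace -f -e trace=network /sbin/auditctl -w "$(watch_path 8)" -k good-key
#   strace -f -e trace=network /sbin/auditctl -w "$(watch_path 300)" -k good-key
```

If the hang still shows up with the short path, the pathname length can
be dropped from the list of suspects.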

This looks like a spinlock deadlock between auditctl and the
audit_list_rules thread. But I really can't see anywhere that those two
would be contending for any locks. This isn't easy to debug without
knowing _where_ each CPU is. I'm concerned that SysRq-P isn't working --
does it work _before_ you get the system into this state?
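
A quick sanity check before triggering the hang might look like the
sketch below. sysrq_state is a hypothetical helper; the path argument
exists only so the check can be tried against a scratch file, and it
defaults to the real sysctl file. A nonzero value means at least some
SysRq functions are enabled; whether that covers SysRq-P depends on the
kernel, so treat the result as a hint rather than proof.

```shell
# Hypothetical helper: report whether magic SysRq appears to be enabled.
sysrq_state() {
    f="${1:-/proc/sys/kernel/sysrq}"
    if [ ! -r "$f" ]; then
        # No such file, or not readable: cannot tell.
        echo "unknown"
        return 0
    fi
    v=$(cat "$f")
    case "$v" in
        0) echo "disabled" ;;
        *) echo "enabled ($v)" ;;
    esac
}

sysrq_state

# As root, while the system is still healthy:
#   echo 1 > /proc/sys/kernel/sysrq    # enable SysRq
#   echo p > /proc/sysrq-trigger       # same request as SysRq-P
#   dmesg | tail                       # the register dump should appear here
```

If the register dump shows up in the kernel log on a healthy system but
not after the hang, that at least tells us SysRq itself is configured
correctly.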

Do you have other options, like a crash dump? Reproducing it on a PPC64
LPAR might let us poke at it more usefully from the hypervisor.

We could try just adding printks throughout the audit_list_rules()
function so that we can attempt to see how far it got, I suppose...

Can you confirm my understanding of the steps required to reproduce
this? Ought this to suffice?

while true ; do  \
  /sbin/auditctl -w /tmp/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx -k good-key  ; \
  /sbin/service auditd restart ; \
done

It's running on my dual i686 test box now...

-- 
dwmw2