[PATCH] audit: improve robustness of the audit queue handling

Wed Dec 15 18:25:19 UTC 2021

On Wed, Dec 15, 2021 at 12:53 PM Richard Guy Briggs <rgb at redhat.com> wrote:
> On 2021-12-13 13:31, Paul Moore wrote:
> > If the audit daemon were ever to get stuck in a stopped state the
> > kernel's kauditd_thread() could get blocked attempting to send audit
> > records to the userspace audit daemon.  With the kernel thread
> > blocked it is possible that the audit queue could grow unbounded as
> > certain audit record generating events must be exempt from the queue
> > limits or else the system could enter a deadlock state.
> >
> > This patch resolves this problem by lowering the kernel thread's
> > socket sending timeout from MAX_SCHEDULE_TIMEOUT to HZ/10 and tweaks
> > the kauditd_send_queue() function to better manage the various audit
> > queues when connection problems occur between the kernel and the
> > audit daemon.  With this patch, the backlog may temporarily grow
> > beyond the defined limits when the audit daemon is stopped and the
> > system is under heavy audit pressure, but kauditd_thread() will
> > continue to make progress and drain the queues as it would for other
> > connection problems.  For example, with the audit daemon put into a
> > stopped state and the system configured to audit every syscall it
> > was still possible to shut down the system without a kernel panic,
> > deadlock, etc.; granted, the system was slow to shut down but that is
> > to be expected given the extreme pressure of recording every syscall.
>
> I assume that in the configuration state of f=2, it would still panic if
> it was not able to deliver messages.

Yes, this patch doesn't really change any of the lost record behavior;
that is all preserved.  It basically just makes sure that
kauditd_thread() isn't blocked when the audit daemon isn't able to
receive audit records.  Further, short-lived audit daemon stoppages
shouldn't result in lost records either given a properly configured
system with a sufficient backlog as the retry mechanisms/queues are
still intact.  However, if you send a SIGSTOP to the audit daemon and
proceed to flood the audit subsystem with records, you're going to see
some lost records :)
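
To make the "keeps draining, but may lose records" behavior concrete,
here is a toy userspace model.  It is only a sketch and not the kernel
code: sim_queues, sim_send(), sim_drain(), sim_requeue(), and
BACKLOG_LIMIT are all made-up names, and the real queue handling in
kauditd_send_queue() is more involved.  The only point is that a failed
send never blocks the drain loop; each record is either sent, parked for
a retry, or counted as lost once the retry space is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap on parked records; stands in for a backlog limit. */
#define BACKLOG_LIMIT 4

/* Toy counters, not the kernel's data structures. */
struct sim_queues {
	size_t main_len;   /* records waiting to be sent */
	size_t retry_len;  /* records parked for a later resend */
	size_t sent;       /* records delivered to the daemon */
	size_t lost;       /* records dropped because retry space was full */
};

/*
 * Stand-in for a send with a short timeout: it reports failure right
 * away when the daemon is stopped instead of waiting forever.
 */
static bool sim_send(bool daemon_running)
{
	return daemon_running;
}

/* Drain the main queue without ever blocking on a stopped daemon. */
static void sim_drain(struct sim_queues *q, bool daemon_running)
{
	assert(q != NULL);

	while (q->main_len > 0) {
		q->main_len--;
		if (sim_send(daemon_running))
			q->sent++;
		else if (q->retry_len < BACKLOG_LIMIT)
			q->retry_len++;   /* keep it for a later attempt */
		else
			q->lost++;        /* no room left, record is lost */
	}
}

/* Once the daemon is back, move parked records onto the main queue. */
static void sim_requeue(struct sim_queues *q)
{
	assert(q != NULL);

	q->main_len += q->retry_len;
	q->retry_len = 0;
}
```

In this model, queueing six records while the daemon is stopped parks
four of them and loses two; once the daemon is running again,
sim_requeue() followed by sim_drain() delivers the four parked records.
A short stoppage with fewer than BACKLOG_LIMIT records queued loses
nothing, which is the "sufficient backlog" case mentioned above.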

-- 
paul moore
www.paul-moore.com