Occasional delayed output of events

Mon Jan 4 07:55:25 UTC 2021

On Sun, 2021-01-03 at 10:41 -0500, Steve Grubb wrote:
> On Friday, January 1, 2021 4:22:33 PM EST Burn Alting wrote:
> > Sometimes, events recorded in /var/log/audit/audit.log appear some seconds
> > past co-located events which results in auparse:au_check_events() marking
> > these events complete before they are. An example of this can be seen
> > below with the offending event id 44609.
> > 
> > This has been plaguing me for a year or two and this morning was the first
> > time I still had access to the raw audit.log files (I monitor a lot of
> > event types and the log files roll over fairly quickly).
> > The example below is from a fully patched Centos 7 but I have also seen
> > this on a patched Fedora 32.
> > 
> > Has this been seen before? Do we need to re-evaluate how auparse
> > 'completes' an event (ie 2 seconds is too quick).
> 
> I have never seen this. But on the way to disk, auditd only does light 
> processing of the event.  If the format is enriched, it looks things up on a 
> record by record basis. It does not collect events until they are complete - 
> it dumps it to disk as soon as it can tack on the extra information.
> 
> So, the question would be, does this delay happen on the way to disk? Or is 
> this an artifact of post processing the logs with an auparse based utility? 
> Can this be observed repeatedly on the same raw logs? If so, then maybe 
> auparse does have some issue. But if this is a post processing issue, then 
> the wall clock doesn't matter because this event should have collected up 
> together.
> 
> I'd say this merits some investigation.

OK. I think this needs to be addressed on two fronts. There may be more.
A. Within post-processing, a 2 second timeout is not sufficient. I would suggest we modify auparse.c:au_check_events() to
  i) perform the event type checks first, then
  ii) increase the timeout from 2 seconds to a larger value based on empirical tests.
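
To make the ordering in A. concrete, here is a minimal sketch of the idea. It is not the existing au_check_events() code: the struct, the function name and the timeout parameter are all hypothetical, and only the two record type values are taken from linux/audit.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Record type values as defined in linux/audit.h. */
#define SKETCH_AUDIT_EOE       1320
#define SKETCH_AUDIT_PROCTITLE 1327

/* Hypothetical, simplified view of a partially collected event. */
struct sketch_event {
    time_t sec;            /* seconds part of the event's timestamp */
    int    last_rec_type;  /* type of the most recently added record */
};

/*
 * Step i): the cheap record type check runs first, so events that end
 * with EOE or PROCTITLE are released without waiting for the timeout.
 * Step ii): only events without such a terminator fall back to the age
 * check, where timeout_sec is the value to be tuned by empirical tests.
 */
bool sketch_event_is_complete(const struct sketch_event *ev,
                              time_t newest_sec, time_t timeout_sec)
{
    assert(ev != NULL);

    if (ev->last_rec_type == SKETCH_AUDIT_EOE ||
        ev->last_rec_type == SKETCH_AUDIT_PROCTITLE)
        return true;

    return newest_sec - ev->sec > timeout_sec;
}
```

Here newest_sec is assumed to be the timestamp of the latest record read from the log. With this ordering a larger timeout only holds back events that have no terminating record.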

B. I will build a temporary auditd daemon to perform some empirical testing to see how long events can reside within the daemon. I may need some advice on this.
I assume that the code that sets the timestamp is in src/auditd.c:send_audit_event(). If so, I will see if I can put instrumentation debug code in to monitor an event's
'time in daemon' up to this point. I will then report on this.
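
The kind of debug code I have in mind for B. would look roughly like the sketch below. None of these names exist in auditd; where the two marker calls would actually go in the daemon is exactly what I need to work out, so treat the placement as an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical per-event debug state; not an auditd structure. */
struct sketch_latency {
    struct timespec received;  /* when the daemon first saw the event */
    double max_ms;             /* worst 'time in daemon' observed so far */
};

/* Elapsed time between two monotonic clock readings, in milliseconds. */
double sketch_elapsed_ms(const struct timespec *start,
                         const struct timespec *end)
{
    assert(start != NULL && end != NULL);
    return (end->tv_sec - start->tv_sec) * 1000.0 +
           (end->tv_nsec - start->tv_nsec) / 1e6;
}

/* Assumed to be called where the daemon receives the event. */
void sketch_mark_received(struct sketch_latency *l)
{
    clock_gettime(CLOCK_MONOTONIC, &l->received);
}

/* Assumed to be called just before the event is written to disk.
 * Returns this event's latency and updates the running maximum. */
double sketch_mark_written(struct sketch_latency *l)
{
    struct timespec now;
    double ms;

    clock_gettime(CLOCK_MONOTONIC, &now);
    ms = sketch_elapsed_ms(&l->received, &now);
    if (ms > l->max_ms)
        l->max_ms = ms;
    return ms;
}
```

CLOCK_MONOTONIC is used rather than wall clock time so that clock adjustments during a test run do not distort the measurements.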

I believe that, given AUDIT_PROCTITLE and AUDIT_EOE are fairly widespread, reordering the tests as in A. will not be a big issue (time cost wise). It also means that if we
overcompensate on the timeout, the additional memory cost this would cause in auparse is mitigated.

With respect to the 'There may be more' fronts: are there other points in the 'audit ecosystem' that make use of the '2 second timeout'?

If the above makes sense, I will start work on this over the coming weekend.

Regards
Burn
