Auditd errors on busy hosts when rolling over log files

Tue Nov 5 11:07:08 UTC 2013

On Mon, 2013-11-04 at 08:24 -0500, Steve Grubb wrote:

Thanks Steve.

I did a little experimentation today.

On a system that generates around 7500 audit events every five minutes I
changed, without success, the following:

In auditd.conf
- changed num_logs from 9 to 5, although I didn't expect this to help:
I move the rolled-over (audit.log.?) files out as part of my
processing, so there shouldn't be much file-rename overhead
- changed priority_boost from 4 to 8

In audit.rules
- changed backlog from 32K to 64K to 96K to 128K
- changed rules to reduce the recorded events from 7500 to 500-600 per
five-minute interval
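
For reference, the last values tried above would look roughly like this
in the two files. The file paths are the usual defaults and are an
assumption here; 128K is written as 131072, by analogy with 32K being
32768.

```
# /etc/audit/auditd.conf (assumed default path)
num_logs = 5
priority_boost = 8

# /etc/audit/audit.rules (assumed default path)
-b 131072
```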

This particular system is running audit-1.8.2-el5 but I see a similar
problem on a RHEL 6.4 box which I believe is running audit-2.2-2.el6.

I did note that if I executed the sync(1) command before signaling
auditd to roll over (i.e. before running /bin/kill -s USR1 pid), the
error SOMETIMES did not appear.
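
A minimal sketch of that sync-then-signal sequence, wrapped in a
function so it can be reused. The pid file location is an assumption
and the function name is made up for illustration; as noted above, the
sync only sometimes avoided the error.

```shell
#!/bin/sh
# Sketch only: flush dirty pages, then ask auditd to roll its log over.
# PIDFILE is an assumed location; adjust it for the system at hand.
PIDFILE="${PIDFILE:-/var/run/auditd.pid}"

rotate_audit_log() {
    pid=$(cat "$PIDFILE") || return 1   # no pid file: nothing to signal
    [ -n "$pid" ] || return 1           # empty pid file
    sync                                # write out pending disk I/O first
    kill -s USR1 "$pid"                 # USR1 requests the rollover
}
```

Calling rotate_audit_log as root performs one rollover; a non-zero
return means the pid could not be read or the signal could not be sent.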

So I am a little bit lost.

I believe that the actual effect is just
- the cost of two additional lines in /var/log/messages
- the loss of a few logs

My actual process is to
a. roll over the log file
b. run an ausearch --interpret like command
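
Step b, together with the moving-out of rolled-over files mentioned
earlier, could be sketched as below. LOGDIR, SPOOL and the function
name are hypothetical and not taken from my actual scripts; plain
ausearch stands in for the ausearch-like command.

```shell
#!/bin/sh
# Sketch only: move the just-rotated log aside, then interpret it.
# LOGDIR and SPOOL are assumed locations, chosen for illustration.
LOGDIR="${LOGDIR:-/var/log/audit}"
SPOOL="${SPOOL:-/var/spool/audit-archive}"

process_rotated_log() {
    rotated="$LOGDIR/audit.log.1"
    [ -f "$rotated" ] || return 1       # nothing has been rolled over
    dest="$SPOOL/audit.log.$(date +%Y%m%d%H%M%S)"
    mv "$rotated" "$dest" || return 1   # keeps the rename chain short
    ausearch --interpret --input "$dest"
}
```

The function is meant to run after the rollover has completed; it does
not itself wait for auditd to finish rotating.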

Perhaps my alternative is to make my ausearch-like command stateful, so
that it processes only new events, as per a patch I made to ausearch
some time back:

        Subject: 	[PATCH] ausearch: Add checkpoint capability and have
        incomplete logs carry forward when processing multiple audit.log
        files
        Date: 	05/11/2013 03:59:34 PM
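
With checkpointing, no rollover would be needed between runs. A rough
sketch, assuming an ausearch build that accepts a --checkpoint option
naming a state file (stock audit-1.8 and audit-2.2 do not); the
CHECKPOINT path and the function name are hypothetical.

```shell
#!/bin/sh
# Sketch only: report just the events not seen on the previous run.
# Assumes ausearch supports --checkpoint; CHECKPOINT is a made-up path.
CHECKPOINT="${CHECKPOINT:-/var/lib/audit-report/ausearch.checkpoint}"

report_new_events() {
    # ausearch records its position in $CHECKPOINT and resumes from it
    ausearch --interpret --checkpoint "$CHECKPOINT"
}
```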

Am open to any suggestions ... I think the key issue is that I reduced
the events written to audit.log from 7500 to 600 per five minute
interval but I still see the error.

Rgds
> On Monday, November 04, 2013 07:46:18 PM Burn Alting wrote:
> > Hi,
> > 
> > I have some quite busy hosts, that emit the following errors when I
> > request the audit log file is rolled over (via a kill -s USR1
> > auditdpid).
> > 
> >   Error receiving audit netlink packet(No buffer space available)
> >   Error sending signal_info request (No buffer space available)
> > 
> > From reading earlier posts (circa 2009) it would appear my options are
> > 
> > a. Increase backlog buffer (currently 32768)
> > b. Increase priority_boost (currently 4)
> > c. Reduce the number of log files (currently 9)
> 
> Another corollary to this is that you can increase the file size and decrease 
> the total files which would help on rotation. 
> 
> 
> > Does anyone have a feel for which of the above should offer the best
> > return?
> 
> There are 2 more options:
> 
> 1) Review the rules to make sure you are not getting events that you really do 
> not need. If you have a lot of false positives, then you might add some 
> arguments that better narrow the results. For example, perhaps you have this 
> rule:
> 
> -a always,exit -F arch=b64 -S clock_settime -k time-change
> 
> This can give a lot of false positives. The one that really matters is when a 
> program sets CLOCK_REALTIME (the wall clock). So, the rule can be re-written 
> as:
> 
> -a always,exit -F arch=b64 -S clock_settime -F a0=0 -k time-change
> 
> which narrows its scope.
> 
> 2) You might experiment with cgroups.
> 
> 
> > Are there other configuration parameters I could adjust (aside from
> > changing my ruleset in audit.rules)?
> 
> There might be general disk tuning parameters in sysctl that could help as 
> well. Choice of file system also has performance impacts. I haven't done any 
> experimenting on the performance side, but I know there are people here that 
> also have very busy systems.
> 
> -Steve