question on audit_backlog settings and how to prevent the system from hanging due to audit overload

Steve Grubb sgrubb at redhat.com
Fri Oct 7 12:28:37 UTC 2011


On Thursday, October 06, 2011 04:27:03 PM larry.erdahl at usbank.com wrote:
> I have a Red Hat 5.4 server whose audit rules I control with Snare.
> Recently this server hung on me, and the evidence pointed to the SnareDispatcher
> as the cause. You can see from the samples below that the dispatcher was running
> at 99-100%.
> The morning of the hang, auditd peaked at ~200,000 events/hour, up from
> ~50,000 events/hour. Is there a way to protect the server from hanging
> during unexpected loads like this?
> 
> I'm assuming from what I've read that I'll need to increase the audit_backlog
> limit. Before increasing the number of buffers, I'd like to get a clearer
> understanding of their size and how increasing them might impact my overall
> system performance. Are there any recommendations on what the settings should
> be, or a formula I could use to determine the proper setting?

What the kernel sends to user space is a data structure like this:

#define MAX_AUDIT_MESSAGE_LENGTH    8970 // PATH_MAX*2+CONTEXT_SIZE*2+11+256+1
struct audit_message {
        struct nlmsghdr nlh;
        char   data[MAX_AUDIT_MESSAGE_LENGTH];
};

This is carried in an skb, so there is probably some additional memory used for skb 
bookkeeping. You might round that off to 9000 bytes and be close enough for practical 
purposes. Increasing the backlog limit means the kernel can allocate that much memory, 
and it's no longer available for user space. With the amount of memory in current 
hardware, I don't think you have to worry too much as long as the setting is sane. A 
backlog limit of 8192 means it can occupy a little over 70 MB of memory. But if you 
need to do this, you need to do this.
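As a back-of-the-envelope check on those numbers (a rough sketch: the 9000-byte figure is the rounded message size above, and skb bookkeeping overhead is ignored):

```shell
# Rough estimate of backlog memory use, assuming ~9000 bytes per queued event
slot_bytes=9000        # MAX_AUDIT_MESSAGE_LENGTH rounded up; skb overhead ignored
backlog=8192           # proposed audit_backlog_limit
total=$((slot_bytes * backlog))
echo "$total bytes (~$((total / 1024 / 1024)) MiB)"
# prints: 73728000 bytes (~70 MiB)
```

Scale `backlog` to whatever limit you're considering; the footprint grows linearly.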


> I am looking into what may have caused the spike, but I'd like to know what
> my options are to keep from having another system hang.

Do you use keys on your audit rules? If so, run the key report to get an idea of what 
was happening; from that you can zero in on the cause. You may also have a rule that 
is too aggressive in logging. For example, perhaps you record file deletions in /usr/* 
and then a yum update comes along, overwriting and deleting thousands of files in a 
few seconds.
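If your rules aren't keyed yet, something along these lines lets you attribute a spike later (a sketch; the watched path and key name are made-up examples, but auditctl, aureport, and ausearch are the standard audit userspace tools):

```shell
# Tag a watch rule with a key so its hits can be identified later
# (the path and key name here are illustrative only)
auditctl -w /usr/bin -p wa -k usr-writes

# Summarize event counts per key to see which rule generates the volume
aureport --key --summary

# Drill into the events recorded under one key
ausearch -k usr-writes --start today
```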
 
 
> Any help would be appreciated

Another possibility is increasing the audit daemon's priority a little and making sure 
its disk performance is tuned.
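For reference, auditd's scheduling priority and flush behavior are set in /etc/audit/auditd.conf; a sketch of the relevant knobs (the values shown are illustrative, not recommendations):

```shell
# /etc/audit/auditd.conf excerpt -- illustrative values only
priority_boost = 4      # niceness boost applied to auditd (higher = more priority)
flush = INCREMENTAL     # how aggressively records are flushed to disk
freq = 50               # with INCREMENTAL, sync the log every 50 records
```

Restart auditd (`service auditd restart`) for changes to take effect.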

 
> Sep 30 01:29:16 <servername> kernel: audit: audit_backlog=321 >
> audit_backlog_limit=320

That is the default setting. It's a bit low for production use; I'd bump it up a lot. 
Make it at least 4096, if not 8192.
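Assuming the stock audit userspace, the limit can be raised at runtime and persisted in the rules file (the value 8192 is the suggestion above; on RHEL 5 the rules file is /etc/audit/audit.rules, though Snare may manage it for you):

```shell
# Raise the backlog limit immediately
auditctl -b 8192

# Persist it across reboots by adding the same option to the rules file
echo "-b 8192" >> /etc/audit/audit.rules

# Verify the current backlog settings
auditctl -s
```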

-Steve



