report audit wait metric in audit status reply

Lenny Bruzenak lenny at magitekltd.com
Tue Dec 8 16:57:05 UTC 2020


On 7/1/20 3:32 PM, Max Englander wrote:

> In environments where the preservation of audit events and predictable
> usage of system memory are prioritized, admins may use a combination of
> --backlog_wait_time and -b options at the risk of degraded performance
> resulting from backlog waiting. In some cases, this risk may be
> preferred to lost events or unbounded memory usage. Ideally, this risk
> can be mitigated by making adjustments when backlog waiting is detected.
>
> However, detection can be difficult using the currently available metrics.
> For example, an admin attempting to debug degraded performance may
> falsely believe a full backlog indicates backlog waiting. It may turn
> out the backlog frequently fills up but drains quickly.
>
> To make it easier to reliably track degraded performance to backlog
> waiting, this patch makes the following changes:
>
> Add a new field backlog_wait_sum to the audit status reply. Initialize
> this field to zero. Add to this field the total time spent by the
> current task on scheduled timeouts while the backlog limit is exceeded.
>
> Tested on Ubuntu 18.04 using complementary changes to the audit
> userspace: https://github.com/linux-audit/audit-userspace/pull/134.
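
The detection idea described above can be sketched as a comparison of two
status samples: if backlog_wait_sum grew between them, tasks actually
waited on the backlog, whereas a full backlog alone does not show that.
This is a minimal sketch, not part of the patch. It assumes the status is
available as "key value" text lines in the style of "auditctl -s" output
and that the new field appears under the name backlog_wait_sum; the sample
values and the helper names (parse_status, backlog_wait_delta, classify)
are hypothetical.

```python
# Hypothetical status samples in "key value" form. The field name
# backlog_wait_sum follows the patch description; how userspace prints it
# is an assumption here.
SAMPLE_BEFORE = """enabled 1
failure 1
backlog_limit 64
lost 10
backlog 64
backlog_wait_time 60000
backlog_wait_sum 0
"""

SAMPLE_AFTER = """enabled 1
failure 1
backlog_limit 64
lost 10
backlog 3
backlog_wait_time 60000
backlog_wait_sum 1500
"""


def parse_status(text):
    """Turn 'key value' lines into a dict, converting values to int when possible."""
    status = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        try:
            status[parts[0]] = int(parts[1])
        except ValueError:
            status[parts[0]] = parts[1]
    return status


def backlog_wait_delta(before, after, field="backlog_wait_sum"):
    """Return how much the wait counter grew between two samples."""
    return after.get(field, 0) - before.get(field, 0)


def classify(before, after):
    """Label the interval between two samples.

    'waiting'  : the wait counter grew, so tasks waited on the backlog
    'dropping' : no waiting was recorded, but the lost counter grew
    'ok'       : neither counter changed
    """
    if backlog_wait_delta(before, after) > 0:
        return "waiting"
    if after.get("lost", 0) - before.get("lost", 0) > 0:
        return "dropping"
    return "ok"
```

Sampling at a fixed interval and alerting on "waiting" would distinguish a
backlog that fills and drains quickly from one that is actually stalling
tasks.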

Max,

Along those lines, the current failure actions (silent, printk, panic) 
are kind of restrictive. I guess one can filter on the printk messages 
and redirect those to a userspace handler which might do something 
specific to the operating environment? Is that how you would handle it? 
Or would your admin just look for it and report? If you have any 
shareable info I'd appreciate seeing it. Almost no one I can think of 
would want a panic to happen, but only almost. No one who needs some 
level of assurance would want "silent".
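
The printk-filtering approach asked about above could look roughly like the
following. It is a minimal sketch under stated assumptions: the message
patterns are my recollection of what the kernel prints in "printk" failure
mode and should be checked against the running kernel, and the names
BACKLOG_PATTERNS, is_backlog_message and dispatch are hypothetical.

```python
import re

# Assumed kernel log patterns for backlog overflow and lost events.
# Verify these against the kernel version in use before relying on them.
BACKLOG_PATTERNS = [
    re.compile(r"audit: backlog limit exceeded"),
    re.compile(r"audit: audit_backlog=\d+ > audit_backlog_limit=\d+"),
    re.compile(r"audit: audit_lost=\d+"),
]


def is_backlog_message(line):
    """Return True if a kernel log line matches one of the assumed patterns."""
    return any(pattern.search(line) for pattern in BACKLOG_PATTERNS)


def dispatch(lines, handler):
    """Call handler(line) for every matching line and return the match count.

    'lines' can be any iterable of strings, for example a file object
    reading a stream of kernel messages.
    """
    matched = 0
    for line in lines:
        if is_backlog_message(line):
            handler(line.rstrip("\n"))
            matched += 1
    return matched
```

One way to use it would be to pipe a followed kernel log (for example the
output of "dmesg --follow") into a small script that calls
dispatch(sys.stdin, handler), where the handler does whatever is
appropriate for the operating environment.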

FWIW I looked at the kernel printk calls, and although I may have looked 
at the wrong one, I do not see any output in the dmesg buffer that appears 
to indicate drops, even though "auditctl -s" reports drops on boot. My 
guess is that on bootup the auditd userspace config has not yet been 
activated and the kernel is likely using a "silent" default... but I 
realize this was not the focus of your effort. 
Just musing.

I applaud your efforts in this area, and if you are able to share any 
practices about handling the backlogs I'd appreciate seeing that.

V/R,

LCB

-- 
Lenny Bruzenak
MagitekLTD



