Lost events during boot

Mon Mar 20 14:55:43 UTC 2017

On Mon, Mar 20, 2017 at 10:44 AM, Paul Moore <paul at paul-moore.com> wrote:
> On Mon, Mar 20, 2017 at 8:08 AM, Paul Moore <paul at paul-moore.com> wrote:
>> On Sun, Mar 19, 2017 at 9:46 PM, Steve Grubb <sgrubb at redhat.com> wrote:
>>> Hello Richard and Paul,
>>>
>>> I was going to do a blog write up about booting the system with
>>> audit_backlog_limit=8192 for STIG users and have stumbled onto a mystery. The
>>> kernel initializes the variable to 64 at power on. During boot, if audit == 1,
>>> then it holds events in the hopes that an audit daemon will show up later and
>>> drain all the events. Anything over 64 events should fall off the end and
>>> increment the lost counter and put a notice in syslog.
>>>
>>> However, when booting with audit_backlog_limit=8192, as soon as I log in and run
>>> "auditctl -s", I can see I've lost 73 events. Then I run "aureport --start boot"
>>> and I see 644 total events. This is nowhere near the 8192 limit that I asked
>>> for. So, why am I losing events?
>>>
>>> Additionally, I checked the logs and there is absolutely no message in syslog
>>> showing that I've lost events. This is with failure mode set to 1, which is the
>>> default at power on, and in spite of the fact that the source code seems to
>>> show that it should have printk'ed something.
>>>
>>> Any ideas? Can you replicate this finding?
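To make the expected behaviour concrete, here is a toy Python model of the boot-time hold queue described above. It is a hypothetical simplification, not kernel code: the class `HoldQueue`, the function `boot()` and the field names are made up for illustration, and the real kernel queueing (reworked in v4.10) is more involved. Under this model the lost counter only moves once more records arrive than the limit allows, so 644 records against a limit of 8192 should lose nothing. That is why a lost count of 73 is surprising.

```python
from dataclasses import dataclass


@dataclass
class HoldQueue:
    """Toy model of a bounded boot-time hold queue (hypothetical, not kernel code)."""
    limit: int      # plays the role of audit_backlog_limit
    held: int = 0   # records waiting for an audit daemon to drain them
    lost: int = 0   # records dropped because the queue was already full

    def hold(self) -> bool:
        """Queue one record; if the queue is full, drop it and count it as lost."""
        if self.held >= self.limit:
            self.lost += 1
            return False
        self.held += 1
        return True


def boot(n_records: int, limit: int) -> HoldQueue:
    """Simulate n_records audit records generated before any daemon connects."""
    q = HoldQueue(limit=limit)
    for _ in range(n_records):
        q.hold()
    return q


print(boot(644, 64))    # HoldQueue(limit=64, held=64, lost=580)
print(boot(644, 8192))  # HoldQueue(limit=8192, held=644, lost=0)
```

With the default limit of 64, everything past the 64th record is counted as lost. With the limit raised to 8192, the model holds all 644 records and reports no loss.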
>>
>> It's funny, I just noticed this for the first time on Friday (the
>> exact same lost count too), although it was a development kernel build
>> with a *heavily* modified audit subsystem so I just assumed I had
>> broken something with the queuing, the lost counter, or both.  It's
>> possible I still may have broken something in the v4.10 queue rework,
>> or something broke a long time ago and we are just noticing it now.
>>
>> First off, can you create a GitHub issue for this and include your
>> kernel build (e.g. 'uname -r')?  Second, if you are seeing this on a
>> +v4.10 kernel, do you see the same results with a +v4.9 kernel?
>
> Quick follow-up, and completely untested, but it would appear that the
> problem lies in kauditd_hold_skb()/kauditd_print_skb();
> kauditd_print_skb() registers a false lost record when the printk
> ratelimit is tripped.  The fix is rather simple, and I'll include that
> in an upcoming patchset.
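The suspected failure mode above can be illustrated with a second toy Python model. Again, this is hypothetical and not the kernel code or the patch: `Ratelimit`, `boot()`, `burst` and `count_suppressed_as_lost` are invented names, and the limiter is a simple fixed burst, whereas the real printk ratelimit is time-windowed. Each record is copied to the console subject to a ratelimit and is then held in the backlog. When a suppressed console copy is counted as "lost", the counter climbs even though every record is still queued.

```python
class Ratelimit:
    """Fixed-burst limiter: allows `burst` events, then suppresses the rest.

    Hypothetical simplification; real printk ratelimiting is time-based.
    """

    def __init__(self, burst: int) -> None:
        self.burst = burst
        self.used = 0

    def allow(self) -> bool:
        if self.used < self.burst:
            self.used += 1
            return True
        return False


def boot(n_records: int, limit: int, burst: int, count_suppressed_as_lost: bool):
    """Copy each record to the console (ratelimited), then hold it.

    count_suppressed_as_lost=True models the suspected bug: a console message
    suppressed by the ratelimit is counted as a lost record even though the
    record itself is still queued. Returns (held, lost).
    """
    rl = Ratelimit(burst)
    held = lost = 0
    for _ in range(n_records):
        printed = rl.allow()  # console copy, subject to the ratelimit
        if not printed and count_suppressed_as_lost:
            lost += 1  # false loss: only the console copy was skipped
        if held < limit:
            held += 1
        else:
            lost += 1  # genuine loss: the backlog is full
    return held, lost


print(boot(644, 8192, burst=10, count_suppressed_as_lost=True))   # (644, 634)
print(boot(644, 8192, burst=10, count_suppressed_as_lost=False))  # (644, 0)
```

The burst of 10 is arbitrary, so the model does not attempt to reproduce the observed count of 73. It only shows that the two counts diverge: with the buggy accounting, the lost counter reports records that are in fact still waiting in the queue.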

... and a quick question, if the kernel is booted without "audit=1" do
we want to count lost records in the case where the backlog overflows?

-- 
paul moore
www.paul-moore.com