Lost events during boot

Mon Mar 20 19:25:34 UTC 2017

On Mon, Mar 20, 2017 at 11:05 AM, Steve Grubb <sgrubb at redhat.com> wrote:
> On Monday, March 20, 2017 8:08:27 AM EDT Paul Moore wrote:
>> On Sun, Mar 19, 2017 at 9:46 PM, Steve Grubb <sgrubb at redhat.com> wrote:
>> > Hello Richard and Paul,
>> >
>> > I was going to do a blog write up about booting the system with
>> > audit_backlog_limit=8192 for STIG users and have stumbled on to a mystery.
>> > The kernel initializes the variable to 64 at power on. During boot, if
>> > audit == 1, then it holds events in the hopes that an audit daemon will
>> > show up later and drain all the events. Anything over 64 events should
>> > fall off the end and increment the lost counter and put a notice in
>> > syslog.
>> >
>> > However, when booting with audit_backlog_limit=8192, as soon as I log in
>> > and run "auditctl -s" I can see I've lost 73 events. Then I run "aureport
>> > --start boot" and I see 644 total events. This is nowhere near the 8192
>> > limit that I asked for. So, why am I losing events?
>> >
>> > Additionally, I checked the logs and there is absolutely no message in
>> > syslog showing that I've lost events. This is with failure mode set to 1
>> > - which is default at power on. And this is in spite of the fact that
>> > the source code seems to show that it should have printk'ed something.
>> >
>> > Any ideas? Can you replicate this finding?
>>
>> It's funny, I just noticed this for the first time on Friday (the
>> exact same lost count too), although it was a development kernel build
>> with a *heavily* modified audit subsystem so I just assumed I had
>> broken something with the queuing, the lost counter, or both.  It's
>> possible I still may have broken something in the v4.10 queue rework,
>> or something broke a long time ago and we are just noticing it now.
>>
>> First off, can you create a GitHub issue for this
>
> Lost events during boot #38.

See it, thanks.

>> and include your kernel build (e.g. 'uname -r')?
>
> # uname -r
> 4.9.13-101.fc24.x86_64

Well, at least I can say I didn't break it with the queue rework ;)

>> Second, if you are seeing this on a +v4.10 kernel, do you see the same
>> results with a +v4.9 kernel?
>
> Yes, and I tried a 4.8.10 and see it there as well.
>
> I then checked a 3.10 RHEL 7 kernel and don't see any lost events and that
> even has a backlog_limit of the default of 64.
>
> I then found a system with a 4.5.5 kernel and it also was losing events.

It looks like it has been broken for a while.  Since it is related to
the mega-patch I'm currently testing, which fixes netns/locking/queue
problems, I hope to post that patch to the list within the next day or
two.  I'm going to mark it as stable for v4.10+ so the latest kernels
will get the fix, but I'm not going to worry about kernels earlier than
that since it isn't something I would consider worthy of -stable by
itself.

-- 
paul moore
www.paul-moore.com
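
For reference, the behavior Steve expects at the top of the thread (a
bounded hold queue, where overflow bumps the lost counter and leaves a
notice in syslog) can be sketched as a toy model.  This is not the
kernel's code: BacklogModel, log_event, drain and simulate_boot are
hypothetical names, the notice text is made up, and dropping the newest
event on overflow is an assumption.  It only shows why 644 events against
a limit of 8192 should produce zero losses if the limit were honored.

```python
from collections import deque


class BacklogModel:
    """Toy model of a bounded audit backlog (illustrative, not kernel code)."""

    def __init__(self, backlog_limit=64):
        self.backlog_limit = backlog_limit
        self.queue = deque()   # events held until a daemon drains them
        self.lost = 0          # stands in for the "lost" counter
        self.notices = []      # stands in for syslog messages

    def log_event(self, event):
        """Hold the event, or count it as lost if the backlog is full."""
        if len(self.queue) >= self.backlog_limit:
            # Assumption: the new event is the one dropped on overflow.
            self.lost += 1
            self.notices.append(f"backlog limit exceeded, lost={self.lost}")
            return False
        self.queue.append(event)
        return True

    def drain(self):
        """Hand all held events to the (hypothetical) daemon."""
        drained = list(self.queue)
        self.queue.clear()
        return drained


def simulate_boot(n_events, backlog_limit):
    """Generate n_events before any daemon attaches and return the model."""
    model = BacklogModel(backlog_limit)
    for i in range(n_events):
        model.log_event(f"event-{i}")
    return model


if __name__ == "__main__":
    for limit in (64, 8192):
        m = simulate_boot(644, limit)
        print(limit, len(m.queue), m.lost, len(m.notices))
```

Under this model a limit of 64 loses events and logs a notice for each
one, while a limit of 8192 loses none, which is what makes the reported
73 lost events with no syslog message look like a bug rather than a
sizing problem.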