Lost events during boot

Tue Mar 21 11:30:29 UTC 2017

On Tue, Mar 21, 2017 at 4:04 AM, Richard Guy Briggs <rgb at redhat.com> wrote:
> On 2017-03-20 10:44, Paul Moore wrote:
>> On Mon, Mar 20, 2017 at 8:08 AM, Paul Moore <paul at paul-moore.com> wrote:
>> > On Sun, Mar 19, 2017 at 9:46 PM, Steve Grubb <sgrubb at redhat.com> wrote:
>> >> Hello Richard and Paul,
>> >>
>> >> I was going to do a blog write up about booting the system with
>> >> audit_backlog_limit=8192 for STIG users and have stumbled on to a mystery. The
>> >> kernel initializes the variable to 64 at power on. During boot, if audit == 1,
>> >> then it holds events in the hopes that an audit daemon will show up later and
>> >> drain all the events. Anything over 64 events should fall off the end and
>> >> increment the lost counter and put a notice in syslog.
>> >>
>> >> However, when booting with audit_backlog_limit=8192, as soon as I log in and run
>> >> "auditctl -s" I can see I've lost 73 events. Then I run "aureport --start boot"
>> >> and I see 644 total events. This is nowhere near the 8192 limit that I asked
>> >> for. So, why am I losing events?
>> >>
>> >> Additionally, I checked the logs and there is absolutely no message in syslog
>> >> showing that I've lost events. This is with failure mode set to 1 - which is
>> >> default at power on. And this is in spite of the fact that the source code
>> >> seems to show that it should have printk'ed something.
>> >>
>> >> Any ideas? Can you replicate this finding?
>> >
>> > It's funny, I just noticed this for the first time on Friday (the
>> > exact same lost count too), although it was a development kernel build
>> > with a *heavily* modified audit subsystem so I just assumed I had
>> > broken something with the queuing, the lost counter, or both.  It's
>> > possible I still may have broken something in the v4.10 queue rework,
>> > or something broke a long time ago and we are just noticing it now.
>> >
>> > First off, can you create a GitHub issue for this and include your
>> > kernel build (e.g. 'uname -r')?  Second, if you are seeing this on a
>> > +v4.10 kernel, do you see the same results with a +v4.9 kernel?
>>
>> Quick follow-up, and completely untested, but it would appear that the
>> problem lies in kauditd_hold_skb()/kauditd_printk_skb();
>> kauditd_printk_skb() registers a false lost record when the printk
>> ratelimit is tripped.  The fix is rather simple, and I'll include that
>> in an upcoming patchset.
>
> Can you make a separate patch for that in the patchset, or clearly
> identify the problem and the fix in the larger patch?

It is mentioned in the patch description of the larger patch that I'm
going to send to stable for +v4.10.  Assuming testing goes well this
morning I'll be posting the patch as an RFC later today, and if no one
spots anything serious I'll drop the RFC tag and send it up to Linus
later in the week.

> This does seem like a stable fix to me.

We agree.
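
For anyone following along, here is a rough sketch of the accounting problem
described in the quoted follow-up. It is a toy model, not the kernel code: the
names HoldQueueModel, simulate, printk_burst and count_ratelimit_as_lost are
made up for illustration, and the event counts are arbitrary. It assumes each
held record is also echoed through a rate-limited print path, and that the
buggy variant bumps the lost counter whenever the rate limit trips even though
the record is still sitting in the hold queue.

```python
from collections import deque


class HoldQueueModel:
    """Toy model of holding audit records until a daemon attaches."""

    def __init__(self, backlog_limit, printk_burst, count_ratelimit_as_lost):
        self.backlog_limit = backlog_limit    # stands in for audit_backlog_limit
        self.printk_budget = printk_burst     # hypothetical: prints allowed before ratelimiting
        self.count_ratelimit_as_lost = count_ratelimit_as_lost
        self.held = deque()                   # records waiting for the daemon
        self.lost = 0                         # stands in for the "lost" counter

    def _print_record(self, record):
        # Echo the record to the console if the rate limit allows it.
        if self.printk_budget > 0:
            self.printk_budget -= 1
            return
        # Rate limit tripped: the buggy variant counts this as a lost record.
        if self.count_ratelimit_as_lost:
            self.lost += 1

    def hold(self, record):
        self._print_record(record)
        if len(self.held) < self.backlog_limit:
            self.held.append(record)          # record is retained, not lost
        else:
            self.lost += 1                    # genuinely dropped: queue is full


def simulate(events, backlog_limit, printk_burst, count_ratelimit_as_lost):
    """Feed `events` records through the model; return (held, lost)."""
    q = HoldQueueModel(backlog_limit, printk_burst, count_ratelimit_as_lost)
    for i in range(events):
        q.hold(i)
    return len(q.held), q.lost


if __name__ == "__main__":
    print("buggy:", simulate(100, 8192, 10, True))
    print("fixed:", simulate(100, 8192, 10, False))
```

In this model the buggy variant reports lost records even though every record
fits in the hold queue, while the other variant only counts records that
actually overflow the backlog limit.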

-- 
paul moore
www.paul-moore.com