lost events on boot

Tue Dec 8 00:56:40 UTC 2020

On 2020-12-07 16:28, Lenny Bruzenak wrote:
> Apologies if this has been answered. I searched and found some
> relevant-looking dialog 2 years ago (on 12/14/2018) that Paul/RGB/Ondrej
> were discussing, however I do not see the answer.
> 
> I'm running userspace 2.8.5, kernel 3.10.0-1160.
> 
> I have boot parameters "audit=1 ... audit_backlog_limit=8192".
> 
> Immediately after boot, I use "auditctl -s" and see hundreds (varies,
> between 119-330) of lost records.
> 
> 
> So I cleaned out all the audit data, rebooted again and examined the events.
> 
> They are numbered sequentially 1-515. I counted the events and they match
> (515).
> 
> 
> So my questions are these:
> 
>  * Is this "lost" value accurate?

Not entirely on that vintage of kernel.  It counted a lost message even
if it was later delivered via the audit_skb_hold_queue, IIRC.  Paul
re-did the queues to avoid this false report.  That change went into
v4.10-rc1:
	2016-12-14 c6480207fdf7 ("audit: rework the audit queue handling")
It was too disruptive to backport to the 3.10.0-xxx vintage kernel you
are running.

>  * If the numbering doesn't indicate any gaps, what does that tell me?

That the messages counted as lost went through the hold queue and were
delivered anyway, IIRC.

>    The kernel is supplying the serial number (right?), so is it
>    discarding the events without assigning a serial number?

Yes, the kernel assigns the serial numbers.  Sometimes.  Some buffers
never get allocated, and therefore never get a serial number assigned,
due to full queues or memory pressure.  Other buffers get dropped when
queues are full and there is no choice but to drop a message.  This is
true before and after Paul's queue re-write.
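
For anyone wanting to repeat the count-and-check-for-gaps exercise
described above, here is a minimal sketch.  It assumes the usual
"msg=audit(<seconds>.<millis>:<serial>)" stamp on every record and a
log that covers a single boot (so serials do not wrap or restart).
The function name summarize() and the sample lines are made up for
illustration.  Keep in mind the caveat above: a gap-free sequence does
not prove nothing was lost, since a buffer that was never allocated
never got a serial number in the first place.

```python
import re
from collections import Counter

# Stamp carried by each record: msg=audit(<seconds>.<millis>:<serial>)
STAMP = re.compile(r"msg=audit\((\d+)\.(\d+):(\d+)\)")


def summarize(lines):
    """Count events and records, and list serial numbers that are missing.

    Assumes the lines come from a single boot, so the serial number
    alone identifies an event.
    """
    records_per_event = Counter()
    for line in lines:
        match = STAMP.search(line)
        if match:
            records_per_event[int(match.group(3))] += 1

    serials = sorted(records_per_event)
    missing = []
    # Any jump larger than 1 between neighbouring serials is a gap.
    for prev, cur in zip(serials, serials[1:]):
        missing.extend(range(prev + 1, cur))

    return {
        "events": len(serials),
        "records": sum(records_per_event.values()),
        "first": serials[0] if serials else None,
        "last": serials[-1] if serials else None,
        "missing": missing,
    }


if __name__ == "__main__":
    # Illustrative records only, not taken from a real log.
    sample = [
        "type=SYSCALL msg=audit(1607388000.123:1): arch=c000003e",
        "type=PATH msg=audit(1607388000.123:1): item=0",
        "type=SYSCALL msg=audit(1607388000.456:2): arch=c000003e",
        "type=SYSCALL msg=audit(1607388001.000:4): arch=c000003e",
    ]
    print(summarize(sample))
```

To run it against real data, pass it the lines of the log file, e.g.
summarize(open(path)) with path pointing at your audit log.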

>  * Do I have something wrong with my kernel boot parameters?

Not likely.  From what you have described above it sounds like you have
done what you can.

> I'd have thought that 8k buffers would be enough, and certainly should
> be if I only have 515 events, unless each record inside the event
> counts separately.

If the audit_backlog_limit value on your kernel command line is larger
than both your lost count and the serial number you see when you check
after boot, you should be in good shape.
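
A rough sketch of that comparison, reading the limit as the
audit_backlog_limit value.  It assumes "auditctl -s" prints integer
fields either one per line as "key value" or on one line as
"key=value"; check the layout your userspace version actually prints.
The helper names parse_status() and backlog_covers_boot() are
hypothetical, and the status text below is a made-up sample built from
the numbers in this thread, not captured output.

```python
import re


def parse_status(text):
    """Collect the integer fields of `auditctl -s` style output in a dict.

    Accepts both "key value" and "key=value" layouts (an assumption
    about the output format, not a guarantee).
    """
    fields = {}
    for key, value in re.findall(r"\b([a-z_]+)[ =](\d+)\b", text):
        fields[key] = int(value)
    return fields


def backlog_covers_boot(backlog_limit, lost, last_serial):
    """Rule of thumb: the limit exceeds both the lost count and the serial."""
    return backlog_limit > lost and backlog_limit > last_serial


if __name__ == "__main__":
    # Made-up sample status text using values mentioned in this thread.
    status_text = "enabled 1\nbacklog_limit 8192\nlost 119\nbacklog 0\n"
    status = parse_status(status_text)
    # 515 is the highest serial number reported earlier in the thread.
    print(backlog_covers_boot(status["backlog_limit"], status["lost"], 515))
```

This only encodes the rule of thumb; on a kernel of this vintage the
lost count itself may be inflated, as noted earlier.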

> I also then counted each record, not just events, and got around 1600, so
> I'd have thought that even multi-record events would have fit. I guess that
> depends on the buffer size.

Good thinking, and you are correct.  That backlog limit may need to be
increased for more recent kernels since there are more events caught and
some events have more records.

> Appreciate the help in advance; thanks.

I hope this helps.

> LCB

- RGB

--
Richard Guy Briggs <rgb at redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635