Occasional delayed output of events

Tue Jan 19 20:42:55 UTC 2021

On 2021-01-19 14:11, Paul Moore wrote:
> On Tue, Jan 19, 2021 at 3:18 AM Burn Alting <burn.alting at iinet.net.au> wrote:
> > I tend to have a rigorous auditing posture (see the rules loaded in https://github.com/linux-audit/audit-userspace/issues/148) which is not normal for most. Perhaps, Paul, you have hit the nail on the head by stating that this 'severe delay' is not that unreasonable given my rules posture and we just need to 'deal with it' in user space.
> > We still get the event data, I just need to adjust the user space tools to deal with this occurrence.
> > As for what the system is doing, in my home case it's a Centos 7 VM running a tomcat service which only gets busy every 20 minutes and the other is a HPE Z800 running Centos 8 with 4-5 VM's mostly dormant. I can put any code in these hosts to assist in 'validating'/testing the delay. Advise and I will run.
> 
> I took a (slightly) closer look at the queuing code just now and the
> corner case I suspected doesn't look very promising here.  On failure
> to send a record to userspace, the record is put back at the front of
> the queue to preserve ordering so the kernel *should* still emit
> records in order even when auditd is under pressure, dead, or in the
> process of a restart.  Also, in this case the kernel kicks the auditd
> connection, disconnecting auditd's netlink socket; I imagine that
> would be a notable event on your systems.
> 
> However, looking at the timestamps in the audit events you posted, I
> noticed something a little odd.  If I align the timestamps a bit
> better let's see if it jumps out at you guys ...
> 
> type=XXX msg=audit(1609519900.159:44606): ...
> type=XXX msg=audit(1609519900.161:44607): ...
> type=XXX msg=audit(1609519900.163:44608): ...
> type=XXX msg=audit(1609519896.829:44609): ...
> type=XXX msg=audit(1609519900.170:44610): ...
> 
> Let me guess Burn, you're running NTP/PTP :)  I'm not sure how auditd
> handles things like this, but it looks like there was a small negative
> time correction between events which caused the odd scenario where
> event N+1 actually occurred before event N according to the wall
> clock.  In other words, your system is time traveling ;)

Interesting...

The timestamp is assigned on syscall entry.
The serial number is assigned on the creation of the first audit record of an event.

>From these timings above, NTP/PTP could explain this, but the third and
fifth are too close together to make that likely, I think.

I'd blame time namespaces, but the host system is all in the same time
namespace.

I think I'd need to see a larger sequence to be convinced of this...

> paul moore

- RGB

--
Richard Guy Briggs <rgb at redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635