Performance of libauparse

Tue Sep 30 22:18:41 UTC 2008

John Dennis wrote:
> I also agree the data stream which emerges from audit is rather 
> difficult to work with. Eric likes to point out we can't change the 
> kernel, so maybe what we really need (and has been proposed) is for 
> auditd to reformat the data before emitting it or writing it to disk 
> (e.g. assemble records into events, decode strings which have been 
> hexified, etc.) Currently auparse is responsible for much of this as 
> part of a post processing step which has to be repeated every time audit 
> data is read instead of just once as it emerges from the kernel. If 
> instead the auparse user level code was folded into auditd which then 
> became responsible for formatting the ad hoc data received from the 
> kernel the final output from audit could be much more friendly and much 
> of the rationale for auparse would evaporate.

I was going to request going the other way with libauparse, i.e. to 
entirely separate it from auditd. As I mentioned, I'm not using auditd 
because it wasn't really written with my customer's requirements in mind 
(high volume, no local storage). My audit daemon needs to run on RHEL 3 
(it has a LAuS backend too) and RHEL 4. I don't see anything 
architecturally which ties libauparse to auditd, so if it was a separate 
library I could recompile it for RHEL 4 without replacing the RHEL 4 
audit-libs, etc. I can certainly see the efficiency in auditd parsing 
data before handing it off to dispatchers, but it's not hard to 
construct non-auditd uses for it either. Of course, it would need some 
performance work first for my use case, but I wouldn't want to duplicate 
the effort unnecessarily.
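
As a rough illustration of the kind of parsing involved, independent of 
auditd, here is a minimal Python sketch that assembles records into 
events and decodes hexified strings. It is not how libauparse does it. 
The header layout in HEADER, the STRING_FIELDS set and all function 
names are my own assumptions for illustration, and it ignores values 
containing spaces.

```python
import re

# Assumed header layout: type=NAME msg=audit(SECONDS.MILLIS:SERIAL): key=value ...
HEADER = re.compile(r'^type=(\S+) msg=audit\((\d+\.\d+):(\d+)\):\s*(.*)$')

# Assumption: only these fields may carry quoted or hexified strings.
# Without such a list, a numeric value like pid=1234 is indistinguishable
# from a hexified string.
STRING_FIELDS = {"comm", "exe", "name", "cwd"}

def decode_value(raw):
    """Decode a string field: strip quotes, or un-hexify a bare hex value."""
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        return raw[1:-1]
    if raw and len(raw) % 2 == 0 and re.fullmatch(r'[0-9A-Fa-f]+', raw):
        return bytes.fromhex(raw).decode('utf-8', errors='replace')
    return raw  # e.g. a placeholder such as (null)

def parse_fields(body):
    """Split a record body into key=value pairs (no spaces inside values)."""
    fields = {}
    for token in body.split():
        key, sep, value = token.partition('=')
        if sep:
            fields[key] = decode_value(value) if key in STRING_FIELDS else value
    return fields

def assemble_events(lines):
    """Group records sharing the same (timestamp, serial) into one event."""
    events = {}
    for line in lines:
        match = HEADER.match(line.strip())
        if not match:
            continue  # skip anything that does not look like a record
        rtype, stamp, serial, body = match.groups()
        events.setdefault((stamp, serial), []).append((rtype, parse_fields(body)))
    return events
```

Calling assemble_events() on an iterable of raw record lines returns a 
dict keyed by (timestamp, serial), each value being the list of 
(record type, fields) pairs making up that event.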

On the more general topic of the format of data emitted by the kernel, I 
see 2 serious problems, which apply both to the proposal above and to 
the current solution (even though these are currently the most pragmatic 
options):

1. libauparse only exists to reverse engineer a really bad protocol.
2. The existing protocol has already broken userspace many times.

On that second point, the changes since the protocol was introduced 
(pre-git history, so I can't work out when) have been such that any tool 
written at the time of 2.6.12 couldn't possibly expect to continue to 
function correctly if you updated the kernel underneath it. Some examples:

bccf6ae083318ea08094d6ab185fdf7c49906b3a
"audit_rate_limit=%d old=%d by auid %u" ->
"audit_rate_limit=%d old=%d by auid=%u"

9e45eeac867d51ff3395dcf3d7aedf5ac2812c8
Add escaping to comm field

a6c043a887a9db32a545539426ddfc8cc2c28f8f
Add tty field without quotes or escaping of value

ac03221a4fdda9bfdabf99bcd129847f20fc1d80
Remove qbytes field from IPC record
Change iuid, igid field names

5b9a4262232d632c28990fcdf4f36d0e0ade5f18
Convert some hex IPC records to octal

de6bbd1d30e5912620d25dd15e3f180ac7f9fcef
Change to format of EXECVE messages
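
To make the first of these concrete, here is a small Python sketch of 
how the audit_rate_limit change looks from userspace. The two format 
strings are taken from the commit above. The sample values (10, 0, 500) 
and both parsers are hypothetical, standing in for a tool written 
against the old text and for a generic key=value tokenizer.

```python
import re

# Hypothetical consumer written against the original message text.
OLD_PATTERN = re.compile(r"audit_rate_limit=(\d+) old=(\d+) by auid (\d+)")

def parse_rate_limit_change(message):
    """Return (new_limit, old_limit, auid) as ints, or None on no match."""
    match = OLD_PATTERN.search(message)
    return tuple(int(group) for group in match.groups()) if match else None

def parse_key_values(message):
    """Generic key=value tokenizer; tokens without '=' are dropped."""
    return dict(tok.split("=", 1) for tok in message.split() if "=" in tok)

# Sample messages rendered from the old and new format strings.
OLD_MESSAGE = "audit_rate_limit=10 old=0 by auid 500"
NEW_MESSAGE = "audit_rate_limit=10 old=0 by auid=500"
```

The tool matching the old text gets nothing back from NEW_MESSAGE, while 
the generic tokenizer never sees an auid key in OLD_MESSAGE. Neither 
parser handles both kernels unless it knows about the change.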

Auditd only continues to function because it has been updated in step 
with the kernel: it is 'special'. Upstream's opinion on this is fairly 
clear. Note this isn't an argument in favour of a binary format 
specifically (although I favour that for efficiency), but it does 
highlight the requirement for a new, well-designed format.

Matt
-- 
Matthew Booth, RHCA, RHCSS
Red Hat, Global Professional Services

M:       +44 (0)7977 267231
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490