Kernel audit output is inconsistent, hard to parse

Wed Jan 30 16:19:37 UTC 2008

Steve Grubb wrote:
> On Tuesday 29 January 2008 17:56:36 John Dennis wrote:

> Hence the audit parsing library. The idea is to abstract this away so that 
> anyone wanting to write a tool does not need to study all the messages and 
> figure out the parsing rules.

> The way forward has to be the audit parsing library.

The problem is auparse is just as screwed as anybody else. Unparseable 
output is just plain wrong and inexcusable. You're suggesting auparse 
embed all sorts of hacks and heuristics to unravel a problem which 
should never exist in the first place. It's a house of cards which in 
time will collapse. You also haven't explained how auparse is going to 
deal with log data generated by different kernel versions, especially 
when logs are aggregated.

> tools developed around these messages and making wholesale changes will break 
> them.

Break what is already fundamentally broken? That's not an answer ;-)

> Any fix will break someone's tool somewhere unless they are coded to the audit 
> parsing library.

auparse is going to break too. In the current situation you can't 
determine whether a field is encoded just by reading the output; you 
also have to know the kernel source code. That's wrong.

>> Auparse is not the answer to irregular kernel audit message

> This is the answer in so many ways. In order to make any change, you have to 
> decouple applications from the actual data structure. You cannot normalize 
> the data without breaking somebody somewhere. 

Which is why making the output parseable independent of the kernel 
version is an essential requirement.

> For example, suppose we all agreed the data structure is an abomination and 
> had to be fixed. We get all the code into 2.6.26 kernel. meanwhile Fedora 9 
> is released on the 2.6.24 kernel. We get the user space pieces fixed up to be 
> released at the same time as 2.6.26. Then Fedora steps up to 2.6.25 kernel 
> and then ultimately 2.6.26. The userspace in Fedora 9 was never intended to 
> work with the new format. We can't keep the kernel team from doing what's 
> right for everyone that wants new device drivers. We're stuck.

You're only stuck if the output can only be parsed by one version. If 
the output were regular the problem would go away. Isn't that the 
desired result?

>> auparse_get_field_str() returns the field value in its encoded form,

> I would choose the words, raw form.

Yes, raw is a better term. Some raw values are encoded, some aren't; 
that's the problem.

>> this is almost never of value to the caller. The caller wants the
>> field value to be unencoded so it can operate on it.
> 
> Sometimes. It depends on the situation.

Very rarely. As an analogy, 99.99% of the time you want your email 
client to decode the contents from the transfer encoding they arrived 
in; otherwise it's just gibberish. Raw form is really only useful when 
debugging the encoding/decoding.

>> If you want the field value to be unencoded you have to call
>> auparse_interpret_field().
> 
> Correct.

>> But auparse_interpret_field() performs two distinctly different operations,
> 
> It does only one thing, that is translate the data from raw to interpreted 
> form.

Wrong :-) It does two entirely different things and those operations 
cannot be separated. The two operations are:

1) decoding (e.g. decoding a field value encoded in hexadecimal form 
back into its original string)

2) interpretation (e.g. translating a uid field into a username). I call 
this interpretation "contextual substitution" because it's taking a 
field value and substituting in another value, often in a different 
format. You cannot interpret a field value until it has been decoded.

What if I don't want auparse to change the field value and instead 
simply return the field value? Currently you can't simply get the field 
value! Why? Because some fields are encoded, so you either get the raw 
encoded value (which, if it was encoded, is meaningless 99.99% of the 
time) or you get something which is completely munged.
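
To make the distinction concrete, here is a rough sketch of the two 
operations kept as separate steps. This is not auparse code: the 
function names decode_hex() and interpret_uid() and the USERS table are 
hypothetical, made up purely for illustration.

```python
# Hypothetical sketch: none of these names are part of auparse.

def decode_hex(raw):
    """Step 1, decoding: turn a hex-encoded field back into its original string."""
    return bytes.fromhex(raw).decode("utf-8", errors="replace")


def interpret_uid(value, users):
    """Step 2, interpretation: substitute a username for a numeric uid.

    Falls back to the unchanged value when the uid is unknown.
    """
    return users.get(int(value), value)


# Made-up lookup table standing in for the system's user database.
USERS = {0: "root", 500: "jdennis"}

decoded_path = decode_hex("2F686F6D65")      # decoding only, no substitution
user_name = interpret_uid("500", USERS)      # substitution only, no decoding

print(decoded_path)
print(user_name)
```

With the steps separated like this, a caller who only wants the plain 
field value can stop after decoding and never trigger the substitution.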

> So, John, if you want selinux format changes, complain on their mail list. 
> I've already done that and lost. :)

FWIW, I can live with not changing the message contents. But no one can 
live with a situation where the data can't be parsed; it is simply 
wrong. Just to be clear, the problem is that you can't determine as you 
parse whether a field value is encoded, which means you can't decide 
whether it has to be decoded.

Here is an example from the real world. An audit message has this field:

comm=df

So is the value the string "df" (e.g. disk free) or is this the 
hexadecimal encoded byte value 223? The only way to know is by looking 
at the kernel source code and knowing that the "comm" field in a 
specific audit record is generated by calling 
audit_log_untrustedstring(). What if it doesn't call that in a 
different kernel version? What if a new field is added in a new kernel 
version? How will the parser know which function the kernel used to 
generate the string? What if in one kernel version the string was output 
with audit_log_untrustedstring() but in another kernel version it wasn't?
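
The ambiguity is easy to reproduce. The following is only a sketch with 
hypothetical helper names (read_as_plain, read_as_hex, looks_like_hex); 
it shows that both readings of "df" are valid and that a "does it look 
like hex?" heuristic cannot choose between them.

```python
import string


def read_as_plain(raw):
    """Assume the value was written as-is."""
    return raw.encode("ascii")


def read_as_hex(raw):
    """Assume the value was hex-encoded."""
    return bytes.fromhex(raw)


def looks_like_hex(raw):
    """Naive heuristic: an even number of hex digits 'might' be hex."""
    return len(raw) % 2 == 0 and all(c in string.hexdigits for c in raw)


raw = "df"  # the value from comm=df

plain = read_as_plain(raw)
hexed = read_as_hex(raw)

print(plain)                # b'df'
print(hexed)                # b'\xdf'
print(hexed[0])             # 223
print(looks_like_hex(raw))  # True, even if the value really was the string "df"
```

Both calls succeed and return different bytes, so nothing in the field 
itself tells the parser which answer is the right one.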

-- 
John Dennis <jdennis at redhat.com>