Kernel audit output is inconsistent, hard to parse

Wed Jan 30 14:26:58 UTC 2008

On Tuesday 29 January 2008 5:56:36 pm John Dennis wrote:
> The format of audit messages from the kernel is a mess. The
> bottom line is one cannot parse the audit messages without special
> case knowledge of each audit message because the data formatting does
> not follow any regular rules.
>
> I don't know how it got this way, but it really needs to be fixed.

I agree.

> Suggested Fix:
> --------------
>
> Most of these problems can easily be fixed if there is exactly one
> central place to format an audit field value. The function
> audit_log_vformat() could very easily ensure consistent formatting via
> % format specifiers in the format string, e.g.:
>
>      audit_log_format("n=%d path=%s", n, path);
>
> Building audit output piecewise would be deprecated, e.g. these types
> of sequences would be eliminated.
>
>      audit_log_format(ab, " n=%d", n);
>      audit_log_format(ab, " name=");
>      audit_log_foo();
>
> and replaced with:
>
>      audit_log_format(ab, " n=%d name=%s", foo_to_string(foo));
>
> Whenever audit_log_vformat() encounters a % format specifier it
> formats to a string, then it converts the string to an escaped quoted
> string, and then inserts the escaped quoted string into the buffer
> (e.g. n="123" name="foo bar\n" )
>
> This way the formatting is consistent, easy to apply, and is never
> special cased by the caller.

I'm afraid that your proposed solution is not enough as it still relies on the 
caller for the overall message format.  I'm a firm believer that if you have 
strict requirements that need to be met (I think we do, your concerns are not 
new) you need to enforce those requirements through the API, relying on 
others to adhere to a "gentlemen's agreement" regarding the careful use of a 
flexible API is always going to be problematic.

What I have in mind is an API similar to the following:

 /* Audit field names */
 enum {
   AUD_NAME_UID,
   AUD_NAME_GUID,
   AUD_NAME_PATH,
   ... etc ...
 }

 /* Audit field value types */
 enum {
   AUD_TYPE_BOOL,
   AUD_TYPE_U32,
   AUD_TYPE_S32,
   AUD_TYPE_U64,
   AUD_TYPE_S64,
   AUD_TYPE_STRING,
   AUD_TYPE_STRING_RAW,
   ... etc ...
 } audit_value_type;

 /**
  * audit_log_field - Record a name=value field in the audit message
  * @buffer: Audit buffer/message
  * @name: Audit message field name
  * @type: Audit message field value type
  * @length: Audit message field value length
  * @value: Audit message field value
  *
  * Description:
  * Record a name=value pair in the audit message specified by @buffer,
  * the value in @value is automatically formatted according to @type.
  */
 void audit_log_field(buffer, name, type, length, value);

So the following code snippet would record a UID and PATH fields:

 audit_log_field(buffer, AUD_NAME_UID, sizeof(my_uid), my_uid);
 audit_log_field(buffer, AUD_NAME_PATH, my_path_len, my_path);

The way I see it, there are three main advantages to this approach: first, the 
caller must use a well defined field name, second, the caller is not 
responsible for any of the message formatting, third, we free the API from 
dealing with audit messages in a particular format (i.e. the message no 
longer needs to be in a string representation, it could be in a binary 
format ... or both).  Done right, an API similar to the one above should 
solve a good deal of the variability in the audit messages while at the same 
time opening the door for future optimizations.

> Auparse is not the answer:
> --------------------------
>
> Auparse is not the answer to irregular kernel audit message
> formatting. First of all it forces auparse to have special case logic
> which is not 100% robust and is tied to the kernel source code
> version.

I also agree.

> While we're at it:
> ------------------
>
> If we do fix the format of audit messages we might as well fix some
> other inconsistencies at the same time.
>
> 1) The initial part of AVC messages do not follow the standard
>     name=value formatting used everywhere else in audit.

I believe there are long standing issues here, revolving around SELinux's 
desire to emit AVC messages regardless of the state of audit.  I can't say 
I've payed too much attention to this in the past but there should be plenty 
of info in the archives (check the SELinux list too).

You might be better off dealing with the above API changes first then dealing 
with this.

> What has to change and what's optional:
> ---------------------------------------
>
> The formatting of name/value pairs in the kernel must be fixed, it is
> simply impossible to correctly parse in it's current state.
>
> The rest of the suggested changes are syntactic sugar which would make
> parsing easier because of regular syntax, but they are not
> critical. We could retain the existing formats if backwards
> compatibility is felt to trump syntactic cleanliness and ease in
> parsing. It's a judgment call over when and how to introduce change
> and the anticipated impact.

All reasons for why I think we need to remove as much of the formatting 
decisions from the caller.

-- 
paul moore
linux security @ hp