Auditing - Snare, LAuS, SELinux

Thu Aug 26 03:28:29 UTC 2004

Hi guys, sorry for the slow replies - timezone offsets are challenging
sometimes. :) Apologies also for the length of this mail. There's a
summary at the end.

I'll add some responses to a variety of emails (Olaf/Jon/Pete) below:

Olaf mentioned:
>> + Integrated within SELinux
> I think that's a neutral, or even a minus from my perspective.
> I would very much prefer an audit solution that is agnostic of the
> security infrastructure you use.

Agreed. There's a lot of crossover among 'active' users of both SELinux
and auditing, but it would be good to be able to turn off one without
affecting the other.

> > - Two-part auditing (event, then returncode as separate info),
> > which makes the daemon side a little harder
> 
> Yes, that is something that needs to be changed. If you do stuff like
> switching audit files on the fly (i.e. writing to N different
> buckets), the records for call and outcome may end up in two different
> files, which is hard to deal with in the user land tools.

Very good point! That's one aspect I hadn't thought about.

> > 3) Snare's kernel component
> 
> Okay, so I'm going to comment on snare. Please don't take anything
> I say below personal, Leigh..

*laugh* No worries - I'm very up-front about the fact that the guys at
InterSect Alliance aren't experienced kernel hackers, and our
development strategy with Snare very much reflected that (ie: reduce
complexity in the kernel, and push it into user-space). So whilst some
of the development decisions may result in lower optimisation, they did
keep us from killing anything in the kernel ;)

> I thought the audit record format, and the daemon were rather
> complicated.

Actually, this is the area that probably had the most thought put into it
- the token-based system allows for easy follow-on processing, whilst
preserving some level of 'real human' readability, and the capability to
extend the fields, without breaking any assumptions made by follow-on
processing scripts.

I can't emphasise this enough - the output format MUST be developed with
follow-on processing in mind. Human-readability is fair enough for
something like syslog, where you're seeing relatively low,
human-manageable volumes. In enterprise-level audit situations, though,
we're seeing multiple gigabytes per day from potentially thousands of
different servers, and the only things that are going to be seeing the
raw data are:
* The audit processing scripts,
* The cdrecord/dvdrecord utilities, and perhaps
* Some unlucky auditor if forensic examination is required (though it's
likely that they'll be using a tool of some sort too).

The current snare format is token based, which means you can add new
tokens if required, multi-delimited (tabs separating tokens, commas
separating internal elements) for easy processing, and
delimiter-consistent - ie: no tabs/commas are allowed (unescaped) INSIDE
a field.
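
To make the format description above concrete, here is a minimal Python
sketch of a tab/comma-delimited token record. The token names and the
backslash-escaping scheme are assumptions made for illustration; this is
not the actual Snare record layout.

```python
# Hypothetical layout: tokens separated by tabs, elements inside a token
# separated by commas, delimiters backslash-escaped inside a field.
_ESCAPES = {"\\": "\\\\", "\t": "\\t", ",": "\\,", "\n": "\\n"}
_UNESCAPES = {"t": "\t", "n": "\n"}


def escape(value):
    """Escape delimiters so no raw tab/newline and no bare comma remains."""
    return "".join(_ESCAPES.get(ch, ch) for ch in str(value))


def unescape(value):
    """Reverse escape(): a backslash always introduces a two-char pair."""
    out, i = [], 0
    while i < len(value):
        if value[i] == "\\" and i + 1 < len(value):
            nxt = value[i + 1]
            out.append(_UNESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


def split_unescaped(text, delim):
    """Split on delim, skipping any delimiter that follows a backslash."""
    parts, current, i = [], [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the escape pair intact
            i += 2
        elif text[i] == delim:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def format_record(tokens):
    """tokens: list of (name, [elements]) pairs -> one line of text."""
    return "\t".join(
        ",".join([escape(name)] + [escape(e) for e in elements])
        for name, elements in tokens
    )


def parse_record(line):
    """Inverse of format_record(); unknown token names pass through."""
    tokens = []
    for raw in split_unescaped(line, "\t"):
        name, *elements = [unescape(p) for p in split_unescaped(raw, ",")]
        tokens.append((name, elements))
    return tokens
```

Because a consumer only ever splits on the two delimiters, a new token
can be appended to a record without breaking scripts that look up the
tokens they already know about.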

> Which is also fairly prone to deadlock if you start auditing system
> services such as nscd.

Tell me about it... applying audit to nscd on Solaris BSM is a sure-fire
way to crash the system. This was a real challenge in snare for Solaris
about a year ago. Linux seems immune thus far..

> messing around with the system call table

Yes, gone now.

> It also doesn't even attempt to resolve the path names passed into
> system functions, which is rather useless.

Actually, it sent the PWD, and the path sent to the system call, to the
audit daemon. The audit daemon then used a modified realpath() to bring
them together. So not in the kernel, but it did happen at the point of
filtering.
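
As a rough illustration of bringing the PWD and the syscall path
together in user space, here is a minimal Python sketch. The function
name is hypothetical, and the resolution is purely lexical: unlike a
real realpath(), it does not follow symlinks.

```python
import posixpath


def resolve_audit_path(pwd, path):
    """Combine the caller's working directory with the syscall path."""
    # join() discards pwd when path is already absolute;
    # normpath() then collapses "." and ".." components lexically.
    return posixpath.normpath(posixpath.join(pwd, path))
```

For example, a PWD of /home/leigh and a path argument of
../../etc/passwd would be reported as /etc/passwd.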

> From an evaluation perspective, I think the snare daemon also didn't
> fare very well when it came to assuring that audit records never got
> lost.

Yes. This is actually a design feature. The administrator could choose
how hard snare tried to ensure audit records weren't lost, by bumping up
the 'linked list' cache between kernel and daemon. So on systems where
audit was useful but not critical, resource usage stayed low, at the
cost of a reasonable chance of dropping events (eg: 2% in high load
situations). On systems where audit was more critical, the linked list
RAM could be bumped up, which meant a much reduced chance of audit loss.

I'd recommend preserving something along these lines in the 'final'
audit capability. We've seen time and time again, in many organisations,
the requirement for 'opportunistic auditing' - ie: Audit information is
important, but not at the expense of significant system performance.
Whilst we should have a 'must try as hard as possible to deliver audit'
facility to satisfy the hard-core, there should also be a 'try to
deliver audit, but if it's going to slow down the system too badly, drop
it'. Even in organisations like national intelligence agencies, such
opportunistic auditing is used on most systems (unless a security plan
specifically mandates otherwise). If audit uses too many resources, then
it will probably be turned off as a result of a security assessment (and
other countermeasures put in place instead - eg: air gaps).
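
The 'opportunistic' trade-off can be sketched as a bounded buffer that
drops events when full and counts what it dropped. This is a minimal
Python illustration only; the class and its names are hypothetical, and
it is not how the Snare kernel cache is implemented.

```python
from collections import deque


class AuditBuffer:
    """Bounded event cache: a bigger capacity means fewer drops."""

    def __init__(self, capacity):
        self.capacity = capacity  # administrator-chosen size
        self.events = deque()
        self.dropped = 0          # lost events, so loss is at least visible

    def push(self, event):
        """Queue an event; return False (and count it) if the cache is full."""
        if len(self.events) >= self.capacity:
            self.dropped += 1
            return False
        self.events.append(event)
        return True

    def drain(self):
        """Hand everything queued so far to the daemon side."""
        batch = list(self.events)
        self.events.clear()
        return batch
```

A 'must deliver' mode would instead make the producer wait when the
buffer is full, which is exactly the performance cost that opportunistic
auditing avoids.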

> One major question is portability - for laus we had to support
> i386, x86_64, ppc, ppc64, s390, s390x and are currently doing ia64.

Jon has already addressed this one. s390 & snare / ia64 and snare seem
to like each other also.

> I liked the GUI though; that would be something that'd be nice to
> have.

I think we can manage this. ;)

> What I am wondering is whether we actually need selinux or the LSM
> hooks to make a decision whether to log a call or not

Perhaps - if only to determine the returncode of an event. If anything
blocks the event from occurring (eg: a file from opening), this needs to
be reflected in the returncode provided to the audit daemon. It's no
good panicking an administrator that /etc/passwd was removed, when it
actually wasn't..

However, hopefully there's a point AFTER any normal/selinux access
controls are evaluated that we can pull the returncode from, as you
mention.. hopefully in a way that we can send the full audit event as a
single entity, rather than splitting it.
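
A user-space analogy of the 'single entity' idea, as a minimal Python
sketch: the record is only emitted once the outcome is known, so the
call and its returncode travel together. The helper name and the record
fields are assumptions for illustration, not a proposed kernel
interface.

```python
def audited(emit, name, func, *args):
    """Run func(*args) and emit ONE record holding call and outcome."""
    try:
        result = func(*args)
    except OSError as exc:
        # The operation was blocked or failed: report a negative errno,
        # so a denied unlink is never logged as a successful removal.
        emit({"event": name, "args": args, "return": -exc.errno})
        raise
    emit({"event": name, "args": args, "return": 0})
    return result
```

Since nothing is written before the outcome is known, call and result
cannot end up in two different audit files.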

> Maybe there's even a way we can tap into copy_from_user pretty much
> the same way Rik taps into getname.

This doesn't 'feel' like the right place.. but I'm not sure why yet (nor
can I offer an alternative at this stage).

> > * Can we make things generic enough to cover pluggable system call
> > interception?
> What do you mean by this?

I was just thinking of making the 'syscalltrack' guys' jobs easier - ie:
provide a mechanism whereby an external entity can 'request' that an
(almost) arbitrary system call be monitored. Not a priority though.

Olaf and Jon mentioned:
> | I seriously believe that filtering in the user land is wrong. At most,
> | the daemon should shovel fully cooked records from the kernel into
> | the audit trail files and potentially other consumers (such as IDSs).
> 
> Snare does do preliminary filtering in the kernel, deciding which
> syscall audit records should be passed to the userland based on what
> objectives are specified to the daemon.. i.e., if you don't specify
> any filters on open(), no open syscall audit record will be passed to
> the daemon.

And Olaf then said:
> Doing regular expressions in user land sounds nifty - but is that
> really essential? I think most of the time people will want to audit
> based on the directory hierarchy, e.g. "all of /etc and /usr except
> /usr/tmp" and stuff like that. That requires no regexp matching.

Agreed. (non-event) Filtering was only really done in userspace, as we
weren't confident enough to do it in the kernel.
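
Olaf's "all of /etc and /usr except /usr/tmp" case needs nothing more
than prefix comparisons. A minimal Python sketch follows; the function
names and the exclude-wins rule are assumptions for illustration.

```python
def under(path, directory):
    """True if path is the directory itself or lies beneath it."""
    directory = directory.rstrip("/")
    # Compare on a component boundary so /etcetera does not match /etc.
    return path == directory or path.startswith(directory + "/")


def should_audit(path, include, exclude):
    """Exclusions win; otherwise audit anything under an included tree."""
    if any(under(path, d) for d in exclude):
        return False
    return any(under(path, d) for d in include)
```

With include set to /etc and /usr and exclude set to /usr/tmp,
/usr/bin/ls is audited and /usr/tmp/scratch is not.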

We've used regexp matching, but in general, find that plain old wildcard
facilities are more than adequate for 99% of users' requirements. I would
suggest that 'suffix matches' would also be reasonable (ie: /etc/* -
which is nice and easy to code), but we have found that many agencies
take advantage of mid-string wildcards to do things like:
"Only audit files within a 'SECRETSTUFF' directory" (for example), which
would require a match like "*/SECRETSTUFF/*"

So as Jon mentioned, prefix and suffix matches are recommended if
matches are to be done in-kernel.
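
For comparison, here is what plain wildcard matching looks like in user
space, as a minimal Python sketch using the standard fnmatch module.
The helper name is hypothetical. Note that fnmatch's '*' also matches
'/', and that '?' and '[...]' are treated as wildcards too.

```python
from fnmatch import fnmatchcase


def matches_any(path, patterns):
    """True if path matches at least one shell-style wildcard pattern."""
    return any(fnmatchcase(path, p) for p in patterns)
```

Here "/etc/*" covers /etc/ssh/sshd_config, and "*/SECRETSTUFF/*" covers
any file below a SECRETSTUFF directory wherever it sits in the tree.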

Olaf mentions, commenting on Jon's point:
> > Only if you are running with audit enabled, yes?  I don't know how
> Yes, but from a distributor's point of view we want an audit solution
> that we can enable in all our kernels without a negative impact on
> performance. We cannot do a separate audit-enabled version of every
> kernel :)

*grin* .. and to find the holy grail while you're at it ;)
I think Jon was implying 'only with audit=1 passed to the kernel',
rather than 'separate kernel'.

Pete then came in with:
> You can make use of SELinux policy files to create rules for
> generating audit records.

From the file Pete attached, I get the impression that file-auditing is
sorta dependent on selinux. Is this the case, or would file auditing be
independent of selinux?

Both approaches have their advantages:
* The SELinux approach is a little like Windows - you have to
specifically request that a file have audit applied. This certainly
reduces both CPU usage, and audit volume, allowing you to target
specific files or directories.
- Unfortunately, this approach means that wildcard file matches are out
the window, and that controlling audit configurations is a real problem
(you have to scan through the entire file system to determine what files
are being audited).

* The in-kernel approach is a little like Solaris BSM / AIX and a few
other approaches. This means that you have a great deal of flexibility,
and are not reliant on external ACL managers.
- However, CPU and resource usage is a real problem on Solaris, where
you have no filtering capability (unless you install snare for Solaris
;). Turn on file auditing on Solaris, and on any reasonable production
system, your audit partition soon fills with data of questionable value.

I'd go with option 2 (the in-kernel approach) personally. The BSM
approach + reasonable filtering is a good move. Could we then make the
SELinux file-auditing capability a little more efficient?

So, in summary:
* Agree - SELinux dependency is probably not optimal. No problem with
SELinux being dependent on audit though.
* Snare's not optimised - it passes most stuff to user space for
resolution. However, there are a few things that are worth retaining, I
think:
- Tokenised, consistent audit log format, that is MACHINE readable as a
primary goal.
- Variable resource usage, dependent on the administrator's assertion of
criticality.
- Filtering is a very good thing, but regexps are probably overkill
(wildcards are good).
* Decoupling from SELinux would be handy (again). If decoupled, does
SELinux really NEED to do all that "auditallow * etc_t:file *;" stuff,
or can we provide a more efficient interface to allow it to 'add' a file
event into the mix? What if SELinux wants to set up an event for every
file on the file system? Would it try to create a new event for each?
And at what cost in memory?

Regards,

Leigh.

-- 
Leigh Purdie, Director - InterSect Alliance Pty Ltd
http://www.intersectalliance.com/