Auditing - Snare, LAuS, SELinux

Wed Sep 1 19:33:44 UTC 2004

Hello,

We have some new people on the list; welcome.  The latest post follows.
I have added my comments to Leigh's summary at the end.
> > I'm very much in favor of binary formats, actually. printf formatting
> > wastes a lot of time and space on stuff that nobody's ever going to look
> > at anyway, except using a toolset.
> 
> I have to disagree here. I have no problems with binary as a
> 'transition' format between kernel and userspace, but you're not really
> saving the conversion cost of binary to text - you're only delaying it.
> I'll explain...
> 
> In our experience, almost no organisations actually use 'auditreduce' or
> similar tools to 'select out' events of interest, except for situations
> where an administrator wants a quick-and-dirty scan on a current audit
> log (though, even in this case they are more likely to convert to text,
> and grep).
> 
> Solaris and IRIX have both gone down this path - mostly because it's
> easier for the programmer, rather than easier for the user. I understand
> this approach, but consider an audit solution with your proposed
> in-kernel filtering (eventid, wildcards, user selection, etc), where the
> only events being delivered to user space are the ones the user is
> likely to be interested in (and will therefore wish to convert to text).
> I think you'll see what I'm getting at.
> 
> Binary records (ie: stored as binary on disk, rather than used as a
> transition format between kernel and daemon) can be a real problem -
> we've run into the following examples:
> 
> * Trying to conduct forensic investigations on another machine is very
> difficult:
>  - UID mappings are often different, so UID '123' on one machine
> translates to an entirely different user on another. Converting to text
> format (resolving UID 123 to a user name) on the host machine at daemon
> write-time solves this problem. The same problem exists on Windows,
> with SID mappings.
>  - Even if analysed later on the same machine, user 123 may have been
> deleted.
>  - Trying to view a Windows binary event log on a machine that it was not
> generated on can cause massive problems - the "string conversion DLLs"
> that exist on one machine (eg: an Exchange server) to translate event
> strings to text format may not be installed on another machine, so
> Windows reports garbage.
> 
> * Disk forensics are more difficult:
>  - There have been situations where a system (including its audit log)
> has been utterly trashed. dd if=/dev/hda | strings (or slightly more
> targeted tools) can recover some potential audit strings of interest if
> you can kill the system quickly enough, before the attacker does too
> much overwriting. Pulling out binary data is a little harder when you
> don't have inodes to work with.
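To illustrate the "binary between kernel and daemon, text on disk" idea: a minimal sketch, assuming a hypothetical binary record layout (struct audit_bin_rec and audit_rec_to_text() are illustrative, not the Snare or LAuS format). The daemon resolves the UID with getpwuid() on the originating host at write time and records both the number and the name, so the line still means something on another machine, after the account is deleted, or when carved out of a raw disk with strings.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical binary record, as handed from kernel to daemon. */
struct audit_bin_rec {
	long  timestamp;   /* seconds since the epoch */
	uid_t uid;         /* numeric UID: only meaningful on this host */
	int   syscall_nr;
	int   result;
	char  path[256];
};

/*
 * Render one record as a text line at daemon write time. Both the
 * numeric UID and the name it maps to on this host right now are
 * written. Returns snprintf()'s result.
 * (A real format would escape spaces/control characters in path.)
 */
int audit_rec_to_text(const struct audit_bin_rec *r, char *out, size_t outlen)
{
	const struct passwd *pw;
	const char *name;
	struct tm tm;
	char when[32];
	time_t t;

	assert(r != NULL && out != NULL);
	pw = getpwuid(r->uid);
	name = pw ? pw->pw_name : "(unknown)";
	t = (time_t)r->timestamp;
	gmtime_r(&t, &tm);
	strftime(when, sizeof when, "%Y-%m-%dT%H:%M:%SZ", &tm);
	return snprintf(out, outlen,
	                "time=%s uid=%u user=%s syscall=%d result=%d path=%s",
	                when, (unsigned)r->uid, name, r->syscall_nr,
	                r->result, r->path);
}
```

Without escaping, a crafted file name containing spaces could forge extra fields, so a production format would need to quote or encode the path.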
> 
> > What we're currently doing is a generic header that's the same for
> > all system call audit records, followed by the list of arguments encoded
> > as TLV.
> 
> Sounds pretty reasonable. :)  Note though, we're doing something
> reasonably similar in Snare - ie: the audit daemon reads a small header
> to determine what type of audit event is being sent, and how much more
> to read. This means that there are two read()s per audit event.
> I'd recommend implementing something that allows a big, bulk read into
> a user-space cache, which then gets broken up internally within the
> daemon into events. You're probably already doing this though.
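A minimal sketch of the bulk-read approach, assuming a hypothetical header made of a 16-bit type and a 16-bit length that covers the whole record (header included). The daemon fills a 64 KiB buffer with one read(), hands every complete record to a callback, and carries a trailing partial record over to the next read(). audit_split(), audit_read_loop() and audit_count_cb() are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical on-the-wire header; len covers header + body. */
struct audit_hdr {
	uint16_t type;
	uint16_t len;
};

typedef void (*audit_cb)(const struct audit_hdr *h,
                         const unsigned char *body, size_t body_len,
                         void *ctx);

/*
 * Dispatch every complete record in buf[0..avail). Returns the number of
 * bytes consumed (a trailing partial record is left for the caller to
 * complete with the next read()), or -1 on a malformed header.
 */
long audit_split(const unsigned char *buf, size_t avail,
                 audit_cb cb, void *ctx)
{
	size_t off = 0;

	assert(cb != NULL);
	while (avail - off >= sizeof(struct audit_hdr)) {
		struct audit_hdr h;

		memcpy(&h, buf + off, sizeof h);   /* buf may be unaligned */
		if (h.len < sizeof h)
			return -1;
		if (avail - off < h.len)
			break;                     /* incomplete record */
		cb(&h, buf + off + sizeof h, h.len - sizeof h, ctx);
		off += h.len;
	}
	return (long)off;
}

/*
 * One large read() per batch instead of two read()s per event. Because
 * len is 16 bits, any single record fits in the 64 KiB buffer.
 * Returns 0 at EOF (a trailing partial record is discarded), -1 on error.
 */
int audit_read_loop(int fd, audit_cb cb, void *ctx)
{
	unsigned char buf[65536];
	size_t have = 0;

	for (;;) {
		ssize_t n = read(fd, buf + have, sizeof buf - have);
		long used;

		if (n <= 0)
			return n == 0 ? 0 : -1;
		have += (size_t)n;
		used = audit_split(buf, have, cb, ctx);
		if (used < 0)
			return -1;
		memmove(buf, buf + used, have - (size_t)used);  /* keep tail */
		have -= (size_t)used;
	}
}

/* Example callback: counts records; ctx points at a size_t. */
void audit_count_cb(const struct audit_hdr *h, const unsigned char *body,
                    size_t body_len, void *ctx)
{
	(void)h;
	(void)body;
	(void)body_len;
	++*(size_t *)ctx;
}
```

The design choice here is to make the kernel side free to batch: the daemon never needs to know record boundaries before it reads.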
> 
> > Concerning the snare format, what I was actually wondering about was
> > the fact that you seem to have several different classes of records
> > for system call auditing.
> 
> Yes. I was never really happy with the way I implemented this. Jon has
> some ideas that should make this a lot nicer, at a slightly increased
> RAM cost.
> 
> It was an attempt to somewhat optimise memory usage. Some events (eg:
> link, copy, etc.) have a requirement for a source and destination
> filename, but I didn't want to have a single structure for every event
> that preallocated MAX_PATH x 2 - particularly when something like
> set*uid() never uses a path. So I had a structure for 'file access
> events', another for 'file copy events' another for 'userid events' ..
> and a couple of others. It worked OK, but wasn't pretty, and slowed down
> user/kernel interaction.
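For comparison, a sketch of the variable-length alternative, assuming hypothetical tags (ARG_PATH, ARG_PATH2, ARG_UID) in a tag-length-value (TLV) encoding like the one mentioned above. A set*uid() event carries one 8-byte field and a link event carries exactly the bytes of its two paths; the scratch buffer is reused, and only its first `used` bytes are sent or stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical argument tags. */
enum { ARG_PATH = 1, ARG_PATH2 = 2, ARG_UID = 3 };

/* Scratch buffer; only the first `used` bytes are sent or stored. */
struct tlv_buf {
	unsigned char data[8192];
	size_t used;
};

/* Append one field: 2-byte tag, 2-byte length, then the value.
 * Returns 0 on success, -1 if it would not fit. */
int tlv_put(struct tlv_buf *b, uint16_t tag, const void *val, uint16_t len)
{
	assert(b != NULL && (val != NULL || len == 0));
	if (sizeof b->data - b->used < 4u + (size_t)len)
		return -1;
	memcpy(b->data + b->used, &tag, 2);
	memcpy(b->data + b->used + 2, &len, 2);
	if (len)
		memcpy(b->data + b->used + 4, val, len);
	b->used += 4u + (size_t)len;
	return 0;
}

/* A path is stored without its terminating NUL. */
int tlv_put_path(struct tlv_buf *b, uint16_t tag, const char *path)
{
	size_t n = strlen(path);

	if (n > UINT16_MAX)
		return -1;
	return tlv_put(b, tag, path, (uint16_t)n);
}

/* Return the first field with `tag` (setting *len), or NULL. */
const unsigned char *tlv_find(const struct tlv_buf *b, uint16_t tag,
                              uint16_t *len)
{
	size_t off = 0;

	while (b->used - off >= 4) {
		uint16_t t, l;

		memcpy(&t, b->data + off, 2);
		memcpy(&l, b->data + off + 2, 2);
		if (b->used - off - 4 < l)
			return NULL;               /* truncated field */
		if (t == tag) {
			*len = l;
			return b->data + off + 4;
		}
		off += 4u + l;
	}
	return NULL;
}
```

One generic record type replaces the per-class structures, at the cost of a linear search when a consumer wants a specific argument.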
> 
> > > Actually, it sent the PWD, and the path sent to the system call, to the
> > > audit daemon. The audit daemon then used a modified realpath() to bring
> > > them together. So not in the kernel, but it did happen at the point of
> > > filtering.
> > 
> > There is also a race condition here - by the time the audit daemon puts
> > together the full path, the file system may have changed already. On a
> > slow file system such as NFS, the delay may be long enough for attackers
> > to pretty reliably conceal their tracks (symlink flipping on NFS is a
> > fairly well-known technique to skew the odds in a file access race
> > condition towards the attacker).
> 
> Yup - there are a few problems like this. However, I think we all agree
> that the kernel side of Snare's not the ideal code to use as a basis for
> the new audit subsystem. :) We'll keep trying to improve it during the
> transition phase though, so comments like this are appreciated.
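For illustration, the PWD-plus-argument combination can also be done purely lexically, without touching the file system. A minimal sketch (join_path() is hypothetical, not Snare's modified realpath()): it collapses ".", ".." and repeated slashes. Because it never consults the file system it cannot itself race with a symlink flip, but for the same reason it does not follow symlinks, so the result can differ from the object the kernel actually opened. Only capturing the resolved path in the kernel at syscall time closes that gap.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Lexically combine cwd and path into out (no file system access).
 * Absolute paths ignore cwd. Returns 0, or -1 if a buffer is too small.
 * NOTE: ".." simply drops the previous component, which is wrong when
 * that component is a symlink - the kernel would have followed it.
 */
int join_path(const char *cwd, const char *path, char *out, size_t outlen)
{
	char tmp[4096];
	char *tok, *save;
	size_t len = 0;   /* current length of the result in out */
	int n;

	assert(cwd != NULL && path != NULL && out != NULL);
	if (path[0] == '/')
		n = snprintf(tmp, sizeof tmp, "%s", path);
	else
		n = snprintf(tmp, sizeof tmp, "%s/%s", cwd, path);
	if (n < 0 || (size_t)n >= sizeof tmp || outlen < 2)
		return -1;

	out[0] = '\0';
	for (tok = strtok_r(tmp, "/", &save); tok != NULL;
	     tok = strtok_r(NULL, "/", &save)) {
		size_t tl;

		if (strcmp(tok, ".") == 0)
			continue;
		if (strcmp(tok, "..") == 0) {
			/* drop the last component; ".." at the root stays there */
			char *slash = strrchr(out, '/');
			len = slash ? (size_t)(slash - out) : 0;
			out[len] = '\0';
			continue;
		}
		tl = strlen(tok);
		if (len + 1 + tl + 1 > outlen)
			return -1;
		out[len++] = '/';
		memcpy(out + len, tok, tl + 1);   /* includes the NUL */
		len += tl;
	}
	if (len == 0)
		strcpy(out, "/");
	return 0;
}
```

For example, join_path("/home/u", "../etc/./passwd", ...) yields "/home/etc/passwd", which is only correct if /home/u is not a symlink.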
> 
> > > Yes. This is actually a design feature. The administrator could choose
> > > how hard snare tried to ensure audit records weren't lost, by bumping up
> > > the 'linked list' cache between kernel and daemon. So on systems where
> > > audit was useful but not critical, resource usage stayed low, at the
> > > cost of a reasonable chance of dropping events (eg: 2% in high load
> > > situations). On systems where audit was more critical, the linked
> > > list RAM was bumped up, which meant a much reduced chance of audit loss.
> > 
> > But the CAPP requirement is that you never lose a single audit record.
> 
> I know.. I know.. Urk.. I even remember writing some of these national
> policy documents while at a previous employer, and working with the
> people that conducted evaluations.
> 
> Keep in mind though, that almost all national security policy documents,
> and evaluation teams (including the 'common criteria' evaluation teams),
> use a risk-assessed approach to such evaluations. Don't think of CAPP as
> a 'tick a box, must have, feature list before you get approval', think
> of it more as guidelines that provide an overview of a 'security
> profile' that a product needs to have before being accepted for
> certification. If appropriate justification or alternative
> countermeasures are considered to be adequate, a particular feature can
> be granted a waiver. Have a yack to the evaluation team if in doubt -
> they're generally nice guys ;)
> 
> A good example of this is Snare, I guess... Not CAPP evaluated, but
> close enough that it allowed Linux into a lot of places that wouldn't
> otherwise have considered the OS.
> 
> > And depending on what you audit, your message queue will never be long
> > enough. That's why I think a process trying to deliver an audit record
> > should stall if there's no room in the queue. In fact, that's a fairly
> > simple way to deal with terminal audit problems as well - if the disk
> > fills up, the audit daemon simply stops accepting new events from the
> > kernel, causing all audited processes to stall.
> 
> I agree conceptually. In practice though, I have never seen it used
> operationally in any organisation, ever. As such, if it needs to be in
> there to keep the CAPP guys happy, no worries... but make sure that
> administrators can turn it off, so they don't have to explain to the CIO
> why their web server is no longer accepting requests - even though the
> only partition that's full is the audit one. ... but I figure that you
> know this already from your comment below ;)
> 
> > Agreed. Evaluation requirements and real-world scenarios don't necessarily
> > have much overlap.
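The trade-off discussed above (a bounded kernel-to-daemon queue that either drops records or stalls the producer when full) can be modelled in user space with POSIX threads. A minimal sketch under stated assumptions: the capacity AQ_SIZE, the policy names AQ_DROP/AQ_BLOCK and the lost-record counter are illustrative, not part of Snare or LAuS. AQ_BLOCK models the CAPP-style "never lose a record" behaviour: if the daemon stops consuming (for example because the audit partition is full), producers stall. AQ_DROP models the administrator-selectable fallback and counts what it drops, so the loss is at least visible.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define AQ_SIZE 64   /* illustrative capacity */

enum aq_policy { AQ_DROP, AQ_BLOCK };

struct audit_queue {
	void *slots[AQ_SIZE];
	size_t head, count;
	unsigned long lost;          /* records dropped under AQ_DROP */
	enum aq_policy policy;
	pthread_mutex_t lock;
	pthread_cond_t not_full, not_empty;
};

void aq_init(struct audit_queue *q, enum aq_policy policy)
{
	q->head = q->count = 0;
	q->lost = 0;
	q->policy = policy;
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->not_full, NULL);
	pthread_cond_init(&q->not_empty, NULL);
}

/* Producer side (the audited process). Returns 0 if queued, -1 if dropped. */
int aq_put(struct audit_queue *q, void *rec)
{
	int rc = 0;

	assert(q != NULL);
	pthread_mutex_lock(&q->lock);
	if (q->count == AQ_SIZE && q->policy == AQ_DROP) {
		q->lost++;
		rc = -1;
	} else {
		while (q->count == AQ_SIZE)     /* AQ_BLOCK: stall the caller */
			pthread_cond_wait(&q->not_full, &q->lock);
		q->slots[(q->head + q->count) % AQ_SIZE] = rec;
		q->count++;
		pthread_cond_signal(&q->not_empty);
	}
	pthread_mutex_unlock(&q->lock);
	return rc;
}

/* Consumer side (the audit daemon). Blocks until a record is available. */
void *aq_get(struct audit_queue *q)
{
	void *rec;

	pthread_mutex_lock(&q->lock);
	while (q->count == 0)
		pthread_cond_wait(&q->not_empty, &q->lock);
	rec = q->slots[q->head];
	q->head = (q->head + 1) % AQ_SIZE;
	q->count--;
	pthread_cond_signal(&q->not_full);
	pthread_mutex_unlock(&q->lock);
	return rec;
}
```

Keeping the policy a runtime setting is what lets the same code satisfy an evaluation configuration and the "don't take the web server down" case.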
> 
> 
> > As far as I understood from the Red Hat folks, auditing files based
> > on their security labels was what they wanted to do. It makes sense if
> > you're doing selinux, and a common audit implementation would definitely
> > have to allow for it (but not mandate it).
> 
> Yeah, agree.
> 
> > What about audit hooks that provide the following services for selinux
> > and other LSM frameworks:
> > 
> >  -	a function that tells the audit framework
> >  	"I decided to audit this system call, please do that for me"
> >  -	a function that tells the audit framework
> > 	"here's an additional blob of information, please attach that
> > 	to the audit record"
> > 
> > Would that be sufficient? What else would be required?
> 
> And possibly a "Here's an audit record (maybe just an arbitrary string)
> that my application considers a critical audit event, can you insert it
> into the audit log for me" system call - where the audit subsystem
> injects a date/time and zaps it out the door.
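The three services discussed above could look roughly like this. Everything here is hypothetical and modelled in user space: the function names, the fixed-size per-syscall context and the output format are assumptions for illustration, not an existing kernel or LSM interface. An LSM marks the current system call for auditing and attaches blobs (such as an SELinux context string); at syscall exit the audit core emits a record only if something asked for one. The user-message path stamps an application-supplied string with the time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_BLOBS 4

/* Hypothetical per-syscall audit context (per task in a real kernel). */
struct audit_ctx {
	int syscall_nr;
	int audit_requested;             /* set via audit_request() */
	const char *blobs[MAX_BLOBS];    /* extra data attached by an LSM */
	int nblobs;
};

/* "I decided to audit this system call, please do that for me." */
void audit_request(struct audit_ctx *c)
{
	c->audit_requested = 1;
}

/* "Here's an additional blob of information, please attach it." */
int audit_attach(struct audit_ctx *c, const char *blob)
{
	if (c->nblobs == MAX_BLOBS)
		return -1;
	c->blobs[c->nblobs++] = blob;
	return 0;
}

/* At syscall exit: format one record if anything requested it.
 * Returns the length written to out, or 0 if nothing was requested. */
size_t audit_syscall_exit(const struct audit_ctx *c, char *out, size_t outlen)
{
	size_t n;

	assert(outlen > 0);
	if (!c->audit_requested)
		return 0;
	n = (size_t)snprintf(out, outlen, "syscall=%d", c->syscall_nr);
	for (int i = 0; i < c->nblobs && n < outlen; i++)
		n += (size_t)snprintf(out + n, outlen - n, " %s", c->blobs[i]);
	return n < outlen ? n : outlen - 1;   /* clamp if truncated */
}

/* Application-supplied event: the audit side adds the timestamp. */
int audit_user_message(const char *msg, time_t now, char *out, size_t outlen)
{
	return snprintf(out, outlen, "time=%lld user_msg=\"%s\"",
	                (long long)now, msg);
}
```

The point of the split is that the LSM keeps its own decision logic, while formatting, timestamps and delivery stay in one place.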
> 
> > Would it be better to allow selinux to use the generic filtering machinery
> > of the audit subsystem? In that case, it would need a mechanism to specify
> > additional bits of information that could be used in filter evaluation,
> > such as object labels, caller roles, etc.
> 
> Hmm.. good question! I suspect that the SELinux guys would rather use
> their own internal stuff at first, and just ask the audit subsystem to
> handle the 'report to user space, and store it appropriately with the
> correct date/time' stage... but I think that there might be avenues for
> this sort of thing later, once they're happy with the mechanisms of
> auditing..
> 
> However, you have made my mind wander onto things like an
> is_file_audited() call...
> 
> ie:
> for FILE in /etc/passwd /etc/shadow /usr/local/etc/something:
> 	if (!is_file_audited(FILE))
> 		set_file_audited(FILE);
> 
> .. but this would be way too complex to manage, since the options are:
> * A rule for every file on the file system, or
> * A new flag attached to every inode (or an SELinux label? POSIX ACL
> integration?) and some open() mangling, or
> * Some sort of dynamic rule-optimisation code (yuk).
> 
> None of these are very nice..
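To make the first option concrete: a minimal user-space sketch of the hypothetical is_file_audited()/set_file_audited() pair, backed by one rule per path in a linked list. It works, but every audited file needs its own entry and every lookup has to walk the list, which is exactly the management and lookup cost described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One rule per audited path: the "rule for every file" option. */
struct file_rule {
	char *path;
	struct file_rule *next;
};

static struct file_rule *rules;   /* hypothetical global rule list */

/* Linear scan: every lookup costs O(number of audited files). */
int is_file_audited(const char *path)
{
	for (const struct file_rule *r = rules; r != NULL; r = r->next)
		if (strcmp(r->path, path) == 0)
			return 1;
	return 0;
}

/* Add a rule for path if none exists. Returns 0, or -1 if out of memory. */
int set_file_audited(const char *path)
{
	struct file_rule *r;

	assert(path != NULL);
	if (is_file_audited(path))
		return 0;
	r = malloc(sizeof *r);
	if (r == NULL)
		return -1;
	r->path = strdup(path);
	if (r->path == NULL) {
		free(r);
		return -1;
	}
	r->next = rules;
	rules = r;
	return 0;
}
```

The loop in the example maps directly onto these two calls; the cost only becomes visible once the list holds thousands of paths and sits on the open() path.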
> 
> OK, in summary, it sounds like we:
> * Want to make audit independent of SELinux - even the file-audit
> settings
This seems to be a popular request.  Do people prefer that we start by
replicating the SELinux functionality and making a clean separation
between the two, or by using SELinux as a base and adding audit-specific
extensions?
> * Want to make it easy for the selinux guys (or any other module
> provider) to inject audit log data 'through' the audit subsystem.
> * Go with a binary format between kernel and daemon, but ensure that the
> logs are written in text format.
I suggest that we preserve the text interface for SELinux, making the
binary format an option.  LAuS stored the audit records in a binary
format, which required audit-specific tools (augrep, aucat) to analyze
the data.  Did 'aucat -F' ever get fixed? ccb? Thomas? :-)

I agree with writing the records in text format.
> * Meet CAPP requirements where possible (eg: halt-on-audit-fail), but
> provide a reasonable fallback position for real-world use.
I am working through the CAPP requirements to see if we have any issues
that need to be resolved with the current implementation.  If anyone
knows of any issues where audit does not meet CAPP requirements, please
post them to the list.  I will follow up with my list when completed.
> * Try and implement filtering in-kernel for items such as filename,
> using wildcards where appropriate.
Do we try to reuse the LAuS filtering code? Olaf?
> * Use Rik's code as a base, try and integrate Olaf's stuff (particularly
> for certification reasons), and use a sprinkling of Snare concepts.

Agreed.
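In-kernel filtering on file names with wildcards, as proposed in the summary, can be prototyped in user space with POSIX fnmatch(). A minimal sketch, assuming a hypothetical rule format (pattern, optional UID, action) where the first matching rule wins. fnmatch() is a C library function, so kernel code would need its own matcher.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <sys/types.h>

enum filter_action { F_IGNORE, F_AUDIT };

/* Hypothetical rule: uid < 0 means "any user". */
struct filter_rule {
	const char *pattern;        /* shell wildcard, e.g. "/etc/*" */
	long uid;
	enum filter_action action;
};

/* First matching rule decides; no match means the default action. */
enum filter_action filter_eval(const struct filter_rule *rules, size_t n,
                               const char *path, uid_t uid,
                               enum filter_action dflt)
{
	assert(rules != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (rules[i].uid >= 0 && (uid_t)rules[i].uid != uid)
			continue;
		/* FNM_PATHNAME: '*' does not match across '/' */
		if (fnmatch(rules[i].pattern, path, FNM_PATHNAME) == 0)
			return rules[i].action;
	}
	return dflt;
}
```

With FNM_PATHNAME, a rule for "/etc/*" covers /etc/hosts but not /etc/ssh/sshd_config; whether that is the right default for audit rules is one of the design questions for the filter syntax.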

We have SELinux people on the list, they may have additional comments
for us.

Regards,

Peter