Audit for live supervision

Fri Aug 15 12:54:35 UTC 2008

On Friday 15 August 2008 02:43:49 Kay Hayen wrote:
> More importantly, and somewhat blocking my tests: With the improved rules I
> get this when compiling quite well reproducible:
>
> type=SYSCALL msg=audit(1218773075.500:118620): arch=c000003e syscall=59
> success=yes exit=0 a0=7fff6f78cf90 a1=7fff6f78cf40 a2=7fff6f78f068 a3=0
> items=2 ppid=11412 pid=11421 auid=4294967295 uid=1000 gid=1000 euid=1000
> suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts3
> ses=4294967295 comm="gcc-4.3" exe="/usr/bin/gcc-4.3" key=(null)
>
> [...]
> type=SYSCALL msg=audit(1218773075.496:118624): arch=c000003e syscall=56
> success=yes exit=11421 a0=1200011 a1=0 a2=0 a3=7fc067776770 items=0
> ppid=11407 pid=11412 auid=4294967295 uid=1000 gid=1000 euid=1000
> suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts3
> ses=4294967295 comm="gnatchop" exe="/usr/bin/gnatchop" key=(null)
>
> Please note the _ascending_ sequence number but _descending_ time.

What this indicates is that there was some recursion before the syscall
triggered an event. The syscall context exists from syscall entry to exit. If
a signal is delivered in the middle, the syscall is not finished. Instead,
the signal handler associated with the signal runs. The signal handler
might make syscalls, which are then handled using the existing syscall
context via a linked list. When that occurs, the timestamp is not updated.
I'm not sure that is appropriate, or why the original time really mattered.
But that is what you are observing. My guess is that SIGTERM is being
delivered during another syscall.
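
To make the symptom easy to spot in a log, here is a minimal Python sketch
that pulls the timestamp and serial number out of the msg=audit(...) stamp
and flags a pair of records where the serial goes up but the time goes
down. It is only an illustration, not part of the audit tools; the names
parse_stamp and time_goes_backwards are made up, and the two records are
shortened copies of the ones quoted above.

```python
import re

# Matches the "msg=audit(<seconds>.<millis>:<serial>)" stamp of a record.
STAMP = re.compile(r"msg=audit\((\d+)\.(\d+):(\d+)\)")


def parse_stamp(record):
    """Return (timestamp in milliseconds, serial) for one audit record."""
    m = STAMP.search(record)
    if m is None:
        raise ValueError("no audit stamp in record")
    secs, millis, serial = m.groups()
    return int(secs) * 1000 + int(millis), int(serial)


def time_goes_backwards(earlier, later):
    """True if `later` has a higher serial but an older timestamp."""
    t1, s1 = parse_stamp(earlier)
    t2, s2 = parse_stamp(later)
    return s2 > s1 and t2 < t1


# Shortened copies of the two SYSCALL records from the report.
exec_rec = "type=SYSCALL msg=audit(1218773075.500:118620): arch=c000003e syscall=59"
clone_rec = "type=SYSCALL msg=audit(1218773075.496:118624): arch=c000003e syscall=56"

print(time_goes_backwards(exec_rec, clone_rec))
```

A consumer that orders events should therefore sort on the serial number
and not on the timestamp.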

> Seems like a bug? Can you have a look at it?

I'll check on why we don't update the time stamp during syscall recursion.

> -a entry,always -F arch=b32 -S clone -S fork -S vfork
> -a entry,always -F arch=b64 -S clone -S fork -S vfork
>
> Plus I still didn't fully grasp why that arch filter was necessary in the
> first place. I mean, after all, I was simply expecting that per default no
> filter should give all arches. Is that filter actually a selector? 

The -F arch is a selector for the syscall table. The kernel works off of
numbers, not strings. So, clone doesn't mean anything to the kernel, but 56
has meaning. 56 doesn't mean much to people. So, auditctl does you the favor
of converting text to numbers. It needs to know which table to choose from,
the 32 bit or the 64 bit table, as either one or both could be valid. It's
possible to compile the kernel to use only the 64 bit table. There is no way
to detect this from user space except by failure...in which case all you
know is that it failed, but not why.

There is also not a direct mapping between x86_64 and i386. There are
syscalls that exist on one arch but not the other. There are syscalls that
change names between arches. I could maintain a table of all these cross
references for x86_64 and i386, but I don't have a good idea about ppc and
s390, which are also biarch. And the table would be a snapshot in time. A
syscall could get added in a later kernel, but you wouldn't get the right
results because you were trusting the tool and not suspicious enough to do
your own review.

Then there is a problem of correlation. If I have 1 rule that expands to 2, 
then how can I do a compare of what's in memory vs what rules are on disk? 
IOW, how do I tell that someone typed:

 -a entry,always -F arch=b32 -S clone -S fork -S vfork
 -a entry,always -F arch=b64 -S clone -S fork -S vfork

or just

-a entry,always -S clone -S fork -S vfork

because auditctl would make 2 from 1. This is a really tricky issue and if we 
didn't care about correlation...or about outdated tools we trust too 
much...we could do this.
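
The correlation problem can be shown with a small Python sketch. It assumes
a hypothetical auditctl that silently expands an arch-less rule into a b32
and a b64 rule; expand and loaded are made-up names, and the real tool does
not do this.

```python
def expand(rule):
    """Hypothetical expansion: one arch-less rule becomes one rule per arch."""
    if "-F arch=" in rule:
        return [rule]
    # Insert the arch selector in front of the first -S option.
    head, sep, tail = rule.partition(" -S ")
    return [f"{head} -F arch={arch}{sep}{tail}" for arch in ("b32", "b64")]


def loaded(rules_on_disk):
    """What would end up in kernel memory after loading the rule file."""
    return [r for rule in rules_on_disk for r in expand(rule)]


typed_two = [
    "-a entry,always -F arch=b32 -S clone -S fork -S vfork",
    "-a entry,always -F arch=b64 -S clone -S fork -S vfork",
]
typed_one = ["-a entry,always -S clone -S fork -S vfork"]

# Both rule files load to the same in-memory rules, so comparing memory
# against disk can no longer tell which file was actually used.
print(loaded(typed_two) == loaded(typed_one))
```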

> Does it have to do with the fact that syscall numbers are arch dependent?

Yes.

ausyscall x86_64 clone
56

ausyscall i386 clone
120
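
The lookup that auditctl has to do can be sketched in Python as below. This
is only an illustration and not how auditctl is implemented: the dictionary
is a hand-copied excerpt (the clone numbers are the ausyscall results above;
the fork and vfork numbers are added on the assumption that they match the
same two tables), and syscall_number is a made-up name.

```python
# Hand-copied excerpt of two syscall tables. A real tool must use the
# kernel's own tables, since an excerpt like this is a snapshot in time.
SYSCALL_TABLES = {
    "x86_64": {"clone": 56, "fork": 57, "vfork": 58},
    "i386": {"fork": 2, "clone": 120, "vfork": 190},
}


def syscall_number(arch, name):
    """Translate a syscall name to the number the kernel understands."""
    return SYSCALL_TABLES[arch][name]


for arch in ("x86_64", "i386"):
    print(arch, "clone", syscall_number(arch, "clone"))
```

Without an arch there is no single number to hand to the kernel for "clone",
which is why the rule has to name the table.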

> > > Can you confirm that a type=EOE delimits every event (is that even
> > > the correct term to use, audit trace, how is it called).
> >
> > It delimits every multipart event. You can use something like this to
> > determine if you have an event:
> >
> > if ( r->type == AUDIT_EOE || r->type < AUDIT_FIRST_EVENT ||
> >                                 r->type >= AUDIT_FIRST_ANOM_MSG) {
> >   have full event...
> > }
>
> I will have to check if this affects our intended process tracing. The
> parsing is certainly not simplified by it, for a possibly unrelated reason.

We have an audit parsing library. It takes this into account. The one and
only bug that I know of in it is when event records are interlaced. This is
a problem you'll find at some point. Audit events and their records are not
serialized in the kernel. So, you could have:

syscall a
path a
syscall b
user msg c
cwd a
avc b

> Without a very stateful message parser, one that e.g. knows how many lines
> are to follow an EXECVE, we don't know when to forward it the part that
> should process it.

time->Thu Aug 14 08:21:34 2008
node=127.0.0.1 type=PATH msg=audit(1218716494.667:677): item=1 
name="/home/sgrubb/.kde/share/config/kmailrc.lock3U3ZZa.tmp" inode=11304982 
dev=08:03 mode=0100644 ouid=4325 ogid=4325 rdev=00:00 
obj=unconfined_u:object_r:user_home_t:s0 

node=127.0.0.1 type=PATH msg=audit(1218716494.667:677): item=0 
name="/home/sgrubb/.kde/share/config/" inode=12550361 dev=08:03 mode=040700 
ouid=4325 ogid=4325 rdev=00:00 obj=unconfined_u:object_r:user_home_t:s0 
node=127.0.0.1 type=CWD msg=audit(1218716494.667:677):  cwd="/home/sgrubb" 

node=127.0.0.1 type=SYSCALL msg=audit(1218716494.667:677): arch=c000003e 
syscall=87 success=yes exit=0 a0=15f06b0 a1=39609389d0 a2=1340ac0 
a3=3960b67a70 items=2 ppid=1 pid=3432 auid=4325 uid=4325 gid=4325 euid=4325 
suid=4325 fsuid=4325 egid=4325 sgid=4325 fsgid=4325 tty=(none) ses=1 
comm="kontact" exe="/usr/bin/kontact" 
subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="delete" 

Look at the syscall record. It is always emitted as part of a multi-record
event. It has an items count. Each auxiliary (path in this case) record has
an item number. You can tell when you have everything. Single line entries
do not have an items field. Also note that the records comprising an event
come out of the kernel in backwards order.
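
As a rough Python sketch of that check (not the auparse implementation; the
name have_all_items is made up, and the records are shortened copies of the
event above), compare the items count of the SYSCALL record against the item
numbers seen on the PATH records:

```python
import re

TYPE = re.compile(r"\btype=(\w+)")
ITEMS = re.compile(r"\bitems=(\d+)")  # count on the SYSCALL record
ITEM = re.compile(r"\bitem=(\d+)")    # index on each PATH record


def have_all_items(records):
    """True once the SYSCALL record and all of its PATH items are present."""
    expected = None
    seen = set()
    for rec in records:
        m = TYPE.search(rec)
        rec_type = m.group(1) if m else None
        if rec_type == "SYSCALL":
            m = ITEMS.search(rec)
            if m:
                expected = int(m.group(1))
        elif rec_type == "PATH":
            m = ITEM.search(rec)
            if m:
                seen.add(int(m.group(1)))
    return expected is not None and seen == set(range(expected))


# Shortened copies of the records of event 677, in the order shown above.
records = [
    "node=127.0.0.1 type=PATH msg=audit(1218716494.667:677): item=1 inode=11304982",
    "node=127.0.0.1 type=PATH msg=audit(1218716494.667:677): item=0 inode=12550361",
    'node=127.0.0.1 type=CWD msg=audit(1218716494.667:677):  cwd="/home/sgrubb"',
    "node=127.0.0.1 type=SYSCALL msg=audit(1218716494.667:677): syscall=87 items=2",
]

print(have_all_items(records))
```

The check does not depend on the order of the records, so it also works when
they arrive backwards or interlaced with other events, as long as they were
grouped by event first.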

> What we do first, once we get a message, is the following code:
>         # 1. Some lines are split across multiple lines. The good thing is
>         #    that these never start with whitespace and so we can make them
>         #    back into single lines. This makes the next part easier.
>
>         lines = []
>
>         for line in message.split( "\n" ):
>             if line.strip() == "":
>                 pass
>             elif line.startswith( " type=" ):
>                 lines.append( line )
>             else:
>                 assert line[0] != ' '
>
>                 lines[-1] = lines[-1] + ' ' + line

Did you know about the audit parsing library?

> This is in hope that indeed continued lines always start with a non-space
> and type lines always start with a space. Would you consider this format
> worthy and possible to change?

Don't like changing formats as that affects test suites.

> I have no idea how much it represents and existing external interface, but
> I can imagine you can't change it (easily). Probably the end of type= must
> be detected by terminating empty line in case of those that can be
> continued. But it would be very ugly to have to know the event types that
> have this so early in the decoding process.

We have a parsing library, auparse, that handles the rules of audit parsing. 
Look for auparse.h for the API.

> > There might be tunables that different distros can used with glibc.
> > strace is your friend...and having both 32/64 bit rules if amd64 is the
> > target platform.
>
> We did that of course. And what was confusing us was that the audit.log did
> actually seem to show the calls. Can that even be?

Yes, as explained above.

> > > Does audit not  (yet?) use other tracing interface like SystemTap, etc.
> > > where people try to have 0 cost for inactive traces.
> >
> > They have a cost. :)  Also, systemtap, while good for some things, is not
> > good for auditing. For one, systemtap recompiles the kernel to make new
> > modules. You may not want that in your environment. It also has not been
> > tested for CAPP/LSPP compliance.
> >
> > > Also on a general basis. Do you recommend using the sub-daemon for the
> > > job or should we rather use libaudit for the task instead? Any insight
> > > is welcome here.
> >
> > It really depends on what your environment allows. Do you need an audit
> > trail? With search tools? And reporting tools? Do you need the system to
> > halt if auditing problems occur? Do you need any certifications?
>
> I see. Luckily we are not into security, but only "safety". I can't find
> anything on Wikipedia about it, so I will try to explain it briefly, please
> forgive my limited understanding of it. :-)

At one point, I worked on Space Shuttle software. I know a little on how they 
think about this.

> It certainly will be very helpful to have the audit log and it searchable
> and I understand we get that automatic by leaving audit enabled, but
> configured correctly. In the past we have disabled it, because it caused a
> full disk and boot failure on RHEL 3 after only a month or so. I think it
> complained about the UDP echo packets that we use to check our internal LAN
> operations, but it could have been SELinux too.

RHEL3's audit system is completely different than RHEL5's.

> > > 2. We don't want to poll periodically, but rather only wake up (and
> > > then with minimal latency) when something interesting happened. We
> > > would want to poll a periodic check that forks are still reported, so
> > > we would detect a loss of service from audit.
> >
> > You might write a audispd plugin for this.
>
> Did you mean for the periodic check,

There is a realtime interface for the audit stream. You can write either a new 
event dispatcher or a plugin to the existing one. Seeing as you are more 
concerned with assurance, I'd just replace the current dispatcher with your 
own. I have a description of this here:

http://people.redhat.com/sgrubb/audit/audit-rt-events.txt

> or for the whole job, that means our supervision process?

The supervision process. Then again, maybe you want to replace the audit 
daemon and handle events your own way. libaudit has all the primitives for 
that. So, I guess that brings up the question of how you are accessing the 
audit event stream. Are you reading straight from netlink or the disk?

> Regarding performance I would like to say, you are likely right in that
> it's a non-issue. It has something of a bike-shed to me though. :-) I think
> I still have http://lwn.net/Articles/290428/ on my mind, where I had the
> impression that kernel markers would only require a few noop instructions
> as place holders for a jumps that would cause audit code to run. 

You can go that way if you want. But I don't know of anyone else that has.

> I was wondering why audit wouldn't use that. Is that historic (didn't exist,
> nobody made a patch for it) or conscious decision (too difficult, not worth
> it). Just curious here and of course the comment could be read as a bit
> scary, because it actually means we will have to benchmark the impact...

systemtap came after audit. They have 2 different purposes. One is
debugging/profiling, the other is regulatory compliance and security. The
systemtap people make no guarantees about what kinds of data are contained
in the stream or about the reliability of delivery. There was some talk
about combining hooks, and in the end it was decided that we should leave
them disconnected as they serve entirely different purposes.

-Steve