Audit for live supervision

Sat Aug 16 11:19:27 UTC 2008

Hello Steve,

[ time descending, sequence number ascending problem ]

> What this indicates is that there was some recursion before the syscall
> triggered an event. The syscall context exists from sycall entry to exit.
> If during the middle a signal is delivered, the syscall is not finished.
> Instead it runs the signal handler associated with the signal. The signal
> handler might make syscalls which are then handled using the existing
> syscall context via linked list. When that occurs, the timestamp is not
> being updated. Not sure that is appropriate or why the original time really
> mattered. But that is what you are observing. My guess is SIGTERM is being
> delivered during another syscall.

That raised the following question to me: We have "entry" rules defined. When 
we saw that we get exit codes, the conclusion was that "entry" and "exit" are 
not different for every syscall. Can you confirm? 

And assuming that, indeed the may fork complete only after its has completed 
its signal handlers. The expectation would be that this is not an issue 
though, because the new process is inactive. Wrong assumption. The new 
process seemingly can already EXECVE with its fresh life, if the call of FORK 
is interrupted after the process exists.

Now that poses an interesting problem. I guess, what we are missing is that 
FORK should actually enter once and return up to _two_ times, and we need to 
handle and see traces of these. That way we would simply see FORK return with 
0 for the child before it does EXECVE, and so we would know who created it 
(ppid) and know that it exists.

I tried to change our rules to "exit,always" from "entry,always", but it 
didn't make a difference. Can you confirm that only one exit is traced and do 
you think audit can be enhanced to trace these extra exits of syscalls like 
FORK.

For every workaround, a SIGKILL signal towards the FORK making process would 
probably leave us even without a trace, forever possibly.

As for the signal at hand here, I think we have ant (gcj based Java) doing 
multi-threading (parallel building). That would be FORKs series with probably 
a SIGALARM or whatever they use to switch target execution without resorting 
to threading.

> > Seems like a bug? Can you have a look at it?
>
> I'll check on why we don't update the time stamp during syscall recursion.

Thanks a lot. I guess, the expectation could be that "exit" traces have the 
datation of the "exit" and not "entry".

> Then there is a problem of correlation. If I have 1 rule that expands to 2,
> then how can I do a compare of what's in memory vs what rules are on disk?
> IOW, how do I tell that someone typed:
>
>  -a entry,always -F arch=b32 -S clone -S fork -S vfork
>  -a entry,always -F arch=b64 -S clone -S fork -S vfork
>
> or just
>
> -a entry,always -S clone -S fork -S vfork
>
> because auditctl would make 2 from 1. This is a really tricky issue and if
> we didn't care about correlation...or about outdated tools we trust too
> much...we could do this.

That's understood. And a typical danger of user friendly abstraction is that 
it causes confusion. As I said, -F was bound to "filter" in my mind. 
For "arch" it's suddenly a selector. I would find something like this more 
logical:

-a entry,always,any -S clone -S fork -S vfork

and if I really only wanted a certain arch, make me say so:

-a entry,always,b64 -S clone -S fork -S vfork

> ausyscall x86_64 clone
> 56
>
> ausyscall i386 clone
> 120

Very good. We have initially defined a hash in Python manually with what we 
encounter, but we can rather use that to create them. We specifically have 
the problem of visiting a s390 site, where it will handy to have these 
already in place. There is no such function in libaudit, is there?

> We have an audit parsing library. It takes this into account. 

I have looked at it, and auparse_init doesn't seem to support reading from the 
socket itself, does it? It could be AUSOURCE_FILE. And there there is the 
issue that the logs on disk seem to be different from what format we get on 
the socket. The node= is not on disk, newlines, empty lines, etc. see below 
about that.

In an ideal world, we would like to note that the audit socket is readable, 
hand it (or an arbitrarily truncated chunk of data) from it to libaudit, ask 
it for events until it says there are not more. That would leave the 
truncated line/event issue to libaudit. Is that part of the code?

> > Without a very stateful message parser, one that e.g. knows how many
> > lines are to follow an EXECVE, we don't know when to forward it the part
> > that should process it.

[Example deleted]

> Look at the syscall record. It is always emitted with multi-line records.
> It has an items count. Each auxiliary (path in this case) record has an
> item number. You can tell when you have everything. Single line entries do
> not have an items field. Also note that the record comprising an event
> comes out of the kernel in a backwards order.

Ah, we simply ignored the type=PATH etc. elements. But what I mean is that of 
the syscall itself, the arguments seemed to be on new lines:

This is from Python code:

    data = _audit_socket.recv( 32*1024 )
    print data

node=Annuitka type=SYSCALL msg=audit(1218880198.814:42205): arch=c000003e 
syscall=59 success=yes exit=0 a0=16cc168 a1=1464c08 a2=1588008 a3=0 items=2 
ppid=3864 pid=19928 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000 
fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts3 ses=4294967295 comm="ls" 
exe="/bin/ls" key=(null)
node=Annuitka type=EXECVE msg=audit(1218880198.814:42205): argc=2 a0="ls"
a1="--color=auto"

node=Annuitka type=CWD msg=audit(1218880198.814:42205):  
cwd="/data/home/anna/comsoft/v7a1-ps2-acs/src/acs"
node=Annuitka type=PATH msg=audit(1218880198.814:42205): item=0 name="/bin/ls" 
inode=1651626 dev=08:12 mode=0100755 ouid=0 ogid=0 rdev=00:00
node=Annuitka type=PATH msg=audit(1218880198.814:42205): item=1 name=(null) 
inode=779612 dev=08:12 mode=0100755 ouid=0 ogid=0 rdev=00:00
node=Annuitka type=EOE msg=audit(1218880198.814:42205):

Note that we get a SYSCALL with 2 items, and then in order the items - from 
the socket. But inbetween we get type=EXECVE it doesn't have an item number, 
and worse the new line before 'a1=--color-auto' is real and so is the empty 
line after it. I have another example of a "gnash" call from Konqueror with 
no less than 29 arguments.

That means, in order to parse the socket, we should check argc, right? I think 
we would prefer very long lines like they are in /var/log/audit instead, 
making these kinds of steps optional.

Actually I don't understand the differences in format. I assume they serve the 
purpose of making things readable?

> Did you know about the audit parsing library?

Our assumption was also that it should be easy enough to parse the text. Well 
you know assumptions. Rarely ever true. :-)

> > This is in hope that indeed continued lines always start with a non-space
> > and type lines always start with a space. Would you consider this format
> > worthy and possible to change?
>
> Don't like changing formats as that affects test suites.

That " type=" start is a self-confusion of ours. Starting with 1.6 the node= 
part was added, and some hack was still in place that removes "node=hostname" 
and leaves the space there. Sorry about that.

> > I have no idea how much it represents and existing external interface,
> > but I can imagine you can't change it (easily). Probably the end of type=
> > must be detected by terminating empty line in case of those that can be
> > continued. But it would be very ugly to have to know the event types that
> > have this so early in the decoding process.
>
> We have a parsing library, auparse, that handles the rules of audit
> parsing. Look for auparse.h for the API.

If you confirm that can handles the parsing from the socket, as suggested 
above, we may persue that path and can ignore strangeness of the format once 
its handled by the library.

> > > There might be tunables that different distros can used with glibc.
> > > strace is your friend...and having both 32/64 bit rules if amd64 is the
> > > target platform.
> >
> > We did that of course. And what was confusing us was that the audit.log
> > did actually seem to show the calls. Can that even be?
>
> Yes, as explained above.

Sorry, I am still confused. Can you explain why the socket and the audit.log 
can have different contents. I was blaming my (usually bad) memory.

> > I see. Luckily we are not into security, but only "safety". I can't find
> > anything on Wikipedia about it, so I will try to explain it briefly,
> > please forgive my limited understanding of it. :-)
>
> At one point, I worked on Space Shuttle software. I know a little on how
> they think about this.

Well, that's perfect. :-)

> > > > 2. We don't want to poll periodically, but rather only wake up (and
> > > > then with minimal latency) when something interesting happened. We
> > > > would want to poll a periodic check that forks are still reported, so
> > > > we would detect a loss of service from audit.
> > >
> > > You might write a audispd plugin for this.
> >
> > Did you mean for the periodic check,
>
> There is a realtime interface for the audit stream. You can write either a
> new event dispatcher or a plugin to the existing one. Seeing as you are
> more concerned with assurance, I'd just replace the current dispatcher with
> your own. I have a description of this here:
>
> http://people.redhat.com/sgrubb/audit/audit-rt-events.txt

I saw that too, but I thought it would be better to build upon the existing 
effort. I think that's a viable alternative and potentially could be more 
robust to us. At least it could be that audisp seems to try and solve 
problems we don't have or want.

Looking at the source I saw that node name is something that audisp indeed 
adds the node name and that auditd doesn't log EOEs, explaining some of the 
differences. I didn't find why audisp has extra new lines, or if auditd 
removed these.

I think we will make a prototype for the RT interface and see what it gives 
us.

> > or for the whole job, that means our supervision process?
>
> The supervision process. Then again, maybe you want to replace the audit
> daemon and handle events your own way. libaudit has all the primitives for
> that. So, I guess that brings up the question of how you are accessing the
> audit event stream. Are you reading straight from netlink or the disk?

>From the files is out of question. We thought of the audit sub deamon as 
something that simply allows to access the audit stream live, but that it is 
otherwise the same as the file. Like auditd would accept things from the 
kernel, write it to a file and hand it to audisp as well which would then 
provide it to others. Isn't that the design? 

> > I was wondering why audit wouldn't use that. Is that historic (didn't
> > exist, nobody made a patch for it) or conscious decision (too difficult,
> > not worth it). Just curious here and of course the comment could be read
> > as a bit scary, because it actually means we will have to benchmark the
> > impact...
>
> systemtap came after audit. They have 2 different purposes. One is
> debugging/profiling, the other is regulatory compliance and security. The
> system tap people have no gurantees about what kinds of data is contained
> in the stream or the reliability of delivery. There was some talk about
> combining hooks and in the end it was decided that we should leave them
> disconnected as they serve entirely different purposes.

Ah I see. Thanks for the explanation. :-)

Best regards,
Kay Hayen