Audit for live supervision

Mon Aug 18 15:10:24 UTC 2008

On Saturday 16 August 2008 07:19:27 Kay Hayen wrote:
> [ time descending, sequence number ascending problem ]
>
> > What this indicates is that there was some recursion before the syscall
> > triggered an event. The syscall context exists from sycall entry to exit.
> > If during the middle a signal is delivered, the syscall is not finished.
> > Instead it runs the signal handler associated with the signal. The signal
> > handler might make syscalls which are then handled using the existing
> > syscall context via linked list. When that occurs, the timestamp is not
> > being updated. Not sure that is appropriate or why the original time
> > really mattered. But that is what you are observing. My guess is SIGTERM
> > is being delivered during another syscall.
>
> That raised the following question to me: We have "entry" rules defined.
> When we saw that we get exit codes, the conclusion was that "entry" and
> "exit" are not different for every syscall. Can you confirm?

They are different in that some things are not defined at entry while all 
things are defined at exit. I believe you can write all audit rules as exit 
rules without any noticable differences. if have an entry rule that evaluates 
to never, then it does speed things up since it no longer needs to collect 
aux records. With respect to the time, its set when audit_log_start is called 
which is always on exit for any rules that involve syscalls (that is when the 
exit code is valid).

> I tried to change our rules to "exit,always" from "entry,always", but it
> didn't make a difference. Can you confirm that only one exit is traced and
> do you think audit can be enhanced to trace these extra exits of syscalls
> like FORK.

Yes, I think the kernel could be updated to return twice. This would need to 
be sent upstream and I think 2.6.28 is the next chance.

> > > Seems like a bug? Can you have a look at it?
> >
> > I'll check on why we don't update the time stamp during syscall
> > recursion.
>
> Thanks a lot. I guess, the expectation could be that "exit" traces have the
> datation of the "exit" and not "entry".

See above about timestamp generation.

> > Then there is a problem of correlation. If I have 1 rule that expands to
> > 2, then how can I do a compare of what's in memory vs what rules are on
> > disk? IOW, how do I tell that someone typed:
> >
> >  -a entry,always -F arch=b32 -S clone -S fork -S vfork
> >  -a entry,always -F arch=b64 -S clone -S fork -S vfork
> >
> > or just
> >
> > -a entry,always -S clone -S fork -S vfork
> >
> > because auditctl would make 2 from 1. This is a really tricky issue and
> > if we didn't care about correlation...or about outdated tools we trust
> > too much...we could do this.
>
> That's understood. And a typical danger of user friendly abstraction is
> that it causes confusion. As I said, -F was bound to "filter" in my mind.

-F means field. In this case, the filter does use the arch field to select 
which syscalls become events. But we put it before the syscall so that 
auditctl looks it up in the right table. It might possibly be more correct to 
introduce a selector for -S, but then people won't like giving it twice.

> > ausyscall x86_64 clone
> > 56
> >
> > ausyscall i386 clone
> > 120
>
> Very good. We have initially defined a hash in Python manually with what we
> encounter, but we can rather use that to create them. We specifically have
> the problem of visiting a s390 site, where it will handy to have these
> already in place. There is no such function in libaudit, is there?

For what? 

> > We have an audit parsing library. It takes this into account.
>
> I have looked at it, and auparse_init doesn't seem to support reading from
> the socket itself, does it?

You mean the netlink socket?

> It could be AUSOURCE_FILE. And there there is the issue that the logs on
> disk seem to be different from what format we get on the socket. 

yes it is.

> In an ideal world, we would like to note that the audit socket is readable,
> hand it (or an arbitrarily truncated chunk of data) from it to libaudit,
> ask it for events until it says there are not more. That would leave the
> truncated line/event issue to libaudit. Is that part of the code?

libaudit should pull complete events from the kernel unless an execve has an 
excessive number of arguments or large sized arguments.

> > Look at the syscall record. It is always emitted with multi-line records.
> > It has an items count. Each auxiliary (path in this case) record has an
> > item number. You can tell when you have everything. Single line entries
> > do not have an items field. Also note that the record comprising an event
> > comes out of the kernel in a backwards order.
>
> Ah, we simply ignored the type=PATH etc. elements. But what I mean is that
> of the syscall itself, the arguments seemed to be on new lines:
>
> This is from Python code:
>
>     data = _audit_socket.recv( 32*1024 )
>     print data
>
> node=Annuitka type=SYSCALL msg=audit(1218880198.814:42205): arch=c000003e
> syscall=59 success=yes exit=0 a0=16cc168 a1=1464c08 a2=1588008 a3=0 items=2
> ppid=3864 pid=19928 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000
> fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts3 ses=4294967295 comm="ls"
> exe="/bin/ls" key=(null)
> node=Annuitka type=EXECVE msg=audit(1218880198.814:42205): argc=2 a0="ls"
> a1="--color=auto"
>
> node=Annuitka type=CWD msg=audit(1218880198.814:42205):
> cwd="/data/home/anna/comsoft/v7a1-ps2-acs/src/acs"
> node=Annuitka type=PATH msg=audit(1218880198.814:42205): item=0
> name="/bin/ls" inode=1651626 dev=08:12 mode=0100755 ouid=0 ogid=0
> rdev=00:00
> node=Annuitka type=PATH msg=audit(1218880198.814:42205): item=1 name=(null)
> inode=779612 dev=08:12 mode=0100755 ouid=0 ogid=0 rdev=00:00
> node=Annuitka type=EOE msg=audit(1218880198.814:42205):
>
> Note that we get a SYSCALL with 2 items, and then in order the items - from
> the socket. But inbetween we get type=EXECVE it doesn't have an item
> number,

I suppose that could be fixed.

> and worse the new line before 'a1=--color-auto' is real and so is 
> the empty line after it. I have another example of a "gnash" call from
> Konqueror with no less than 29 arguments.

That is coming from here, and I think a patch was submitted fixing it.

http://lxr.linux.no/linux+v2.6.26.2/kernel/auditsc.c#L1114

> That means, in order to parse the socket, we should check argc, right? I
> think we would prefer very long lines like they are in /var/log/audit
> instead, making these kinds of steps optional.
>
> Actually I don't understand the differences in format. I assume they serve
> the purpose of making things readable?

Yes. From the kernel is a structure with some text. For logging we want it all 
text.

> > > This is in hope that indeed continued lines always start with a
> > > non-space and type lines always start with a space. Would you consider
> > > this format worthy and possible to change?
> >
> > Don't like changing formats as that affects test suites.
>
> That " type=" start is a self-confusion of ours. Starting with 1.6 the
> node= part was added, and some hack was still in place that removes
> "node=hostname" and leaves the space there. Sorry about that.

You can remove or add the node field. Its controlled by the name_format config 
item.

> > > I have no idea how much it represents and existing external interface,
> > > but I can imagine you can't change it (easily). Probably the end of
> > > type= must be detected by terminating empty line in case of those that
> > > can be continued. But it would be very ugly to have to know the event
> > > types that have this so early in the decoding process.
> >
> > We have a parsing library, auparse, that handles the rules of audit
> > parsing. Look for auparse.h for the API.
>
> If you confirm that can handles the parsing from the socket, as suggested
> above, we may persue that path and can ignore strangeness of the format
> once its handled by the library.

The audit parsing library wants to read text strings as you would find them on 
disk. The kernel keeps type separate as an integer so that decisions can be 
made about what the record means without having to do a text to int 
conversion. So, the audit daemon does the reformatting after it decides that 
it a record type that we are interested in.

> > > > > 2. We don't want to poll periodically, but rather only wake up (and
> > > > > then with minimal latency) when something interesting happened. We
> > > > > would want to poll a periodic check that forks are still reported,
> > > > > so we would detect a loss of service from audit.
> > > >
> > > > You might write a audispd plugin for this.
> > >
> > > Did you mean for the periodic check,
> >
> > There is a realtime interface for the audit stream. You can write either
> > a new event dispatcher or a plugin to the existing one. Seeing as you are
> > more concerned with assurance, I'd just replace the current dispatcher
> > with your own. I have a description of this here:
> >
> > http://people.redhat.com/sgrubb/audit/audit-rt-events.txt
>
> I saw that too, but I thought it would be better to build upon the existing
> effort. I think that's a viable alternative and potentially could be more
> robust to us. At least it could be that audisp seems to try and solve
> problems we don't have or want.

You have 4 points to get the audit stream, in order of distance from the event 
generation: the audit netlink socket, auditd realtime interface, audisp 
plugin interface, and the af_unix socket created by the af_unix plugin from 
audispd. For higher reliability where you don't want of need any other audit 
processing interfering, I would say use either of the first 2.

> Looking at the source I saw that node name is something that audisp indeed
> adds the node name and that auditd doesn't log EOEs, explaining some of the
> differences. I didn't find why audisp has extra new lines, or if auditd
> removed these.

Auditd strips these before writing to disk. The realtime interface sends the 
event just as it was received by the kernel.

> > > or for the whole job, that means our supervision process?
> >
> > The supervision process. Then again, maybe you want to replace the audit
> > daemon and handle events your own way. libaudit has all the primitives
> > for that. So, I guess that brings up the question of how you are
> > accessing the audit event stream. Are you reading straight from netlink
> > or the disk?
>
> From the files is out of question. We thought of the audit sub deamon as
> something that simply allows to access the audit stream live, but that it
> is otherwise the same as the file.

The data can be either binary or string. Binary means that its exactly the 
same format that comes from thekernel unchanged. String means that its been 
formatted just like you would see on disk. But it appears that its not 
stripping the 0X0A out of the text. It probably should.

> Like auditd would accept things from the kernel, write it to a file and hand
> it to audisp as well which would then provide it to others. Isn't that the
> design? 

Yes.

-Steve