Auditd shutdown

Tue Apr 12 14:29:06 UTC 2005

Hello,

Having the kernel detect a signal being sent to the audit daemon is not 
working. Is anyone troubleshooting this or do we take another approach?

I spent some time yesterday thinking about the shutdown. I came to the 
conclusion that the only way to "do it right" is to get the credentials in 
the signal handler. Everything else is racy.

THE PROBLEM

When I get the TERM signal, I would need to wait for the event to be logged to 
disk. So that means I have to inspect each packet and wait until the shutdown 
message comes through. But what if the backlog was full when that event would 
have been enqueued? 

Also, suppose I use a timeout. When the timeout expires, I have two choices: 
set the audit pid to 0 and then close the socket, or just close the socket. 
If I just close the socket, I get this message in the logs: 

Apr 11 16:55:04 localhost kernel: audit: *NO* daemon at audit_pid=15734

This looks ugly. If I set the pid to 0 first, we don't get that message in 
the logs. However, I am using the ack flag for positive confirmation of all 
netlink communication. So what if the signal event is the first thing I read 
from the socket instead of the ack, meaning the event was delivered just after 
the timeout and before the logging thread finished?

Besides, by using a timeout, we do not meet the requirements. If the timeout 
occurs and we go ahead and shut down, we simply don't have the information 
about who initiated the shutdown.
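
To make the timeout problem concrete, here is a rough sketch of the kind of 
wait loop described above. It is not auditd code: struct fake_event, 
FAKE_EVENT_SHUTDOWN and wait_for_shutdown_event() are made-up names, and a 
plain file descriptor stands in for the netlink socket. The loop relies on 
Linux's select() updating the timeval with the time remaining.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical event record; real audit netlink messages look different. */
struct fake_event {
    uint16_t type;
    uint32_t sender_uid;
};

#define FAKE_EVENT_SHUTDOWN 1   /* hypothetical event type */

/*
 * Read events from fd until a shutdown event shows up or the timeout expires.
 * Returns 1 and fills *uid if the shutdown event arrived in time,
 * 0 on timeout (the sender's identity is then unknown), -1 on error.
 */
static int wait_for_shutdown_event(int fd, int timeout_ms, uint32_t *uid)
{
    struct timeval tv;

    assert(uid != NULL);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    for (;;) {
        fd_set rfds;
        struct fake_event ev;
        int rc;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        /* On Linux, select() rewrites tv to the time left, so every pass
         * through the loop draws from the same overall budget. */
        rc = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rc == 0)
            return 0;           /* timed out: nothing to log about the sender */
        if (rc < 0)
            return -1;

        if (read(fd, &ev, sizeof(ev)) != (ssize_t)sizeof(ev))
            return -1;
        if (ev.type == FAKE_EVENT_SHUTDOWN) {
            *uid = ev.sender_uid;
            return 1;
        }
        /* Any other event: keep waiting for the shutdown message. */
    }
}
```

The return value 0 is the case that breaks the requirements: the daemon has 
to go down, but the event that names the sender never made it through.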

I can come up with more scenarios that show we can't meet the CAPP 
requirements by having an event placed into the message queue. The only way 
to guarantee that we meet requirements is for the credentials to be available 
*with* the signal delivery.

ALTERNATIVES

What I believe we should do is one of two things: either create a 
SA_AUDITINFO structure that can be delivered with the signal, or swap the 
values of two entries in the siginfo_t structure. Between the two, I think 
SA_AUDITINFO is the correct way to do it. But I would like to examine 
swapping values first.

We need to think about LSPP as we do this and solve both problems while we are 
in this area. LSPP will require that we log the credentials of the initiator. 
This would be the SE Linux sid. It is kept in the kernel as a u32 data type. 
The user id is kept as uid_t. So, we need to find two elements in the 
siginfo_t structure that we can replace with our data.

The si_uid fits the loginuid perfectly. The si_uid normally indicates the user 
that sent the signal. Since the audit daemon runs as root, only root 
processes can send signals to it. So basically, every time we get a signal, 
this element will be root, which is meaningless. We can replace it with the 
loginuid and now it has meaning.

The SE Linux sid is tougher to fit. Because Linux is deployed on 16-bit 
platforms, we cannot use any int in the siginfo_t structure and be correct. 
We have to find something that is a long. In include/asm-generic/siginfo.h, 
we can see the structure. A quick grep for long finds this:

#ifndef __ARCH_SI_BAND_T
#define __ARCH_SI_BAND_T long
#endif

We do not use poll in the audit daemon, so this might be a good candidate. 
Another candidate would be anything with clock_t. Looking at the per-arch 
definitions, they all seem to be long. So this means si_stime or si_utime 
have the right size.

The only issue left is choosing which one we want to use and agreeing on that. 
Since long is signed and the SE Linux sid is u32, we need to take care to 
load it correctly so we don't get sign extension. It needs to be cast to 
unsigned long and then long.
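
The cast order can be sketched like this. The helper names are made up, and 
uint32_t stands in for the kernel's u32. Converting an out-of-range value to 
a signed type is implementation-defined in C; the comments assume the usual 
two's complement wrap that gcc gives.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit sid in a long slot: widen as unsigned first, so a sid with
 * the top bit set is zero-extended when long is 64 bits wide. */
static long sid_to_slot(uint32_t sid)
{
    return (long)(unsigned long)sid;
}

/* Recover the sid: truncating to 32 bits drops whatever the widening added. */
static uint32_t slot_to_sid(long slot)
{
    return (uint32_t)(unsigned long)slot;
}

/* The mistake to avoid: passing through a signed 32-bit type sign-extends
 * the value when long is wider than 32 bits. */
static long sid_to_slot_wrong(uint32_t sid)
{
    return (long)(int32_t)sid;
}

/* Usage: a sid with the top bit set survives the round trip. */
static void self_check_sid_slot(void)
{
    uint32_t sid = 0x80000001u;

    assert(slot_to_sid(sid_to_slot(sid)) == sid);
    if (sizeof(long) > 4) {
        assert(sid_to_slot(sid) > 0);        /* zero-extended */
        assert(sid_to_slot_wrong(sid) < 0);  /* sign-extended */
    }
}
```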

The other way of delivering credentials with the signal is to create a new 
SA_AUDITINFO flag and a new structure to hold our information:

typedef struct sigauditinfo {
          int      sa_signo;  /* Signal number */
          int      sa_errno;  /* An errno value */
          int      sa_code;   /* Signal code */
          pid_t    sa_pid;    /* Sending process ID */
          sid      sa_pidsid; /* Sending process sid */
          uid_t    sa_uid;    /* Real user ID of sending process */
          sid      sa_uidsid; /* Real user's sid */
          uid_t    sa_luid;   /* Login user ID of sending process */
          int      si_status; /* Exit value or signal */
} sigauditinfo_t;

This structure could be added to a union to ensure that it is the same size as 
siginfo_t. This will keep the stack unwinders happy. The above structure 
could be expanded to also include:

          clock_t  si_utime;  /* User time consumed */
          clock_t  si_stime;  /* System time consumed */
          sigval_t si_value;  /* Signal value */
          int      si_int;    /* POSIX.1b signal */
          void *   si_ptr;    /* POSIX.1b signal */
          void *   si_addr;   /* Memory location which caused fault */
          int      si_band;   /* Band event */
          int      si_fd;     /* File descriptor */

But if we do that, we are too big to be in a union without increasing the 
overall size. We could overcome this problem by using si_addr to point to a 
new structure whenever there's no address fault. That address would be valid 
only until the signal handler returns or is longjmp'ed out of.

NEXT STEP

The next step is to decide which way is cleanest and acceptable to upstream 
developers. Are there holes in either way proposed above? Can sending a 
shutdown audit event via netlink be done without races?

-Steve