Handling -ENOBUFS

Mon Nov 10 21:25:34 UTC 2008

On Thu, Nov 6, 2008 at 12:16 PM, Eric Paris <eparis at redhat.com> wrote:
> On Wed, 2008-11-05 at 18:56 -0200, Lucas C. Villa Real wrote:
>> On Wed, Nov 5, 2008 at 4:19 PM, Steve Grubb <sgrubb at redhat.com> wrote:
>
>> >> One interesting thing which I noticed is that 'auditctl -s' doesn't
>> >> report that messages were lost,
>> >
>> > They weren't lost by the audit system so it doesn't know they didn't arrive.
>>
>> Do you think it would make sense to add an extra member to struct
>> sk_buff (a pointer to a callback function) and then have
>> skb_queue_tail() signal if it failed to send a message? That would
>> allow audit to keep track of such losses, as well as any other
>> subsystem using netlink for communicating with userspace.
>
> Getting a new field in skb is basically a non-starter.  Well, unless you
> can find a way to drop 2 fields...

Hi, Eric

Sorry for the late reply.

Indeed, I looked at that structure, and it doesn't seem to be the
best approach.

> Anyway, I just walked the entire kernel audit "send" system and I don't
> see any place we could drop an skb.  I didn't walk all of the receive
> side so maybe there is something in that code but it's unlikely and that
> 'should' be noticed by auditd if there were any errors...

What about audit_send_reply_thread()? We don't check netlink_unicast()'s
return value there. Even though that's not the channel used to send
events, the error seen in the auditd logs could be coming from this
point, since the scripts we're running here periodically send a few
queries to the audit subsystem in kernel space.
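To illustrate what checking that return value could look like, here is a
minimal userspace sketch (not kernel code). It assumes a hypothetical
unicast_fn that, like netlink_unicast(), returns a negative errno such as
-ENOBUFS on failure. The names reply_stats and send_reply_checked are made
up for illustration, and the "lost" field only plays the role of the lost
counter that 'auditctl -s' reports.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for netlink_unicast(): returns the number of
 * bytes queued on success or a negative errno (e.g. -ENOBUFS) on failure. */
typedef int (*unicast_fn)(int pid, const void *msg, size_t len);

/* Hypothetical accounting structure; "lost" mimics a lost-message counter. */
struct reply_stats {
    unsigned long sent;
    unsigned long lost;
    int last_err;       /* most recent negative errno, 0 if none */
};

/* Send one reply and record the outcome instead of discarding it. */
int send_reply_checked(struct reply_stats *st, unicast_fn send,
                       int pid, const void *msg, size_t len)
{
    int rc;

    assert(st != NULL && send != NULL);
    rc = send(pid, msg, len);
    if (rc < 0) {
        st->lost++;         /* the reply never reached userspace */
        st->last_err = rc;
    } else {
        st->sent++;
    }
    return rc;
}

/* Test doubles: one sender that always succeeds, one whose receiver is full. */
int fake_ok(int pid, const void *msg, size_t len)
{
    (void)pid; (void)msg;
    return (int)len;
}

int fake_full(int pid, const void *msg, size_t len)
{
    (void)pid; (void)msg; (void)len;
    return -ENOBUFS;
}
```

The point of the wrapper is only that a failed send becomes visible in a
counter; how the real kernel would expose such a counter is a separate
question.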

> I know auditd has a issue where it can not write things to the log file
> even though they came out of the netlink socket (messages over 4k or
> so).  Can you run auditd under strace and see if the number of netlink
> messages it gets is equal to the number you expect or equal to the
> number that show up in the logs?  Things aren't supposed to get dropped
> silently....

Last night I realized that my audit_sanitize_log script (which groups
events by their sequence number) mishandled events that crossed the
boundary of an audit log after a rotate request (i.e., a few lines of
a given event showed up in the rotated log and the rest in the new
file that replaced it). That, together with a few other things I've
been experimenting with here, caused the files missing from my final
report.
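The rotation problem goes away if the rotated log and its replacement are
fed to the grouping step as one stream. A minimal sketch, assuming records
carry the usual msg=audit(SEC.MSEC:SERIAL) stamp and that all records of
one event are adjacent in the stream; parse_serial() and count_events()
are hypothetical helpers, not part of the audit tools:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Extract SERIAL from a record's "msg=audit(SEC.MSEC:SERIAL):" stamp.
 * Returns 1 on success, 0 if the line carries no such stamp. */
int parse_serial(const char *line, unsigned long *serial)
{
    unsigned long sec, msec;
    const char *p;

    assert(line != NULL && serial != NULL);
    p = strstr(line, "msg=audit(");
    if (p == NULL)
        return 0;
    return sscanf(p, "msg=audit(%lu.%lu:%lu)", &sec, &msec, serial) == 3;
}

/* Count events in a stream of records, treating each run of adjacent
 * records with the same serial as one event. Lines without a stamp
 * are skipped. */
size_t count_events(const char *const *lines, size_t n)
{
    size_t events = 0;
    unsigned long prev = 0, serial;
    int have_prev = 0;

    for (size_t i = 0; i < n; i++) {
        if (!parse_serial(lines[i], &serial))
            continue;
        if (!have_prev || serial != prev) {
            events++;
            prev = serial;
            have_prev = 1;
        }
    }
    return events;
}
```

Counting the rotated file and the new one separately counts an event that
straddles the rotation twice; counting them concatenated (rotated file
first) counts it once.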

Still, I'd like to fix both problems so that we can certify that no
events are ever lost. Do you think Audit could be made more reliable
(regarding the lost-message indication) by checking all of the
netlink_unicast() return values?

Thanks!
Lucas