[PATCH v2] audit: report audit wait metric in audit status reply

Richard Guy Briggs rgb at redhat.com
Fri Jul 3 21:29:49 UTC 2020


On 2020-07-02 16:42, Paul Moore wrote:
> On Wed, Jul 1, 2020 at 5:32 PM Max Englander <max.englander at gmail.com> wrote:
> >
> > In environments where the preservation of audit events and predictable
> > usage of system memory are prioritized, admins may use a combination of
> > --backlog_wait_time and -b options at the risk of degraded performance
> > resulting from backlog waiting. In some cases, this risk may be
> > preferred to lost events or unbounded memory usage. Ideally, this risk
> > can be mitigated by making adjustments when backlog waiting is detected.
> >
> > However, detection can be diffult using the currently available metrics.
> > For example, an admin attempting to debug degraded performance may
> > falsely believe a full backlog indicates backlog waiting. It may turn
> > out the backlog frequently fills up but drains quickly.
> >
> > To make it easier to reliably track degraded performance to backlog
> > waiting, this patch makes the following changes:
> >
> > Add a new field backlog_wait_sum to the audit status reply. Initialize
> > this field to zero. Add to this field the total time spent by the
> > current task on scheduled timeouts while the backlog limit is exceeded.
> >
> > Tested on Ubuntu 18.04 using complementary changes to the audit
> > userspace: https://github.com/linux-audit/audit-userspace/pull/134.
> >
> > Signed-off-by: Max Englander <max.englander at gmail.com>
> > ---
> >  Patch changelogs between v1 and v2:
> >  - Instead of printing a warning when backlog waiting occurs, add
> >    duration of backlog waiting to cumulative sum, and report this
> >    sum in audit status reply.
> >
> >  include/uapi/linux/audit.h | 7 ++++++-
> >  kernel/audit.c             | 9 +++++++++
> >  2 files changed, 15 insertions(+), 1 deletion(-)
> 
> Hi Max,
> 
> In general this looks better than the previous approach, but I do have
> a few specific comments (inline).  It also important that in addition
> to the requisite userspace patch, we also need a test added to the
> audit-testsuite project so we can verify this functionality in the
> future.
> 
> * https://github.com/linux-audit/audit-testsuite
> 
> > diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> > index a534d71e689a..ea0cc364beca 100644
> > --- a/include/uapi/linux/audit.h
> > +++ b/include/uapi/linux/audit.h
> > @@ -340,6 +340,7 @@ enum {
> >  #define AUDIT_STATUS_BACKLOG_LIMIT     0x0010
> >  #define AUDIT_STATUS_BACKLOG_WAIT_TIME 0x0020
> >  #define AUDIT_STATUS_LOST              0x0040
> > +#define AUDIT_STATUS_BACKLOG_WAIT_SUM  0x0080
> 
> Sooo ... you've defined this, but I don't see any of the corresponding
> AUDIT_SET code that I would expect, was that an oversight?  If not, it
> is something we should support in the kernel as I'm sure admins will
> want to reset this value at some point.

Have a look at the lost reset code as an example.  It is tricky since it
does an atomic reset while delivering a value back up the control plane
and issuing a record.  There were some fallout bug fixes because it
wasn't as obvious as it looked.

> >  #define AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT     0x00000001
> >  #define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME 0x00000002
> > @@ -348,6 +349,7 @@ enum {
> >  #define AUDIT_FEATURE_BITMAP_SESSIONID_FILTER  0x00000010
> >  #define AUDIT_FEATURE_BITMAP_LOST_RESET                0x00000020
> >  #define AUDIT_FEATURE_BITMAP_FILTER_FS         0x00000040
> > +#define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM  0x00000080
> 
> In an effort not to exhaust the feature bitmap too quickly, I've been
> restricting it to only those features that would cause breakage with
> userspace.  I haven't looked closely at Steve's userspace in quite a
> while, but I'm guessing it can key off the structure size and doesn't
> need this entry in the bitmap, right?  Let me rephrase, if userspace
> needs to key off anything, it *should* key off the structure size and
> not a new flag in the bitmask ;)

It could key solely off the existance of AUDIT_STATUS_BACKLOG_WAIT_SUM
via HAVE_DECL_AUDIT_STATUS_BACKLOG_WAIT_SUM which would be defined in
configure.ac similar to AUDIT_STATUS_BACKLOG_WAIT_TIME as has already
been done in Max' userspace patch.

It should be possible to drop AUDIT_VERSION_BACKLOG_WAIT_SUM in
userspace and have it work.

> Also, I'm assuming that older userspace doesn't blow-up if it sees the
> larger structure size?  That's even more important.

I believe it won't even notice, but this should be tested.

> >  #define AUDIT_FEATURE_BITMAP_ALL (AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT | \
> >                                   AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME | \
> > @@ -355,12 +357,14 @@ enum {
> >                                   AUDIT_FEATURE_BITMAP_EXCLUDE_EXTEND | \
> >                                   AUDIT_FEATURE_BITMAP_SESSIONID_FILTER | \
> >                                   AUDIT_FEATURE_BITMAP_LOST_RESET | \
> > -                                 AUDIT_FEATURE_BITMAP_FILTER_FS)
> > +                                 AUDIT_FEATURE_BITMAP_FILTER_FS | \
> > +                                 AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM)
> >
> >  /* deprecated: AUDIT_VERSION_* */
> >  #define AUDIT_VERSION_LATEST           AUDIT_FEATURE_BITMAP_ALL
> >  #define AUDIT_VERSION_BACKLOG_LIMIT    AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT
> >  #define AUDIT_VERSION_BACKLOG_WAIT_TIME        AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME
> > +#define AUDIT_VERSION_BACKLOG_WAIT_SUM AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM
> >
> >                                 /* Failure-to-log actions */
> >  #define AUDIT_FAIL_SILENT      0
> > @@ -466,6 +470,7 @@ struct audit_status {
> >                 __u32   feature_bitmap; /* bitmap of kernel audit features */
> >         };
> >         __u32           backlog_wait_time;/* message queue wait timeout */
> > +       __u32           backlog_wait_sum;/* time spent waiting while message limit exceeded */
> 
> This is very nitpicky, but how about a rename to 'backlog_wait_time_actual'?

How about 'backlog_wait_time_cumul' (or 'backlog_wait_time_cumulative')?

> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index 87f31bf1f0a0..301ea4f3d750 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -136,6 +136,11 @@ u32                audit_sig_sid = 0;
> >  */
> >  static atomic_t        audit_lost = ATOMIC_INIT(0);
> >
> > +/* Monotonically increasing sum of time the kernel has spent
> > + * waiting while the backlog limit is exceeded.
> > + */
> > +static atomic_t audit_backlog_wait_sum = ATOMIC_INIT(0);
> 
> Needless to say, this should be renamed too so we don't go crazy.
> 
> >  /* Hash for inode-based rules */
> >  struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
> >
> > @@ -1204,6 +1209,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> >                 s.backlog               = skb_queue_len(&audit_queue);
> >                 s.feature_bitmap        = AUDIT_FEATURE_BITMAP_ALL;
> >                 s.backlog_wait_time     = audit_backlog_wait_time;
> > +               s.backlog_wait_sum      = atomic_read(&audit_backlog_wait_sum);
> >                 audit_send_reply(skb, seq, AUDIT_GET, 0, 0, &s, sizeof(s));
> >                 break;
> >         }
> > @@ -1794,6 +1800,9 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
> >                                 return NULL;
> >                         }
> >                 }
> > +
> > +               if (stime != audit_backlog_wait_time)
> > +                       atomic_add(audit_backlog_wait_time - stime, &audit_backlog_wait_sum);
> 
> Since stime can only be different in one place in the code above
> (after the schedule_timeout() call), why not move the atomic_add() up
> there and drop the "if"?  Yes there is the potential of calling
> atomic_add() multiple times in this case, but the thread is waiting
> anyway and this way we don't impact other code paths.
> 
> >         }
> >
> >         ab = audit_buffer_alloc(ctx, gfp_mask, type);
> 
> paul moore

- RGB

--
Richard Guy Briggs <rgb at redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635




More information about the Linux-audit mailing list