[PATCH v2] audit: report audit wait metric in audit status reply
Paul Moore
paul at paul-moore.com
Thu Jul 2 20:42:13 UTC 2020
On Wed, Jul 1, 2020 at 5:32 PM Max Englander <max.englander at gmail.com> wrote:
>
> In environments where the preservation of audit events and predictable
> usage of system memory are prioritized, admins may use a combination of
> --backlog_wait_time and -b options at the risk of degraded performance
> resulting from backlog waiting. In some cases, this risk may be
> preferred to lost events or unbounded memory usage. Ideally, this risk
> can be mitigated by making adjustments when backlog waiting is detected.
>
> However, detection can be diffult using the currently available metrics.
> For example, an admin attempting to debug degraded performance may
> falsely believe a full backlog indicates backlog waiting. It may turn
> out the backlog frequently fills up but drains quickly.
>
> To make it easier to reliably track degraded performance to backlog
> waiting, this patch makes the following changes:
>
> Add a new field backlog_wait_sum to the audit status reply. Initialize
> this field to zero. Add to this field the total time spent by the
> current task on scheduled timeouts while the backlog limit is exceeded.
>
> Tested on Ubuntu 18.04 using complementary changes to the audit
> userspace: https://github.com/linux-audit/audit-userspace/pull/134.
>
> Signed-off-by: Max Englander <max.englander at gmail.com>
> ---
> Patch changelogs between v1 and v2:
> - Instead of printing a warning when backlog waiting occurs, add
> duration of backlog waiting to cumulative sum, and report this
> sum in audit status reply.
>
> include/uapi/linux/audit.h | 7 ++++++-
> kernel/audit.c | 9 +++++++++
> 2 files changed, 15 insertions(+), 1 deletion(-)
Hi Max,
In general this looks better than the previous approach, but I do have
a few specific comments (inline). It also important that in addition
to the requisite userspace patch, we also need a test added to the
audit-testsuite project so we can verify this functionality in the
future.
* https://github.com/linux-audit/audit-testsuite
> diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> index a534d71e689a..ea0cc364beca 100644
> --- a/include/uapi/linux/audit.h
> +++ b/include/uapi/linux/audit.h
> @@ -340,6 +340,7 @@ enum {
> #define AUDIT_STATUS_BACKLOG_LIMIT 0x0010
> #define AUDIT_STATUS_BACKLOG_WAIT_TIME 0x0020
> #define AUDIT_STATUS_LOST 0x0040
> +#define AUDIT_STATUS_BACKLOG_WAIT_SUM 0x0080
Sooo ... you've defined this, but I don't see any of the corresponding
AUDIT_SET code that I would expect, was that an oversight? If not, it
is something we should support in the kernel as I'm sure admins will
want to reset this value at some point.
> #define AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT 0x00000001
> #define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME 0x00000002
> @@ -348,6 +349,7 @@ enum {
> #define AUDIT_FEATURE_BITMAP_SESSIONID_FILTER 0x00000010
> #define AUDIT_FEATURE_BITMAP_LOST_RESET 0x00000020
> #define AUDIT_FEATURE_BITMAP_FILTER_FS 0x00000040
> +#define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM 0x00000080
In an effort not to exhaust the feature bitmap too quickly, I've been
restricting it to only those features that would cause breakage with
userspace. I haven't looked closely at Steve's userspace in quite a
while, but I'm guessing it can key off the structure size and doesn't
need this entry in the bitmap, right? Let me rephrase, if userspace
needs to key off anything, it *should* key off the structure size and
not a new flag in the bitmask ;)
Also, I'm assuming that older userspace doesn't blow-up if it sees the
larger structure size? That's even more important.
> #define AUDIT_FEATURE_BITMAP_ALL (AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT | \
> AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME | \
> @@ -355,12 +357,14 @@ enum {
> AUDIT_FEATURE_BITMAP_EXCLUDE_EXTEND | \
> AUDIT_FEATURE_BITMAP_SESSIONID_FILTER | \
> AUDIT_FEATURE_BITMAP_LOST_RESET | \
> - AUDIT_FEATURE_BITMAP_FILTER_FS)
> + AUDIT_FEATURE_BITMAP_FILTER_FS | \
> + AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM)
>
> /* deprecated: AUDIT_VERSION_* */
> #define AUDIT_VERSION_LATEST AUDIT_FEATURE_BITMAP_ALL
> #define AUDIT_VERSION_BACKLOG_LIMIT AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT
> #define AUDIT_VERSION_BACKLOG_WAIT_TIME AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME
> +#define AUDIT_VERSION_BACKLOG_WAIT_SUM AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM
>
> /* Failure-to-log actions */
> #define AUDIT_FAIL_SILENT 0
> @@ -466,6 +470,7 @@ struct audit_status {
> __u32 feature_bitmap; /* bitmap of kernel audit features */
> };
> __u32 backlog_wait_time;/* message queue wait timeout */
> + __u32 backlog_wait_sum;/* time spent waiting while message limit exceeded */
This is very nitpicky, but how about a rename to 'backlog_wait_time_actual'?
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 87f31bf1f0a0..301ea4f3d750 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -136,6 +136,11 @@ u32 audit_sig_sid = 0;
> */
> static atomic_t audit_lost = ATOMIC_INIT(0);
>
> +/* Monotonically increasing sum of time the kernel has spent
> + * waiting while the backlog limit is exceeded.
> + */
> +static atomic_t audit_backlog_wait_sum = ATOMIC_INIT(0);
Needless to say, this should be renamed too so we don't go crazy.
> /* Hash for inode-based rules */
> struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
>
> @@ -1204,6 +1209,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> s.backlog = skb_queue_len(&audit_queue);
> s.feature_bitmap = AUDIT_FEATURE_BITMAP_ALL;
> s.backlog_wait_time = audit_backlog_wait_time;
> + s.backlog_wait_sum = atomic_read(&audit_backlog_wait_sum);
> audit_send_reply(skb, seq, AUDIT_GET, 0, 0, &s, sizeof(s));
> break;
> }
> @@ -1794,6 +1800,9 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
> return NULL;
> }
> }
> +
> + if (stime != audit_backlog_wait_time)
> + atomic_add(audit_backlog_wait_time - stime, &audit_backlog_wait_sum);
Since stime can only be different in one place in the code above
(after the schedule_timeout() call), why not move the atomic_add() up
there and drop the "if"? Yes there is the potential of calling
atomic_add() multiple times in this case, but the thread is waiting
anyway and this way we don't impact other code paths.
> }
>
> ab = audit_buffer_alloc(ctx, gfp_mask, type);
> --
> 2.17.1
--
paul moore
www.paul-moore.com
More information about the Linux-audit
mailing list