Auditd statsd integration

Wed Feb 10 20:06:12 UTC 2021

On Wednesday, February 10, 2021 2:11:55 PM EST LC Bruzenak wrote:
> On Wed, Feb 10, 2021 at 1:07 PM LC Bruzenak <lenny at magitekltd.com> wrote:
> > On Mon, Feb 8, 2021 at 7:44 PM Steve Grubb <sgrubb at redhat.com> wrote:
> >> Hello,
> >> 
> >> I have recently checked in two experimental plugins to the audit tree.
> >> You can enable them by passing --enable-experimental to configure. One
> >> of the new plugins is aimed at providing audit metrics to a statsd
> >> server. The idea is that you can use it to relay the metrics to
> >> influxdb, prometheus, or some other collector, and then use Grafana to
> >> visualize and alert.
> >> 
> >> Currently, it supports the following metrics:
> >> 
> >> kernel.audit.lost
> >> kernel.audit.backlog
> >> auditd.free_space
> >> auditd.plugin_current_depth
> >> auditd.plugin_max_depth
> >> audit_events.total_count
> >> audit_events.total_failed
> >> audit_events.avc_count
> >> audit_events.fanotify_count
> >> audit_events.logins_failed
> >> audit_events.logins_success
> >> audit_events.anomaly_count
> >> audit_events.response_count
> >> 
> >> I'd be interested in hearing whether this would be useful, and whether
> >> these are the right metrics that people are interested in. Should
> >> something else be measured? Should an example Grafana dashboard be
> >> included?
> >> 
> >> Let me know what you think.
> >> 
> >> -Steve
> > 
> > Steve,
> > 
> > I think this could be awesome; hoping to give it a try soon. An example
> > dashboard would be very helpful if you could include that.
> > The stats you already point out are a good start.
> > 
> > I'd also like to have a way to parse the per-machine kernel-assigned
> > event IDs for missing ones. Would that need a separate plugin, or could
> > something be done within this setup?

This plugin does not track event IDs, and I don't think that fits with 
performance metrics. To do it, you'd need to keep track of every incoming 
event and have some way of determining which ones are missing. That means 
keeping event state around until some timeout, in case a straggler arrives late.
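The bookkeeping described above could look roughly like the following Python sketch. It is not part of the statsd plugin, and the class and method names are hypothetical. It assumes the per-machine serial numbers (the number after the colon in msg=audit(...:SERIAL)) are consecutive, so that a gap means a lost event rather than a serial that was legitimately skipped.

```python
import time


class SerialGapTracker:
    """Hypothetical helper: report audit serials that never arrived.

    Assumes serials are consecutive per machine, so any gap is a lost event
    unless the missing serial shows up late (a "straggler") before timeout.
    """

    def __init__(self, timeout=5.0, clock=time.monotonic):
        self.timeout = timeout   # seconds to wait for late events
        self.clock = clock       # injectable so tests can control time
        self.highest = None      # highest serial seen so far
        self.pending = {}        # missing serial -> time the gap was noticed

    def record(self, serial):
        """Note that the event with this serial has arrived."""
        now = self.clock()
        if self.highest is None:
            self.highest = serial
        elif serial > self.highest:
            # Everything skipped between the old high mark and this serial
            # is provisionally missing.
            for missing in range(self.highest + 1, serial):
                self.pending[missing] = now
            self.highest = serial
        else:
            # A straggler (or a duplicate): it is no longer missing.
            self.pending.pop(serial, None)

    def expire(self):
        """Return, and forget, serials that stayed missing past the timeout."""
        now = self.clock()
        lost = sorted(s for s, t in self.pending.items() if now - t >= self.timeout)
        for s in lost:
            del self.pending[s]
        return lost
```

A caller would invoke record() for each event and expire() periodically, reporting whatever it returns as lost. The pending table is exactly the event state that has to be kept around until the timeout.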

> > I'm pretty sure there are more metrics that would be wanted, as well as
> > some derived ones; e.g., pairing per-user logins and logoffs to identify
> > time spent on a particular machine (screen locks notwithstanding, but
> > maybe eventually).

I was hoping to hear from people who might currently be using Grafana or 
Graphite about whether anything else is needed. Do we need to namespace 
the machines? If so, what is the best way based on experience? Is dot 
notation or underscore notation better?
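For reference, statsd's plain-text line format is name:value|type, with g for gauges and c for counters, usually sent over UDP to port 8125. The Python sketch below shows one possible way to namespace a metric by machine using dot notation. In Graphite each dot becomes a level in the hierarchy, so dots inside a fully qualified hostname are replaced with underscores here. The function names, the host-first layout, and the gauge/counter choices are illustrative assumptions, not what the plugin currently emits.

```python
import socket


def namespaced(host, metric):
    """Prefix a metric with the host, e.g. web01_example_com.kernel.audit.backlog.

    Dots in the FQDN are replaced so the host stays one Graphite path element.
    """
    return host.replace(".", "_") + "." + metric


def statsd_line(name, value, kind):
    """Format one statsd line; kind is 'g' (gauge) or 'c' (counter)."""
    return f"{name}:{value}|{kind}"


def send(lines, addr=("127.0.0.1", 8125)):
    """Send several metrics in one UDP datagram, separated by newlines."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto("\n".join(lines).encode("utf-8"), addr)
```

A caller might do send([statsd_line(namespaced(socket.getfqdn(), "kernel.audit.backlog"), backlog, "g")]). Most statsd servers accept several newline-separated lines in one datagram, which keeps the per-interval cost to a single packet.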

As for session time, I wonder whether that kind of metric is already provided 
by other parts of statsd/telegraf.
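If it turns out not to be, the derived login-time metric suggested above could be computed by pairing each login with the logout from the same audit session. The Python sketch below is a hypothetical illustration. It assumes the events have already been reduced to (timestamp, record type, session id, user) tuples in time order, with USER_LOGIN opening and USER_LOGOUT closing a session identified by the ses= field.

```python
from collections import defaultdict


def session_totals(events):
    """Sum per-user session time from (timestamp, rtype, ses, user) tuples.

    Hypothetical helper: assumes events are in time order and that a session
    starts with USER_LOGIN and ends with USER_LOGOUT carrying the same ses id.
    Returns (seconds per user, sessions still open).
    """
    open_sessions = {}               # ses -> (user, login timestamp)
    totals = defaultdict(float)
    for ts, rtype, ses, user in events:
        if rtype == "USER_LOGIN":
            open_sessions[ses] = (user, ts)
        elif rtype == "USER_LOGOUT" and ses in open_sessions:
            who, start = open_sessions.pop(ses)
            totals[who] += ts - start
        # Logouts with no matching login (e.g. the login predates the log)
        # are ignored.
    return dict(totals), open_sessions
```

Screen locks would need additional record types and are not handled here.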

> > Or perhaps if clients send events+heartbeats, when are they
> > up/down? These are some of the questions I've heard from security
> > overseers.

I suppose it would be easy enough to check the audisp-remote state report for 
its information.

> > And while some of these may not be inspected directly by the end users,
> > in the case of trouble calls or questions they might be the exact thing
> > I'd ask them to relay to me in order to diagnose a problem or answer a
> > question remotely.

That's the idea with system metrics: to see the system getting into trouble in 
real time, before the user calls. Other system metrics can be configured into 
statsd/telegraf, and there are standard dashboards for Linux server metrics. 
What's different here is that these statistics are aimed specifically at the 
audit daemon.

> ... and I forgot to ask - can you include a README there which specifies
> the minimum kernel/userspace level of code required?

There is no minimum kernel version. It does need an audit-3.0 daemon in order to dump 
internal state. However, if it doesn't find the state report, then it simply 
doesn't update those counters. So, in that respect, you could transplant it 
to pretty much any audit daemon.
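The fallback behaviour can be pictured with a short Python sketch. It is hypothetical: the report path is passed in rather than hard-coded, and the "key = value" line format and the key names in FIELD_MAP are assumptions, not the actual audit-3.0 state report layout. The metric names are taken from the list in the original announcement.

```python
# Hypothetical mapping from state-report keys to plugin metric names.
# The real report's key names may differ.
FIELD_MAP = {
    "free_space": "auditd.free_space",
    "plugin_current_depth": "auditd.plugin_current_depth",
    "plugin_max_depth": "auditd.plugin_max_depth",
}


def read_state_report(path):
    """Parse 'key = value' lines; return None if the report is unavailable."""
    try:
        with open(path, encoding="utf-8") as f:
            text = f.read()
    except OSError:
        return None  # e.g. an older daemon that never writes a state report
    report = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            report[key.strip()] = value.strip()
    return report


def update_daemon_gauges(gauges, report):
    """Refresh daemon gauges from a report; leave them alone if there is none."""
    if report is None:
        return
    for key, metric in FIELD_MAP.items():
        try:
            gauges[metric] = int(report[key])
        except (KeyError, ValueError):
            pass  # missing or non-numeric field: keep the previous value
```

The kernel-side counters (kernel.audit.lost, kernel.audit.backlog) and the event counts would keep updating normally; only the daemon-internal gauges depend on the report.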

-Steve