Log rotation and client disconnects

rshaw1 at umbc.edu
Fri Aug 13 15:06:34 UTC 2010


LC Bruzenak wrote:
> On Thu, 2010-08-12 at 11:16 -0400, rshaw1 at umbc.edu wrote:
>> > On Thursday, August 12, 2010 10:02:29 am rshaw1 at umbc.edu wrote:
>> >> I've discovered the issue since I sent it, anyway.  If num_logs is
>> >> set to 0, auditd will ignore explicit requests to rotate the logs.
>> >> I guess this may be intentional, but it's unfortunate as num_logs
>> >> caps at 99 and I need to keep 365 of them.
>> >
>> > Have you looked at the keep_logs option for max_log_file_action?
>>
>> I did, but the man page states that keep_logs is similar to rotate, so it
>> sounds like if I used this option, it would still rotate the log file if
>> it went above the max_log_file size, which I don't want to happen.  I
>> suppose I could just set max_log_file to 99999 or something (if that's
>> supported).  Typically, uncompressed log files for ~400 clients on the
>> central server end up being around 3-4GB.
>
> Do you not want to rotate because of the time it takes?
> Yep, the keep_logs does a rotate without a limit.

I am required to rotate the logs once per day, and I would like to make it
exactly once per day. This is to make it easier to keep track (with
date-named logs), easier to keep 1 year's worth of logs (also required),
and easier to run reports on a particular workday's worth of events.
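
What I have in mind is a nightly cron job along these lines (untested
sketch; the sleep and the date format are just what I'd pick):

#!/bin/sh
# nightly log rotation -- sketch only
# auditd rotates its logs when it gets SIGUSR1 (what "service auditd rotate"
# does where the init script supports it), but num_logs can't be 0 or the
# request gets ignored -- which is what started this thread.
kill -USR1 "$(pidof auditd)" || exit 1
sleep 5
# give yesterday's log a date-based name
mv /var/log/audit/audit.log.1 \
   "/var/log/audit/audit.log.$(date -d yesterday +%Y%m%d)"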

> The max_log_file value is an unsigned long so it should take a very
> large number. However, in case there is a lot of auditing you are not
> prepared for, I'd suggest limiting the file size to 2GB. The rotate time
> should be similar regardless of the file size.

I've made /var a little over 200G on the current audit collection machine
(and on its final destination, /var is much bigger than that).  I guess I
could set a very large "just in case" value that stops short of ludicrous,
but I'd really prefer that size-based rotation never happen.
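
So the auditd.conf side would probably end up looking something like this
(the 2GB figure is just the suggestion above, not a requirement):

# auditd.conf -- size-based rotation as a safety net only
max_log_file = 2048              # megabytes
max_log_file_action = keep_logs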

> BTW, in what time period are you getting the 3-4GB amounts? Are you
> happy with the data you are getting - or maybe you could pare it down
> some with audit.rules tweaks on the senders?

That amount of data is in one day, for all clients.  Whether I am happy is
somewhat less relevant than whether I am STIG-compliant :p  However, I do
have the data I want for running a few everyday, useful reports.  I'm not
sure whether I could reduce it much and still be auditing everything I'm
required to (I started with the example rules file, and added quite a bit;
each machine auto-generates a list of SUID/SGID binaries and adds rules
for them, etc.)  The size is manageable for us.
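
For reference, the SUID/SGID piece is roughly the usual find-based
generation; a simplified sketch (the key name and auid values here are just
illustrative, not necessarily what my script emits):

find / -xdev -type f \( -perm -4000 -o -perm -2000 \) 2>/dev/null |
while read -r bin; do
    printf -- '-a always,exit -F path=%s -F perm=x -F auid>=500 -F auid!=4294967295 -k privileged\n' "$bin"
done >> /etc/audit/audit.rules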

Given the nature of the systems, there are often lots of files being
created and destroyed.  This will get even worse once the RHEL4 machines
are brought up to a newer release (probably 6); I'm not auditing them at
the moment, since they'd need different rules and don't have audisp.

(Technology preview or no, I'm very happy to have audisp; certain other
systems aren't so lucky.)

> Each day I have to move mine out of the way for the same reasons.
> However, the search tools are then impacted, since you'll need to know
> where to find them.
> Also, since it appears you have a lot of data, I assume you are finding
> performance issues on the audit-viewer?

Well, I can't run aureport --summary; it pegs the CPU for hours and hours.
That's not really a big deal for me, though.  I have a script that runs
shortly after the logs are rotated, generating a report based on the
previous day's data.  It's using 3 aureports and one ausearch (piped
through a bunch of stuff).  Usually takes less than 15 minutes to run.  At
the moment, this is the main way we're using the data, though I'm hoping
to do more in the future.  I've glanced at the audit+Prelude HOWTO, since
Prelude can do a few other things that appeal to me.
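
The report job is along these lines (sketch only; the particular report
types and file names are illustrative, not the exact script):

#!/bin/sh
YDAY=$(date -d yesterday +%Y%m%d)
LOG=/var/log/audit/audit.log.$YDAY        # hypothetical date-named log
{
    aureport -if "$LOG" --login --summary -i
    aureport -if "$LOG" --auth --failed --summary -i
    aureport -if "$LOG" --executable --summary -i
    # pull anomaly records with ausearch, since aureport --anomaly -i
    # wasn't giving me the node names
    ausearch -if "$LOG" -m ANOM_PROMISCUOUS,ANOM_ABEND -i
} > "/root/audit-report-$YDAY.txt" 2>&1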

(The ausearch used to be an aureport, but aureport --anomaly -i doesn't
seem to get the node/host names from the logs, which is why I ended up
writing my own thing.  Interestingly, --anomaly isn't even in the man page
for aureport; I've no idea where I found it.  I don't know if any of this
is different in more recent versions.)

>> I'm still not sure what to do about the disconnection issues (although
>> hopefully those will be very infrequent once I'm no longer restarting any
>> of the daemons).  If a client does lose the connection to the server for a
>> while though (say, an hour-long network outage for networking upgrades),
>> I'd like to be able to tell them to try reconnecting periodically, and the
>> combination of network_retry_time and max_tries_per_record doesn't seem to
>> be the way to do that.
>>
>> Other than checking the logs, is there a way to determine whether or not a
>> running audispd is connected to the remote server?
>
> I do a combination of things to detect this on the sending side.
> The network_failure_action of the audisp-remote.conf file allows for a
> custom action using the "exec" option.
>
> The remote_ending_action = reconnect helps if the remote end (server) restarts its
> auditd. Maybe your version is different from mine but I get the
> reconnects...

Hrm.  This is what I have:

network_retry_time = 30
max_tries_per_record = 60
max_time_per_record = 5
network_failure_action = syslog (looks like I'll be changing that)
...
remote_ending_action = reconnect

Are you using the heartbeat_timeout stuff?  I haven't been.

> Also - I have a big ugly system involving timestamps and reconnect
> logic.

Yeah, I think I might come up with something like that, and use the "exec"
option for network_failure_action combined with cron stuff to keep
retrying.
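
Probably something along these lines (untested; the script, flag file, and
cron paths are all made up):

# audisp-remote.conf
network_failure_action = exec /usr/local/sbin/audisp-net-fail.sh

# /usr/local/sbin/audisp-net-fail.sh -- just leave a marker when the link drops
#!/bin/sh
touch /var/run/audisp-remote.down

# /etc/cron.d/audisp-reconnect -- while the marker exists, restart auditd
# (which restarts audispd and its plugins) every 10 minutes, then clear it
*/10 * * * * root [ -f /var/run/audisp-remote.down ] && /sbin/service auditd restart && rm -f /var/run/audisp-remote.down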

Thanks,

--Ray



