aggregation/viewer question

John Dennis jdennis at redhat.com
Mon Oct 13 20:24:46 UTC 2008


LC Bruzenak wrote:
> Has anyone been thinking about how to store/maintain the aggregated
> audit data long-term?
>
> In my setup, I will be sending data from several machines to one central
> log host.
>
> After a while, the number of logs/data will grow large. With hundreds of
> files, the rotate will take more time and the audit-viewer "select
> source" option becomes tedious. Most of my searches involve
> time/host/user. Using the prelude plugin helps a lot, because it
> highlights what is otherwise hidden in the data pool. But pulling out
> that record from a selection of log files isn't currently intuitive.
>
> I would think we'd put these into an RDB or structure them in a
> time-based directory structure, something like year/month/week ... or
> maybe something else entirely. I'm also thinking about ease of
> backup/restore with incoming records. I'd hate to shut down all the
> sending clients just to back up or restore my audit data, so that part
> will need to operate asynchronously.
>
> Before striking out on my own I thought I'd ask the list and see if
> there are any such plans already in the works.
>   

Yes, we plan on addressing many of these issues in IPA, not just for 
kernel audit data, but for all log data (e.g. Apache error logs, Kerberos 
access logs, SMTP logs, etc.). The basic idea is that there will be a 
central server which accepts log data from individual nodes. The log 
data can be signed for authenticity and will be robustly transported via 
AMQP with failover and guaranteed delivery. The log data will be 
compressed. You can specify which logs you want collected, their 
collection interval, and record-level filtering.
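
To make the transport side concrete, a node-side sender might look
roughly like the sketch below. It assumes the pika AMQP client; the
host name, queue name, and HMAC signing scheme are illustrative
placeholders, not the actual IPA design.

# Illustrative node-side sender: compress, sign, and publish one batch
# of audit records to the central collector.  Names and key handling
# are placeholders, not real IPA code.
import hashlib
import hmac
import json
import zlib

import pika  # AMQP 0-9-1 client

SHARED_KEY = b"per-node-secret"   # assumption: pre-shared signing key

def publish_batch(records, node, host="logserver.example.com"):
    payload = zlib.compress(json.dumps(records).encode("utf-8"))
    signature = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()

    connection = pika.BlockingConnection(pika.ConnectionParameters(host))
    channel = connection.channel()
    channel.queue_declare(queue="audit.collect", durable=True)
    channel.basic_publish(
        exchange="",
        routing_key="audit.collect",
        body=payload,
        properties=pika.BasicProperties(
            delivery_mode=2,   # persistent delivery, survives broker restart
            headers={"node": node, "sig": signature},
        ),
    )
    connection.close()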
Once on the server, the log metadata is entered into a "catalogue" (a 
relational database) which, along with the metadata, stores where the 
actual log data can be found on disk. The disk files will be optimized 
for compression and access. The catalogue manager will be able to 
reconstruct any portion of a log file (stream) from any node within a 
time interval. This can be used for external analysis tools, compliance 
reporting, etc.
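
As a rough illustration of the catalogue idea (table and column names
are made up, not the real schema), the metadata could live in something
as small as:

# Sketch of a catalogue: a relational table holds only the metadata
# (node, log type, time span, on-disk path); the compressed log chunks
# themselves stay on the filesystem.
import sqlite3

conn = sqlite3.connect("catalogue.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS log_chunks (
        id         INTEGER PRIMARY KEY,
        node       TEXT    NOT NULL,   -- originating host
        log_name   TEXT    NOT NULL,   -- e.g. 'audit', 'httpd-error'
        start_time INTEGER NOT NULL,   -- epoch seconds of first record
        end_time   INTEGER NOT NULL,   -- epoch seconds of last record
        path       TEXT    NOT NULL    -- compressed chunk on disk
    )""")

def chunks_for(node, log_name, t0, t1):
    """Return the on-disk chunks needed to reconstruct one node's log
    stream for the time interval [t0, t1]."""
    cur = conn.execute(
        """SELECT path FROM log_chunks
           WHERE node = ? AND log_name = ?
             AND end_time >= ? AND start_time <= ?
           ORDER BY start_time""",
        (node, log_name, t0, t1))
    return [row[0] for row in cur]

Reconstructing an interval is then just a matter of decompressing those
chunks in order and trimming the first and last to the requested
boundaries.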
The catalogue will be capable of intelligently archiving old log data 
and restoring it back into a "live catalogue". This is what is planned 
for v2 of IPA, which is anticipated to be about one year from now.

In v3 of IPA the audit catalogue will support search and reporting on 
*all* the log data in the catalogue (not just audit.log but all log 
data). In v3, when data arrives at the catalogue it will be indexed for 
fast search and retrieval. Search will be based on tokens and key/value 
pairs and will accept constraints on nodes, time intervals, users, etc. 
(Note: a relational database will NOT be used to support searching; 
rather, searches will be performed via optimized reverse indexes on 
textual tokens. The RDB will only be used for managing the collection 
of log files.)
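
Purely to illustrate that last point (made-up names, not the actual v3
code), the reverse-index approach amounts to something like:

# Toy inverted ("reverse") index mapping tokens and key=value pairs to
# the records that contain them; the RDB stays out of the search path.
from collections import defaultdict

index = defaultdict(set)   # token -> set of record ids
records = {}               # record id -> parsed record

def add_record(rec_id, node, timestamp, fields):
    records[rec_id] = {"node": node, "time": timestamp, **fields}
    index["node=" + node].add(rec_id)
    for key, value in fields.items():
        index[key + "=" + str(value)].add(rec_id)   # key/value pair
        index[str(value)].add(rec_id)               # bare textual token

def search(*terms, node=None, t0=None, t1=None):
    """Intersect the posting lists, then apply node/time constraints."""
    ids = (set.intersection(*(index.get(t, set()) for t in terms))
           if terms else set(records))
    return [r for r in (records[i] for i in ids)
            if (node is None or r["node"] == node)
            and (t0 is None or r["time"] >= t0)
            and (t1 is None or r["time"] <= t1)]

A call like search("type=USER_LOGIN", node="host1") then touches only
the index, never the RDB.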

A note about vocabulary: in "IPA land", when we say "audit data", an 
"audit catalogue", or "audit search", the term "audit" refers to any log 
data, of which kernel audit data is just one subset.
> As a suggestion, the prewikka viewer seems like a workable model. I
> realize that viewer is built around the IDS structure, but as an event
> search tool it is pretty good and mostly complete. Having network access
> to it is also a nice feature.
>
> So right now I think that feeding the events into a DB and then using a
> tool with the same capabilities as the prewikka viewer would be a
> viable option. Others? Ideas?
>
> Thanks in advance,
> LCB.
>
>   


-- 
John Dennis <jdennis at redhat.com>



