[Freeipa-interest] Re: [Freeipa-devel] Feedback requested on Audit piece of IPA

Dmitri Pal dpal at redhat.com
Thu Jul 17 12:49:51 UTC 2008


Gunnar Hellekson wrote:
> On  16-Jul-2008, at 8:19 PM, David O'Brien wrote:
>> Karl Wirth wrote:
>>> Currently we have identified that an audit system in general can be
>>> targeted to:
>>> * Collect data from different sources
>>> * Consolidate data into combined storage
>>> * Provide effective tools to analyze collected data
>>> * Archive collected data, including signing and compression
>>> * Restore data from archives for audit purposes or analysis
>>>
>>> We need your feedback on a couple of questions:
>>> 1) Should we store structured log data for analysis, the original log
>>> data, or both?
>>> - To do analysis of the log data, it would be better to structure it
>>> and store it.
>>> - But structured data is not the same as the original log file it was
>>> taken from. Do we need the original log file format for reasons of
>>> compliance, or can we throw it away?
>>> - Storing both parsed and unparsed data will have a significant
>>> storage impact.
>>>
>> I'm just a beginner, but my first reaction here is: how is this going
>> to affect a forensics situation? Shouldn't we always have access to
>> untouched/raw data? We can parse it and create whatever structure is
>> required on demand, but if we do it immediately and trash the
>> original data, there's no going back.
>
> That's right. The user should always have the option of keeping the 
> raw data. Often, there are requirements to maintain that data on 
> write-once media, etc. so I don't think they'd take kindly to 
> summarily trashing it. It would be great if we could accommodate the 
> more hard-core folks, or folks who'd like the raw data for third-party 
> log-eating tools. I feel pretty strongly that we should at least have 
> the option of maintaining the original log file format. We can then 
> allow the raw logs to be managed via logrotate rules for retiring, 
> compression, signing, etc. This may mean that they do not get touched 
> at all, which is what some customers want.
>
But this means that we will have to store twice as much data. It will be
terabytes! Is this what customers want? It would require high-end
hardware to process these logs if we want to provide any kind of
analysis and correlation. This will be a trade-off. We can collect raw
data - not a big deal - I just wanted to be sure that this is really the
case.
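
If raw logs are kept, the logrotate handling Gunnar describes might look
roughly like the sketch below; the path, schedule, and signing hook are
assumptions for illustration, not part of any agreed design:

    /var/log/ipa/raw/*.log {
        weekly
        rotate 52
        missingok
        notifempty
        compress
        delaycompress
        postrotate
            # hook for signing the just-rotated file, e.g. with
            # gpg --detach-sign; the exact command is site-specific
        endscript
    }

With delaycompress, the newest rotated file stays uncompressed for one
cycle, which keeps it available for signing or third-party tools before
compression kicks in.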

>>> 2) Should we parse the data into a structured format locally, or back
>>> on the IPA server?
>>> - Parsing locally and passing both parsed and original log data will
>>> increase network traffic but reduce the load on the server
>
> The a priori "forensic expert" in me is suspicious of munging data on 
> the client.  It seems as though we're solving a problem destructively, 
> since we lose the ability to verify the original data. What happens if 
> there's a bug in the parser? If we're supporting this, it should be 
> optional.
OK, we will provide an optional capability to preserve raw data.
What about filtering? The problem with filtering is that you need to
parse and sort the raw data, so you end up with both raw data and parsed
data. You then apply a filter to the parsed data and decide, based upon
the central policy, whether an event is of interest to you. If it is not
of interest, you throw it away (a small sketch of this parse-and-filter
step appears below). Is that a valid use case, or do we in reality need
to collect everything and not filter anything? If we collect everything,
there is no need to parse on the client, and there is only raw data to
transfer to the central location. We can do the parsing there. This
approach saves processing time on the client and reduces network
traffic, but adds more burden to the server. We can create different
architectures that provide the same set of features; the question is
more about which use case is primary. We should optimize the system for
the primary use case.
I see two main use cases:
a) The customer wants to collect and store the original data untouched,
without filtering. The ability to search and analyze it is secondary.
b) The customer wants to collect data for effective processing and
analysis. Filtering is crucial; raw data is optional and not that
important.
We are not talking about real-time log monitoring and intrusion
detection; there is a separate product for that (Prelude + IDS) and we
do not want to duplicate it.
If we have to solve both use cases above, we seem to get the worst of
both worlds: a lot of processing, and a lot of data to transfer and
store. If we can select which use case dominates, we will be able to
tune the design to solve it best. Is this possible, or are these two
use cases equal?
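
Here is the small parse-and-filter sketch mentioned above, in Python.
The line format, the policy shape, and the names (parse_line,
filter_events, CENTRAL_POLICY) are assumptions for illustration only,
not the actual IPA design:

    import re

    # Hypothetical central policy: keep only events from these programs.
    CENTRAL_POLICY = {"keep_programs": {"sshd", "su", "sudo"}}

    # Very rough syslog-style line: "MONTH DAY TIME HOST PROGRAM[PID]: MSG"
    LINE_RE = re.compile(
        r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]+)\s(?P<host>\S+)\s"
        r"(?P<program>[\w./-]+)(\[\d+\])?:\s(?P<message>.*)$"
    )

    def parse_line(raw):
        """Parse one raw log line into a structured event, or None."""
        m = LINE_RE.match(raw)
        return m.groupdict() if m else None

    def filter_events(raw_lines, policy):
        """Yield (raw, event) pairs the policy keeps; everything else
        is dropped on the client and never sent to the server."""
        for raw in raw_lines:
            event = parse_line(raw)
            if event and event["program"] in policy["keep_programs"]:
                yield raw, event

    if __name__ == "__main__":
        sample = [
            "Jul 17 12:49:51 host1 sshd[2345]: Accepted password for alice",
            "Jul 17 12:49:52 host1 crond[777]: (root) CMD (logwatch)",
        ]
        for raw, event in filter_events(sample, CENTRAL_POLICY):
            print(event["program"], event["message"])

This is exactly the trade-off described above: the client pays for the
parsing, but only the sshd line ever crosses the network.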

>
>>> 3) What is the scope of what should be included in the audit data, in
>>> addition to what we will get from syslog, rsyslog, auditd, etc.? Those
>>> will give us data like user access to a system, keystrokes, etc. What
>>> beyond that is needed? For example, is the following needed: files a
>>> user accessed on a system?
>
> Between a keystroke logger, syslog and auditd, that takes care of just 
> about everything, including a log of the files a user accessed on a 
> system.

The problem is more about other platforms. On Linux we have auditd,
inotify, and a lot of other nice things that would help (see the sketch
below). But how do we monitor file changes on Solaris, HP-UX, or AIX?
We want to collect logs from all kinds of machines. Do we need to worry
about this and build audit-information collection tools for those
systems? Which tools are a priority? How far do we need to go?
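
For reference, Linux-only file-change monitoring with inotify looks
roughly like this (a sketch calling the glibc wrappers via ctypes; the
watched path and event mask are illustrative). Nothing equivalent ships
uniformly on the other platforms:

    import ctypes, ctypes.util, os, struct

    # inotify is Linux-specific; call the glibc wrappers directly.
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

    IN_MODIFY, IN_CREATE, IN_DELETE = 0x002, 0x100, 0x200

    fd = libc.inotify_init()
    wd = libc.inotify_add_watch(fd, b"/etc",
                                IN_MODIFY | IN_CREATE | IN_DELETE)

    while True:  # report events until interrupted
        buf = os.read(fd, 4096)
        offset = 0
        while offset < len(buf):
            # struct inotify_event: int wd; uint32 mask, cookie, len;
            # followed by a NUL-padded name of 'len' bytes
            _w, mask, _cookie, name_len = struct.unpack_from(
                "iIII", buf, offset)
            name = buf[offset + 16:offset + 16 + name_len].rstrip(b"\0")
            print("event mask=%#x name=%s" % (mask, name.decode()))
            offset += 16 + name_len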

Dmitri

>
> g
>
> --Gunnar Hellekson, RHCE
> Lead Architect, Red Hat Government
>
>
>
>
> _______________________________________________
> Freeipa-interest mailing list
> Freeipa-interest at redhat.com
> https://www.redhat.com/mailman/listinfo/freeipa-interest


-- 
Dmitri Pal
Engineering Manager
Red Hat Inc. 



