[Freeipa-devel] "Commit comments log" functionality in IPA

Thu Nov 6 18:47:39 UTC 2008

Simo Sorce wrote:
> On Thu, 2008-11-06 at 10:56 -0500, Dmitri Pal wrote:
>   
>> Simo Sorce wrote:
>>     
>
>   
>> Well, this is the main point. It would have been great if we had a 
>> product that would be able to act as DS and DB at the same time.
>> I agree that this information would easier be managed in a DB. But we do 
>> not have one. We can create one but then we face all the same issues as 
>> we face with policies:
>> a) Replication
>>     
>
> Why do we need to replicate a log database? It is not vital for the
> normal functioning of the Identity solution so it can well be a database
> or a file kept on a single server
>
>   
You are continuing to view it from the wrong angle.
It is not a log database. It is not a log file. It is a property of the 
entry. It should be stored close to the object it is related to.
It is not audit data. You want this data to be viewable when you are 
viewing or modifying the entry.
Making the UI go and pull this data from an external database would be 
a huge overhead.
This data should be available regardless of which master you are 
connecting to.
By all its characteristics it is not log data, so please stop thinking 
about it from that perspective.

>> b) Referential integrity across different data stores
>>     
>
> Use object GUIDs, that's all is needed, seriously, it is very simple to
> keep referential integrity when GUIDs are used.
>   

Across different data stores? It does not matter what you use as a 
foreign key. The whole notion of having two data stores and creating 
referential integrity between the two makes me cry.
Been there, done that. It is extremely complex and nearly impossible to 
get right, regardless of whether it is DB+DB or DB+DS or DS+File or 
any other combination. It is always more complex than a sync task. I 
strongly believe that a DS plugin is a small, simple and, more 
importantly, self-contained solution that does not create any 
dependencies.

>   
>> c) Backup - restore
>>     
>
> Not sure why this would make any difference. We do not offer any special
> backup/restore feature with IPA so it would be just business as usual.
>
>   
If you have more than one data store you have to make sure that your 
backup and restore are also synced, otherwise referential integrity 
goes out of the window.
We do not have this problem in IPA because we do not have any storage 
that must be synced with other storage (except AD sync, and you know 
how far from simple that is :-) ).
This is why I was strongly for storing policies in the DS itself. It 
solves a lot of problems you never even want to think about.
The same applies here.

>> With policies the decision was made to store them in DS not to face 
>> these problems.
>>     
>
> Policies will be constantly queried by clients, that's why you need to
> replicate them on multiple servers to be able to scale up when the
> number of clients grow. A log does not need that.
>   

It is not a log. It is a property of the object. It will be queried and 
displayed in the UI any time the object is viewed or modified.

>   
>> I think that creating a parallel infrastructure for policies or for 
>> commit logs is  a huge overhead.
>>     
>
> I think logs and policies are completely different in nature, these logs
> are just audit trails in nature, and should be rather be developed as
> part of the audit system.
>   

They are not audit trails. They are comments that contain information. 
Like this.
For example on the SUDO policy object:

"DPal 11/05/08 12:15PM: Added new group "contractors" to the SUDOERS 
policy according to the decision made by the security board on 10/28/08"
"SSorce 11/17/08 2:47PM: Removed group "contractors" from the SUDOERS 
policy to fix a security problem reported in ticket #2131209"

Is this audit? No. It looks like one but it is not. It has one important 
piece of information: why? Who authorized the change to the critical 
infrastructure?
If we treat it as a pure audit log record we will lose its relevance. It 
will go into the audit server and it would be possible to pull it from 
there on demand, but not in the context of modifying the entry in 
DS. Making the UI do this search against the audit server would be an 
overhead, and there is no guarantee that this data has not been archived 
or cleaned from the audit DB.
When it is in DS and close to the object we can guarantee that it is 
complete, relevant and always accessible when the entry is viewed and an 
administrator is about to make a critical change to the system.

I think that since we are a security product, any changes to the 
policies that define access rules should have a comment facility like this.
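
To make the shape of such a value concrete, here is a minimal Python 
sketch that splits one stored comment into author, timestamp and text. 
The layout is only inferred from the two examples above; the names 
CommitComment and parse_comment, and the reading of a two-digit year as 
20xx, are assumptions for illustration, not a proposed format.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Hypothetical layout, inferred from the examples in this mail:
#   "<author> <MM/DD/YY> <H:MM><AM|PM>: <free text>"
_COMMENT_RE = re.compile(
    r"^(?P<author>\S+) "
    r"(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{2}) "
    r"(?P<hour>\d{1,2}):(?P<minute>\d{2})(?P<ampm>[AP]M): "
    r"(?P<text>.*)$",
    re.DOTALL,
)


@dataclass(frozen=True)
class CommitComment:
    author: str
    when: datetime
    text: str


def parse_comment(value: str) -> CommitComment:
    """Split one stored value into author, timestamp and free text."""
    m = _COMMENT_RE.match(value)
    if m is None:
        raise ValueError("not a commit comment: %r" % value)
    # Convert the 12-hour clock by hand so the result does not depend on locale.
    hour = int(m["hour"]) % 12 + (12 if m["ampm"] == "PM" else 0)
    when = datetime(
        2000 + int(m["year"]),  # assumption: two-digit years mean 20xx
        int(m["month"]),
        int(m["day"]),
        hour,
        int(m["minute"]),
    )
    return CommitComment(m["author"], when, m["text"])
```

The UI would only ever need to call parse_comment on each value it 
pulled with the entry; nothing here requires a search by this attribute.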

>>>> Can you provide what are the reasons that would make a good idea to
>>>> store this kind of data in DS instead of a log file or a log database ?
>>>>
>>>>     
>>>>         
>> Log file is not manageable. It is hard to pull and query.
>>     
>
> About the same complexity as pulling and querying a structured attribute in
> LDAP. To be honest, for most people a file is *much* easier to pull
> and parse than an ldap attribute.
>
>   
I suggest we not pursue this argument - it is comparing apples and oranges.

>>  It is good as 
>> a stream of data but when you need to analyze it you use other tools.
>> Log database is different. It is for different purpose. Log database is 
>> for analysis. It would be hard to relate the history or a DS entry to 
>> the entry itself and present them in proper way in UI.
>>     
>
> I don't think so, as long as you store the GUID of the entry it refers
> to as part of the metadata of the comment/log entry it is quite easy,
> just a single ldap search (or vice versa, just a single "grep")
>
>   

See replication issues above. Log file or external DB just do not work.

>> It would be much more complex than you think.
>>     
>
> I honestly think you greatly over-estimate the complexity of parsing a
> log file in a modern scripting language like python, or even just old
> stuff like grep, cut or awk.
>
>   
Parsing is not the problem. Managing is the problem: access control, 
replication, backup etc.
I do not want to repeat all the arguments again.

>>>> I honestly see no reason, nothing but a human operator would make any
>>>> use of such attributes, they are not useful to any machine which is the
>>>> real consumer of DS data.
>>>>     
>>>>         
>> I completely disagree with this statement. DS is a data store for all 
>> data that needs to be stored unless it does not belong there.
>>     
>
> The point is that comments or log do not belong there, they just don't.
>   

This is a property not a log.

>   
>> The "commit log"  data is on the edge but there is unfortunately no 
>> better place to put it.
>>     
>
> A log file or a log database, it's auditing related we will have
> auditing facilities, use them.
>   

Just disagree. See comments above.

>   
>>>> As you pointed on it would add an unnecessary amount of data to
>>>> replicate around.
>>>>
>>>>     
>>>>         
>> If we use MV attributes the replication problem will be solved.
>>     
>
> NO, it will just be softened, you still replicate around data useless
> 99% of the time.
>
>   

No, I checked. According to the DS gurus the operation is replicated, not 
the data, so it is not an issue.
Check with Nathan and Rich if you do not believe me. This was one of the 
first things I checked because I was concerned about this too.
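
As a toy illustration of that point (not of how DS replication is 
actually implemented), the following Python sketch models two masters 
and ships only the "add this value" operation between them. Replica, 
add_comment and the attribute name commitComment are hypothetical names 
made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy model of one master: DN -> attribute name -> set of values."""
    entries: dict = field(default_factory=dict)

    def apply(self, change):
        # A change is the operation itself: (dn, attribute, single new value).
        dn, attr, value = change
        self.entries.setdefault(dn, {}).setdefault(attr, set()).add(value)


def add_comment(origin, peers, dn, value, attr="commitComment"):
    """Apply an add locally and send the same operation to every peer."""
    change = (dn, attr, value)  # only the new value travels, not the old ones
    origin.apply(change)
    for peer in peers:
        peer.apply(change)
    return change
```

However many comments the entry already carries, the change handed to 
the peers in this model contains exactly one value.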

>>>> Also because DS is not a logging server but a directory server there are
>>>> many other problems in trying to use it to store logs.
>>>>
>>>>
>>>>     
>>>>         
>> This is not a log. I think this is a fundamental difference. It is a set 
>> of comments.
>>     
>
> Sorry but I see no difference between a set of "comments" or a log, a
> log is just a set of "comments" about what is going on in some generic
> "process".
>   

A log is a set of events generated by software in response to user 
actions.
Comments are not a log IMO. They could be a BLOG, but that is a 
different context :-)
>   
>>  IMO they should be tightly related to the object they refer 
>> to and not be somewhere else in the log server.
>>     
>
> Why?
>
>   

Because when something goes wrong it should be there. The policies are 
too critical to the entire enterprise.
It is much easier to find out what was going on when that kind of 
comment is right there.

>> The log will contain who did what when. But it never answers the 
>> question "why".
>>     
>
> If you put a comment in a log it can also answer why, I really do not
> see your point here.
>
>   

We view it from different angles. You are stuck with the log approach.

>> The "commit comment" feature is designed for that use case.
>> May be the word selection confused you but it is not a "log".
>>     
>
> No, the nature of a comment is the same as that of a log, it is just
> that the "comment" is user generated instead of automatically generated
> but it really bears no relevant semantic difference.
>   

Disagree.

>   
>> Think about comments in a bugzilla bug. A bugzilla bug is not a log and 
>> is never perceived as such. The "commit comment" feature should not be either.
>>     
>
> Bugzilla is used to have "conversations" about topics. The "commit
> comment" is instead just the same as a commit message in an SCM system.
> Not surprisingly to see the list of commit comments in cvs, svn,git,
> etc.. the command is "log" (cvs log, svn log, git log ....)
>
> They are just logs.
>
>   

NO.

>> I agree but I do not see a better alternative. Implementing an external 
>> DB synchronized with DS is much more complex than you would think.
>>     
>
> I beg to strongly disagree. All you need is to use GUIDs on objects,
> that is your "join" key. It makes thing extremely simple to manage.
>   

Have you ever done this? I did. It is not that simple.

>   
>>>> If comments are required for audit trails I think they should just go in
>>>> the auditing system and marked as special "comment" logs. Otherwise they
>>>> should probably simply go in a normal log file or a relational log
>>>> database (the latter in case online searches are required).
>>>>     
>>>>         
>> As I said they are not a log and this approach will make the feature 
>> useless.
>>     
>
> Evidently there is a problem in understanding what this feature is
> useful for.
>
> From your premises about why customers need it, it is useful as an audit
> trail log to know why an object is in that state. Is there any other use
> to it ?
>
>   

The main reason to have it is to have an answer to "why someone did what 
he did". And it should be there at hand so that when the next guy comes in 
and tries to clean up the mess he knows why the previous change was made 
and who authorized it. Then he can turn to the audit system and dig more, but 
the recorded comment will give him a very good starting point.

This is all about authority and responsibility.

>>>> Now to some of the reasons why I don't see DS as a viable option:
>>>>
>>>> - Multi-value attributes are not ordered, so you need to invent some
>>>> scheme to store this data structured so that ordering can be preserved.
>>>> Sure probably using the "posting" date before the content is all is
>>>> needed, but that makes attributes not searchable.
>>>>
>>>>     
>>>>         
>> I think that creating a generic plugin that would allow storing ordered 
>> MV attributes would be a big benefit for everybody.
>>     
>
> The problem is that ordered MV are not defined in the LDAP protocol, you
> would need to come up with a standardizable way to manage how to add
> entries so that they fall in the precise order you need them to. This
> would mean modifying the add operation either with a control or by
> creating an extended operation. 
>
>   
I will see what RFCs exist for this case. Rich pointed out that OpenLDAP 
has it.

>> We can use the "commit comment" use case to create one. I would think 
>> that DS folks would find such plugin pretty valuable.
>> I am actually surprised to the fact that one does not exist yet. It 
>> would have solved a lot of different issues and paved a way for even 
>> broader adoption of the DS.
>>     
>
> This is *way* more work than you think it is. And the reason why nobody
> did it and no clear standard has still been proposed.
>
>   

That is why I do not want to do it in one step. I want to do commit 
comment plugin first focusing on specific functionality and then reuse 
it to solve a more generic problem.

>> I would start with implementing "commit comment" feature as a step 
>> towards a generic ordered MV value plugin solution.
>>     
>
> This alone would require considerable time.
>   

Probably a week for an expert like Nalin and up to 3 weeks for a novice 
like me. :-)

>   
>> This would actually mean turning a MV attribute into sort of mini table 
>> with records associated with an entry in this table.
>> I think it is a very cool feature.
>>     
>
> Yes, but would mean modification of the LDAP protocol, you do not want
> to do that outside of a standardization body, or it will most probably
> be just a dead end.
>   

I do not see what you are talking about. There is no change to the 
protocol at all.
There are tons of RFCs that store complex structures in the attribute.
I do not see how my solution is different from those.

>   
>>>> - You would have to create a clean up process that removes old stuff, I
>>>> don't think that keeping around a hundred entries log for years would
>>>> make sense.
>>>>     
>>>>         
>> I am not sure it should be done day one. But I will think more about 
>> cleanup.
>> I view this as second-tier functionality on top of the original feature.
>>
>>     
>>>> - We would need to index yet another attribute if you want to make
>>>> searches on it, note also that just consulting the log would require
>>>> searches on the identity store increasing its load, something a log
>>>> file/database would avoid completely.
>>>>
>>>>     
>>>>         
>> The whole value of the feature is to have the whole list. There is no 
>> need to index it since the search by this attribute does not make sense.
>> This is not a valid concern.
>>     
>
> Can you explain what this list is used for actually? And why someone
> should be interested in a comment written 5 years earlier and that is
> completely outdated as the data in the object has no bearing with that
> original comment anymore?
>
>   

We can show the last X comments in the UI. This is irrelevant. You want to 
know how this whole thing ended up in the state it is in and why.
Then you can come up with an effective remediation. Whether you look at 
just the last X comments or the whole stack is up to you.
Whatever is needed for you as admin to make the right decision.
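
Since multi-valued attributes come back unordered, "last X" has to be 
done by the application after it pulls the values. A minimal Python 
sketch follows, assuming (and this is only an assumption for the 
example) that each value starts with a fixed-width UTC timestamp such as 
"20081105171500Z", so that plain string sorting is chronological. The 
function name last_comments is hypothetical.

```python
def last_comments(values, limit=None):
    """Return comments oldest-first, keeping only the newest `limit` if given.

    Assumes every value begins with a fixed-width UTC timestamp
    (YYYYMMDDHHMMSSZ), so lexicographic order equals chronological order.
    """
    ordered = sorted(values)
    if limit is None:
        return ordered  # the whole stack
    if limit <= 0:
        return []
    return ordered[-limit:]
```

The UI can call it with a limit for the default view and without one 
when the admin asks for the full history.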

>>>> - If you invent a complex format you lose the capability to do decent
>>>> filtering on searches, meaning you will often need to do wide scope
>>>> searches and implement filtering in the UI (slooow, and loads DS)
>>>>
>>>>     
>>>>         
>> Again as I said: there is no need to filter by this attribute. You will 
>> pull this MV attribute if you need it. Searching inside the values 
>> pulled out is up to the application not to DS.
>>     
>
> This just tells that storing it in DS is really not required.
> If you need it you can pull it from a DB or a file equally well.
>
>   
External source is a problem.

>>>> - You have no relation of events (ldap not being a relational database
>>>> makes it particularly difficult indeed).
>>>>
>>>>     
>>>>         
>> There is no relation to the event. There is a relation to the object itself 
>> since the attribute that will contain the commit comments will be a part 
>> of the same entry.
>>     
>
> A comment is related to a "change" in an object, therefore it is
> logically related to an event (the change), insomuch that you require
> ordering of the entries. Now assume I have a set of events and related
> comments, like the following:
>
> "add foo to sudoers file"	"foo need access to x as admin"
> "add bar to sudoers file"	"bar need access to x as admin"
> "remove foo from sudoers file"	"foo need no more access as admin"
> "add baz to sudoers file"	"baz need access as admin"
> "remove baz from sudoers file"	"baz need no more access as admin"
> ...
>
> Now who needs the full list ?
> After 30-40 changes why do you care about something like lines 4 and 5
> except for auditing purposes ? And if you do not have a date associated
> what is it useful for ?
>
>   
These comments are useless. They do not answer "why". I made the change 
because my manager authorized me, or I responded to ticket #XXXXXX, or 
the security board authorized me - this is what should be put into the 
comment. Not "foo needs access to x as admin". Such a comment is a 
duplication of the event and bears no value. The comment should contain 
information that links the event (changing of the policy) to the formal 
process that authorized this change. Hope this clarifies the purpose and 
the difference.

>> There is no need to have any kind of cross references if it is done this 
>> way.
>>     
>
> If there are no cross references like the date the event/change happened
> or who made the operation it would be quite useless (see above example).
>
>   
>>>> - If a single UI command changes many different objects, where do you
>>>> store the comment? In one? All of them?
>>>> If in one how do you relate changes to others ?
>>>> If you replicate it an all objects how do you deal with access to all
>>>> entries? (see below)
>>>> See also scenario above about angry admins if you require a comment for
>>>> each object being changed.
>>>>
>>>>     
>>>>         
>> The "commit comment" feature by nature makes sense in the context of the 
>> top level object as I mentioned above.
>> If change happens to several entries at once the designer of the schema 
>> and UI should decide what would be the best approach and what entry the 
>> "commit comment" should be applied to.
>>     
>
> This is an operational policy, can't be decided at the schema level, it
> completely depends on what kind of event the specific security
> administrator wants in the specific deployment.
>
>   

We can attach it to multiple different objects but the system 
administrator - the "root" of the whole IPA deployment - will have a way 
to say where this is mandatory, where optional, and where it should be 
hidden. The UI and CLI will respect these settings.

>> If we do it as auxiliary class we have a flexibility to use it in 
>> multiple places.
>>     
>
> SCNR but I read this like: we can clutter the DS as much as we want :)
>
>   

We can store whatever we want in the DS. IMO the 2307 is a good example 
of cluttering LDAP especially netgroups schema :-)
I think we are playing much more nicely than this and other RFCs.

>> I really do not see a problem here. You just pick the main entry you are 
>> dealing with. If the UI touches multiple entries of different kind in 
>> one step it is a subject for deep thought and reevaluation.
>>     
>
> It will, I am trying to evaluate this *before* we waste a lot of time in
> something that seems wrong to me on way too many levels to easily express
> them all. For some aspects this is just a feeling, but so far, the
> further we dig into it the more the feeling grows.
>
>   
NO comments.

>> Such operations especially with DS where transactions are not supported 
>> should be avoided.
>>     
>
> Reality is people will touch multiple objects when changing stuff
> around, and transactions really do not matter in this case (see the
> angry-admin example I posted in the previous email).
>
>   

This is why we have CLI tools and UI so that things can be done 
properly. Messing with raw data is always dangerous if you do not know 
what you are doing.

>>>> - Comments may contain sensitive information that should not be leaked,
>>>> so comments should not be generally available for search on ldap.
>>>> This would require to add (on the fly?) ACIs on objects that get
>>>> comments.
>>>>
>>>>     
>>>>         
>> The ACI defines the access control rules for the MV attribute. I doubt 
>> that the comments would contain sensitive information.
>>     
>
> I don't think ACIs can express something like "append-only".
>
>   

Can Nathan or Rich comment on that?

>> This is a matter of administrative policies not software.
>>     
>
> Sorry I do not get what this means. We are talking about access
> authorization to an attribute that seem to me to be framed in a way the
> software can't cope with, I think software matters.
>   
Anyone can read; whoever can edit the entry can add.
Whoever can delete the entry can delete all values together.
No one can modify.

I think a plugin can easily enforce this logic.
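
The rules above are small enough to write down as a decision function. 
The sketch below is Python only to show the logic; a real DS plugin 
would be written in C against the plugin API, which is not shown here. 
ModOp and is_allowed are hypothetical names, and the two boolean flags 
stand in for whatever the ACI evaluation would report about the caller.

```python
from enum import Enum


class ModOp(Enum):
    """The three LDAP modify operation types."""
    ADD = "add"
    DELETE = "delete"
    REPLACE = "replace"


def is_allowed(op, values, can_edit_entry, can_delete_entry):
    """Decide whether one modification of the comment attribute is accepted.

    - ADD of new values: allowed for anyone who may edit the entry.
    - DELETE with no values listed (removes the whole attribute):
      allowed only for someone who may delete the entry.
    - DELETE of individual values, and REPLACE: always rejected,
      because they would rewrite history.
    """
    if op is ModOp.ADD:
        return can_edit_entry and len(values) > 0
    if op is ModOp.DELETE:
        return can_delete_entry and len(values) == 0
    return False
```

Read access is left to the normal ACIs; this function only covers the 
write side.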

>   
>> The first step 
>> is at least to treat it as a MV attribute with ordering that can't be 
>> modified or deleted.
>>     
>
> If you can write it you can modify it.
>   

This is where the plugin will reject the attempt to write.

>   
>> Later as a second-tier feature we can start thinking about more 
>> discretionary read access control. Again I do not see a big issue here 
>> for the first implementation.
>>
>>     
>>>> - Some ACI may allow a lower level admin to perform an operation on some
>>>> attributes, but not add objectclasses or new attributes.
>>>> We lose the comments in this case ?
>>>>     
>>>>         
>> No. I will dig into that but the plan is to have consistent ACI  rules.
>> I hope that DS specialists will chime in and confirm that the ACI has 
>> enough flexibility to deal with the  object I am suggesting.
>>     
>
> I don't think so, if I correctly understood how you want to manage it.
> And btw the way you seem to be willing to manage it once again resembles
> an audit trail log, and that's I think because it ultimately is just an
> audit trail log.
>
>   
See the logic in the previous comment.

>>>> - Anyone with write access to the attribute will be able to change the
>>>> contents, making them generally completely useless as audit trails.
>>>> Delegation of any minor task would require write access to comments all
>>>> over the place.
>>>>     
>>>>         
>>>   
>>>       
>> No. The whole idea is to make it non-editable at all. Only add.
>>     
>
> Exactly this is something that does not exist in the LDAP model, nor in
> the ACI model we have.
>   
I am pretty sure the plugin can take care of that. If not, I would 
agree that this is not a good idea.
Nathan? Rich?

>   
>> Only 
>> later we might start diving into ACIs and deal with the complexity of 
>> editing the data by admins that have different levels of privileges.
>>     
>
> No, architectural problems must be evaluated first in this case, because
> you are trying to construct something that is so off the way LDAP works
> for something that seem so irrelevant for the general functioning that I
> want to understand exactly why it is so important, Because from this
> reading I just re-evaluated the time need to "adjust" DS to handle this
> single attribute in term of several weeks, and that's a lot of effort,
> just to keep around some log that can instead very easily be piped in a
> file or in a database with an effort that will take a lot less time.
>
>   
The solution, if possible, is self-contained and does not rely on any 
external piece of functionality such as the audit server (which would 
not be up to the task for quite some time).
Adjusting DS is much simpler (I think) than building the logic to log 
this into an external store (file or DB) and then pull it out when we 
need to edit a policy and see who was messing with it and why.
I have shown all the arguments about this at the top of this thread.

>>> Forgot another important few:
>>>
>>> - It would make extremely difficult for people to extract this
>>> information. Instead of connecting to a well known relational database
>>> with well known tools used for reporting, they would have to build a
>>> custom parser that speaks LDAP. This thing alone would be a deadly one
>>> imo.
>>>
>>>   
>>>       
>> This is not a log. This is a comment on the entry.
>>     
>
> I am sorry, the description you give: ordered list, immutable list, add
> only, all scream this is a log.
> If it were just a comment about the object we would need nothing more
> that the "description" attribute already available in ldap, and there
> would be no ordering nor immutability problem.
>   
View this as a history of the description attribute. Does that help? :-)

> I think you might be trying to conflate together the concepts of
> "description of the object" and "log of changes", I think this is a very
> wrong approach.
>
>   
I think that this is a value add and something that is needed for the 
project to be successful.
Otherwise I would not have suggested it. You know I am against any 
unnecessary work myself.
I see that without this feature the adoption of IPA will be slower, 
since the feature allows tying
the data to the formal processes established in the company.

>>  I think that if down 
>> the road we create a mean to store ordered lists in DS people would take 
>> advantage of that.
>> I disagree that it is hard to extract. It is the same as any other MV 
>> attribute except that there is some header inside that prefixes the data 
>> that need to be skipped.
>> There are so many LDAP attributes that have special internal formats 
>> that I read about in different RFC I really do not see my approach being 
>> against any main stream ways of doing things.
>> There can be a helper library written for an easier adoption later. This 
>> is really not an issue at all imo.
>>     
>
> The issue here is why we want to waste a lot of time and effort to
> implement just a per entry log that is useful only when some auditing
> need to be performed. Seriously I think the effort is not worth the
> value from what I have seen so far.
>   
I agree that if we had a robust audit server capable of doing real-time 
searches now, I would have explored the log approach and would have 
considered trying to hook into it.
But it is not there and it will take quite a while to get there, so the 
log approach is IMO a non-starter.

>   
>>> - The time it will take us to build all the necessary machinery around
>>> managing such attributes (I see you even envision a plugin :-O ) would
>>> be considerable, and would probably be much better spent on more
>>> critical features at this stage imo. (Piping this data in a db from
>>> python will take no more than a day or two, building schema, plugin, and
>>> all the testing will require weeks).
>>>   
>>>       
>> Based on the experience with Nalin and my reading of DS documentation 
>> and reading DS plugin code it seems a pretty straightforward task.
>> It is orders of magnitude simpler than any external database.
>>     
>
> I think you really are greatly underestimating the complexity of writing
> a DS plugin, 
I have seen Nalin do it.

> the complexity of dealing with changing LDAP semantics. 
No changes to LDAP semantics. No changes to protocol

> And
> the development cycle involved with all the bugs custom code would
> entail. 
All the code we write is custom code. This is not an argument :-)

> A plugin also always adds security concerns as it runs as a
> privileged process wrt DS data. 

Yes, let us remove all the plugins. They are potentially insecure! Let us 
not publish the APIs so that no one can create one...
This argument sounds really funny coming from you :-)

> It adds stability concerns as it runs
> inside a threaded application (one segfault and goodbye LDAP server).
>
>   
It is as stable as any other plugin. And it is much simpler than the NIS 
plugin Nalin put together. No caching or memory pools.
I do not see any complexity to worry about. Maybe I am wrong, but unless 
I dive deeper I do not see the problem you are talking about.
Just follow the rules of plugin development and do the right thing. 
There is no need to get or update multiple entries - just a couple of 
attributes in one and the same entry - what is the big deal?

> For a database all you need is a schema and a few SQL queries to
> insert/extract data. you don't have ordering problems there, you can
> split data and metadata and have own tables. Writing a small database
> schema with a couple of tables is honestly orders of magnitude simpler
> than writing a plugin in C that has to do what you would like it to do.
>
>   
Really? And then you need to write installation scripts, dump and load 
utilities, access control rights, backup and restore, replication and 
other utilities. Even if the DS provider has them, you now suddenly have 
to deal with all this and manage and document and test... This is far 
more than you think. I know what it means to use an embedded DB. The 
whole server I worked with for 10 years had an embedded DB. Creating 
a schema and getting data is a small part of the puzzle; the other 
utilities are the main burden.

>> I think that it is worth a try. If I see that it takes much more time 
>> than I think I might defer this till later.
>> But a decision needs to be made pretty soon. This is why I am bringing 
>> it up.
>>     
>
> So far none of the arguments given convinces me that the huge effort
> required makes sense. If I'd have to vote now, I'd say no.
>
>   
I want to hear other opinions.

>> Mike Langlie is building the UI screens prototype and we (he and I) need 
>> to understand whether this feature will be a part of the UI or not so 
>> that we can prepare properly for usability testing we are planning to 
>> conduct.
>>     
>
> Maybe you can explain how the UI would use this information, that may
> shed some more light on what is the appropriate way to manage these
> user-generated-logs.
>
>   

See example above.

> Simo.
>
>