[Freeipa-devel] "Commit comments log" functionality in IPA

Thu Nov 6 17:13:21 UTC 2008

On Thu, 2008-11-06 at 10:56 -0500, Dmitri Pal wrote:
> Simo Sorce wrote:

> Well, this is the main point. It would have been great if we had a 
> product that would be able to act as DS and DB at the same time.
> I agree that this information would easier be managed in a DB. But we do 
> not have one. We can create one but then we face all the same issues as 
> we face with policies:
> a) Replication

Why do we need to replicate a log database? It is not vital for the
normal functioning of the Identity solution so it can well be a database
or a file kept on a single server.

> b) Referential integrity across different data stores

Use object GUIDs, that's all that is needed. Seriously, it is very simple to
keep referential integrity when GUIDs are used.

> c) Backup - restore

Not sure why this would make any difference. We do not offer any special
backup/restore feature with IPA so it would be just business as usual.

> With policies the decision was made to store them in DS not to face 
> these problems.

Policies will be constantly queried by clients, that's why you need to
replicate them on multiple servers to be able to scale up when the
number of clients grows. A log does not need that.

> I think that creating a parallel infrastructure for policies or for 
> commit logs is  a huge overhead.

I think logs and policies are completely different in nature. These logs
are just audit trails, and should rather be developed as
part of the audit system.

> >> Can you provide what are the reasons that would make a good idea to
> >> store this kind of data in DS instead of a log file or a log database ?
> >>
> >>     
> Log file is not manageable. It is hard to pull and query.

About the same complexity as pulling and querying a structured attribute in
LDAP. To be honest, for most people a file is *much* easier to pull
and parse than an ldap attribute.

>  It is good as 
> a stream of data but when you need to analyze it you use other tools.
> Log database is different. It is for different purpose. Log database is 
> for analysis. It would be hard to relate the history or a DS entry to 
> the entry itself and present them in proper way in UI.

I don't think so, as long as you store the GUID of the entry it refers
to as part of the metadata of the comment/log entry it is quite easy,
just a single ldap search (or vice versa, just a single "grep")

> It would be much more complex than you think.

I honestly think you greatly over-estimate the complexity of parsing a
log file in a modern scripting language like python, or even just old
stuff like grep, cut or awk.
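
To make the "single grep" point concrete, here is a minimal sketch in
python. The tab-separated log format, the GUID values and the name
comments_for() are assumptions made up for this illustration, not an
existing IPA format or API.

```python
import io

# Hypothetical log format (an assumption, not an IPA format), one line per comment:
#   ISO timestamp <TAB> entry GUID <TAB> author <TAB> free-text comment
SAMPLE_LOG = (
    "2008-11-03T09:15:00Z\tguid-foo\tadmin1\tfoo needs access to x as admin\n"
    "2008-11-04T10:00:00Z\tguid-bar\tadmin1\tbar needs access to x as admin\n"
    "2008-11-05T16:30:00Z\tguid-foo\tadmin2\tfoo needs no more access as admin\n"
)


def comments_for(lines, guid):
    """Return (timestamp, author, comment) tuples for one entry GUID, oldest first."""
    matches = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate blank lines
        # Split on the first three tabs only, so the comment may contain tabs.
        timestamp, entry_guid, author, comment = line.split("\t", 3)
        if entry_guid == guid:
            matches.append((timestamp, author, comment))
    # Timestamps in one fixed ISO format sort correctly as plain strings.
    return sorted(matches)


for timestamp, author, comment in comments_for(io.StringIO(SAMPLE_LOG), "guid-foo"):
    print(timestamp, author, comment)
```

With a real file you would pass the open file object instead of the
io.StringIO wrapper. The GUID stored on each line is the only link back
to the directory entry.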

> >> I honestly see no reason, nothing but a human operator would make any
> >> use of such attributes, they are not useful to any machine which is the
> >> real consumer of DS data.
> >>     
> 
> I completely disagree with this statement. DS is a data store for all 
> data that needs to be stored unless it does not belong there.

The point is that comments or log do not belong there, they just don't.

> The "commit log"  data is on the edge but there is unfortunately no 
> better place to put it.

A log file or a log database. It's auditing related and we will have
auditing facilities, use them.

> >> As you pointed on it would add an unnecessary amount of data to
> >> replicate around.
> >>
> >>     
> If we use MV attributes the replication problem will be solved.

NO, it will just be softened, you still replicate around data useless
99% of the time.

> >> Also because DS is not a logging server but a directory server there are
> >> many other problems in trying to use it to store logs.
> >>
> >>
> >>     
> This is not a log. I think this is a fundamental difference. It is a set 
> of comments.

Sorry but I see no difference between a set of "comments" or a log, a
log is just a set of "comments" about what is going on in some generic
"process".

>  IMO they should be tightly related to the object they refer 
> to and not be somewhere else in the log server.

Why?

> The log will contain who did what when. But it never answers the 
> question "why".

If you put a comment in a log it can also answer why, I really do not
see your point here.

> The "commit comment" feature is designed for that use case.
> May be the word selection confused you but it is not a "log".

No, the nature of a comment is the same as that of a log, it is just
that the "comment" is user generated instead of automatically generated
but it really bears no relevant semantic difference.

> Think about comments in bugzilla bug. Bugzilla bug is not a log and 
> never perceived as such. "commit comment" feature should not be too.

Bugzilla is used to have "conversations" about topics. The "commit
comment" is instead just the same as a commit message in an SCM system.
Not surprisingly, to see the list of commit comments in cvs, svn, git,
etc. the command is "log" (cvs log, svn log, git log ...)

They are just logs.

> I agree but I do not see a better alternative. Implementing an external 
> DB synchronized with DS is much more complex than you would think.

I beg to strongly disagree. All you need is to use GUIDs on objects,
that is your "join" key. It makes things extremely simple to manage.

> >> If comments are required for audit trails I think they should just go in
> >> the auditing system and marked as special "comment" logs. Otherwise they
> >> should probably simply go in a normal log file or a relational log
> >> database (the latter in case online searches are required).
> >>     
> 
> As I said they are not a log and this approach will make the feature 
> useless.

Evidently there is a problem in understanding what this feature is
useful for.

From your premises about why customers need it, it is useful as an audit
trail log to know why an object is in that state. Is there any other use
to it ?

> >> Now to some of the reasons why I don't see DS as a viable option:
> >>
> >> - Multi-value attributes are not ordered, so you need to invent some
> >> scheme to store this data structured so that ordering can be preserved.
> >> Sure probably using the "posting" date before the content is all is
> >> needed, but that makes attributes not searcheable.
> >>
> >>     
> 
> I think that creating a generic plugin that would allow storing ordered 
> MV attributes would be a big benefit for everybody.

The problem is that ordered MV are not defined in the LDAP protocol, you
would need to come up with a standardizable way to manage how to add
entries so that they fall in the precise order you need them to. This
would mean modifying the add operation either with a control or by
creating an extended operation. 
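
For comparison, the "posting date before the content" workaround
mentioned above can be sketched as follows. This is only an
illustration under assumptions: the "timestamp#author#text" value
format and the function names are invented here, and a python set
stands in for the unordered values of a multi-valued attribute. It
restores the order on the client only, and it does nothing to make the
attribute searchable or ordered on the server.

```python
from datetime import datetime, timezone

STAMP_FORMAT = "%Y%m%d%H%M%SZ"  # GeneralizedTime-like prefix, always UTC


def encode_comment(when, author, text):
    """Pack one comment into a single attribute value (hypothetical format).

    Assumes the author name never contains '#'.
    """
    stamp = when.astimezone(timezone.utc).strftime(STAMP_FORMAT)
    return f"{stamp}#{author}#{text}"


def decode_comment(value):
    """Split a stored value back into (datetime, author, text)."""
    # Only the first two '#' are separators, the text may contain more.
    stamp, author, text = value.split("#", 2)
    when = datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
    return when, author, text


def ordered_history(values):
    """Rebuild the chronological order the directory does not guarantee."""
    return sorted(decode_comment(value) for value in values)


# A set models the unordered values returned for a multi-valued attribute.
stored = {
    encode_comment(datetime(2008, 11, 6, 17, 13, 21, tzinfo=timezone.utc),
                   "admin2", "foo needs no more access as admin"),
    encode_comment(datetime(2008, 11, 1, 8, 0, 0, tzinfo=timezone.utc),
                   "admin1", "foo needs access to x as admin"),
}
for when, author, text in ordered_history(stored):
    print(when.isoformat(), author, text)
```

Every consumer of the attribute has to carry this parsing code, which
is the kind of custom machinery a plain log file or database avoids.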

> We can use the "commit comment" use case to create one. I would think 
> that DS folks would find such plugin pretty valuable.
> I am actually surprised to the fact that one does not exist yet. It 
> would have solved a lot of different issues and paved a way for even 
> broader adoption of the DS.

This is *way* more work than you think it is. That is the reason why nobody
has done it and no clear standard has been proposed yet.

> I would start with implementing "commit comment" feature as a step 
> towards a generic ordered MV value plugin solution.

This alone would require considerable time.

> This would actually mean turning a MV attribute into sort of mini table 
> with records associated with an entry in this table.
> I think it is a very cool feature.

Yes, but it would mean modifying the LDAP protocol. You do not want
to do that outside of a standardization body, or it will most probably
be just a dead end.

> >> - You would have to create a clean up process that removes old stuff, I
> >> don't think that keeping around a hundred entries log for years would
> >> make sense.
> >>     
> I am not sure it should be done day one. But I will think more about 
> cleanup.
> I view this as a second tier functionality on top of the original feature.
> 
> >> - We would need to index yet another attribute if you want to make
> >> searches on it, note also that just consulting the log would require
> >> searches on the identity store increasing its load, something a log
> >> file/database would avoid completely.
> >>
> >>     
> 
> The whole value of the feature is to have the whole list. There is no 
> need to index it since the search by this attribute does not make sense.
> This is not a valid concern.

Can you explain what this list is used for actually? And why someone
should be interested in a comment written 5 years earlier and that is
completely outdated as the data in the object has no bearing with that
original comment anymore?

> >> - If you invent a complex format you lose the capability to do decent
> >> filtering on searches, meaning you will often need to do wide scope
> >> searches and implement filtering in the UI (slooow, and loads DS)
> >>
> >>     
> Again as I said: there is no need to filter by this attribute. You will 
> pull this MV attribute if you need it. Searching inside the values 
> pulled out is up to the application not to DS.

This just tells me that storing it in DS is really not required.
If you need it you can pull it from a DB or a file equally well.

> >> - You have no relation of events (ldap not being a relational database
> >> makes it particularly difficult indeed).
> >>
> >>     
> 
> There is no relation to the event. There is a relation to the object itself 
> since the attribute that will contain the commit comments will be a part 
> of the same entry.

A comment is related to a "change" in an object, therefore it is
logically related to an event (the change), insomuch that you require
ordering of the entries. Now assume I have a set of events and related
comments, like the following:

"add foo to sudoers file"	"foo need access to x as admin"
"add bar to sudoers file"	"bar need access to x as admin"
"remove foo from sudoers file"	"foo need no more access as admin"
"add baz to sudoers file"	"baz need access as admin"
"remove baz from sudoers file"	"baz need no more access as admin"
...

Now who needs the full list ?
After 30-40 changes why do you care about something like lines 4 and 5
except for auditing purposes ? And if you do not have a date associated
what is it useful for ?

> There is no need to have any kind of cross references if it is done this 
> way.

If there are no cross references like the date the event/change happened
or who made the operation it would be quite useless (see above example).

> >> - If a single UI command changes many different objects, where do you
> >> store the comment? In one? All of them?
> >> If in one how do you relate changes to others ?
> >> If you replicate it in all objects how do you deal with access to all
> >> entries? (see below)
> >> See also scenario above about angry admins if you require a comment for
> >> each object being changed.
> >>
> >>     
> The "commit comment" feature by nature makes sense in the context of the 
> top level object as i mentioned above.
> If change happens to several entries at once the designer of the schema 
> and UI should decide what would be the best approach and what entry the 
> "commit comment" should be applied to.

This is an operational policy, can't be decided at the schema level, it
completely depends on what kind of event the specific security
administrator wants in the specific deployment.

> If we do it as auxiliary class we have a flexibility to use it in 
> multiple places.

SCNR but I read this like: we can clutter the DS as much as we want :)

> I really do not see a problem here. You just pick the main entry you are 
> dealing with. If the UI touches multiple entries of different kind in 
> one step it is a subject for deep thought and reevaluation.

It will. I am trying to evaluate this *before* we waste a lot of time on
something that seems wrong to me on way too many levels to easily express
them all. For some aspects this is just a feeling, but so far, the
further we dig into it the more the feeling grows.

> Such operations especially with DS where transactions are not supported 
> should be avoided.

Reality is people will touch multiple objects when changing stuff
around, and transactions really do not matter in this case (see the
angry-admin example I posted in the previous email).

> >> - Comments may contain sensitive information that should not be leaked,
> >> so comments should not be generally available for search on ldap.
> >> This would require to add (on the fly?) ACIs on objects that get
> >> comments.
> >>
> >>     
> The ACI defines the access control rules for the MV attribute. I doubt 
> that the comments would contain sensitive information.

I don't think ACIs can express something like "append-only".

> This is a matter of administrative policies not software.

Sorry, I do not get what this means. We are talking about access
authorization to an attribute that seems to me to be framed in a way the
software can't cope with. I think software matters.

> The first step 
> is at least to treat it as a MV attribute with ordering that can't be 
> modified or deleted.

If you can write it you can modify it.

> Later as a second tier feature we can start thinking about more 
> discretionary read access control. Again I do not see a big issue here 
> for the first implementation.
> 
> >> - Some ACI may allow a lower level admin to perform an operation on some
> >> attributes, but not add objectclasses or new attributes.
> >> We lose the comments in this case ?
> >>     
> No. I will dig into that but the plan is to have consistent ACI  rules.
> I hope that DS specialists will chime in and confirm that the ACI has 
> enough flexibility to deal with the  object I am suggesting.

I don't think so, if I correctly understood how you want to manage it.
And btw the way you seem to be willing to manage it once again resembles
an audit trail log, and I think that's because it ultimately is just an
audit trail log.

> >> - Anyone with write access to the attribute will be able to change the
> >> contents, making them generally completely useless as audit trails.
> >> Delegation of any minor task would require write access to comments all
> >> over the place.
> >>     
> >
> >   
> No. The whole idea is to make it non-editable at all. Only add.

This is exactly what does not exist in the LDAP model, nor in
the ACI model we have.

> Only 
> later we might start diving into ACIs and deal with the complexity of 
> editing the data by admins that have different levels of privileges.

No, architectural problems must be evaluated first in this case, because
you are trying to construct something that is so far off the way LDAP works,
for something that seems so irrelevant to the general functioning, that I
want to understand exactly why it is so important. From this
reading I just re-evaluated the time needed to "adjust" DS to handle this
single attribute at several weeks, and that's a lot of effort
just to keep around some log that can instead very easily be piped into a
file or a database with far less effort.

> > Forgot another important few:
> >
> > - It would make extremely difficult for people to extract this
> > information. Instead of connecting to a well known relational database
> > with well known tools used for reporting, they would have to build a
> > custom parser that speaks LDAP. This thing alone would be a deadly one
> > imo.
> >
> >   
> 
> This is not a log. This is a comment on the entry.

I am sorry, the description you give: ordered list, immutable list, add
only, all scream this is a log.
If it were just a comment about the object we would need nothing more
than the "description" attribute already available in ldap, and there
would be no ordering nor immutability problem.

I think you might be trying to conflate together the concepts of
"description of the object" and "log of changes", I think this is a very
wrong approach.

>  I think that if down 
> the road we create a mean to store ordered lists in DS people would take 
> advantage of that.
> I disagree that it is hard to extract. It is the same as any other MV 
> attribute except that there is some header inside that prefixes the data 
> that need to be skipped.
> There are so many LDAP attributes that have special internal formats 
> that I read about in different RFC I really do not see my approach being 
> against any main stream ways of doing things.
> There can be a helper library written for an easier adoption later. This 
> is really not an issue at all imo.

The issue here is why we want to waste a lot of time and effort to
implement just a per entry log that is useful only when some auditing
needs to be performed. Seriously, I think the effort is not worth the
value from what I have seen so far.

> > - The time it will take us to build all the necessary machinery around
> > managing such attributes (I see you even envision a plugin :-O ) would
> > be considerable, and would probably be much better spent on more
> > critical features at this stage imo. (Piping this data in a db from
> > python will take no more than a day or two, building schema, plugin, and
> > all the testing will require weeks).
> >   
> Based on the experience with Nalin and my reading of DS documentation 
> and reading DS plugin code it seems a pretty straightforward task.
> It is powers of magnitude simpler than any external database.

I think you really are greatly underestimating the complexity of writing
a DS plugin, the complexity of dealing with changing LDAP semantics, and
the development cycle involved with all the bugs custom code would
entail. A plugin also always adds security concerns as it runs as a
privileged process wrt DS data. It adds stability concerns as it runs
inside a threaded application (one segfault and goodbye LDAP server).

For a database all you need is a schema and a few SQL queries to
insert/extract data. You don't have ordering problems there, and you can
split data and metadata into their own tables. Writing a small database
schema with a couple of tables is honestly orders of magnitude simpler
than writing a plugin in C that has to do what you would like it to do.
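
As a rough illustration of the database option, a minimal sketch using
SQLite from python could look like the following. SQLite, the table and
column names, and the helper functions are all assumptions chosen for
this example, nothing here is an agreed IPA design. The entry GUID is
the "join" key back to the directory, and ordering comes from the
timestamp column.

```python
import sqlite3

# Hypothetical schema: one table for the change metadata, one for the comment.
SCHEMA = """
CREATE TABLE change_event (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    entry_guid  TEXT NOT NULL,   -- GUID of the directory entry that changed
    happened_at TEXT NOT NULL,   -- ISO 8601 UTC timestamp
    actor       TEXT NOT NULL,   -- who made the change
    operation   TEXT NOT NULL    -- what was done
);
CREATE TABLE change_comment (
    event_id INTEGER NOT NULL REFERENCES change_event(id),
    body     TEXT NOT NULL       -- the user-supplied "why"
);
CREATE INDEX change_event_guid ON change_event (entry_guid, happened_at);
"""


def open_log(path=":memory:"):
    """Open (or create) the log database and install the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_change(conn, entry_guid, happened_at, actor, operation, comment):
    """Append one event and its comment in a single transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO change_event (entry_guid, happened_at, actor, operation) "
            "VALUES (?, ?, ?, ?)",
            (entry_guid, happened_at, actor, operation),
        )
        conn.execute(
            "INSERT INTO change_comment (event_id, body) VALUES (?, ?)",
            (cur.lastrowid, comment),
        )


def history(conn, entry_guid):
    """Return (happened_at, actor, operation, comment) rows, oldest first."""
    return conn.execute(
        "SELECT e.happened_at, e.actor, e.operation, c.body "
        "FROM change_event e JOIN change_comment c ON c.event_id = e.id "
        "WHERE e.entry_guid = ? ORDER BY e.happened_at, e.id",
        (entry_guid,),
    ).fetchall()


conn = open_log()
record_change(conn, "guid-sudoers", "2008-11-03T09:15:00Z", "admin1",
              "add foo to sudoers file", "foo needs access to x as admin")
record_change(conn, "guid-sudoers", "2008-11-05T16:30:00Z", "admin2",
              "remove foo from sudoers file", "foo needs no more access as admin")
for row in history(conn, "guid-sudoers"):
    print(row)
```

Any reporting tool that speaks SQL can read these tables directly,
without a custom LDAP value parser.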

> I think that it is worth a try. If I see that it takes much more time 
> than I think I might defer this till later.
> But a decision needs to be made pretty soon. This is why I am bringing 
> it up.

So far none of the arguments given convince me the huge effort required
makes sense. If I had to vote now, I'd say no.

> Mike Langlie is building the UI screens prototype and we (he and I) need 
> to understand whether this feature will be a part of the UI or not so 
> that we can prepare properly for usability testing we are planning to 
> conduct.

Maybe you can explain how the UI would use this information, that may
shed some more light on what is the appropriate way to manage these
user-generated-logs.

Simo.

-- 
Simo Sorce * Red Hat, Inc * New York