[Freeipa-devel] "Commit comments log" functionality in IPA

Thu Nov 6 21:16:21 UTC 2008

On Thu, 2008-11-06 at 13:47 -0500, Dmitri Pal wrote:
> Simo Sorce wrote:
> > On Thu, 2008-11-06 at 10:56 -0500, Dmitri Pal wrote:
> >   
> >> Simo Sorce wrote:
> >>     
> >
> >   
> >> Well, this is the main point. It would have been great if we had a 
> >> product that would be able to act as DS and DB at the same time.
> >> I agree that this information would be easier to manage in a DB. But we do 
> >> not have one. We can create one but then we face all the same issues as 
> >> we face with policies:
> >> a) Replication
> >>     
> >
> > Why do we need to replicate a log database? It is not vital for the
> > normal functioning of the Identity solution, so it can just as well be a
> > database or a file kept on a single server.
> >
> >   
> You are continuing to view it from the wrong angle.

Err, sorry, let's assume each of us has his own point of view, OK?

> It is not a log database. It is not a log file. It is a property of the 
> entry. It should be stored close to the object it is related to.

It is a log related to an entry (or potentially multiple entries).

> It is not audit data. You want this data to be viewable when you are 
> viewing or modifying the entry.

Really? I don't think that normally I would care much about the change
log. I do not look at the SCM log each time I modify something; I look
at it only when something does not add up and I want to know what
happened.

> Making the UI go and pull this data from the external database would be 
> a huge overhead.

No, because any sane admin will just pull this data when needed. It
makes no sense to me to show the log of changes each time I edit an
entry; it's not interesting when all goes well.

This kind of data is interesting to retrieve only when there is reason
to suspect something is wrong and you want to check what has been done.
In this case having to wait a second or two for the interface to retrieve
this information is completely acceptable.

> This data should be available regardless of which master you are 
> connecting to.

Yes, but because it is not vital for the normal functionality of the
server, and only serves to "check" what is going on, generally after the
fact or before an important modification, this kind of information does
not have the same requirements the rest of the data has.

Assuming you put this data in a database, even if this data is not
available for some hours, it is really not important.

> Across different data stores? It does not matter what you use as a 
> foreign key.  The whole notion of having two data stores and creating 
> referential integrity between the two makes me cry.

Yes, but I think it is a better solution and would help shape how the
audit database could work too.

> Been there, done that. It is extremely complex and nearly impossible to 
> get right regardless of whether it is DB+DB or DB+DS or DS+File or 
> any other combination.

It's not like this is something new in our field, I've seen this done in
various products.

> It is always more complex than a synch task.

It is not a synchronization task.

> >> c) Backup - restore
> >>     
> >
> > Not sure why this would make any difference. We do not offer any special
> > backup/restore feature with IPA so it would be just business as usual.
> >
> >   
> If you have more than one data store you have to make sure that your 
> backup and restore are also synched otherwise the referential integrity 
> goes out of the window.

No, at most you may lose some of the logs. That would be regrettable,
but given that this information is not necessary for the server to
work, it is not that important.
Besides, a "restore" in the IPA world is an "end of the world" situation;
it should really be a "disaster recovery" situation, therefore if the
log database is not perfectly in line (some more entries since the last
backup, or some lost) it is really a very minor concern.
In any case, given the nature of the log (entry-bound), at most you have a
problem of missing the very latest logs, or of having logs for operations
that have been rolled back by the restore. In the first case you can
just live with it, and you can probably reconstruct most of it. In the
second case the extra logs will actually help you understand what is left
to do to get back on par with the previous situation, and then they can be
wiped out based on the date the DS backup was made.

> We do not have a problem in IPA because we do not have any storage that 
> needs to be synched with other storage (except AD synch, and you know how 
> far from simple that is :-) ).

AD synch is a completely different kind of problem (quoting you, let's
not dive into apples-to-oranges comparisons).

> This is why I was strongly for storing policies in the DS itself. It 
> solves a lot of problems you never even want to think about.
> Same is here.

Policies are different, that's why, even if I do not like it too much, I
am ok with storing them in DS, they are a vital part of the information
we need to distribute, and we need to scale in the way we provide them.

These logs are not vital, and nothing will break even if they
suddenly vanish.

> >> I think that creating a parallel infrastructure for policies or for 
> >> commit logs is a huge overhead.
> >>     
> >
> > I think logs and policies are completely different in nature; these logs
> > are just audit trails, and should rather be developed as
> > part of the audit system.
> >   
> 
> They are not audit trails. They are comments that contain information. 
> Like this.
> For example on the SUDO policy object:
> 
> "DPal 11/05/08 12:15PM: Added new group "contractors" to the SUDOERS 
> policy according to the decision made by the security board on 10/28/08"
> "SSorce 11/17/08 2:47PM: Removed group "contractors" from the SUDOERS 
> policy to fix a security problem reported in ticket #2131209"
> 
> Is this audit? No. It looks like one but it is not. It has one important 
> piece of information: why, and who authorized the change to the critical 
> infrastructure.

It looks like a log, smells like a log ...

> If we treat it as a pure audit log record we will lose its relevance.

Why? What matters is what information we store, not how we store it.

>  It 
> will go into the audit server and it would be possible to pull it from 
> there on demand but not in the context of modification of the entry in 
> DS.

I really fail to see why this is relevant. The data itself is not
necessary for the modification you are going to make.

>  Making the UI do this search against the audit server would be an 
> overhead and there is no guarantee that this data is not archived or 
> cleaned from audit DB.

It would be an overhead only if you fetch it every time you search the
object. I really do not think that is necessary; I even think it is
really not what admins want. But even if that is what some admins want,
we can use caches to solve performance problems. Let's not optimize
first; let's use the right mechanisms first and optimize later.

> When it is in DS and close to the object we can guarantee that it is 
> complete, relevant and always accessible when the entry is viewed and 
> an administrator is about to make a critical change to the system.

The change may be critical, but the log certainly is not. Some places may
"require" the admin to fill in a change log, but the log in itself is not
at all critical for the functioning of the server.

> I think that since we are a security product any changes to the 
> policies that define access rules should have a comment facility like this.

It is a "nice to have", but not a requirement; other popular solutions
in this area do not have it and they still thrive.

> Parsing is not a problem. Managing is the problem. Access control, 
> replication, backup etc.
> I do not want to duplicate all the arguments again.

Sorry, but these are some of the core arguments. You take some of them
for granted while I do not, so maybe we should come to an agreement about
them first, or we will just keep disagreeing on the consequences because
we have different premises.

Access control: as you said there is no special access control needed,
it should be just read and append only. This is something quite simple
to achieve, it is what every single log facility I know implements.

Replication: again replication is important when data availability is
important for the normal operation of a server. This log data is
important for someone that needs to check the history of some changes,
but not for the availability of the identity or policy service, so even
a single point of failure server is probably good enough in most cases.
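As a rough illustration of how little the separate-store option needs: a minimal sketch, assuming SQLite (Python's built-in `sqlite3` module) stands in for a basic MySQL or PostgreSQL instance, of a comment table that anyone can read and that only accepts appends. The table, column and function names and the example DN are hypothetical; the append-only rule is enforced with triggers that abort any UPDATE or DELETE.

```python
import sqlite3

# Hypothetical schema for a standalone change-comment store.
SCHEMA = """
CREATE TABLE change_comment (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    entry   TEXT NOT NULL,   -- DN of the LDAP entry the comment is about
    author  TEXT NOT NULL,
    posted  TEXT NOT NULL,   -- UTC timestamp string, sortable as text
    comment TEXT NOT NULL
);
-- Read and append only: any attempt to rewrite or delete history aborts.
CREATE TRIGGER change_comment_no_update BEFORE UPDATE ON change_comment
BEGIN SELECT RAISE(ABORT, 'change_comment is append-only'); END;
CREATE TRIGGER change_comment_no_delete BEFORE DELETE ON change_comment
BEGIN SELECT RAISE(ABORT, 'change_comment is append-only'); END;
"""


def open_store(path=":memory:"):
    """Open (or create) the store and install the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def append_comment(conn, entry, author, posted, comment):
    """Append one comment; the 'with' block commits, or rolls back on error."""
    with conn:
        conn.execute(
            "INSERT INTO change_comment (entry, author, posted, comment) "
            "VALUES (?, ?, ?, ?)",
            (entry, author, posted, comment),
        )


def read_comments(conn, entry):
    """Return (author, posted, comment) tuples for one entry, oldest first."""
    return conn.execute(
        "SELECT author, posted, comment FROM change_comment "
        "WHERE entry = ? ORDER BY posted, id",
        (entry,),
    ).fetchall()
```

A caller would do `append_comment(conn, "cn=sudoers,dc=example,dc=com", "dpal", "2008-11-05T12:15:00Z", "...")` and later `read_comments(conn, "cn=sudoers,dc=example,dc=com")`; an `UPDATE` or `DELETE` on the table raises `sqlite3.IntegrityError` from the trigger.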

> > NO, it will just be softened, you still replicate around data useless
> > 99% of the time.
> >
> >   
> 
> No, I checked. According to DS gurus the operation is replicated, not the 
> data so it is not an issue.

What I meant to say is that you are still sending this operation to all
masters and replicas, and you are sending the full set of data when you
create a new replica.

> Check with Nathan and Rich if you do not believe me. This was one of the 
> first things I checked because I was concerned about this too.

I've read the replication code in DS and studied ldap replication for
some time myself, I know how it works, you misunderstood what I wrote.

> >>  IMO they should be tightly related to the object they refer 
> >> to and not be somewhere else in the log server. 
> >
> > Why?  
> 
> Because when something goes wrong it should be there. The policies are 
> too critical to the entire enterprise.
> It is much easier to find what was going on right there with that kind 
> of comments.

It depends on what is going wrong, but unless you put your database on
the most unreliable server you could find, this data will be there.
This data is not critical for the identity or policy distribution
services, so I really fail to see how the log database availability
would really impact any *important* operation.

> Have you ever done this? I did. It is not that simple.

It's not that complex either *for this specific task in the way you
have described it*.

> The main reason to have it is to have an answer to "why someone did what 
> he did". And it should be there at hand so that when the next guy comes in 
> and tries to clean up the mess he knows why the previous change was made 
> and who authorized it. Then he can turn to the audit system and dig more, but 
> the recorded comment will give him a very good starting point.

Yes, but this information is *not* critical for the service.
Some people may see it as critical on an information management level,
but it is not at the data layer level, and from the point of view of the
importance of the information it can equally be stored somewhere else.

> This is all about authority and responsibility.

It's a nice to have, yes, but not critical; it can safely be stored in
another data store.

Another storage also allows you to separate privileges, so that if one
storage is compromised the other is not necessarily compromised as well,
and this is usually more important for critical logs.

> >>>> Now to some of the reasons why I don't see DS as a viable option:
> >>>>
> >>>> - Multi-value attributes are not ordered, so you need to invent some
> >>>> scheme to store this data structured so that ordering can be preserved.
> >>>> Sure, probably prefixing the content with the "posting" date is all
> >>>> that is needed, but that makes the attributes not searchable.
> >>>>
> >>>>     
> >>>>         
> >> I think that creating a generic plugin that would allow storing ordered 
> >> MV attributes would be a big benefit for everybody.
> >>     
> >
> > The problem is that ordered MV are not defined in the LDAP protocol, you
> > would need to come up with a standardizable way to manage how to add
> > entries so that they fall in the precise order you need them to. This
> > would mean modifying the add operation either with a control or by
> > creating an extended operation. 
> >
> >   
> I will see what RFCs exist for this case. Rich pointed out that OpenLDAP 
> has it.

It's a proposal; it is not an official RFC yet AFAIK.
Anyway, Rich also pointed out that it may not be a trivial task to
implement, because the slapi API may not expose enough hooks deep
into the DS core for it.
I totally agree that it would be a nice feature to have in general;
it would be even better if formalized in an officially approved RFC, though.

> I do not see what you are talking about. There is no change to the 
> protocol at all.

I am sorry I used "protocol" in a broad way in this sentence, meaning
that currently there is no official way to store ordered multi-value
attributes. At the very least it requires creating a convention like it
has been done in the OpenLDAP extension, but then you need to make sure
the client code you use is able to understand it and provide you
functions to use that convention. Current client libraries are built
with the knowledge that multivalue attributes order is not important so
some bindings may decide to reorder elements at will before returning
them to the caller.

> There are tons of RFCs that store complex structures in the attribute.
> I do not see how my solution is different from those.

To be honest I think that implementing ordering is not even necessary to
fulfill your goal; all you need is to store a date in each attribute value
and use it to reorder the values in the UI for display.
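A minimal sketch of the date-in-each-value idea, assuming a hypothetical value layout of `<timestamp> <author>: <text>`, with the timestamp written in the fixed-width UTC form used by LDAP GeneralizedTime (`YYYYMMDDHHMMSSZ`). The helper names are illustrative, not an existing IPA or DS interface. Since the values of a multi-valued LDAP attribute come back in no guaranteed order, the client parses the prefix and sorts before display.

```python
from datetime import datetime, timezone

# Hypothetical value layout: "<YYYYMMDDHHMMSSZ> <author>: <text>".
# A fixed-width UTC timestamp also sorts correctly as plain text.
TS_FORMAT = "%Y%m%d%H%M%SZ"


def make_value(when, author, text):
    """Build one attribute value with its posting time as a prefix."""
    stamp = when.astimezone(timezone.utc).strftime(TS_FORMAT)
    return f"{stamp} {author}: {text}"


def parse_value(value):
    """Split a stored value back into (datetime, author, text)."""
    stamp, rest = value.split(" ", 1)
    author, text = rest.split(": ", 1)
    when = datetime.strptime(stamp, TS_FORMAT).replace(tzinfo=timezone.utc)
    return when, author, text


def ordered_comments(values):
    """Sort values, returned by the server in any order, by their timestamp."""
    return sorted((parse_value(v) for v in values), key=lambda c: c[0])
```

The server never needs to know about ordering: the values stay an ordinary unordered set, and only the UI or CLI applies the convention.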

The only reason to build a plugin, I think, would be to ensure
"append-only" behavior, if really required.
But given that you said this is not an audit trail, I am not even
sure we should enforce that at the DS level; it could just be standard
practice in the XML-RPC interface to only append and never delete old
comments.

> We can show the last X comments in the UI. This is irrelevant. You want to 
> know how this whole thing ended up in the state it is in and why.
> Then you can come up with the effective remediation. Whether you look at 
> just the last X comments or the whole stack is up to you.
> Whatever is needed for you as admin to make the right decision.

Except to cast blame, I do not think that going back years in the
comments would be all that useful, but I don't have data to back this
impression, so that is not important; any storage can be made to keep
comments indefinitely.

> These comments are useless. They do not answer "why". I did the change 
> because my manager authorized me, or I responded to ticket #XXXXXX, or the 
> security board authorized me: this is what should be put into the 
> comment.

Are you going to enforce a format that guarantees the information is
stored this way?
If not, just face the fact that a lot of people will not be so disciplined
and will put in garbage like that.

> Not "foo needs access to x as admin". Such a comment is a 
> duplication of the event and bears no value. The comment should contain 
> information that links the event (changing of the policy) to the formal 
> process that authorized this change. Hope this clarifies the purpose and 
> the difference.

Sure, but this kind of stuff is a policy that will certainly differ
per environment. You have no control on what people will put in it, and
if you try to force a too strict control on the format you will probably
alienate all the ones that need something slightly different (or they
will just hack it in).

As far as I can see all you can record is:
- who did the change
- when the change was made
- which objects were affected
and finally
- a user comment that *hopefully* describes why

Did I say this looks like a log, smells.. :-)
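The four items above map naturally onto two tables: one row per change (who, when, and the comment) and one row per entry the change touched, so that a single comment can cover several objects. A minimal sketch, assuming SQLite as a stand-in for a basic MySQL or PostgreSQL instance; the schema, function names and DNs are hypothetical.

```python
import sqlite3

# Hypothetical layout: data (the comment) and metadata (affected entries)
# live in separate tables.
SCHEMA = """
CREATE TABLE change_note (
    id      INTEGER PRIMARY KEY,
    author  TEXT NOT NULL,   -- who did the change
    posted  TEXT NOT NULL,   -- when (UTC, sortable text)
    comment TEXT NOT NULL    -- hopefully, why
);
CREATE TABLE change_target (
    change_id INTEGER NOT NULL REFERENCES change_note(id),
    entry_dn  TEXT NOT NULL  -- which objects were affected
);
CREATE INDEX change_target_dn ON change_target(entry_dn);
"""


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def record_change(conn, author, posted, comment, entry_dns):
    """Store one comment and link it to every entry it affected."""
    with conn:
        cur = conn.execute(
            "INSERT INTO change_note (author, posted, comment) VALUES (?, ?, ?)",
            (author, posted, comment),
        )
        conn.executemany(
            "INSERT INTO change_target (change_id, entry_dn) VALUES (?, ?)",
            [(cur.lastrowid, dn) for dn in entry_dns],
        )
        return cur.lastrowid


def history(conn, entry_dn):
    """All comments touching entry_dn, oldest first."""
    return conn.execute(
        """SELECT c.posted, c.author, c.comment
             FROM change_note c JOIN change_target t ON t.change_id = c.id
            WHERE t.entry_dn = ?
            ORDER BY c.posted, c.id""",
        (entry_dn,),
    ).fetchall()
```

Ordering comes from the `ORDER BY` on the timestamp column, so nothing about value ordering has to be added to LDAP.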

> We can attach it to multiple different objects but the system 
> administrator - the "root" of the whole IPA deployment - will have a way 
> to say where this is mandatory, where optional, and where it should be 
> hidden. The UI and CLI will respect these settings.

Are you thinking of rejecting changes to an LDAP object if this attribute
is not added, i.e. enforcing comments at the LDAP operation level?
This proposal would raise a host of new considerations, and it would seem
to me completely disproportionate and perhaps even dangerous.

> We can store whatever we want in the DS. IMO RFC 2307 is a good
> example of cluttering LDAP, especially the netgroups schema :-)

Yeah, I have a special personal grudge against some of the schemas we
inherited ... my latest one is the automount one I guess :-)

> I think we are playing much more nicely than this and other RFCs.

I think it depends on the point of view. I think it is not appropriate
to store this kind of data in LDAP itself. It should be implemented at a
higher level, imo. We have been putting a lot of complexity into DS
lately; we need to be more moderate, as too much change will certainly
bite back.

Even if we get to the conclusion that this is something we want to do, I
think it is way too much for v2; I would strongly suggest postponing it
to later.

> > Reality is people will touch multiple objects when changing stuff
> > around, and transactions really do not matter in this case (see the
> > angry-admin example I posted in the previous email).
> >
> >   
> 
> This is why we have CLI tools and UI so that things can be done 
> properly. Messing with raw data is always dangerous if you do not know 
> what you are doing.

Uhm sorry but I do not understand the context of this comment.

> Anyone can read, the one who can edit the entry can add.
> The one who can delete the entry can delete all values together.
> No one can modify.
> 
> I think plugin can easily enforce this logic.

> This is where plugin will reject the attempt to write.

Why would you need to enforce it at the DS level if this is not an audit
trail?
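For reference, the rules Dmitri quotes above (anyone can read, whoever can edit the entry can add values, values go away only when the whole entry is deleted, no one can modify) reduce to a small check on the old and new value sets. A hypothetical Python sketch of the rule only, not DS plugin code and not the slapi API; the function name is illustrative.

```python
def comment_change_allowed(old_values, new_values, entry_deleted=False):
    """Return True if replacing old_values with new_values obeys the
    proposed rules for a hypothetical comment attribute.

    LDAP attribute values form an unordered set, so sets are compared.
    Access checks (who may read, edit, or delete the entry) are assumed
    to happen elsewhere; this only checks the shape of the change.
    """
    old, new = set(old_values), set(new_values)
    if entry_deleted:
        # Deleting the entry removes all comment values together.
        return True
    # Only additions are allowed: every existing value must survive.
    # A "modify" (drop one value, add another) fails this test.
    return old <= new
```

Stated this way, the rule is equally easy to apply in the XML-RPC layer as in the directory server, which is the placement question being argued here.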

> > I don't think so, if I correctly understood how you want to manage it.
> > And btw the way you seem to be willing to manage it once again resembles
> > an audit trail log, and that's I think because it ultimately is just an
> > audit trail log.
> >
> >   
> See the logic in the previous comment.

I am really sorry, but I think I still fail to fully see it.

> I am pretty sure the plugin can take care of that. If not I would 
> agree that this is not a good idea.
> Nathan? Rich?

I can tell you there is no concept of append-only attribute in LDAP.
But see above: why should we enforce this at the LDAP level?

> The solution if possible is self contained and does not rely on any 
> external piece of functionality - audit server (that would not be up to 
> the task for quite some time).

Then let's delay it.

> Adjusting DS is much simpler (I think) than building a logic of logging 
> this into external store (file or DB) and then pulling it out when we 
> need to edit policy and see who was messing with it and why.

It might be simpler for a limited task, but it is the "wrong way" to do
it long term IMO, in fact no other solution does something like this,
they all rely on external storage for auditing purposes or external
applications if a company wants to implement a formal protocol.

> I think that this is a value add and something that is needed for the 
> project to be successful.

I seriously see it just as a marginal nice to have at this stage. None
of the people I have been talking to ever raised the need to have
something like this implemented at such a low level. And we have a lot
more to do to get the basic foundations up and running than to care about
something like this, imo.

> Otherwise I would not have suggested that. You know I am against any 
> unnecessary work myself.

I know, but I still think this is not necessary at this stage, nor
should it be implemented this way.

> I see that without this feature the adoption of IPA will be slower 
> since the feature allows tying
> the data to the formal processes established in the company.

No other Identity management tool at our stage of development has it,
and none have it at the LDAP level, I think, with good reason.

> I would agree that if we had a robust audit server capable of doing real 
> time searches now I would have explored the log approach and would have 
> considered trying to hook into it.
> But it is not there and it would take quite a while to be there so log 
> approach is IMO a non starter.

Then we will wait until it becomes available and concentrate our efforts
on it. Short-term quick hacks should not drive core changes in DS;
I am really quite opposed to this line of reasoning.

> > I think you really are greatly underestimating the complexity of writing
> > a DS plugin, 
> I have seen Nalin do it.

I have done it myself! More than once. And I have also seen some
unfortunate cases quite recently where we had to go in and almost rewrite
the entire plugin because it came out fundamentally flawed in the first
implementation (dna, memberof).

> > the complexity of dealing with changing LDAP semantics. 
> No changes to LDAP semantics. No changes to protocol

Changing a multivalue attribute to be ordered is a change in semantics,
so far LDAP multivalued attributes are not ordered, and applications
rely/account/exploit/endure that.

> 
> > And
> > the development cycle involved with all the bugs custom code would
> > entail. 
> All the code we write is a custom code. This is not an argument :-)

Unfortunate choice of wording on my side, but adding code to DS
shouldn't be taken lightly. It is a critical piece of infrastructure;
every line of code we add there is much more critical than any 100 lines of
python we add in the UI.

> > A plugin also always adds security concerns as it runs as a
> > privileged process wrt DS data. 
> 
> Yes let us remove all the plugins. They are potentially insecure! Let us 
> not publish the APIs so that noone can create one...
> This argument sounds really funny from you :-)

I don't find it funny to be cautious and avoid unnecessary risk in a
critical piece of code. Code that runs in DS and code that runs in the
interface have completely different criticality*. Therefore only
"necessary" plugins get my approval, "handy" ones don't.

* (is this an English word? :-).

> > It add stability concerns as it runs
> > inside a threaded application (one segfault and goodbye LDAP server).
> >
> >   
> It is as stable as any other plugin.

Until it fails it is as stable as anything else, but right now it is
nothing; it does not exist :)
nothing it does not exists :)

> And it is much simpler than the NIS 
> plugin Nalin put together. No caching or memory pools.

The NIS plugin could not easily be built as an external process.

> I do not see any complexity to worry about. May be I am wrong but unless 
> I dive deeper I do not see a problem you are talking about.

It's a matter of risk and long term support, the more the code the more
the risk and the support burden, amplified by the fact that this code
runs in a critical service.

> Just follow the rules of plugin development and do the right thing. 

That does not mitigate risk; we always try to make perfect code, but it
is clear that perfect code does not exist or we would have no bugs
reported ever.

> There is no need to get or update multiple entries, just a couple of 
> attributes in one and the same entry. What's the big deal?

A matter of perspective.

> > For a database all you need is a schema and a few SQL queries to
> > insert/extract data. You don't have ordering problems there; you can
> > split data and metadata into their own tables. Writing a small database
> > schema with a couple of tables is honestly orders of magnitude simpler
> > than writing a plugin in C that has to do what you would like it to do.
> >
> >   
> Really? And then you need to write installation scripts, dump and load 
> utilities, access control rights, backup and restore, replication and 
> other utilities.

Every decent database provides most of this already.

> Even if the DB provider has them you now suddenly have 
> to deal with all this and manage and document and test... This is far 
> more than you think. I know what it means to use an embedded DB. The 
> whole server I worked with for 10 years had an embedded DB. Creating the 
> schema and getting data is a small part of the puzzle; the other utilities 
> are the main burden.

I am not proposing an embedded DB, it would make no sense. I would agree
with you if I were to propose something like that, but I was thinking of
just a basic mysql, postgresql or something like that, nothing scary or 
complex.

Simo.

-- 
Simo Sorce * Red Hat, Inc * New York