[Freeipa-devel] topology management question

Simo Sorce simo at redhat.com
Thu Dec 11 14:20:24 UTC 2014


On Thu, 11 Dec 2014 14:18:36 +0100
Ludwig Krispenz <lkrispen at redhat.com> wrote:

> 
> On 12/05/2014 04:50 PM, Simo Sorce wrote:
> > On Thu, 04 Dec 2014 14:33:09 +0100
> > Ludwig Krispenz <lkrispen at redhat.com> wrote:
> >
> >> hi,
> >>
> >> I just have another (hopefully this will end soon) issue I want
> >> your input on. (Please read to the end first.)
> >>
> >> To recap the conditions:
> >> -  the topology plugin manages the connections between servers as
> >> segments in the shared tree
> >> - it is authoritative for managed servers, e.g. it controls all
> >> connections between servers listed under cn=masters,
> >>     it is permissive for connections to other servers
> >> - it rejects any removal of a segment, which would disconnect the
> >> topology.
> >> - a change in topology can be applied to any server in the
> >> topology, it will reach the respective servers and the plugin will
> >> act upon it
> >>
> >> Now there is a special case, causing a bit of trouble. If a replica
> >> is to be removed from the topology, this means that
> >> the replication agreements from and to this replica should be
> >> removed and the server should be removed from the managed servers.
> >> The problem is that:
> >> - if you remove the server first, the server becomes unmanaged and
> >> removal of the segment will not trigger a removal of the
> >> replication agreement
> > Can you explain what you mean "if you remove the server first"
> > exactly ? What LDAP operation will be performed, by the management
> > tools ?
> as far as the plugin is concerned a removal of a replica triggers two 
> operations:
> - removal of the host from the servers in cn=masters, so the server
> is no longer considered as managed
> - removal of the segment(s) connecting the to-be-removed replica to
> other still managed servers, which should remove the corresponding
> replication agreements.
> It was the order of these two operations I was talking about.

We can define a correct order, and the plugin can refuse any other
order for direct operations (we need to be careful not to refuse
replication operations, I think).

> >
> >> - if you remove the segments first, one segment will be the last
> >> one connecting this replica to the topology and removal will be
> >> rejected
> > We should never remove the segments first indeed.
> if we can fully control that only specific management tools can be
> used, we can define the order, but an admin could apply individual
> operations and still it would be good if nothing breaks

I think we had a plan to return UNWILLING_TO_PERFORM if the admin tries
to remove the last segment first. So we would have no problem really,
the admin can try and fail. If he wants to remove a master he'll have
to remove it from the masters group, and this will trigger the removal
of all segments.
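
To make the rule concrete, here is a minimal sketch in Python (not the
plugin's actual C code). The class and method names are made up for
illustration, and the exception only stands in for LDAP result code 53
(unwillingToPerform). It assumes segments are undirected pairs of
distinct server names:

```python
from collections import defaultdict


class UnwillingToPerform(Exception):
    """Stands in for LDAP result code 53 (unwillingToPerform)."""


class Topology:
    """Hypothetical model: managed masters plus undirected segments."""

    def __init__(self, masters, segments):
        self.masters = set(masters)
        self.segments = {frozenset(s) for s in segments}

    def _connected(self, segments):
        """True if every managed master is reachable from every other."""
        if len(self.masters) <= 1:
            return True
        adjacency = defaultdict(set)
        for segment in segments:
            a, b = tuple(segment)
            # Segments to unmanaged servers are ignored (permissive).
            if a in self.masters and b in self.masters:
                adjacency[a].add(b)
                adjacency[b].add(a)
        start = next(iter(self.masters))
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for peer in adjacency[node] - seen:
                seen.add(peer)
                stack.append(peer)
        return seen == self.masters

    def remove_segment(self, a, b):
        """Direct removal: refused if it would disconnect the topology."""
        segment = frozenset((a, b))
        if segment not in self.segments:
            raise KeyError(f"no segment between {a} and {b}")
        remaining = self.segments - {segment}
        if not self._connected(remaining):
            raise UnwillingToPerform("removal would disconnect the topology")
        self.segments = remaining

    def remove_master(self, name):
        """Removing a master drops all of its segments, unconditionally."""
        self.masters.discard(name)
        dropped = {s for s in self.segments if name in s}
        self.segments -= dropped
        return dropped
```

With masters A, B, C chained as A-B and B-C, remove_segment("B", "C")
is refused, while remove_master("C") succeeds and takes the B-C segment
with it.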

> >> Now, with some effort this can be resolved, e.g.
> >> if the server is removed, keep it internally as a removed server and
> >> for segments connecting this server trigger removal of replication
> >> agreements, or mark the last segment, when its removal is attempted, as
> >> pending and once the server is removed also remove the
> >> corresponding repl agreements
> > Why should we "keep it internally" ?
> > If you mark the agreements as managed by setting an attribute on
> > them, then you will never have any issue recognizing a "managed"
> > agreement in cn=config, and you will also immediately find out it
> > is "old" as it is not backed by a segment so you will safely remove
> > it.
> I didn't want to add new flags/fields to the replication agreements
> as long as anything can be handled by the data in the shared tree.

We have to. I think it is a must or we will find numerous corner cases.
Is there a specific reason why you do not want to add flags to
replication agreements in cn=config ?
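
As a rough sketch of what I mean (Python for brevity; the "managed" key
is a placeholder for whatever marker attribute we would end up choosing,
and the function name is made up): each server compares its managed
agreements against the segments in the shared tree and treats any
managed agreement without a backing segment as stale.

```python
def stale_agreements(agreements, segments, local_host):
    """Return managed agreements that no segment backs any more.

    agreements: list of dicts with hypothetical keys "peer" and "managed"
    segments:   iterable of (server, server) pairs from the shared tree
    local_host: name of the server doing the check
    """
    # Peers this server should still have an agreement with.
    wanted = set()
    for a, b in segments:
        if a == local_host:
            wanted.add(b)
        elif b == local_host:
            wanted.add(a)
    # Unmanaged agreements are never touched, whatever the segments say.
    return [
        agreement
        for agreement in agreements
        if agreement["managed"] and agreement["peer"] not in wanted
    ]
```

An agreement created by hand (not marked as managed) is left alone, so
the plugin stays permissive for connections it does not own.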

> "internally" was probably misleading, but I will think about it again

Ok, it is important we both understand what issues we see with any of
the possible approaches so we can agree on the best one.

> > Segments (and their agreements) should be removed as trigger on the
> > master entry getting removed. This should be done even if it causes
> > a split brain, because if the server is removed, no matter how much
> > we wish to keep topology integrity we effectively are in a split
> > brain situation, keeping topology agreements alive w/o the server
> > entry doesn't help.
> If we can agree on that, that presence/removal of masters is the
> primary trigger that's fine.

Yes I think we can definitely agree that this is the primary trigger
for server removal/addition.

> I was thinking of situations where a server was removed, 
> but not uninstalled.

Understood, but even then it makes no real difference, once the server
is removed from the group of masters it will not be able to replicate
outbound anymore as the other masters' ACIs will not recognize this
server's credentials as valid replicator creds.

> Just taking it out of the topology, but it could still be reached

It can be reached, and that may be a problem for clients. But in the
long term this should be true only for clients manually configured to
reach that server. Clients that use SRV records would see it drop off,
and switch to another one.

We may consider whether we want some automatism that causes the server
to shut itself down if it can't replicate (or receives replication data
to the effect it realizes it is out of the topology). But this may be a
little too drastic.

> >> But there is a problem, which I think is much harder and I am not
> >> sure how much effort I should put in resolving it.
> >> If we want to have the replication agreements cleaned up after
> >> removal of a replica without direct modification of cn=config, we
> >> need to follow the path above,
> >> but this also means that the last change needs to reach both the
> >> removed replica (R) and the last server(S) it is connected to.
> > It would be nice if the change reached the replica, indeed, but
> > not a big deal if it doesn't, if you are removing the replica it
> > means you are decommissioning it, so it is not really that
> > important that it receives updates, it will be destroyed shortly.
> That's what I was not sure about, couldn't there be cases where it is 
> not destroyed, just isolated?

Why would you isolate a server ? Is there a legitimate case an admin
would want to do that ?

> > And if it is not destroyed for whatever reason, it will be removed
> > from the masters group anyway so it will have no permission to
> > replicate back, and no harm is done to the overall domain.
> >
> >> The bad thing is that if this change triggers a
> >> removal of the replication agreement on S it could be that the
> >> change is not replicated to R before the agreement is removed and
> >> is lost. There is no way (or no easy way) for the plugin to know
> >> if a change was received by another server,
> > There is an easy way, contact the other server and see if the change
> > happened in its LDAP tree :)
> > But this is not really necessary, as explained above.
> >
> >> I was also thinking about some kind
> >> of acknowledge mechanism by doing a ping pong of changes, but the
> >> problem always is the same that one server does not know if the
> >> other has received it.
> >> And even if this would theoretically work, we cannot be sure that R
> >> is not shut down while only the remaining topology is being
> >> cleaned up, so S would wait forever.
> > We should not care, if you are deleting a replica it doesn't matter
> > what's on the replica side IMO.
> >
> >> My suggestion to resolve this (in most cases) is to define a wait
> >> interval, after the final combination of removal of a server and
> >> its connecting segment is received, wait for some time and then
> >> remove the corresponding replication agreements.
> > Why ?
> >
> >> So I'm asking you if this would be acceptable or if you have a
> >> better solution.
> > I am trying to understand why we have a problem, actually, I do not
> > really see one, why do you think it is important to update a replica
> > that is being killed ?
> because I had scenarios in mind where it would not be killed, just 
> removed from the topology

Ok, but I do not see what legitimate action would cause a server to be
taken out. But even if that happens the server won't be able to
replicate back to the domain until the admin takes the step of putting
the server back into the masters group (causing replication to be
restored both ways), so I see no harm.

Simo.

-- 
Simo Sorce * Red Hat, Inc * New York



