[Freeipa-users] replication again :-(
Mark Reynolds
mareynol at redhat.com
Thu May 21 17:10:38 UTC 2015
On 05/21/2015 09:59 AM, Janelle wrote:
> On 5/21/15 6:46 AM, Ludwig Krispenz wrote:
>>
>> On 05/21/2015 03:28 PM, Janelle wrote:
>>> I think I found the problem.
>>>
>>> There was a lone replica running in another DC. It was installed as
>>> a replica some time ago with all the others. Think of this -- the
>>> original config had 5 servers, one of them was this server. Then the
>>> other 4 servers were RE-BUILT from scratch, so all the replication
>>> agreements were changed AND - this is the important part - the 5th
>>> server was never added back in. BUT - the 5th server was left
>>> running and was never told that it was no longer a member. It still
>>> thought it had a replication agreement with the original "server 1", but
>>> server 1 knew otherwise.
>>>
>>> Now, although the first 4 servers were rebuilt, the same domain,
>>> realm, AND passwords were used.
>>>
>>> I am guessing that somehow this 5th server keeps trying to
>>> interject its info into the ring of 4 servers, kind of forcing its
>>> way in. Somehow, the fact that the original credentials still work (even
>>> though the certs are all different) is leaving the first 4 servers with a
>>> "can't decode" issue.
>>>
>>> There should be some security checks so this can't happen. It should
>>> also be easy to replicate.
>>>
>>> Now I have to go re-initialize all the servers from a good server,
>>> so everyone is happy again. The "problem" server has been shut down
>>> completely. (and yes, there were actually 3 of them in my scenario -
>>> I just used 1 to simplify my example - but that explains the 3 CSNs
>>> that just kept "appearing")
>>>
>>> What concerns me most about this - were the servers outside of the
>>> "good ring" somehow able to inject data into replication which might
>>> have been causing bad data??? This is bad if it is true.
>> It depends a bit on what you mean by "rebuilt from scratch".
>> A replication session needs to meet three conditions to be able to
>> send data:
>> - the supplier side needs to be able to authenticate, and the
>> authenticated user has to be in the list of bind DNs of the replica
>> - the data generation of the supplier and consumer sides needs to be
>> the same (they all have to have the same common origin)
>> - the supplier needs to have the changes (CSNs) in its changelog so it
>> can find the right starting position to send updates
>>
>> Now, if you have 5 servers, forget about one of them, do not change
>> the credentials on the others, and do not reinitialize the database by
>> an LDIF import to generate a new database generation, the fifth
>> server will still be able to connect and eventually send updates.
>> How should the other servers know that this one is no longer a "good"
>> one?
>>>
>>> ~Janelle
>>>
>>
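To make the three conditions above concrete, here is a minimal Python model of them. It is only a sketch under stated assumptions: the `Supplier`/`Consumer` classes, their field names, and the bind DN are hypothetical stand-ins, not 389 Directory Server or FreeIPA APIs, and "data generation" is reduced to an opaque string. It shows why a forgotten replica that reuses the old credentials and shares the same database generation still passes every check, and why re-initializing from an LDIF import (a new generation) shuts it out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Supplier:
    # Hypothetical model of the replica that opens the replication session.
    bind_dn: str
    password: str
    data_generation: str  # opaque ID shared by replicas with a common origin
    changelog_csns: set[str] = field(default_factory=set)


@dataclass
class Consumer:
    # Hypothetical model of the replica that receives updates.
    replica_bind_dns: set[str]   # bind DNs allowed to send updates
    credentials: dict[str, str]  # bind DN -> password accepted at bind time
    data_generation: str
    last_seen_csn: Optional[str] = None  # consumer's max CSN for the supplier's RID


def can_send_updates(supplier: Supplier, consumer: Consumer) -> bool:
    # 1. The supplier authenticates, and its bind DN is an allowed replica bind DN.
    authenticated = consumer.credentials.get(supplier.bind_dn) == supplier.password
    authorized = supplier.bind_dn in consumer.replica_bind_dns
    # 2. Both sides share the same data generation (common origin).
    same_generation = supplier.data_generation == consumer.data_generation
    # 3. The supplier can position itself in its changelog at the consumer's
    #    last-seen CSN (or the consumer has seen nothing from it yet).
    can_position = (consumer.last_seen_csn is None
                    or consumer.last_seen_csn in supplier.changelog_csns)
    return authenticated and authorized and same_generation and can_position


# Made-up values mirroring this thread: the rebuilt servers reused the old
# bind DN, password and database generation.
BIND_DN = "cn=hypothetical repl manager,cn=config"
forgotten = Supplier(BIND_DN, "secret", "gen-A", {"554d53d3000000180000"})
rebuilt = Consumer({BIND_DN}, {BIND_DN: "secret"}, "gen-A", "554d53d3000000180000")
print(can_send_updates(forgotten, rebuilt))  # True: all three conditions hold

rebuilt.data_generation = "gen-B"  # e.g. after re-initializing from an LDIF import
print(can_send_updates(forgotten, rebuilt))  # False: generations no longer match
```

Changing the replication credentials on the rebuilt servers would, in this model, make condition 1 fail in the same way.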
> The only problem left now is that, no matter what, this last entry will
> NOT go away, and now I have 2 "stuck" cleanruvs that will not "abort"
> either.
>
> unable to decode {replica 24} 554d53d3000000180000 554d54a4000200180000
>
> CLEANALLRUV tasks
> RID 24 None
> No abort CLEANALLRUV tasks running
> =====================================
>
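The two values after `{replica 24}` are CSNs. As I understand the 389 Directory Server format, a CSN string is 20 hex digits: an 8-digit timestamp (seconds since the Unix epoch), a 4-digit sequence number, a 4-digit replica ID, and a 4-digit sub-sequence number. The sketch below decodes them under that assumption; `parse_csn` and `CSN` are illustrative names, not part of any tool. Both values decode to replica ID 24, which matches the `{replica 24}` label, and the timestamps show when that replica generated them.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: int   # seconds since the Unix epoch
    seqnum: int      # sequence number within that second
    replica_id: int  # RID of the replica that generated the change
    subseqnum: int   # sub-sequence number


def parse_csn(csn: str) -> CSN:
    """Split a 20-hex-digit CSN string into its fields (assumed 8/4/4/4 layout)."""
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(csn)}: {csn!r}")
    # int(..., 16) raises ValueError on any non-hex character.
    return CSN(int(csn[0:8], 16), int(csn[8:12], 16),
               int(csn[12:16], 16), int(csn[16:20], 16))


for raw in ("554d53d3000000180000", "554d54a4000200180000"):
    c = parse_csn(raw)
    when = datetime.fromtimestamp(c.timestamp, tz=timezone.utc)
    print(raw, "-> rid", c.replica_id, "seq", c.seqnum, "at", when.isoformat())
# 554d53d3000000180000 -> rid 24 seq 0 at 2015-05-09T00:24:51+00:00
# 554d54a4000200180000 -> rid 24 seq 2 at 2015-05-09T00:28:20+00:00
```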
> ldapmodify -D "cn=directory manager" -W -a
>
> dn: cn=abort 24, cn=abort cleanallruv, cn=tasks, cn=config
> objectclass: extensibleObject
> replica-base-dn: dc=example,dc=com
> cn: abort 24
> replica-id: 24
> replica-certify-all: no
> adding new entry " cn=abort 24, cn=abort cleanallruv, cn=tasks,
> cn=config"
> ldap_add: No such object (32)
There should not be a white space at the beginning: " cn=abort 24,
cn=abort cleanallruv, cn=tasks, cn=config"
When I run the abort task I don't have that extra white space, and the
task is successfully added:
[root@localhost ~]# ldapmodify -D cn=dm -w password -a
dn: cn=abort 24, cn=abort cleanallruv, cn=tasks, cn=config
objectclass: extensibleObject
replica-base-dn: dc=example,dc=com
cn: abort 24
replica-id: 24
replica-certify-all: no
adding new entry "cn=abort 24, cn=abort cleanallruv, cn=tasks, cn=config"
The extra white space is the probable cause of the error 32 (no such
object) you were seeing. You can verify this by looking at the access
log (/var/log/dirsrv/slapd-INSTANCE/access).
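One way to do that check is to pair each ADD operation in the access log with its RESULT line and flag DNs that carry leading or trailing whitespace. This is a minimal sketch that assumes the usual access-log layout, where an add is logged as `conn=N op=M ADD dn="..."` and its outcome as `conn=N op=M RESULT err=E ...`; adjust the regular expressions if your log lines look different. `add_results` is a hypothetical helper name.

```python
import re

# Assumed access-log layouts: one ADD line and one RESULT line per operation.
ADD_RE = re.compile(r'conn=(\d+) op=(\d+) ADD dn="(.*)"')
RESULT_RE = re.compile(r'conn=(\d+) op=(\d+) RESULT err=(\d+)')


def add_results(lines):
    """Return (dn, err, has_stray_whitespace) for every ADD with a logged RESULT."""
    pending = {}  # (conn, op) -> DN of an ADD still waiting for its RESULT
    results = []
    for line in lines:
        m = ADD_RE.search(line)
        if m:
            pending[(m.group(1), m.group(2))] = m.group(3)
            continue
        m = RESULT_RE.search(line)
        if m and (m.group(1), m.group(2)) in pending:
            dn = pending.pop((m.group(1), m.group(2)))
            # err=32 is "No such object"; the flag marks padded DNs.
            results.append((dn, int(m.group(3)), dn != dn.strip()))
    return results
```

Pass it an open file, e.g. `add_results(open("/var/log/dirsrv/slapd-INSTANCE/access"))`, and look for the abort task's DN reported with error 32 and the whitespace flag set.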
As I said before, you could also check the errors log for the reason
the cleanAllRUV task is not completing.
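A quick way to pull the task's messages out of the errors log is to filter for lines that mention CleanAllRUV and, optionally, the replica ID. This sketch assumes the task's log messages contain the string "CleanAllRUV" and, for the optional filter, a `rid 24`-style tag; neither the helper name nor that exact wording comes from this thread, so check a few real lines first and leave `rid=None` if your version words them differently.

```python
import re
from collections import deque


def cleanallruv_messages(lines, rid=None, last=20):
    """Return the last `last` CleanAllRUV-related lines, optionally for one RID."""
    task_re = re.compile(r"cleanallruv", re.IGNORECASE)
    # Assumption: messages tag the replica as "rid 24"; pass rid=None if not.
    rid_re = re.compile(rf"\brid {rid}\b") if rid is not None else None
    tail = deque(maxlen=last)  # keeps only the most recent matches
    for line in lines:
        if not task_re.search(line):
            continue
        if rid_re is not None and not rid_re.search(line):
            continue
        tail.append(line.rstrip("\n"))
    return list(tail)
```

For example, `cleanallruv_messages(open("/var/log/dirsrv/slapd-INSTANCE/errors"), rid=24)` would return the most recent matching lines, assuming the errors log sits next to the access log in the instance's log directory.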
Regards,
Mark