[Freeipa-users] replication again :-(
thierry bordaz
tbordaz at redhat.com
Thu May 21 12:20:32 UTC 2015
On 05/21/2015 01:36 PM, Janelle wrote:
> On 5/20/15 7:53 AM, Mark Reynolds wrote:
>>
>>
>> On 05/20/2015 10:17 AM, thierry bordaz wrote:
>>> On 05/20/2015 03:46 PM, Janelle wrote:
>>>> On 5/20/15 6:01 AM, thierry bordaz wrote:
>>>>> On 05/20/2015 02:57 AM, Janelle wrote:
>>>>>> On 5/19/15 12:04 AM, thierry bordaz wrote:
>>>>>>> On 05/19/2015 03:42 AM, Janelle wrote:
>>>>>>>> On 5/18/15 6:23 PM, Janelle wrote:
>>>>>>>>> Once again, replication/sync has been lost. I really wish the
>>>>>>>>> product were more stable; it has so much potential, and yet.
>>>>>>>>>
>>>>>>>>> Servers running for 6 days no issues. No new accounts or
>>>>>>>>> changes (maybe a few users changing passwords) and again, 5
>>>>>>>>> out of 16 servers are no longer in sync.
>>>>>>>>>
>>>>>>>>> I can test it easily by adding an account and then waiting a
>>>>>>>>> few minutes, then run "ipa user-show --all username" on all
>>>>>>>>> the servers, and only a few of them have the account. I have
>>>>>>>>> now waited 15 minutes, still no luck.
>>>>>>>>>
>>>>>>>>> Oh well.. I guess I will go look at alternatives. I had such
>>>>>>>>> high hopes for this tool. Thanks so much everyone for all your
>>>>>>>>> help in trying to get things stable, but for whatever reason,
>>>>>>>>> there is a random loss of sync among the servers and obviously
>>>>>>>>> this is not acceptable.
>>>>>>>>>
>>>>>>>>> regards
>>>>>>>>> ~J
>>>>>>>>
>>>>>>
>>>>>> All the replicas are happy again. I found these again:
>>>>>>
>>>>>> unable to decode {replica 16} 55356472000300100000
>>>>>> 55356472000300100000
>>>>>> unable to decode {replica 23} 5553e3a3000000170000
>>>>>> 55543240000300170000
>>>>>> unable to decode {replica 24} 554d53d3000000180000
>>>>>> 554d54a4000200180000
>>>>>>
>>>>>> What I also found to be interesting is that I have not deleted
>>>>>> any masters at all, so this was quite perplexing where the
>>>>>> orphaned entries came from. However I did find 3 of the replicas
>>>>>> did not show complete RUV lists... While most of the replicas had
>>>>>> a list of all 16 servers, a couple of them listed only 4 or 5.
>>>>>> (using ipa-replica-manage list-ruv)
>>>>> I don't know about the orphaned entries. Did you get entries below
>>>>> deleted parents ?
>>>>>
>>>>> AFAIK all replicas are masters and so each has an entry {replica <rid>}
>>>>> in the RUV. We should expect all servers to have the same number of
>>>>> RUV elements (16, 4 or 5). The servers with 4 or 5 may be isolated,
>>>>> so that they did not receive updates from those with 16 RUV elements.
>>>>> Would you copy/paste an example of a RUV with 16 and one with 4-5?
>>>>
>>>> Now, the steps to clear this were:
>>>>
>>>> Removed the "unable to decode" with the direct ldapmodify's. This
>>>> worked across all replicas, which was nice and did not have to be
>>>> repeated in each one. In other words, entered on a single server,
>>>> and it was removed on all.
>>> Hello,
>>>
>>> Did you do direct ldapmodify onto the RUV entry
>>> (nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,SUFFIX) , clean RUV ?
>> Thierry,
>>
>> Janelle just manually added a cleanallruv task (that I had
>> recommended the other week).
>>
>> Mark
>>>
>>> dc1-ipa1 and dc1-ipa2 are missing some RUV elements. If you do an
>>> update on dc3-ipa1, is it replicated to dc1-ipa[12]?
>>>
>>> Also, there are duplicated RIDs (9, 25) for dc1-ipa2.example.com:389.
>>> You may see some messages like 'attrlist_replace' in some error logs.
>>> 25 seems to be the new RID.
>>>
>>> thanks
>>> thierry
>>>
>>>>
>>>> re-initialized --from=good server on the ones with the short list.
>>>>
>>>> Waited 5 minutes to let everything settle, then started running
>>>> tests of adds/deletes which seemed to be just fine.
>>>>
>>>> Here are 2 of the DCs
>>>>
>>>> -------------------------------------
>>>> Node dc1-ipa1
>>>> -------------------------------------
>>>> dc4-ipa4.example.com 389 21
>>>> dc1-ipa1.example.com 389 10
>>>> dc1-ipa4.example.com 389 4
>>>> -------------------------------------
>>>> Node dc1-ipa2
>>>> -------------------------------------
>>>> dc4-ipa4.example.com 389 21
>>>> dc1-ipa1.example.com 389 10
>>>> dc1-ipa2.example.com 389 25
>>>> dc1-ipa3.example.com 389 8
>>>> dc1-ipa4.example.com 389 4
>>>> -------------------------------------
>>>> Node dc1-ipa3
>>>> -------------------------------------
>>>> dc3-ipa1.example.com 389 14
>>>> dc3-ipa2.example.com 389 13
>>>> dc3-ipa3.example.com 389 12
>>>> dc3-ipa4.example.com 389 11
>>>> dc2-ipa1.example.com 389 7
>>>> dc2-ipa2.example.com 389 6
>>>> dc2-ipa3.example.com 389 5
>>>> dc2-ipa4.example.com 389 3
>>>> dc4-ipa1.example.com 389 18
>>>> dc4-ipa2.example.com 389 19
>>>> dc4-ipa3.example.com 389 20
>>>> dc4-ipa4.example.com 389 21
>>>> dc1-ipa1.example.com 389 10
>>>> dc1-ipa2.example.com 389 25
>>>> dc1-ipa2.example.com 389 9
>>>> dc1-ipa3.example.com 389 8
>>>> dc1-ipa4.example.com 389 4
>>>> unable to decode {replica 16} 55356472000300100000
>>>> 55356472000300100000
>>>> unable to decode {replica 24} 554d53d3000000180000
>>>> 554d54a4000200180000
>>>> dc5-ipa1.example.com 389 26
>>>> dc5-ipa2.example.com 389 15
>>>> dc5-ipa3.example.com 389 17
>>>> -------------------------------------
>>>> Node dc1-ipa4
>>>> -------------------------------------
>>>> dc3-ipa1.example.com 389 14
>>>> dc3-ipa2.example.com 389 13
>>>> dc3-ipa3.example.com 389 12
>>>> dc3-ipa4.example.com 389 11
>>>> dc2-ipa1.example.com 389 7
>>>> dc2-ipa2.example.com 389 6
>>>> dc2-ipa3.example.com 389 5
>>>> dc2-ipa4.example.com 389 3
>>>> dc4-ipa1.example.com 389 18
>>>> dc4-ipa2.example.com 389 19
>>>> dc4-ipa3.example.com 389 20
>>>> dc4-ipa4.example.com 389 21
>>>> dc1-ipa1.example.com 389 10
>>>> dc1-ipa2.example.com 389 25
>>>> dc1-ipa2.example.com 389 9
>>>> dc1-ipa3.example.com 389 8
>>>> dc1-ipa4.example.com 389 4
>>>> unable to decode {replica 16} 55356472000300100000
>>>> 55356472000300100000
>>>> unable to decode {replica 24} 554d53d3000000180000
>>>> 554d54a4000200180000
>>>> dc5-ipa1.example.com 389 26
>>>> dc5-ipa2.example.com 389 15
>>>> dc5-ipa3.example.com 389 17
>>>> -------------------------------------
>>>> Node dc2-ipa1
>>>> -------------------------------------
>>>> dc3-ipa1.example.com 389 14
>>>> dc3-ipa2.example.com 389 13
>>>> dc3-ipa3.example.com 389 12
>>>> dc3-ipa4.example.com 389 11
>>>> dc2-ipa1.example.com 389 7
>>>> dc2-ipa2.example.com 389 6
>>>> dc2-ipa3.example.com 389 5
>>>> dc2-ipa4.example.com 389 3
>>>> dc4-ipa1.example.com 389 18
>>>> dc4-ipa2.example.com 389 19
>>>> dc4-ipa3.example.com 389 20
>>>> dc4-ipa4.example.com 389 21
>>>> dc1-ipa1.example.com 389 10
>>>> dc1-ipa2.example.com 389 25
>>>> dc1-ipa2.example.com 389 9
>>>> dc1-ipa3.example.com 389 8
>>>> dc1-ipa4.example.com 389 4
>>>> unable to decode {replica 16} 55356472000300100000
>>>> 55356472000300100000
>>>> unable to decode {replica 23} 5553e3a3000000170000
>>>> 55543240000300170000
>>>> unable to decode {replica 24} 554d53d3000000180000
>>>> 554d54a4000200180000
>>>> dc5-ipa1.example.com 389 26
>>>> dc5-ipa2.example.com 389 15
>>>> dc5-ipa3.example.com 389 17
>>>> -------------------------------------
>>>> Node dc2-ipa2
>>>> -------------------------------------
>>>> dc3-ipa1.example.com 389 14
>>>> dc3-ipa2.example.com 389 13
>>>> dc3-ipa3.example.com 389 12
>>>> dc3-ipa4.example.com 389 11
>>>> dc2-ipa1.example.com 389 7
>>>> dc2-ipa2.example.com 389 6
>>>> dc2-ipa3.example.com 389 5
>>>> dc2-ipa4.example.com 389 3
>>>> dc4-ipa1.example.com 389 18
>>>> dc4-ipa2.example.com 389 19
>>>> dc4-ipa3.example.com 389 20
>>>> dc4-ipa4.example.com 389 21
>>>> dc1-ipa1.example.com 389 10
>>>> dc1-ipa2.example.com 389 25
>>>> dc1-ipa2.example.com 389 9
>>>> dc1-ipa3.example.com 389 8
>>>> dc1-ipa4.example.com 389 4
>>>> unable to decode {replica 16} 55356472000300100000
>>>> 55356472000300100000
>>>> unable to decode {replica 24} 554d53d3000000180000
>>>> 554d54a4000200180000
>>>> dc5-ipa1.example.com 389 26
>>>> dc5-ipa2.example.com 389 15
>>>> dc5-ipa3.example.com 389 17
>>>> -------------------------------------
>>>> Node dc2-ipa3
>>>> -------------------------------------
>>>> dc3-ipa1.example.com 389 14
>>>> dc3-ipa2.example.com 389 13
>>>> dc3-ipa3.example.com 389 12
>>>> dc3-ipa4.example.com 389 11
>>>> dc2-ipa1.example.com 389 7
>>>> dc2-ipa2.example.com 389 6
>>>> dc2-ipa3.example.com 389 5
>>>> dc2-ipa4.example.com 389 3
>>>> dc4-ipa1.example.com 389 18
>>>> dc4-ipa2.example.com 389 19
>>>> dc4-ipa3.example.com 389 20
>>>> dc4-ipa4.example.com 389 21
>>>> dc1-ipa1.example.com 389 10
>>>> dc1-ipa2.example.com 389 25
>>>> dc1-ipa2.example.com 389 9
>>>> dc1-ipa3.example.com 389 8
>>>> dc1-ipa4.example.com 389 4
>>>> unable to decode {replica 16} 55356472000300100000
>>>> 55356472000300100000
>>>> unable to decode {replica 24} 554d53d3000000180000
>>>> 554d54a4000200180000
>>>> dc5-ipa1.example.com 389 26
>>>> dc5-ipa2.example.com 389 15
>>>> dc5-ipa3.example.com 389 17
>>>> -------------------------------------
>>>> Node dc2-ipa4
>>>> -------------------------------------
>>>> dc3-ipa1.example.com 389 14
>>>> dc3-ipa2.example.com 389 13
>>>> dc3-ipa3.example.com 389 12
>>>> dc3-ipa4.example.com 389 11
>>>> dc2-ipa1.example.com 389 7
>>>> dc2-ipa2.example.com 389 6
>>>> dc2-ipa3.example.com 389 5
>>>> dc2-ipa4.example.com 389 3
>>>> dc4-ipa1.example.com 389 18
>>>> dc4-ipa2.example.com 389 19
>>>> dc4-ipa3.example.com 389 20
>>>> dc4-ipa4.example.com 389 21
>>>> dc1-ipa1.example.com 389 10
>>>> dc1-ipa2.example.com 389 25
>>>> dc1-ipa2.example.com 389 9
>>>> dc1-ipa3.example.com 389 8
>>>> dc1-ipa4.example.com 389 4
>>>> unable to decode {replica 16} 55356472000300100000
>>>> 55356472000300100000
>>>> unable to decode {replica 24} 554d53d3000000180000
>>>> 554d54a4000200180000
>>>> dc5-ipa1.example.com 389 26
>>>> dc5-ipa2.example.com 389 15
>>>> dc5-ipa3.example.com 389 17
>>>>
>>>>
>>>> Happy Wednesday
>>>> ~Janelle
>>>
>>>
>>>
>>
>
> And just like that - for no reason, they all reappeared:
>
> unable to decode {replica 16} 55356472000300100000 55356472000300100000
> unable to decode {replica 23} 5545d61f000200170000 5552f718000300170000
> unable to decode {replica 24} 554d53d3000000180000 554d54a4000200180000
>
> :-(
> ~J
>
Hello Janelle,
Those 3 RIDs were already present on node dc2-ipa1, correct? Did they
reappear on other nodes as well?
Maybe dc2-ipa1 established a replication session with its peers and
sent those RIDs.
Could you check in all the access logs when the operation with
csn=5552f718000300170000 was applied?
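
One possible way to do that search is sketched below in Python. It assumes the access log records write operations on RESULT lines that end with a csn=<value> field. The helper name find_csn and the two sample log lines are made up for illustration and do not come from any of the servers in this thread.

```python
import re
from typing import Iterable, Iterator


def find_csn(lines: Iterable[str], csn: str) -> Iterator[str]:
    """Yield every log line that carries the given CSN in a csn= field."""
    pattern = re.compile(r"\bcsn=" + re.escape(csn) + r"\b")
    for line in lines:
        if pattern.search(line):
            yield line.rstrip("\n")


# Made-up sample lines in the general shape of an access log (assumption).
sample = [
    "[13/May/2015:07:02:48 +0000] conn=12 op=5 RESULT err=0 tag=103 "
    "nentries=0 etime=0 csn=5552f718000300170000\n",
    "[13/May/2015:07:02:49 +0000] conn=12 op=6 RESULT err=0 tag=101 "
    "nentries=1 etime=0\n",
]

hits = list(find_csn(sample, "5552f718000300170000"))
for hit in hits:
    print(hit)
```

On a real server the lines would come from the instance's access log file (commonly under /var/log/dirsrv/slapd-INSTANCE/, though that path is an assumption here), for example by passing an open file object to find_csn on each replica and comparing the timestamps of the hits.
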
Note that the two hex values for replica 23 changed
(5545d61f000200170000 5552f718000300170000 vs 5553e3a3000000170000
55543240000300170000). Have you recreated a replica 23?
Do you have replication logging enabled?
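
For reference when comparing the CSN values above: a 389 DS CSN is a 20-character hex string made of an 8-digit timestamp (seconds since the epoch), a 4-digit sequence number, a 4-digit replica ID and a 4-digit sub-sequence number. A minimal Python sketch of that decoding follows. The names Csn and parse_csn are illustrative only and are not part of any 389 DS or IPA tool.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Csn(NamedTuple):
    timestamp: datetime  # when the change was originated (UTC)
    seqnum: int          # sequence number within that second
    rid: int             # replica ID of the originating master
    subseqnum: int       # sub-sequence number


def parse_csn(csn: str) -> Csn:
    """Split a 20-hex-digit CSN into its four fields."""
    if len(csn) != 20:
        raise ValueError("expected 20 hex digits, got %r" % csn)
    seconds = int(csn[0:8], 16)
    return Csn(
        timestamp=datetime.fromtimestamp(seconds, tz=timezone.utc),
        seqnum=int(csn[8:12], 16),
        rid=int(csn[12:16], 16),
        subseqnum=int(csn[16:20], 16),
    )


# The two pairs reported for replica 23 in this thread.
for value in (
    "5545d61f000200170000",
    "5552f718000300170000",
    "5553e3a3000000170000",
    "55543240000300170000",
):
    c = parse_csn(value)
    print(value, c.timestamp.isoformat(), "rid=%d" % c.rid, "seq=%d" % c.seqnum)
```

Decoding both pairs this way makes it easy to see which pair is older, which may help to tell whether the RUV element that reappeared predates the cleanup.
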
thanks
thierry