[Freeipa-users] replication again :-(

thierry bordaz tbordaz at redhat.com
Thu May 21 12:20:32 UTC 2015


On 05/21/2015 01:36 PM, Janelle wrote:
> On 5/20/15 7:53 AM, Mark Reynolds wrote:
>>
>>
>> On 05/20/2015 10:17 AM, thierry bordaz wrote:
>>> On 05/20/2015 03:46 PM, Janelle wrote:
>>>> On 5/20/15 6:01 AM, thierry bordaz wrote:
>>>>> On 05/20/2015 02:57 AM, Janelle wrote:
>>>>>> On 5/19/15 12:04 AM, thierry bordaz wrote:
>>>>>>> On 05/19/2015 03:42 AM, Janelle wrote:
>>>>>>>> On 5/18/15 6:23 PM, Janelle wrote:
>>>>>>>>> Once again, replication/sync has been lost. I really wish the 
>>>>>>>>> product was more stable; it has so much potential, and yet.
>>>>>>>>>
>>>>>>>>> Servers running for 6 days no issues. No new accounts or 
>>>>>>>>> changes (maybe a few users changing passwords) and again, 5 
>>>>>>>>> out of 16 servers are no longer in sync.
>>>>>>>>>
>>>>>>>>> I can test it easily by adding an account and then waiting a 
>>>>>>>>> few minutes, then run "ipa  user-show --all username" on all 
>>>>>>>>> the servers, and only a few of them have the account.  I have 
>>>>>>>>> now waited 15 minutes, still no luck.
>>>>>>>>>
>>>>>>>>> Oh well.. I guess I will go look at alternatives. I had such 
>>>>>>>>> high hopes for this tool. Thanks so much everyone for all your 
>>>>>>>>> help in trying to get things stable, but for whatever reason, 
>>>>>>>>> there is a random loss of sync among the servers and obviously 
>>>>>>>>> this is not acceptable.
>>>>>>>>>
>>>>>>>>> regards
>>>>>>>>> ~J
>>>>>>>>
>>>>>>
>>>>>> All the replicas are happy again. I found these again:
>>>>>>
>>>>>> unable to decode  {replica 16} 55356472000300100000 
>>>>>> 55356472000300100000
>>>>>> unable to decode  {replica 23} 5553e3a3000000170000 
>>>>>> 55543240000300170000
>>>>>> unable to decode  {replica 24} 554d53d3000000180000 
>>>>>> 554d54a4000200180000
>>>>>>
>>>>>> What I also found to be interesting is that I have not deleted 
>>>>>> any masters at all, so this was quite perplexing where the 
>>>>>> orphaned entries came from.  However I did find 3 of the replicas 
>>>>>> did not show complete RUV lists... While most of the replicas had 
>>>>>> a list of all 16 servers, a couple of them listed only 4 or 5. 
>>>>>> (using ipa-replica-manage list-ruv)
>>>>> I don't know about the orphaned entries. Did you get entries below 
>>>>> deleted parents?
>>>>>
>>>>> AFAIK all replicas are masters and so each has an entry {replica <rid>} 
>>>>> in the RUV. We should expect all servers to have the same number of 
>>>>> RUV elements (16, 4 or 5). The servers with 4 or 5 may be isolated, 
>>>>> so that they did not receive updates from those with 16 RUV elements.
>>>>> Would you copy/paste an example of a RUV with 16 and one with 4-5?
>>>>
>>>> Now, the steps to clear this were:
>>>>
>>>> Removed the "unable to decode" with the direct ldapmodify's. This 
>>>> worked across all replicas, which was nice and did not have to be 
>>>> repeated in each one. In other words, entered on a single server, 
>>>> and it was removed on all.
>>> Hello,
>>>
>>> Did you do direct ldapmodify onto the RUV entry 
>>> (nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,SUFFIX) , clean RUV ?
>> Thierry,
>>
>> Janelle just manually added a cleanallruv task (that I had 
>> recommended the other week).
>>
>> Mark
>>>
>>> dc1-ipa1 and dc1-ipa2 are missing some RUV elements. If you do an 
>>> update on dc3-ipa1, is it replicated to dc1-ipa[12]?
>>>
>>> Also, there are duplicated RIDs (9, 25) for dc1-ipa2.example.com:389. 
>>> You may see some messages like 'attrlist_replace' in some error logs.
>>> 25 seems to be the new RID.
>>>
>>> thanks
>>> thierry
>>>
>>>>
>>>> re-initialized --from=good server on the ones with the short list.
>>>>
>>>> Waited 5 minutes to let everything settle, then started running 
>>>> tests of adds/deletes which seemed to be just fine.
>>>>
>>>> Here are 2 of the DCs
>>>>
>>>> -------------------------------------
>>>> Node dc1-ipa1
>>>> -------------------------------------
>>>> dc4-ipa4.example.com 389  21
>>>> dc1-ipa1.example.com 389  10
>>>> dc1-ipa4.example.com 389  4
>>>> -------------------------------------
>>>> Node dc1-ipa2
>>>> -------------------------------------
>>>> dc4-ipa4.example.com 389  21
>>>> dc1-ipa1.example.com 389  10
>>>> dc1-ipa2.example.com 389  25
>>>> dc1-ipa3.example.com 389  8
>>>> dc1-ipa4.example.com 389  4
>>>> -------------------------------------
>>>> Node dc1-ipa3
>>>> -------------------------------------
>>>> dc3-ipa1.example.com 389  14
>>>> dc3-ipa2.example.com 389  13
>>>> dc3-ipa3.example.com 389  12
>>>> dc3-ipa4.example.com 389  11
>>>> dc2-ipa1.example.com 389  7
>>>> dc2-ipa2.example.com 389  6
>>>> dc2-ipa3.example.com 389  5
>>>> dc2-ipa4.example.com 389  3
>>>> dc4-ipa1.example.com 389  18
>>>> dc4-ipa2.example.com 389  19
>>>> dc4-ipa3.example.com 389  20
>>>> dc4-ipa4.example.com 389  21
>>>> dc1-ipa1.example.com 389  10
>>>> dc1-ipa2.example.com 389  25
>>>> dc1-ipa2.example.com 389  9
>>>> dc1-ipa3.example.com 389  8
>>>> dc1-ipa4.example.com 389  4
>>>> unable to decode  {replica 16} 55356472000300100000 
>>>> 55356472000300100000
>>>> unable to decode  {replica 24} 554d53d3000000180000 
>>>> 554d54a4000200180000
>>>> dc5-ipa1.example.com 389  26
>>>> dc5-ipa2.example.com 389  15
>>>> dc5-ipa3.example.com 389  17
>>>> -------------------------------------
>>>> Node dc1-ipa4
>>>> -------------------------------------
>>>> dc3-ipa1.example.com 389  14
>>>> dc3-ipa2.example.com 389  13
>>>> dc3-ipa3.example.com 389  12
>>>> dc3-ipa4.example.com 389  11
>>>> dc2-ipa1.example.com 389  7
>>>> dc2-ipa2.example.com 389  6
>>>> dc2-ipa3.example.com 389  5
>>>> dc2-ipa4.example.com 389  3
>>>> dc4-ipa1.example.com 389  18
>>>> dc4-ipa2.example.com 389  19
>>>> dc4-ipa3.example.com 389  20
>>>> dc4-ipa4.example.com 389  21
>>>> dc1-ipa1.example.com 389  10
>>>> dc1-ipa2.example.com 389  25
>>>> dc1-ipa2.example.com 389  9
>>>> dc1-ipa3.example.com 389  8
>>>> dc1-ipa4.example.com 389  4
>>>> unable to decode  {replica 16} 55356472000300100000 
>>>> 55356472000300100000
>>>> unable to decode  {replica 24} 554d53d3000000180000 
>>>> 554d54a4000200180000
>>>> dc5-ipa1.example.com 389  26
>>>> dc5-ipa2.example.com 389  15
>>>> dc5-ipa3.example.com 389  17
>>>> -------------------------------------
>>>> Node dc2-ipa1
>>>> -------------------------------------
>>>> dc3-ipa1.example.com 389  14
>>>> dc3-ipa2.example.com 389  13
>>>> dc3-ipa3.example.com 389  12
>>>> dc3-ipa4.example.com 389  11
>>>> dc2-ipa1.example.com 389  7
>>>> dc2-ipa2.example.com 389  6
>>>> dc2-ipa3.example.com 389  5
>>>> dc2-ipa4.example.com 389  3
>>>> dc4-ipa1.example.com 389  18
>>>> dc4-ipa2.example.com 389  19
>>>> dc4-ipa3.example.com 389  20
>>>> dc4-ipa4.example.com 389  21
>>>> dc1-ipa1.example.com 389  10
>>>> dc1-ipa2.example.com 389  25
>>>> dc1-ipa2.example.com 389  9
>>>> dc1-ipa3.example.com 389  8
>>>> dc1-ipa4.example.com 389  4
>>>> unable to decode  {replica 16} 55356472000300100000 
>>>> 55356472000300100000
>>>> unable to decode  {replica 23} 5553e3a3000000170000 
>>>> 55543240000300170000
>>>> unable to decode  {replica 24} 554d53d3000000180000 
>>>> 554d54a4000200180000
>>>> dc5-ipa1.example.com 389  26
>>>> dc5-ipa2.example.com 389  15
>>>> dc5-ipa3.example.com 389  17
>>>> -------------------------------------
>>>> Node dc2-ipa2
>>>> -------------------------------------
>>>> dc3-ipa1.example.com 389  14
>>>> dc3-ipa2.example.com 389  13
>>>> dc3-ipa3.example.com 389  12
>>>> dc3-ipa4.example.com 389  11
>>>> dc2-ipa1.example.com 389  7
>>>> dc2-ipa2.example.com 389  6
>>>> dc2-ipa3.example.com 389  5
>>>> dc2-ipa4.example.com 389  3
>>>> dc4-ipa1.example.com 389  18
>>>> dc4-ipa2.example.com 389  19
>>>> dc4-ipa3.example.com 389  20
>>>> dc4-ipa4.example.com 389  21
>>>> dc1-ipa1.example.com 389  10
>>>> dc1-ipa2.example.com 389  25
>>>> dc1-ipa2.example.com 389  9
>>>> dc1-ipa3.example.com 389  8
>>>> dc1-ipa4.example.com 389  4
>>>> unable to decode  {replica 16} 55356472000300100000 
>>>> 55356472000300100000
>>>> unable to decode  {replica 24} 554d53d3000000180000 
>>>> 554d54a4000200180000
>>>> dc5-ipa1.example.com 389  26
>>>> dc5-ipa2.example.com 389  15
>>>> dc5-ipa3.example.com 389  17
>>>> -------------------------------------
>>>> Node dc2-ipa3
>>>> -------------------------------------
>>>> dc3-ipa1.example.com 389  14
>>>> dc3-ipa2.example.com 389  13
>>>> dc3-ipa3.example.com 389  12
>>>> dc3-ipa4.example.com 389  11
>>>> dc2-ipa1.example.com 389  7
>>>> dc2-ipa2.example.com 389  6
>>>> dc2-ipa3.example.com 389  5
>>>> dc2-ipa4.example.com 389  3
>>>> dc4-ipa1.example.com 389  18
>>>> dc4-ipa2.example.com 389  19
>>>> dc4-ipa3.example.com 389  20
>>>> dc4-ipa4.example.com 389  21
>>>> dc1-ipa1.example.com 389  10
>>>> dc1-ipa2.example.com 389  25
>>>> dc1-ipa2.example.com 389  9
>>>> dc1-ipa3.example.com 389  8
>>>> dc1-ipa4.example.com 389  4
>>>> unable to decode  {replica 16} 55356472000300100000 
>>>> 55356472000300100000
>>>> unable to decode  {replica 24} 554d53d3000000180000 
>>>> 554d54a4000200180000
>>>> dc5-ipa1.example.com 389  26
>>>> dc5-ipa2.example.com 389  15
>>>> dc5-ipa3.example.com 389  17
>>>> -------------------------------------
>>>> Node dc2-ipa4
>>>> -------------------------------------
>>>> dc3-ipa1.example.com 389  14
>>>> dc3-ipa2.example.com 389  13
>>>> dc3-ipa3.example.com 389  12
>>>> dc3-ipa4.example.com 389  11
>>>> dc2-ipa1.example.com 389  7
>>>> dc2-ipa2.example.com 389  6
>>>> dc2-ipa3.example.com 389  5
>>>> dc2-ipa4.example.com 389  3
>>>> dc4-ipa1.example.com 389  18
>>>> dc4-ipa2.example.com 389  19
>>>> dc4-ipa3.example.com 389  20
>>>> dc4-ipa4.example.com 389  21
>>>> dc1-ipa1.example.com 389  10
>>>> dc1-ipa2.example.com 389  25
>>>> dc1-ipa2.example.com 389  9
>>>> dc1-ipa3.example.com 389  8
>>>> dc1-ipa4.example.com 389  4
>>>> unable to decode  {replica 16} 55356472000300100000 
>>>> 55356472000300100000
>>>> unable to decode  {replica 24} 554d53d3000000180000 
>>>> 554d54a4000200180000
>>>> dc5-ipa1.example.com 389  26
>>>> dc5-ipa2.example.com 389  15
>>>> dc5-ipa3.example.com 389  17
>>>>
>>>>
>>>> Happy Wednesday
>>>> ~Janelle
>>>
>>>
>>>
>>
>
> And just like that - for no reason, they all reappeared:
>
> unable to decode  {replica 16} 55356472000300100000 55356472000300100000
> unable to decode  {replica 23} 5545d61f000200170000 5552f718000300170000
> unable to decode  {replica 24} 554d53d3000000180000 554d54a4000200180000
>
> :-(
> ~J
>
Hello Janelle,

Those 3 RIDs were already present on node dc2-ipa1, correct? Did they 
reappear on the other nodes as well?
Maybe dc2-ipa1 established a replication session with its peers and 
sent those RIDs.
Could you search all the access logs to find when the operation with 
csn=5552f718000300170000 was applied?
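
If it helps with that search, here is a minimal sketch in Python. It assumes that an applied operation shows up in the access log on a line containing "csn=<value>", and that the logs sit under /var/log/dirsrv/slapd-*/ (the usual default, adjust for your instances). The function names are just ones I made up for illustration.

```python
import glob
from typing import Iterable, Iterator, Tuple


def grep_csn(lines: Iterable[str], csn: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for every line that mentions csn=<csn>."""
    # Compare case-insensitively, since hex digits may be typed either way.
    needle = "csn=" + csn.lower()
    for lineno, line in enumerate(lines, start=1):
        if needle in line.lower():
            yield lineno, line.rstrip("\n")


def scan_access_logs(pattern: str, csn: str) -> None:
    """Print every matching line from all log files that match the glob pattern."""
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with open(path, "r", errors="replace") as fh:
            for lineno, line in grep_csn(fh, csn):
                print(f"{path}:{lineno}: {line}")
```

Running scan_access_logs("/var/log/dirsrv/slapd-*/access*", "5552f718000300170000") on each server prints the file, line number and log line for every hit, so the timestamps can be compared between servers.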

Note that the two hex values for replica 23 have changed 
(5545d61f000200170000 5552f718000300170000 now, versus 
5553e3a3000000170000 55543240000300170000 before). Have you recreated 
a replica 23?
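
To make those values easier to compare, here is a small sketch that splits a CSN into its fields. It assumes the layout I believe 389-ds uses: 20 hex digits, made of 8 for the timestamp (seconds since the epoch), 4 for the sequence number, 4 for the replica ID and 4 for the sub-sequence number. The CSN class and parse_csn are names chosen for illustration only.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: datetime  # when the change was generated (UTC)
    seqnum: int          # sequence number within that second
    rid: int             # replica ID of the originating master
    subseqnum: int       # sub-sequence number


def parse_csn(csn: str) -> CSN:
    """Split a 20-hex-digit CSN string into its four fields."""
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(csn)}: {csn!r}")
    # int(..., 16) raises ValueError if a field is not valid hex.
    seconds = int(csn[0:8], 16)
    return CSN(
        timestamp=datetime.fromtimestamp(seconds, tz=timezone.utc),
        seqnum=int(csn[8:12], 16),
        rid=int(csn[12:16], 16),
        subseqnum=int(csn[16:20], 16),
    )


if __name__ == "__main__":
    for value in ("5545d61f000200170000", "5552f718000300170000",
                  "5553e3a3000000170000", "55543240000300170000"):
        print(value, parse_csn(value))
```

Under that assumption the replica ID field of all four values is 0x17, that is 23, and the timestamp fields show which pair of values is the older one.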

Do you have replication logging enabled?

thanks
thierry

