[Freeipa-users] replication again :-(

Janelle janellenicole80 at gmail.com
Thu May 21 12:23:52 UTC 2015


On 5/21/15 5:20 AM, thierry bordaz wrote:
> On 05/21/2015 01:36 PM, Janelle wrote:
>> On 5/20/15 7:53 AM, Mark Reynolds wrote:
>>>
>>>
>>> On 05/20/2015 10:17 AM, thierry bordaz wrote:
>>>> On 05/20/2015 03:46 PM, Janelle wrote:
>>>>> On 5/20/15 6:01 AM, thierry bordaz wrote:
>>>>>> On 05/20/2015 02:57 AM, Janelle wrote:
>>>>>>> On 5/19/15 12:04 AM, thierry bordaz wrote:
>>>>>>>> On 05/19/2015 03:42 AM, Janelle wrote:
>>>>>>>>> On 5/18/15 6:23 PM, Janelle wrote:
>>>>>>>>>> Once again, replication/sync has been lost. I really wish the 
>>>>>>>>>> product was more stable; it has so much potential, and yet.
>>>>>>>>>>
>>>>>>>>>> Servers running for 6 days no issues. No new accounts or 
>>>>>>>>>> changes (maybe a few users changing passwords) and again, 5 
>>>>>>>>>> out of 16 servers are no longer in sync.
>>>>>>>>>>
>>>>>>>>>> I can test it easily by adding an account and then waiting a 
>>>>>>>>>> few minutes, then run "ipa user-show --all username" on all 
>>>>>>>>>> the servers, and only a few of them have the account.  I have 
>>>>>>>>>> now waited 15 minutes, still no luck.
>>>>>>>>>>
>>>>>>>>>> Oh well.. I guess I will go look at alternatives. I had such 
>>>>>>>>>> high hopes for this tool. Thanks so much everyone for all 
>>>>>>>>>> your help in trying to get things stable, but for whatever 
>>>>>>>>>> reason, there is a random loss of sync among the servers and 
>>>>>>>>>> obviously this is not acceptable.
>>>>>>>>>>
>>>>>>>>>> regards
>>>>>>>>>> ~J
>>>>>>>>>
>>>>>>>
>>>>>>> All the replicas are happy again. I found these again:
>>>>>>>
>>>>>>> unable to decode  {replica 16} 55356472000300100000 
>>>>>>> 55356472000300100000
>>>>>>> unable to decode  {replica 23} 5553e3a3000000170000 
>>>>>>> 55543240000300170000
>>>>>>> unable to decode  {replica 24} 554d53d3000000180000 
>>>>>>> 554d54a4000200180000
>>>>>>>
>>>>>>> What I also found interesting is that I have not deleted 
>>>>>>> any masters at all, so it was quite perplexing where the 
>>>>>>> orphaned entries came from. However, I did find that 3 of the 
>>>>>>> replicas did not show complete RUV lists. While most of the 
>>>>>>> replicas had a list of all 16 servers, a couple of them listed 
>>>>>>> only 4 or 5 (using ipa-replica-manage list-ruv).
>>>>>> I don't know about the orphaned entries. Did you get entries 
>>>>>> below deleted parents ?
>>>>>>
>>>>>> AFAIK all replicas are masters and so have an entry {replica 
>>>>>> <rid>} in the RUV. We should expect all servers to have the same 
>>>>>> number of RUV elements (16, 4 or 5). The servers with 4 or 5 may 
>>>>>> be isolated, so that they did not receive updates from those with 
>>>>>> 16 RUV elements.
>>>>>> Would you copy/paste an example of a RUV with 16 and one with 4-5?
>>>>>
>>>>> Now, the steps to clear this were:
>>>>>
>>>>> Removed the "unable to decode" with the direct ldapmodify's. This 
>>>>> worked across all replicas, which was nice and did not have to be 
>>>>> repeated in each one. In other words, entered on a single server, 
>>>>> and it was removed on all.
>>>> Hello,
>>>>
>>>> Did you do a direct ldapmodify on the RUV entry 
>>>> (nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,SUFFIX), or clean the RUV?
>>> Thierry,
>>>
>>> Janelle just manually added a cleanallruv task (that I had 
>>> recommended the other week).
>>>
>>> Mark
>>>>
>>>> dc1-ipa1 and dc1-ipa2 are missing some RUV elements. If you do an 
>>>> update on dc3-ipa1, is it replicated to dc1-ipa[12]?
>>>>
>>>> Also, there are duplicate RIDs (9, 25) for dc1-ipa2.example.com:389. 
>>>> You may see some messages like 'attrlist_replace' in some error logs.
>>>> 25 seems to be the new RID.
>>>>
>>>> thanks
>>>> thierry
>>>>
>>>>>
>>>>> re-initialized --from=good server on the ones with the short list.
>>>>>
>>>>> Waited 5 minutes to let everything settle, then started running 
>>>>> tests of adds/deletes which seemed to be just fine.
>>>>>
>>>>> Here are 2 of the DCs
>>>>>
>>>>> -------------------------------------
>>>>> Node dc1-ipa1
>>>>> -------------------------------------
>>>>> dc4-ipa4.example.com 389  21
>>>>> dc1-ipa1.example.com 389  10
>>>>> dc1-ipa4.example.com 389  4
>>>>> -------------------------------------
>>>>> Node dc1-ipa2
>>>>> -------------------------------------
>>>>> dc4-ipa4.example.com 389  21
>>>>> dc1-ipa1.example.com 389  10
>>>>> dc1-ipa2.example.com 389  25
>>>>> dc1-ipa3.example.com 389  8
>>>>> dc1-ipa4.example.com 389  4
>>>>> -------------------------------------
>>>>> Node dc1-ipa3
>>>>> -------------------------------------
>>>>> dc3-ipa1.example.com 389  14
>>>>> dc3-ipa2.example.com 389  13
>>>>> dc3-ipa3.example.com 389  12
>>>>> dc3-ipa4.example.com 389  11
>>>>> dc2-ipa1.example.com 389  7
>>>>> dc2-ipa2.example.com 389  6
>>>>> dc2-ipa3.example.com 389  5
>>>>> dc2-ipa4.example.com 389  3
>>>>> dc4-ipa1.example.com 389  18
>>>>> dc4-ipa2.example.com 389  19
>>>>> dc4-ipa3.example.com 389  20
>>>>> dc4-ipa4.example.com 389  21
>>>>> dc1-ipa1.example.com 389  10
>>>>> dc1-ipa2.example.com 389  25
>>>>> dc1-ipa2.example.com 389  9
>>>>> dc1-ipa3.example.com 389  8
>>>>> dc1-ipa4.example.com 389  4
>>>>> unable to decode  {replica 16} 55356472000300100000 
>>>>> 55356472000300100000
>>>>> unable to decode  {replica 24} 554d53d3000000180000 
>>>>> 554d54a4000200180000
>>>>> dc5-ipa1.example.com 389  26
>>>>> dc5-ipa2.example.com 389  15
>>>>> dc5-ipa3.example.com 389  17
>>>>> -------------------------------------
>>>>> Node dc1-ipa4
>>>>> -------------------------------------
>>>>> dc3-ipa1.example.com 389  14
>>>>> dc3-ipa2.example.com 389  13
>>>>> dc3-ipa3.example.com 389  12
>>>>> dc3-ipa4.example.com 389  11
>>>>> dc2-ipa1.example.com 389  7
>>>>> dc2-ipa2.example.com 389  6
>>>>> dc2-ipa3.example.com 389  5
>>>>> dc2-ipa4.example.com 389  3
>>>>> dc4-ipa1.example.com 389  18
>>>>> dc4-ipa2.example.com 389  19
>>>>> dc4-ipa3.example.com 389  20
>>>>> dc4-ipa4.example.com 389  21
>>>>> dc1-ipa1.example.com 389  10
>>>>> dc1-ipa2.example.com 389  25
>>>>> dc1-ipa2.example.com 389  9
>>>>> dc1-ipa3.example.com 389  8
>>>>> dc1-ipa4.example.com 389  4
>>>>> unable to decode  {replica 16} 55356472000300100000 
>>>>> 55356472000300100000
>>>>> unable to decode  {replica 24} 554d53d3000000180000 
>>>>> 554d54a4000200180000
>>>>> dc5-ipa1.example.com 389  26
>>>>> dc5-ipa2.example.com 389  15
>>>>> dc5-ipa3.example.com 389  17
>>>>> -------------------------------------
>>>>> Node dc2-ipa1
>>>>> -------------------------------------
>>>>> dc3-ipa1.example.com 389  14
>>>>> dc3-ipa2.example.com 389  13
>>>>> dc3-ipa3.example.com 389  12
>>>>> dc3-ipa4.example.com 389  11
>>>>> dc2-ipa1.example.com 389  7
>>>>> dc2-ipa2.example.com 389  6
>>>>> dc2-ipa3.example.com 389  5
>>>>> dc2-ipa4.example.com 389  3
>>>>> dc4-ipa1.example.com 389  18
>>>>> dc4-ipa2.example.com 389  19
>>>>> dc4-ipa3.example.com 389  20
>>>>> dc4-ipa4.example.com 389  21
>>>>> dc1-ipa1.example.com 389  10
>>>>> dc1-ipa2.example.com 389  25
>>>>> dc1-ipa2.example.com 389  9
>>>>> dc1-ipa3.example.com 389  8
>>>>> dc1-ipa4.example.com 389  4
>>>>> unable to decode  {replica 16} 55356472000300100000 
>>>>> 55356472000300100000
>>>>> unable to decode  {replica 23} 5553e3a3000000170000 
>>>>> 55543240000300170000
>>>>> unable to decode  {replica 24} 554d53d3000000180000 
>>>>> 554d54a4000200180000
>>>>> dc5-ipa1.example.com 389  26
>>>>> dc5-ipa2.example.com 389  15
>>>>> dc5-ipa3.example.com 389  17
>>>>> -------------------------------------
>>>>> Node dc2-ipa2
>>>>> -------------------------------------
>>>>> dc3-ipa1.example.com 389  14
>>>>> dc3-ipa2.example.com 389  13
>>>>> dc3-ipa3.example.com 389  12
>>>>> dc3-ipa4.example.com 389  11
>>>>> dc2-ipa1.example.com 389  7
>>>>> dc2-ipa2.example.com 389  6
>>>>> dc2-ipa3.example.com 389  5
>>>>> dc2-ipa4.example.com 389  3
>>>>> dc4-ipa1.example.com 389  18
>>>>> dc4-ipa2.example.com 389  19
>>>>> dc4-ipa3.example.com 389  20
>>>>> dc4-ipa4.example.com 389  21
>>>>> dc1-ipa1.example.com 389  10
>>>>> dc1-ipa2.example.com 389  25
>>>>> dc1-ipa2.example.com 389  9
>>>>> dc1-ipa3.example.com 389  8
>>>>> dc1-ipa4.example.com 389  4
>>>>> unable to decode  {replica 16} 55356472000300100000 
>>>>> 55356472000300100000
>>>>> unable to decode  {replica 24} 554d53d3000000180000 
>>>>> 554d54a4000200180000
>>>>> dc5-ipa1.example.com 389  26
>>>>> dc5-ipa2.example.com 389  15
>>>>> dc5-ipa3.example.com 389  17
>>>>> -------------------------------------
>>>>> Node dc2-ipa3
>>>>> -------------------------------------
>>>>> dc3-ipa1.example.com 389  14
>>>>> dc3-ipa2.example.com 389  13
>>>>> dc3-ipa3.example.com 389  12
>>>>> dc3-ipa4.example.com 389  11
>>>>> dc2-ipa1.example.com 389  7
>>>>> dc2-ipa2.example.com 389  6
>>>>> dc2-ipa3.example.com 389  5
>>>>> dc2-ipa4.example.com 389  3
>>>>> dc4-ipa1.example.com 389  18
>>>>> dc4-ipa2.example.com 389  19
>>>>> dc4-ipa3.example.com 389  20
>>>>> dc4-ipa4.example.com 389  21
>>>>> dc1-ipa1.example.com 389  10
>>>>> dc1-ipa2.example.com 389  25
>>>>> dc1-ipa2.example.com 389  9
>>>>> dc1-ipa3.example.com 389  8
>>>>> dc1-ipa4.example.com 389  4
>>>>> unable to decode  {replica 16} 55356472000300100000 
>>>>> 55356472000300100000
>>>>> unable to decode  {replica 24} 554d53d3000000180000 
>>>>> 554d54a4000200180000
>>>>> dc5-ipa1.example.com 389  26
>>>>> dc5-ipa2.example.com 389  15
>>>>> dc5-ipa3.example.com 389  17
>>>>> -------------------------------------
>>>>> Node dc2-ipa4
>>>>> -------------------------------------
>>>>> dc3-ipa1.example.com 389  14
>>>>> dc3-ipa2.example.com 389  13
>>>>> dc3-ipa3.example.com 389  12
>>>>> dc3-ipa4.example.com 389  11
>>>>> dc2-ipa1.example.com 389  7
>>>>> dc2-ipa2.example.com 389  6
>>>>> dc2-ipa3.example.com 389  5
>>>>> dc2-ipa4.example.com 389  3
>>>>> dc4-ipa1.example.com 389  18
>>>>> dc4-ipa2.example.com 389  19
>>>>> dc4-ipa3.example.com 389  20
>>>>> dc4-ipa4.example.com 389  21
>>>>> dc1-ipa1.example.com 389  10
>>>>> dc1-ipa2.example.com 389  25
>>>>> dc1-ipa2.example.com 389  9
>>>>> dc1-ipa3.example.com 389  8
>>>>> dc1-ipa4.example.com 389  4
>>>>> unable to decode  {replica 16} 55356472000300100000 
>>>>> 55356472000300100000
>>>>> unable to decode  {replica 24} 554d53d3000000180000 
>>>>> 554d54a4000200180000
>>>>> dc5-ipa1.example.com 389  26
>>>>> dc5-ipa2.example.com 389  15
>>>>> dc5-ipa3.example.com 389  17
>>>>>
>>>>>
>>>>> Happy Wednesday
>>>>> ~Janelle
>>>>
>>>>
>>>>
>>>
>>
>> And just like that - for no reason, they all reappeared:
>>
>> unable to decode  {replica 16} 55356472000300100000 55356472000300100000
>> unable to decode  {replica 23} 5545d61f000200170000 5552f718000300170000
>> unable to decode  {replica 24} 554d53d3000000180000 554d54a4000200180000
>>
>> :-(
>> ~J
>>
> Hello Janelle,
>
> Those 3 RIDs were already present in Node dc2-ipa1, correct? Did they 
> reappear on other nodes as well?
> Maybe dc2-ipa1 established a replication session with its peers and 
> sent those RIDs.
> Could you track, in all the access logs, when the op 
> csn=5552f718000300170000 was applied?
>
> Note that the two hex values of replica 23 changed 
> (5545d61f000200170000 5552f718000300170000 vs 5553e3a3000000170000 
> 55543240000300170000). Have you recreated a replica 23?
>
> Do you have replication logging enabled ?
>
> thanks
> thierry
>
>
As I mentioned in the email I just sent, and to be clear: NOTHING 
changed in the environment. No new replicas. No changes on the servers 
at all other than some simple adds and deletes of users. This just 
happens randomly. I am in the process of trying to clean them up to get 
back into production, as it is causing issues and I need production to 
run. Back later once I am running again.

~Janelle
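
Editor's note: the "unable to decode" lines in this thread each carry two
CSNs, and thierry asks when csn=5552f718000300170000 was applied. As a rough
aid for reading those values, here is a minimal Python sketch that splits a
CSN into its parts. It assumes the usual 389 Directory Server layout of 20
hex digits: 8 for the timestamp (seconds since the epoch), 4 for the sequence
number, 4 for the replica ID, and 4 for the sub-sequence number. The names
CSN and parse_csn are made up for this illustration and are not part of any
FreeIPA or 389 DS tool.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    """Decoded fields of a change sequence number (hypothetical helper type)."""
    timestamp: datetime
    seqnum: int
    replica_id: int
    subseqnum: int


def parse_csn(csn: str) -> CSN:
    """Split a 20-hex-digit CSN string into its four fields.

    Assumed layout: 8 hex digits timestamp, 4 sequence number,
    4 replica ID, 4 sub-sequence number.
    """
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(csn)}: {csn!r}")
    seconds = int(csn[0:8], 16)
    return CSN(
        timestamp=datetime.fromtimestamp(seconds, tz=timezone.utc),
        seqnum=int(csn[8:12], 16),
        replica_id=int(csn[12:16], 16),
        subseqnum=int(csn[16:20], 16),
    )


if __name__ == "__main__":
    # CSNs copied from the "unable to decode" lines quoted in this thread.
    for value in (
        "55356472000300100000",
        "5545d61f000200170000",
        "5552f718000300170000",
        "554d54a4000200180000",
    ):
        c = parse_csn(value)
        print(value, c.timestamp.isoformat(), "rid", c.replica_id, "seq", c.seqnum)
```

Under that assumed layout, the replica ID field of each CSN matches the
{replica N} label printed next to it (hex 0010, 0017 and 0018 are 16, 23 and
24), and the timestamp field gives a time to look for in the access logs.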