[Freeipa-users] replication again :-(

Ludwig Krispenz lkrispen at redhat.com
Wed May 20 13:59:54 UTC 2015


On 05/20/2015 03:25 PM, Janelle wrote:
> On 5/20/15 12:54 AM, Ludwig Krispenz wrote:
>>
>> On 05/20/2015 02:57 AM, Janelle wrote:
>>> On 5/19/15 12:04 AM, thierry bordaz wrote:
>>>> On 05/19/2015 03:42 AM, Janelle wrote:
>>>>> On 5/18/15 6:23 PM, Janelle wrote:
>>>>>> Once again, replication/sync has been lost. I really wish the 
>>>>>> product was more stable; it has so much potential, and yet...
>>>>>>
>>>>>> Servers running for 6 days no issues. No new accounts or changes 
>>>>>> (maybe a few users changing passwords) and again, 5 out of 16 
>>>>>> servers are no longer in sync.
>>>>>>
>>>>>> I can test it easily by adding an account and then waiting a few 
>>>>>> minutes, then run "ipa  user-show --all username" on all the 
>>>>>> servers, and only a few of them have the account.  I have now 
>>>>>> waited 15 minutes, still no luck.
>>>>>>
>>>>>> Oh well.. I guess I will go look at alternatives. I had such high 
>>>>>> hopes for this tool. Thanks so much everyone for all your help in 
>>>>>> trying to get things stable, but for whatever reason, there is a 
>>>>>> random loss of sync among the servers and obviously this is not 
>>>>>> acceptable.
>>>>>>
>>>>>> regards
>>>>>> ~J
>>>>>
>>>
>>> All the replicas are happy again. I found these again:
>>>
>>> unable to decode  {replica 16} 55356472000300100000 55356472000300100000
>>> unable to decode  {replica 23} 5553e3a3000000170000 55543240000300170000
>>> unable to decode  {replica 24} 554d53d3000000180000 554d54a4000200180000
>>>
>>> What I also found to be interesting is that I have not deleted any 
>>> masters at all, so this was quite perplexing where the orphaned 
>>> entries came from.  However I did find 3 of the replicas did not 
>>> show complete RUV lists... While most of the replicas had a list of 
>>> all 16 servers, a couple of them listed only 4 or 5. (using 
>>> ipa-replica-manage list-ruv)
>> So this happens "out of the blue"? Did it happen at the same time, 
>> and do you know when it started? The maxcsns in the RUV are quite old 
>> (r16: Apr 21, r23: May 14, r24: May 9). Could it be that there was no 
>> change applied to these masters for that time?
>>>
> Indeed yes, that is a correct statement. It seems to be incredibly 
> random.
> Ok, I give up - how are you finding the date in the strings? And 
> really, is May 14th that old?
55356472000300100000 is a CSN (change sequence number). It is built of:

hex timestamp: 55356472 (seconds since the epoch)
sequence number: 0003 (numbering of CSNs generated within the second of 
the timestamp)
replica id: 0010 (== 16), the replica where the change was received
subsequence number: 0000, used internally if a mod consists of several 
sub-mods
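
As a rough illustration of that layout, here is a minimal Python sketch that splits a CSN string into its four fields. The names parse_csn and CSN are made up for this example and are not part of 389-ds or IPA. It assumes the timestamp is seconds since the Unix epoch and prints it in UTC, so the day can differ from what you would see in local time.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    """Hypothetical container for the four CSN fields described above."""
    timestamp: datetime  # time of the change, converted to UTC
    seqnum: int          # counter for CSNs generated within the same second
    replica_id: int      # replica where the change was received
    subseqnum: int       # used internally for mods made of several sub-mods


def parse_csn(csn: str) -> CSN:
    """Split a 20-hex-digit CSN string into its four fields."""
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(csn)}")
    seconds = int(csn[0:8], 16)  # assumed to be seconds since the Unix epoch
    return CSN(
        timestamp=datetime.fromtimestamp(seconds, tz=timezone.utc),
        seqnum=int(csn[8:12], 16),
        replica_id=int(csn[12:16], 16),
        subseqnum=int(csn[16:20], 16),
    )


# The maxcsn values from the "unable to decode" lines quoted earlier.
for value in (
    "55356472000300100000",
    "55543240000300170000",
    "554d54a4000200180000",
):
    csn = parse_csn(value)
    print(
        f"replica {csn.replica_id}: "
        f"{csn.timestamp:%Y-%m-%d %H:%M:%S} UTC, seq {csn.seqnum}"
    )
```

Running it against the maxcsn of each RUV element gives a quick way to see when a replica last originated a change.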

May 14 is not old, but it would mean that there was no change on that 
replica for a couple of days.

>
> What is odd about the Apr 21st one, is that if you see my previous 
> emails, I had cleaned up all of this before, so for that to 
> "re-appear" is indeed a mystery.
>
> As of this morning, things remain clean. What will be funny, now that 
> I had extended logging enabled, they know we are on to them, so the 
> servers won't fail again. :-)
>
> ~J
>
>
>
>
>
