[Freeipa-users] replication again :-(
Ludwig Krispenz
lkrispen at redhat.com
Wed May 20 13:59:54 UTC 2015
On 05/20/2015 03:25 PM, Janelle wrote:
> On 5/20/15 12:54 AM, Ludwig Krispenz wrote:
>>
>> On 05/20/2015 02:57 AM, Janelle wrote:
>>> On 5/19/15 12:04 AM, thierry bordaz wrote:
>>>> On 05/19/2015 03:42 AM, Janelle wrote:
>>>>> On 5/18/15 6:23 PM, Janelle wrote:
>>>>>> Once again, replication/sync has been lost. I really wish the
>>>>>> product was more stable; it has so much potential, and yet.
>>>>>>
>>>>>> Servers running for 6 days no issues. No new accounts or changes
>>>>>> (maybe a few users changing passwords) and again, 5 out of 16
>>>>>> servers are no longer in sync.
>>>>>>
>>>>>> I can test it easily by adding an account and then waiting a few
>>>>>> minutes, then run "ipa user-show --all username" on all the
>>>>>> servers, and only a few of them have the account. I have now
>>>>>> waited 15 minutes, still no luck.
>>>>>>
>>>>>> Oh well.. I guess I will go look at alternatives. I had such high
>>>>>> hopes for this tool. Thanks so much everyone for all your help in
>>>>>> trying to get things stable, but for whatever reason, there is a
>>>>>> random loss of sync among the servers and obviously this is not
>>>>>> acceptable.
>>>>>>
>>>>>> regards
>>>>>> ~J
>>>>>
>>>
>>> All the replicas are happy again. I found these again:
>>>
>>> unable to decode {replica 16} 55356472000300100000 55356472000300100000
>>> unable to decode {replica 23} 5553e3a3000000170000 55543240000300170000
>>> unable to decode {replica 24} 554d53d3000000180000 554d54a4000200180000
>>>
>>> What I also found to be interesting is that I have not deleted any
>>> masters at all, so this was quite perplexing where the orphaned
>>> entries came from. However I did find 3 of the replicas did not
>>> show complete RUV lists... While most of the replicas had a list of
>>> all 16 servers, a couple of them listed only 4 or 5. (using
>>> ipa-replica-manage list-ruv)
>> so this happens "out of the blue" ? Did it happen at the same time,
>> do you know when it started ? The maxcsns in the ruv are quite old:
>> r16: apr,21, r23: may,14 r24: may,9 could it be that there was no
>> change applied to these masters for that time ?
>>>
> Indeed yes, that is a correct statement. It seems to be incredibly
> random.
> Ok, I give up - how are you finding the date in the strings? And
> really, is May 14th that old?
55356472000300100000 is a CSN (Change Sequence Number). It is built of:
hex timestamp: 55356472
sequence number: 0003 (numbering of CSNs generated within the second of
the timestamp)
replica id: 0010 (== 16), the replica where the change was received
subsequence number: 0000, used internally if a mod consists of several
sub-mods
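
The breakdown above can be sketched as a small Python helper. This is only an illustration of the field layout described here (8 hex digits of timestamp, then 4 each for sequence number, replica id and subsequence number); the names CSN and parse_csn are made up for this example and are not part of any FreeIPA or 389-ds tool.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    """Decoded fields of a 20-hex-digit CSN (hypothetical helper type)."""
    timestamp: datetime   # time the change was generated, in UTC
    seqnum: int           # counter for CSNs generated within the same second
    replica_id: int       # replica where the change was received
    subseqnum: int        # used internally for mods made of several sub-mods


def parse_csn(csn: str) -> CSN:
    """Split a CSN string into its four fixed-width hexadecimal fields."""
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(csn)}: {csn!r}")
    seconds = int(csn[0:8], 16)  # seconds since the Unix epoch
    return CSN(
        timestamp=datetime.fromtimestamp(seconds, tz=timezone.utc),
        seqnum=int(csn[8:12], 16),
        replica_id=int(csn[12:16], 16),
        subseqnum=int(csn[16:20], 16),
    )


if __name__ == "__main__":
    # The max CSNs from the "unable to decode" lines quoted in this thread.
    for raw in ("55356472000300100000",
                "55543240000300170000",
                "554d54a4000200180000"):
        c = parse_csn(raw)
        print(raw, c.timestamp.isoformat(), "replica", c.replica_id)
```

The timestamps are printed in UTC, so a date may differ by a day from a reading made in local time.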
May 14 is not old, but it would mean that there was no change on that
replica for a couple of days.
>
> What is odd about the Apr 21st one, is that if you see my previous
> emails, I had cleaned up all of this before, so for that to
> "re-appear" is indeed a mystery.
>
> As of this morning, things remain clean. What will be funny, now that
> I had extended logging enabled, they know we are on to them, so the
> servers won't fail again. :-)
>
> ~J