[Freeipa-users] Replication broken

thierry bordaz tbordaz at redhat.com
Tue Sep 27 16:38:58 UTC 2016


Hi Timothy,

When you say username, do you mean 'User login' (uid)?
I can create such an entry (with a leading '1') on 4.2 and later. I see no
reason why the uid value ('1' is valid in the uid syntax) would trigger that
failure when adding an entry under 'cn=changelog'.

I think something went wrong with the retroCL counter, and I am hoping
the error log will show what.
Could you send me the error log covering the period from when
changenumber=112697 was created until DSRetroclPlugin reported the failure?


thanks
thierry

On 09/27/2016 05:28 PM, Timothy Geier wrote:
> On Tue, 2016-09-27 at 12:47 +0200, thierry bordaz wrote:
>> Hi Timothy,
>>
>> The changenumber counter is protected by a lock, so we should not see
>> duplicate values... unless there is a bug :-(
>>
>> Looking at the time when changenumber=112697,cn=changelog was created
>> and the time when you saw the error, do you see any errors in the
>> operations (access log) or in the error log?
>>
>> Or did you disable/enable retroCL between those two times?
>>
>> regards
>> thierry
> Unfortunately, the issue appears to be a certain username that starts
> with a '1'. In both cases, trying to delete this user caused (and is
> causing) the exact same issue. Are there any known bugs relating to
> this?
>
>>
>>
>> On 09/27/2016 12:37 AM, Timothy Geier wrote:
>>
>>>> On Sep 26, 2016, at 4:07 PM, Timothy Geier <tgeier at accertify.com>
>>>> wrote:
>>>>
>>>>> On Sep 26, 2016, at 2:17 PM, Timothy Geier
>>>>> <tgeier at accertify.com> wrote:
>>>>>
>>>>> This issue started when trying to remove a user; ipa user-del
>>>>> showed “operation failed” and the user was not removed.  The
>>>>> same ipa user-del command was performed on a replica and
>>>>> completed successfully, but it was then immediately apparent
>>>>> that this change did not replicate anywhere else.  All of the
>>>>> replicas were then re-initialized using “ipa-replica-manage
>>>>> re-initialize” and now the LDAP trees/users are consistent,
>>>>> though no further changes have been made.
>>>>>
>>>>> The slapd error logs are showing repeated instances of
>>>>>
>>>>> DSRetroclPlugin - replog: an error occured while adding change
>>>>> number 112697, dn = changenumber=112697,cn=changelog: Already
>>>>> exists.
>>>>> retrocl-plugin - retrocl_postob: operation failure [68]
>>>>>
>>>>> Package versions are
>>>>> ipa-server-4.2.0-15.0.1.el7.centos.6.1.x86_64
>>>>> and
>>>>> 389-ds-base-1.3.4.0-29.el7_2.x86_64
>>>>>
>>>>> ipa-replica-manage list-ruv
>>>>> ipa: WARNING: session memcached servers not running
>>>>> unable to decode: {replica 11} 56044ef50000000b0000
>>>>> 56044ef50000000b0000
>>>>> unable to decode: {replica 7} 561f17ba000800070000
>>>>> 561f17ba000800070000
>>>>> unable to decode: {replica 5} 561f17bc000300050000
>>>>> 561f17bc000300050000
>>>>> unable to decode: {replica 9} 561f17ba000a00090000
>>>>> 561f17ba000a00090000
>>>>> unable to decode: {replica 4} 561f17ba000300040000
>>>>> 561f17ba000300040000
>>>>> (These are likely leftovers from the previous incarnation of
>>>>> these servers on a RHEL6-like setup)
>>>>> ipa07:389: 16
>>>>> ipa02:389: 13
>>>>> ipa03:389: 14
>>>>> ipa01:389: 12
>>>>> ipa04:389: 15
>>>>> ipa05:389: 17
>>>>>
>>>>> Thanks much,
>>>> Without any action on our part, this error has stopped but has been
>>>> replaced with
>>>>
>>>> [26/Sep/2016:15:54:54 -0500] NSMMReplicationPlugin -
>>>> agmt="cn=meToipa03" (ipa03:389): Missing data encountered
>>>> [26/Sep/2016:15:54:54 -0500] NSMMReplicationPlugin -
>>>> agmt="cn=meToipa03" (ipa03:389): Incremental update failed and
>>>> requires administrator action
>>>>
>>>> for all of the replicas and things are slightly out of sync
>>>> everywhere.
>>>>
>>>> Is the best course of action here to declare one a new master and
>>>> do an ipa-replica-manage re-initialize to all of the others from
>>>> that one?
>>>>
>>>>
>>>>
>>>
>>> After doing some testing, that’s exactly what we did and replication
>>> is now working again.  It is odd that the DSRetroclPlugin errors
>>> stopped on their own (after approximately 3 hours); the only action
>>> taken there was looking at the cn=changelog base using ldapvi to see
>>> what number it was on, but that has to be sheer coincidence;
>>> absolutely no changes were made.
>>>
>>>
>>> We’re also still unsure what caused this; our best theory at the
>>> moment is a race condition where everything that could have gone
>>> wrong at that exact moment did. Is there any validity to this?
>>>
>>>
>>> Thanks,
>>>
>>>
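The CSNs printed by `ipa-replica-manage list-ruv` in the thread above can be
split into their fields. A minimal Python sketch, assuming the usual 389-ds CSN
string layout (8 hex digits of timestamp in seconds since the epoch, then 4
each of sequence number, replica ID and subsequence number); `decode_csn` and
`decode_ruv_line` are hypothetical helpers, not part of ipa-replica-manage.
Decoding the five undecoded elements shows that the replica ID inside each CSN
matches its `{replica N}` label, and gives the time of the last change recorded
for that ID.

```python
import re
from datetime import datetime, timezone

# Assumed 389-ds CSN string layout: 20 hex digits = 8-digit timestamp
# (seconds since the epoch), then 4 digits each of sequence number,
# replica ID and subsequence number.
CSN_RE = re.compile(r"([0-9a-f]{8})([0-9a-f]{4})([0-9a-f]{4})([0-9a-f]{4})")

# Matches the lines list-ruv could not decode, e.g.
# "unable to decode: {replica 11} 56044ef50000000b0000"
UNDECODED_RE = re.compile(r"\{replica (\d+)\}\s+([0-9a-fA-F]{20})")


def decode_csn(csn):
    """Split a CSN string into its fields."""
    m = CSN_RE.fullmatch(csn.lower())
    if m is None:
        raise ValueError(f"not a CSN: {csn!r}")
    tstamp, seqnum, rid, subseqnum = (int(g, 16) for g in m.groups())
    return {
        "time": datetime.fromtimestamp(tstamp, tz=timezone.utc),
        "seqnum": seqnum,
        "replica_id": rid,
        "subseqnum": subseqnum,
    }


def decode_ruv_line(line):
    """Return (replica number from the label, decoded CSN) for one line."""
    m = UNDECODED_RE.search(line)
    if m is None:
        raise ValueError(f"no RUV element in: {line!r}")
    return int(m.group(1)), decode_csn(m.group(2))


if __name__ == "__main__":
    lines = [
        "unable to decode: {replica 11} 56044ef50000000b0000",
        "unable to decode: {replica 7} 561f17ba000800070000",
        "unable to decode: {replica 5} 561f17bc000300050000",
        "unable to decode: {replica 9} 561f17ba000a00090000",
        "unable to decode: {replica 4} 561f17ba000300040000",
    ]
    for line in lines:
        replica, csn = decode_ruv_line(line)
        # The replica ID embedded in the CSN should match the label.
        print(replica, csn["replica_id"], csn["time"].isoformat())
```

For example, 56044ef5 decodes to 2015-09-24 19:28:53 UTC, about a year before
this thread, which is at least consistent with the "leftovers from the previous
incarnation" explanation; it does not by itself prove those IDs are unused.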



