[Freeipa-users] Replication broken

Timothy Geier tgeier at accertify.com
Tue Sep 27 15:28:32 UTC 2016


On Tue, 2016-09-27 at 12:47 +0200, thierry bordaz wrote:
> Hi Timothy,
> 
> The changenumber counter is protected by a lock and we should not see
> duplicate values, except if there is a bug :-(
> 
> If you retrieve the time when changenumber=112697,cn=changelog was
> created and the time when you saw the error, can you see any errors in
> operations (access log) or in the error log?
> 
> Or did you disable/enable retroCL between those two times?
> 
> regards
> thierry

Unfortunately, the issue appears to be tied to a certain username that
starts with a '1'. In both cases, trying to delete this user caused (and
is still causing) the exact same issue.  Are there any known bugs
relating to this?
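
As a rough aid for answering the error log question above, here is a
minimal Python sketch that counts how often each change number shows up
in the "Already exists" retro changelog error. It assumes the message
format matches the lines quoted further down in this thread; the function
name duplicate_changenumbers is made up for illustration and is not part
of 389-ds or FreeIPA.

```python
import re
from collections import Counter

# Pattern for the retro changelog failure quoted in this thread, e.g.
#   DSRetroclPlugin - replog: an error occured while adding change
#   number 112697, dn = changenumber=112697,cn=changelog: Already exists.
# "occured" is spelled the way it appears in the quoted log output.
# \s+ is used between words so that wrapped lines still match.
RETROCL_DUP = re.compile(
    r"DSRetroclPlugin\s+-\s+replog:\s+an\s+error\s+occured\s+while\s+adding\s+"
    r"change\s+number\s+(\d+),\s+dn\s+=\s+changenumber=(\d+),cn=changelog:\s+"
    r"Already\s+exists"
)


def duplicate_changenumbers(log_text):
    """Return a Counter mapping change number -> count of 'Already exists' errors.

    Hypothetical helper: the name and the assumed log format come from this
    thread, not from any 389-ds tooling.
    """
    return Counter(int(m.group(1)) for m in RETROCL_DUP.finditer(log_text))
```

Feeding it the contents of the instance's error log (commonly somewhere
under /var/log/dirsrv/, though that path is an assumption here) should
show whether only 112697 is affected or whether other change numbers hit
the same failure.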

> 
> 
> 
> On 09/27/2016 12:37 AM, Timothy Geier wrote:
> 
> > 
> > > On Sep 26, 2016, at 4:07 PM, Timothy Geier <tgeier at accertify.com>
> > > wrote:
> > > 
> > > > 
> > > > On Sep 26, 2016, at 2:17 PM, Timothy Geier
> > > > <tgeier at accertify.com> wrote:
> > > > 
> > > > This issue started when trying to remove a user; ipa user-del
> > > > showed “operation failed” and the user was not removed.  The
> > > > same ipa user-del command was performed on a replica and
> > > > completed successfully, but it was then immediately apparent
> > > > that this change did not replicate anywhere else.  All of the
> > > > replicas were then re-initialized using "ipa-replica-manage
> > > > re-initialize" and now the LDAP trees/users are consistent,
> > > > though no further changes have been made.
> > > > 
> > > > The slapd error logs are showing repeated instances of
> > > > 
> > > > DSRetroclPlugin - replog: an error occured while adding change
> > > > number 112697, dn = changenumber=112697,cn=changelog: Already
> > > > exists.
> > > > retrocl-plugin - retrocl_postob: operation failure [68]
> > > > 
> > > > Package versions are
> > > > ipa-server-4.2.0-15.0.1.el7.centos.6.1.x86_64
> > > > and 
> > > > 389-ds-base-1.3.4.0-29.el7_2.x86_64
> > > > 
> > > > ipa-replica-manage list-ruv
> > > > ipa: WARNING: session memcached servers not running
> > > > unable to decode: {replica 11} 56044ef50000000b0000
> > > > 56044ef50000000b0000
> > > > unable to decode: {replica 7} 561f17ba000800070000
> > > > 561f17ba000800070000
> > > > unable to decode: {replica 5} 561f17bc000300050000
> > > > 561f17bc000300050000
> > > > unable to decode: {replica 9} 561f17ba000a00090000
> > > > 561f17ba000a00090000
> > > > unable to decode: {replica 4} 561f17ba000300040000
> > > > 561f17ba000300040000 
> > > > (These are likely leftovers from the previous incarnation of
> > > > these servers on a RHEL6-like setup)
> > > > ipa07:389: 16
> > > > ipa02:389: 13
> > > > ipa03:389: 14
> > > > ipa01:389: 12
> > > > ipa04:389: 15
> > > > ipa05:389: 17
> > > > 
> > > > Thanks much,
> > > 
> > > After not taking any action, this error has stopped but has been
> > > replaced with
> > > 
> > > [26/Sep/2016:15:54:54 -0500] NSMMReplicationPlugin -
> > > agmt="cn=meToipa03" (ipa03:389): Missing data encountered
> > > [26/Sep/2016:15:54:54 -0500] NSMMReplicationPlugin -
> > > agmt="cn=meToipa03" (ipa03:389): Incremental update failed and
> > > requires administrator action
> > > 
> > > for all of the replicas and things are slightly out of sync
> > > everywhere.  
> > > 
> > > Is the best course of action here to declare one a new master and
> > > do an ipa-replica-manage re-initialize to all of the others from
> > > that one?
> > > 
> > > 
> > > 
> > 
> > 
> > After doing some testing, that’s exactly what we did and replication
> > is now working again.  It is odd that the DSRetroclPlugin errors
> > stopped on their own (after approximately 3 hours); the only action
> > taken there was looking at the cn=changelog base using ldapvi to see
> > what number it was on but that has to be a sheer coincidence;
> > absolutely no changes were made. 
> > 
> > 
> > We’re also still unsure what caused this; our best theory at the
> > moment is a race condition where everything that could have gone
> > wrong at that exact moment did. Is there any validity to this?
> > 
> > 
> > Thanks,
> > "This message and any attachments may contain confidential information. If you
> > have received this  message in error, any use or distribution is prohibited. 
> > Please notify us by reply e-mail if you have mistakenly received this message,
> > and immediately and permanently delete it and any attachments. Thank you."
> > 
> > 
> 

-- 
Timothy R. Geier 
Sr. Linux Systems Administrator 
Accertify, Inc. an American Express Company 
2 Pierce Place
Suite 900
Itasca, IL 60143 
Office: + 1 (630) 735-4785 
tgeier at accertify.com




