[Freeipa-devel] user deletion in offline mode does not get replicated after node recovery

Tue Jun 16 17:02:02 UTC 2015

Hello

On Master:
     User 'onmaster' was deleted

[16/Jun/2015:10:16:45 -0400] conn=402 op=19 SRCH 
base="cn=otp,dc=bagam,dc=net" scope=1 
filter="(&(objectClass=ipatoken)(ipatokenOwner=uid=onmaster,cn=users,cn=accounts,dc=bagam,dc=net))" 
attrs="ipatokenNotAfter description ipatokenOwner objectClass 
ipatokenDisabled ipatokenVendor managedBy ipatokenModel 
ipatokenNotBefore ipatokenUniqueID ipatokenSerial"
[16/Jun/2015:10:16:45 -0400] conn=402 op=19 RESULT err=0 tag=101 
nentries=0 etime=0
[16/Jun/2015:10:16:45 -0400] conn=402 op=20 DEL 
dn="uid=onmaster,cn=users,cn=accounts,dc=bagam,dc=net"
[16/Jun/2015:10:16:45 -0400] conn=402 op=21 UNBIND
[16/Jun/2015:10:16:45 -0400] conn=402 op=21 fd=120 closed - U1
[16/Jun/2015:10:16:45 -0400] conn=402 op=20 RESULT err=0 tag=107 
nentries=0 etime=0 csn=55802fcf000300040000

     The replication agreement failed to replicate it to replica2:
[16/Jun/2015:10:18:36 -0400] NSMMReplicationPlugin - 
agmt="cn=f22master.bagam.net-to-f22replica2.bagam.net" 
(f22replica2:389): Consumer failed to replay change (uniqueid 
b8242e18-143111e5-b1d0d0c3-ae5854ff, CSN 55802fcf000300040000): 
Operations error (1). Will retry later.

On replica2:

     The replicated operation failed
[16/Jun/2015:10:18:27 -0400] conn=8 op=4 RESULT err=0 tag=101 nentries=1 
etime=0
[16/Jun/2015:10:18:27 -0400] conn=8 op=5 EXT 
oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
[16/Jun/2015:10:18:27 -0400] conn=8 op=5 RESULT err=0 tag=120 nentries=0 
etime=0
[16/Jun/2015:10:18:27 -0400] conn=8 op=6 DEL 
dn="uid=onmaster,cn=users,cn=accounts,dc=bagam,dc=net"
[16/Jun/2015:10:18:35 -0400] conn=8 op=6 RESULT err=1 tag=107 nentries=0 
etime=8 csn=55802fcf000300040000

     because the database updates failed.
     The failures were E_AGAIN or E_DB_DEADLOCK. In such a situation, DS
retries after a small delay.
     The problem is that it retried 50 times without success.
[16/Jun/2015:10:18:34 -0400] NSMMReplicationPlugin - changelog program - 
_cl5WriteOperationTxn: retry (49) the transaction 
(csn=55802fcf000300040000) failed (rc=-30993 (BDB0068 DB_LOCK_DEADLOCK: 
Locker killed to resolve a deadlock))
[16/Jun/2015:10:18:34 -0400] NSMMReplicationPlugin - changelog program - 
_cl5WriteOperationTxn: failed to write entry with csn 
(55802fcf000300040000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: 
Locker killed to resolve a deadlock
[16/Jun/2015:10:18:34 -0400] NSMMReplicationPlugin - 
write_changelog_and_ruv: can't add a change for 
uid=onmaster,cn=users,cn=accounts,dc=bagam,dc=net (uniqid: 
b8242e18-143111e5-b1d0d0c3-ae5854ff, optype: 32) to changelog csn 
55802fcf000300040000
[16/Jun/2015:10:18:34 -0400] - SLAPI_PLUGIN_BE_TXN_POST_DELETE_FN plugin 
returned error code but did not set SLAPI_RESULT_CODE
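
To illustrate the retry behaviour described above, here is a minimal Python
sketch. It is not the DS code: DeadlockError and write_with_retry are
hypothetical names, and the limit of 50 is only taken from the retry count
mentioned above. It shows a bounded retry loop that raises the last error to
the caller once the attempts are exhausted, rather than swallowing it.

```python
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a DB_LOCK_DEADLOCK result from the DB layer."""


def write_with_retry(write_txn, max_retries=50, delay=0.0):
    """Call write_txn() until it succeeds or max_retries attempts have failed.

    Returns the number of attempts used on success. If every attempt hits a
    deadlock, the last DeadlockError is re-raised so the caller sees the failure.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            write_txn()
            return attempt
        except DeadlockError as exc:
            last_error = exc
            time.sleep(delay)  # small pause before the next attempt
    raise last_error
```

With a write that keeps deadlocking, the loop gives up after max_retries
attempts, which corresponds to the "retry (49) ... failed" message in the
error log above.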

The MAIN issue here is that replica2 successfully applied other updates
after 55802fcf000300040000 from the same replica (e.g.
csn=55802fcf000400040000).
I do not know whether master was able to detect this failure and replay
this update, but I am afraid it did not!
It looks like you hit https://fedorahosted.org/389/ticket/47788
Is it possible to access your VM?

[16/Jun/2015:10:18:27 -0400] conn=8 op=6 DEL 
dn="uid=onmaster,cn=users,cn=accounts,dc=bagam,dc=net"
[16/Jun/2015:10:18:35 -0400] conn=8 op=6 RESULT err=1 tag=107 nentries=0 
etime=8 csn=55802fcf000300040000
[16/Jun/2015:10:18:35 -0400] conn=8 op=7 MOD 
dn="cn=ipausers,cn=groups,cn=accounts,dc=bagam,dc=net"
[16/Jun/2015:10:18:36 -0400] conn=8 op=7 RESULT err=0 tag=103 nentries=0 
etime=1 csn=55802fcf000400040000
[16/Jun/2015:10:18:36 -0400] conn=8 op=8 DEL 
dn="cn=onmaster,cn=groups,cn=accounts,dc=bagam,dc=net"
[16/Jun/2015:10:18:37 -0400] conn=8 op=8 RESULT err=0 tag=107 nentries=0 
etime=1 csn=55802fcf000700040000
[16/Jun/2015:10:18:37 -0400] conn=8 op=9 MOD 
dn="cn=ipausers,cn=groups,cn=accounts,dc=bagam,dc=net"
[16/Jun/2015:10:18:37 -0400] conn=8 op=9 RESULT err=0 tag=103 nentries=0 
etime=0 csn=55802fd0000000060000
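
A rough way to spot this pattern in an access log is sketched below in
Python. It assumes the usual 389 DS CSN layout of 20 hex digits (8 for the
timestamp, 4 for the sequence number, 4 for the replica ID, 4 for the
sub-sequence number), and it assumes that RESULT lines look like the ones
above, possibly wrapped over two lines. The function names are made up for
the example.

```python
import re

# Matches a RESULT line carrying a CSN, even when the line is wrapped.
RESULT_RE = re.compile(
    r"RESULT err=(\d+)\s+tag=\d+\s+nentries=\d+\s+etime=\d+\s+csn=([0-9a-f]{20})"
)


def replica_id(csn):
    """Replica ID field of a CSN (hex digits 13 to 16, assumed layout)."""
    return int(csn[12:16], 16)


def find_skipped_updates(log_text):
    """Map each failed CSN to the later CSNs from the same replica ID
    that were applied successfully after the failure was logged."""
    skipped = {}
    for err, csn in RESULT_RE.findall(log_text):
        if err != "0":
            skipped.setdefault(csn, [])
            continue
        for failed_csn, later in skipped.items():
            # Fixed-width lowercase hex, so string order follows CSN order.
            if replica_id(csn) == replica_id(failed_csn) and csn > failed_csn:
                later.append(csn)
    return {failed: later for failed, later in skipped.items() if later}


# RESULT lines taken from the replica2 access log excerpt above.
SAMPLE = """\
[16/Jun/2015:10:18:35 -0400] conn=8 op=6 RESULT err=1 tag=107 nentries=0
etime=8 csn=55802fcf000300040000
[16/Jun/2015:10:18:36 -0400] conn=8 op=7 RESULT err=0 tag=103 nentries=0
etime=1 csn=55802fcf000400040000
[16/Jun/2015:10:18:37 -0400] conn=8 op=8 RESULT err=0 tag=107 nentries=0
etime=1 csn=55802fcf000700040000
[16/Jun/2015:10:18:37 -0400] conn=8 op=9 RESULT err=0 tag=103 nentries=0
etime=0 csn=55802fd0000000060000
"""

if __name__ == "__main__":
    for failed, later in find_skipped_updates(SAMPLE).items():
        print(failed, "failed, but these later updates were applied:", later)
```

Under that layout the last CSN in the excerpt carries a different replica ID,
so it is not counted against the failed update.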

On 06/16/2015 04:49 PM, Oleg Fayans wrote:
> Hi all,
>
> I've bumped into a strange problem: only part of the changes
> made on master during a replica outage get replicated after
> the replica recovers.
>
> Namely: when I delete an existing user on the master while the node is
> offline, the deletion does not reach the node when it's back online.
> User creation, however, gets replicated as expected.
>
> Steps to reproduce:
>
> 1. Create the following topology:
>
> replica1 <-> master <-> replica2 <-> replica3
>
> 2. Create user1 on master, make sure it appears on all replicas
> 3. Turn off replica2
> 4. On master delete user1 and create user2, make sure the changes get 
> replicated to replica1
> 5. Turn on replica2
>
> Expected results:
>
> A minute or so after replica2 is back up,
> 1. user1 exists neither on replica2 nor on replica3
> 2. user2 exists both on replica2 and replica3
>
> Actual results:
> 1. user1 coexists with user2 on replica2 and replica3
> 2. master and replica1 have only user2
>
>
> In my case, though, the topology was as follows:
> $ ipa topologysegment-find realm
> ------------------
> 3 segments matched
> ------------------
>   Segment name: f22master.bagam.net-to-f22replica3.bagam.net
>   Left node: f22master.bagam.net
>   Right node: f22replica3.bagam.net
>   Connectivity: both
>
>   Segment name: replica1-to-replica2
>   Left node: f22replica1.bagam.net
>   Right node: f22replica2.bagam.net
>   Connectivity: both
>
>   Segment name: replica2-to-master
>   Left node: f22replica2.bagam.net
>   Right node: f22master.bagam.net
>   Connectivity: both
> ----------------------------
> Number of entries returned 3
> ----------------------------
> And I turned off replica2, leaving replica1 offline, but that
> does not really matter.
>
> The dirsrv error message most likely to be relevant is:
> ----------------------------------------------------------------------------------------------------------------------------------------------------- 
>
> Consumer failed to replay change (uniqueid 
> b8242e18-143111e5-b1d0d0c3-ae5854ff, CSN 55802fcf000300040000): 
> Operations error (1). Will retry later
> ----------------------------------------------------------------------------------------------------------------------------------------------------- 
>
>
> I attach dirsrv error and access logs from all nodes, in case they 
> could be useful
>
>
>
>
>
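
For the topology listed in the quoted message, a quick reachability check
can show which nodes master can still reach while replica2 is down. This is
only a sketch in Python: the segment list is copied by hand from the
topologysegment-find output, every segment is treated as bidirectional
because all of them report "Connectivity: both", and the function name is
made up for the example.

```python
from collections import deque

# (left node, right node) pairs from the topologysegment-find output.
SEGMENTS = [
    ("f22master.bagam.net", "f22replica3.bagam.net"),
    ("f22replica1.bagam.net", "f22replica2.bagam.net"),
    ("f22replica2.bagam.net", "f22master.bagam.net"),
]


def reachable(segments, start, down=()):
    """Return the set of nodes reachable from start, skipping offline nodes."""
    down = set(down)
    neighbours = {}
    for left, right in segments:
        if left in down or right in down:
            continue  # a segment is unusable if either end is offline
        neighbours.setdefault(left, set()).add(right)
        neighbours.setdefault(right, set()).add(left)
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first walk over the remaining segments
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    print(sorted(reachable(SEGMENTS, "f22master.bagam.net",
                           down=["f22replica2.bagam.net"])))
```

With replica2 down, master can only reach replica3, so replica1 is cut off
from master as well, which seems consistent with the remark about replica1
in the quoted message.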
