[Freeipa-devel] user deletion in offline mode does not get replicated after node recovery

Ludwig Krispenz lkrispen at redhat.com
Wed Jun 17 09:06:57 UTC 2015


Hi Oleg,

can you give a bit more info on the scenarios in which this happens? 
Does it happen every time, or is it a timing problem?

Ludwig

On 06/16/2015 07:02 PM, thierry bordaz wrote:
> Hello
>
>
> On Master:
>     User 'onmaster' was deleted
>
> [16/Jun/2015:10:16:45 -0400] conn=402 op=19 SRCH 
> base="cn=otp,dc=bagam,dc=net" scope=1 
> filter="(&(objectClass=ipatoken)(ipatokenOwner=uid=onmaster,cn=users,cn=accounts,dc=bagam,dc=net))" 
> attrs="ipatokenNotAfter description ipatokenOwner objectClass 
> ipatokenDisabled ipatokenVendor managedBy ipatokenModel 
> ipatokenNotBefore ipatokenUniqueID ipatokenSerial"
> [16/Jun/2015:10:16:45 -0400] conn=402 op=19 RESULT err=0 tag=101 
> nentries=0 etime=0
> [16/Jun/2015:10:16:45 -0400] conn=402 op=20 DEL 
> dn="uid=onmaster,cn=users,cn=accounts,dc=bagam,dc=net"
> [16/Jun/2015:10:16:45 -0400] conn=402 op=21 UNBIND
> [16/Jun/2015:10:16:45 -0400] conn=402 op=21 fd=120 closed - U1
> [16/Jun/2015:10:16:45 -0400] conn=402 op=20 RESULT err=0 tag=107 
> nentries=0 etime=0 csn=55802fcf000300040000
>
>     The replication agreement failed to replicate it to replica2:
> [16/Jun/2015:10:18:36 -0400] NSMMReplicationPlugin - 
> agmt="cn=f22master.bagam.net-to-f22replica2.bagam.net" 
> (f22replica2:389): Consumer failed to replay change (uniqueid 
> b8242e18-143111e5-b1d0d0c3-ae5854ff, CSN 55802fcf000300040000): 
> Operations error (1). Will retry later.
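>
> A hedged way to watch the agreement state from the master side (bind 
> credentials are illustrative; the attributes are standard 389-ds 
> replication-agreement attributes):
>
>   ldapsearch -x -D "cn=Directory Manager" -W -b cn=config \
>     "(objectClass=nsds5ReplicationAgreement)" \
>     nsds5replicaLastUpdateStatus nsds5replicaUpdateInProgress
>
> A status still reporting error (1) after the retry interval would 
> suggest the change was never successfully replayed.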
>
>
> On replica2:
>
>     The replicated operation failed
> [16/Jun/2015:10:18:27 -0400] conn=8 op=4 RESULT err=0 tag=101 
> nentries=1 etime=0
> [16/Jun/2015:10:18:27 -0400] conn=8 op=5 EXT 
> oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop"
> [16/Jun/2015:10:18:27 -0400] conn=8 op=5 RESULT err=0 tag=120 
> nentries=0 etime=0
> [16/Jun/2015:10:18:27 -0400] conn=8 op=6 DEL 
> dn="uid=onmaster,cn=users,cn=accounts,dc=bagam,dc=net"
> [16/Jun/2015:10:18:35 -0400] conn=8 op=6 RESULT err=1 tag=107 
> nentries=0 etime=8 csn=55802fcf000300040000
>
>     because the database update failed.
>     The failures were E_AGAIN or DB_LOCK_DEADLOCK. In such a 
> situation, DS retries after a small delay.
>     The problem is that it retried 50 times without success:
> [16/Jun/2015:10:18:34 -0400] NSMMReplicationPlugin - changelog program 
> - _cl5WriteOperationTxn: retry (49) the transaction 
> (csn=55802fcf000300040000) failed (rc=-30993 (BDB0068 
> DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock))
> [16/Jun/2015:10:18:34 -0400] NSMMReplicationPlugin - changelog program 
> - _cl5WriteOperationTxn: failed to write entry with csn 
> (55802fcf000300040000); db error - -30993 BDB0068 DB_LOCK_DEADLOCK: 
> Locker killed to resolve a deadlock
> [16/Jun/2015:10:18:34 -0400] NSMMReplicationPlugin - 
> write_changelog_and_ruv: can't add a change for 
> uid=onmaster,cn=users,cn=accounts,dc=bagam,dc=net (uniqid: 
> b8242e18-143111e5-b1d0d0c3-ae5854ff, optype: 32) to changelog csn 
> 55802fcf000300040000
> [16/Jun/2015:10:18:34 -0400] - SLAPI_PLUGIN_BE_TXN_POST_DELETE_FN 
> plugin returned error code but did not set SLAPI_RESULT_CODE
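>
> To confirm that replica2 really exhausted its retries, grepping its 
> errors log should show the repeated attempts (the path assumes the 
> default /var/log/dirsrv layout and the instance name from this 
> thread):
>
>   grep -c '_cl5WriteOperationTxn: retry' /var/log/dirsrv/slapd-BAGAM-NET/errors
>   grep '55802fcf000300040000' /var/log/dirsrv/slapd-BAGAM-NET/errors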
>
>
> The MAIN issue here is that replica2 successfully applied other 
> updates from the same replica after 55802fcf000300040000 (e.g. 
> csn=55802fcf000400040000).
> I do not know whether the master was able to detect this failure and 
> replay the update, but I am afraid it did not! (A way to check the 
> RUVs is sketched after the log excerpt below.)
> It looks like you hit https://fedorahosted.org/389/ticket/47788
> Would it be possible to access your VM?
>
> [16/Jun/2015:10:18:27 -0400] conn=8 op=6 DEL 
> dn="uid=onmaster,cn=users,cn=accounts,dc=bagam,dc=net"
> [16/Jun/2015:10:18:35 -0400] conn=8 op=6 RESULT err=1 tag=107 
> nentries=0 etime=8 csn=55802fcf000300040000
> [16/Jun/2015:10:18:35 -0400] conn=8 op=7 MOD 
> dn="cn=ipausers,cn=groups,cn=accounts,dc=bagam,dc=net"
> [16/Jun/2015:10:18:36 -0400] conn=8 op=7 RESULT err=0 tag=103 
> nentries=0 etime=1 csn=55802fcf000400040000
> [16/Jun/2015:10:18:36 -0400] conn=8 op=8 DEL 
> dn="cn=onmaster,cn=groups,cn=accounts,dc=bagam,dc=net"
> [16/Jun/2015:10:18:37 -0400] conn=8 op=8 RESULT err=0 tag=107 
> nentries=0 etime=1 csn=55802fcf000700040000
> [16/Jun/2015:10:18:37 -0400] conn=8 op=9 MOD 
> dn="cn=ipausers,cn=groups,cn=accounts,dc=bagam,dc=net"
> [16/Jun/2015:10:18:37 -0400] conn=8 op=9 RESULT err=0 tag=103 
> nentries=0 etime=0 csn=55802fd0000000060000
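>
> A sketch of the RUV comparison mentioned above (run the same search 
> against both master and replica2; Directory Manager credentials are 
> illustrative):
>
>   ldapsearch -x -D "cn=Directory Manager" -W -b "dc=bagam,dc=net" \
>     "(&(nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff)(objectClass=nstombstone))" \
>     nsds50ruv
>
> If replica2's maxcsn for the master's replica id is already past 
> 55802fcf000300040000, the master will consider the DEL delivered and 
> never resend it.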
>
>
>
>
> On 06/16/2015 04:49 PM, Oleg Fayans wrote:
>> Hi all,
>>
>> I've bumped into a strange problem: only part of the changes made on 
>> the master during a replica outage get replicated after the replica 
>> recovers.
>>
>> Namely: when I delete an existing user on the master while a node is 
>> offline, the deletion does not reach the node when it comes back 
>> online. User creation, however, gets replicated as expected.
>>
>> Steps to reproduce:
>>
>> 1. Create the following topology:
>>
>> replica1 <-> master <-> replica2 <-> replica3
>>
>> 2. Create user1 on master, make sure it appears on all replicas
>> 3. Turn off replica2
>> 4. On the master, delete user1 and create user2; make sure the 
>> changes get replicated to replica1
>> 5. Turn on replica2
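>>
>> A scripted version of these steps (hostnames are the ones from this 
>> thread; the user attributes are illustrative; run the ipa commands on 
>> the master as admin, and ipactl as root):
>>
>>   ipa user-add user1 --first=Test --last=One
>>   # wait until user1 is visible on every replica, then:
>>   ssh root@f22replica2.bagam.net ipactl stop
>>   ipa user-del user1
>>   ipa user-add user2 --first=Test --last=Two
>>   # confirm both changes reached replica1, then:
>>   ssh root@f22replica2.bagam.net ipactl start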
>>
>> Expected results:
>>
>> A minute or so after replica2 is back up:
>> 1. user1 exists on neither replica2 nor replica3
>> 2. user2 exists on both replica2 and replica3
>>
>> Actual results:
>> 1. user1 coexists with user2 on replica2 and replica3
>> 2. master and replica1 have only user2
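>>
>> A quick way to compare the nodes (this assumes anonymous LDAP reads 
>> are allowed, as they are by default on an IPA directory server):
>>
>>   for host in f22master f22replica1 f22replica2 f22replica3; do
>>     echo "== $host =="
>>     ldapsearch -x -H ldap://$host.bagam.net \
>>       -b "cn=users,cn=accounts,dc=bagam,dc=net" "(uid=user*)" uid
>>   done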
>>
>>
>> In my case, though, the topology was as follows:
>> $ ipa topologysegment-find realm
>> ------------------
>> 3 segments matched
>> ------------------
>>   Segment name: f22master.bagam.net-to-f22replica3.bagam.net
>>   Left node: f22master.bagam.net
>>   Right node: f22replica3.bagam.net
>>   Connectivity: both
>>
>>   Segment name: replica1-to-replica2
>>   Left node: f22replica1.bagam.net
>>   Right node: f22replica2.bagam.net
>>   Connectivity: both
>>
>>   Segment name: replica2-to-master
>>   Left node: f22replica2.bagam.net
>>   Right node: f22master.bagam.net
>>   Connectivity: both
>> ----------------------------
>> Number of entries returned 3
>> ----------------------------
>> And I was turning off replica2, which also cut off replica1 (its only 
>> link to the rest of the topology is through replica2), but that does 
>> not really matter.
>>
>> The dirsrv error message most likely to be relevant is:
>> ---------------------------------------------------------------------
>> Consumer failed to replay change (uniqueid 
>> b8242e18-143111e5-b1d0d0c3-ae5854ff, CSN 55802fcf000300040000): 
>> Operations error (1). Will retry later
>> ---------------------------------------------------------------------
>>
>>
>> I attach the dirsrv error and access logs from all nodes, in case 
>> they are useful.
