[Freeipa-users] replica_generate_next_csn messages in dirsrv error logs

Ludwig Krispenz lkrispen at redhat.com
Fri Aug 19 13:18:27 UTC 2016


On 08/18/2016 05:28 PM, John Desantis wrote:
> Ludwig,
>
>> unfortunately this is not enough to determine what is going on. The
>> intersting generated/used csn is only logged in the
>> corresponding RESULT message and these are only the replication connections,
>> it would be necessary to see the
>> original ADD operation, was it added once or twice by a client ?
>> you could pick one entry eg server-6-3-sp and grep for all references in the
>> access logs of both servers  (maybe there are mods as well) and then
>> get also get the RESULT line for the ops found
> Here are the updated log snippets looking for ADD and RESULT:
you still only grep the replication connection, but before being 
replicated the entry has to be added by some client connection, can you 
get all references to the entry ?
the log snippet you provide shows also csns with tag=103, which indicate 
a MOD, are these MODs for the added entries ? or other mods ?
>
> PROD:11:20:13-root at REPLICA:/var/log/dirsrv/slapd-DOM-DOM-DOM
> # grep -E '17/Aug/2016:13:50:4.*conn=602.*(RESULT|ADD)' access.2016081*
> access.20160817-124811:[17/Aug/2016:13:50:41 -0400] conn=602 op=4139
> RESULT err=0 tag=120 nentries=0 etime=0
> access.20160817-124811:[17/Aug/2016:13:50:41 -0400] conn=602 op=4140
> RESULT err=0 tag=120 nentries=0 etime=0
> access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4141
> RESULT err=0 tag=120 nentries=0 etime=0
> access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4142
> RESULT err=0 tag=120 nentries=0 etime=0
> access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143
> ADD dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
> access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143
> RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4bb000300040000
> access.20160817-124811:[17/Aug/2016:13:50:44 -0400] conn=602 op=4144
> RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4bb000400040000
> access.20160817-124811:[17/Aug/2016:13:50:44 -0400] conn=602 op=4145
> RESULT err=0 tag=103 nentries=0 etime=0
> access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4146
> RESULT err=0 tag=120 nentries=0 etime=0
> access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4147
> RESULT err=0 tag=120 nentries=0 etime=0
> access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148
> ADD dn="idnsname=server-6-4-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
> access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148
> RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4bb000800040000
> access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4149
> RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4bc000100040000
> access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4150
> RESULT err=0 tag=103 nentries=0 etime=0
> access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151
> ADD dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
> access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151
> RESULT err=0 tag=105 nentries=0 etime=0 csn=57b4a4c1000500040000
> access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4152
> RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4c1000600040000
>
> PROD:11:19:54-root at MASTER:/var/log/dirsrv/slapd-DOM-DOM-DOM
> # grep -E '17/Aug/2016:13:50:4.*conn=1395.*(RESULT|ADD)' access.2016081*
> access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4148
> RESULT err=0 tag=120 nentries=0 etime=0
> access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4149
> RESULT err=0 tag=120 nentries=0 etime=0
> access.20160817-111940:[17/Aug/2016:13:50:41 -0400] conn=1395 op=4150
> RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4b9000500160000
> access.20160817-111940:[17/Aug/2016:13:50:43 -0400] conn=1395 op=4151
> ADD dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
> access.20160817-111940:[17/Aug/2016:13:50:43 -0400] conn=1395 op=4151
> RESULT err=0 tag=105 nentries=0 etime=0
> access.20160817-111940:[17/Aug/2016:13:50:44 -0400] conn=1395 op=4152
> RESULT err=0 tag=103 nentries=0 etime=1 csn=57b4a4bc000000160000
> access.20160817-111940:[17/Aug/2016:13:50:46 -0400] conn=1395 op=4153
> RESULT err=0 tag=120 nentries=0 etime=0
> access.20160817-111940:[17/Aug/2016:13:50:47 -0400] conn=1395 op=4154
> RESULT err=0 tag=120 nentries=0 etime=0
> access.20160817-111940:[17/Aug/2016:13:50:47 -0400] conn=1395 op=4155
> RESULT err=0 tag=120 nentries=0 etime=0
> access.20160817-111940:[17/Aug/2016:13:50:47 -0400] conn=1395 op=4156
> RESULT err=0 tag=120 nentries=0 etime=0
> access.20160817-111940:[17/Aug/2016:13:50:48 -0400] conn=1395 op=4157
> RESULT err=0 tag=103 nentries=0 etime=1 csn=57b4a4c1000100160000
> access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4158
> ADD dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
> access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4158
> RESULT err=0 tag=105 nentries=0 etime=0
> access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4159
> RESULT err=0 tag=103 nentries=0 etime=0
> access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4160
> RESULT err=0 tag=103 nentries=0 etime=0 csn=57b4a4c3000000160000
>
> I'm positive that I was the only one performing DNS updates during
> this time, and I was only using 1 console.
>
> Thanks,
> John DeSantis
>
>
> 2016-08-18 10:09 GMT-04:00 Ludwig Krispenz <lkrispen at redhat.com>:
>> On 08/18/2016 03:15 PM, John Desantis wrote:
>>> Ludwig,
>>>
>>> Thank you for your response!
>>>
>>>> This is a normal scenario, but you could check if the simultaneous
>>>> updates
>>>> on 4 and 16 are intentional.
>>> In regards to the simultaneous updates, the only items I have noted so far
>>> are:
>>>
>>> *  The time sync between the master (4) and replica (16) was off by
>>> about 1-2 seconds, with the latter being ahead;
>> yes, this happens, but the replication protocol tries to handle this, in a
>> replication session the supplier and consumer
>> exchange their ruvs and if the time differs the csn state generator is
>> updated with a local or remote offset so that the
>> generated time is always based on the most advanced clock - on all servers.
>> And even if you adjust the system time, the csn
>> time will never go back.
>>> *  There are continual log entries referencing
>>> "replication-multimaster-extop" and "Netscape Replication End Session"
>>> in the dirsrv "access" logs, and during one of the manifestations of
>>> "replica_generate_next_csn", I found this:
>>>
>>> PROD:08:46:08-root at REPLICA:/var/log/dirsrv/slapd-DOM-DOM-DOM
>>> # grep -E '17/Aug/2016:13:50:4.*conn=602.*ADD' access.2016081*
>>> access.20160817-124811:[17/Aug/2016:13:50:42 -0400] conn=602 op=4143
>>> ADD
>>> dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
>>> access.20160817-124811:[17/Aug/2016:13:50:47 -0400] conn=602 op=4148
>>> ADD
>>> dn="idnsname=server-6-4-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
>>> access.20160817-124811:[17/Aug/2016:13:50:49 -0400] conn=602 op=4151
>>> ADD
>>> dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
>>>
>>> PROD:08:47:44-root at MASTER:/var/log/dirsrv/slapd-DOM-DOM-DOM
>>> # grep -E '17/Aug/2016:13:50:4.*conn=1395.*ADD' access.2016081*
>>> access.20160817-111940:[17/Aug/2016:13:50:43 -0400] conn=1395 op=4151
>>> ADD
>>> dn="idnsname=server-6-3-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
>>> access.20160817-111940:[17/Aug/2016:13:50:49 -0400] conn=1395 op=4158
>>> ADD
>>> dn="idnsname=server-6-5-sp,idnsname=dom.dom.dom,cn=dns,dc=dom,dc=dom,dc=dom"
>>>
>>> It looks like the entries for server-6-3-sp and 6-5-sp were referenced
>>> twice.  Do you think that the time being off by 1-2 seconds between
>>> the master and replica could be the issue?  The connection 602 is the
>>> replication between the replica and master, and the connection 1395 is
>>> the replication between the master and replica.
>> unfortunately this is not enough to determine what is going on. The
>> intersting generated/used csn is only logged in the
>> corresponding RESULT message and these are only the replication connections,
>> it would be necessary to see the
>> original ADD operation, was it added once or twice by a client ?
>> you could pick one entry eg server-6-3-sp and grep for all references in the
>> access logs of both servers  (maybe there are mods as well) and then
>> get also get the RESULT line for the ops found
>>>
>>> Since I know these operations were performed using the console via a
>>> for loop 'ipa dnsrecord-add dom.dom.dom server-6-$i-sp
>>> --a-rec=10.250.12.$i' on one of our login nodes, do you think that
>>> specifying an _srv_ record in the DOMAIN configuration with the
>>> address of the master server, e.g.: ipa_server = _srv_,
>>> MASTER.dom.dom.dom could be the issue (coupled with the time syncing)?
>>>
>>> I know that these questions are probably leaning more towards the
>>> 389ds team, so feel free to pass me over to them if need be.
>> I think I can address the ds related questions, but I don't know about
>> console and dns to assess if the behaviour is normal
>>
>>> Again, thank you very much for responding!
>>>
>>> John DeSantis
>>>
>>> 2016-08-18 4:14 GMT-04:00 Ludwig Krispenz <lkrispen at redhat.com>:
>>>> On 08/17/2016 08:54 PM, John Desantis wrote:
>>>>> Hello all,
>>>>>
>>>>> We've been re-using old host names and IP addresses for a new
>>>>> deployment of nodes, and recently I've been seeing the messages pasted
>>>>> below in the slapd-DC.DC.DC "error" log on our nodes.
>>>>>
>>>>> [17/Aug/2016:10:30:30 -0400] - replica_generate_next_csn:
>>>>> opcsn=57b475cd001200040000 <= basecsn=57b475cf000000160000, adjusted
>>>>> opcsn=57b475cf000100040000
>>>>> [17/Aug/2016:11:09:44 -0400] - replica_generate_next_csn:
>>>>> opcsn=57b47f00000200040000 <= basecsn=57b47f00000200160000, adjusted
>>>>> opcsn=57b47f00000300040000
>>>>> [17/Aug/2016:11:09:44 -0400] - replica_generate_next_csn:
>>>>> opcsn=57b47f00000400040000 <= basecsn=57b47f00000400160000, adjusted
>>>>> opcsn=57b47f00000500040000
>>>>> [17/Aug/2016:11:10:33 -0400] - replica_generate_next_csn:
>>>>> opcsn=57b47f2f001000040000 <= basecsn=57b47f30000200160000, adjusted
>>>>> opcsn=57b47f30000300040000
>>>>> [17/Aug/2016:13:50:45 -0400] - replica_generate_next_csn:
>>>>> opcsn=57b4a4bb000900040000 <= basecsn=57b4a4bc000000160000, adjusted
>>>>> opcsn=57b4a4bc000100040000
>>>>> [17/Aug/2016:13:52:54 -0400] - replica_generate_next_csn:
>>>>> opcsn=57b4a53e000a00040000 <= basecsn=57b4a53f000000160000, adjusted
>>>>> opcsn=57b4a53f000100040000
>>>>> [17/Aug/2016:13:53:15 -0400] - replica_generate_next_csn:
>>>>> opcsn=57b4a552000700040000 <= basecsn=57b4a553000000160000, adjusted
>>>>> opcsn=57b4a553000100040000
>>>>> [17/Aug/2016:13:53:32 -0400] - replica_generate_next_csn:
>>>>> opcsn=57b4a562000900040000 <= basecsn=57b4a564000000160000, adjusted
>>>>> opcsn=57b4a564000100040000
>>>> Each modification (add/del/mod) gets a csn assignged used in replication
>>>> update resolution. And each assigned csn has to newer than an existing
>>>> one.
>>>> The messages you see is from code that double checks that the entry
>>>> doesn't
>>>> have already a lareg csn - and adjusts it.
>>>> The logs indicate that entries are more or less concurrently updated on
>>>> replica 4 and 16, and the updates from16 are received while processing
>>>> the
>>>> updates on 4.
>>>> This is a normal scenario, but you could check if the simultaneous
>>>> updates
>>>> on 4 and 16 are intentional.
>>>>
>>>>> They seem to only occur when updating DNS entries, whether on the
>>>>> console or via the GUI (tail -f'ing the log).
>>>>>
>>>>> A search in this mailing-list returns nothing, but a message is found
>>>>> on the 389-ds list [1];  it seems to suggest that the messages aren't
>>>>> fatal and are purely informational, yet if they are occurring
>>>>> constantly that there could be a problem with the replication
>>>>> algorithm and/or deployment.
>>>>>
>>>>> We're using ipa-server 3.0.0-47 and 389-ds 1.2.11.15-60.  Nothing has
>>>>> changed on the deployment side of things, and I don't recall seeing
>>>>> this message before.
>>>>>
>>>>> I'm wondering if it's safe to disregard these messages due to the
>>>>> re-use of the entries, or if something else should be looked into.
>>>>>
>>>>> Thank you,
>>>>> John DeSantis
>>>>>
>>>>> [1] https://fedorahosted.org/389/ticket/47959
>>>>>
>>>> --
>>>> Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
>>>> Commercial register: Amtsgericht Muenchen, HRB 153243,
>>>> Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill,
>>>> Eric Shander
>>>>
>>>> --
>>>> Manage your subscription for the Freeipa-users mailing list:
>>>> https://www.redhat.com/mailman/listinfo/freeipa-users
>>>> Go to http://freeipa.org for more info on the project
>>
>> --
>> Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
>> Commercial register: Amtsgericht Muenchen, HRB 153243,
>> Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill,
>> Eric Shander
>>
>> --
>> Manage your subscription for the Freeipa-users mailing list:
>> https://www.redhat.com/mailman/listinfo/freeipa-users
>> Go to http://freeipa.org for more info on the project

-- 
Red Hat GmbH, http://www.de.redhat.com/, Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill, Eric Shander




More information about the Freeipa-users mailing list