<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 11/21/2014 04:51 AM, thierry bordaz
wrote:<br>
</div>
<blockquote cite="mid:546F193B.4020602@redhat.com" type="cite">
<div class="moz-cite-prefix">On 11/21/2014 10:59 AM, <a
moz-do-not-send="true" class="moz-txt-link-abbreviated"
href="mailto:dbischof@hrz.uni-kassel.de">dbischof@hrz.uni-kassel.de</a>
wrote:<br>
</div>
<blockquote cite="mid:alpine.LSU.2.11.1411211033510.1449@fred"
type="cite">Hi, <br>
<br>
On Thu, 20 Nov 2014, thierry bordaz wrote: <br>
<br>
<blockquote type="cite">On 11/20/2014 12:03 PM, <a
moz-do-not-send="true" class="moz-txt-link-abbreviated"
href="mailto:dbischof@hrz.uni-kassel.de">dbischof@hrz.uni-kassel.de</a>
wrote: <br>
<blockquote type="cite"> <br>
On Thu, 20 Nov 2014, thierry bordaz wrote: <br>
<br>
<blockquote type="cite">Server1 successfully replicates to
Server2, but Server2 fails to replicate to Server1. <br>
<br>
Replication from Server2 to Server1 uses Kerberos
authentication. Server1 accepts the replication session,
successfully identifies the replication manager, and starts
receiving replication extended operations, but then suddenly
closes the connection. <br>
<br>
<br>
[19/Nov/2014:14:21:39 +0100] conn=2980 fd=78 slot=78 connection from xxx to yyy <br>
[19/Nov/2014:14:21:39 +0100] conn=2980 op=0 BIND dn="" method=sasl version=3 mech=GSSAPI <br>
[19/Nov/2014:14:21:39 +0100] conn=2980 op=0 RESULT err=14 tag=97 nentries=0 etime=0, SASL bind in progress <br>
[19/Nov/2014:14:21:39 +0100] conn=2980 op=1 BIND dn="" method=sasl version=3 mech=GSSAPI <br>
[19/Nov/2014:14:21:39 +0100] conn=2980 op=1 RESULT err=14 tag=97 nentries=0 etime=0, SASL bind in progress <br>
[19/Nov/2014:14:21:39 +0100] conn=2980 op=2 BIND dn="" method=sasl version=3 mech=GSSAPI <br>
[19/Nov/2014:14:21:39 +0100] conn=2980 op=2 RESULT err=0 tag=97 nentries=0 etime=0 dn="krbprincipalname=xxx" <br>
[19/Nov/2014:14:21:39 +0100] conn=2980 op=3 SRCH base="" scope=0 filter="(objectClass=*)" attrs="supportedControl supportedExtension" <br>
[19/Nov/2014:14:21:39 +0100] conn=2980 op=3 RESULT err=0 tag=101 nentries=1 etime=0 <br>
[19/Nov/2014:14:21:39 +0100] conn=2980 op=4 SRCH base="" scope=0 filter="(objectClass=*)" attrs="supportedControl supportedExtension" <br>
[19/Nov/2014:14:21:39 +0100] conn=2980 op=4 RESULT err=0 tag=101 nentries=1 etime=0 <br>
[19/Nov/2014:14:21:39 +0100] conn=2980 op=5 EXT oid="2.16.840.1.113730.3.5.12" name="replication-multimaster-extop" <br>
[19/Nov/2014:14:21:39 +0100] conn=2980 op=5 RESULT err=0 tag=120 nentries=0 etime=0 <br>
[19/Nov/2014:14:21:39 +0100] conn=2980 op=6 SRCH base="cn=schema" scope=0 filter="(objectClass=*)" attrs="nsSchemaCSN" <br>
[19/Nov/2014:14:21:39 +0100] conn=2980 op=6 RESULT err=0 tag=101 nentries=1 etime=0 <br>
[19/Nov/2014:14:21:39 +0100] conn=2980 op=-1 fd=78 closed - I/O function error. <br>
<br>
The reason for this closure is logged in Server1's error log:
sasl_decode fails to decode a received PDU. <br>
<br>
[19/Nov/2014:14:21:39 +0100] - sasl_io_recv failed to decode packet for connection 2980 <br>
<br>
I do not know why it fails, but I wonder whether the received
PDU is larger than the configured maximum. The
attribute nsslapd-maxsasliosize defaults to 2 MB.
Would it be possible to increase its value (for example to 5 MB)
to see whether it has an impact? <br>
<br>
[...] <br>
</blockquote>
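<br>
The access-log excerpt above can be searched mechanically for other connections that ended the same way. What follows is only a minimal Python sketch: it assumes each log entry has first been rejoined onto a single line (the mail client wrapped them), and the helper name find_io_error_connections is illustrative, not part of 389-ds. <br>
<pre>
```python
import re
from collections import defaultdict

# Matches access-log entries of the form:
#   [19/Nov/2014:14:21:39 +0100] conn=2980 op=5 RESULT err=0 tag=120 ...
# Group 1 is the timestamp, group 2 the connection id, group 3 the rest.
LINE_RE = re.compile(r"^\[([^\]]+)\] conn=(\d+) (.*)$")

# Text that the access log shows when the server drops the connection.
IO_ERROR_MARKER = "closed - I/O function error"


def find_io_error_connections(lines):
    """Return {conn_id: [entries]} for connections closed with an I/O error.

    Entries that do not look like access-log lines are ignored.
    """
    by_conn = defaultdict(list)
    failed = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        _timestamp, conn, rest = match.groups()
        by_conn[conn].append(rest)
        if IO_ERROR_MARKER in rest and conn not in failed:
            failed.append(conn)
    # Keep only the connections that hit the error, in order of appearance.
    return {conn: by_conn[conn] for conn in failed}
```
</pre>
Feeding it the lines of the access log would list, for each affected connection, the operations that preceded the close, which makes it easier to check whether the failure always follows the same replication extended operation. <br>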
<br>
I set nsslapd-maxsasliosize to 6164480 on both machines, but
the problem remains. <br>
</blockquote>
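<br>
For reference, the change described above can be written as an LDIF modification of cn=config. The Python sketch below only builds that LDIF text and converts byte counts for comparison with the 2 MB default; the function names are illustrative, and applying the LDIF (for example with ldapmodify, bound as an administrative user) is left as an assumption about the local setup. <br>
<pre>
```python
def build_maxsasliosize_ldif(size_bytes):
    """Return LDIF text that replaces nsslapd-maxsasliosize on cn=config."""
    size = int(size_bytes)
    return "\n".join([
        "dn: cn=config",
        "changetype: modify",
        "replace: nsslapd-maxsasliosize",
        "nsslapd-maxsasliosize: " + str(size),
        "",
    ])


def to_mib(size_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return size_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Value reported in this thread as already tried on both servers.
    print(build_maxsasliosize_ldif(6164480))
    print(to_mib(6164480))
```
</pre>
Since the problem persisted with the larger limit, the size limit alone does not appear to explain the sasl_decode failure. <br>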
<br>
sasl_decode fails, but the exact return value is not
logged. With the standard build we would need to attach a debugger
and set a conditional breakpoint in sasl_decode, just
after the call to conn-&gt;oparams.decode, that fires if result != 0.
However, attaching a debugger can change the timing and possibly prevent the
problem from occurring. The other option is to use an
instrumented build that logs this value. <br>
</blockquote>
<br>
If I understand the mechanism correctly, Server1 needs to have
debug versions of the relevant packages (probably 389-ds-base
and cyrus-sasl) installed in order to track down the problem.
Unfortunately, my Server1 is in production use - if I break it,
my colleagues will grab forks and torches and be after me. A
short downtime would be ok, though. <br>
<br>
Is there something else I could do? <br>
</blockquote>
<br>
Hello, <br>
<br>
Sure, I do not want to cause that much trouble <span
class="moz-smiley-s3"><span> ;-) </span></span><br>
<br>
<br>
I think my email was not clear. To go further we would need to
know the exact reason why sasl_decode fails. I see two options:<br>
<ul>
<li>Prepare a debug version that reports in the error log
the value returned by sasl_decode when it fails. Apart from the
downtime needed to install the debug version, it has no impact on
production.<br>
<br>
</li>
<li>Do a debug session (gdb) on Server1. The session would
set a breakpoint at a specific place, let the server run,
catch the sasl_decode failure, note the return code, and exit
the debugger. <br>
When the problem occurs, it recurs regularly (every 5 seconds),
so we should not have to wait long.<br>
That means debugging Server1 should disturb production
for 5 to 10 minutes.<br>
This option requires a detailed procedure for the debug session.<br>
</li>
</ul>
</blockquote>
For starters:
<a class="moz-txt-link-freetext" href="http://www.port389.org/docs/389ds/FAQ/faq.html#debugging-crashes">http://www.port389.org/docs/389ds/FAQ/faq.html#debugging-crashes</a><br>
<br>
Since this is IPA, you will need debuginfo packages for ipa and
slapi-nis in addition to the ones for 389.<br>
<br>
Take a look at the Debugging Hangs section, which describes how to
use gdb to get a stack trace. If you can use that gdb command to
get a stack trace with full debugging symbols (if you don't
know what that means, just post the redacted stack trace somewhere
and send us a link to it), then you should be ready to
do a gdb session to reproduce the error and "catch it in the act".<br>
<br>
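As a rough illustration of the kind of stack-trace command meant above, here is a small Python sketch that only assembles a gdb command line and does not run it. The binary path /usr/sbin/ns-slapd and the helper names are assumptions for illustration; the FAQ page linked above remains the reference for the exact command. <br>
<pre>
```python
import shlex


def build_stacktrace_command(pid, binary="/usr/sbin/ns-slapd"):
    """Build a gdb argv that attaches to pid and dumps all thread stacks.

    The binary path is an assumption; adjust it to the local install.
    """
    return [
        "gdb",
        "-ex", "set confirm off",
        "-ex", "set pagination off",
        "-ex", "thread apply all bt full",
        "-ex", "quit",
        binary,
        str(int(pid)),
    ]


def as_shell(args):
    """Render an argv list as a copy-pasteable, safely quoted shell command."""
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # 1234 is a placeholder pid, not a value from this thread.
    print(as_shell(build_stacktrace_command(1234)))
```
</pre>
Running the resulting command pauses the server while gdb is attached, so the output is best redirected to a file and collected during an agreed short maintenance window. <br>
<br>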
<blockquote cite="mid:546F193B.4020602@redhat.com" type="cite">
<p>thanks<br>
thierry<br>
</p>
<blockquote cite="mid:alpine.LSU.2.11.1411211033510.1449@fred"
type="cite"> <br>
<br>
Mit freundlichen Gruessen/With best regards, <br>
<br>
--Daniel. <br>
</blockquote>
<br>
<br>
<br>
</blockquote>
<br>
</body>
</html>