<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<br>
<div class="moz-cite-prefix">On 05/21/2015 09:59 AM, Janelle wrote:<br>
</div>
<blockquote cite="mid:555DE4D2.9050801@gmail.com" type="cite">On
5/21/15 6:46 AM, Ludwig Krispenz wrote:
<br>
<blockquote type="cite">
<br>
On 05/21/2015 03:28 PM, Janelle wrote:
<br>
<blockquote type="cite">I think I found the problem.
<br>
<br>
There was a lone replica running in another DC. It was
installed as a replica some time ago with all the others.
Think of this -- the original config had 5 servers, one of
them was this server. Then the other 4 servers were RE-BUILT
from scratch, so all the replication agreements were changed
AND - this is the important part - the 5th server was never
added back in. BUT - the 5th server was left running and was never
told that it was not a member anymore. It still thought it
had a replication agreement with the original "server 1", but
server 1 knew otherwise.
<br>
<br>
Now, although the first 4 servers were rebuilt, the same
domain, realm, AND passwords were used.
<br>
<br>
I am guessing that somehow, this 5th server keeps trying to
interject its info into the ring of 4 servers, kind of forcing
its way in. Somehow, the fact that the original credentials still
work (but the certs are all different) is leaving the first 4
servers with a "can't decode" issue.
<br>
<br>
There should be some security checks so this can't happen. It
should also be easy to reproduce.
<br>
<br>
Now I have to go re-initialize all the servers from a good
server, so everyone is happy again. The "problem" server has
been shut down completely. (and yes, there were actually 3 of
them in my scenario - I just used 1 to simplify my example -
but that explains the 3 CSNs that just kept "appearing")
<br>
<br>
What concerns me most about this - were the servers outside of
the "good ring" somehow able to inject data into replication
which might have been causing bad data??? This is bad if it is
true.
<br>
</blockquote>
It depends a bit on what you mean by rebuilt from scratch.
<br>
A replication session needs to meet three conditions to be able
to send data:
<br>
- the supplier side needs to be able to authenticate, and the
authenticated user has to be in the list of bind DNs of the
replica
<br>
- the data generation of the supplier and consumer sides needs to be
the same (they all have to have the same common origin)
<br>
- the supplier needs to have the changes (CSNs) to be able to
position in its changelog to send updates
<br>
<br>
Now, if you have 5 servers, forget about one of them, do not
change the credentials on the others, and do not reinitialize the
database by an LDIF import to generate a new database
generation, the fifth server will still be able to connect and
eventually send updates. How should the other servers know that
this one is no longer a "good" one?
<br>
<blockquote type="cite">
<br>
~Janelle
<br>
<br>
</blockquote>
<br>
</blockquote>
The only problem left now is that, no matter what, this last entry
will NOT go away, and now I have 2 "stuck" cleanruvs that will not
"abort" either.
<br>
<br>
unable to decode {replica 24} 554d53d3000000180000
554d54a4000200180000
<br>
<br>
CLEANALLRUV tasks
<br>
RID 24 None
<br>
No abort CLEANALLRUV tasks running
<br>
=====================================
<br>
<br>
ldapmodify -D "cn=directory manager" -W -a
<br>
<br>
dn: cn=abort 24, cn=abort cleanallruv, cn=tasks, cn=config
<br>
objectclass: extensibleObject
<br>
replica-base-dn: dc=example,dc=com
<br>
cn: abort 24
<br>
replica-id: 24
<br>
replica-certify-all: no
<br>
adding new entry <b>" cn=abort 24, cn=abort cleanallruv,
cn=tasks, cn=config"
</b><br>
ldap_add: No such object (32)
<br>
</blockquote>
There should not be any white space at the beginning of the DN: <b>" cn=abort
24, cn=abort cleanallruv, cn=tasks, cn=config"
</b><br>
<br>
When I run the abort task I don't have that extra white space, and
the task is successfully added:<br>
<br>
[root@localhost ~]# ldapmodify -D cn=dm -w password -a<br>
dn: cn=abort 24, cn=abort cleanallruv, cn=tasks, cn=config<br>
objectclass: extensibleObject<br>
replica-base-dn: dc=example,dc=com<br>
cn: abort 24<br>
replica-id: 24<br>
replica-certify-all: no <br>
<br>
adding new entry <b>"cn=abort 24, cn=abort cleanallruv, cn=tasks,
cn=config"</b><br>
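<br>
If you would rather not type the entry by hand, here is a minimal sketch of a helper that prints the same LDIF so it can be piped into ldapmodify. The function name abort_cleanallruv_ldif is made up for illustration; the attributes and values are the ones from the entry above. Each line is printed with printf so that nothing can end up in front of the DN.<br>
<pre>
```shell
# Hypothetical helper: print the LDIF for an "abort cleanallruv" task.
#   $1 = replica ID to abort (for example 24)
#   $2 = replicated suffix (for example dc=example,dc=com)
abort_cleanallruv_ldif() {
    rid="$1"
    suffix="$2"
    # One printf per LDIF line, so no line starts with white space.
    printf 'dn: cn=abort %s, cn=abort cleanallruv, cn=tasks, cn=config\n' "$rid"
    printf 'objectclass: extensibleObject\n'
    printf 'replica-base-dn: %s\n' "$suffix"
    printf 'cn: abort %s\n' "$rid"
    printf 'replica-id: %s\n' "$rid"
    printf 'replica-certify-all: no\n'
}
```
</pre>
For example, abort_cleanallruv_ldif 24 dc=example,dc=com | ldapmodify -D "cn=directory manager" -W -a should add the same entry as the session above, assuming your ldapmodify reads LDIF from standard input.<br>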
<br>
The extra white space is the probable cause of the error 32 (no such
object) you were seeing. You can verify this by looking at the
access log (/var/log/dirsrv/slapd-INSTANCE/access)<br>
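<br>
A quick way to do that check is to search the access log for ADD operations whose DN starts with white space. This is only a sketch: find_padded_add_dns is a made-up name, and it assumes the access log records each add on a single line as ADD dn="...". Pass it the access log path for your instance.<br>
<pre>
```shell
# Hypothetical helper: list ADD operations whose DN begins with white space.
#   $1 = path to the access log
# Assumes each add is logged on one line as: ... ADD dn="..."
find_padded_add_dns() {
    grep -E 'ADD dn="[[:space:]]' "$1"
}
```
</pre>
If this prints the abort task entry, the DN reached the server with the leading white space, and the RESULT line for the same conn and op should then show err=32.<br>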
<br>
As I said before, you can also check the errors log to find out
why the cleanAllRUV task is not completing.<br>
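<br>
To pull the relevant lines out of the errors log, something like the following may help. It is a sketch under assumptions: show_cleanallruv_messages is a made-up name, and it assumes the task's messages contain the string "cleanallruv" in some capitalization. The errors log normally sits next to the access log (I am assuming /var/log/dirsrv/slapd-INSTANCE/errors here).<br>
<pre>
```shell
# Hypothetical helper: show the most recent cleanAllRUV-related log lines.
#   $1 = path to the errors log
#   $2 = how many matching lines to show (defaults to 20)
show_cleanallruv_messages() {
    # Case-insensitive match, then keep only the newest matches.
    grep -i 'cleanallruv' "$1" | tail -n "${2:-20}"
}
```
</pre>
The last few matching lines are usually the interesting ones, since they show what the task was doing most recently.<br>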
<br>
Regards,<br>
Mark<br>
<br>
<br>
</body>
</html>