<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 5/21/15 5:20 AM, thierry bordaz wrote:<br>
<blockquote cite="mid:555DCD90.4070306@redhat.com" type="cite">
<div class="moz-cite-prefix">On 05/21/2015 01:36 PM, Janelle
wrote:<br>
</div>
<blockquote cite="mid:555DC320.30203@gmail.com" type="cite">
On 5/20/15 7:53 AM, Mark Reynolds wrote:<br>
<blockquote cite="mid:555CA000.4030208@redhat.com" type="cite">
<br>
<br>
<div class="moz-cite-prefix">On 05/20/2015 10:17 AM, thierry
bordaz wrote:<br>
</div>
<blockquote cite="mid:555C976E.30202@redhat.com" type="cite">
<div class="moz-cite-prefix">On 05/20/2015 03:46 PM, Janelle
wrote:<br>
</div>
<blockquote cite="mid:555C9049.70607@gmail.com" type="cite">
On 5/20/15 6:01 AM, thierry bordaz wrote:<br>
<blockquote cite="mid:555C85B6.2050104@redhat.com"
type="cite">
<div class="moz-cite-prefix">On 05/20/2015 02:57 AM,
Janelle wrote:<br>
</div>
<blockquote cite="mid:555BDBE1.8060604@gmail.com"
type="cite">
On 5/19/15 12:04 AM, thierry bordaz wrote:<br>
<blockquote cite="mid:555AE069.4010901@redhat.com"
type="cite">
<div class="moz-cite-prefix">On 05/19/2015 03:42 AM,
Janelle wrote:<br>
</div>
<blockquote cite="mid:555A9510.6040403@gmail.com"
type="cite">On 5/18/15 6:23 PM, Janelle wrote: <br>
<blockquote type="cite">Once again,
replication/sync has been lost. I really wish
the product were more stable; it has so much
potential, and yet. <br>
<br>
Servers running for 6 days no issues. No new
accounts or changes (maybe a few users changing
passwords) and again, 5 out of 16 servers are no
longer in sync. <br>
<br>
I can test it easily by adding an account and
then waiting a few minutes, then run "ipa
user-show --all username" on all the servers,
and only a few of them have the account. I have
now waited 15 minutes, still no luck. <br>
<br>
Oh well.. I guess I will go look at
alternatives. I had such high hopes for this
tool. Thanks so much everyone for all your help
in trying to get things stable, but for whatever
reason, there is a random loss of sync among the
servers and obviously this is not acceptable. <br>
<br>
regards <br>
~J <br>
</blockquote>
<br>
</blockquote>
</blockquote>
<br>
All the replicas are happy again. I found these again:<br>
<br>
unable to decode {replica 16} 55356472000300100000
55356472000300100000<br>
unable to decode {replica 23} 5553e3a3000000170000
55543240000300170000<br>
unable to decode {replica 24} 554d53d3000000180000
554d54a4000200180000<br>
<br>
What I also found interesting is that I have not
deleted any masters at all, so it was quite
perplexing where the orphaned entries came from.
However, I did find that 3 of the replicas did not show
complete RUV lists... While most of the replicas had a
list of all 16 servers, a couple of them listed only 4
or 5 (using ipa-replica-manage list-ruv).<br>
</blockquote>
I don't know about the orphaned entries. Did you get
entries below deleted parents?<br>
<br>
AFAIK all replicas are masters and so have an entry
{replica &lt;rid&gt;} in the RUV. We should expect all
servers to have the same number of RUV elements (16, 4 or
5). The servers with 4 or 5 may be isolated, so that they
did not receive updates from those with 16 RUV elements.<br>
Would you copy/paste an example of a RUV with 16 and one with
4-5?<br>
</blockquote>
<br>
Now, the steps to clear this were:<br>
<br>
Removed the "unable to decode" with the direct
ldapmodify's. This worked across all replicas, which was
nice and did not have to be repeated in each one. In other
words, entered on a single server, and it was removed on
all.<br>
</blockquote>
Hello,<br>
<br>
Did you run ldapmodify directly against the RUV entry (<tt><font
size="-1">nsuniqueid=ffffffff-ffffffff-ffffffff-ffffffff,</font></tt><tt>
SUFFIX</tt>), or use a clean RUV task?<br>
</blockquote>
Thierry,<br>
<br>
Janelle just manually added a cleanallruv task (that I had
recommended the other week). <br>
<br>
Mark<br>
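<br>
For reference, a rough sketch of what such a task entry can look like, generated here in Python. It assumes the 389 Directory Server CLEANALLRUV task layout (an entry under cn=cleanallruv,cn=tasks,cn=config); the helper name cleanallruv_ldif and the suffix dc=example,dc=com are illustrative and not taken from this thread:<br>
<pre>
```python
def cleanallruv_ldif(replica_id, base_dn):
    """Build LDIF for a CLEANALLRUV task entry.

    The attribute names follow the task layout assumed above; verify them
    against the documentation for your 389 DS version before use.
    """
    name = f"clean {replica_id}"
    lines = [
        f"dn: cn={name},cn=cleanallruv,cn=tasks,cn=config",
        "objectclass: extensibleObject",
        f"replica-base-dn: {base_dn}",
        f"replica-id: {replica_id}",
        f"cn: {name}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Replica IDs reported as "unable to decode" in this thread.
    for rid in (16, 23, 24):
        print(cleanallruv_ldif(rid, "dc=example,dc=com"))
```
</pre>
The generated LDIF would then be passed to ldapmodify -a. It is worth double-checking each replica ID against the list-ruv output first, so that the RID of an active master is not cleaned by mistake.<br>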
<blockquote cite="mid:555C976E.30202@redhat.com" type="cite">
<br>
dc1-ipa1 and dc1-ipa2 are missing some RUV elements. If you
do an update on dc3-ipa1, is it replicated to dc1-ipa[12]?<br>
<br>
Also there are duplicate RIDs (9, 25) for
dc1-ipa2.example.com:389. You may see some messages like
'attrlist_replace' in some error logs. <br>
25 seems to be the new RID.<br>
<br>
thanks<br>
thierry<br>
<br>
<blockquote cite="mid:555C9049.70607@gmail.com" type="cite">
<br>
re-initialized --from=good server on the ones with the
short list.<br>
<br>
Waited 5 minutes to let everything settle, then started
running tests of adds/deletes which seemed to be just
fine.<br>
<br>
Here are 2 of the DCs<br>
<br>
-------------------------------------<br>
Node dc1-ipa1 <br>
-------------------------------------<br>
dc4-ipa4.example.com 389 21<br>
dc1-ipa1.example.com 389 10<br>
dc1-ipa4.example.com 389 4<br>
-------------------------------------<br>
Node dc1-ipa2 <br>
-------------------------------------<br>
dc4-ipa4.example.com 389 21<br>
dc1-ipa1.example.com 389 10<br>
dc1-ipa2.example.com 389 25<br>
dc1-ipa3.example.com 389 8<br>
dc1-ipa4.example.com 389 4<br>
-------------------------------------<br>
Node dc1-ipa3 <br>
-------------------------------------<br>
dc3-ipa1.example.com 389 14<br>
dc3-ipa2.example.com 389 13<br>
dc3-ipa3.example.com 389 12<br>
dc3-ipa4.example.com 389 11<br>
dc2-ipa1.example.com 389 7<br>
dc2-ipa2.example.com 389 6<br>
dc2-ipa3.example.com 389 5<br>
dc2-ipa4.example.com 389 3<br>
dc4-ipa1.example.com 389 18<br>
dc4-ipa2.example.com 389 19<br>
dc4-ipa3.example.com 389 20<br>
dc4-ipa4.example.com 389 21<br>
dc1-ipa1.example.com 389 10<br>
dc1-ipa2.example.com 389 25<br>
dc1-ipa2.example.com 389 9<br>
dc1-ipa3.example.com 389 8<br>
dc1-ipa4.example.com 389 4<br>
unable to decode {replica 16} 55356472000300100000
55356472000300100000<br>
unable to decode {replica 24} 554d53d3000000180000
554d54a4000200180000<br>
dc5-ipa1.example.com 389 26<br>
dc5-ipa2.example.com 389 15<br>
dc5-ipa3.example.com 389 17<br>
-------------------------------------<br>
Node dc1-ipa4 <br>
-------------------------------------<br>
dc3-ipa1.example.com 389 14<br>
dc3-ipa2.example.com 389 13<br>
dc3-ipa3.example.com 389 12<br>
dc3-ipa4.example.com 389 11<br>
dc2-ipa1.example.com 389 7<br>
dc2-ipa2.example.com 389 6<br>
dc2-ipa3.example.com 389 5<br>
dc2-ipa4.example.com 389 3<br>
dc4-ipa1.example.com 389 18<br>
dc4-ipa2.example.com 389 19<br>
dc4-ipa3.example.com 389 20<br>
dc4-ipa4.example.com 389 21<br>
dc1-ipa1.example.com 389 10<br>
dc1-ipa2.example.com 389 25<br>
dc1-ipa2.example.com 389 9<br>
dc1-ipa3.example.com 389 8<br>
dc1-ipa4.example.com 389 4<br>
unable to decode {replica 16} 55356472000300100000
55356472000300100000<br>
unable to decode {replica 24} 554d53d3000000180000
554d54a4000200180000<br>
dc5-ipa1.example.com 389 26<br>
dc5-ipa2.example.com 389 15<br>
dc5-ipa3.example.com 389 17<br>
-------------------------------------<br>
Node dc2-ipa1 <br>
-------------------------------------<br>
dc3-ipa1.example.com 389 14<br>
dc3-ipa2.example.com 389 13<br>
dc3-ipa3.example.com 389 12<br>
dc3-ipa4.example.com 389 11<br>
dc2-ipa1.example.com 389 7<br>
dc2-ipa2.example.com 389 6<br>
dc2-ipa3.example.com 389 5<br>
dc2-ipa4.example.com 389 3<br>
dc4-ipa1.example.com 389 18<br>
dc4-ipa2.example.com 389 19<br>
dc4-ipa3.example.com 389 20<br>
dc4-ipa4.example.com 389 21<br>
dc1-ipa1.example.com 389 10<br>
dc1-ipa2.example.com 389 25<br>
dc1-ipa2.example.com 389 9<br>
dc1-ipa3.example.com 389 8<br>
dc1-ipa4.example.com 389 4<br>
unable to decode {replica 16} 55356472000300100000
55356472000300100000<br>
unable to decode {replica 23} 5553e3a3000000170000
55543240000300170000<br>
unable to decode {replica 24} 554d53d3000000180000
554d54a4000200180000<br>
dc5-ipa1.example.com 389 26<br>
dc5-ipa2.example.com 389 15<br>
dc5-ipa3.example.com 389 17<br>
-------------------------------------<br>
Node dc2-ipa2 <br>
-------------------------------------<br>
dc3-ipa1.example.com 389 14<br>
dc3-ipa2.example.com 389 13<br>
dc3-ipa3.example.com 389 12<br>
dc3-ipa4.example.com 389 11<br>
dc2-ipa1.example.com 389 7<br>
dc2-ipa2.example.com 389 6<br>
dc2-ipa3.example.com 389 5<br>
dc2-ipa4.example.com 389 3<br>
dc4-ipa1.example.com 389 18<br>
dc4-ipa2.example.com 389 19<br>
dc4-ipa3.example.com 389 20<br>
dc4-ipa4.example.com 389 21<br>
dc1-ipa1.example.com 389 10<br>
dc1-ipa2.example.com 389 25<br>
dc1-ipa2.example.com 389 9<br>
dc1-ipa3.example.com 389 8<br>
dc1-ipa4.example.com 389 4<br>
unable to decode {replica 16} 55356472000300100000
55356472000300100000<br>
unable to decode {replica 24} 554d53d3000000180000
554d54a4000200180000<br>
dc5-ipa1.example.com 389 26<br>
dc5-ipa2.example.com 389 15<br>
dc5-ipa3.example.com 389 17<br>
-------------------------------------<br>
Node dc2-ipa3 <br>
-------------------------------------<br>
dc3-ipa1.example.com 389 14<br>
dc3-ipa2.example.com 389 13<br>
dc3-ipa3.example.com 389 12<br>
dc3-ipa4.example.com 389 11<br>
dc2-ipa1.example.com 389 7<br>
dc2-ipa2.example.com 389 6<br>
dc2-ipa3.example.com 389 5<br>
dc2-ipa4.example.com 389 3<br>
dc4-ipa1.example.com 389 18<br>
dc4-ipa2.example.com 389 19<br>
dc4-ipa3.example.com 389 20<br>
dc4-ipa4.example.com 389 21<br>
dc1-ipa1.example.com 389 10<br>
dc1-ipa2.example.com 389 25<br>
dc1-ipa2.example.com 389 9<br>
dc1-ipa3.example.com 389 8<br>
dc1-ipa4.example.com 389 4<br>
unable to decode {replica 16} 55356472000300100000
55356472000300100000<br>
unable to decode {replica 24} 554d53d3000000180000
554d54a4000200180000<br>
dc5-ipa1.example.com 389 26<br>
dc5-ipa2.example.com 389 15<br>
dc5-ipa3.example.com 389 17<br>
-------------------------------------<br>
Node dc2-ipa4 <br>
-------------------------------------<br>
dc3-ipa1.example.com 389 14<br>
dc3-ipa2.example.com 389 13<br>
dc3-ipa3.example.com 389 12<br>
dc3-ipa4.example.com 389 11<br>
dc2-ipa1.example.com 389 7<br>
dc2-ipa2.example.com 389 6<br>
dc2-ipa3.example.com 389 5<br>
dc2-ipa4.example.com 389 3<br>
dc4-ipa1.example.com 389 18<br>
dc4-ipa2.example.com 389 19<br>
dc4-ipa3.example.com 389 20<br>
dc4-ipa4.example.com 389 21<br>
dc1-ipa1.example.com 389 10<br>
dc1-ipa2.example.com 389 25<br>
dc1-ipa2.example.com 389 9<br>
dc1-ipa3.example.com 389 8<br>
dc1-ipa4.example.com 389 4<br>
unable to decode {replica 16} 55356472000300100000
55356472000300100000<br>
unable to decode {replica 24} 554d53d3000000180000
554d54a4000200180000<br>
dc5-ipa1.example.com 389 26<br>
dc5-ipa2.example.com 389 15<br>
dc5-ipa3.example.com 389 17<br>
<br>
<br>
Happy Wednesday<br>
~Janelle<br>
</blockquote>
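<br>
To spot duplicate RIDs such as 9 and 25 without reading each listing by eye, something along these lines could help. It is only a sketch: it assumes the "host port rid" layout shown in the listings above, and the function names are made up for illustration:<br>
<pre>
```python
from collections import defaultdict

# A few lines copied from the dc1-ipa3 listing above.
SAMPLE = """\
dc1-ipa1.example.com 389 10
dc1-ipa2.example.com 389 25
dc1-ipa2.example.com 389 9
dc1-ipa3.example.com 389 8
unable to decode {replica 16} 55356472000300100000 55356472000300100000
"""


def parse_ruv_listing(text):
    """Map each hostname to the sorted replica IDs listed for it.

    Only lines of the form "host port rid" are used; separators, node
    headers and "unable to decode" lines are skipped.
    """
    rids = defaultdict(set)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
            rids[parts[0]].add(int(parts[2]))
    return {host: sorted(ids) for host, ids in rids.items()}


def duplicate_rids(text):
    """Return only the hosts that appear with more than one replica ID."""
    return {h: ids for h, ids in parse_ruv_listing(text).items() if len(ids) != 1}


if __name__ == "__main__":
    print(duplicate_rids(SAMPLE))
```
</pre>
Running parse_ruv_listing on the output of each node and comparing the number of hosts returned would also show which servers hold a short RUV list.<br>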
<br>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
And just like that - for no reason, they all reappeared:<br>
<br>
unable to decode {replica 16} 55356472000300100000
55356472000300100000<br>
unable to decode {replica 23} 5545d61f000200170000
5552f718000300170000<br>
unable to decode {replica 24} 554d53d3000000180000
554d54a4000200180000<br>
<br>
:-(<br>
~J<br>
<br>
</blockquote>
Hello Janelle,<br>
<br>
Those 3 RIDs were already present in Node dc2-ipa1, correct? Did they
reappear on other nodes as well?<br>
Maybe dc2-ipa1 established a replication session with its peers
and sent those RIDs.<br>
Could you track in all the access logs when the op
csn=5552f718000300170000 was applied?<br>
<br>
Note that the two hex values of replica 23 changed
(5545d61f000200170000 5552f718000300170000 vs 5553e3a3000000170000
55543240000300170000). Have you recreated a replica 23?<br>
<br>
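To compare such values, it may help to split them into fields. The sketch below assumes the usual 389 DS CSN layout of 8 hex digits of timestamp, 4 of sequence number, 4 of replica ID and 4 of sub-sequence number; the function name decode_csn is illustrative. The layout is at least consistent with this thread, since the replica ID field of each value matches its {replica N} label:<br>
<pre>
```python
from datetime import datetime, timezone


def decode_csn(csn):
    """Split a 20-hex-digit CSN into its four fields.

    Assumed layout: 8 digits timestamp (seconds since the epoch),
    4 digits sequence number, 4 digits replica ID, 4 digits sub-sequence.
    """
    if len(csn) != 20:
        raise ValueError("expected 20 hex digits, got %d" % len(csn))
    ts = int(csn[0:8], 16)
    return {
        "time": datetime.fromtimestamp(ts, tz=timezone.utc),
        "seq": int(csn[8:12], 16),
        "rid": int(csn[12:16], 16),
        "subseq": int(csn[16:20], 16),
    }


if __name__ == "__main__":
    # The two pairs reported for replica 23 in this thread.
    for value in (
        "5545d61f000200170000",
        "5552f718000300170000",
        "5553e3a3000000170000",
        "55543240000300170000",
    ):
        fields = decode_csn(value)
        print(value, fields["rid"], fields["seq"], fields["time"].isoformat())
```
</pre>
The decoded timestamps give a rough idea of when each update was generated, which narrows down where to look in the access logs.<br>
<br>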
Do you have replication logging enabled ?<br>
<br>
thanks<br>
thierry<br>
<br>
<br>
</blockquote>
As I mentioned in the email I just sent, and to be clear: NOTHING
changed in the environment. No new replicas. No changes in the
servers at all other than some simple adds and deletes of users.
This just happens randomly. I am in the process of trying to clean them
to get back into production, as it is causing issues and I need
production to run. Back later once I am running again.<br>
<br>
~Janelle<br>
</body>
</html>