[Freeipa-users] Reply: Re: Haunted servers?

Fri May 29 12:59:47 UTC 2015

> 
> On May 29, 2015, at 00:41, thierry bordaz <tbordaz at redhat.com> wrote:
> 
>> On 05/29/2015 08:16 AM, Christoph Kaminski wrote:
>> freeipa-users-bounces at redhat.com wrote on 28.05.2015 13:23:26:
>> 
>> > From: Alexander Frolushkin <Alexander.Frolushkin at megafon.ru>
>> > To: "'thierry bordaz'" <tbordaz at redhat.com>
>> > Cc: "freeipa-users at redhat.com" <freeipa-users at redhat.com>
>> > Date: 28.05.2015 13:24
>> > Subject: Re: [Freeipa-users] Haunted servers?
>> > Sent by: freeipa-users-bounces at redhat.com
>> > 
>> > Unfortunately, after a couple of minutes, the error came back on two of
>> > the three servers in a slightly changed form:
>> > # ipa-replica-manage list-ruv
>> > unable to decode: {replica 16}
>> > ....
>> > 
>> > Before cleanruv it looked like:
>> > # ipa-replica-manage list-ruv
>> > unable to decode: {replica 16} 548a8126000000100000 548a8126000000100000
>> > ....
>> > 
>> > And one server seems to be fixed completely.
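To collect the ghost RIDs from every replica before cleaning, the list-ruv output quoted above can be scanned for the undecodable lines. This is only a minimal sketch: it assumes the lines look exactly like the two samples in this thread, and the function name ghost_rids is made up for illustration.

```python
import re
from typing import List

# Matches the "{replica <rid>}" token on the "unable to decode" lines
# quoted in this thread. Assumption: other output lines never match it.
_GHOST_RE = re.compile(r"unable to decode:\s*\{replica\s+(\d+)\}")


def ghost_rids(list_ruv_output: str) -> List[int]:
    """Return the sorted, de-duplicated RIDs found on undecodable lines."""
    rids = set()
    for line in list_ruv_output.splitlines():
        match = _GHOST_RE.search(line)
        if match:
            rids.add(int(match.group(1)))
    return sorted(rids)


if __name__ == "__main__":
    sample = (
        "unable to decode: {replica 16} 548a8126000000100000 548a8126000000100000\n"
        "unable to decode: {replica 16}\n"
    )
    print(ghost_rids(sample))
```

Running it against the output of every replica and merging the results would give the "really ALL" list that step 2 below asks for.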
>> > 
>> > WBR,
>> > Alexander Frolushkin
>> > 
>> > 
>> 
>> We had the same problem (and some more), and yesterday we successfully cleaned the ghost RIDs.
>> 
>> Our fix:
> 
> Hi Christoph,
> 
> Thanks for sharing this procedure. This bug is difficult to work around, so it is a good idea to write it down.
> 
>> 
>> 1. Stop all cleanallruv tasks, with ipa-replica-manage abort-clean-ruv if that works. It did not work here, so we did it manually on ALL replicas:
>>         a) stop the replica
>>         b) delete all nsds5ReplicaClean attributes from /etc/dirsrv/slapd-HSO/dse.ldif
>>         c) start the replica
> Yes, aborting a clean RUV hits the same retry issue as cleanallruv. It has been addressed in https://fedorahosted.org/389/ticket/48154
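For step 1b, a minimal sketch of removing those attributes from the text of a dse.ldif. It assumes the directory server is stopped, that you work on a backup copy first, and that every attribute to drop starts with the prefix "nsds5ReplicaClean" as written in the post. The function name is hypothetical.

```python
def strip_clean_attrs(ldif_text: str, prefix: str = "nsds5ReplicaClean") -> str:
    """Drop every attribute line starting with `prefix`, plus its folded lines.

    LDIF folds long values onto continuation lines that begin with a single
    space, so those must be dropped together with the attribute they belong to.
    """
    kept = []
    dropping = False
    for line in ldif_text.splitlines(keepends=True):
        if dropping and line.startswith(" "):
            continue  # continuation of an attribute that is being removed
        dropping = line.lower().startswith(prefix.lower())
        if not dropping:
            kept.append(line)
    return "".join(kept)


if __name__ == "__main__":
    # Path taken from the post; adjust the instance name for your setup.
    path = "/etc/dirsrv/slapd-HSO/dse.ldif"
    with open(path, encoding="utf-8") as handle:
        cleaned = strip_clean_attrs(handle.read())
    # Write next to the original instead of overwriting it, so the result
    # can be diffed and reviewed before it is put in place.
    with open(path + ".cleaned", "w", encoding="utf-8") as handle:
        handle.write(cleaned)
```

Writing to a separate file rather than editing in place is a deliberate choice: a broken dse.ldif can keep the server from starting.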
>> 2. On EACH IPA server, prepare a cleanruv LDIF file containing ALL ghost RIDs (really ALL of them, from all IPA replicas; we had some RIDs only on some replicas...).
>> Example: 
>> 
>> dn: cn=replica,cn=dc\3Dexample,cn=mapping tree,cn=config 
>> changetype: modify 
>> replace: nsds5task 
>> nsds5task:CLEANRUV11 
>> 
>> dn: cn=replica,cn=dc\3Dexample,cn=mapping tree,cn=config 
>> changetype: modify 
>> replace: nsds5task 
>> nsds5task:CLEANRUV22 
>> 
>> dn: cn=replica,cn=dc\3Dexample,cn=mapping tree,cn=config 
>> changetype: modify 
>> replace: nsds5task 
>> nsds5task:CLEANRUV37 
>> ... 
> 
> It should work, but I would prefer to do it in a different order.
> We need to clean a specific RID on all replicas at the same time. We do not need to clean all RIDs at the same time.
> Running several CLEANRUV tasks in parallel for different RIDs should work, but I do not know how well it has been tested that way.
> 
> So I would recommend cleaning RID 11 in parallel on all replicas. Then, when that is completed, RID 22. Then RID 37.
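A minimal sketch of the one-file-per-RID idea, reusing the replica DN and the RIDs from the example above. The helper names are made up, and the cleanruv-<rid>-file.ldif naming is just one possible scheme.

```python
from typing import Dict, Iterable

# Replica entry DN copied from the example in the post; replace the
# escaped suffix (dc\3Dexample) with your own.
REPLICA_DN = r"cn=replica,cn=dc\3Dexample,cn=mapping tree,cn=config"


def cleanruv_ldif(replica_dn: str, rid: int) -> str:
    """Build the LDIF modify operation that requests CLEANRUV for one RID."""
    return (
        f"dn: {replica_dn}\n"
        "changetype: modify\n"
        "replace: nsds5task\n"
        f"nsds5task: CLEANRUV{rid}\n"
    )


def per_rid_files(replica_dn: str, rids: Iterable[int]) -> Dict[str, str]:
    """Map a file name to its LDIF content, one file per RID, in input order."""
    return {
        f"cleanruv-{rid}-file.ldif": cleanruv_ldif(replica_dn, rid)
        for rid in rids
    }


if __name__ == "__main__":
    for name, content in per_rid_files(REPLICA_DN, [11, 22, 37]).items():
        with open(name, "w", encoding="utf-8") as handle:
            handle.write(content)
        print("wrote", name)
```

Each file then holds a single modify operation, so RID 11 can be applied everywhere and checked before RID 22 is touched.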
> 
>> 
>> 3. Run "ldapmodify -h 127.0.0.1 -D "cn=Directory Manager" -W -x -f $your-cleanruv-file.ldif" on all replicas AT THE SAME TIME :) We used terminator for it (https://launchpad.net/terminator). It lets you open multiple shell windows inside one window and send the same commands to all of them at the same time...
> 
> Same remark: I would split your-cleanruv-file.ldif into three files, cleanruv-11-file.ldif, ...
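As an alternative to typing into several terminals at once, here is a rough sketch of "one RID at a time, all replicas in parallel". It rests on several assumptions that are not in the post: the LDAP port of every replica is reachable from one admin host (the post ran ldapmodify against 127.0.0.1 on each replica), the OpenLDAP ldapmodify client is used so that -y can read the bind password from a file in place of the interactive -W, and the per-RID files are named cleanruv-<rid>-file.ldif. Host names and the password file path are placeholders.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def ldapmodify_argv(host: str, ldif_path: str, password_file: str) -> List[str]:
    # -h, -D, -x and -f come from the command in the post. -y replaces the
    # interactive -W and is an assumption (OpenLDAP client tools).
    return [
        "ldapmodify", "-h", host, "-D", "cn=Directory Manager",
        "-x", "-y", password_file, "-f", ldif_path,
    ]


def clean_rids_in_order(
    hosts: Sequence[str],
    rids: Sequence[int],
    password_file: str,
    run: Callable = subprocess.run,
) -> list:
    """For each RID in turn, run ldapmodify against all hosts in parallel."""
    results = []
    for rid in rids:
        ldif_path = f"cleanruv-{rid}-file.ldif"
        argvs = [ldapmodify_argv(h, ldif_path, password_file) for h in hosts]
        with ThreadPoolExecutor(max_workers=max(1, len(argvs))) as pool:
            # list() waits for every host before the next RID is started.
            results.extend(list(pool.map(run, argvs)))
    return results


if __name__ == "__main__":
    # Placeholder host names and password file.
    outcome = clean_rids_in_order(
        ["ipa1.example.com", "ipa2.example.com"], [11, 22, 37], "/root/dm.pw"
    )
    for completed in outcome:
        print(completed.args[2], completed.returncode)
```

A zero exit code from ldapmodify only says the modify was accepted, so it still seems wise to check ipa-replica-manage list-ruv on each replica before moving on to the next RID.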
>> 
>> 4. We re-initialized each IPA server from our first master.
> 
> Do you mean a total init? I do not see a real need for that.
> If you are ready to reinit all replicas, there is a very simple way to get rid of all these ghost RIDs:
> Select the "good" master that has all the updates.
> Do an LDIF export without the replication data.
> Do an LDIF import of the exported file.
> Do an online reinit of the full topology, cascading from the "good" master down to the "consumers".
> Most of the time we try to avoid asking for a full reinit of the topology because the DBs are large.
>> 
>> 5. Restart all replicas.
>> 
>> We are not sure about points 3 and 4. Maybe they are not necessary, but we did them.
>> 
>> If something fails, look for defective LDAP entries across the whole directory. We had some entries with 'nsunique-$HASH' after the 'normal' name. We deleted them.
> Do you mean entries with the 'nsuniqueid' attribute in the RDN? These can be created during replication conflicts, when updates are received in parallel on different replicas.
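If those entries are what Thierry describes, a small filter over a list of DNs (for example from an ldapsearch dump) could flag them. This is only a sketch: it assumes such an entry has "+nsuniqueid=" inside its first RDN, and it treats any comma not preceded by a backslash as an RDN separator, which is a simplification of real DN parsing. The function names are made up.

```python
import re


def first_rdn(dn: str) -> str:
    """Return the leftmost RDN: everything before the first unescaped comma."""
    return re.split(r"(?<!\\),", dn, maxsplit=1)[0]


def looks_like_conflict_dn(dn: str) -> bool:
    """True if the first RDN is multi-valued and carries an nsuniqueid part."""
    return "+nsuniqueid=" in first_rdn(dn).lower()


if __name__ == "__main__":
    # Made-up DNs for illustration only.
    candidates = [
        "uid=jdoe,cn=users,cn=accounts,dc=example",
        "uid=jdoe+nsuniqueid=0a1b2c3d-0a1b2c3d-0a1b2c3d-0a1b2c3d,cn=users,cn=accounts,dc=example",
    ]
    for dn in candidates:
        print(looks_like_conflict_dn(dn), dn)
```

It only flags candidates; whether an entry should be deleted or renamed still needs a look at the entry itself.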
> 
> 
> thanks
> thierry
>> 
>> Best regards,
>> Christoph Kaminski
> 

Looks like I'll be giving this a try. So glad someone else is seeing exactly the same issues.  Hopefully soon we can find the cause.

~J