[Freeipa-users] Antwort: Re: Haunted servers?
Janelle
janellenicole80 at gmail.com
Fri May 29 12:59:47 UTC 2015
>
> On May 29, 2015, at 00:41, thierry bordaz <tbordaz at redhat.com> wrote:
>
>> On 05/29/2015 08:16 AM, Christoph Kaminski wrote:
>> freeipa-users-bounces at redhat.com wrote on 28.05.2015 13:23:26:
>>
>> > From: Alexander Frolushkin <Alexander.Frolushkin at megafon.ru>
>> > To: "'thierry bordaz'" <tbordaz at redhat.com>
>> > Cc: "freeipa-users at redhat.com" <freeipa-users at redhat.com>
>> > Date: 28.05.2015 13:24
>> > Subject: Re: [Freeipa-users] Haunted servers?
>> > Sent by: freeipa-users-bounces at redhat.com
>> >
>> > Unfortunately, after a couple of minutes, the error came back on two
>> > of the three servers in a slightly changed form:
>> > # ipa-replica-manage list-ruv
>> > unable to decode: {replica 16}
>> > ....
>> >
>> > Before cleanruv it looked like:
>> > # ipa-replica-manage list-ruv
>> > unable to decode: {replica 16} 548a8126000000100000 548a8126000000100000
>> > ....
>> >
>> > And one server seems to be fixed completely.
>> >
>> > WBR,
>> > Alexander Frolushkin
>> >
>> >
>>
>> We had the same problem (and some more), and yesterday we successfully cleaned the ghost RIDs.
>>
>> Our fix:
>
> Hi Christoph,
>
> Thanks for sharing this procedure. This bug is difficult to work around, and it is a good idea to write it down.
>
>>
>> 1. Stop all cleanallruv tasks, using ipa-replica-manage abort-clean-ruv if that works. It did not work here, so we did it manually on ALL replicas:
>> a) stop the replica
>> b) delete all nsds5ReplicaClean lines from /etc/dirsrv/slapd-HSO/dse.ldif
>> c) start the replica
> Yes, aborting a clean RUV task hits the same retry issue as cleanallruv. It has been addressed in https://fedorahosted.org/389/ticket/48154
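
For step 1b, a small filter could remove those lines from dse.ldif, including any LDIF continuation lines that belong to them. This is only a minimal sketch: it assumes the attributes to drop all start with the nsds5ReplicaClean prefix named above, that the directory server is stopped, and that dse.ldif has been backed up first. The function name and the sample fragment are hypothetical.

```python
def strip_clean_attrs(ldif_text, prefix="nsds5ReplicaClean"):
    """Return ldif_text without attribute lines whose name starts with prefix."""
    kept = []
    dropping = False
    wanted = prefix.lower()
    for line in ldif_text.splitlines(keepends=True):
        if line.startswith(" "):
            # LDIF continuation line: it shares the fate of the line it continues.
            if not dropping:
                kept.append(line)
            continue
        # Attribute names are case-insensitive, so compare in lower case.
        dropping = line.lower().startswith(wanted)
        if not dropping:
            kept.append(line)
    return "".join(kept)


# Hypothetical dse.ldif fragment, for illustration only (values are placeholders).
sample = (
    "dn: cn=replica,cn=dc\\3Dexample,cn=mapping tree,cn=config\n"
    "nsDS5ReplicaId: 4\n"
    "nsds5ReplicaCleanRUV: <placeholder value>\n"
    " <placeholder continuation>\n"
    "nsDS5ReplicaType: 3\n"
)

print(strip_clean_attrs(sample), end="")
```

The result would be written back to dse.ldif only while the server is down; otherwise the running server may overwrite the file.
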
>> 2. On EACH IPA server, prepare a cleanruv LDIF file containing ALL ghost RIDs (really ALL of them, from all IPA replicas; we had some RIDs that appeared only on some replicas).
>> Example:
>>
>> dn: cn=replica,cn=dc\3Dexample,cn=mapping tree,cn=config
>> changetype: modify
>> replace: nsds5task
>> nsds5task:CLEANRUV11
>>
>> dn: cn=replica,cn=dc\3Dexample,cn=mapping tree,cn=config
>> changetype: modify
>> replace: nsds5task
>> nsds5task:CLEANRUV22
>>
>> dn: cn=replica,cn=dc\3Dexample,cn=mapping tree,cn=config
>> changetype: modify
>> replace: nsds5task
>> nsds5task:CLEANRUV37
>> ...
>
> It should work, but I would prefer to do it in another order.
> We need to clean a specific RID on all replicas at the same time. We do not need to clean all RIDs at the same time.
> Running several CLEANRUV tasks in parallel for different RIDs should work, but I do not know how much it has been tested that way.
>
> So I would recommend cleaning RID 11 in parallel on all replicas. Then, when that has completed, RID 22. Then RID 37.
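
To prepare for that one-RID-at-a-time order, the per-RID files could be generated rather than written by hand. A minimal sketch, assuming the replica DN from the example in this thread (the suffix must be adjusted) and the RIDs 11, 22 and 37 used above; the function names are hypothetical.

```python
# Replica entry DN copied from the example in this thread; adjust the suffix.
REPLICA_DN = r"cn=replica,cn=dc\3Dexample,cn=mapping tree,cn=config"


def cleanruv_ldif(rid, replica_dn=REPLICA_DN):
    """Return an LDIF modify record that sets the CLEANRUV task for one RID."""
    return (
        f"dn: {replica_dn}\n"
        "changetype: modify\n"
        "replace: nsds5task\n"
        f"nsds5task: CLEANRUV{rid}\n"
    )


def cleanruv_files(rids):
    """Map a file name such as cleanruv-11-file.ldif to its LDIF content."""
    return {f"cleanruv-{rid}-file.ldif": cleanruv_ldif(rid) for rid in rids}


# Print each file's content; writing them to disk is left to the caller.
for name, body in cleanruv_files([11, 22, 37]).items():
    print(f"--- {name}")
    print(body)
```

Each file would then be applied on all replicas with the ldapmodify command from step 3, moving to the next RID only once the previous one has completed everywhere.
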
>
>>
>> 3. Run "ldapmodify -h 127.0.0.1 -D "cn=Directory Manager" -W -x -f $your-cleanruv-file.ldif" on all replicas AT THE SAME TIME :) We used terminator for this (https://launchpad.net/terminator). It lets you open multiple shell windows inside one window and send the same commands to all of them at once.
>
> Same remark: I would split your-cleanruv-file.ldif into three files, cleanruv-11-file.ldif,...
>>
>> 4. We re-initialized each IPA server from our first master.
>
> Do you mean a total init? I do not see a real need for that.
> If you are ready to reinit all replicas, there is a very simple way to get rid of all these ghost RIDs:
> - select the "good" master that has all the updates
> - do an LDIF export without the replication data
> - do an LDIF import of the exported file
> - do an online reinit of the full topology, cascading from the "good" master down to the "consumers"
> Most of the time we try to avoid asking for a full reinit of the topology because the DBs are large.
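
The cascading order of such a reinit can be worked out ahead of time with a breadth-first walk over the replication agreements. A minimal sketch, assuming the topology is written down by hand as a mapping from each server to the servers it has agreements with; the host names and the function name are hypothetical, and nothing here talks to a real server.

```python
from collections import deque


def reinit_order(agreements, good_master):
    """Return (supplier, consumer) pairs in breadth-first order from good_master."""
    order = []
    seen = {good_master}
    queue = deque([good_master])
    while queue:
        supplier = queue.popleft()
        for consumer in agreements.get(supplier, []):
            if consumer not in seen:
                # Each server is reinitialized once, from the first supplier reaching it.
                seen.add(consumer)
                order.append((supplier, consumer))
                queue.append(consumer)
    return order


# Hypothetical topology: ipa1 is the "good" master.
topology = {
    "ipa1": ["ipa2", "ipa3"],
    "ipa2": ["ipa1", "ipa4"],
    "ipa3": ["ipa1"],
}

for supplier, consumer in reinit_order(topology, "ipa1"):
    print(f"reinit {consumer} from {supplier}")
```

Every replica appears exactly once as a consumer, and only after the server it is initialized from has itself been refreshed.
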
>>
>> 5. Restart all replicas.
>>
>> We are not sure about points 3 and 4. Maybe they are not necessary, but we did them.
>>
>> If something fails, look for defective entries across the whole LDAP tree. We had some entries with 'nsunique-$HASH' after the 'normal' name, and we deleted them.
> Do you mean entries with the 'nsuniqueid' attribute in the RDN? These can be created during replication conflicts, when updates are received in parallel on different replicas.
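
Given a list of DNs (from an LDIF export, say), such conflict entries can be picked out by looking for nsuniqueid in the first RDN. A minimal sketch, assuming the conflict RDN has the multi-valued form name+nsuniqueid=value; the DNs and function names are hypothetical, and the comma splitting is simplified (it does not handle a comma preceded by an escaped backslash). As far as I know, these entries usually also carry an nsds5ReplConflict attribute, which may be a more reliable thing to search for.

```python
import re

# A comma that is not preceded by a backslash ends the first RDN (simplified).
_UNESCAPED_COMMA = re.compile(r"(?<!\\),")
_CONFLICT_MARK = re.compile(r"\+nsuniqueid=", re.IGNORECASE)


def first_rdn(dn):
    """Return the leftmost RDN of a DN string."""
    match = _UNESCAPED_COMMA.search(dn)
    return dn if match is None else dn[:match.start()]


def is_conflict_dn(dn):
    """True if the first RDN carries an added nsuniqueid component."""
    return bool(_CONFLICT_MARK.search(first_rdn(dn)))


# Hypothetical DNs, for illustration only.
dns = [
    "uid=jdoe,cn=users,cn=accounts,dc=example,dc=com",
    "uid=jdoe+nsuniqueid=aaaa-bbbb,cn=users,cn=accounts,dc=example,dc=com",
]

for dn in dns:
    if is_conflict_dn(dn):
        print("possible conflict entry:", dn)
```

Anything flagged this way should be reviewed by hand before it is deleted, since the conflict entry may hold the copy you actually want to keep.
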
>
>
> thanks
> thierry
>>
>> Best regards
>> Christoph Kaminski
>
Looks like I'll be giving this a try. So glad someone else is seeing exactly the same issues. Hopefully soon we can find the cause.
~J