[Freeipa-users] Haunted servers?

Alexander Frolushkin Alexander.Frolushkin at megafon.ru
Thu May 28 07:33:16 UTC 2015


Hello!
Thank you for this info.

Things seem to be complicated for now...
We have this:
"unable to decode: {replica 16} 548a8126000000100000  548a8126000000100000"
on all of our 17 servers.
After launching cleanallruv, it disappeared from 16 servers, and one server hung (any request addressed to LDAP just froze, including "ipactl status"). After a dirsrv restart (via "systemctl restart ipa") I found
"unable to decode: {replica 16} 548a8126000000100000  548a8126000000100000"
on this server (and only on it). I ran cleanallruv and removed it from this server, but right after that
"unable to decode: {replica 16} 548a8126000000100000  548a8126000000100000"
reappeared on three other servers.
Now I'm waiting for a response from support; they requested dirsrv logs from the hung server and from the servers where the error appeared again.
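
For reference, the two hex strings in the "unable to decode" line are CSNs, and the replica ID is embedded in each of them. The sketch below assumes the usual 389 Directory Server CSN layout of 8 hex digits of timestamp, 4 of sequence number, 4 of replica ID and 4 of sub-sequence number. The names CSN and parse_csn are only illustrative and are not part of any IPA or dirsrv tool.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: datetime  # time of the change (UTC)
    seqnum: int          # sequence number within that second
    rid: int             # ID of the replica that originated the change
    subseqnum: int       # sub-sequence number


def parse_csn(text: str) -> CSN:
    """Split a 20-hex-digit CSN into its four fields (assumed layout)."""
    if len(text) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(text)}")
    seconds = int(text[0:8], 16)
    return CSN(
        timestamp=datetime.fromtimestamp(seconds, tz=timezone.utc),
        seqnum=int(text[8:12], 16),
        rid=int(text[12:16], 16),
        subseqnum=int(text[16:20], 16),
    )


csn = parse_csn("548a8126000000100000")
print(csn.rid, csn.timestamp.isoformat())
```

Under that assumption, the "0010" in the CSN above is replica ID 16 in hex, which matches the "{replica 16}" in the message.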

WBR,
Alexander Frolushkin
Cell +79232508764
Work +79232507764


-----Original Message-----
From: thierry bordaz [mailto:tbordaz at redhat.com]
Sent: Thursday, May 28, 2015 1:24 PM
To: Alexander Frolushkin (SIB)
Cc: freeipa-users at redhat.com; 'Janelle'
Subject: Re: [Freeipa-users] Haunted servers?

Hello Alexander,

Cleanallruv can hang while doing the cleanup (depending on the task options and on whether the replicas are reachable).
Did you try using CLEANRUV? It is a more basic tool, but it should not fail to do the cleanup.

Before using cleanruv, you need to abort all pending cleanallruv tasks.

Then, for each RID that you want to clean, you have to log on to each replica and apply this modification (for example with ldapmodify):
dn: cn=replica,cn=<suffix>,cn=mapping tree,cn=config
changetype: modify
replace: nsds5task
nsds5task:CLEANRUV<RID>
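
When there are several ghost RIDs and many replicas, it may help to generate these records rather than type them by hand. The following is only a sketch: cleanruv_ldif is a hypothetical helper, the DN is the template from above with the suffix placeholder left unfilled, and the RID list is an example. The output would still have to be applied on each replica separately.

```python
def cleanruv_ldif(replica_dn: str, rid: int) -> str:
    """Return an LDIF modify record that starts a CLEANRUV task for one RID."""
    if rid <= 0:
        raise ValueError("replica ID must be a positive integer")
    return (
        f"dn: {replica_dn}\n"
        "changetype: modify\n"
        "replace: nsds5task\n"
        f"nsds5task: CLEANRUV{rid}\n"
    )


# Placeholder values: substitute the real replica entry DN and the ghost RIDs.
replica_dn = "cn=replica,cn=<suffix>,cn=mapping tree,cn=config"
for rid in (16, 22):
    print(cleanruv_ldif(replica_dn, rid))
```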

This task should succeed, but a given RID may be resurrected if a replication session occurs before all the CLEANRUV tasks have completed.
So we may have to run CLEANRUV a second time.
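
To illustrate why a second pass can be needed, here is a deliberately simplified toy model. It is not how the replication protocol is implemented: each replica's RUV is reduced to a set of RIDs, and a replication session is assumed to simply merge the supplier's RIDs into the consumer's. All names are made up for the illustration.

```python
def clean_ruv(ruv: set, rid: int) -> None:
    """Toy CLEANRUV: drop one RID from a single replica's RUV."""
    ruv.discard(rid)


def replicate(supplier: set, consumer: set) -> None:
    """Toy replication session: the consumer learns every RID the supplier knows."""
    consumer.update(supplier)


# Two replicas that both still carry ghost RID 16.
ruv_a = {1, 2, 16}
ruv_b = {1, 2, 16}

clean_ruv(ruv_a, 16)                # replica A is cleaned first ...
replicate(ruv_b, ruv_a)             # ... but B replicates to A before B is cleaned
resurrected = 16 in ruv_a           # the ghost RID is back on A

clean_ruv(ruv_b, 16)                # finish the first pass on B
clean_ruv(ruv_a, 16)                # second pass on A
replicate(ruv_b, ruv_a)             # later sessions no longer bring it back
print(resurrected, 16 in ruv_a, 16 in ruv_b)
```

In this model the RID only stays gone once every replica has been cleaned with no replication session in between, which is why the cleanup may have to be repeated.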

thanks
thierry

On 05/27/2015 11:06 AM, Alexander Frolushkin wrote:
> For general information - we also have a "ghost" replica id:
> unable to decode: {replica 16} 548a8126000000100000 548a8126000000100000
> and we are trying to get rid of it with the help of Red Hat support, but at this point - no luck...
>
> WBR,
> Alexander Frolushkin
>
> -----Original Message-----
> From: freeipa-users-bounces at redhat.com
> [mailto:freeipa-users-bounces at redhat.com] On Behalf Of Janelle
> Sent: Tuesday, May 26, 2015 8:56 PM
> To: thierry bordaz; Martin Kosek
> Cc: freeipa-users at redhat.com
> Subject: Re: [Freeipa-users] Haunted servers?
>
> On 5/26/15 7:04 AM, thierry bordaz wrote:
>> On 05/26/2015 08:47 AM, Martin Kosek wrote:
>>> On 05/26/2015 12:20 AM, Janelle wrote:
>>>> On 5/24/15 3:12 AM, Janelle wrote:
>>>>> And just like that, my haunted servers have all returned.
>>>>> I am going to just put a gun to my head and be done with it. :-(
>>>>>
>>>>> Why do things run perfectly and then suddenly ???
>>>>> Logs show little to nothing, mostly because the servers are so
>>>>> busy, they have already rotated out.
>>>>>
>>>>> unable to decode  {replica 16} 55356472000300100000 55356472000300100000
>>>>> unable to decode  {replica 22} 55371e9e000000160000 553eec64000400160000
>>>>> unable to decode  {replica 23} 5545d61f000200170000 55543240000300170000
>>>>> unable to decode  {replica 24} 554d53d3000000180000 554d54a4000200180000
>>>>> unable to decode  {replica 25} 554d78bf000000190000 555af302000400190000
>>>>> unable to decode  {replica 9} 55402c39000300090000 55402c39000300090000
>>>>>
>>>>> Don't know what to do anymore. At my wit's end..
>>>>>
>>>>> ~J
>>>> So things are getting more interesting.  Still trying to find the
>>>> "leaking server(s)".  here is what I mean by that. As you see, I
>>>> continue to find these
>>>> -- BUT, notice a new symptom -- replica 9 does NOT show any other
>>>> data - it is blank?
>>> Hello Janelle,
>>>
>>> Thanks for update. So you worry that there might still be the "rogue
>>> IPA replica" that would be injecting the wrong replica data?
>>>
>>> In any case, I bet Ludwig and Thierry will follow up with your
>>> thread, there is just delay caused by the various public holidays
>>> and PTOs this week and we need to rest before digging into the fun
>>> with RUVs - as you already know yourself :-)
>>>
>>>> unable to decode  {replica 16} 55356472000300100000 55356472000300100000
>>>> unable to decode  {replica 22} 55371e9e000000160000 553eec64000400160000
>>>> unable to decode  {replica 24} 554d53d3000100180000 554d54a4000200180000
>>>> unable to decode  {replica 25} 554d78bf000200190000 555af302000400190000
>>>> unable to decode  {replica 9}
>>>>
>>>> Now, if I delete these from a server using the ldapmodify method -
>>>> they go away briefly, but then if I restart the server, they come
>>>> back.
>>>>
>>>> Let me try to explain -- given a number of servers, say 8, if I
>>>> use ldapmodify to delete from 1 of those, they seem to go away
>>>> from maybe 4 of them -- but if
>>>> I wait a few minutes, it is almost as though "replication" is
>>>> re-adding these bad replicas from the servers that I have NOT
>>>> deleted them from.
>> On each replica (master/replica) there is one RUV in the database
>> and one RUV in the changelog.
>> When cleanallruv succeeds, it clears both of them. All replicas should
>> be reachable when you issue cleanallruv, so that it can clean the
>> RUVs on all the replicas in almost a "single"
>> operation. If some replicas are not reachable, they keep information
>> about the cleaned RID and can later propagate those "old" RIDs
>> to the rest of the replicas.
>>
>> Ludwig managed to reproduce the issue with a quite complex test case
>> (3 replicas and multiple cleanallruv).
>> We have not yet identified how a cleaned replicaId can get
>> resurrected.
>> In parallel, we just reproduced it in a 2-replica topology, but
>> without a clear test case.
>>
>>
>>>> So my question is simple - is there something in the logs I can
>>>> look for that would indicate the SOURCE of these bogus entries?  Is
>>>> the replica 9 with NO extra data any indication of something I
>>>> could look for?
>> I guess that if I had the answer to your question, we would have
>> understood the bug ..
>>
>>
> A little more information to go on:
>
> I changed my password on a master (actually, the original master) and was able to login to each replica within a few seconds with the new password. This tells me replication is working across all the servers.
> I also created a new account and it showed up on all the servers, again within 15-20 seconds.  This tells me replication is working just fine.
>
> I don't understand why the cleanallruv does not process across all the servers the same way. Baffling indeed.
>
> Perhaps the most important question -- do these bogus entries actually cause a problem? I mean, they don't seem to. What if I just ignored them?
>
> ~J
>
> --
> Manage your subscription for the Freeipa-users mailing list:
> https://www.redhat.com/mailman/listinfo/freeipa-users
> Go to http://freeipa.org for more info on the project
>
> ________________________________
>
> The information contained in this communication is intended solely for the use of the individual or entity to whom it is addressed and others authorized to receive it. It may contain confidential or legally privileged information. The contents may not be disclosed or used by anyone other than the addressee. If you are not the intended recipient(s), any use, disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it is prohibited and may be unlawful. If you have received this communication in error please notify us immediately by responding to this email and then delete the e-mail and all attachments and any copies thereof.
>
> (c)20mf50
>





