[Freeipa-users] Haunted servers?

Janelle janellenicole80 at gmail.com
Thu May 28 01:26:49 UTC 2015


On 5/26/15 7:04 AM, thierry bordaz wrote:
> On 05/26/2015 08:47 AM, Martin Kosek wrote:
>> On 05/26/2015 12:20 AM, Janelle wrote:
>>> On 5/24/15 3:12 AM, Janelle wrote:
>>>> And just like that, my haunted servers have all returned.
>>>> I am going to just put a gun to my head and be done with it. :-(
>>>>
>>>> Why do things run perfectly and then suddenly ???
>>>> Logs show little to nothing, mostly because the servers are so busy
>>>> that the logs have already rotated out.
>>>>
>>>> unable to decode  {replica 16} 55356472000300100000 55356472000300100000
>>>> unable to decode  {replica 22} 55371e9e000000160000 553eec64000400160000
>>>> unable to decode  {replica 23} 5545d61f000200170000 55543240000300170000
>>>> unable to decode  {replica 24} 554d53d3000000180000 554d54a4000200180000
>>>> unable to decode  {replica 25} 554d78bf000000190000 555af302000400190000
>>>> unable to decode  {replica 9} 55402c39000300090000 55402c39000300090000
>>>>
>>>> Don't know what to do anymore. At my wit's end..
>>>>
>>>> ~J
>>> So things are getting more interesting. I am still trying to find the
>>> "leaking server(s)". Here is what I mean by that: as you can see, I
>>> continue to find these, BUT notice a new symptom: replica 9 does NOT
>>> show any other data. Its entry is blank.
>>
>> Hello Janelle,
>>
>> Thanks for the update. So you worry that there might still be a "rogue
>> IPA replica" injecting the wrong replica data?
>>
>> In any case, I bet Ludwig and Thierry will follow up on your thread;
>> there is just a delay caused by the various public holidays and PTOs
>> this week, and we need to rest before digging into the fun with RUVs,
>> as you already know yourself :-)
>>
>>> unable to decode  {replica 16} 55356472000300100000 55356472000300100000
>>> unable to decode  {replica 22} 55371e9e000000160000 553eec64000400160000
>>> unable to decode  {replica 24} 554d53d3000100180000 554d54a4000200180000
>>> unable to decode  {replica 25} 554d78bf000200190000 555af302000400190000
>>> unable to decode  {replica 9}
>>>
>>> Now, if I delete these from a server using the ldapmodify method, they
>>> go away briefly, but if I restart the server, they come back.
>>>
>>> Let me try to explain: given a number of servers, say 8, if I use
>>> ldapmodify to delete them from 1 of those, they seem to go away from
>>> maybe 4 of them, but if I wait a few minutes, it is almost as though
>>> replication is re-adding these bad replicas from the servers that I
>>> have NOT deleted them from.
>
> On each replica (master or replica) there is one RUV in the database and
> one RUV in the changelog. When cleanallruv succeeds, it clears both of
> them. All replicas should be reachable when you issue cleanallruv, so
> that it can clean the RUVs on all the replicas in an almost "single"
> operation. If some replicas are not reachable, they keep information
> about the cleaned RID and can later propagate that "old" RID to the
> rest of the replicas.
>
> Ludwig managed to reproduce the issue with a fairly complex test case
> (3 replicas and multiple cleanallruv runs).
> We have not yet identified how a cleaned replicaId can get resurrected.
> In parallel, we just reproduced it, without a clear test case, in a
> 2-replica topology.
>
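
The values in the "unable to decode" lines quoted above are pairs of CSNs (change sequence numbers), presumably the minimum and maximum CSN recorded for each RID. The sketch below decodes them, assuming the usual 389 Directory Server CSN layout: 8 hex digits of timestamp, then 4 hex digits each for sequence number, replica ID and sub-sequence number. The helper names `parse_csn` and `parse_ruv_line` are hypothetical. The replica ID embedded in each CSN agrees with its `{replica N}` label (`0010` is 16, `0016` is 22), which supports that reading of the format.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CSN:
    """One change sequence number, assuming the layout
    <timestamp:8 hex><seqnum:4 hex><rid:4 hex><subseq:4 hex>."""
    timestamp: int  # seconds since the Unix epoch
    seqnum: int     # orders changes made within the same second
    rid: int        # replica ID of the server that generated the change
    subseq: int     # sub-sequence number, usually 0

    def as_datetime(self) -> datetime:
        return datetime.fromtimestamp(self.timestamp, tz=timezone.utc)


def parse_csn(text: str) -> CSN:
    if len(text) != 20:
        raise ValueError(f"expected 20 hex digits, got {text!r}")
    return CSN(
        timestamp=int(text[0:8], 16),
        seqnum=int(text[8:12], 16),
        rid=int(text[12:16], 16),
        subseq=int(text[16:20], 16),
    )


# Matches "{replica N}" followed by zero or more CSN tokens.
RUV_LINE = re.compile(r"\{replica (\d+)\}(.*)")


def parse_ruv_line(line: str) -> tuple[int, list[CSN]]:
    """Return (replica ID, CSNs); the list is empty for a blank
    element such as the "{replica 9}" line quoted above."""
    match = RUV_LINE.search(line)
    if match is None:
        raise ValueError(f"no '{{replica N}}' marker in {line!r}")
    rid = int(match.group(1))
    csns = [parse_csn(token) for token in match.group(2).split()]
    for csn in csns:
        # The RID embedded in each CSN should agree with the element's label.
        if csn.rid != rid:
            raise ValueError(f"{csn} belongs to replica {csn.rid}, not {rid}")
    return rid, csns


if __name__ == "__main__":
    lines = [
        "unable to decode  {replica 16} 55356472000300100000 55356472000300100000",
        "unable to decode  {replica 22} 55371e9e000000160000 553eec64000400160000",
        "unable to decode  {replica 9}",
    ]
    for line in lines:
        rid, csns = parse_ruv_line(line)
        span = " .. ".join(c.as_datetime().isoformat() for c in csns) or "(no CSNs)"
        print(f"replica {rid}: {span}")
```

Decoding the timestamps shows, for example, that both CSNs for replica 16 are identical and date from 20 April 2015 (UTC), which can help tell long-dead RIDs apart from ones that are still generating changes.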

After spending well over 2 days trying to clean things, I am now here:

CLEANALLRUV tasks
RID 16  Not all replicas finished cleaning, retrying in 14400 seconds
RID 19  None
RID 22  None
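
When there are many RIDs, it can help to read a task listing like this one programmatically. The sketch below assumes the line shape seen in this excerpt ("RID <n>", whitespace, then a status, possibly with a colon after the number); the names `parse_clean_tasks` and `CleanTask` are hypothetical. It records the literal `None` shown for RIDs 19 and 22 as a missing status rather than guessing what that state means.

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional

# Assumed line shape, taken from the excerpt above: "RID <n>  <status>",
# optionally with a colon after the number.
TASK_LINE = re.compile(r"^RID\s+(\d+):?\s+(.*\S)\s*$")
RETRY = re.compile(r"retrying in (\d+) seconds")


class CleanTask(NamedTuple):
    rid: int
    status: Optional[str]         # None when the listing prints the literal "None"
    retry_seconds: Optional[int]  # set only when the status reports a retry delay


def parse_clean_tasks(listing: str) -> list[CleanTask]:
    tasks = []
    for line in listing.splitlines():
        match = TASK_LINE.match(line.strip())
        if match is None:
            continue  # header ("CLEANALLRUV tasks") or blank line
        rid, raw = int(match.group(1)), match.group(2)
        retry = RETRY.search(raw)
        tasks.append(CleanTask(
            rid=rid,
            status=None if raw == "None" else raw,
            retry_seconds=int(retry.group(1)) if retry else None,
        ))
    return tasks


if __name__ == "__main__":
    listing = """CLEANALLRUV tasks
RID 16  Not all replicas finished cleaning, retrying in 14400 seconds
RID 19  None
RID 22  None
"""
    for task in parse_clean_tasks(listing):
        print(task)
```

For RID 16, 14400 seconds is 4 hours (14400 / 3600), so each further attempt to reach the replicas that have not finished cleaning happens only a few times a day.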

What is going on here? All the same data still exists as shown above in 
the original thread, but I seem to be stuck. I know I am not the only 
person having replica issues. Is there anything else I can try?
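
The failure mode Thierry describes earlier in the thread (a replica that was unreachable during cleanallruv still holds the cleaned RID and later hands it back) can be shown with a deliberately simplified toy model. This is not how 389 DS actually replicates: the `Replica` class, `clean_all_ruv` and `replicate` are hypothetical, each RUV is reduced to a set of RIDs, and a replication session is modeled as a plain set union. It only illustrates why a RID that looks cleaned on some servers can reappear.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica: tracks only which replica IDs (RIDs) its two RUVs mention."""
    name: str
    reachable: bool = True
    db_ruv: set[int] = field(default_factory=set)         # RUV stored in the database
    changelog_ruv: set[int] = field(default_factory=set)  # RUV stored in the changelog


def clean_all_ruv(replicas: list[Replica], rid: int) -> list[str]:
    """Strip `rid` from both RUVs on every reachable replica.
    Returns the names of replicas that could not be cleaned."""
    missed = []
    for replica in replicas:
        if replica.reachable:
            replica.db_ruv.discard(rid)
            replica.changelog_ruv.discard(rid)
        else:
            missed.append(replica.name)
    return missed


def replicate(supplier: Replica, consumer: Replica) -> None:
    """Toy replication session: the consumer picks up every RID the supplier
    still knows about (real replication compares CSNs; this does not)."""
    consumer.db_ruv |= supplier.db_ruv
    consumer.changelog_ruv |= supplier.changelog_ruv


def scenario() -> tuple[list[str], bool, bool]:
    rids = {3, 4, 16}
    a, b, c = (Replica(n, db_ruv=set(rids), changelog_ruv=set(rids)) for n in "ABC")
    c.reachable = False                      # C is down while the task runs
    missed = clean_all_ruv([a, b, c], rid=16)
    cleaned_on_a = 16 not in a.db_ruv        # A looks clean right after the task
    c.reachable = True                       # C comes back with RID 16 intact...
    replicate(c, a)                          # ...and hands it to A again
    resurrected_on_a = 16 in a.db_ruv
    return missed, cleaned_on_a, resurrected_on_a


if __name__ == "__main__":
    print(scenario())
```

In this model the only way to keep RID 16 gone is to clean it on every replica, including C, before C replicates again, which matches Thierry's point that all replicas should be reachable when cleanallruv is issued.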

~J



