[Freeipa-users] Haunted servers?

Janelle janellenicole80 at gmail.com
Tue May 26 14:56:16 UTC 2015


On 5/26/15 7:04 AM, thierry bordaz wrote:
> On 05/26/2015 08:47 AM, Martin Kosek wrote:
>> On 05/26/2015 12:20 AM, Janelle wrote:
>>> On 5/24/15 3:12 AM, Janelle wrote:
>>>> And just like that, my haunted servers have all returned.
>>>> I am going to just put a gun to my head and be done with it. :-(
>>>>
>>>> Why do things run perfectly and then suddenly???
>>>> Logs show little to nothing, mostly because the servers are so busy
>>>> that the logs have already rotated out.
>>>>
>>>> unable to decode  {replica 16} 55356472000300100000 55356472000300100000
>>>> unable to decode  {replica 22} 55371e9e000000160000 553eec64000400160000
>>>> unable to decode  {replica 23} 5545d61f000200170000 55543240000300170000
>>>> unable to decode  {replica 24} 554d53d3000000180000 554d54a4000200180000
>>>> unable to decode  {replica 25} 554d78bf000000190000 555af302000400190000
>>>> unable to decode  {replica 9} 55402c39000300090000 55402c39000300090000
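These "unable to decode" lines look like RUV elements of the form `{replica RID} minCSN maxCSN`, presumably unparseable because they lack the `ldap://host:port` URL a normal element carries. A 389 DS CSN is 20 hex digits: an 8-digit timestamp (seconds since the epoch), a 4-digit sequence number, a 4-digit replica ID and a 4-digit sub-sequence number. The data above bears this out: replica 16's CSNs contain `0010` (16) in the RID position, replica 22's contain `0016` (22). The sketch below decodes them so you can see when each stale RID last originated a change. The line format is assumed from the output quoted here, and `decode_csn` and `parse_ruv_line` are hypothetical helpers.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple, Optional


class CSN(NamedTuple):
    timestamp: int   # seconds since the Unix epoch
    seqnum: int      # sequence number within that second
    rid: int         # replica ID that originated the change
    subseqnum: int   # sub-sequence number

    @property
    def utc(self) -> datetime:
        return datetime.fromtimestamp(self.timestamp, tz=timezone.utc)


def decode_csn(text: str) -> CSN:
    """Split a 20-hex-digit CSN into its four fields."""
    if not re.fullmatch(r"[0-9a-fA-F]{20}", text):
        raise ValueError(f"not a CSN: {text!r}")
    return CSN(int(text[0:8], 16), int(text[8:12], 16),
               int(text[12:16], 16), int(text[16:20], 16))


# Assumed line shape, taken from the output in this thread:
#   "unable to decode  {replica 16} <minCSN> <maxCSN>"  (the CSNs may be absent)
RUV_LINE = re.compile(
    r"\{replica (\d+)\}\s*(?:([0-9a-fA-F]{20})\s+([0-9a-fA-F]{20}))?\s*$")


def parse_ruv_line(line: str) -> tuple[int, Optional[CSN], Optional[CSN]]:
    m = RUV_LINE.search(line)
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    rid = int(m.group(1))
    if m.group(2) is None:          # e.g. "{replica 9}" with no CSNs at all
        return rid, None, None
    return rid, decode_csn(m.group(2)), decode_csn(m.group(3))


for line in [
    "unable to decode  {replica 16} 55356472000300100000 55356472000300100000",
    "unable to decode  {replica 9}",
]:
    rid, min_csn, max_csn = parse_ruv_line(line)
    if max_csn is None:
        print(f"RID {rid}: no CSNs recorded")
    else:
        print(f"RID {rid}: max CSN from RID {max_csn.rid} "
              f"at {max_csn.utc:%Y-%m-%d %H:%M:%S} UTC")
```

The max CSN timestamp only says when that RID last originated a change. It does not say which server is re-injecting the element, but it can help tell long-dead RIDs from recently active ones.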
>>>>
>>>> Don't know what to do anymore. At my wit's end...
>>>>
>>>> ~J
>>> So things are getting more interesting. I am still trying to find the
>>> "leaking server(s)". Here is what I mean by that. As you can see, I
>>> continue to find these, BUT notice a new symptom: replica 9 does NOT show
>>> any other data. It is blank?
>>
>> Hello Janelle,
>>
>> Thanks for the update. So you are worried that there might still be a
>> "rogue IPA replica" injecting the wrong replica data?
>>
>> In any case, I bet Ludwig and Thierry will follow up on your thread;
>> there is just a delay caused by the various public holidays and PTOs
>> this week, and we need to rest before digging into the fun with RUVs,
>> as you already know yourself :-)
>>
>>> unable to decode  {replica 16} 55356472000300100000 55356472000300100000
>>> unable to decode  {replica 22} 55371e9e000000160000 553eec64000400160000
>>> unable to decode  {replica 24} 554d53d3000100180000 554d54a4000200180000
>>> unable to decode  {replica 25} 554d78bf000200190000 555af302000400190000
>>> unable to decode  {replica 9}
>>>
>>> Now, if I delete these from a server using the ldapmodify method, they go
>>> away briefly, but if I then restart the server, they come back.
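If the "ldapmodify method" here means adding a CLEANALLRUV task entry by hand, it helps to see what that entry looks like. In 389 Directory Server the task is created as an entry under `cn=cleanallruv,cn=tasks,cn=config`, and FreeIPA wraps the same operation as `ipa-replica-manage clean-ruv`. The sketch below only builds the LDIF text. `cleanallruv_task_ldif` is a hypothetical helper, the suffix `dc=example,dc=com` is a placeholder, and the attribute set is the commonly documented one; check it against the documentation for your 389 DS version before using it.

```python
def cleanallruv_task_ldif(replica_id: int, suffix: str) -> str:
    """Return an LDIF record for a CLEANALLRUV task entry (hypothetical helper).

    The record is meant to be added with `ldapmodify -a` (or ldapadd) by a user
    with sufficient privileges, typically Directory Manager.
    """
    # 389 DS replica IDs for writable masters lie in 1..65534.
    if not 1 <= replica_id <= 65534:
        raise ValueError(f"replica ID out of range: {replica_id}")
    cn = f"clean {replica_id}"
    return "\n".join([
        f"dn: cn={cn},cn=cleanallruv,cn=tasks,cn=config",
        "objectclass: extensibleObject",
        f"cn: {cn}",
        f"replica-base-dn: {suffix}",   # the replicated suffix (placeholder)
        f"replica-id: {replica_id}",
    ]) + "\n"


# One record per RID reported in this thread; records end with "\n", so joining
# with "\n" leaves the blank line LDIF uses between records.
stale_rids = [16, 22, 23, 24, 25, 9]
print("\n".join(cleanallruv_task_ldif(rid, "dc=example,dc=com") for rid in stale_rids))
```

Unlike a per-server RUV edit, the task is meant to reach every replica, which is why unreachable replicas matter so much.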
>>>
>>> Let me try to explain: given a number of servers, say 8, if I use
>>> ldapmodify to delete them from one of those, they seem to go away from
>>> maybe 4 of them. But if I wait a few minutes, it is almost as though
>>> "replication" is re-adding these bad replicas from the servers that I have
>>> NOT deleted them from.
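One way to narrow down which servers are re-seeding a RID is to collect the RUV listing from every server and invert it into RID → hosts. The sketch assumes each server's listing has already been saved as text in a dict keyed by hostname; `rids_by_host`, `report`, and the host names are hypothetical.

```python
import re
from collections import defaultdict

# Matches "{replica 16}" as well as "{replica 4 ldap://host:389}".
RID_PATTERN = re.compile(r"\{replica (\d+)\b")


def rids_by_host(listings: dict[str, str]) -> dict[int, set[str]]:
    """Invert {host: RUV listing text} into {RID: hosts whose listing mentions it}."""
    holders: dict[int, set[str]] = defaultdict(set)
    for host, text in listings.items():
        for match in RID_PATTERN.finditer(text):
            holders[int(match.group(1))].add(host)
    return dict(holders)


def report(listings: dict[str, str], stale_rids: set[int]) -> list[str]:
    """One line per stale RID, naming every host that still carries it."""
    holders = rids_by_host(listings)
    lines = []
    for rid in sorted(stale_rids):
        hosts = sorted(holders.get(rid, set()))
        where = ", ".join(hosts) if hosts else "not seen on any host"
        lines.append(f"RID {rid}: {where}")
    return lines


listings = {   # hypothetical hosts and saved output
    "ipa1.example.com": "unable to decode  {replica 16} 55356472000300100000 55356472000300100000",
    "ipa2.example.com": "unable to decode  {replica 9}",
}
print("\n".join(report(listings, {9, 16, 22})))
```

Taking snapshots right after a clean and again a few minutes later would show which hosts still carry a RID when it starts reappearing elsewhere.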
>
> On each replica (master or replica) there is one RUV in the database and
> one RUV in the changelog. When cleanallruv succeeds, it clears both of
> them. All replicas should be reachable when you issue cleanallruv, so that
> it can clean the RUVs on all the replicas in an almost "single"
> operation. If some replicas are not reachable, they keep information about
> the cleaned RID and can later propagate that "old" RID to the rest of the
> replicas.
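The failure mode described above can be illustrated with a deliberately simplified model: each replica holds a set of RIDs, a clean removes a RID only from replicas that are reachable at that moment, and a replication session leaves both sides with the union of what they know. This is a toy model for intuition under those assumptions, not how 389 DS actually implements CLEANALLRUV; `Replica`, `clean_rid`, and `replicate` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    reachable: bool = True
    ruv: set[int] = field(default_factory=set)   # RIDs this replica knows about


def clean_rid(replicas: list[Replica], rid: int) -> None:
    """Toy clean: drop the RID only on replicas reachable right now."""
    for r in replicas:
        if r.reachable:
            r.ruv.discard(rid)


def replicate(a: Replica, b: Replica) -> None:
    """Toy replication session: both sides end up with the union of RIDs."""
    merged = a.ruv | b.ruv
    a.ruv, b.ruv = set(merged), set(merged)


# Three replicas all know RID 16; C is offline while the clean runs.
a, b, c = (Replica(n, ruv={4, 5, 16}) for n in "ABC")
c.reachable = False
clean_rid([a, b, c], 16)
print([sorted(r.ruv) for r in (a, b, c)])   # RID 16 survives only on C

c.reachable = True
replicate(c, a)   # C comes back and talks to A...
replicate(a, b)   # ...and A passes the stale RID on to B
print([sorted(r.ruv) for r in (a, b, c)])   # RID 16 is back everywhere
```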
>
> Ludwig managed to reproduce the issue with a fairly complex test case
> (3 replicas and multiple cleanallruv runs). We have not yet identified how
> a cleaned replica ID can get resurrected. In parallel, we just reproduced
> it in a 2-replica topology, though without a clear test case.
>
>
>>>
>>> So my question is simple: is there something in the logs I can look for
>>> that would indicate the SOURCE of these bogus entries? Is replica 9 with
>>> NO extra data an indication of something I could look for?
>
> I guess that if I had the answer to your question, we would have
> understood the bug.
>
>
A little more information to go on:

I changed my password on a master (actually, the original master) and
was able to log in to each replica within a few seconds with the new
password. This tells me replication is working across all the servers.
I also created a new account, and it showed up on all the servers, again
within 15-20 seconds. So replication itself is working just fine.

I don't understand why the cleanallruv does not process across all the 
servers the same way. Baffling indeed.

Perhaps the most important question: do these bogus entries actually
cause a problem? They don't seem to. What if I just ignored them?

~J
