[Freeipa-users] Haunted servers?
Janelle
janellenicole80 at gmail.com
Tue May 26 14:56:16 UTC 2015
On 5/26/15 7:04 AM, thierry bordaz wrote:
> On 05/26/2015 08:47 AM, Martin Kosek wrote:
>> On 05/26/2015 12:20 AM, Janelle wrote:
>>> On 5/24/15 3:12 AM, Janelle wrote:
>>>> And just like that, my haunted servers have all returned.
>>>> I am going to just put a gun to my head and be done with it. :-(
>>>>
>>>> Why do things run perfectly and then suddenly ???
>>>> Logs show little to nothing, mostly because the servers are so
>>>> busy that they have already rotated out.
>>>>
>>>> unable to decode {replica 16} 55356472000300100000 55356472000300100000
>>>> unable to decode {replica 22} 55371e9e000000160000 553eec64000400160000
>>>> unable to decode {replica 23} 5545d61f000200170000 55543240000300170000
>>>> unable to decode {replica 24} 554d53d3000000180000 554d54a4000200180000
>>>> unable to decode {replica 25} 554d78bf000000190000 555af302000400190000
>>>> unable to decode {replica 9} 55402c39000300090000 55402c39000300090000
>>>>
>>>> Don't know what to do anymore. At my wit's end..
>>>>
>>>> ~J
>>> So things are getting more interesting. Still trying to find the
>>> "leaking server(s)". Here is what I mean by that. As you see, I
>>> continue to find these -- BUT, notice a new symptom -- replica 9
>>> does NOT show any other data - it is blank?
>>
>> Hello Janelle,
>>
>> Thanks for the update. So you worry that there might still be a
>> "rogue IPA replica" that would be injecting the wrong replica data?
>>
>> In any case, I bet Ludwig and Thierry will follow up on your
>> thread. There is just a delay caused by the various public holidays
>> and PTOs this week, and we need to rest before digging into the fun
>> with RUVs - as you already know yourself :-)
>>
>>> unable to decode {replica 16} 55356472000300100000 55356472000300100000
>>> unable to decode {replica 22} 55371e9e000000160000 553eec64000400160000
>>> unable to decode {replica 24} 554d53d3000100180000 554d54a4000200180000
>>> unable to decode {replica 25} 554d78bf000200190000 555af302000400190000
>>> unable to decode {replica 9}
>>>
>>> Now, if I delete these from a server using the ldapmodify method -
>>> they go away briefly, but then if I restart the server, they come
>>> back.
>>>
>>> Let me try to explain -- given a number of servers, say 8, if I use
>>> ldapmodify to delete from 1 of those, they seem to go away from
>>> maybe 4 of them -- but if I wait a few minutes, it is almost as
>>> though "replication" is re-adding these bad replicas from the
>>> servers that I have NOT deleted them from.
>
> On each replica (master/replica) there is one RUV in the database and
> one RUV in the changelog.
> When cleanallruv succeeds it clears both of them. All replicas should
> be reachable when you issue cleanallruv, so that it can clean the RUVs
> on all the replicas in almost a "single" operation. If some replicas
> are not reachable, they keep information about the cleaned RID and
> can later propagate those "old" RIDs to the rest of the replicas.
>
> Ludwig managed to reproduce the issue with a quite complex test case
> (3 replicas and multiple cleanallruv).
> We have not yet identified how a cleaned replicaId can get
> resurrected.
> In parallel we just reproduced it without a clear test case, but in a
> 2 replica topology.
>
>
>>>
>>> So my question is simple - is there something in the logs I can
>>> look for that would indicate the SOURCE of these bogus entries? Is
>>> the replica 9 with NO extra data any indication of something I
>>> could look for?
>
> I guess that if I had the answer to your question, we would have
> understood the bug ..
>
>
A little more information to go on:
I changed my password on a master (actually, the original master) and
was able to log in to each replica within a few seconds with the new
password. This tells me replication is working across all the servers.
I also created a new account and it showed up on all the servers, again
within 15-20 seconds. This tells me replication is working just fine.
I don't understand why the cleanallruv does not process across all the
servers the same way. Baffling indeed.
Perhaps the most important question -- do these bogus entries actually
cause a problem? They don't seem to. What if I just ignored them?
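
For anyone trying to read the "unable to decode" lines above: the two
hex strings after each replica ID look like 389 Directory Server CSNs.
Below is a minimal sketch for pulling them apart, assuming the usual
20-hex-digit layout (8 digits of Unix timestamp, 4 of sequence number,
4 of replica ID, 4 of sub-sequence number). The names parse_csn and CSN
are made up for illustration and are not part of any IPA or 389 DS
tool.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    """Fields of a change sequence number (hypothetical helper type)."""
    timestamp: datetime
    seqnum: int
    replica_id: int
    subseqnum: int


def parse_csn(csn: str) -> CSN:
    """Split a 20-hex-digit CSN string into its four fields.

    Assumed layout: 8 hex digits timestamp, 4 sequence number,
    4 replica ID, 4 sub-sequence number.
    """
    if len(csn) != 20:
        raise ValueError(f"expected 20 hex digits, got {len(csn)}: {csn!r}")
    return CSN(
        timestamp=datetime.fromtimestamp(int(csn[0:8], 16), tz=timezone.utc),
        seqnum=int(csn[8:12], 16),
        replica_id=int(csn[12:16], 16),
        subseqnum=int(csn[16:20], 16),
    )


# The min/max pair reported for {replica 22} in this thread.
for raw in ("55371e9e000000160000", "553eec64000400160000"):
    csn = parse_csn(raw)
    print(raw, "rid", csn.replica_id, "seq", csn.seqnum,
          csn.timestamp.isoformat())
```

If that layout holds, the replica ID embedded in each CSN matches the
ID in braces (0x0016 is 22), and the timestamps show roughly when each
of those replica IDs last originated a change.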
~J