[Freeipa-devel] [PATCH] 1050 prevent replica orphans

Martin Kosek mkosek at redhat.com
Mon Sep 17 15:59:20 UTC 2012


On 09/17/2012 04:06 PM, Martin Kosek wrote:
> On 09/14/2012 09:16 PM, Rob Crittenden wrote:
>> Martin Kosek wrote:
>>> On 09/10/2012 08:34 PM, Rob Crittenden wrote:
>>>> Martin Kosek wrote:
>>>>> On Thu, 2012-09-06 at 17:22 -0400, Rob Crittenden wrote:
>>>>>> Martin Kosek wrote:
>>>>>>> On 08/31/2012 07:40 PM, Rob Crittenden wrote:
>>>>>>>> Rob Crittenden wrote:
>>>>>>>>> It was possible use ipa-replica-manage connect/disconnect/del to end up
>>>>>>>>> orphaning or or more IPA masters. This is an attempt to catch and
>>>>>>>>> prevent that case.
>>>>>>>>>
>>>>>>>>> I tested with this topology, trying to delete B.
>>>>>>>>>
>>>>>>>>> A <-> B <-> C
>>>>>>>>>
>>>>>>>>> I got here by creating B and C from A, connecting B to C then deleting
>>>>>>>>> the link from A to B, so it went from A -> B and A -> C to the above.
>>>>>>>>>
>>>>>>>>> What I do is look up the servers that the delete candidate host has
>>>>>>>>> connections to and see if we're the last link.
>>>>>>>>>
>>>>>>>>> I added an escape clause if there are only two masters.
>>>>>>>>>
>>>>>>>>> rob
>>>>>>>>
>>>>>>>> Oh, this relies on my cleanruv patch 1031.
>>>>>>>>
>>>>>>>> rob
>>>>>>>>
>>>>>>>
>>>>>>> 1) When I run ipa-replica-manage del --force to an already uninstalled host,
>>>>>>> the new code will prevent me the deletation because it cannot connect to
>>>>>>> it. It
>>>>>>> also crashes with UnboundLocalError:
>>>>>>>
>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force
>>>>>>>
>>>>>>> Unable to connect to replica vm-055.idm.lab.bos.redhat.com, forcing removal
>>>>>>> Traceback (most recent call last):
>>>>>>>      File "/sbin/ipa-replica-manage", line 708, in <module>
>>>>>>>        main()
>>>>>>>      File "/sbin/ipa-replica-manage", line 677, in main
>>>>>>>        del_master(realm, args[1], options)
>>>>>>>      File "/sbin/ipa-replica-manage", line 476, in del_master
>>>>>>>        sys.exit("Failed read master data from '%s': %s" % (delrepl.hostname,
>>>>>>> str(e)))
>>>>>>> UnboundLocalError: local variable 'delrepl' referenced before assignment
>>>>>>
>>>>>> Fixed.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I also hit this error when removing a winsync replica.
>>>>>>
>>>>>> Fixed.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2) As I wrote before, I think having --force option override the user
>>>>>>> inquiries
>>>>>>> would benefit test automation:
>>>>>>>
>>>>>>> +            if not ipautil.user_input("Continue to delete?", False):
>>>>>>> +                sys.exit("Aborted")
>>>>>>
>>>>>> Fixed.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 3) I don't think this code won't cover this topology:
>>>>>>>
>>>>>>> A - B - C - D - E
>>>>>>>
>>>>>>> It would allow you deleting a replica C even though it would separate A-B
>>>>>>> and
>>>>>>> D-E. Though we may not want to cover this situation now, what you got is
>>>>>>> definitely helping.
>>>>>>
>>>>>> I think you may be right. I only tested with 4 servers. With this B and
>>>>>> D would both still have 2 agreements so wouldn't be covered by the last
>>>>>> link test.
>>>>>
>>>>> Everything looks good now, so ACK. We just need to push it along with
>>>>> CLEANALLRUV patch.
>>>>>
>>>>> Martin
>>>>>
>>>>
>>>> Not to look a gift ACK In the mouth but here is a revised patch. I've added a
>>>> cleanup routine to remove an orphaned master properly. If you had tried the
>>>> mechanism I outlined in the man page it would have worked but was
>>>> less-than-complete. This way is better, just don't try it on a live master.
>>>>
>>>> I also added a cleanruv abort command, in case you want to kill an existing
>>>> task.
>>>>
>>>> rob
>>>>
>>>
>>> 1) CLEANRUV stuff should be in your patch 1031 and not here (but I will comment
>>> on the code in this mail since it is in this patch)
>>>
>>>
>>> 2) In new command defitinion:
>>>
>>> +    "abort-clean-ruv":(1, 1, "Replica ID of to abort cleaning", ""),
>>>
>>> I miss error message in case REPLICA_ID is not passed, this way error message
>>> when I omit the parameter is confusing:
>>>
>>> # ipa-replica-manage abort-clean-ruv
>>> Usage: ipa-replica-manage [options]
>>>
>>> ipa-replica-manage: error: must provide a command [force-sync | clean-ruv |
>>> disconnect | connect | list-ruv | del | re-initialize | list | abort-clean-ruv]
>>>
>>> On another note, I am thinking about the new command(s). Since we now have
>>> abort-clean-ruv command, we may want to also implement list-clean-ruv commands
>>> in the future to see all CLEANALLRUV commands in process... Or we may enhance
>>> list-ruv to write a flag like [CLEANALLRUV in process] for RUV's for which
>>> CLEANALLRUV task is in process.
>>>
>>>
>>> 3) Will clean-ruv - abort-clean-ruv - clean-ruv sequence work? If the aborted
>>> CLEANALLRUV task stays in DIT, we may not be able to enter new CLEANALLRUV task
>>> because we always use the same DN...
>>>
>>> Btw. did abort CLEANALLRUV command worked for you? Mine seemed to be stuck on
>>> replicas that are down just like CLEANALLRUV command. I had then paralelly
>>> running CLEANALLRUV and ABORT CLEANALLRUV running for the same RUV ID. Then, it
>>> is unclear to me what this command is actually good for.
>>>
>>>
>>> 4) Man page now contains non-existent command:
>>>
>>> --- a/install/tools/man/ipa-replica-manage.1
>>> +++ b/install/tools/man/ipa-replica-manage.1
>>> @@ -42,12 +42,18 @@ Manages the replication agreements of an IPA server.
>>>   \fBforce\-sync\fR
>>>   \- Immediately flush any data to be replicated from a server specified with
>>> the \-\-from option
>>>   .TP
>>> +\fBcleanup\fR
>>> +\- Remove leftover references to a deleted master.
>>> +.TP
>>>
>>>
>>> The cleanup procedure was implemented rather as an option to the del command
>>> than a separate command.
>>>
>>>
>>> 5) My favorite - new cleanup procedure user prompt miss the --force option
>>> useful for test automation
>>>
>>> +            if not ipautil.user_input("Continue to clean master?", False):
>>> +                sys.exit("Cleanup aborted")
>>>
>>>
>>> Otherwise the patch looks good.
>>>
>>> Martin
>>>
>>
>> I pulled the abort-ruv stuff out. It was just easier to stuff it in here than
>> rebasing back, but yeah, its cleaner this way.
>>
>> No need to check force for clean because it is already required (can't contact
>> the deleted master, it's gone).
>>
>> rob
> 
> The patch works fine now. So ACK when the dependent patch 1031 is pushed.
> 
> Martin
> 

Pushed to master, ipa-3-0.

Martin




More information about the Freeipa-devel mailing list