[Freeipa-devel] [PATCH] 1031 run cleanallruv task

Martin Kosek mkosek at redhat.com
Mon Sep 17 14:11:38 UTC 2012


On 09/17/2012 04:04 PM, Rob Crittenden wrote:
> Martin Kosek wrote:
>> On 09/14/2012 09:17 PM, Rob Crittenden wrote:
>>> Martin Kosek wrote:
>>>> On 09/06/2012 11:17 PM, Rob Crittenden wrote:
>>>>> Martin Kosek wrote:
>>>>>> On 09/06/2012 05:55 PM, Rob Crittenden wrote:
>>>>>>> Rob Crittenden wrote:
>>>>>>>> Rob Crittenden wrote:
>>>>>>>>> Martin Kosek wrote:
>>>>>>>>>> On 09/05/2012 08:06 PM, Rob Crittenden wrote:
>>>>>>>>>>> Rob Crittenden wrote:
>>>>>>>>>>>> Martin Kosek wrote:
>>>>>>>>>>>>> On 07/05/2012 08:39 PM, Rob Crittenden wrote:
>>>>>>>>>>>>>> Martin Kosek wrote:
>>>>>>>>>>>>>>> On 07/03/2012 04:41 PM, Rob Crittenden wrote:
>>>>>>>>>>>>>>>> Deleting a replica can leave a replication vector (RUV) on the
>>>>>>>>>>>>>>>> other servers.
>>>>>>>>>>>>>>>> This can confuse things if the replica is re-added, and it also
>>>>>>>>>>>>>>>> causes the
>>>>>>>>>>>>>>>> server to calculate changes against a server that may no longer
>>>>>>>>>>>>>>>> exist.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 389-ds-base provides a new task that propagates itself to all
>>>>>>>>>>>>>>>> available
>>>>>>>>>>>>>>>> replicas to clean this RUV data.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This patch will create this task at deletion time to hopefully
>>>>>>>>>>>>>>>> clean things up.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It isn't perfect. If any replica is down or unavailable at the time
>>>>>>>>>>>>>>>> the cleanruv task fires, and then comes back up, the old RUV data
>>>>>>>>>>>>>>>> may be re-propagated around.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> To make things easier in this case I've added two new commands to
>>>>>>>>>>>>>>>> ipa-replica-manage. The first lists the replication ids of all the
>>>>>>>>>>>>>>>> servers we
>>>>>>>>>>>>>>>> have a RUV for. Using this you can call clean_ruv with the
>>>>>>>>>>>>>>>> replication id of a
>>>>>>>>>>>>>>>> server that no longer exists to try the cleanallruv step again.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is quite dangerous though. If you run cleanruv against a
>>>>>>>>>>>>>>>> replica id that
>>>>>>>>>>>>>>>> does exist it can cause a loss of data. I believe I've put in
>>>>>>>>>>>>>>>> enough scary
>>>>>>>>>>>>>>>> warnings about this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> rob
>>>>>>>>>>>>>>>>
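For context, the task described above is driven by an entry under cn=tasks,cn=config. A minimal sketch of creating one with python-ldap, assuming the replica-base-dn and replica-id attribute names documented for 389-ds-base (the helper name is illustrative):

    import ldap
    import ldap.modlist

    def create_cleanallruv_task(conn, replica_base_dn, replica_id):
        # Adding this entry asks 389-ds-base to start a CLEANALLRUV task,
        # which then propagates itself to the other replicas.
        dn = "cn=clean %d,cn=cleanallruv,cn=tasks,cn=config" % replica_id
        attrs = {
            'objectclass': ['top', 'extensibleObject'],
            'cn': ['clean %d' % replica_id],
            'replica-base-dn': [replica_base_dn],
            'replica-id': [str(replica_id)],
        }
        conn.add_s(dn, ldap.modlist.addModlist(attrs))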
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Good work there, this should make cleaning RUVs much easier than
>>>>>>>>>>>>>>> with the
>>>>>>>>>>>>>>> previous version.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is what I found during review:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) The list_ruv and clean_ruv command help in the man page is easy to
>>>>>>>>>>>>>>> miss. I think it would help if, for example, all the info for the
>>>>>>>>>>>>>>> commands were indented. As it is, a user could simply overlook the
>>>>>>>>>>>>>>> new commands in the man page.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2) I would rename the new commands to clean-ruv and list-ruv to make
>>>>>>>>>>>>>>> them consistent with the rest of the commands (re-initialize,
>>>>>>>>>>>>>>> force-sync).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 3) It would be nice to be able to run the clean_ruv command in an
>>>>>>>>>>>>>>> unattended way (for better testing), i.e. respect the --force option
>>>>>>>>>>>>>>> as we already do for ipa-replica-manage del. This fix would aid test
>>>>>>>>>>>>>>> automation in the future.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 4) (minor) The new question (and the one for del too) does not react
>>>>>>>>>>>>>>> well to CTRL+D:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # ipa-replica-manage clean_ruv 3 --force
>>>>>>>>>>>>>>> Clean the Replication Update Vector for
>>>>>>>>>>>>>>> vm-055.idm.lab.bos.redhat.com:389
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cleaning the wrong replica ID will cause that server to no
>>>>>>>>>>>>>>> longer replicate so it may miss updates while the process
>>>>>>>>>>>>>>> is running. It would need to be re-initialized to maintain
>>>>>>>>>>>>>>> consistency. Be very careful.
>>>>>>>>>>>>>>> Continue to clean? [no]: unexpected error:
>>>>>>>>>>>>>>>
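One possible way to make the prompt handle CTRL+D gracefully, assuming ipautil.user_input() lets the underlying EOFError propagate up to the caller:

    try:
        if not ipautil.user_input("Continue to clean?", False):
            sys.exit("Aborted")
    except (KeyboardInterrupt, EOFError):
        # Treat CTRL+D (and CTRL+C) at the prompt as "no" instead of
        # surfacing an unexpected error.
        sys.exit("Aborted")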
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 5) Help for the clean_ruv command without the required parameter is
>>>>>>>>>>>>>>> quite confusing, as it reports that the command is wrong rather than
>>>>>>>>>>>>>>> the parameter:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # ipa-replica-manage clean_ruv
>>>>>>>>>>>>>>> Usage: ipa-replica-manage [options]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ipa-replica-manage: error: must provide a command [clean_ruv |
>>>>>>>>>>>>>>> force-sync |
>>>>>>>>>>>>>>> disconnect | connect | del | re-initialize | list | list_ruv]
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It seems you just forgot to specify the error message in the
>>>>>>>>>>>>>>> command
>>>>>>>>>>>>>>> definition
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 6) When the remote replica is down, the clean_ruv command fails
>>>>>>>>>>>>>>> with an
>>>>>>>>>>>>>>> unexpected error:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [root at vm-086 ~]# ipa-replica-manage clean_ruv 5
>>>>>>>>>>>>>>> Clean the Replication Update Vector for
>>>>>>>>>>>>>>> vm-055.idm.lab.bos.redhat.com:389
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cleaning the wrong replica ID will cause that server to no
>>>>>>>>>>>>>>> longer replicate so it may miss updates while the process
>>>>>>>>>>>>>>> is running. It would need to be re-initialized to maintain
>>>>>>>>>>>>>>> consistency. Be very careful.
>>>>>>>>>>>>>>> Continue to clean? [no]: y
>>>>>>>>>>>>>>> unexpected error: {'desc': 'Operations error'}
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /var/log/dirsrv/slapd-IDM-LAB-BOS-REDHAT-COM/errors:
>>>>>>>>>>>>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin -
>>>>>>>>>>>>>>> cleanAllRUV_task: failed to connect to repl agreement connection
>>>>>>>>>>>>>>> (cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping tree,cn=config), error 105
>>>>>>>>>>>>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin -
>>>>>>>>>>>>>>> cleanAllRUV_task: replica
>>>>>>>>>>>>>>> (cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping tree,cn=config)
>>>>>>>>>>>>>>> has not been cleaned.  You will need to rerun the CLEANALLRUV task on
>>>>>>>>>>>>>>> this replica.
>>>>>>>>>>>>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin -
>>>>>>>>>>>>>>> cleanAllRUV_task: Task
>>>>>>>>>>>>>>> failed (1)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In this case I think we should inform the user that the command
>>>>>>>>>>>>>>> failed, possibly because of disconnected replicas, and suggest that
>>>>>>>>>>>>>>> they bring the replicas back up and try again.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 7) (minor) "pass" is now redundant in replication.py:
>>>>>>>>>>>>>>> +        except ldap.INSUFFICIENT_ACCESS:
>>>>>>>>>>>>>>> +            # We can't make the server we're removing read-only
>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>> +            # this isn't a show-stopper
>>>>>>>>>>>>>>> +            root_logger.debug("No permission to switch replica to
>>>>>>>>>>>>>>> read-only,
>>>>>>>>>>>>>>> continuing anyway")
>>>>>>>>>>>>>>> +            pass
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think this addresses everything.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> rob
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks, almost there! I just found one more issue which needs to be
>>>>>>>>>>>>> fixed
>>>>>>>>>>>>> before we push:
>>>>>>>>>>>>>
>>>>>>>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force
>>>>>>>>>>>>> Directory Manager password:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Unable to connect to replica vm-055.idm.lab.bos.redhat.com, forcing
>>>>>>>>>>>>> removal
>>>>>>>>>>>>> Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc':
>>>>>>>>>>>>> "Can't
>>>>>>>>>>>>> contact LDAP server"}
>>>>>>>>>>>>> Forcing removal on 'vm-086.idm.lab.bos.redhat.com'
>>>>>>>>>>>>>
>>>>>>>>>>>>> There were issues removing a connection: %d format: a number is
>>>>>>>>>>>>> required, not str
>>>>>>>>>>>>>
>>>>>>>>>>>>> Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc':
>>>>>>>>>>>>> "Can't
>>>>>>>>>>>>> contact LDAP server"}
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is a traceback I retrieved:
>>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>>        File "/sbin/ipa-replica-manage", line 425, in del_master
>>>>>>>>>>>>>          del_link(realm, r, hostname, options.dirman_passwd,
>>>>>>>>>>>>> force=True)
>>>>>>>>>>>>>        File "/sbin/ipa-replica-manage", line 271, in del_link
>>>>>>>>>>>>>          repl1.cleanallruv(replica_id)
>>>>>>>>>>>>>        File
>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py",
>>>>>>>>>>>>> line 1094, in cleanallruv
>>>>>>>>>>>>>          root_logger.debug("Creating CLEANALLRUV task for replica id
>>>>>>>>>>>>> %d" %
>>>>>>>>>>>>> replicaId)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> The problem here is that you don't convert replica_id to int in this
>>>>>>>>>>>>> part:
>>>>>>>>>>>>> +    replica_id = None
>>>>>>>>>>>>> +    if repl2:
>>>>>>>>>>>>> +        replica_id = repl2._get_replica_id(repl2.conn, None)
>>>>>>>>>>>>> +    else:
>>>>>>>>>>>>> +        servers = get_ruv(realm, replica1, dirman_passwd)
>>>>>>>>>>>>> +        for (netloc, rid) in servers:
>>>>>>>>>>>>> +            if netloc.startswith(replica2):
>>>>>>>>>>>>> +                replica_id = rid
>>>>>>>>>>>>> +                break
>>>>>>>>>>>>>
>>>>>>>>>>>>> Martin
>>>>>>>>>>>>>
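A possible fix for the traceback above, assuming the RID parsed out of the RUV comes back as a string: convert it before it reaches the %d format in cleanallruv():

    replica_id = None
    if repl2:
        replica_id = repl2._get_replica_id(repl2.conn, None)
    else:
        servers = get_ruv(realm, replica1, dirman_passwd)
        for (netloc, rid) in servers:
            if netloc.startswith(replica2):
                # rid is a string taken from the RUV; cleanallruv()
                # formats it with %d, so convert it here.
                replica_id = int(rid)
                break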
>>>>>>>>>>>>
>>>>>>>>>>>> Updated patch using new mechanism in 389-ds-base. This should more
>>>>>>>>>>>> thoroughly clean out RUV data when a replica is being deleted, and
>>>>>>>>>>>> provide for a way to delete RUV data afterwards too if necessary.
>>>>>>>>>>>>
>>>>>>>>>>>> rob
>>>>>>>>>>>
>>>>>>>>>>> Rebased patch
>>>>>>>>>>>
>>>>>>>>>>> rob
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 0) As I wrote in a review for your patch 1041, the changelog entry slipped
>>>>>>>>>> elsewhere.
>>>>>>>>>>
>>>>>>>>>> 1) The following KeyboardInterrupt except clause looks suspicious. I
>>>>>>>>>> know why you have it there, but since it is generally a bad thing to do,
>>>>>>>>>> a comment explaining why it is needed would be useful.
>>>>>>>>>>
>>>>>>>>>> @@ -256,6 +263,17 @@ def del_link(realm, replica1, replica2,
>>>>>>>>>> dirman_passwd,
>>>>>>>>>> force=False):
>>>>>>>>>>          repl1.delete_agreement(replica2)
>>>>>>>>>>          repl1.delete_referral(replica2)
>>>>>>>>>>
>>>>>>>>>> +    if type1 == replication.IPA_REPLICA:
>>>>>>>>>> +        if repl2:
>>>>>>>>>> +            ruv = repl2._get_replica_id(repl2.conn, None)
>>>>>>>>>> +        else:
>>>>>>>>>> +            ruv = get_ruv_by_host(realm, replica1, replica2,
>>>>>>>>>> dirman_passwd)
>>>>>>>>>> +
>>>>>>>>>> +        try:
>>>>>>>>>> +            repl1.cleanallruv(ruv)
>>>>>>>>>> +        except KeyboardInterrupt:
>>>>>>>>>> +            pass
>>>>>>>>>> +
>>>>>>>>>>
>>>>>>>>>> Maybe you just wanted to do some cleanup and then "raise" again?
>>>>>>>>>
>>>>>>>>> No, it is there because it is safe to break out of it. The task will
>>>>>>>>> continue to run. I added some verbiage.
>>>>>>>>>
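A sketch of the pattern being discussed, with the kind of message Rob mentions adding (wording borrowed from the transcripts later in this thread):

    try:
        repl1.cleanallruv(ruv)
    except KeyboardInterrupt:
        # The task entry has already been added on the server, so CTRL+C
        # only stops us from waiting for it; the task itself keeps running.
        print "Wait for task interrupted. It will continue to run in the background"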
>>>>>>>>>>
>>>>>>>>>> 2) This is related to 1): when some remote replica is down,
>>>>>>>>>> "ipa-replica-manage del" may wait indefinitely, right?
>>>>>>>>>>
>>>>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
>>>>>>>>>> Deleting a master is irreversible.
>>>>>>>>>> To reconnect to the remote master you will need to prepare a new
>>>>>>>>>> replica file
>>>>>>>>>> and re-install.
>>>>>>>>>> Continue to delete? [no]: y
>>>>>>>>>> ipa: INFO: Setting agreement
>>>>>>>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>>>>>>>>>> schedule to 2358-2359 0 to force synch
>>>>>>>>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement
>>>>>>>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>>>>>>>>>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica
>>>>>>>>>> acquired
>>>>>>>>>> successfully: Incremental update succeeded: start: 0: end: 0
>>>>>>>>>> Background task created to clean replication data
>>>>>>>>>>
>>>>>>>>>> ... after about a minute I hit CTRL+C
>>>>>>>>>>
>>>>>>>>>> ^CDeleted replication agreement from 'vm-086.idm.lab.bos.redhat.com' to
>>>>>>>>>> 'vm-055.idm.lab.bos.redhat.com'
>>>>>>>>>> Failed to cleanup vm-055.idm.lab.bos.redhat.com DNS entries: NS record
>>>>>>>>>> does not
>>>>>>>>>> contain 'vm-055.idm.lab.bos.redhat.com.'
>>>>>>>>>> You may need to manually remove them from the tree
>>>>>>>>>>
>>>>>>>>>> I think it would be better to inform the user that some remote replica
>>>>>>>>>> is down, or at least that we are waiting for the task to complete.
>>>>>>>>>> Something like this:
>>>>>>>>>>
>>>>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
>>>>>>>>>> ...
>>>>>>>>>> Background task created to clean replication data
>>>>>>>>>> Replication data clean up may take very long time if some replica is
>>>>>>>>>> unreachable
>>>>>>>>>> Hit CTRL+C to interrupt the wait
>>>>>>>>>> ^C Clean up wait interrupted
>>>>>>>>>> ....
>>>>>>>>>> [continue with del]
>>>>>>>>>
>>>>>>>>> Yup, did this in #1.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 3) (minor) When there is a cleanruv task already running and you run
>>>>>>>>>> "ipa-replica-manage del", there is an unexpected error message caused by
>>>>>>>>>> a duplicate task object in LDAP:
>>>>>>>>>>
>>>>>>>>>> # ipa-replica-manage del vm-072.idm.lab.bos.redhat.com --force
>>>>>>>>>> Unable to connect to replica vm-072.idm.lab.bos.redhat.com, forcing
>>>>>>>>>> removal
>>>>>>>>>> FAIL
>>>>>>>>>> Failed to get data from 'vm-072.idm.lab.bos.redhat.com': {'desc': "Can't
>>>>>>>>>> contact LDAP server"}
>>>>>>>>>> Forcing removal on 'vm-086.idm.lab.bos.redhat.com'
>>>>>>>>>>
>>>>>>>>>> There were issues removing a connection: This entry already exists
>>>>>>>>>> <<<<<<<<<
>>>>>>>>>>
>>>>>>>>>> Failed to get data from 'vm-072.idm.lab.bos.redhat.com': {'desc': "Can't
>>>>>>>>>> contact LDAP server"}
>>>>>>>>>> Failed to cleanup vm-072.idm.lab.bos.redhat.com DNS entries: NS record
>>>>>>>>>> does not
>>>>>>>>>> contain 'vm-072.idm.lab.bos.redhat.com.'
>>>>>>>>>> You may need to manually remove them from the tree
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I think it should be enough to just catch "entry already exists" in the
>>>>>>>>>> cleanallruv function, and in that case print a relevant error message and
>>>>>>>>>> bail out. That way, self.conn.checkTask(dn, dowait=True) would not be
>>>>>>>>>> called either.
>>>>>>>>>
>>>>>>>>> Good catch, fixed.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 4) (minor): In the make_readonly function, there is a redundant "pass"
>>>>>>>>>> statement:
>>>>>>>>>>
>>>>>>>>>> +    def make_readonly(self):
>>>>>>>>>> +        """
>>>>>>>>>> +        Make the current replication agreement read-only.
>>>>>>>>>> +        """
>>>>>>>>>> +        dn = DN(('cn', 'userRoot'), ('cn', 'ldbm database'),
>>>>>>>>>> +                ('cn', 'plugins'), ('cn', 'config'))
>>>>>>>>>> +
>>>>>>>>>> +        mod = [(ldap.MOD_REPLACE, 'nsslapd-readonly', 'on')]
>>>>>>>>>> +        try:
>>>>>>>>>> +            self.conn.modify_s(dn, mod)
>>>>>>>>>> +        except ldap.INSUFFICIENT_ACCESS:
>>>>>>>>>> +            # We can't make the server we're removing read-only but
>>>>>>>>>> +            # this isn't a show-stopper
>>>>>>>>>> +            root_logger.debug("No permission to switch replica to
>>>>>>>>>> read-only,
>>>>>>>>>> continuing anyway")
>>>>>>>>>> +            pass         <<<<<<<<<<<<<<<
>>>>>>>>>
>>>>>>>>> Yeah, this is one of my common mistakes. I put in a pass initially, then
>>>>>>>>> add logging in front of it and forget to delete the pass. It's gone now.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 5) In clean_ruv, I think allowing a --force option to bypass the
>>>>>>>>>> user_input
>>>>>>>>>> would be helpful (at least for test automation):
>>>>>>>>>>
>>>>>>>>>> +    if not ipautil.user_input("Continue to clean?", False):
>>>>>>>>>> +        sys.exit("Aborted")
>>>>>>>>>
>>>>>>>>> Yup, added.
>>>>>>>>>
>>>>>>>>> rob
>>>>>>>>
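A minimal sketch of the --force behaviour requested in 5), with the option name taken from the discussion above and the placement purely illustrative:

    # Skip the interactive confirmation when --force is given, so the
    # command can run unattended.
    if not options.force:
        if not ipautil.user_input("Continue to clean?", False):
            sys.exit("Aborted")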
>>>>>>>> Slightly revised patch. I still had a window open with one unsaved change.
>>>>>>>>
>>>>>>>> rob
>>>>>>>>
>>>>>>>
>>>>>>> Apparently there were two unsaved changes, one of which was lost. This
>>>>>>> adds in
>>>>>>> the 'entry already exists' fix.
>>>>>>>
>>>>>>> rob
>>>>>>>
>>>>>>
>>>>>> Just one last thing (otherwise the patch is OK) - I don't think this is
>>>>>> what we
>>>>>> want :-)
>>>>>>
>>>>>> # ipa-replica-manage clean-ruv 8
>>>>>> Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
>>>>>>
>>>>>> Cleaning the wrong replica ID will cause that server to no
>>>>>> longer replicate so it may miss updates while the process
>>>>>> is running. It would need to be re-initialized to maintain
>>>>>> consistency. Be very careful.
>>>>>> Continue to clean? [no]: y   <<<<<<
>>>>>> Aborted
>>>>>>
>>>>>>
>>>>>> Nor this exception (you are checking for the wrong exception):
>>>>>>
>>>>>> # ipa-replica-manage clean-ruv 8
>>>>>> Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
>>>>>>
>>>>>> Cleaning the wrong replica ID will cause that server to no
>>>>>> longer replicate so it may miss updates while the process
>>>>>> is running. It would need to be re-initialized to maintain
>>>>>> consistency. Be very careful.
>>>>>> Continue to clean? [no]:
>>>>>> unexpected error: This entry already exists
>>>>>>
>>>>>> This is the exception:
>>>>>>
>>>>>> Traceback (most recent call last):
>>>>>>      File "/sbin/ipa-replica-manage", line 651, in <module>
>>>>>>        main()
>>>>>>      File "/sbin/ipa-replica-manage", line 648, in main
>>>>>>        clean_ruv(realm, args[1], options)
>>>>>>      File "/sbin/ipa-replica-manage", line 373, in clean_ruv
>>>>>>        thisrepl.cleanallruv(ruv)
>>>>>>      File
>>>>>> "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py",
>>>>>> line 1136, in cleanallruv
>>>>>>        self.conn.addEntry(e)
>>>>>>      File "/usr/lib/python2.7/site-packages/ipaserver/ipaldap.py", line
>>>>>> 503, in
>>>>>> addEntry
>>>>>>        self.__handle_errors(e, arg_desc=arg_desc)
>>>>>>      File "/usr/lib/python2.7/site-packages/ipaserver/ipaldap.py", line
>>>>>> 321, in
>>>>>> __handle_errors
>>>>>>        raise errors.DuplicateEntry()
>>>>>> ipalib.errors.DuplicateEntry: This entry already exists
>>>>>>
>>>>>> Martin
>>>>>>
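As the traceback shows, ipaldap's addEntry() turns ldap.ALREADY_EXISTS into ipalib.errors.DuplicateEntry, so that is the exception cleanallruv() needs to catch. Roughly, with illustrative message wording:

    from ipalib import errors

    try:
        self.conn.addEntry(e)
    except errors.DuplicateEntry:
        # A CLEANALLRUV task for this replica id already exists; report it
        # and skip waiting on a task we did not create.
        print "A CLEANALLRUV task for replica id %d is already running." % replicaId
        return
    self.conn.checkTask(dn, dowait=True)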
>>>>>
>>>>> Fixed that and a couple of other problems. When doing a disconnect we should
>>>>> not also call clean-ruv.
>>>>
>>>> Ah, good self-catch.
>>>>
>>>>>
>>>>> I also got tired of seeing crappy error messages so I added a little convert
>>>>> utility.
>>>>>
>>>>> rob
>>>>
>>>> 1) There is CLEANALLRUV stuff included in 1050-3 and not here. There are also
>>>> some findings for this new code.
>>>>
>>>>
>>>> 2) We may want to bump Requires to a higher version of 389-ds-base
>>>> (389-ds-base-1.2.11.14-1) - it contains a fix for the CLEANALLRUV+winsync bug I
>>>> found earlier.
>>>>
>>>>
>>>> 3) I just discovered another suspicious behavior. When we are deleting a
>>>> master that also has links to other master(s) we delete those too. But we also
>>>> automatically run CLEANALLRUV for each of them, so we may end up with multiple
>>>> tasks being started on different masters - this does not look right.
>>>>
>>>> I think we may rather want to delete all the links first and then run the
>>>> CLEANALLRUV task just once (a sketch of that ordering follows the excerpt
>>>> below). This is what I get with the current code:
>>>>
>>>> # ipa-replica-manage del vm-072.idm.lab.bos.redhat.com
>>>> Directory Manager password:
>>>>
>>>> Deleting a master is irreversible.
>>>> To reconnect to the remote master you will need to prepare a new replica file
>>>> and re-install.
>>>> Continue to delete? [no]: yes
>>>> ipa: INFO: Setting agreement
>>>> cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>>>> schedule to 2358-2359 0 to force synch
>>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement
>>>> cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>>>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired
>>>> successfully: Incremental update succeeded: start: 0: end: 0
>>>> Background task created to clean replication data. This may take a while.
>>>> This may be safely interrupted with Ctrl+C
>>>>
>>>> ^CWait for task interrupted. It will continue to run in the background
>>>>
>>>> Deleted replication agreement from 'vm-055.idm.lab.bos.redhat.com' to
>>>> 'vm-072.idm.lab.bos.redhat.com'
>>>> ipa: INFO: Setting agreement
>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>>>> schedule to 2358-2359 0 to force synch
>>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement
>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>>>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired
>>>> successfully: Incremental update succeeded: start: 0: end: 0
>>>> Background task created to clean replication data. This may take a while.
>>>> This may be safely interrupted with Ctrl+C
>>>>
>>>> ^CWait for task interrupted. It will continue to run in the background
>>>>
>>>> Deleted replication agreement from 'vm-086.idm.lab.bos.redhat.com' to
>>>> 'vm-072.idm.lab.bos.redhat.com'
>>>> Failed to cleanup vm-072.idm.lab.bos.redhat.com DNS entries: NS record does
>>>> not
>>>> contain 'vm-072.idm.lab.bos.redhat.com.'
>>>> You may need to manually remove them from the tree
>>>>
>>>> Martin
>>>>
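A sketch of the ordering suggested in 3): remove every agreement pointing at the deleted master first, then submit a single CLEANALLRUV task. The clean_ruv keyword and the replica_names list are hypothetical stand-ins, named here only for illustration:

    # Delete all agreements first, without starting a task per link.
    for r in replica_names:
        del_link(realm, r, hostname, options.dirman_passwd, force=True,
                 clean_ruv=False)   # hypothetical flag to suppress per-link cleanup
    # Then clean the removed master's RUV exactly once; the task propagates
    # itself to the remaining replicas.
    thisrepl.cleanallruv(replica_id)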
>>>
>>> All issues addressed and I pulled in abort-clean-ruv from 1050. I added a
>>> list-clean-ruv command as well.
>>>
>>> rob
>>
>> 1) Patch 1031-9 needs to get squashed with 1031-8
>>
>>
>> 2) Patch needs a rebase (conflict in freeipa.spec.in)
>>
>>
>> 3) The new list-clean-ruv man entry is not right:
>>
>>         list-clean-ruv [REPLICATION_ID]
>>                - List all running CLEANALLRUV and abort CLEANALLRUV tasks.
>>
>> REPLICATION_ID is not its argument.
> 
> Fixed 1-3.
> 
>> Btw. the new list-clean-ruv command proved very useful for me.
>>
>> 4) I just found out we need to do a better job with the make_readonly() call. I
>> got into trouble when disconnecting one link to a remote replica, as it was
>> marked read-only and I was then unable to manage the disconnected replica
>> properly (vm-072 is the replica made read-only):
> 
> Ok, I reset read-only after we delete the agreements. That fixed things up for
> me. I disconnected a replica and was able to modify entries on that replica
> afterwards.
> 
> This affected the --cleanup command too; it would otherwise have succeeded, I
> think.
> 
> I tested with an A - B - C - A agreement loop. I disconnected A and C and
> confirmed I could still update entries on C. Then I deleted C, then B, and made
> sure the output looked right, that I could still manage entries, etc.
> 
> rob
> 
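A sketch of the counterpart to make_readonly() that the fix implies: flip the same config attribute back once the agreements are gone. The method name is illustrative; the DN and error handling mirror the make_readonly() snippet quoted earlier:

    def make_readwrite(self):
        # Switch the userRoot backend back to read-write after the
        # replication agreements have been removed.
        dn = DN(('cn', 'userRoot'), ('cn', 'ldbm database'),
                ('cn', 'plugins'), ('cn', 'config'))
        mod = [(ldap.MOD_REPLACE, 'nsslapd-readonly', 'off')]
        try:
            self.conn.modify_s(dn, mod)
        except ldap.INSUFFICIENT_ACCESS:
            # Same non-fatal handling as make_readonly()
            root_logger.debug("No permission to switch replica back to "
                              "read-write, continuing anyway")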
>>
>> [root at vm-055 ~]# ipa-replica-manage disconnect vm-072.idm.lab.bos.redhat.com
>>
>> [root at vm-072 ~]# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
>> Deleting a master is irreversible.
>> To reconnect to the remote master you will need to prepare a new replica file
>> and re-install.
>> Continue to delete? [no]: yes
>> Deleting replication agreements between vm-055.idm.lab.bos.redhat.com and
>> vm-072.idm.lab.bos.redhat.com
>> ipa: INFO: Setting agreement
>> cn=meTovm-072.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>> schedule to 2358-2359 0 to force synch
>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement
>> cn=meTovm-072.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired
>> successfully: Incremental update succeeded: start: 0: end: 0
>> Deleted replication agreement from 'vm-072.idm.lab.bos.redhat.com' to
>> 'vm-055.idm.lab.bos.redhat.com'
>> Unable to remove replication agreement for vm-055.idm.lab.bos.redhat.com from
>> vm-072.idm.lab.bos.redhat.com.
>> Background task created to clean replication data. This may take a while.
>> This may be safely interrupted with Ctrl+C
>> ^CWait for task interrupted. It will continue to run in the background
>>
>> Failed to cleanup vm-055.idm.lab.bos.redhat.com entries: Server is unwilling to
>> perform: database is read-only arguments:
>> dn=krbprincipalname=ldap/vm-055.idm.lab.bos.redhat.com at IDM.LAB.BOS.REDHAT.COM,cn=services,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com
>>
>>
>> You may need to manually remove them from the tree
>> ipa: INFO: Unhandled LDAPError: {'info': 'database is read-only', 'desc':
>> 'Server is unwilling to perform'}
>>
>> Failed to cleanup vm-055.idm.lab.bos.redhat.com DNS entries: Server is
>> unwilling to perform: database is read-only
>>
>> You may need to manually remove them from the tree
>>
>>
>> --cleanup did not work for me either:
>> [root at vm-072 ~]# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force
>> --cleanup
>> Cleaning a master is irreversible.
>> This should not normally be require, so use cautiously.
>> Continue to clean master? [no]: yes
>> unexpected error: Server is unwilling to perform: database is read-only
>> arguments:
>> dn=krbprincipalname=ldap/vm-055.idm.lab.bos.redhat.com at IDM.LAB.BOS.REDHAT.COM,cn=services,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com
>>
>>
>> Martin
>>
> 

I think you sent the wrong patch...

Martin



