[Freeipa-devel] [PATCH] 1031 run cleanallruv task

Rob Crittenden rcritten at redhat.com
Mon Sep 17 14:04:31 UTC 2012


Martin Kosek wrote:
> On 09/14/2012 09:17 PM, Rob Crittenden wrote:
>> Martin Kosek wrote:
>>> On 09/06/2012 11:17 PM, Rob Crittenden wrote:
>>>> Martin Kosek wrote:
>>>>> On 09/06/2012 05:55 PM, Rob Crittenden wrote:
>>>>>> Rob Crittenden wrote:
>>>>>>> Rob Crittenden wrote:
>>>>>>>> Martin Kosek wrote:
>>>>>>>>> On 09/05/2012 08:06 PM, Rob Crittenden wrote:
>>>>>>>>>> Rob Crittenden wrote:
>>>>>>>>>>> Martin Kosek wrote:
>>>>>>>>>>>> On 07/05/2012 08:39 PM, Rob Crittenden wrote:
>>>>>>>>>>>>> Martin Kosek wrote:
>>>>>>>>>>>>>> On 07/03/2012 04:41 PM, Rob Crittenden wrote:
>>>>>>>>>>>>>>> Deleting a replica can leave a replication update vector (RUV) on
>>>>>>>>>>>>>>> the other servers. This can confuse things if the replica is
>>>>>>>>>>>>>>> re-added, and it also causes the server to calculate changes
>>>>>>>>>>>>>>> against a server that may no longer exist.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 389-ds-base provides a new task that propagates itself to all
>>>>>>>>>>>>>>> available replicas to clean this RUV data.
>>>>>>>>>>>>>>>
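>>>>>>>>>>>>>>> For the curious, the task is just an entry under cn=tasks. Roughly
>>>>>>>>>>>>>>> what creating it looks like through python-ldap (a sketch; the
>>>>>>>>>>>>>>> suffix, bind password and replica id below are placeholders, and
>>>>>>>>>>>>>>> the attribute names are the documented 389-ds task attributes):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     import ldap
>>>>>>>>>>>>>>>     import ldap.modlist
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     conn = ldap.initialize('ldap://localhost:389')
>>>>>>>>>>>>>>>     conn.simple_bind_s('cn=Directory Manager', 'Secret123')
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>     rid = 5  # replica id whose RUV data should be cleaned
>>>>>>>>>>>>>>>     entry = {
>>>>>>>>>>>>>>>         'objectclass': ['top', 'extensibleObject'],
>>>>>>>>>>>>>>>         'cn': ['clean %d' % rid],
>>>>>>>>>>>>>>>         'replica-base-dn': ['dc=example,dc=com'],
>>>>>>>>>>>>>>>         'replica-id': [str(rid)],
>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>     # adding the entry kicks off the task; 389-ds propagates it
>>>>>>>>>>>>>>>     conn.add_s('cn=clean %d,cn=cleanallruv,cn=tasks,cn=config' % rid,
>>>>>>>>>>>>>>>                ldap.modlist.addModlist(entry))
>>>>>>>>>>>>>>>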
>>>>>>>>>>>>>>> This patch will create this task at deletion time to hopefully
>>>>>>>>>>>>>>> clean things up.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It isn't perfect. If any replica is down or unavailable at the time
>>>>>>>>>>>>>>> the cleanruv task fires, and then comes back up, the old RUV data
>>>>>>>>>>>>>>> may be re-propagated around.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> To make things easier in this case I've added two new commands to
>>>>>>>>>>>>>>> ipa-replica-manage. The first lists the replication ids of all the
>>>>>>>>>>>>>>> servers we have a RUV for. Using this you can call clean_ruv with
>>>>>>>>>>>>>>> the replication id of a server that no longer exists to try the
>>>>>>>>>>>>>>> cleanallruv step again.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is quite dangerous though. If you run cleanruv against a
>>>>>>>>>>>>>>> replica id that does exist it can cause a loss of data. I believe
>>>>>>>>>>>>>>> I've put in enough scary warnings about this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> rob
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Good work there, this should make cleaning RUVs much easier than
>>>>>>>>>>>>>> with the
>>>>>>>>>>>>>> previous version.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is what I found during review:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) The list_ruv and clean_ruv command help gets quite lost in the
>>>>>>>>>>>>>> man page. I think it would help if, for example, all the info for
>>>>>>>>>>>>>> the commands was indented. Otherwise a user could simply overlook
>>>>>>>>>>>>>> the new commands in the man page.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2) I would rename new commands to clean-ruv and list-ruv to make
>>>>>>>>>>>>>> them
>>>>>>>>>>>>>> consistent with the rest of the commands (re-initialize,
>>>>>>>>>>>>>> force-sync).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 3) It would be nice to be able to run the clean_ruv command in an
>>>>>>>>>>>>>> unattended way (for better testing), i.e. respect the --force option
>>>>>>>>>>>>>> as we already do for ipa-replica-manage del. This fix would aid test
>>>>>>>>>>>>>> automation in the future.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 4) (minor) The new question (and the del one too) does not react
>>>>>>>>>>>>>> well to CTRL+D:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # ipa-replica-manage clean_ruv 3 --force
>>>>>>>>>>>>>> Clean the Replication Update Vector for
>>>>>>>>>>>>>> vm-055.idm.lab.bos.redhat.com:389
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cleaning the wrong replica ID will cause that server to no
>>>>>>>>>>>>>> longer replicate so it may miss updates while the process
>>>>>>>>>>>>>> is running. It would need to be re-initialized to maintain
>>>>>>>>>>>>>> consistency. Be very careful.
>>>>>>>>>>>>>> Continue to clean? [no]: unexpected error:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 5) The help for the clean_ruv command without its required parameter
>>>>>>>>>>>>>> is quite confusing, as it reports that the command is wrong rather
>>>>>>>>>>>>>> than the parameter:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # ipa-replica-manage clean_ruv
>>>>>>>>>>>>>> Usage: ipa-replica-manage [options]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ipa-replica-manage: error: must provide a command [clean_ruv |
>>>>>>>>>>>>>> force-sync |
>>>>>>>>>>>>>> disconnect | connect | del | re-initialize | list | list_ruv]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It seems you just forgot to specify the error message in the command
>>>>>>>>>>>>>> definition.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 6) When the remote replica is down, the clean_ruv command fails
>>>>>>>>>>>>>> with an
>>>>>>>>>>>>>> unexpected error:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [root at vm-086 ~]# ipa-replica-manage clean_ruv 5
>>>>>>>>>>>>>> Clean the Replication Update Vector for
>>>>>>>>>>>>>> vm-055.idm.lab.bos.redhat.com:389
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cleaning the wrong replica ID will cause that server to no
>>>>>>>>>>>>>> longer replicate so it may miss updates while the process
>>>>>>>>>>>>>> is running. It would need to be re-initialized to maintain
>>>>>>>>>>>>>> consistency. Be very careful.
>>>>>>>>>>>>>> Continue to clean? [no]: y
>>>>>>>>>>>>>> unexpected error: {'desc': 'Operations error'}
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /var/log/dirsrv/slapd-IDM-LAB-BOS-REDHAT-COM/errors:
>>>>>>>>>>>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task:
>>>>>>>>>>>>>> failed to connect to repl agreement connection
>>>>>>>>>>>>>> (cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,
>>>>>>>>>>>>>> cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping tree,cn=config),
>>>>>>>>>>>>>> error 105
>>>>>>>>>>>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task:
>>>>>>>>>>>>>> replica (cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,
>>>>>>>>>>>>>> cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping tree,cn=config)
>>>>>>>>>>>>>> has not been cleaned.  You will need to rerun the CLEANALLRUV task on
>>>>>>>>>>>>>> this replica.
>>>>>>>>>>>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task:
>>>>>>>>>>>>>> Task failed (1)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In this case I think we should inform the user that the command
>>>>>>>>>>>>>> failed, possibly because of disconnected replicas, and suggest that
>>>>>>>>>>>>>> they bring the replicas back up and try again.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 7) (minor) "pass" is now redundant in replication.py:
>>>>>>>>>>>>>> +        except ldap.INSUFFICIENT_ACCESS:
>>>>>>>>>>>>>> +            # We can't make the server we're removing read-only
>>>>>>>>>>>>>> but
>>>>>>>>>>>>>> +            # this isn't a show-stopper
>>>>>>>>>>>>>> +            root_logger.debug("No permission to switch replica to read-only, continuing anyway")
>>>>>>>>>>>>>> +            pass
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think this addresses everything.
>>>>>>>>>>>>>
>>>>>>>>>>>>> rob
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks, almost there! I just found one more issue which needs to be
>>>>>>>>>>>> fixed
>>>>>>>>>>>> before we push:
>>>>>>>>>>>>
>>>>>>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force
>>>>>>>>>>>> Directory Manager password:
>>>>>>>>>>>>
>>>>>>>>>>>> Unable to connect to replica vm-055.idm.lab.bos.redhat.com, forcing
>>>>>>>>>>>> removal
>>>>>>>>>>>> Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc':
>>>>>>>>>>>> "Can't
>>>>>>>>>>>> contact LDAP server"}
>>>>>>>>>>>> Forcing removal on 'vm-086.idm.lab.bos.redhat.com'
>>>>>>>>>>>>
>>>>>>>>>>>> There were issues removing a connection: %d format: a number is
>>>>>>>>>>>> required, not str
>>>>>>>>>>>>
>>>>>>>>>>>> Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc':
>>>>>>>>>>>> "Can't
>>>>>>>>>>>> contact LDAP server"}
>>>>>>>>>>>>
>>>>>>>>>>>> This is a traceback I retrieved:
>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>        File "/sbin/ipa-replica-manage", line 425, in del_master
>>>>>>>>>>>>          del_link(realm, r, hostname, options.dirman_passwd, force=True)
>>>>>>>>>>>>        File "/sbin/ipa-replica-manage", line 271, in del_link
>>>>>>>>>>>>          repl1.cleanallruv(replica_id)
>>>>>>>>>>>>        File
>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py",
>>>>>>>>>>>> line 1094, in cleanallruv
>>>>>>>>>>>>          root_logger.debug("Creating CLEANALLRUV task for replica id %d" % replicaId)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The problem here is that you don't convert replica_id to int in this
>>>>>>>>>>>> part:
>>>>>>>>>>>> +    replica_id = None
>>>>>>>>>>>> +    if repl2:
>>>>>>>>>>>> +        replica_id = repl2._get_replica_id(repl2.conn, None)
>>>>>>>>>>>> +    else:
>>>>>>>>>>>> +        servers = get_ruv(realm, replica1, dirman_passwd)
>>>>>>>>>>>> +        for (netloc, rid) in servers:
>>>>>>>>>>>> +            if netloc.startswith(replica2):
>>>>>>>>>>>> +                replica_id = rid
>>>>>>>>>>>> +                break
>>>>>>>>>>>>
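>>>>>>>>>>>> A minimal sketch of the fix - the rid values parsed out of the RUV
>>>>>>>>>>>> come back as strings, so cast before handing one to cleanallruv:
>>>>>>>>>>>>
>>>>>>>>>>>> +    replica_id = None
>>>>>>>>>>>> +    if repl2:
>>>>>>>>>>>> +        replica_id = repl2._get_replica_id(repl2.conn, None)
>>>>>>>>>>>> +    else:
>>>>>>>>>>>> +        servers = get_ruv(realm, replica1, dirman_passwd)
>>>>>>>>>>>> +        for (netloc, rid) in servers:
>>>>>>>>>>>> +            if netloc.startswith(replica2):
>>>>>>>>>>>> +                # rid is a string here; cleanallruv expects an int
>>>>>>>>>>>> +                replica_id = int(rid)
>>>>>>>>>>>> +                break
>>>>>>>>>>>>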
>>>>>>>>>>>> Martin
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Updated patch using new mechanism in 389-ds-base. This should more
>>>>>>>>>>> thoroughly clean out RUV data when a replica is being deleted, and
>>>>>>>>>>> provide for a way to delete RUV data afterwards too if necessary.
>>>>>>>>>>>
>>>>>>>>>>> rob
>>>>>>>>>>
>>>>>>>>>> Rebased patch
>>>>>>>>>>
>>>>>>>>>> rob
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 0) As I wrote in a review for your patch 1041, the changelog entry
>>>>>>>>> slipped elsewhere.
>>>>>>>>>
>>>>>>>>> 1) The following KeyboardInterrupt except clause looks suspicious. I
>>>>>>>>> know why you have it there, but since it is generally a bad thing to
>>>>>>>>> do, a comment explaining why it is needed would be useful.
>>>>>>>>>
>>>>>>>>> @@ -256,6 +263,17 @@ def del_link(realm, replica1, replica2, dirman_passwd, force=False):
>>>>>>>>>          repl1.delete_agreement(replica2)
>>>>>>>>>          repl1.delete_referral(replica2)
>>>>>>>>>
>>>>>>>>> +    if type1 == replication.IPA_REPLICA:
>>>>>>>>> +        if repl2:
>>>>>>>>> +            ruv = repl2._get_replica_id(repl2.conn, None)
>>>>>>>>> +        else:
>>>>>>>>> +            ruv = get_ruv_by_host(realm, replica1, replica2, dirman_passwd)
>>>>>>>>> +
>>>>>>>>> +        try:
>>>>>>>>> +            repl1.cleanallruv(ruv)
>>>>>>>>> +        except KeyboardInterrupt:
>>>>>>>>> +            pass
>>>>>>>>> +
>>>>>>>>>
>>>>>>>>> Maybe you just wanted to do some cleanup and then "raise" again?
>>>>>>>>
>>>>>>>> No, it is there because it is safe to break out of it. The task will
>>>>>>>> continue to run. I added some verbiage.
>>>>>>>>
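>>>>>>>> Roughly the shape now, comment included (a sketch, not the exact patch
>>>>>>>> hunk):
>>>>>>>>
>>>>>>>>     try:
>>>>>>>>         repl1.cleanallruv(ruv)
>>>>>>>>     except KeyboardInterrupt:
>>>>>>>>         # Safe to interrupt: we only stop waiting here, the
>>>>>>>>         # CLEANALLRUV task itself keeps running on the servers.
>>>>>>>>         print "Wait for task interrupted. It will continue to run in the background"
>>>>>>>>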
>>>>>>>>>
>>>>>>>>> 2) This is related to 1): when some remote replica is down,
>>>>>>>>> "ipa-replica-manage del" may wait indefinitely, right?
>>>>>>>>>
>>>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
>>>>>>>>> Deleting a master is irreversible.
>>>>>>>>> To reconnect to the remote master you will need to prepare a new
>>>>>>>>> replica file
>>>>>>>>> and re-install.
>>>>>>>>> Continue to delete? [no]: y
>>>>>>>>> ipa: INFO: Setting agreement
>>>>>>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>>>>>>>>> schedule to 2358-2359 0 to force synch
>>>>>>>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement
>>>>>>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>>>>>>>>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica
>>>>>>>>> acquired successfully: Incremental update succeeded: start: 0: end: 0
>>>>>>>>> Background task created to clean replication data
>>>>>>>>>
>>>>>>>>> ... after about a minute I hit CTRL+C
>>>>>>>>>
>>>>>>>>> ^CDeleted replication agreement from 'vm-086.idm.lab.bos.redhat.com' to
>>>>>>>>> 'vm-055.idm.lab.bos.redhat.com'
>>>>>>>>> Failed to cleanup vm-055.idm.lab.bos.redhat.com DNS entries: NS record
>>>>>>>>> does not
>>>>>>>>> contain 'vm-055.idm.lab.bos.redhat.com.'
>>>>>>>>> You may need to manually remove them from the tree
>>>>>>>>>
>>>>>>>>> I think it would be better to inform the user that some remote replica
>>>>>>>>> is down, or at least that we are waiting for the task to complete.
>>>>>>>>> Something like this:
>>>>>>>>>
>>>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
>>>>>>>>> ...
>>>>>>>>> Background task created to clean replication data
>>>>>>>>> Replication data clean up may take a very long time if some replica is
>>>>>>>>> unreachable
>>>>>>>>> Hit CTRL+C to interrupt the wait
>>>>>>>>> ^C Clean up wait interrupted
>>>>>>>>> ....
>>>>>>>>> [continue with del]
>>>>>>>>
>>>>>>>> Yup, did this in #1.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> 3) (minor) When there is a cleanruv task running and you run
>>>>>>>>> "ipa-replica-manage del", there is an unexpected error message caused by
>>>>>>>>> a duplicate task object in LDAP:
>>>>>>>>>
>>>>>>>>> # ipa-replica-manage del vm-072.idm.lab.bos.redhat.com --force
>>>>>>>>> Unable to connect to replica vm-072.idm.lab.bos.redhat.com, forcing
>>>>>>>>> removal
>>>>>>>>> FAIL
>>>>>>>>> Failed to get data from 'vm-072.idm.lab.bos.redhat.com': {'desc': "Can't
>>>>>>>>> contact LDAP server"}
>>>>>>>>> Forcing removal on 'vm-086.idm.lab.bos.redhat.com'
>>>>>>>>>
>>>>>>>>> There were issues removing a connection: This entry already exists
>>>>>>>>> <<<<<<<<<
>>>>>>>>>
>>>>>>>>> Failed to get data from 'vm-072.idm.lab.bos.redhat.com': {'desc': "Can't
>>>>>>>>> contact LDAP server"}
>>>>>>>>> Failed to cleanup vm-072.idm.lab.bos.redhat.com DNS entries: NS record
>>>>>>>>> does not
>>>>>>>>> contain 'vm-072.idm.lab.bos.redhat.com.'
>>>>>>>>> You may need to manually remove them from the tree
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think it should be enough to just catch "entry already exists" in the
>>>>>>>>> cleanallruv function, and in that case print a relevant error message
>>>>>>>>> and bail out. That way, self.conn.checkTask(dn, dowait=True) would not
>>>>>>>>> be called either.
>>>>>>>>
>>>>>>>> Good catch, fixed.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 4) (minor): In the make_readonly function, there is a redundant "pass"
>>>>>>>>> statement:
>>>>>>>>>
>>>>>>>>> +    def make_readonly(self):
>>>>>>>>> +        """
>>>>>>>>> +        Make the current replication agreement read-only.
>>>>>>>>> +        """
>>>>>>>>> +        dn = DN(('cn', 'userRoot'), ('cn', 'ldbm database'),
>>>>>>>>> +                ('cn', 'plugins'), ('cn', 'config'))
>>>>>>>>> +
>>>>>>>>> +        mod = [(ldap.MOD_REPLACE, 'nsslapd-readonly', 'on')]
>>>>>>>>> +        try:
>>>>>>>>> +            self.conn.modify_s(dn, mod)
>>>>>>>>> +        except ldap.INSUFFICIENT_ACCESS:
>>>>>>>>> +            # We can't make the server we're removing read-only but
>>>>>>>>> +            # this isn't a show-stopper
>>>>>>>>> +            root_logger.debug("No permission to switch replica to read-only, continuing anyway")
>>>>>>>>> +            pass         <<<<<<<<<<<<<<<
>>>>>>>>
>>>>>>>> Yeah, this is one of my common mistakes. I put in a pass initially, then
>>>>>>>> add logging in front of it and forget to delete the pass. It's gone now.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 5) In clean_ruv, I think allowing a --force option to bypass the
>>>>>>>>> user_input
>>>>>>>>> would be helpful (at least for test automation):
>>>>>>>>>
>>>>>>>>> +    if not ipautil.user_input("Continue to clean?", False):
>>>>>>>>> +        sys.exit("Aborted")
>>>>>>>>
>>>>>>>> Yup, added.
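>>>>>>>>
>>>>>>>> i.e. a guard along these lines (sketch; assumes optparse has already
>>>>>>>> put the flag into options.force):
>>>>>>>>
>>>>>>>>     if not options.force and not ipautil.user_input("Continue to clean?", False):
>>>>>>>>         sys.exit("Aborted")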
>>>>>>>>
>>>>>>>> rob
>>>>>>>
>>>>>>> Slightly revised patch. I still had a window open with one unsaved change.
>>>>>>>
>>>>>>> rob
>>>>>>>
>>>>>>
>>>>>> Apparently there were two unsaved changes, one of which was lost. This
>>>>>> adds in
>>>>>> the 'entry already exists' fix.
>>>>>>
>>>>>> rob
>>>>>>
>>>>>
>>>>> Just one last thing (otherwise the patch is OK) - I don't think this is
>>>>> what we
>>>>> want :-)
>>>>>
>>>>> # ipa-replica-manage clean-ruv 8
>>>>> Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
>>>>>
>>>>> Cleaning the wrong replica ID will cause that server to no
>>>>> longer replicate so it may miss updates while the process
>>>>> is running. It would need to be re-initialized to maintain
>>>>> consistency. Be very careful.
>>>>> Continue to clean? [no]: y   <<<<<<
>>>>> Aborted
>>>>>
>>>>>
>>>>> Nor this exception (you are checking for the wrong exception):
>>>>>
>>>>> # ipa-replica-manage clean-ruv 8
>>>>> Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
>>>>>
>>>>> Cleaning the wrong replica ID will cause that server to no
>>>>> longer replicate so it may miss updates while the process
>>>>> is running. It would need to be re-initialized to maintain
>>>>> consistency. Be very careful.
>>>>> Continue to clean? [no]:
>>>>> unexpected error: This entry already exists
>>>>>
>>>>> This is the exception:
>>>>>
>>>>> Traceback (most recent call last):
>>>>>      File "/sbin/ipa-replica-manage", line 651, in <module>
>>>>>        main()
>>>>>      File "/sbin/ipa-replica-manage", line 648, in main
>>>>>        clean_ruv(realm, args[1], options)
>>>>>      File "/sbin/ipa-replica-manage", line 373, in clean_ruv
>>>>>        thisrepl.cleanallruv(ruv)
>>>>>      File "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py",
>>>>> line 1136, in cleanallruv
>>>>>        self.conn.addEntry(e)
>>>>>      File "/usr/lib/python2.7/site-packages/ipaserver/ipaldap.py", line 503, in
>>>>> addEntry
>>>>>        self.__handle_errors(e, arg_desc=arg_desc)
>>>>>      File "/usr/lib/python2.7/site-packages/ipaserver/ipaldap.py", line 321, in
>>>>> __handle_errors
>>>>>        raise errors.DuplicateEntry()
>>>>> ipalib.errors.DuplicateEntry: This entry already exists
>>>>>
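>>>>> So cleanallruv needs to catch what ipaldap actually raises - something
>>>>> like this sketch:
>>>>>
>>>>>     try:
>>>>>         self.conn.addEntry(e)
>>>>>     except errors.DuplicateEntry:
>>>>>         # a task for this replica id already exists; report it and bail
>>>>>         # out so checkTask() is not called either
>>>>>         root_logger.error("CLEANALLRUV task for replica id %d already exists" % replicaId)
>>>>>         return
>>>>>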
>>>>> Martin
>>>>>
>>>>
>>>> Fixed that and a couple of other problems. When doing a disconnect we should
>>>> not also call clean-ruv.
>>>
>>> Ah, good self-catch.
>>>
>>>>
>>>> I also got tired of seeing crappy error messages so I added a little convert
>>>> utility.
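>>>>
>>>> Something along these lines (a sketch, not the exact helper from the
>>>> patch); python-ldap packs 'desc' and 'info' into a dict on the exception:
>>>>
>>>>     def convert_ldaperror(e):
>>>>         # turn {'desc': ..., 'info': ...} into one readable line
>>>>         info = e.args[0]
>>>>         return ' '.join(s for s in (info.get('desc'), info.get('info')) if s)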
>>>>
>>>> rob
>>>
>>> 1) There is CLEANALLRUV stuff included in 1050-3 and not here. There are
>>> also some findings for this new code.
>>>
>>>
>>> 2) We may want to bump Requires to a higher version of 389-ds-base
>>> (389-ds-base-1.2.11.14-1) - it contains a fix for the CLEANALLRUV+winsync
>>> bug I found earlier.
>>>
>>>
>>> 3) I just discovered another suspicious behavior. When we are deleting a
>>> master that also has links to other master(s), we delete those too. But we
>>> also automatically run CLEANALLRUV in each of these cases, so we may end up
>>> with multiple tasks started on different masters - this does not look right.
>>>
>>> I think we may rather want to first delete all the links and then run the
>>> CLEANALLRUV task just once - a sketch of what I mean follows, and after it
>>> what I get with the current code:
>>>
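>>> Sketch of the ordering I mean (the masters/repl/replica_id names here are
>>> just illustrative):
>>>
>>>     for master in masters:
>>>         # remove the agreements pointing at the deleted master first
>>>         del_link(realm, master, hostname, options.dirman_passwd, force=True)
>>>     # then submit a single CLEANALLRUV task for the deleted master's rid
>>>     repl.cleanallruv(replica_id)
>>>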
>>> # ipa-replica-manage del vm-072.idm.lab.bos.redhat.com
>>> Directory Manager password:
>>>
>>> Deleting a master is irreversible.
>>> To reconnect to the remote master you will need to prepare a new replica file
>>> and re-install.
>>> Continue to delete? [no]: yes
>>> ipa: INFO: Setting agreement
>>> cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>>> schedule to 2358-2359 0 to force synch
>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement
>>> cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired
>>> successfully: Incremental update succeeded: start: 0: end: 0
>>> Background task created to clean replication data. This may take a while.
>>> This may be safely interrupted with Ctrl+C
>>>
>>> ^CWait for task interrupted. It will continue to run in the background
>>>
>>> Deleted replication agreement from 'vm-055.idm.lab.bos.redhat.com' to
>>> 'vm-072.idm.lab.bos.redhat.com'
>>> ipa: INFO: Setting agreement
>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>>> schedule to 2358-2359 0 to force synch
>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement
>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired
>>> successfully: Incremental update succeeded: start: 0: end: 0
>>> Background task created to clean replication data. This may take a while.
>>> This may be safely interrupted with Ctrl+C
>>>
>>> ^CWait for task interrupted. It will continue to run in the background
>>>
>>> Deleted replication agreement from 'vm-086.idm.lab.bos.redhat.com' to
>>> 'vm-072.idm.lab.bos.redhat.com'
>>> Failed to cleanup vm-072.idm.lab.bos.redhat.com DNS entries: NS record does not
>>> contain 'vm-072.idm.lab.bos.redhat.com.'
>>> You may need to manually remove them from the tree
>>>
>>> Martin
>>>
>>
>> All issues addressed and I pulled in abort-clean-ruv from 1050. I added a
>> list-clean-ruv command as well.
>>
>> rob
>
> 1) Patch 1031-9 needs to get squashed with 1031-8
>
>
> 2) Patch needs a rebase (conflict in freeipa.spec.in)
>
>
> 3) The new list-clean-ruv man entry is not right:
>
>         list-clean-ruv [REPLICATION_ID]
>                - List all running CLEANALLRUV and abort CLEANALLRUV tasks.
>
> REPLICATION_ID is not its argument.

Fixed 1-3.

> Btw, the new list-clean-ruv command proved very useful for me.
>
> 4) I just found out we need to do a better job with the make_readonly()
> command. I got into trouble when disconnecting one link to a remote replica:
> it was marked read-only and I was then unable to manage the disconnected
> replica properly (vm-072 is the replica made read-only):

Ok, I reset read-only after we delete the agreements. That fixed things 
up for me. I disconnected a replica and was able to modify entries on 
that replica afterwards.
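
The reset itself is just the inverse of make_readonly() (a sketch; the
helper name is mine, not necessarily what the patch calls it):

    def make_readwrite(self):
        """
        Flip the database back to read-write once the replication
        agreements are deleted (inverse of make_readonly above).
        """
        dn = DN(('cn', 'userRoot'), ('cn', 'ldbm database'),
                ('cn', 'plugins'), ('cn', 'config'))
        mod = [(ldap.MOD_REPLACE, 'nsslapd-readonly', 'off')]
        self.conn.modify_s(dn, mod)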

This affected the --cleanup command too; it would otherwise have
succeeded, I think.

I tested with an A - B - C - A agreement loop. I disconnected A and C
and confirmed I could still update entries on C. Then I deleted C, then
B, and made sure the output looked right, that I could still manage
entries, etc.

rob

>
> [root at vm-055 ~]# ipa-replica-manage disconnect vm-072.idm.lab.bos.redhat.com
>
> [root at vm-072 ~]# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
> Deleting a master is irreversible.
> To reconnect to the remote master you will need to prepare a new replica file
> and re-install.
> Continue to delete? [no]: yes
> Deleting replication agreements between vm-055.idm.lab.bos.redhat.com and
> vm-072.idm.lab.bos.redhat.com
> ipa: INFO: Setting agreement
> cn=meTovm-072.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping
> tree,cn=config schedule to 2358-2359 0 to force synch
> ipa: INFO: Deleting schedule 2358-2359 0 from agreement
> cn=meTovm-072.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping
> tree,cn=config
> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired
> successfully: Incremental update succeeded: start: 0: end: 0
> Deleted replication agreement from 'vm-072.idm.lab.bos.redhat.com' to
> 'vm-055.idm.lab.bos.redhat.com'
> Unable to remove replication agreement for vm-055.idm.lab.bos.redhat.com from
> vm-072.idm.lab.bos.redhat.com.
> Background task created to clean replication data. This may take a while.
> This may be safely interrupted with Ctrl+C
> ^CWait for task interrupted. It will continue to run in the background
>
> Failed to cleanup vm-055.idm.lab.bos.redhat.com entries: Server is unwilling to
> perform: database is read-only arguments:
> dn=krbprincipalname=ldap/vm-055.idm.lab.bos.redhat.com at IDM.LAB.BOS.REDHAT.COM,cn=services,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com
>
> You may need to manually remove them from the tree
> ipa: INFO: Unhandled LDAPError: {'info': 'database is read-only', 'desc':
> 'Server is unwilling to perform'}
>
> Failed to cleanup vm-055.idm.lab.bos.redhat.com DNS entries: Server is
> unwilling to perform: database is read-only
>
> You may need to manually remove them from the tree
>
>
> --cleanup did not work for me either:
> [root at vm-072 ~]# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force
> --cleanup
> Cleaning a master is irreversible.
> This should not normally be require, so use cautiously.
> Continue to clean master? [no]: yes
> unexpected error: Server is unwilling to perform: database is read-only
> arguments:
> dn=krbprincipalname=ldap/vm-055.idm.lab.bos.redhat.com at IDM.LAB.BOS.REDHAT.COM,cn=services,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com
>
> Martin
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: freeipa-rcrit-1031-10-cleanruv.patch
Type: text/x-diff
Size: 9525 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/freeipa-devel/attachments/20120917/cadc8413/attachment.bin>

