[Freeipa-devel] [PATCH] 1031 run cleanallruv task

Rob Crittenden rcritten at redhat.com
Mon Sep 17 14:15:30 UTC 2012


Martin Kosek wrote:
> On 09/17/2012 04:04 PM, Rob Crittenden wrote:
>> Martin Kosek wrote:
>>> On 09/14/2012 09:17 PM, Rob Crittenden wrote:
>>>> Martin Kosek wrote:
>>>>> On 09/06/2012 11:17 PM, Rob Crittenden wrote:
>>>>>> Martin Kosek wrote:
>>>>>>> On 09/06/2012 05:55 PM, Rob Crittenden wrote:
>>>>>>>> Rob Crittenden wrote:
>>>>>>>>> Rob Crittenden wrote:
>>>>>>>>>> Martin Kosek wrote:
>>>>>>>>>>> On 09/05/2012 08:06 PM, Rob Crittenden wrote:
>>>>>>>>>>>> Rob Crittenden wrote:
>>>>>>>>>>>>> Martin Kosek wrote:
>>>>>>>>>>>>>> On 07/05/2012 08:39 PM, Rob Crittenden wrote:
>>>>>>>>>>>>>>> Martin Kosek wrote:
>>>>>>>>>>>>>>>> On 07/03/2012 04:41 PM, Rob Crittenden wrote:
>>>>>>>>>>>>>>>>> Deleting a replica can leave a replica update vector (RUV) on
>>>>>>>>>>>>>>>>> the other servers. This can confuse things if the replica is
>>>>>>>>>>>>>>>>> re-added, and it also causes the server to calculate changes
>>>>>>>>>>>>>>>>> against a server that may no longer exist.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 389-ds-base provides a new task that propagates itself to all
>>>>>>>>>>>>>>>>> available replicas to clean this RUV data.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This patch will create this task at deletion time to hopefully
>>>>>>>>>>>>>>>>> clean things up.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It isn't perfect. If any replica is down or unavailable at the
>>>>>>>>>>>>>>>>> time the cleanruv task fires, and then comes back up, the old
>>>>>>>>>>>>>>>>> RUV data may be re-propagated around.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To make things easier in this case I've added two new commands to
>>>>>>>>>>>>>>>>> ipa-replica-manage. The first lists the replication ids of all the
>>>>>>>>>>>>>>>>> servers we
>>>>>>>>>>>>>>>>> have a RUV for. Using this you can call clean_ruv with the
>>>>>>>>>>>>>>>>> replication id of a
>>>>>>>>>>>>>>>>> server that no longer exists to try the cleanallruv step again.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This is quite dangerous though. If you run cleanruv against a
>>>>>>>>>>>>>>>>> replica id that
>>>>>>>>>>>>>>>>> does exist it can cause a loss of data. I believe I've put in
>>>>>>>>>>>>>>>>> enough scary
>>>>>>>>>>>>>>>>> warnings about this.
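>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> For reference, the task entry this creates looks roughly like
>>>>>>>>>>>>>>>>> the following python-ldap sketch (conn, suffix and replica_id
>>>>>>>>>>>>>>>>> stand in for the real values; attribute names per the 389-ds
>>>>>>>>>>>>>>>>> CLEANALLRUV task):
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>     import ldap
>>>>>>>>>>>>>>>>>     import ldap.modlist
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>     dn = "cn=clean %d,cn=cleanallruv,cn=tasks,cn=config" % replica_id
>>>>>>>>>>>>>>>>>     attrs = {
>>>>>>>>>>>>>>>>>         'objectclass': ['top', 'extensibleObject'],
>>>>>>>>>>>>>>>>>         'cn': ['clean %d' % replica_id],
>>>>>>>>>>>>>>>>>         'replica-base-dn': [suffix],
>>>>>>>>>>>>>>>>>         'replica-id': [str(replica_id)],
>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>     conn.add_s(dn, ldap.modlist.addModlist(attrs))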
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> rob
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Good work there, this should make cleaning RUVs much easier than
>>>>>>>>>>>>>>>> with the
>>>>>>>>>>>>>>>> previous version.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is what I found during review:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) The list_ruv and clean_ruv command help is quite lost in the
>>>>>>>>>>>>>>>> man page. I think it would help if, for example, all the info for
>>>>>>>>>>>>>>>> the commands was indented. Otherwise a user could easily overlook
>>>>>>>>>>>>>>>> the new commands in the man page.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2) I would rename the new commands to clean-ruv and list-ruv to
>>>>>>>>>>>>>>>> make them consistent with the rest of the commands (re-initialize,
>>>>>>>>>>>>>>>> force-sync).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 3) It would be nice to be able to run the clean_ruv command in an
>>>>>>>>>>>>>>>> unattended way (for better testing), i.e. respect the --force
>>>>>>>>>>>>>>>> option as we already do for ipa-replica-manage del. This fix would
>>>>>>>>>>>>>>>> aid test automation in the future.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 4) (minor) The new question (and the one for del too) does not
>>>>>>>>>>>>>>>> react well to CTRL+D:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # ipa-replica-manage clean_ruv 3 --force
>>>>>>>>>>>>>>>> Clean the Replication Update Vector for
>>>>>>>>>>>>>>>> vm-055.idm.lab.bos.redhat.com:389
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cleaning the wrong replica ID will cause that server to no
>>>>>>>>>>>>>>>> longer replicate so it may miss updates while the process
>>>>>>>>>>>>>>>> is running. It would need to be re-initialized to maintain
>>>>>>>>>>>>>>>> consistency. Be very careful.
>>>>>>>>>>>>>>>> Continue to clean? [no]: unexpected error:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 5) The help for the clean_ruv command without its required
>>>>>>>>>>>>>>>> parameter is quite confusing, as it reports that the command is
>>>>>>>>>>>>>>>> wrong rather than the parameter:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> # ipa-replica-manage clean_ruv
>>>>>>>>>>>>>>>> Usage: ipa-replica-manage [options]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ipa-replica-manage: error: must provide a command [clean_ruv |
>>>>>>>>>>>>>>>> force-sync |
>>>>>>>>>>>>>>>> disconnect | connect | del | re-initialize | list | list_ruv]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It seems you just forgot to specify the error message in the
>>>>>>>>>>>>>>>> command definition.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 6) When the remote replica is down, the clean_ruv command fails
>>>>>>>>>>>>>>>> with an
>>>>>>>>>>>>>>>> unexpected error:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [root at vm-086 ~]# ipa-replica-manage clean_ruv 5
>>>>>>>>>>>>>>>> Clean the Replication Update Vector for
>>>>>>>>>>>>>>>> vm-055.idm.lab.bos.redhat.com:389
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cleaning the wrong replica ID will cause that server to no
>>>>>>>>>>>>>>>> longer replicate so it may miss updates while the process
>>>>>>>>>>>>>>>> is running. It would need to be re-initialized to maintain
>>>>>>>>>>>>>>>> consistency. Be very careful.
>>>>>>>>>>>>>>>> Continue to clean? [no]: y
>>>>>>>>>>>>>>>> unexpected error: {'desc': 'Operations error'}
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> /var/log/dirsrv/slapd-IDM-LAB-BOS-REDHAT-COM/errors:
>>>>>>>>>>>>>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin -
>>>>>>>>>>>>>>>> cleanAllRUV_task: failed to connect to repl agreement connection
>>>>>>>>>>>>>>>> (cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping tree,cn=config), error 105
>>>>>>>>>>>>>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin -
>>>>>>>>>>>>>>>> cleanAllRUV_task: replica
>>>>>>>>>>>>>>>> (cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping tree,cn=config) has not been cleaned.  You will need to rerun the
>>>>>>>>>>>>>>>> CLEANALLRUV task on this replica.
>>>>>>>>>>>>>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin -
>>>>>>>>>>>>>>>> cleanAllRUV_task: Task failed (1)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In this case I think we should inform the user that the command
>>>>>>>>>>>>>>>> failed, possibly because of disconnected replicas, and that they
>>>>>>>>>>>>>>>> could enable the replicas and try again.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 7) (minor) "pass" is now redundant in replication.py:
>>>>>>>>>>>>>>>> +        except ldap.INSUFFICIENT_ACCESS:
>>>>>>>>>>>>>>>> +            # We can't make the server we're removing read-only
>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>> +            # this isn't a show-stopper
>>>>>>>>>>>>>>>> +            root_logger.debug("No permission to switch replica to read-only, continuing anyway")
>>>>>>>>>>>>>>>> +            pass
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think this addresses everything.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> rob
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks, almost there! I just found one more issue which needs to be
>>>>>>>>>>>>>> fixed
>>>>>>>>>>>>>> before we push:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force
>>>>>>>>>>>>>> Directory Manager password:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Unable to connect to replica vm-055.idm.lab.bos.redhat.com, forcing
>>>>>>>>>>>>>> removal
>>>>>>>>>>>>>> Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc':
>>>>>>>>>>>>>> "Can't
>>>>>>>>>>>>>> contact LDAP server"}
>>>>>>>>>>>>>> Forcing removal on 'vm-086.idm.lab.bos.redhat.com'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There were issues removing a connection: %d format: a number is
>>>>>>>>>>>>>> required, not str
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc':
>>>>>>>>>>>>>> "Can't
>>>>>>>>>>>>>> contact LDAP server"}
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is a traceback I retrieved:
>>>>>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>>>>>         File "/sbin/ipa-replica-manage", line 425, in del_master
>>>>>>>>>>>>>>           del_link(realm, r, hostname, options.dirman_passwd,
>>>>>>>>>>>>>> force=True)
>>>>>>>>>>>>>>         File "/sbin/ipa-replica-manage", line 271, in del_link
>>>>>>>>>>>>>>           repl1.cleanallruv(replica_id)
>>>>>>>>>>>>>>         File
>>>>>>>>>>>>>> "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py",
>>>>>>>>>>>>>> line 1094, in cleanallruv
>>>>>>>>>>>>>>           root_logger.debug("Creating CLEANALLRUV task for replica id %d" % replicaId)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The problem here is that you don't convert replica_id to int in this
>>>>>>>>>>>>>> part:
>>>>>>>>>>>>>> +    replica_id = None
>>>>>>>>>>>>>> +    if repl2:
>>>>>>>>>>>>>> +        replica_id = repl2._get_replica_id(repl2.conn, None)
>>>>>>>>>>>>>> +    else:
>>>>>>>>>>>>>> +        servers = get_ruv(realm, replica1, dirman_passwd)
>>>>>>>>>>>>>> +        for (netloc, rid) in servers:
>>>>>>>>>>>>>> +            if netloc.startswith(replica2):
>>>>>>>>>>>>>> +                replica_id = rid
>>>>>>>>>>>>>> +                break
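>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The fix is just to cast it, e.g.:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> +            if netloc.startswith(replica2):
>>>>>>>>>>>>>> +                replica_id = int(rid)
>>>>>>>>>>>>>> +                break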
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Martin
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Updated patch using new mechanism in 389-ds-base. This should more
>>>>>>>>>>>>> thoroughly clean out RUV data when a replica is being deleted, and
>>>>>>>>>>>>> provide for a way to delete RUV data afterwards too if necessary.
>>>>>>>>>>>>>
>>>>>>>>>>>>> rob
>>>>>>>>>>>>
>>>>>>>>>>>> Rebased patch
>>>>>>>>>>>>
>>>>>>>>>>>> rob
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 0) As I wrote in the review for your patch 1041, the changelog entry
>>>>>>>>>>> slipped elsewhere.
>>>>>>>>>>>
>>>>>>>>>>> 1) The following KeyboardInterrupt except clause looks suspicious. I
>>>>>>>>>>> know why you have it there, but since it is generally a bad thing to
>>>>>>>>>>> do, a comment explaining why it is needed would be useful.
>>>>>>>>>>>
>>>>>>>>>>> @@ -256,6 +263,17 @@ def del_link(realm, replica1, replica2, dirman_passwd, force=False):
>>>>>>>>>>>           repl1.delete_agreement(replica2)
>>>>>>>>>>>           repl1.delete_referral(replica2)
>>>>>>>>>>>
>>>>>>>>>>> +    if type1 == replication.IPA_REPLICA:
>>>>>>>>>>> +        if repl2:
>>>>>>>>>>> +            ruv = repl2._get_replica_id(repl2.conn, None)
>>>>>>>>>>> +        else:
>>>>>>>>>>> +            ruv = get_ruv_by_host(realm, replica1, replica2, dirman_passwd)
>>>>>>>>>>> +
>>>>>>>>>>> +        try:
>>>>>>>>>>> +            repl1.cleanallruv(ruv)
>>>>>>>>>>> +        except KeyboardInterrupt:
>>>>>>>>>>> +            pass
>>>>>>>>>>> +
>>>>>>>>>>>
>>>>>>>>>>> Maybe you just wanted to do some cleanup and then "raise" again?
>>>>>>>>>>
>>>>>>>>>> No, it is there because it is safe to break out of it. The task will
>>>>>>>>>> continue to run. I added some verbiage.
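>>>>>>>>>> 
>>>>>>>>>> Roughly (a sketch, not the exact patch hunk):
>>>>>>>>>> 
>>>>>>>>>>     try:
>>>>>>>>>>         repl1.cleanallruv(ruv)
>>>>>>>>>>     except KeyboardInterrupt:
>>>>>>>>>>         # The server-side task keeps running even if we stop waiting
>>>>>>>>>>         # for it, so it is safe to let the user break out here.
>>>>>>>>>>         print "Wait for task interrupted. It will continue to run in the background"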
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2) This is related to 1): "ipa-replica-manage del" may wait
>>>>>>>>>>> indefinitely when some remote replica is down, right?
>>>>>>>>>>>
>>>>>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
>>>>>>>>>>> Deleting a master is irreversible.
>>>>>>>>>>> To reconnect to the remote master you will need to prepare a new
>>>>>>>>>>> replica file
>>>>>>>>>>> and re-install.
>>>>>>>>>>> Continue to delete? [no]: y
>>>>>>>>>>> ipa: INFO: Setting agreement
>>>>>>>>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>>>>>>>>>>> schedule to 2358-2359 0 to force synch
>>>>>>>>>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement
>>>>>>>>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>>>>>>>>>>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica
>>>>>>>>>>> acquired successfully: Incremental update succeeded: start: 0: end: 0
>>>>>>>>>>> Background task created to clean replication data
>>>>>>>>>>>
>>>>>>>>>>> ... after about a minute I hit CTRL+C
>>>>>>>>>>>
>>>>>>>>>>> ^CDeleted replication agreement from 'vm-086.idm.lab.bos.redhat.com' to
>>>>>>>>>>> 'vm-055.idm.lab.bos.redhat.com'
>>>>>>>>>>> Failed to cleanup vm-055.idm.lab.bos.redhat.com DNS entries: NS
>>>>>>>>>>> record does not contain 'vm-055.idm.lab.bos.redhat.com.'
>>>>>>>>>>> You may need to manually remove them from the tree
>>>>>>>>>>>
>>>>>>>>>>> I think it would be better to inform the user that some remote
>>>>>>>>>>> replica is down, or at least that we are waiting for the task to
>>>>>>>>>>> complete. Something like this:
>>>>>>>>>>>
>>>>>>>>>>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
>>>>>>>>>>> ...
>>>>>>>>>>> Background task created to clean replication data
>>>>>>>>>>> Replication data cleanup may take a very long time if some replica
>>>>>>>>>>> is unreachable
>>>>>>>>>>> Hit CTRL+C to interrupt the wait
>>>>>>>>>>> ^C Clean up wait interrupted
>>>>>>>>>>> ....
>>>>>>>>>>> [continue with del]
>>>>>>>>>>
>>>>>>>>>> Yup, did this in #1.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 3) (minor) When there is a cleanruv task running and you run
>>>>>>>>>>> "ipa-replica-manage del", there is an unexpected error message about
>>>>>>>>>>> a duplicate task object in LDAP:
>>>>>>>>>>>
>>>>>>>>>>> # ipa-replica-manage del vm-072.idm.lab.bos.redhat.com --force
>>>>>>>>>>> Unable to connect to replica vm-072.idm.lab.bos.redhat.com, forcing
>>>>>>>>>>> removal
>>>>>>>>>>> FAIL
>>>>>>>>>>> Failed to get data from 'vm-072.idm.lab.bos.redhat.com': {'desc': "Can't
>>>>>>>>>>> contact LDAP server"}
>>>>>>>>>>> Forcing removal on 'vm-086.idm.lab.bos.redhat.com'
>>>>>>>>>>>
>>>>>>>>>>> There were issues removing a connection: This entry already exists
>>>>>>>>>>> <<<<<<<<<
>>>>>>>>>>>
>>>>>>>>>>> Failed to get data from 'vm-072.idm.lab.bos.redhat.com': {'desc': "Can't
>>>>>>>>>>> contact LDAP server"}
>>>>>>>>>>> Failed to cleanup vm-072.idm.lab.bos.redhat.com DNS entries: NS
>>>>>>>>>>> record does not contain 'vm-072.idm.lab.bos.redhat.com.'
>>>>>>>>>>> You may need to manually remove them from the tree
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I think it should be enough to just catch "entry already exists" in
>>>>>>>>>>> the cleanallruv function, and in such a case print a relevant error
>>>>>>>>>>> message and bail out. Then self.conn.checkTask(dn, dowait=True) would
>>>>>>>>>>> not be called either.
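>>>>>>>>>>> 
>>>>>>>>>>> Something along these lines (a sketch; ipaldap maps the LDAP
>>>>>>>>>>> "already exists" result to errors.DuplicateEntry):
>>>>>>>>>>> 
>>>>>>>>>>>     try:
>>>>>>>>>>>         self.conn.addEntry(e)
>>>>>>>>>>>     except errors.DuplicateEntry:
>>>>>>>>>>>         print "A CLEANALLRUV task for replica id %d is already running" % replicaId
>>>>>>>>>>>         return
>>>>>>>>>>>     self.conn.checkTask(dn, dowait=True)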
>>>>>>>>>>
>>>>>>>>>> Good catch, fixed.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 4) (minor): In make_readonly function, there is a redundant "pass"
>>>>>>>>>>> statement:
>>>>>>>>>>>
>>>>>>>>>>> +    def make_readonly(self):
>>>>>>>>>>> +        """
>>>>>>>>>>> +        Make the current replication agreement read-only.
>>>>>>>>>>> +        """
>>>>>>>>>>> +        dn = DN(('cn', 'userRoot'), ('cn', 'ldbm database'),
>>>>>>>>>>> +                ('cn', 'plugins'), ('cn', 'config'))
>>>>>>>>>>> +
>>>>>>>>>>> +        mod = [(ldap.MOD_REPLACE, 'nsslapd-readonly', 'on')]
>>>>>>>>>>> +        try:
>>>>>>>>>>> +            self.conn.modify_s(dn, mod)
>>>>>>>>>>> +        except ldap.INSUFFICIENT_ACCESS:
>>>>>>>>>>> +            # We can't make the server we're removing read-only but
>>>>>>>>>>> +            # this isn't a show-stopper
>>>>>>>>>>> +            root_logger.debug("No permission to switch replica to read-only, continuing anyway")
>>>>>>>>>>> +            pass         <<<<<<<<<<<<<<<
>>>>>>>>>>
>>>>>>>>>> Yeah, this is one of my common mistakes. I put in a pass initially,
>>>>>>>>>> then add logging in front of it and forget to delete the pass. It's
>>>>>>>>>> gone now.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 5) In clean_ruv, I think allowing a --force option to bypass the
>>>>>>>>>>> user_input prompt would be helpful (at least for test automation):
>>>>>>>>>>>
>>>>>>>>>>> +    if not ipautil.user_input("Continue to clean?", False):
>>>>>>>>>>> +        sys.exit("Aborted")
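>>>>>>>>>>> 
>>>>>>>>>>> e.g. (sketch):
>>>>>>>>>>> 
>>>>>>>>>>> +    if not options.force and \
>>>>>>>>>>> +       not ipautil.user_input("Continue to clean?", False):
>>>>>>>>>>> +        sys.exit("Aborted")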
>>>>>>>>>>
>>>>>>>>>> Yup, added.
>>>>>>>>>>
>>>>>>>>>> rob
>>>>>>>>>
>>>>>>>>> Slightly revised patch. I still had a window open with one unsaved change.
>>>>>>>>>
>>>>>>>>> rob
>>>>>>>>>
>>>>>>>>
>>>>>>>> Apparently there were two unsaved changes, one of which was lost. This
>>>>>>>> adds in
>>>>>>>> the 'entry already exists' fix.
>>>>>>>>
>>>>>>>> rob
>>>>>>>>
>>>>>>>
>>>>>>> Just one last thing (otherwise the patch is OK) - I don't think this is
>>>>>>> what we
>>>>>>> want :-)
>>>>>>>
>>>>>>> # ipa-replica-manage clean-ruv 8
>>>>>>> Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
>>>>>>>
>>>>>>> Cleaning the wrong replica ID will cause that server to no
>>>>>>> longer replicate so it may miss updates while the process
>>>>>>> is running. It would need to be re-initialized to maintain
>>>>>>> consistency. Be very careful.
>>>>>>> Continue to clean? [no]: y   <<<<<<
>>>>>>> Aborted
>>>>>>>
>>>>>>>
>>>>>>> Nor this exception (you are checking for the wrong exception):
>>>>>>>
>>>>>>> # ipa-replica-manage clean-ruv 8
>>>>>>> Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
>>>>>>>
>>>>>>> Cleaning the wrong replica ID will cause that server to no
>>>>>>> longer replicate so it may miss updates while the process
>>>>>>> is running. It would need to be re-initialized to maintain
>>>>>>> consistency. Be very careful.
>>>>>>> Continue to clean? [no]:
>>>>>>> unexpected error: This entry already exists
>>>>>>>
>>>>>>> This is the exception:
>>>>>>>
>>>>>>> Traceback (most recent call last):
>>>>>>>       File "/sbin/ipa-replica-manage", line 651, in <module>
>>>>>>>         main()
>>>>>>>       File "/sbin/ipa-replica-manage", line 648, in main
>>>>>>>         clean_ruv(realm, args[1], options)
>>>>>>>       File "/sbin/ipa-replica-manage", line 373, in clean_ruv
>>>>>>>         thisrepl.cleanallruv(ruv)
>>>>>>>       File
>>>>>>> "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py",
>>>>>>> line 1136, in cleanallruv
>>>>>>>         self.conn.addEntry(e)
>>>>>>>       File "/usr/lib/python2.7/site-packages/ipaserver/ipaldap.py", line
>>>>>>> 503, in
>>>>>>> addEntry
>>>>>>>         self.__handle_errors(e, arg_desc=arg_desc)
>>>>>>>       File "/usr/lib/python2.7/site-packages/ipaserver/ipaldap.py", line
>>>>>>> 321, in
>>>>>>> __handle_errors
>>>>>>>         raise errors.DuplicateEntry()
>>>>>>> ipalib.errors.DuplicateEntry: This entry already exists
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>
>>>>>> Fixed that and a couple of other problems. When doing a disconnect we should
>>>>>> not also call clean-ruv.
>>>>>
>>>>> Ah, good self-catch.
>>>>>
>>>>>>
>>>>>> I also got tired of seeing crappy error messages so I added a little convert
>>>>>> utility.
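>>>>>> 
>>>>>> The helper is only a few lines, roughly this shape (the name and exact
>>>>>> fields are a sketch, not the actual patch code):
>>>>>> 
>>>>>>     def convert_error(e):
>>>>>>         """Turn a raw python-ldap exception into readable text."""
>>>>>>         if isinstance(e, ldap.LDAPError):
>>>>>>             desc = e.args[0].get('desc', '').strip()
>>>>>>             info = e.args[0].get('info', '').strip()
>>>>>>             return '%s %s' % (desc, info)
>>>>>>         return str(e)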
>>>>>>
>>>>>> rob
>>>>>
>>>>> 1) There is CLEANALLRUV stuff included in 1050-3 and not here. There are
>>>>> also some findings for this new code.
>>>>>
>>>>>
>>>>> 2) We may want to bump the Requires to a higher version of 389-ds-base
>>>>> (389-ds-base-1.2.11.14-1) - it contains a fix for the CLEANALLRUV+winsync
>>>>> bug I found earlier.
>>>>>
>>>>>
>>>>> 3) I just discovered another suspicious behavior. When we are deleting a
>>>>> master that also has links to other master(s), we delete those too. But we
>>>>> also automatically run CLEANALLRUV in these cases, so we may end up with
>>>>> multiple tasks being started on different masters - this does not look
>>>>> right.
>>>>>
>>>>> I think we may rather want to first delete all the links and then run the
>>>>> CLEANALLRUV task just once.
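>>>>> 
>>>>> In pseudo-code, the ordering I mean (the helper name is made up, just to
>>>>> illustrate):
>>>>> 
>>>>>     # delete every agreement pointing at the removed master first...
>>>>>     for master in masters_with_link_to(replica):
>>>>>         del_link(realm, master, replica, dirman_passwd, force=True)
>>>>>     # ...then create a single CLEANALLRUV task; it propagates itself
>>>>>     repl.cleanallruv(replica_id)
>>>>> 
>>>>> This is what I get with the current code: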
>>>>>
>>>>> # ipa-replica-manage del vm-072.idm.lab.bos.redhat.com
>>>>> Directory Manager password:
>>>>>
>>>>> Deleting a master is irreversible.
>>>>> To reconnect to the remote master you will need to prepare a new replica file
>>>>> and re-install.
>>>>> Continue to delete? [no]: yes
>>>>> ipa: INFO: Setting agreement
>>>>> cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>>>>> schedule to 2358-2359 0 to force synch
>>>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement
>>>>> cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>>>>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica
>>>>> acquired successfully: Incremental update succeeded: start: 0: end: 0
>>>>> Background task created to clean replication data. This may take a while.
>>>>> This may be safely interrupted with Ctrl+C
>>>>>
>>>>> ^CWait for task interrupted. It will continue to run in the background
>>>>>
>>>>> Deleted replication agreement from 'vm-055.idm.lab.bos.redhat.com' to
>>>>> 'vm-072.idm.lab.bos.redhat.com'
>>>>> ipa: INFO: Setting agreement
>>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>>>>> schedule to 2358-2359 0 to force synch
>>>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement
>>>>> cn=meTovm-086.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>>>>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica
>>>>> acquired successfully: Incremental update succeeded: start: 0: end: 0
>>>>> Background task created to clean replication data. This may take a while.
>>>>> This may be safely interrupted with Ctrl+C
>>>>>
>>>>> ^CWait for task interrupted. It will continue to run in the background
>>>>>
>>>>> Deleted replication agreement from 'vm-086.idm.lab.bos.redhat.com' to
>>>>> 'vm-072.idm.lab.bos.redhat.com'
>>>>> Failed to cleanup vm-072.idm.lab.bos.redhat.com DNS entries: NS record
>>>>> does not contain 'vm-072.idm.lab.bos.redhat.com.'
>>>>> You may need to manually remove them from the tree
>>>>>
>>>>> Martin
>>>>>
>>>>
>>>> All issues addressed and I pulled in abort-clean-ruv from 1050. I added a
>>>> list-clean-ruv command as well.
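>>>> 
>>>> For the record, list-clean-ruv just needs to search the two task
>>>> containers in cn=config, roughly (a sketch; conn is an existing LDAP
>>>> connection):
>>>> 
>>>>     for base in ('cn=cleanallruv,cn=tasks,cn=config',
>>>>                  'cn=abort cleanallruv,cn=tasks,cn=config'):
>>>>         tasks = conn.search_s(base, ldap.SCOPE_ONELEVEL, '(objectclass=*)')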
>>>>
>>>> rob
>>>
>>> 1) Patch 1031-9 needs to be squashed into 1031-8
>>>
>>>
>>> 2) Patch needs a rebase (conflict in freeipa.spec.in)
>>>
>>>
>>> 3) The new list-clean-ruv man entry is not right:
>>>
>>>          list-clean-ruv [REPLICATION_ID]
>>>                 - List all running CLEANALLRUV and abort CLEANALLRUV tasks.
>>>
>>> REPLICATION_ID is not its argument.
>>
>> Fixed 1-3.
>>
>>> Btw. new list-clean-ruv command proved very useful for me.
>>>
>>> 4) I just found out we need to do a better job with the make_readonly()
>>> command. I got into trouble when disconnecting one link to a remote
>>> replica: it was marked read-only and I was then unable to manage the
>>> disconnected replica properly (vm-072 is the replica made read-only):
>>
>> Ok, I reset read-only after we delete the agreements. That fixed things up for
>> me. I disconnected a replica and was able to modify entries on that replica
>> afterwards.
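>> 
>> The reset is just the mirror of the make_readonly() hunk quoted earlier,
>> flipping the same attribute back (sketch):
>> 
>>     mod = [(ldap.MOD_REPLACE, 'nsslapd-readonly', 'off')]
>>     self.conn.modify_s(dn, mod)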
>>
>> This affected the --cleanup command too; it would otherwise have succeeded,
>> I think.
>>
>> I tested with an A - B - C - A agreement loop. I disconnected A and C and
>> confirmed I could still update entries on C. Then I deleted C, then B, and
>> made sure the output looked right and that I could still manage entries, etc.
>>
>> rob
>>
>>>
>>> [root at vm-055 ~]# ipa-replica-manage disconnect vm-072.idm.lab.bos.redhat.com
>>>
>>> [root at vm-072 ~]# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com
>>> Deleting a master is irreversible.
>>> To reconnect to the remote master you will need to prepare a new replica file
>>> and re-install.
>>> Continue to delete? [no]: yes
>>> Deleting replication agreements between vm-055.idm.lab.bos.redhat.com and
>>> vm-072.idm.lab.bos.redhat.com
>>> ipa: INFO: Setting agreement
>>> cn=meTovm-072.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>>> schedule to 2358-2359 0 to force synch
>>> ipa: INFO: Deleting schedule 2358-2359 0 from agreement
>>> cn=meTovm-072.idm.lab.bos.redhat.com,cn=replica,cn=dc\=idm\,dc\=lab\,dc\=bos\,dc\=redhat\,dc\=com,cn=mapping tree,cn=config
>>> ipa: INFO: Replication Update in progress: FALSE: status: 0 Replica acquired
>>> successfully: Incremental update succeeded: start: 0: end: 0
>>> Deleted replication agreement from 'vm-072.idm.lab.bos.redhat.com' to
>>> 'vm-055.idm.lab.bos.redhat.com'
>>> Unable to remove replication agreement for vm-055.idm.lab.bos.redhat.com from
>>> vm-072.idm.lab.bos.redhat.com.
>>> Background task created to clean replication data. This may take a while.
>>> This may be safely interrupted with Ctrl+C
>>> ^CWait for task interrupted. It will continue to run in the background
>>>
>>> Failed to cleanup vm-055.idm.lab.bos.redhat.com entries: Server is unwilling to
>>> perform: database is read-only arguments:
>>> dn=krbprincipalname=ldap/vm-055.idm.lab.bos.redhat.com at IDM.LAB.BOS.REDHAT.COM,cn=services,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com
>>>
>>> You may need to manually remove them from the tree
>>> ipa: INFO: Unhandled LDAPError: {'info': 'database is read-only', 'desc':
>>> 'Server is unwilling to perform'}
>>>
>>> Failed to cleanup vm-055.idm.lab.bos.redhat.com DNS entries: Server is
>>> unwilling to perform: database is read-only
>>>
>>> You may need to manually remove them from the tree
>>>
>>>
>>> --cleanup did not work for me as well:
>>> [root at vm-072 ~]# ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force
>>> --cleanup
>>> Cleaning a master is irreversible.
>>> This should not normally be require, so use cautiously.
>>> Continue to clean master? [no]: yes
>>> unexpected error: Server is unwilling to perform: database is read-only
>>> arguments:
>>> dn=krbprincipalname=ldap/vm-055.idm.lab.bos.redhat.com at IDM.LAB.BOS.REDHAT.COM,cn=services,cn=accounts,dc=idm,dc=lab,dc=bos,dc=redhat,dc=com
>>>
>>> Martin
>>>
>>
>
> I think you sent the wrong patch...
>
> Martin
>

I hate Mondays.

rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: freeipa-rcrit-1031-11-cleanruv.patch
Type: text/x-diff
Size: 27405 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/freeipa-devel/attachments/20120917/695d5e0a/attachment.bin>

