[Freeipa-devel] [PATCH] 1031 run cleanallruv task

Rob Crittenden rcritten at redhat.com
Fri Aug 24 17:19:26 UTC 2012


Martin Kosek wrote:
> On 07/05/2012 08:39 PM, Rob Crittenden wrote:
>> Martin Kosek wrote:
>>> On 07/03/2012 04:41 PM, Rob Crittenden wrote:
>>>> Deleting a replica can leave a replication vector (RUV) on the other servers.
>>>> This can confuse things if the replica is re-added, and it also causes the
>>>> server to calculate changes against a server that may no longer exist.
>>>>
>>>> 389-ds-base provides a new task that self-propogates itself to all available
>>>> replicas to clean this RUV data.
>>>>
>>>> This patch will create this task at deletion time to hopefully clean things up.
>>>>
>>>> It isn't perfect. If any replica is down or unavailable at the time the
>>>> cleanruv task fires, and then comes back up, the old RUV data may be
>>>> re-propogated around.
>>>>
>>>> To make things easier in this case I've added two new commands to
>>>> ipa-replica-manage. The first lists the replication ids of all the servers we
>>>> have a RUV for. Using this you can call clean_ruv with the replication id of a
>>>> server that no longer exists to try the cleanallruv step again.
>>>>
>>>> This is quite dangerous though. If you run cleanruv against a replica id that
>>>> does exist it can cause a loss of data. I believe I've put in enough scary
>>>> warnings about this.
>>>>
>>>> rob
>>>>
>>>
>>> Good work there, this should make cleaning RUVs much easier than with the
>>> previous version.
>>>
>>> This is what I found during review:
>>>
>>> 1) list_ruv and clean_ruv command help in man is quite lost. I think it would
>>> help if we for example have all info for commands indented. This way user could
>>> simply over-look the new commands in the man page.
>>>
>>>
>>> 2) I would rename new commands to clean-ruv and list-ruv to make them
>>> consistent with the rest of the commands (re-initialize, force-sync).
>>>
>>>
>>> 3) It would be nice to be able to run clean_ruv command in an unattended way
>>> (for better testing), i.e. respect --force option as we already do for
>>> ipa-replica-manage del. This fix would aid test automation in the future.
>>>
>>>
>>> 4) (minor) The new question (and the del too) does not react too well for
>>> CTRL+D:
>>>
>>> # ipa-replica-manage clean_ruv 3 --force
>>> Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
>>>
>>> Cleaning the wrong replica ID will cause that server to no
>>> longer replicate so it may miss updates while the process
>>> is running. It would need to be re-initialized to maintain
>>> consistency. Be very careful.
>>> Continue to clean? [no]: unexpected error:
>>>
>>>
>>> 5) Help for clean_ruv command without a required parameter is quite confusing
>>> as it reports that command is wrong and not the parameter:
>>>
>>> # ipa-replica-manage clean_ruv
>>> Usage: ipa-replica-manage [options]
>>>
>>> ipa-replica-manage: error: must provide a command [clean_ruv | force-sync |
>>> disconnect | connect | del | re-initialize | list | list_ruv]
>>>
>>> It seems you just forgot to specify the error message in the command definition
>>>
>>>
>>> 6) When the remote replica is down, the clean_ruv command fails with an
>>> unexpected error:
>>>
>>> [root at vm-086 ~]# ipa-replica-manage clean_ruv 5
>>> Clean the Replication Update Vector for vm-055.idm.lab.bos.redhat.com:389
>>>
>>> Cleaning the wrong replica ID will cause that server to no
>>> longer replicate so it may miss updates while the process
>>> is running. It would need to be re-initialized to maintain
>>> consistency. Be very careful.
>>> Continue to clean? [no]: y
>>> unexpected error: {'desc': 'Operations error'}
>>>
>>>
>>> /var/log/dirsrv/slapd-IDM-LAB-BOS-REDHAT-COM/errors:
>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task: failed
>>> to connect to repl        agreement connection
>>> (cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,
>>>       cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping
>>> tree,cn=config), error 105
>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task: replica
>>> (cn=meTovm-055.idm.lab.
>>> bos.redhat.com,cn=replica,cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping
>>>
>>> tree,   cn=config) has not been cleaned.  You will need to rerun the
>>> CLEANALLRUV task on this replica.
>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin - cleanAllRUV_task: Task
>>> failed (1)
>>>
>>> In this case I think we should inform user that the command failed, possibly
>>> because of disconnected replicas and that they could enable the replicas and
>>> try again.
>>>
>>>
>>> 7) (minor) "pass" is now redundant in replication.py:
>>> +        except ldap.INSUFFICIENT_ACCESS:
>>> +            # We can't make the server we're removing read-only but
>>> +            # this isn't a show-stopper
>>> +            root_logger.debug("No permission to switch replica to read-only,
>>> continuing anyway")
>>> +            pass
>>>
>>
>> I think this addresses everything.
>>
>> rob
>
> Thanks, almost there! I just found one more issue which needs to be fixed
> before we push:
>
> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force
> Directory Manager password:
>
> Unable to connect to replica vm-055.idm.lab.bos.redhat.com, forcing removal
> Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc': "Can't
> contact LDAP server"}
> Forcing removal on 'vm-086.idm.lab.bos.redhat.com'
>
> There were issues removing a connection: %d format: a number is required, not str
>
> Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc': "Can't
> contact LDAP server"}
>
> This is a traceback I retrieved:
> Traceback (most recent call last):
>    File "/sbin/ipa-replica-manage", line 425, in del_master
>      del_link(realm, r, hostname, options.dirman_passwd, force=True)
>    File "/sbin/ipa-replica-manage", line 271, in del_link
>      repl1.cleanallruv(replica_id)
>    File "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py",
> line 1094, in cleanallruv
>      root_logger.debug("Creating CLEANALLRUV task for replica id %d" % replicaId)
>
>
> The problem here is that you don't convert replica_id to int in this part:
> +    replica_id = None
> +    if repl2:
> +        replica_id = repl2._get_replica_id(repl2.conn, None)
> +    else:
> +        servers = get_ruv(realm, replica1, dirman_passwd)
> +        for (netloc, rid) in servers:
> +            if netloc.startswith(replica2):
> +                replica_id = rid
> +                break
>
> Martin
>

Updated patch using new mechanism in 389-ds-base. This should more 
thoroughly clean out RUV data when a replica is being deleted, and 
provide for a way to delete RUV data afterwards too if necessary.

rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: freeipa-rcrit-1031-3-cleanruv.patch
Type: text/x-diff
Size: 14509 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/freeipa-devel/attachments/20120824/9c09f694/attachment.bin>


More information about the Freeipa-devel mailing list