[Freeipa-devel] [PATCH] 1031 run cleanallruv task

Rob Crittenden rcritten at redhat.com
Wed Sep 5 18:06:09 UTC 2012


Rob Crittenden wrote:
> Martin Kosek wrote:
>> On 07/05/2012 08:39 PM, Rob Crittenden wrote:
>>> Martin Kosek wrote:
>>>> On 07/03/2012 04:41 PM, Rob Crittenden wrote:
>>>>> Deleting a replica can leave a replication vector (RUV) on the
>>>>> other servers.
>>>>> This can confuse things if the replica is re-added, and it also
>>>>> causes the
>>>>> server to calculate changes against a server that may no longer exist.
>>>>>
>>>>> 389-ds-base provides a new task that self-propogates itself to all
>>>>> available
>>>>> replicas to clean this RUV data.
>>>>>
>>>>> This patch will create this task at deletion time to hopefully
>>>>> clean things up.
>>>>>
>>>>> It isn't perfect. If any replica is down or unavailable at the time
>>>>> the
>>>>> cleanruv task fires, and then comes back up, the old RUV data may be
>>>>> re-propogated around.
>>>>>
>>>>> To make things easier in this case I've added two new commands to
>>>>> ipa-replica-manage. The first lists the replication ids of all the
>>>>> servers we
>>>>> have a RUV for. Using this you can call clean_ruv with the
>>>>> replication id of a
>>>>> server that no longer exists to try the cleanallruv step again.
>>>>>
>>>>> This is quite dangerous though. If you run cleanruv against a
>>>>> replica id that
>>>>> does exist it can cause a loss of data. I believe I've put in
>>>>> enough scary
>>>>> warnings about this.
>>>>>
>>>>> rob
>>>>>
>>>>
>>>> Good work there, this should make cleaning RUVs much easier than
>>>> with the
>>>> previous version.
>>>>
>>>> This is what I found during review:
>>>>
>>>> 1) list_ruv and clean_ruv command help in man is quite lost. I think
>>>> it would
>>>> help if we for example have all info for commands indented. This way
>>>> user could
>>>> simply over-look the new commands in the man page.
>>>>
>>>>
>>>> 2) I would rename new commands to clean-ruv and list-ruv to make them
>>>> consistent with the rest of the commands (re-initialize, force-sync).
>>>>
>>>>
>>>> 3) It would be nice to be able to run clean_ruv command in an
>>>> unattended way
>>>> (for better testing), i.e. respect --force option as we already do for
>>>> ipa-replica-manage del. This fix would aid test automation in the
>>>> future.
>>>>
>>>>
>>>> 4) (minor) The new question (and the del too) does not react too
>>>> well for
>>>> CTRL+D:
>>>>
>>>> # ipa-replica-manage clean_ruv 3 --force
>>>> Clean the Replication Update Vector for
>>>> vm-055.idm.lab.bos.redhat.com:389
>>>>
>>>> Cleaning the wrong replica ID will cause that server to no
>>>> longer replicate so it may miss updates while the process
>>>> is running. It would need to be re-initialized to maintain
>>>> consistency. Be very careful.
>>>> Continue to clean? [no]: unexpected error:
>>>>
>>>>
>>>> 5) Help for clean_ruv command without a required parameter is quite
>>>> confusing
>>>> as it reports that command is wrong and not the parameter:
>>>>
>>>> # ipa-replica-manage clean_ruv
>>>> Usage: ipa-replica-manage [options]
>>>>
>>>> ipa-replica-manage: error: must provide a command [clean_ruv |
>>>> force-sync |
>>>> disconnect | connect | del | re-initialize | list | list_ruv]
>>>>
>>>> It seems you just forgot to specify the error message in the command
>>>> definition
>>>>
>>>>
>>>> 6) When the remote replica is down, the clean_ruv command fails with an
>>>> unexpected error:
>>>>
>>>> [root at vm-086 ~]# ipa-replica-manage clean_ruv 5
>>>> Clean the Replication Update Vector for
>>>> vm-055.idm.lab.bos.redhat.com:389
>>>>
>>>> Cleaning the wrong replica ID will cause that server to no
>>>> longer replicate so it may miss updates while the process
>>>> is running. It would need to be re-initialized to maintain
>>>> consistency. Be very careful.
>>>> Continue to clean? [no]: y
>>>> unexpected error: {'desc': 'Operations error'}
>>>>
>>>>
>>>> /var/log/dirsrv/slapd-IDM-LAB-BOS-REDHAT-COM/errors:
>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin -
>>>> cleanAllRUV_task: failed
>>>> to connect to repl        agreement connection
>>>> (cn=meTovm-055.idm.lab.bos.redhat.com,cn=replica,
>>>>
>>>> cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping
>>>> tree,cn=config), error 105
>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin -
>>>> cleanAllRUV_task: replica
>>>> (cn=meTovm-055.idm.lab.
>>>> bos.redhat.com,cn=replica,cn=dc\3Didm\2Cdc\3Dlab\2Cdc\3Dbos\2Cdc\3Dredhat\2Cdc\3Dcom,cn=mapping
>>>>
>>>>
>>>> tree,   cn=config) has not been cleaned.  You will need to rerun the
>>>> CLEANALLRUV task on this replica.
>>>> [04/Jul/2012:06:28:16 -0400] NSMMReplicationPlugin -
>>>> cleanAllRUV_task: Task
>>>> failed (1)
>>>>
>>>> In this case I think we should inform user that the command failed,
>>>> possibly
>>>> because of disconnected replicas and that they could enable the
>>>> replicas and
>>>> try again.
>>>>
>>>>
>>>> 7) (minor) "pass" is now redundant in replication.py:
>>>> +        except ldap.INSUFFICIENT_ACCESS:
>>>> +            # We can't make the server we're removing read-only but
>>>> +            # this isn't a show-stopper
>>>> +            root_logger.debug("No permission to switch replica to
>>>> read-only,
>>>> continuing anyway")
>>>> +            pass
>>>>
>>>
>>> I think this addresses everything.
>>>
>>> rob
>>
>> Thanks, almost there! I just found one more issue which needs to be fixed
>> before we push:
>>
>> # ipa-replica-manage del vm-055.idm.lab.bos.redhat.com --force
>> Directory Manager password:
>>
>> Unable to connect to replica vm-055.idm.lab.bos.redhat.com, forcing
>> removal
>> Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc': "Can't
>> contact LDAP server"}
>> Forcing removal on 'vm-086.idm.lab.bos.redhat.com'
>>
>> There were issues removing a connection: %d format: a number is
>> required, not str
>>
>> Failed to get data from 'vm-055.idm.lab.bos.redhat.com': {'desc': "Can't
>> contact LDAP server"}
>>
>> This is a traceback I retrieved:
>> Traceback (most recent call last):
>>    File "/sbin/ipa-replica-manage", line 425, in del_master
>>      del_link(realm, r, hostname, options.dirman_passwd, force=True)
>>    File "/sbin/ipa-replica-manage", line 271, in del_link
>>      repl1.cleanallruv(replica_id)
>>    File
>> "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py",
>> line 1094, in cleanallruv
>>      root_logger.debug("Creating CLEANALLRUV task for replica id %d" %
>> replicaId)
>>
>>
>> The problem here is that you don't convert replica_id to int in this
>> part:
>> +    replica_id = None
>> +    if repl2:
>> +        replica_id = repl2._get_replica_id(repl2.conn, None)
>> +    else:
>> +        servers = get_ruv(realm, replica1, dirman_passwd)
>> +        for (netloc, rid) in servers:
>> +            if netloc.startswith(replica2):
>> +                replica_id = rid
>> +                break
>>
>> Martin
>>
>
> Updated patch using new mechanism in 389-ds-base. This should more
> thoroughly clean out RUV data when a replica is being deleted, and
> provide for a way to delete RUV data afterwards too if necessary.
>
> rob

Rebased patch

rob

-------------- next part --------------
A non-text attachment was scrubbed...
Name: freeipa-rcrit-1031-4-cleanruv.patch
Type: text/x-diff
Size: 14038 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/freeipa-devel/attachments/20120905/08fb7d02/attachment.bin>


More information about the Freeipa-devel mailing list