[Freeipa-devel] [PATCH] 0064 Rework task naming in LDAP updates to avoid conflicts

Wed Jul 25 08:22:51 UTC 2012

On 07/24/2012 08:36 PM, Alexander Bokovoy wrote:
> On Tue, 24 Jul 2012, Petr Viktorin wrote:
>> On 07/24/2012 04:50 PM, Alexander Bokovoy wrote:
>>> On Tue, 24 Jul 2012, Rob Crittenden wrote:
>>>> Petr Viktorin wrote:
>>>>> On 07/24/2012 02:49 PM, Alexander Bokovoy wrote:
>>>>>> On Tue, 24 Jul 2012, Petr Viktorin wrote:
>>>>>>> On 07/24/2012 02:06 PM, Alexander Bokovoy wrote:
>>>>>>>> On Tue, 24 Jul 2012, Petr Viktorin wrote:
>>>>>>>>> On 07/24/2012 12:01 PM, Alexander Bokovoy wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> There are two problems in task naming in LDAP updates:
>>>>>>>>>>
>>>>>>>>>> 1. Randomness may be scarce in virtual machines
>>>>>>>>>> 2. Random number is added to the time value rounded to a second
>>>>>>>>>>
>>>>>>>>>> The second issue leads to values that may repeat themselves as
>>>>>>>>>> time
>>>>>>>>>> only grows and random number is non-negative as well, so
>>>>>>>>>> t2+r2 can be equal to t1+t2 generated earlier.
>>>>>>>>>>
>>>>>>>>>> Since task name is a DN, there is no strict requirement to use an
>>>>>>>>>> integer value.  Instead, we can take time and attribute name. To
>>>>>>>>>> get
>>>>>>>>>> reasonable 'randomness' these values are then hashed with sha1
>>>>>>>>>> and
>>>>>>>>>> use
>>>>>>>>>> the resulting string as task name.
>>>>>>>>>>
>>>>>>>>>> SHA1 may technically be an overkill here as we could simply use
>>>>>>>>>>
>>>>>>>>>> indextask_$date_$attribute
>>>>>>>>>>
>>>>>>>>>> where $date is a value of time.time() but SHA1 gives a resonable
>>>>>>>>>> 'randomness' into the string.
>>>>>>>>>
>>>>>>>>> What kind of randomness do you mean? SHA1 is deterministic, it
>>>>>>>>> doesn't
>>>>>>>>> add any randomness at all. It just obscures what's really
>>>>>>>>> happening.
>>>>>>>> Hence using quotes to describe it. We don't need randomness in the
>>>>>>>> task
>>>>>>>> names, we need something that avoids collisions.
>>>>>>>>
>>>>>>>> An issue here is in time.time() -- it may give us sub-second
>>>>>>>> resolution
>>>>>>>> if underlying OS supports it, it may not. Having a second-level
>>>>>>>> resolution is not enough, especially on fast machines, so we can't
>>>>>>>> simply use int(times.time()) as it was in the original version.
>>>>>>>>
>>>>>>>> indextask_$date_$attribute has this issue that we don't have enough
>>>>>>>> guarantee for $date (time.time()) to be unique in sufficiently
>>>>>>>> tight
>>>>>>>> conditions, thus use of SHA-1 to generate something that has better
>>>>>>>> chances to avoid collisions than $data_$attribute.
>>>>>>>
>>>>>>> My point is that if "indextask_$date_$attribute" is not unique,
>>>>>>> neither is SHA1("indextask_$date_$attribute"). Hashing has no effect
>>>>>>> on the chance of collisions.
>>>>>>>
>>>>>>> You could use Python's pseudorandom number generator
>>>>>>> (random.randint)
>>>>>>> instead of random.SystemRandom. It's not cryptographically secure
>>>>>>> but
>>>>>>> it's enough to avoid collisions, and it doesn't use up system
>>>>>>> entropy
>>>>>>> (except for initial seeding, through `import random`).
>>>>>>> "indextask_$date_$attribute_$pseudorandomvalue" should be unique
>>>>>>> enough.
>>>>>> Using random here is really bad.
>>>>>> What we ideally need is a method to increment sequential calls for
>>>>>> the same attribute.We use seconds to differentiate
>>>>>> between all tasks but that is not really required, tasks that were
>>>>>> processed will be removed.
>>>>>>
>>>>>
>>>>> Or maybe use $date_$attribute and just warn (ignore the error) if
>>>>> there's a duplicate -- if a reindex task for the same attribute
>>>>> already
>>>>> exists from the same second, do we really need to start a new one?
>>>>>
>>>>
>>>> That is true.
>>>>
>>>> We can generate a UUID, I think that is probably a better/safer thing
>>>> to use overall.
>>> Updated patch attached.
>>>
>>> 2012-07-24T14:36:31Z INFO Creating task to index attribute: memberOf
>>> 2012-07-24T14:36:31Z DEBUG Task id:
>>> cn=indextask_memberOf_135624333918176170_14302,cn=index,cn=tasks,cn=config
>>>
>>> 2012-07-24T14:36:32Z INFO Indexing finished
>>> ...
>>> 2012-07-24T14:36:43Z INFO Creating task to index attribute: memberUser
>>> 2012-07-24T14:36:43Z DEBUG Task id:
>>> cn=indextask_memberUser_135624334038532670_14302,cn=index,cn=tasks,cn=config
>>>
>>>
>>> 2012-07-24T14:36:44Z INFO Indexing finished
>>>
>>> You can note that clock_seq does not change, it is because uuid.uuid1()
>>> uses nanosecond resolution and uuid_generate_time() if the latter is
>>> available. If we happen to ask for uuids within sub-nanosecond interval,
>>> clock_seq will be different then.
>>>
>>> I'm extracting only time and clock_seq instead of pasting full uuid
>>> value because uuid_generate_time() will leak out ethernet MAC address
>>> for 48-bit node. We don't need more bits here so I drop these 48 bits
>>> and avoid publishing the ethernet MAC address, even in logs.
>>>
>>> --
>>> / Alexander Bokovoy
>>
>> Yes, this approach will work. Just some technical comments:
>>
>>
>>> +        self.sub_dict['TIME'] = cn_uuid.get_time()
>>> +        self.sub_dict['SEQ'] = cn_uuid.get_clock_seq()
>>
>> Use the attributes, `cn_uuid.time` and `cn_uuid.clock_seq`. The
>> accessor methods are undocumented.
> Fixed.
>
>>
>>> +        self.sub_dict['ATTR'] = attribute
>>>
>>> -        # Refresh the time to make uniqueness more probable. Add on
>>> some
>>> -        # randomness for good measure.
>>> -        self.sub_dict['TIME'] = int(time.time()) + r.randint(0,10000)
>>> +        cn = self._template_str("indextask_${ATTR}_${TIME}_${SEQ}")
>>
>> You're overwriting sub_dict['TIME'], which is used elsewhere in the
>> class. That could cause trouble.
> Since it was set here and others are using it, I kept it updated in a
> new version as well, based on cn_uuid.time. But since cn_uuid.time is in
> nanoseconds resolution, I have to divide the value by 1e9.
>
>> There's no reason to use sub_dict here; you can simply do:
>>
>> cn = 'indextask_%s_%s_%s' % (attribute, cn_uuid.time, cn_uuid.clock_seq)
> Fixed.
>
> This version works fine for me.
>

ACK

-- 
Petr³