[Freeipa-devel] [PATCH] 0064 Rework task naming in LDAP updates to avoid conflicts

Tue Jul 24 13:55:33 UTC 2012

Petr Viktorin wrote:
> On 07/24/2012 02:49 PM, Alexander Bokovoy wrote:
>> On Tue, 24 Jul 2012, Petr Viktorin wrote:
>>> On 07/24/2012 02:06 PM, Alexander Bokovoy wrote:
>>>> On Tue, 24 Jul 2012, Petr Viktorin wrote:
>>>>> On 07/24/2012 12:01 PM, Alexander Bokovoy wrote:
>>>>>> Hi,
>>>>>>
>>>>>> There are two problems in task naming in LDAP updates:
>>>>>>
>>>>>> 1. Randomness may be scarce in virtual machines
>>>>>> 2. Random number is added to the time value rounded to a second
>>>>>>
>>>>>> The second issue leads to values that may repeat themselves as time
>>>>>> only grows and random number is non-negative as well, so
>>>>>> t2+r2 can be equal to t1+t2 generated earlier.
>>>>>>
>>>>>> Since task name is a DN, there is no strict requirement to use an
>>>>>> integer value.  Instead, we can take time and attribute name. To get
>>>>>> reasonable 'randomness' these values are then hashed with sha1 and
>>>>>> use
>>>>>> the resulting string as task name.
>>>>>>
>>>>>> SHA1 may technically be an overkill here as we could simply use
>>>>>>
>>>>>>  indextask_$date_$attribute
>>>>>>
>>>>>> where $date is a value of time.time() but SHA1 gives a resonable
>>>>>> 'randomness' into the string.
>>>>>
>>>>> What kind of randomness do you mean? SHA1 is deterministic, it doesn't
>>>>> add any randomness at all. It just obscures what's really happening.
>>>> Hence using quotes to describe it. We don't need randomness in the task
>>>> names, we need something that avoids collisions.
>>>>
>>>> An issue here is in time.time() -- it may give us sub-second resolution
>>>> if underlying OS supports it, it may not. Having a second-level
>>>> resolution is not enough, especially on fast machines, so we can't
>>>> simply use int(times.time()) as it was in the original version.
>>>>
>>>> indextask_$date_$attribute has this issue that we don't have enough
>>>> guarantee for $date (time.time()) to be unique in sufficiently tight
>>>> conditions, thus use of SHA-1 to generate something that has better
>>>> chances to avoid collisions than $data_$attribute.
>>>
>>> My point is that if "indextask_$date_$attribute" is not unique,
>>> neither is SHA1("indextask_$date_$attribute"). Hashing has no effect
>>> on the chance of collisions.
>>>
>>> You could use Python's pseudorandom number generator (random.randint)
>>> instead of random.SystemRandom. It's not cryptographically secure but
>>> it's enough to avoid collisions, and it doesn't use up system entropy
>>> (except for initial seeding, through `import random`).
>>> "indextask_$date_$attribute_$pseudorandomvalue" should be unique enough.
>> Using random here is really bad.
>> What we ideally need is a method to increment sequential calls for
>> the same attribute.We use seconds to differentiate
>> between all tasks but that is not really required, tasks that were
>> processed will be removed.
>>
>
> Or maybe use $date_$attribute and just warn (ignore the error) if
> there's a duplicate -- if a reindex task for the same attribute already
> exists from the same second, do we really need to start a new one?
>

That is true.

We can generate a UUID, I think that is probably a better/safer thing to 
use overall.

rob