[Freeipa-devel] [PATCH] 0064 Rework task naming in LDAP updates to avoid conflicts

Tue Jul 24 13:47:59 UTC 2012

On 07/24/2012 02:49 PM, Alexander Bokovoy wrote:
> On Tue, 24 Jul 2012, Petr Viktorin wrote:
>> On 07/24/2012 02:06 PM, Alexander Bokovoy wrote:
>>> On Tue, 24 Jul 2012, Petr Viktorin wrote:
>>>> On 07/24/2012 12:01 PM, Alexander Bokovoy wrote:
>>>>> Hi,
>>>>>
>>>>> There are two problems in task naming in LDAP updates:
>>>>>
>>>>> 1. Randomness may be scarce in virtual machines
>>>>> 2. Random number is added to the time value rounded to a second
>>>>>
>>>>> The second issue leads to values that may repeat themselves: time
>>>>> only grows and the random number is non-negative as well, so
>>>>> t2+r2 can be equal to a t1+r1 generated earlier.
>>>>>
>>>>> Since task name is a DN, there is no strict requirement to use an
>>>>> integer value.  Instead, we can take time and attribute name. To get
>>>>> reasonable 'randomness' these values are then hashed with SHA-1 and
>>>>> the resulting string is used as the task name.
>>>>>
>>>>> SHA1 may technically be an overkill here as we could simply use
>>>>>
>>>>>  indextask_$date_$attribute
>>>>>
>>>>> where $date is a value of time.time() but SHA1 adds reasonable
>>>>> 'randomness' to the string.
>>>>
>>>> What kind of randomness do you mean? SHA1 is deterministic, it doesn't
>>>> add any randomness at all. It just obscures what's really happening.
>>> Hence using quotes to describe it. We don't need randomness in the task
>>> names, we need something that avoids collisions.
>>>
>>> An issue here is in time.time() -- it may or may not give us
>>> sub-second resolution, depending on whether the underlying OS
>>> supports it. Second-level resolution is not enough, especially on
>>> fast machines, so we can't simply use int(time.time()) as in the
>>> original version.
>>>
>>> indextask_$date_$attribute has the issue that we have no real
>>> guarantee for $date (time.time()) to be unique in sufficiently tight
>>> conditions, thus the use of SHA-1 to generate something that has
>>> better chances to avoid collisions than $date_$attribute.
>>
>> My point is that if "indextask_$date_$attribute" is not unique,
>> neither is SHA1("indextask_$date_$attribute"). Hashing has no effect
>> on the chance of collisions.
>>
>> You could use Python's pseudorandom number generator (random.randint)
>> instead of random.SystemRandom. It's not cryptographically secure but
>> it's enough to avoid collisions, and it doesn't use up system entropy
>> (except for initial seeding, through `import random`).
>> "indextask_$date_$attribute_$pseudorandomvalue" should be unique enough.
> Using random here is really bad.
> What we ideally need is a method to increment a counter on sequential
> calls for the same attribute. We use seconds to differentiate between
> all tasks, but that is not really required: tasks that were processed
> will be removed.
>

Or maybe use $date_$attribute and just warn (ignore the error) if 
there's a duplicate -- if a reindex task for the same attribute already 
exists from the same second, do we really need to start a new one?
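
A rough sketch of what I mean, in Python. Everything named here is made up
for illustration: FakeTaskStore and DuplicateTask stand in for the LDAP
tasks container and for whatever "entry already exists" error the real add
raises, and task_name/start_index_task are hypothetical helpers, not the
updater's actual functions.

```python
import time


class DuplicateTask(Exception):
    """Hypothetical stand-in for an 'entry already exists' LDAP error."""


class FakeTaskStore(object):
    """In-memory stand-in for the LDAP tasks container (illustration only)."""

    def __init__(self):
        self.tasks = set()

    def add(self, name):
        # Refuse to add a name that is already present, as a directory
        # server would for an entry with an existing DN.
        if name in self.tasks:
            raise DuplicateTask(name)
        self.tasks.add(name)


def task_name(attribute, now=None):
    """Build indextask_$date_$attribute with $date truncated to seconds."""
    if now is None:
        now = time.time()
    return "indextask_%d_%s" % (int(now), attribute)


def start_index_task(store, attribute, now=None):
    """Add the task; treat a duplicate from the same second as a no-op.

    Returns (name, created), where created is False if a task with the
    same name already existed.
    """
    name = task_name(attribute, now)
    try:
        store.add(name)
    except DuplicateTask:
        # A reindex of this attribute was already requested this second;
        # the caller can log a warning and carry on.
        return name, False
    return name, True
```

Two requests for the same attribute within one second then map to the same
name, and the second one is reported as a duplicate instead of failing the
update. Requests for different attributes in the same second get different
names and do not clash.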

-- 
Petr³