[Freeipa-devel] [PATCH] 0064 Rework task naming in LDAP updates to avoid conflicts

Alexander Bokovoy abokovoy at redhat.com
Tue Jul 24 18:36:08 UTC 2012


On Tue, 24 Jul 2012, Petr Viktorin wrote:
>On 07/24/2012 04:50 PM, Alexander Bokovoy wrote:
>>On Tue, 24 Jul 2012, Rob Crittenden wrote:
>>>Petr Viktorin wrote:
>>>>On 07/24/2012 02:49 PM, Alexander Bokovoy wrote:
>>>>>On Tue, 24 Jul 2012, Petr Viktorin wrote:
>>>>>>On 07/24/2012 02:06 PM, Alexander Bokovoy wrote:
>>>>>>>On Tue, 24 Jul 2012, Petr Viktorin wrote:
>>>>>>>>On 07/24/2012 12:01 PM, Alexander Bokovoy wrote:
>>>>>>>>>Hi,
>>>>>>>>>
>>>>>>>>>There are two problems in task naming in LDAP updates:
>>>>>>>>>
>>>>>>>>>1. Randomness may be scarce in virtual machines
>>>>>>>>>2. Random number is added to the time value rounded to a second
>>>>>>>>>
>>>>>>>>>The second issue leads to values that may repeat themselves as time
>>>>>>>>>only grows and random number is non-negative as well, so
>>>>>>>>>t2+r2 can be equal to t1+t2 generated earlier.
>>>>>>>>>
>>>>>>>>>Since task name is a DN, there is no strict requirement to use an
>>>>>>>>>integer value.  Instead, we can take time and attribute name. To
>>>>>>>>>get
>>>>>>>>>reasonable 'randomness' these values are then hashed with sha1 and
>>>>>>>>>use
>>>>>>>>>the resulting string as task name.
>>>>>>>>>
>>>>>>>>>SHA1 may technically be an overkill here as we could simply use
>>>>>>>>>
>>>>>>>>>indextask_$date_$attribute
>>>>>>>>>
>>>>>>>>>where $date is a value of time.time() but SHA1 gives a resonable
>>>>>>>>>'randomness' into the string.
>>>>>>>>
>>>>>>>>What kind of randomness do you mean? SHA1 is deterministic, it
>>>>>>>>doesn't
>>>>>>>>add any randomness at all. It just obscures what's really happening.
>>>>>>>Hence using quotes to describe it. We don't need randomness in the
>>>>>>>task
>>>>>>>names, we need something that avoids collisions.
>>>>>>>
>>>>>>>An issue here is in time.time() -- it may give us sub-second
>>>>>>>resolution
>>>>>>>if underlying OS supports it, it may not. Having a second-level
>>>>>>>resolution is not enough, especially on fast machines, so we can't
>>>>>>>simply use int(times.time()) as it was in the original version.
>>>>>>>
>>>>>>>indextask_$date_$attribute has this issue that we don't have enough
>>>>>>>guarantee for $date (time.time()) to be unique in sufficiently tight
>>>>>>>conditions, thus use of SHA-1 to generate something that has better
>>>>>>>chances to avoid collisions than $data_$attribute.
>>>>>>
>>>>>>My point is that if "indextask_$date_$attribute" is not unique,
>>>>>>neither is SHA1("indextask_$date_$attribute"). Hashing has no effect
>>>>>>on the chance of collisions.
>>>>>>
>>>>>>You could use Python's pseudorandom number generator (random.randint)
>>>>>>instead of random.SystemRandom. It's not cryptographically secure but
>>>>>>it's enough to avoid collisions, and it doesn't use up system entropy
>>>>>>(except for initial seeding, through `import random`).
>>>>>>"indextask_$date_$attribute_$pseudorandomvalue" should be unique
>>>>>>enough.
>>>>>Using random here is really bad.
>>>>>What we ideally need is a method to increment sequential calls for
>>>>>the same attribute.We use seconds to differentiate
>>>>>between all tasks but that is not really required, tasks that were
>>>>>processed will be removed.
>>>>>
>>>>
>>>>Or maybe use $date_$attribute and just warn (ignore the error) if
>>>>there's a duplicate -- if a reindex task for the same attribute already
>>>>exists from the same second, do we really need to start a new one?
>>>>
>>>
>>>That is true.
>>>
>>>We can generate a UUID, I think that is probably a better/safer thing
>>>to use overall.
>>Updated patch attached.
>>
>>2012-07-24T14:36:31Z INFO Creating task to index attribute: memberOf
>>2012-07-24T14:36:31Z DEBUG Task id:
>>cn=indextask_memberOf_135624333918176170_14302,cn=index,cn=tasks,cn=config
>>2012-07-24T14:36:32Z INFO Indexing finished
>>...
>>2012-07-24T14:36:43Z INFO Creating task to index attribute: memberUser
>>2012-07-24T14:36:43Z DEBUG Task id:
>>cn=indextask_memberUser_135624334038532670_14302,cn=index,cn=tasks,cn=config
>>
>>2012-07-24T14:36:44Z INFO Indexing finished
>>
>>You can note that clock_seq does not change, it is because uuid.uuid1()
>>uses nanosecond resolution and uuid_generate_time() if the latter is
>>available. If we happen to ask for uuids within sub-nanosecond interval,
>>clock_seq will be different then.
>>
>>I'm extracting only time and clock_seq instead of pasting full uuid
>>value because uuid_generate_time() will leak out ethernet MAC address
>>for 48-bit node. We don't need more bits here so I drop these 48 bits
>>and avoid publishing the ethernet MAC address, even in logs.
>>
>>--
>>/ Alexander Bokovoy
>
>Yes, this approach will work. Just some technical comments:
>
>
>>+        self.sub_dict['TIME'] = cn_uuid.get_time()
>>+        self.sub_dict['SEQ'] = cn_uuid.get_clock_seq()
>
>Use the attributes, `cn_uuid.time` and `cn_uuid.clock_seq`. The 
>accessor methods are undocumented.
Fixed.

>
>>+        self.sub_dict['ATTR'] = attribute
>>
>>-        # Refresh the time to make uniqueness more probable. Add on some
>>-        # randomness for good measure.
>>-        self.sub_dict['TIME'] = int(time.time()) + r.randint(0,10000)
>>+        cn = self._template_str("indextask_${ATTR}_${TIME}_${SEQ}")
>
>You're overwriting sub_dict['TIME'], which is used elsewhere in the 
>class. That could cause trouble.
Since it was set here and others are using it, I kept it updated in a
new version as well, based on cn_uuid.time. But since cn_uuid.time is in
nanoseconds resolution, I have to divide the value by 1e9.

>There's no reason to use sub_dict here; you can simply do:
>
>cn = 'indextask_%s_%s_%s' % (attribute, cn_uuid.time, cn_uuid.clock_seq)
Fixed.

This version works fine for me.

-- 
/ Alexander Bokovoy
-------------- next part --------------
>From 3e404c5ab461a89aa492fa2f98c81e73fc4cdb1c Mon Sep 17 00:00:00 2001
From: Alexander Bokovoy <abokovoy at redhat.com>
Date: Tue, 24 Jul 2012 12:07:23 +0300
Subject: [PATCH 3/3] Rework task naming in LDAP updates to avoid conflicting
 names in certain cases

There are two problems in task naming in LDAP updates:

1. Randomness may be scarce in virtual machines
2. Random number is added to the time value rounded to a second

The second issue leads to values that may repeat themselves as time
only grows and random number is non-negative as well, so
t2+r2 can be equal to t1+t2 generated earlier.

Since task name is a DN, there is no strict requirement to use an integer value.
Instead, we generate an UUID and use its 60-bit time, 14-bit sequential number,
and attribute name.

https://fedorahosted.org/freeipa/ticket/2942
---
 ipaserver/install/ldapupdate.py |   18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/ipaserver/install/ldapupdate.py b/ipaserver/install/ldapupdate.py
index c64139889d9f84866ac0cd358ed3a3a7d95af7dc..949b86ad99bce97fe3d09baef7fa42dbbc55ac26 100644
--- a/ipaserver/install/ldapupdate.py
+++ b/ipaserver/install/ldapupdate.py
@@ -29,9 +29,11 @@ from ipaserver.install import installutils
 from ipaserver.install import service
 from ipaserver import ipaldap
 from ipapython import entity, ipautil
+import uuid
 from ipalib import util
 from ipalib import errors
 from ipalib import api
+from ipalib.dn import DN
 import ldap
 from ldap.dn import escape_dn_chars
 from ipapython.ipa_log_manager import *
@@ -328,16 +330,14 @@ class LDAPUpdate:
         if self.live_run:
             time.sleep(5)
 
-        r = random.SystemRandom()
+        cn_uuid = uuid.uuid1()
+        # cn_uuid.time is in nanoseconds, but other users of LDAPUpdate expect
+        # seconds in 'TIME' so scale the value down
+        self.sub_dict['TIME'] = int(cn_uuid.time/1e9)
+        cn = "indextask_%s_%s_%s" % (attribute, cn_uuid.time, cn_uuid.clock_seq)
+        dn = DN(('cn', cn), ('cn', 'index'), ('cn', 'tasks'), ('cn', 'config'))
 
-        # Refresh the time to make uniqueness more probable. Add on some
-        # randomness for good measure.
-        self.sub_dict['TIME'] = int(time.time()) + r.randint(0,10000)
-
-        cn = self._template_str("indextask_$TIME")
-        dn = "cn=%s, cn=index, cn=tasks, cn=config" % cn
-
-        e = ipaldap.Entry(dn)
+        e = ipaldap.Entry(str(dn))
 
         e.setValues('objectClass', ['top', 'extensibleObject'])
         e.setValue('cn', cn)
-- 
1.7.10.4



More information about the Freeipa-devel mailing list