[Freeipa-devel] [PATCH] Improvements to collection API

Stephen Gallagher sgallagh at redhat.com
Tue Jun 30 12:05:42 UTC 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 06/29/2009 06:34 PM, Dmitri Pal wrote:
>> Nack.
>>
>> Your PRIME hash function is dangerously limited. You guarantee an
>> overflow on 64-bit systems with strings of 13 characters or more. Worse
>> on 32 bit systems. At minimum, declare phash a uint64_t. Furthermore, in
>> your algorithm, the strings "mystring" and "mystirng" will produce an
>> identical hash. Consider locating a library that provides a safer and
>> less collision-prone hash. (I have no recommendations, perhaps someone
>> else on the list can help.)
>>
>> Beyond that, I'm not going to look too deeply into the internals.
>> Syntactically everything looks fine on a quick scan. It compiles cleanly
>> against the current head and does not break any existing code, so once
>> the above change is made, I'm comfortable with committing it and working
>> out any bugs later on.
>>
> This is really funny.
> The hash function is the exact copy of the key hash function from the
> dhash.c lines 131 - 132.
> This function constructs the key for the hash table entries so I assumed
> that it is sufficient enough.
> I view this hash as a quick internal aid to searching.
> If it worked in dhash.c why it would not work for me.
> The limitation you bring up related to overflow and order of letters is
> not a worry at the moment.
> It can be enhanced later in a separate patch.
> 
> By the way the dhash.c has a bug in this code (line 131).
> The value of h in the result of the hashing of a string will always be 0
> since h is initialized to 0 and then multiplied.
> 
> If there are no other issues I suggest accepting patch and I will log an
> enhancement request to improve hashing in collection.
> 
> 
> 
> _______________________________________________
> Freeipa-devel mailing list
> Freeipa-devel at redhat.com
> https://www.redhat.com/mailman/listinfo/freeipa-devel
> 
> 

I'm re-evaluating my original issue with this, which was the overflow. I
realized when thinking about it some more that this really isn't the
main issue, because an overflow simply drops the high-order bits.
However, there's a more serious problem with the fact that simple
transliteration of characters can result in identical hashes.

I'd rather see us just use FNV-1a (public domain hash algorithm with
excellent dispersion)(1):

64-bit variation of FNV-1a:
#define FNV1a_prime 1099511628211
#define FNV1a_base 14695981039346656037
...
item->phash = FNV1a_base;
while(property[item->property_len] != 0) {
    item->phash = item->phash ^ property[item->property_len];
    item->phash = hash * FNV1a_prime;
    item->property_len++;
}

Also, phash MUST be defined as a uint64_t, not simply as "unsigned",
since that's not safely portable.

The easiest way to approach this would be simply to pull in the
public-domain code available here(2). Among other things, this would
handle all of the necessary


(1) http://isthe.com/chongo/tech/comp/fnv/#FNV-1
This page gives details on how the "magic numbers" were derived, as well
as alternate implementations for different hash sizes (32-bit, 128-bit,
512-bit, etc.)

(2) http://isthe.com/chongo/src/fnv/fnv-5.0.1.tar.gz
- -- 
Stephen Gallagher
RHCE 804006346421761

Looking to carve out IT costs?
www.redhat.com/carveoutcosts/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAkpJ/5IACgkQeiVVYja6o6PEyACfXfasV2grJdiaVG7PQgyCo4OM
vBwAoK2XXy/tpA/qm6BdSImVR6YWc7kZ
=/SKV
-----END PGP SIGNATURE-----




More information about the Freeipa-devel mailing list