[libvirt] [PATCH 3/3] tests: deterministichash: Make hash tables arch-independent

John Ferlan jferlan at redhat.com
Tue Aug 15 20:07:16 UTC 2017



On 08/15/2017 11:47 AM, Peter Krempa wrote:
> On Tue, Aug 15, 2017 at 10:02:38 -0400, John Ferlan wrote:
>> On 08/15/2017 08:01 AM, Peter Krempa wrote:
>>> On Mon, Aug 14, 2017 at 20:46:10 -0400, John Ferlan wrote:
>>>> On 08/03/2017 04:50 AM, Peter Krempa wrote:
>>>
>>> [ trimmed off-topic part ]
>>>
>>>> NB: I didn't dig into the qemumonitorjsontest algorithm, but...
>>>>
>>>> While going through the common object changes, I ended up looking at and
>>>> thinking about the hash table algorithms, initially it was just the
>>>> "grow" algorithm as I think it is a bit "aggressive" (perhaps wasteful)
>>>> since as soon as 1 bucket chain exceeds 8 elements, the table is resized
>>>> by 8 unless the new size is 8 * 2048 - at which time no matter how full
>>>> a bucket is we don't resize the table
>>>
>>> Resizing the table is very expensive, so it makes sense to scale it
>>> aggresively. At some point though it would take too much memory.
>>>
>>> If some place in the code expects massive hash table, it can set the
>>> hash table to > 2048 manually.
>>>
>>
>> Right - avoiding excessive or unnecessary resizing is something I was
>> considering. One thing that was at the back of my mind is "somewhat
>> recent" discussions about 10s of thousands of disk/volumes. IIRC it's
>> been discussed on qemu list as it relates to libguestfs.
> 
> If we assume that our hash function has an even distribution of hashes
> (which was not refuted yet) then for 10k entries you should have 5ish
> entries per bucket. Even with 100k you are at 50 per bucket. That is
> totally managable.
> 

I really hadn't done any analysis, I was in the wondering stage and
setting up the experiment. It's a part of the code I didn't know much
about, so it was also a learning exercise. I never knew anything about
murmurhash nor siphash before today!

>> Because volume names tend to be less random than a UUID, I was looking
>> at how the existing algorithms operate when you have say "disk00001"
>> through "disk99999" w/r/t bucket layout and length of chains when we
>> reach the maximum bucket count.
>>
>> Still even with that, resizing a table just because one hash bucket
>> chain goes above 8 just seemed unnecessary. Still a work in progress
>> though - I may come up with nothing. It's the classic power struggle of
>> time vs. space.
> 
> (This was covered by dan's reply)
> 
>>>> The next thing I considered was using a prime number as the table bucket
>>>> size and it seems by just doing that will cause qemumonitorjsontest to
>>>
>>> What would be the benefit of such change? I don't really see how prime
>>> numbers would improve anything performance-wise.
>>>
>>
>> Nothing performance wise directly that comes to mind. I see that while
>> composing Michal and Daniel have responded.
>>
>> FWIW: My hacking has altered virhashtest.c to generate 1000's of UUIDs
>> and "elemN" names in order to check if/when one or the other has a
>> bucket full problem with the existing algorithms.  It's been a couple
>> weeks since I was looking at the data - so results are not fresh in mind
>> especially since I was on PTO last week.
>>
>>>> fail. Easy enough to reproduce by changing the virHashCreate call in
>>>> qemuBlockNodeNameGetBackingChain to use 53 instead of 50 (even 49 causes
>>>> failure). It doesn't matter if the test source also adjusts the initial
>>>> hash size.
>>>>
>>>> Of course this makes sense given that virHashCodeGen is the backend of
>>>> the virHashComputeKey algorithm that uses the return value %
>>>> table->size. So if "size" changes, then obviously order changes.
>>>
>>> As you've said that is expected. I don't quite follow what is your
>>> point here.
>>>
>>
>> I was pointing out that by changing the size you get different ordered
>> results, nothing more, nothing less. It just so happens that 50 returns
>> this set of results, but 53 is different.
>>
>>>> While I get the code bloat conundrum, but if the order doesn't matter
>>>> then I wonder what other variables could change that would affect this
>>>> test and whether we really want to be constantly altering the mocks
>>>> "just" to get the right answer for this test.
>>>
>>> The order depends on the result of the hasing function, the number of
>>> buckets and the order you've added the entries. All of those is
>>> expected when dealing with a hash table.
>>>
>>> After this last patch, you are guaranteed to have the hash function
>>> return deterministic results (even on big endian-boxes).
>>>
>>
>> Change the size to 53, rebuild, and run the test. It'll take a few
>> minutes, but I think you'd see a failure.
> 
> I fail to see the point of such excercise. It is expected that it will
> happen. If we wanted to make the testsuite slightly more robust in this
> regard, we could also mandate a fixed hash size for the mocked hash, but
> I don't quite feel the need to do so.
> 

Got it. You asked a question, I answered.  The only reason I used this
particular patch to respond was because this particular test had an
error result in my example and there was some concern over the
deterministic results.

>>> The hash size is not really altered very often since you can't really
>>> see a measurable performance benefit in most cases. This is also
>>> strenghtened by the fact, that the hash function is randomized by a
>>> random number, so in real usage you won't be able to deterministically
>>> cause hash collisions.
>>
>> True... at 50 with 4 elements you won't see a resize. You could possibly
>> have a table size of 1 in this instance and be fine, but that only works
>> well for the test! It's more problematic for real world because a table
>> size of 1 grows to 8, 64, etc.  Some would say the minimum table size
>> should be at least 8 if not the next prime, e.g. 11.
> 
> This is not a problem for real use cases. Real use cases don't care
> about the ordering and thus it does not matter whether it's resized or
> not. Hash table is a wrong data structure for lists.
> 
> In the tests it's a bit different. The goal of the testsuite should be
> to allow to write simple tests. Otherwise it will not be easy to
> persuade people to add tests at all. This means that sometimes it's
> necessary to cut corners.
> 
> In case of the testsuite that was failing, the result was correct in any
> ordering. The checker I wrote was dumb though ... really dumb. Just a
> strcmp. I don't want to waste time to write tests to do a full N to N
> checker. That does not make any sense.
> 
> If the code uses a hash table for convenience/performance reasons and
> it's convenient to test the contents of the table in the test suite,
> it's just best to format it to a string and compare it against a file.
> Especially since you get "VIR_TEST_REGEGERATE_OUTPUT" if you happen to
> modify the test. That would be rather compicated with a specific hash
> table checker and make test-writing more painful.
> 
>> In the long run, UUID's are random, but "#blockN" values are far less
>> random - it's the human randomness factor. No one is going to create 100
>> randomly named domain names or volume names, so factoring in the less
>> than randomly named elements is what I've been pondering.
> 
> This also was covered by dan's statement about the hashing function.
> 

The whole prime number thing is just one of those I recall using a prime
number for the divisor in a hashing algorithm is considered better - in
this case I hadn't considered the CodeGen half of the equation. It
wasn't yet on the radar.

>>> In the test code, the cases are added in a deterministic order. Since
>>> the hashing function in the tests is deterministic as well, the only
>>> thing that would influence the ordering is the bucket count. We don't
>>> modify them too often.
>>>
>>> I don't see what would make us alter mocks "just" to get the right
>>> answer here. The reason for this patch was, that the hash function is
>>> using bit operations and then treating the result as integers, which
>>> obviously does not work deterministically between big and little endian
>>> arches. Other than that, the test is now fully deterministic.
>>>
>>> Peter
>>>
>>
>> Understood - until someone changes the size of the table to be random or
>> less deterministic, then there's no problem.  But if that happens,
>> there's a problem.
> 
> Umm, why would anybody do that? It would not make any sense to allocate
> it at random. That's plain crazy. For anything other than that, it would
> be deterministic, and thus while the test output might change at the
> point when it would be changed, it would not lose determinism in the
> suite. The only randomization which makes sense is in the hashing
> function itself and we do have that.
> 

Poorly chosen "random" word by me... Perhaps unexpected or larger than
supplied value. We round up in various places for various reasons -
doing the same with a supplied hash table size wouldn't be out of the
question especially since we can "grow" to some larger value. In this
case, I just happened to alter the size to next prime for both create
and grow and fell down into this rabbit hole.

> Also as it was pointed out. The bucket count choice does not really add
> much in terms of distribution of elements to the buckets. Thanks to the
> auto-resize algorithm it's even somewhat resistant to undersized hashes.
> 
> Adding any kind of magic on top of this just does not seem to be worth
> at all.
> 

Fair enough.  I certainly don't want to spend my days characterizing the
performance of various algorithms to alter the size of a hash table that
has a hash algorithm that is distributing the elements rather evenly. I
still ask myself is having 1 chain longer than 8 too few to cause a
grow? And at what inflection point should a grow be done.

John

FWIW: Since it was simple enough, here's some mostly raw data adding 50K
elems to a hash table that starts out with a size of 1.

For every 1000 nbElems, I figured the number of empty buckets, the
number of "full" buckets (e.g. more than 8 elems), and the longest chain
as follows:

When the name is "#blockN":
nbElems=6000 size=4096, empty=955 full=1 longest=9    <== first "full"
nbElems=10000 size=4096, empty=341 full=5 longest=12
nbElems=15000 size=4096, empty=96 full=43 longest=14      inflection
nbElems=16000 size=4096, empty=77 full=68 longest=14  <== between full
nbElems=17000 size=4096, empty=61 full=91 longest=14      and empty
nbElems=18000 size=4096, empty=43 full=131 longest=14
nbElems=19000 size=4096, empty=36 full=181 longest=14
nbElems=20000 size=4096, empty=32 full=239 longest=15
nbElems=25000 size=4096, empty=9 full=637 longest=16
nbElems=30000 size=4096, empty=2 full=1287 longest=18
nbElems=40000 size=4096, empty=1 full=2649 longest=23
nbElems=50000 size=4096, empty=0 full=3508 longest=27

FWIW: Using an algorithm to have a prime "size" value divisor allows a
higher maximum size of 5779 buckets and the first "full" shows up
between 13K and 14K... Somewhere around 23K is the inflection between
empty and full... At 50K elems we get empty=1 full=2873 longest=22

For a randomly generated UUID:
nbElems=7000 size=4096, empty=736 full=1 longest=9    <== first "full"
nbElems=10000 size=4096, empty=360 full=5 longest=10
nbElems=15000 size=4096, empty=121 full=50 longest=13
nbElems=16000 size=4096, empty=99 full=80 longest=13      inflection
nbElems=17000 size=4096, empty=84 full=105 longest=13 <== between full
nbElems=18000 size=4096, empty=69 full=137 longest=14     and empty
nbElems=19000 size=4096, empty=55 full=177 longest=14
nbElems=20000 size=4096, empty=49 full=243 longest=14
nbElems=25000 size=4096, empty=13 full=649 longest=16
nbElems=30000 size=4096, empty=3 full=1276 longest=18
nbElems=40000 size=4096, empty=1 full=2634 longest=21
nbElems=50000 size=4096, empty=0 full=3513 longest=26

FWIW: Using the prime size algorithm... Between 9K and 10K we get our
first full, somewhere around 22K to 23K we get the inflection between
empty and full, and at 50K elems we get empty=2 full=2903 longest=22.

So the prime number does effect things a bit, but it doesn't really seem
to be significant.

Next, I started to tinker with the thought that perhaps not increasing
the size as soon as we get 1 bucket longer than 8 elems, but rather
trying to only grow when say 10% of the buckets were full or if any one
bucket had more than 16 elems and not have a cap to size as well as
perhaps growing the buckets more slowly once you reach certain nbElems.

For the #blockN's:
nbElems=10000 size=2459, empty=46 full=62 longest=14
nbElems=15000 size=3079, empty=19 full=164 longest=15
nbElems=20000 size=6029, empty=209 full=43 longest=12
nbElems=25000 size=7537, empty=273 full=52 longest=14
nbElems=30000 size=7537, empty=130 full=156 longest=15
nbElems=40000 size=9421, empty=140 full=282 longest=15
nbElems=50000 size=11777, empty=151 full=363 longest=15

@19507 elems is when we grow from 4813 to 6029 buckets
@21288 elems is when we grow from 6029 to 7537 buckets
@30203 elems is when we grow from 7537 to 9421 buckets
@46589 elems is when we grow from 9421 to 11777 buckets

For the UUID's:
nbElems=10000 size=2459, empty=38 full=65 longest=13
nbElems=15000 size=3079, empty=24 full=172 longest=15
nbElems=20000 size=4813, empty=80 full=133 longest=16
nbElems=25000 size=4813, empty=27 full=412 longest=16
nbElems=30000 size=6029, empty=47 full=426 longest=15
nbElems=40000 size=9421, empty=120 full=292 longest=15
nbElems=50000 size=11777, empty=174 full=380 longest=15

@25078 elems is when we grow from 4813 to 6029 buckets
@30409 elems is when we grow from 6029 to 7537 buckets
@35473 elems is when we grow from 7537 to 9421 buckets
@41512 elems is when we grow from 9421 to 11777 buckets

FWIW: For this round I also tracked the number of items in various
buckets when a resize occurred... It was a fairly standard deviation
with the largest number of buckets having 4-7 elements in their chains.
Furthermore the number of buckets with more than 10 elements was minimal.

That made me wonder if we shouldn't grow unless there was more than
perhaps 16 elems in a bucket... Taking that approach grows the table
beyond 4096 buckets only when at greater than 26K elements. Space vs.
time... We're not even close to needing 10's of thousands yet, but it
was to a degree an interesting exercise.




More information about the libvir-list mailing list