[libvirt] [PATCH 1/6] Optimize the elements the iterator visits.

Stefan Berger stefanb at linux.vnet.ibm.com
Tue Jan 10 15:02:01 UTC 2012


On 01/09/2012 07:05 PM, Eric Blake wrote:
> On 12/09/2011 08:07 AM, Stefan Berger wrote:
>> In this patch we introduce testing whether the iterator points to a
>> unique set of entries that have not been seen before at one of the previous
>> iterations. The point is to eliminate duplicates and with that unnecessary
>> filtering rules by preventing identical filtering rules from being
>> instantiated.
>> Example with two lists:
>>
>> list1 = [1,2,1]
>> list2 = [1,3,1]
> Is this something that happens frequently in practice due to user input
> (that is, does the user really output list elements more than once), or
> something that happens more because of combinatorial combinations as you
> expand user inputs into something more precise?  I guess I'd like to
> know why such repetitions are likely to occur, although I can agree with
> doing fewer firewall rules if we can determine that the rules are duplicates.

Fair question. It's an optimization that I admit is not absolutely
necessary, but it reduces the number of rules that are generated when
duplicates occur. Doing this calculation upfront, rather than having
duplicate rules evaluated on possibly thousands of packets, seems worth
the effort. The algorithm is rather short but maybe not as efficient as
it could be.

>>   /*
>> + * Test whether the iterator entry points to a distinguished set of entries
>> + * that have not been seen before at one of the previous iterations.
>> + *
>> + * The point of this function is to eliminate duplicates.
>> + * Example with two lists:
>> + *
>> + * list1 = [1,2,1]
>> + * list2 = [1,3,1]
>> + *
>> + * The 1st iteration would take the 1st items of each list -> 1,1
>> + * The 2nd iteration would take the 2nd items of each list -> 2,3
>> + * The 3rd iteration would take the 3rd items of each list -> 1,1 but
>> + * skip them since this pair has already been encountered in the 1st iteration
>> + */
>> +static bool
>> +virNWFilterVarCombIterEntryAreUniqueEntries(virNWFilterVarCombIterEntryPtr cie,
>> +                                            virNWFilterHashTablePtr hash)
>> +{
>> +    unsigned int i, j;
>> +    virNWFilterVarValuePtr varValue, tmp;
>> +    const char *value;
>> +
>> +    varValue = virHashLookup(hash->hashTable, cie->varNames[0]);
>> +
>> +    value = virNWFilterVarValueGetNthValue(varValue, cie->curValue);
>> +
>> +    for (i = 0; i < cie->curValue; i++) {
>> +        if (STREQ(value, virNWFilterVarValueGetNthValue(varValue, i))) {
>> +            bool isSame = true;
> This function is called in a loop over the list, and itself does a loop
> over the list, making it a quadratic check for duplicates.  Am I correct
> that lists can be provided by the user with arbitrary lengths?

Each variable can have any number of elements >= 1, for example
A=[10,20,30,20].

>> +            for (j = 1; j < cie->nVarNames; j++) {
> Furthermore, we add another layer of looping (this time over the number
> of parallel lists being combined), for an overall cubic algorithm.  Is
> the number of parallel lists that can be provided by the user arbitrary?

The number of lists depends on the number of variables that are provided
in one rule. So a rule like the one below, which uses $A, $B, and $C in
different fields, has 3 different lists (where the iterators ('@<n>') only
change how the instantiated rules are generated):

<rule action='accept' direction='out'>
  <tcp srcipaddr='$A[@1]' srcportstart='$B[@2]' dstportstart='$C[@3]'
       dscp='4'/>
</rule>
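
To make this concrete, here is a minimal standalone sketch (plain C, not
libvirt code; the variable values are made up) of how one rule with three
independent iterators expands into the Cartesian product of the three lists:

#include <stdio.h>

int main(void)
{
    const char *A[] = { "10.1.2.3", "10.1.2.4" };  /* $A[@1] */
    const char *B[] = { "80" };                    /* $B[@2] */
    const char *C[] = { "1025", "1026" };          /* $C[@3] */
    size_t i, j, k;

    /* three independent iterators -> 2 * 1 * 2 = 4 instantiated rules */
    for (i = 0; i < sizeof(A) / sizeof(A[0]); i++)
        for (j = 0; j < sizeof(B) / sizeof(B[0]); j++)
            for (k = 0; k < sizeof(C) / sizeof(C[0]); k++)
                printf("accept out tcp srcipaddr=%s srcportstart=%s "
                       "dstportstart=%s dscp=4\n", A[i], B[j], C[k]);
    return 0;
}

If any of the lists contained the same value twice, identical rules would
come out of this expansion, which is exactly what the uniqueness test is
there to prevent.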

> That is, I'm wondering if we need to start worrying about reducing
> complexity, and coming up with an algorithm that isn't quite so brute
> force so as not to take quite so long on arbitrary-length inputs.

In case of no duplicates in $A, we have a complexity of O(n) [optimal
case] for checking a single rule (only 'vertical' checking, no
'horizontal' checking).

In case $A has duplicates, the complexity becomes O(n^2) in the worst
case, since we do horizontal checking across $B and $C as well.

I think the elements in $A, for example, would have to be sorted for
faster detection of duplicates: A = [10,20,30,20] -> A = [10,20,20,30].
Now if I step onto A[2], I would see that A[2-1] = A[2] and wouldn't have
to look at A[0] at all, so that check would be O(1). But sorting the
elements may alter their meaning in terms of the sequential processing of
the rules. Got to think about this...
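
To sketch that idea (this is not what the patch does; the code below is
illustrative only): sort a working copy of the list, after which all
duplicates sit next to each other and a single linear pass finds them,
while the original list keeps its order:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int cmpstr(const void *a, const void *b)
{
    return strcmp(*(const char * const *)a, *(const char * const *)b);
}

int main(void)
{
    /* working copy of A = [10,20,30,20]; only the copy gets sorted */
    const char *sorted[] = { "10", "20", "30", "20" };
    size_t n = sizeof(sorted) / sizeof(sorted[0]);
    size_t i;

    qsort(sorted, n, sizeof(sorted[0]), cmpstr);

    /* after sorting, duplicates are adjacent: one O(n) pass suffices */
    for (i = 1; i < n; i++)
        if (strcmp(sorted[i - 1], sorted[i]) == 0)
            printf("duplicate value: %s\n", sorted[i]);
    return 0;
}

That would be an O(n log n) sort plus an O(n) scan instead of the O(n^2)
pairwise comparison, at the cost of the extra copy needed to preserve the
original ordering.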

> On the other hand, until someone actually demonstrates slow input, I'm
> okay with making assumptions that most users won't be merging large
> amounts of long parallel lists, and that the effort of reducing
> algorithmic complexity may cost more overhead than all expected use
> cases just being brute-forced.  So I'm okay checking this in as-is,
> until (and unless) we have performance numbers proving that we need to
> be smarter.
>
OK.
>> @@ -414,10 +460,15 @@ virNWFilterVarCombIterNext(virNWFilterVa
>>       unsigned int i;
>>
>>       for (i = 0; i < ci->nIter; i++) {
>> +next:
>>           ci->iter[i].curValue++;
>> -        if (ci->iter[i].curValue <= ci->iter[i].maxValue)
>> +        if (ci->iter[i].curValue <= ci->iter[i].maxValue) {
>> +            if (!virNWFilterVarCombIterEntryAreUniqueEntries(
>> +                                             &ci->iter[i], ci->hashTable)) {
>> +                goto next;
>> +                goto next;
> Why do we need a backwards goto, when a 'continue' would get to the same
> location?

Using the goto we don't increase 'i' but only do ci->iter[i].curValue++, 
which is what is needed.
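
To show the mechanics, here is a self-contained sketch of the same control
flow (simplified names; is_unique() is only a stand-in for the real
virNWFilterVarCombIterEntryAreUniqueEntries() and just treats the value 1
as a duplicate everywhere):

#include <stdbool.h>
#include <stdio.h>

struct iter {
    unsigned int curValue;
    unsigned int maxValue;
};

/* stand-in for the real uniqueness test: pretend 1 is a duplicate */
static bool is_unique(const struct iter *it)
{
    return it->curValue != 1;
}

/* advance to the next unique combination; returns false when done */
static bool step(struct iter *iter, unsigned int nIter)
{
    unsigned int i;

    for (i = 0; i < nIter; i++) {
 next:
        iter[i].curValue++;
        if (iter[i].curValue <= iter[i].maxValue) {
            if (!is_unique(&iter[i]))
                goto next;        /* same i: advance this iterator again */
            break;                /* found the next combination */
        } else {
            iter[i].curValue = 0; /* wrap around, carry into iter[i + 1] */
        }
    }
    return i < nIter;             /* false once every iterator wrapped */
}

int main(void)
{
    struct iter it[] = { { 0, 3 }, { 0, 2 } };

    while (step(it, 2))
        printf("%u %u\n", it[0].curValue, it[1].curValue);
    return 0;
}

This prints 2 0, 3 0, 0 2, 2 2, 3 2: the value 1 is skipped in both
positions without 'i' ever being advanced prematurely, which is what a
'continue' would have done.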


>> +            }
>>               break;
>> -        else
>> +        } else
>>               ci->iter[i].curValue = 0;
>>       }
> Coding style - the else needs {} since you added {} for the if.
Fixed.

> I think this is okay, but wait until I review the rest of the series to
> see how it was used in context.

I'll wait. Thanks.

    Stefan



