[libvirt] [PATCH V4 04/10] Use scripting for cleaning and renaming of chains

Thu Nov 17 19:59:49 UTC 2011

On 11/17/2011 12:33 PM, Eric Blake wrote:
> On 10/26/2011 05:36 AM, Stefan Berger wrote:
>> The resulting list of chain names is then used to delete all the found
>> chains by first flushing and then deleting them.
>>
>> The same function is also used for renaming temporary filters to their final
>> names.
>>
>> I tested this with the bash and dash as script interpreters.
> Well, there's still the overriding design nag that we want to avoid
> shell interpretation if at all possible, but that's a _much_ bigger
> rewrite for another day, so I can live with this current patch (it's not
> making our use of sh any worse than it already was).
>
Agree.
>>
>> +static const char ebtables_script_func_collect_chains[] =
> Reading shell code that has been quoted into a C string literal to be
> passed through printf is an interesting exercise :)
>
>> +    "collect_chains()\n"
> Making sure I understand the rest of this patch, this function is
> reached in two places by the rest of your patch, with code like:
> +        virBufferAsprintf(buf, "$(collect_chains %s) ", rootchain);
> thus, you are using one command substitution per rootchain, where this
> function then outputs words to be parsed as part of a resulting command.
>
Yes, so this function starts out with the name of an ebtables chain and 
tries to determine all dependent chains of it, i.e., 'subchains'.

  #> ebtables -t nat -L libvirt-I-tck-test205002
  Bridge table: nat

  Bridge chain: libvirt-I-tck-test205002, entries: 5, policy: ACCEPT
  -p IPv4 -j I-tck-test205002-ipv4
  -p ARP -j I-tck-test205002-arp
  -p 0x8035 -j I-tck-test205002-rarp
  -p 0x835 -j ACCEPT
  -j DROP

Assuming I have the interface 'tck-test205002', I then run this function 
to find the chains
I-tck-test205002-ipv4, I-tck-test205002-arp and I-tck-test-205002-rarp 
and then visit
each one of those and try to find the chains they are 'jumping into'.

> That's a lot of processes; would it be worth using a shell for loop
> instead of a command-substitution per rootchain
>
I followed your suggested code below.

[...]
>
> sc=$(/path/to/iptables -t nat -L $1 | ...
>
> Can you please include a comment mentioning the typical output from such
> a command that you intend to be parsing?
>
See above (uses ebtables rather than iptables).
> Technically, you don't need backslash-newline after pipe; a trailing
> pipe is sufficient to continue a statement to the next line in shell.
> But it doesn't hurt to leave it in either.
>
>> +    "       sed -n \"/Bridge chain*/,$ s/.*\\-j \\([%s]-.*\\)/\\1/p\")\n"
>                                       1  2     3       4      5
>
> 1. Did you really mean to match lines with "Bridge chai" or "Bridge
> chainnn"?  That * is probably superfluous.
>
No, that was wrong.
> 2. "$ " is not portable shell.  To be portable, it should be "\$ " or '$ '.
>
Didn't know...
> 3. \- is not a portable sed escape.  But you don't need escaping for
> '-', since searching for a literal - in a sed expression works just fine.
>
> 4. This %s matches up to the 'chains' argument in this code:
> +    virBufferAsprintf(buf, NWFILTER_FUNC_COLLECT_CHAINS,
> +                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, chains);
> which I in turn traced back to:
> +    char chains[3] = {
> +        CHAINPREFIX_HOST_IN_TEMP, // 'J'
> +        CHAINPREFIX_HOST_OUT_TEMP, // 'P'
> +        0};
> so it looks like you are taking a line that looks like:
> Bridge chain ... -j J-...
> and turning it into:
> J-...
> while ignoring all other lines.
>
> 5. Use of "\(\)/\1" in shell is unusual (it's well-defined by POSIX that
> the backslash is preserved literally if the next character is not
> backquote, dollar, backslash, or newline), but for safety reason, I
> prefer to write it as "\\(\\)/\\1" to make it obvious that in the shell
> code I meant a literal backslash rather than relying on POSIX rules.
>
> Put all of these together, and this line should probably be:
>
> "   sed -n \"/Bridge chain/,\\$ s/.*-j \\\\([%s]-.*\\\\)/\\\\1/p\")\n"
>
Thanks.
> Ah, so the end result is that you echo each name found, as well as
> recurse on each name found to echo any dependent names.  Is order
> important (all names from the first chain must be output before any
> names from the recursive calls)?  If not, then I can rewrite this
> function to avoid local sc in the first place.  Here, in shell form
> (I'll leave you to convert it back into C string literal form):
>
> collect_chains()
> {
>    for tmp in $(iptables -t nat -L $1 | \
>          sed -n "/Bridge chain/,\$ s/.*-j \\([JP]-.*\\)/\\1/p"); do
>      echo $tmp
>      collect_chains $tmp
>    done
> }
>
>
I took this 'as-is'. Thanks.
>> +
>> +static const char ebiptables_script_func_rm_chains[] =
>> +    "rm_chains()\n"
> It looks like you are calling rm_chains() with a single argument
> containing a whitespace separated list of arguments.
> +    virBufferAddLit(buf, "rm_chains \"$a\"\n");
>
> Why not just call rm_chains with multiple arguments?
>       virBufferAddLit(buf, "rm_chains $a\n");
>
>> +    "{\n"
>> +    " for tmp in `echo \"$1\"`; do [ -n \"$tmp\" ]&&  %s -t %s -F $tmp; done\n"
>> +    " for tmp in `echo \"$1\"`; do [ -n \"$tmp\" ]&&  %s -t %s -X $tmp; done\n"
>> +    "}\n";
> Is it essential that all chains be flushed before any are deleted?  If
If they are flushed first they cannot reference each other. So that 
prevents not being able to delete one because it's still referenced by 
another one and then also the order in which the names are being 
processed doesn't matter as much.
> so, then keeping this as two for loops makes sense; if not, then these
> could be combined.
>
> Same story about `echo "$1"` being identical to the much simpler $1.
>
> Same story about $tmp always being non-empty in a shell for-loop where
> the tokens were created by word-splitting.
>
I had a lot of problems when adding support for dash to it. Probably the 
IFS was wrong at some point and so I added lots of [ -n $tmp ].
> With one argument containing a quoted list, I'd write this as:
>
> rm_chains()
> {
>    for tmp in $1; do %s -t %s -F $tmp; done
>    for tmp in $1; do %s -t %s -X $tmp; done
> }
>
> with multiple arguments (unquoted $a), I'd write it as:
>
> rm_chains()
> {
>    for tmp in $*; do %s -t %s -F $tmp; done
>    for tmp in $*; do %s -t %s -X $tmp; done
> }
>
I took the 2nd version.

> <sidenote>
> A point that might be worth a separate cleanup patch: it looks awkward
> to have to write the literal name of the script and the table name in
> multiple places into the script:
> +    virBufferAsprintf(buf, NWFILTER_FUNC_DELETE_CHAINS,
> +                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE,
> +                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE);
>
> generating:
>
> rm_chains()
> {
>    for tmp in $*; do /path/to/iptables -t nat -f $tmp; done...
> ..
> rm_chains $a
>
>
> Why not change your shell script to set up a shell variable up front for
> ebtables_cmd_path and EBTABLES_DEFAULT_TABLE, and use that as a shell
> variable instead of as a printf substitution?
As you mentioned, let's defer this to another patch.
> I'd write this as:
> rename_chains()
> {
>    for tmp in $1; do
>      case $tmp in
>        J*) /path/to/iptables -t nat -E $tmp I${tmp#?} ;;
>        P*) /path/to/iptables -t nat -E $tmp O${tmp#?} ;;
>      esac
>    done
> }
>
> or back in your C-string format:
>
> "rename_chains()\n"
> "{\n"
> "  for tmp in $1; do\n"
> "    case $tmp in\n"
> "      %c*) %s -t %s -E $tmp %c${tmp#?} ;;\n"
> "      %c*) %s -t %s -E $tmp %c${tmp#?} ;;\n"
> "    esac\n"
> "  done\n"
> "}\n"
>
Took this 'as-is'.
>> +
>> +static const char ebiptables_script_set_ifs[] =
>> +    "IFS=$' \\t\\n'\n"
>> +    "[ `expr length \"$IFS\"` != 3 ]&&  "
>> +        "IFS=$(echo | %s '{ print \"\\t\\n \"}')\n";
> Oy - my head hurts!  $'' is not portable; neither is expr length.  And
sorry for this unwanted 'side-effect' :-)

> doing IFS=$() is just wrong - that eats the trailing newline, leaving
> you with just IFS=space-tab instead of the intended
> IFS=space-tab-newline.  Just do:
>
> static const char ebiptables_script_set_ifs[] =
>      "nl='\n"
>      "'; IFS=' ''\t'$nl\n"
>
> and you're done.  No need for shenanigans with command substitution,
> expr, or awk.
I took it as-is now. Thanks.
> [Side note - the only reason I write ' ''\t' rather than ' \t' is that I
> don't like seeing space-tab in generated scripts; some editors have a
> tendency to corrupt space-tab in the name of being "helpful", so
> separating the characters avoids that problem.]
>
>> +
>> +#define NWFILTER_FUNC_COLLECT_CHAINS ebtables_script_func_collect_chains
>> +#define NWFILTER_FUNC_DELETE_CHAINS ebiptables_script_func_rm_chains
> This particular change in names made it a bit harder to search the rest
> of your patch (I had to search two different strings to see where the
> function definition was emitted, vs. where it was called); I'd suggest
> either naming the function delete_chains, or the macro
> NWFILTER_FUNC_RM_CHAINS, for consistency.
Fixed.
>> +#define NWFILTER_FUNC_RENAME_CHAINS ebiptables_script_func_rename_chains
>> +#define NWFILTER_FUNC_SET_IFS ebiptables_script_set_ifs
>>
>>   #define VIRT_IN_CHAIN      "libvirt-in"
>>   #define VIRT_OUT_CHAIN     "libvirt-out"
>> @@ -2805,95 +2848,65 @@ ebtablesCreateTmpSubChain(virBufferPtr b
>>       return 0;
>>   }
> Phew - made it past the function definitions.  Now, on to how they are
> used.  [By the way, in case it's not obvious with my lengthy review
> above, I _like_ the fact that you are using shell functions to
> consolidate effort - as long as we are using a scripting language, we
> might as well use all of its power - and it does mean that any future
> attempt to refactor this away from a temporary shell script into libvirt
> primitives will have to keep that in mind]
>
>>
>> -
>> -static int
>> -_ebtablesRemoveSubChain(virBufferPtr buf,
>> -                        int incoming,
>> -                        const char *ifname,
>> -                        enum l3_proto_idx protoidx,
>> -                        int isTempChain)
>> +static int _ebtablesRemoveSubChains(virBufferPtr buf,
>> +                                    const char *ifname,
>> +                                    const char *chains)
>>   {
>> -    char rootchain[MAX_CHAINNAME_LENGTH], chain[MAX_CHAINNAME_LENGTH];
>> -    char chainPrefix;
>> -
>> -    if (isTempChain) {
>> -        chainPrefix =(incoming) ? CHAINPREFIX_HOST_IN_TEMP
>> -                                : CHAINPREFIX_HOST_OUT_TEMP;
>> -    } else {
>> -        chainPrefix =(incoming) ? CHAINPREFIX_HOST_IN
>> -                                : CHAINPREFIX_HOST_OUT;
>> -    }
>> +    char rootchain[MAX_CHAINNAME_LENGTH];
>> +    unsigned i;
>>
>> -    PRINT_ROOT_CHAIN(rootchain, chainPrefix, ifname);
>> -    PRINT_CHAIN(chain, chainPrefix, ifname, l3_protocols[protoidx].val);
>> -
>> -    virBufferAsprintf(buf,
>> -                      "%s -t %s -D %s -p 0x%x -j %s" CMD_SEPARATOR
>> -                      "%s -t %s -F %s" CMD_SEPARATOR
>> -                      "%s -t %s -X %s" CMD_SEPARATOR,
>> +    virBufferAsprintf(buf, NWFILTER_FUNC_COLLECT_CHAINS,
>> +                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, chains);
>> +    virBufferAsprintf(buf, NWFILTER_FUNC_DELETE_CHAINS,
>>                         ebtables_cmd_path, EBTABLES_DEFAULT_TABLE,
>> -                      rootchain, l3_protocols[protoidx].attr, chain,
>> +                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE);
>>
>> -                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, chain,
>> +    virBufferAsprintf(buf, NWFILTER_FUNC_SET_IFS, gawk_cmd_path);
> Given my above reformulation of SET_IFS, you don't need gawk_cmd_path here.
>
Fixed.
>> +    virBufferAddLit(buf, "a=\"");
>> +    for (i = 0; chains[i] != 0; i++) {
>> +        PRINT_ROOT_CHAIN(rootchain, chains[i], ifname);
>> +        virBufferAsprintf(buf, "$(collect_chains %s) ", rootchain);
>> +    }
>> +    virBufferAddLit(buf, "\"\n");
> This results in the shell code:
>
> a="$(collect_chains name) $(collect_chains name) "
>
> If you added looping in the shell definition of collect_chains, this
> could instead be:
>
> a=$(collect_chains name name)
>
> Not essential, but food for thought for a smaller script.
Did that now.
>>
>> -                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, chain);
>> +    for (i = 0; chains[i] != 0; i++) {
>> +        PRINT_ROOT_CHAIN(rootchain, chains[i], ifname);
>> +        virBufferAsprintf(buf,
>> +                          "%s -t %s -F %s\n",
>> +                          ebtables_cmd_path, EBTABLES_DEFAULT_TABLE,
>> +                          rootchain);
>> +    }
>> +    virBufferAddLit(buf, "rm_chains \"$a\"\n");
> This results in:
>
> rm_chains "$a"
>
> See my earlier comments where using $* instead of $1 would let you write
> this as:
>
> rm_chains $a
>
> instead.
>
Using the 'rm_chains $a' now.
>> +static int
>> +ebtablesRenameTmpSubAndRootChains(virBufferPtr buf,
>> +                                  const char *ifname)
>> +{
>> +    char rootchain[MAX_CHAINNAME_LENGTH];
>> +    unsigned i;
>> +    char chains[3] = {
>> +        CHAINPREFIX_HOST_IN_TEMP,
>> +        CHAINPREFIX_HOST_OUT_TEMP,
>> +        0};
>> +
>> +    virBufferAsprintf(buf, NWFILTER_FUNC_COLLECT_CHAINS,
>> +                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE, chains);
> This means you are outputting the function definition from two different
> call paths.  It doesn't hurt to (re-)define a function to the same
> value, while avoiding the function if it isn't used; on the other hand,
> I'm wondering if it is worth outputting the functions as part of the
> file header rather than risking having the function definitions appear
> multiple times throughout the temporary script.  I can live with your
> current approach, so don't bother refactoring things unless you want to.
Then I don't refactor this one.
> [I speak from experience - I'm one of the autoconf maintainers, and
> solving the problem of how to generate shell script output based on m4
> input, where pre-requisite shell functions are emitted at most once, up
> front, but used multiple times through the rest of the script, all in
> order to reduce the total size of the overall script, proved to be an
> interesting challenge in writing a code generator.]
>
>> +    virBufferAsprintf(buf, NWFILTER_FUNC_RENAME_CHAINS,
>> +                      CHAINPREFIX_HOST_IN_TEMP,
>> +                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE,
>> +                      CHAINPREFIX_HOST_IN,
>> +                      CHAINPREFIX_HOST_OUT_TEMP,
>> +                      ebtables_cmd_path, EBTABLES_DEFAULT_TABLE,
>> +                      CHAINPREFIX_HOST_OUT);
>> +
>> +    virBufferAsprintf(buf, NWFILTER_FUNC_SET_IFS, gawk_cmd_path);
>> +    virBufferAddLit(buf, "a=\"");
>> +    for (i = 0; chains[i] != 0; i++) {
>> +        PRINT_ROOT_CHAIN(rootchain, chains[i], ifname);
>> +        virBufferAsprintf(buf, "$(collect_chains %s) ", rootchain);
>>       }
> Similar comments about collect_chains usage possibly being easier to
> understand if it includes a shell for loop over multiple arguments.
>
Converted.
> Looking forward to seeing what v5 looks like!
>
I did all conversions and libvirt-tck test cases pass without problems 
or leaving any tables behind upon VM teardown.

Thanks a lot for your review and help. I'll send out v5 shortly.

    Stefan