[libvirt] [PATCH] Add a virStrSplitQuoted for splitting quoted strings

Eric Blake eblake at redhat.com
Wed Mar 14 16:40:41 UTC 2012


On 03/14/2012 09:19 AM, Daniel P. Berrange wrote:
> From: "Daniel P. Berrange" <berrange at redhat.com>
> 
> To facilitate parsing of argv[] style strings, provide a
> virStrSplitQuoted API which will split a string on the listed
> separators, but also allow for quoting with ' or ".
> 
> * src/libvirt_private.syms, src/util/util.c,
>   src/util/util.h: Implement virStrSplitQuoted
> * tests/utiltest.c: Some tests for virStrSplitQuoted
> ---
>  src/libvirt_private.syms |    1 +
>  src/util/util.c          |  117 ++++++++++++++++++++++++++++++++++++++++++++++
>  src/util/util.h          |    2 +
>  tests/utiltest.c         |   89 ++++++++++++++++++++++++++++++++--
>  4 files changed, 203 insertions(+), 6 deletions(-)

This looks like it is repeating some of the code in
virsh.c:vshCommandStringGetArg; any chance we can combine them?  In
particular, the ability to mimic shell handling of \ escapes, as well as
the difference in behavior of \ inside "" vs. '', seems like it will
come in handy.

>  
> +
> +static char *
> +virStrDupUnescape(const char *src, size_t len)
> +{
> +    char *ret;
> +    size_t i, j;
> +    bool escape = false;
> +
> +    if (VIR_ALLOC_N(ret, len + 1) < 0)
> +        return NULL;
> +
> +    for (i = 0, j = 0 ; i < len ; i++) {
> +        if (escape) {
> +            escape = false;
> +            ret[j++] = src[i];
> +        } else if (src[i] == '\\') {
> +            escape = true;
> +        } else {
> +            ret[j++] = src[i];
> +        }

So this version only strips backslash.  Looks okay in isolation.

> +/**
> + * virStrSplitQuoted:
> + * @src: string to split
> + * @sep: list of separator characters
> + *
> + * Split the string 'src' into chunks, separated by one or more of
> + * the characters in 'sep'. Allow ' or " to quote strings even if
> + * they contain 'sep'

No documentation of backslash handling.  Are we trying to emulate shell
parsing here (where depending on outer, "", or '', \ behaves differently)?

Or are we trying to emulate C string parsing, where \n is translated to
newline?

What you have here does neither; although I didn't spot any flaw in the
code, I don't know if it's the algorithm we want to be using.


>  
> +struct StringSplitData {
> +    const char *src;
> +    const char *sep;
> +    const char **bits;
> +};

A point in your favor, for at least testing what you parse!  If we
change our mind to mimic shell or C parsing, then we'd have to update
these tests.

>  
> +    const char *bits1[] = { "foo", "bar", NULL };
> +    DO_TEST_STRING("foo bar", " ", bits1);
> +    DO_TEST_STRING("foo 'bar'", " ", bits1);
> +    DO_TEST_STRING("foo \"bar\"", " ", bits1);
> +    DO_TEST_STRING("   foo \"bar\"", " ", bits1);
> +    DO_TEST_STRING("   foo \"bar\"", " ", bits1);
> +    DO_TEST_STRING("   foo	\"bar\"\n   ", " \t\n\r", bits1);
> +
> +    const char *bits2[] = { "foo", "bar wizz", "eek", NULL };
> +    DO_TEST_STRING("foo 'bar wizz' eek", " ", bits2);
> +    DO_TEST_STRING("foo \"bar wizz\" eek", " ", bits2);

What about "foo bar\ wizz eek", if \ can escape outside of quotes?

> +
> +    const char *bits3[] = { "foo", "'bar' \"wizz\"", "eek", NULL };

It would also be nice to have a literal backslash in the expected
results, to prove that we can properly escape them.

Overall, I like the idea of the new function, but I'm worried that
introducing yet another parser could hurt us (users will be asking "now
which escape style is in effect here, and how does it differ from
standardized escape styles that I'm used to?").

-- 
Eric Blake   eblake at redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 620 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20120314/bb94be12/attachment-0001.sig>


More information about the libvir-list mailing list