[libvirt] [PATCH] Add a virStrSplitQuoted for splitting quoted strings

Daniel P. Berrange berrange at redhat.com
Wed Mar 14 16:50:47 UTC 2012


On Wed, Mar 14, 2012 at 10:40:41AM -0600, Eric Blake wrote:
> On 03/14/2012 09:19 AM, Daniel P. Berrange wrote:
> > From: "Daniel P. Berrange" <berrange at redhat.com>
> > 
> > To facilitate parsing of argv[] style strings, provide a
> > virStrSplitQuoted API which will split a string on the listed
> > separators, but also allow for quoting with ' or ".
> > 
> > * src/libvirt_private.syms, src/util/util.c,
> >   src/util/util.h: Implement virStrSplitQuoted
> > * tests/utiltest.c: Some tests for virStrSplitQuoted
> > ---
> >  src/libvirt_private.syms |    1 +
> >  src/util/util.c          |  117 ++++++++++++++++++++++++++++++++++++++++++++++
> >  src/util/util.h          |    2 +
> >  tests/utiltest.c         |   89 ++++++++++++++++++++++++++++++++--
> >  4 files changed, 203 insertions(+), 6 deletions(-)
> 
> This looks like it is repeating some of the code in
> virsh.c:vshCommandStringGetArg; any chance we can combine them?  In
> particular, the ability to mimic shell handling of \ escapes, as well as
> the difference in behavior of \ inside "" vs. '', seems like it will
> come in handy.

Ah I had forgotten about that code. Can you clarify the difference
in \ handling.  I guess you mean that inside '', \ can only be used
for  \\ and \', while inside "", it can do all the standard shell
escapes like \t, \n, etc ?

> 
> >  
> > +
> > +static char *
> > +virStrDupUnescape(const char *src, size_t len)
> > +{
> > +    char *ret;
> > +    size_t i, j;
> > +    bool escape = false;
> > +
> > +    if (VIR_ALLOC_N(ret, len + 1) < 0)
> > +        return NULL;
> > +
> > +    for (i = 0, j = 0 ; i < len ; i++) {
> > +        if (escape) {
> > +            escape = false;
> > +            ret[j++] = src[i];
> > +        } else if (src[i] == '\\') {
> > +            escape = true;
> > +        } else {
> > +            ret[j++] = src[i];
> > +        }
> 
> So this version only strips backslash.  Looks okay in isolation.
> 
> > +/**
> > + * virStrSplitQuoted:
> > + * @src: string to split
> > + * @sep: list of separator characters
> > + *
> > + * Split the string 'src' into chunks, separated by one or more of
> > + * the characters in 'sep'. Allow ' or " to quote strings even if
> > + * they contain 'sep'
> 
> No documentation of backslash handling.  Are we trying to emulate shell
> parsing here (where depending on outer, "", or '', \ behaves differently)?
> 
> Or are we trying to emulate C string parsing, where \n is translated to
> newline?
> 
> What you have here does neither; although I didn't spot any flaw in the
> code, I don't know if it's the algorithm we want to be using.

I should have sent this paired with my other patch for <cmdline>
handling in LXC. That is the intended use case for this function.

I'm not sure that anyone has ever clearly defined what escaping
syntax is used for /proc/cmdline (which is what <cmdline> is
representing.

SystemD's parser is what I modelled my code on, though it in fact
does not unescape anything. eg. if parsing


   foo "bar \" wizz" eek

systemd seems to return

   foo
   bar \" wizz
   eek

while I return

   foo
   bar " wizz
   eek

> > +struct StringSplitData {
> > +    const char *src;
> > +    const char *sep;
> > +    const char **bits;
> > +};
> 
> A point in your favor, for at least testing what you parse!  If we
> change our mind to mimic shell or C parsing, then we'd have to update
> these tests.

Yes, escaping rules blow my mind unless I can test them :-)

> > +    const char *bits1[] = { "foo", "bar", NULL };
> > +    DO_TEST_STRING("foo bar", " ", bits1);
> > +    DO_TEST_STRING("foo 'bar'", " ", bits1);
> > +    DO_TEST_STRING("foo \"bar\"", " ", bits1);
> > +    DO_TEST_STRING("   foo \"bar\"", " ", bits1);
> > +    DO_TEST_STRING("   foo \"bar\"", " ", bits1);
> > +    DO_TEST_STRING("   foo	\"bar\"\n   ", " \t\n\r", bits1);
> > +
> > +    const char *bits2[] = { "foo", "bar wizz", "eek", NULL };
> > +    DO_TEST_STRING("foo 'bar wizz' eek", " ", bits2);
> > +    DO_TEST_STRING("foo \"bar wizz\" eek", " ", bits2);
> 
> What about "foo bar\ wizz eek", if \ can escape outside of quotes?

Hmm, possibly

> > +
> > +    const char *bits3[] = { "foo", "'bar' \"wizz\"", "eek", NULL };
> 
> It would also be nice to have a literal backslash in the expected
> results, to prove that we can properly escape them.

Good point.

> Overall, I like the idea of the new function, but I'm worried that
> introducing yet another parser could hurt us (users will be asking "now
> which escape style is in effect here, and how does it differ from
> standardized escape styles that I'm used to?").

I think that perhaps we should have  virStrSplitQuoted just return
the split pieces, with *no* unescaping. And then have separate
functions to escape/unescape individual string pieces after the
fact.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|




More information about the libvir-list mailing list