[libvirt] [PATCH] Add a virStrSplitQuoted for splitting quoted strings
Daniel P. Berrange
berrange at redhat.com
Wed Mar 14 16:50:47 UTC 2012
On Wed, Mar 14, 2012 at 10:40:41AM -0600, Eric Blake wrote:
> On 03/14/2012 09:19 AM, Daniel P. Berrange wrote:
> > From: "Daniel P. Berrange" <berrange at redhat.com>
> >
> > To facilitate parsing of argv[] style strings, provide a
> > virStrSplitQuoted API which will split a string on the listed
> > separators, but also allow for quoting with ' or ".
> >
> > * src/libvirt_private.syms, src/util/util.c,
> > src/util/util.h: Implement virStrSplitQuoted
> > * tests/utiltest.c: Some tests for virStrSplitQuoted
> > ---
> > src/libvirt_private.syms | 1 +
> > src/util/util.c | 117 ++++++++++++++++++++++++++++++++++++++++++++++
> > src/util/util.h | 2 +
> > tests/utiltest.c | 89 ++++++++++++++++++++++++++++++++--
> > 4 files changed, 203 insertions(+), 6 deletions(-)
>
> This looks like it is repeating some of the code in
> virsh.c:vshCommandStringGetArg; any chance we can combine them? In
> particular, the ability to mimic shell handling of \ escapes, as well as
> the difference in behavior of \ inside "" vs. '', seems like it will
> come in handy.
Ah I had forgotten about that code. Can you clarify the difference
in \ handling. I guess you mean that inside '', \ can only be used
for \\ and \', while inside "", it can do all the standard shell
escapes like \t, \n, etc ?
>
> >
> > +
> > +static char *
> > +virStrDupUnescape(const char *src, size_t len)
> > +{
> > + char *ret;
> > + size_t i, j;
> > + bool escape = false;
> > +
> > + if (VIR_ALLOC_N(ret, len + 1) < 0)
> > + return NULL;
> > +
> > + for (i = 0, j = 0 ; i < len ; i++) {
> > + if (escape) {
> > + escape = false;
> > + ret[j++] = src[i];
> > + } else if (src[i] == '\\') {
> > + escape = true;
> > + } else {
> > + ret[j++] = src[i];
> > + }
>
> So this version only strips backslash. Looks okay in isolation.
>
> > +/**
> > + * virStrSplitQuoted:
> > + * @src: string to split
> > + * @sep: list of separator characters
> > + *
> > + * Split the string 'src' into chunks, separated by one or more of
> > + * the characters in 'sep'. Allow ' or " to quote strings even if
> > + * they contain 'sep'
>
> No documentation of backslash handling. Are we trying to emulate shell
> parsing here (where depending on outer, "", or '', \ behaves differently)?
>
> Or are we trying to emulate C string parsing, where \n is translated to
> newline?
>
> What you have here does neither; although I didn't spot any flaw in the
> code, I don't know if it's the algorithm we want to be using.
I should have sent this paired with my other patch for <cmdline>
handling in LXC. That is the intended use case for this function.
I'm not sure that anyone has ever clearly defined what escaping
syntax is used for /proc/cmdline (which is what <cmdline> is
representing.
SystemD's parser is what I modelled my code on, though it in fact
does not unescape anything. eg. if parsing
foo "bar \" wizz" eek
systemd seems to return
foo
bar \" wizz
eek
while I return
foo
bar " wizz
eek
> > +struct StringSplitData {
> > + const char *src;
> > + const char *sep;
> > + const char **bits;
> > +};
>
> A point in your favor, for at least testing what you parse! If we
> change our mind to mimic shell or C parsing, then we'd have to update
> these tests.
Yes, escaping rules blow my mind unless I can test them :-)
> > + const char *bits1[] = { "foo", "bar", NULL };
> > + DO_TEST_STRING("foo bar", " ", bits1);
> > + DO_TEST_STRING("foo 'bar'", " ", bits1);
> > + DO_TEST_STRING("foo \"bar\"", " ", bits1);
> > + DO_TEST_STRING(" foo \"bar\"", " ", bits1);
> > + DO_TEST_STRING(" foo \"bar\"", " ", bits1);
> > + DO_TEST_STRING(" foo \"bar\"\n ", " \t\n\r", bits1);
> > +
> > + const char *bits2[] = { "foo", "bar wizz", "eek", NULL };
> > + DO_TEST_STRING("foo 'bar wizz' eek", " ", bits2);
> > + DO_TEST_STRING("foo \"bar wizz\" eek", " ", bits2);
>
> What about "foo bar\ wizz eek", if \ can escape outside of quotes?
Hmm, possibly
> > +
> > + const char *bits3[] = { "foo", "'bar' \"wizz\"", "eek", NULL };
>
> It would also be nice to have a literal backslash in the expected
> results, to prove that we can properly escape them.
Good point.
> Overall, I like the idea of the new function, but I'm worried that
> introducing yet another parser could hurt us (users will be asking "now
> which escape style is in effect here, and how does it differ from
> standardized escape styles that I'm used to?").
I think that perhaps we should have virStrSplitQuoted just return
the split pieces, with *no* unescaping. And then have separate
functions to escape/unescape individual string pieces after the
fact.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
More information about the libvir-list
mailing list