[libvirt] [PATCH v2 1/3] conf: Properly truncate wide character names in virDomainObjGetShortName

Tue Aug 29 07:34:16 UTC 2017

On Fri, Aug 25, 2017 at 06:03:15PM -0400, John Ferlan wrote:
>
>
>On 08/25/2017 07:21 AM, Martin Kletzander wrote:
>> We always truncated the name at 20 bytes instead of characters.  In
>> case 20 bytes were in the middle of a multi-byte character, then the
>> string became invalid and various parts of the code would error
>> out (e.g. XML parsing of that string).  Let's instead properly
>> truncate it after 20 characters instead.
>>
>> We cannot test this in our test suite because we would need to know
>> what locales are installed on the system where the tests are ran and
>> if there is supported one (most probably there will be, but we cannot
>> be 100% sure), we could initialize gettext in qemuxml2argvtest, but
>> there would still be a chance of getting two different (both valid,
>> though) results.
>
>Yeah - I tried - it got ugly fast.... Although I suppose in a test
>environment would could just "pick" a charset like en_US.UTF-8 and just
>ensure that things work as expected with that. Perhaps even make the
>shortened name tests go at the bottom with a FAILURE to compare argv
>results before setting the locale and a success after a call on the same
>test by name. Not something for this series because I'm not even sure it
>would work properly.  Maybe something for the byte sized tasks (rather
>literally too ;-)!)
>

The problem is that we can't pick any charset except "C", which is the
default.  There is no guarantee that you will have the charset installed
in the system.  For example I only have en_GB.utf8, cs_CZ.utf8 and the
defaults, C and POSIX.  We could, maybe, programmatically construct the
locale in the test itself, but that's way big of a hammer IMO.

>>
>> In order to test this it is enough to start a machine with a name for
>> which trimming it after 20 bytes would create invalid sequence (e.g.
>> 1234567890123456789č where č is any multi-byte character).  Then start
>> the domain and restart libvirtd.  The domain would disappear because
>> such illegal sequence will not go through the XML parser.  And that's
>> not a bug of the parser, it should not be in the XML in the first
>> place, but since we don't use any sophisticated formatter, just
>> mash some strings together, the formatting succeeds.
>
>If the domain was started before these patches, then the domain is still
>hidden since the shortened path names won't match... Such is life...
>

Such domain will never be loaded.  It's not because the paths won't
match.  We don't generate the paths for running domains, but we get them
from the state XML (that's needed so that we can change the way we
shorten characters and still be able to find domains with older
"shortening rules").  The problem with the domains that get lost is that
the XML consists of an invalid string literal and it is already saved on
the disk.  The parser (libxml2) will fail (rightfully) with an error
that such XML is invalid.  We can't do anything with that.  And,
honestly, I can't say I care that much about that as this really isn't
someone would notice in production.

>>
>> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1448766
>>
>> Signed-off-by: Martin Kletzander <mkletzan at redhat.com>
>> ---
>>  src/conf/domain_conf.c | 45 ++++++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 42 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
>> index 47eba4dbb315..dd73158f028b 100644
>> --- a/src/conf/domain_conf.c
>> +++ b/src/conf/domain_conf.c
>> @@ -27131,6 +27131,8 @@ virDomainDefHasMemballoon(const virDomainDef *def)
>>  }
>>
>>
>> +#define VIR_DOMAIN_SHORT_NAME_MAX 20
>> +
>>  /**
>>   * virDomainObjGetShortName:
>>   * @vm: Machine for which to get a name
>> @@ -27141,15 +27143,52 @@ virDomainDefHasMemballoon(const virDomainDef *def)
>>  char *
>>  virDomainObjGetShortName(const virDomainDef *def)
>>  {
>> -    const int dommaxlen = 20;
>> +    wchar_t wshortname[VIR_DOMAIN_SHORT_NAME_MAX + 1] = {0};
>> +    size_t len = 0;
>> +    char *shortname = NULL;
>>      char *ret = NULL;
>>
>> -    ignore_value(virAsprintf(&ret, "%d-%.*s",
>> -                             def->id, dommaxlen, def->name));
>> +    /* No need to do the whole conversion thing when there are no multibyte
>> +     * characters.  The same applies for illegal sequences as they can occur
>> +     * with incompatible locales. */
>> +    len = mbstowcs(NULL, def->name, 0);
>> +    if ((len == (size_t) -1 && errno == EILSEQ) ||
>> +        len == strlen(def->name)) {
>> +        ignore_value(virAsprintf(&ret, "%d-%.*s", def->id,
>> +                                 VIR_DOMAIN_SHORT_NAME_MAX, def->name));
>> +        return ret;
>
>consistently speaking
>
>    return NULL;
>

Not really, ret will be NULL here if and only if virAsprintf() failed.
This is not an error path, but rather a fallback path if either the
locale is incompatible or there are no wide characters (your
suggestion).  I hoped I explained that enough in the comment.

>> +    }
>> +
>> +    if (len == (size_t) -1 ||
>> +        mbstowcs(wshortname, def->name, VIR_DOMAIN_SHORT_NAME_MAX) == (size_t) -1) {
>> +        virReportError(VIR_ERR_INVALID_ARG, "%s",
>> +                       _("Cannot convert domain name to wide character string"));
>
>You could go with something like :
>
>virReportSystemError(errno,
>      _("Cannot convert domain name to wide character string, len=%zd"),
>      len);
>
>That way you'd know if you failed because len == -1 or the second
>mbstowcs call failed and you'd know why. Not sure ->name at this point
>could be considered an invalid argument, but getting the system error
>would certainly point you in the right direction.
>

Good point, I like that.

>FWIW:
>There may be an off by 1 error. With your this, 1234567890123456789č
>would be shortened to 19 characters (it's at 21), thus if you have two
>domains:
>
>1234567890123456789č
>
>and
>
>1234567890123456789
>
>They'd return the same answer
>

They won't.  That's the point of this whole function.  Such domains will
differ by their ID that's unique, the shortened name is ID-name.

>Of course so would 123456789012345678901 and 1234567890123456789012, but
>at least for those two the length is > 20.  Not sure if it matters or
>not as 19 or 20 characters for uniqueness is a lot.
>
>Still the bz had 7 multibyte characters that took up 21 char's - so it
>would seem things would be more limited with that charset. If that last
>character was the differentiator between all domains, then the short
>name is too short using only 18 chars that appear as 6 multibyte chars
>for the short name.
>

Not really.  I don't shorten the name to 20 bytes, but rather to 20
characters.  Keep in mind that mbstowcs()'s third parameter is the
number of wide characters that you want to convert, not the number of
bytes.

>Likewise a name using fully wide chars, e.g. ÄèÎØÛÄèÎØÛÄèÎØÛÄèÎØÛÄèÎØÛ
>would be shortened to ÄèÎØÛÄèÎØÛ.
>
>At this point I'm ambivalent as to whether that's considered a problem.
>

There should not be any.  (after writing this sentence, the phrase
"famous last words" springs to mind for an _unknown_ reason :) )

>> +        return NULL;
>> +    }
>> +
>> +    len = wcstombs(NULL, wshortname, 0);
>> +    if (len == (size_t) -1)
>> +        return NULL;
>
>Would returning NULL here elicit a useful error message?
>

According to the manual pages there should not be any error case for
this, but I'll add the error reporting, just in case.

>>
>> +    if (len > VIR_DOMAIN_SHORT_NAME_MAX)
>> +        len = VIR_DOMAIN_SHORT_NAME_MAX;
>

Ah, the remnants of an unfinished patch...  This should not be here at all.

>Given a name fully using a wide character set if this were altered to:
>
>    if (len > VIR_DOMAIN_SHORT_NAME_MAX*2)
>        len = VIR_DOMAIN_SHORT_NAME_MAX*2;
>
>Then ÄèÎØÛÄèÎØÛÄèÎØÛÄèÎØÛÄèÎØÛ shortens to ÄèÎØÛÄèÎØÛÄèÎØÛÄèÎØÛ, but
>1234567890123456789č! doesn't get shortened to 1234567890123456789č. The
>multibyte charsets would need what * 3?  It's somewhat ludicrous - but
>why I suggested originally to compare strlen() and mbstowcs() - if
>they're the same, less then 20, or having an error on the mbstowcs(),
>then just stick with the old logic.  Only fall into the new logic when
>wide or multibyte chars are present.
>

I hope this is explained by previous paragraphs.

>Of course the really harsh route is to count the width of each character
>in the array using mblen, then somehow mathematically get a result of 20
>"printed characters" of uniqueness.  Again, IDC really, but since I was
>looking and thinking about it I figured I'd at least mention it.
>
>So, I think the only required thing to fix would be the NULL return if
>len == -1 without an error message.  Beyond that altering the return to
>be NULL instead of ret and altering to use virReportSystemError are may
>as well do it since you need to fix the first - with that done:
>
>Reviewed-by: John Ferlan <jferlan at redhat.com>
>

Thanks for the review.  I'll read the other reviews in this series, fix
it up and push (if appropriate).

>John
>
>Whether the shortened name displayed "apparent" length rabbit hole needs
>adjustment - is up to you. I'm fine with it the way it is. I don't
>believe we ever documented our shortening "rules", but wide and
>multichar wide charsets really shorten things a lot.  Perhaps that's a
>cross that bridge when someone complains type thing!
>
>> +
>> +    if (VIR_ALLOC_N(shortname, len + 1) < 0)
>> +        return NULL;
>> +
>> +    if (wcstombs(shortname, wshortname, len) == (size_t) -1) {
>> +        virReportError(VIR_ERR_INVALID_ARG, "%s",
>> +                       _("Cannot convert domain name from wide character string"));
>> +        goto cleanup;
>> +    }
>> +
>> +    ignore_value(virAsprintf(&ret, "%d-%s", def->id, shortname));
>> + cleanup:
>> +    VIR_FREE(shortname);
>>      return ret;
>>  }
>>
>> +#undef VIR_DOMAIN_SHORT_NAME_MAX
>>
>>  int
>>  virDomainGetBlkioParametersAssignFromDef(virDomainDefPtr def,
>>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: Digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20170829/bede8234/attachment-0001.sig>