[Libguestfs] [PATCH REPOST] Introduce a wrapper around xmlParseURI.

Pino Toscano ptoscano at redhat.com
Fri Nov 30 12:35:46 UTC 2018

On Friday, 2 November 2018 15:23:07 CET Richard W.M. Jones wrote:
> We only use xmlParseURI to parse our own "homebrew" URIs, for example
> the ones used by guestfish --add or virt-v2v.  Unfortunately
> xmlParseURI cannot handle URIs with spaces or other non-RFC-compliant
> characters so simple commands like these fail:
>   $ guestfish -a 'ssh://example.com/virtual machine.img'
>   guestfish: --add: could not parse URI 'ssh://example.com/virtual machine.img'
>   $ guestfish -a 'ssh://example.com/バーチャルマシン.img'
>   guestfish: --add: could not parse URI 'ssh://example.com/バーチャルマシン.img'
> This is a usability problem.  However since these are not expected to
> be generic RFC-compliant URIs we can perform the required
> percent-escaping ourselves instead of demanding that the user does
> this.
> Note that the wrapper function should not be used on real URLs or
> libvirt URLs.
> ---

I do not think this is a good idea at all.

First of all, converting the URI to UTF-8 is a bad idea, since that is
not the encoding of the URI.  Second, it does a search and replace on
the whole string, skipping only some characters, without considering
the various parts of a URI.  Also, this will break well-formed URIs
that use e.g. Punycode.
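
To illustrate the ambiguity of escaping a whole string while skipping
some characters, here is a minimal Python sketch.  The helper
naive_escape is hypothetical and is not the code from the patch; it
only assumes a blanket escape that leaves reserved characters and
existing '%' signs untouched.

```python
from urllib.parse import quote

def naive_escape(uri: str) -> str:
    # Hypothetical blanket escaper (NOT the patch's code): escape the
    # whole string, leaving reserved characters and '%' as they are.
    return quote(uri, safe=":/?#[]@!$&'()*+,;=%")

# A file name containing a space ...
a = "ssh://example.com/virtual machine.img"
# ... and a file name that literally contains the characters "%20".
b = "ssh://example.com/virtual%20machine.img"

# Both inputs end up as the same URI, so the two files can no longer
# be told apart once the string has been escaped.
print(naive_escape(a))
print(naive_escape(b))
```

Under these assumptions both calls print the same string, so a
partially-encoded input cannot be decoded back unambiguously.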

In the end, users must provide compliant URIs anyway, so letting them
always do the proper job seems the better option to me.  Yes, I know it
is not the best option for users manually invoking the tools, but it is
certainly less problematic than dealing with all the possible issues
of partially encoded URIs.  This is also explained by the ycombinator
link mentioned in a comment, and how this is a mess in e.g. modern web

Let's not get into this mess, and just stay with the simple and
effective solution: always require compliant URIs.
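
For callers that build URIs programmatically, producing a compliant URI
is straightforward when each component is encoded separately.  The
following Python sketch is only an illustration: build_uri is a
hypothetical helper, and the scheme, host and paths are example values.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

def build_uri(scheme: str, host: str, path: str) -> str:
    # Hypothetical helper: percent-encode only the path component
    # ('/' is kept by default), then assemble the URI from its parts.
    return urlunsplit((scheme, host, quote(path), "", ""))

uri = build_uri("ssh", "example.com", "/virtual machine.img")
print(uri)

# A literal '%' in a file name is escaped too, so decoding the path
# gives back exactly the original name.
uri2 = build_uri("ssh", "example.com", "/100%.img")
print(uri2, unquote(urlsplit(uri2).path))
```

Because the caller knows which part is the path, nothing is escaped
twice and nothing is left ambiguous.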

Pino Toscano