<div dir="ltr">I uploaded a v2 that does what you requested, but more globally (across all Python bindings). Tell me what you think.</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Apr 20, 2020 at 2:42 PM Daniel P. Berrangé <<a href="mailto:berrange@redhat.com">berrange@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Mon, Apr 20, 2020 at 01:17:35PM +0300, Sam Eiderman wrote:<br>
> The python3 bindings create unicode objects from application strings<br>
> on the guest (e.g. installed rpm and deb packages).<br>
> It is documented that rpm package fields such as description should be<br>
> UTF-8 encoded. However, in some cases they are not valid Unicode<br>
> strings. On SLES11 SP4, the following packages fail to be converted to<br>
> unicode using guestfs_int_py_fromstring() (which invokes<br>
> PyUnicode_FromString()):<br>
> <br>
> PackageKit<br>
> aaa_base<br>
> coreutils<br>
> dejavu<br>
> desktop-data-SLED<br>
> gnome-utils<br>
> hunspell<br>
> hunspell-32bit<br>
> hunspell-tools<br>
> libblocxx6<br>
> libexif<br>
> libgphoto2<br>
> libgtksourceview-2_0-0<br>
> libmpfr1<br>
> libopensc2<br>
> libopensc2-32bit<br>
> liborc-0_4-0<br>
> libpackagekit-glib10<br>
> libpixman-1-0<br>
> libpixman-1-0-32bit<br>
> libpoppler-glib4<br>
> libpoppler5<br>
> libsensors3<br>
> libtelepathy-glib0<br>
> m4<br>
> opensc<br>
> opensc-32bit<br>
> permissions<br>
> pinentry<br>
> poppler-tools<br>
> python-gtksourceview<br>
> splashy<br>
> syslog-ng<br>
> tar<br>
> tightvnc<br>
> xorg-x11<br>
> xorg-x11-xauth<br>
> yast2-mouse<br>
> <br>
> This is a surgical fix for inspect_list_applications2()'s description<br>
> field.<br>
> <br>
> Signed-off-by: Sam Eiderman <<a href="mailto:sameid@google.com" target="_blank">sameid@google.com</a>><br>
> ---<br>
> generator/python.ml | 8 ++++++++<br>
> 1 file changed, 8 insertions(+)<br>
> <br>
> diff --git a/generator/python.ml b/generator/python.ml<br>
> index f0d6b5d96..7394a943a 100644<br>
> --- a/generator/python.ml<br>
> +++ b/generator/python.ml<br>
> @@ -170,6 +170,14 @@ and generate_python_structs () =<br>
> function<br>
> | name, FString -><br>
> pr " value = guestfs_int_py_fromstring (%s->%s);\n" typ name;<br>
> + (match typ, name with<br>
> + | "application", "app_description"<br>
> + | "application2", "app2_description" -><br>
> + pr " if (value == NULL) {\n";<br>
> + pr " value = guestfs_int_py_fromstring (\"\");\n";<br>
> + pr " PyErr_Clear ();\n";<br>
> + pr " }\n";<br>
<br>
I don't think this is especially friendly/helpful to users.<br>
<br>
I'm assuming that there's just a handful of characters that are not<br>
valid UTF-8. I think we really want a graceful conversion that will<br>
convert as much as possible, replacing any invalid UTF-8 with some<br>
generic placeholder character.<br>
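<br>
A minimal sketch of that kind of lossy conversion, shown in Python for brevity.<br>
The helper name decode_lossy and the sample bytes are hypothetical. It assumes<br>
the C bindings could get the same behaviour by calling PyUnicode_DecodeUTF8()<br>
with the "replace" error handler instead of PyUnicode_FromString():<br>
<pre>
```python
def decode_lossy(raw):
    """Decode bytes as UTF-8, keeping every valid character.

    Each invalid byte sequence becomes U+FFFD (the Unicode replacement
    character) instead of raising UnicodeDecodeError.
    """
    return raw.decode("utf-8", errors="replace")


# Hypothetical package description containing a Latin-1 byte (0xE9),
# which is not valid UTF-8 on its own.
raw_description = b"Caf\xe9 menu editor"
print(decode_lossy(raw_description))
```
</pre>
Valid UTF-8 input passes through unchanged, so only the broken bytes are affected.<br>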
<br>
Regards,<br>
Daniel<br>
-- <br>
|: <a href="https://berrange.com" rel="noreferrer" target="_blank">https://berrange.com</a> -o- <a href="https://www.flickr.com/photos/dberrange" rel="noreferrer" target="_blank">https://www.flickr.com/photos/dberrange</a> :|<br>
|: <a href="https://libvirt.org" rel="noreferrer" target="_blank">https://libvirt.org</a> -o- <a href="https://fstop138.berrange.com" rel="noreferrer" target="_blank">https://fstop138.berrange.com</a> :|<br>
|: <a href="https://entangle-photo.org" rel="noreferrer" target="_blank">https://entangle-photo.org</a> -o- <a href="https://www.instagram.com/dberrange" rel="noreferrer" target="_blank">https://www.instagram.com/dberrange</a> :|<br>
<br>
</blockquote></div>