<div dir="ltr">Hey Pino,<div><br></div><div>Can you search for the previous patches I submitted? I had some discussions regarding this with Daniel and Nir.</div><div><br></div><div>Thanks!</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jun 30, 2020 at 11:43 AM Pino Toscano &lt;<a href="mailto:ptoscano@redhat.com">ptoscano@redhat.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Sunday, 26 April 2020 20:14:03 CEST Sam Eiderman wrote:<br> > The python3 bindings create PyUnicode objects from application strings<br> > on the guest (i.e. installed rpm, deb packages).<br> > It is documented that rpm package fields such as description should be<br> > utf8 encoded - however in some cases they are not a valid unicode<br> > string, on SLES11 SP4 the encoding of the description of the following<br> > packages is latin1 and they fail to be converted to unicode using<br> > guestfs_int_py_fromstring() (which invokes PyUnicode_FromString()):<br> <br> Sorry, I wanted to reach our resident Python maintainers to get their<br> feedback, and so far had no time for it. 
Will do it shortly.<br> <br> BTW do you have a reproducer I can actually try freely?<br> <br> > diff --git a/python/handle.c b/python/handle.c<br> > index 2fb8c18f0..fe89dc58a 100644<br> > --- a/python/handle.c<br> > +++ b/python/handle.c<br> > @@ -387,7 +387,7 @@ guestfs_int_py_fromstring (const char *str)<br> > #if PY_MAJOR_VERSION &lt; 3<br> > return PyString_FromString (str);<br> > #else<br> > - return PyUnicode_FromString (str);<br> > + return guestfs_int_py_fromstringsize (str, strlen (str));<br> > #endif<br> > }<br> > <br> > @@ -397,7 +397,12 @@ guestfs_int_py_fromstringsize (const char *str, size_t size)<br> > #if PY_MAJOR_VERSION &lt; 3<br> > return PyString_FromStringAndSize (str, size);<br> > #else<br> > - return PyUnicode_FromStringAndSize (str, size);<br> > + PyObject *s = PyUnicode_FromString (str);<br> > + if (s == NULL) {<br> > + PyErr_Clear ();<br> > + s = PyUnicode_Decode (str, strlen(str), "latin1", "strict");<br> <br> Minor nit: space between "strlen" and the opening bracket.<br> <br> Also, isn't there any error we can check as a way to detect this<br> situation, rather than always attempting to decode it as latin1?<br> <br> Thanks,<br> -- <br> Pino Toscano</blockquote></div>