[Libguestfs] [PATCH v2] python: Fix UnicodeError in inspect_list_applications2() (RHBZ#1684004)
Sam Eiderman
sameid at google.com
Sat Apr 25 17:32:35 UTC 2020
Hi Nir,
I think they are latin1.
How do you think we should handle latin1 errors then? Replace on latin1 or
replace on utf-8?
    for codec in ["utf8", "latin1"]:
        try:
            return b.decode(codec)
        except UnicodeDecodeError:
            pass
    return b.decode("utf8", errors="replace")
(Pseudocode, will be implemented in C)
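To make the fallback order concrete, here is a minimal Python sketch of the chain above. The function name `decode_guest_string` is hypothetical and this is not the C implementation. One thing it makes visible: latin1 maps every byte value to a code point, so the latin1 step cannot raise and the final "replace" branch is never reached.

```python
def decode_guest_string(raw: bytes) -> str:
    """Hypothetical helper: try strict UTF-8 first, then fall back to latin1."""
    for codec in ("utf-8", "latin-1"):
        try:
            return raw.decode(codec)
        except UnicodeDecodeError:
            continue
    # Unreachable in practice: latin-1 accepts every byte value 0x00-0xFF.
    return raw.decode("utf-8", errors="replace")


# Every possible byte decodes under latin-1 without raising.
all_bytes = bytes(range(256))
assert len(all_bytes.decode("latin-1")) == 256

# Valid UTF-8 is returned unchanged by the first step.
print(decode_guest_string("Çağlar".encode("utf-8")))
# Invalid UTF-8 falls through to latin-1 and never raises.
print(repr(decode_guest_string(b"\xc7a")))
```

So with this ordering the question of which codec to "replace" on only arises if latin1 is dropped from the list or decoded with a stricter codec.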
On Thu, Apr 23, 2020, 21:34 Nir Soffer <nsoffer at redhat.com> wrote:
> On Mon, Apr 20, 2020 at 3:38 PM Sam Eiderman <sameid at google.com> wrote:
> >
> > The python3 bindings create unicode objects from application strings
> > on the guest (i.e. installed rpm, deb packages).
> > It is documented that rpm package fields such as description should be
> > utf8 encoded - however in some cases they are not a valid unicode
> > string,
>
> So what are they? latin1 maybe?
>
> Maybe use:
>
> try:
>     value.decode("utf-8")
> except UnicodeDecodeError:
>     value.decode("latin1")
>
> This will always succeed, producing possibly garbage output but so is
> errors='replace'.
>
> > on SLES11 SP4 the following packages fail to be converted to
> > unicode using guestfs_int_py_fromstring() (which invokes
> > PyUnicode_FromString()):
> >
> > PackageKit
> > aaa_base
> > coreutils
> > dejavu
> > desktop-data-SLED
> > gnome-utils
> > hunspell
> > hunspell-32bit
> > hunspell-tools
> > libblocxx6
> > libexif
> > libgphoto2
> > libgtksourceview-2_0-0
> > libmpfr1
> > libopensc2
> > libopensc2-32bit
> > liborc-0_4-0
> > libpackagekit-glib10
> > libpixman-1-0
> > libpixman-1-0-32bit
> > libpoppler-glib4
> > libpoppler5
> > libsensors3
> > libtelepathy-glib0
> > m4
> > opensc
> > opensc-32bit
> > permissions
> > pinentry
> > poppler-tools
> > python-gtksourceview
> > splashy
> > syslog-ng
> > tar
> > tightvnc
> > xorg-x11
> > xorg-x11-xauth
> > yast2-mouse
> >
> > Fix this by globally changing guestfs_int_py_fromstring()
> > and guestfs_int_py_fromstringsize() to decode utf-8 with the "replace"
> > error handler:
> >
> > https://docs.python.org/3/library/codecs.html#error-handlers
> >
> > For example, this will decode PackageKit's description on SLES11 SP4 the
> > following way:
> >
> > Backend: pisi
> > S.�ağlar Onur <caglar at pardus.org.tr>
>
> What is the original text?
>
> Nir
>
> > Signed-off-by: Sam Eiderman <sameid at google.com>
> > ---
> > python/handle.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/python/handle.c b/python/handle.c
> > index 2fb8c18f0..427424707 100644
> > --- a/python/handle.c
> > +++ b/python/handle.c
> > @@ -387,7 +387,7 @@ guestfs_int_py_fromstring (const char *str)
> >  #if PY_MAJOR_VERSION < 3
> >    return PyString_FromString (str);
> >  #else
> > -  return PyUnicode_FromString (str);
> > +  return PyUnicode_Decode(str, strlen(str), "utf-8", "replace");
> >  #endif
> >  }
> >
> > @@ -397,7 +397,7 @@ guestfs_int_py_fromstringsize (const char *str, size_t size)
> >  #if PY_MAJOR_VERSION < 3
> >    return PyString_FromStringAndSize (str, size);
> >  #else
> > -  return PyUnicode_FromStringAndSize (str, size);
> > +  return PyUnicode_Decode(str, size, "utf-8", "replace");
> >  #endif
> >  }
> >
> > --
> > 2.26.1.301.g55bc3eb7cb9-goog
> >
> >
> > _______________________________________________
> > Libguestfs mailing list
> > Libguestfs at redhat.com
> > https://www.redhat.com/mailman/listinfo/libguestfs
>
>
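
For comparison, the following Python sketch mirrors at the Python level what PyUnicode_Decode(str, size, "utf-8", "replace") does in the patch, and contrasts it with a latin1 decode. The input bytes are an assumption: they are a guess at a string that mixes one latin-encoded byte with valid UTF-8, not the actual PackageKit description taken from the guest.

```python
# Hypothetical input: 0xC7 is a lone non-UTF-8 byte, 0xC4 0x9F is valid UTF-8 for "ğ".
raw = b"S.\xc7a\xc4\x9flar Onur"

# Python-level equivalent of PyUnicode_Decode(str, size, "utf-8", "replace").
replaced = raw.decode("utf-8", errors="replace")

# The latin1 alternative never fails, but misreads the valid UTF-8 bytes.
latin1 = raw.decode("latin-1")

print(replaced)
print(repr(latin1))
```

Under this assumption, "replace" loses only the single invalid byte, while latin1 keeps that byte but turns the valid two-byte UTF-8 sequence into two unrelated characters.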