[Libguestfs] [PATCH v2] python: Fix UnicodeError in inspect_list_applications2() (RHBZ#1684004)

Sam Eiderman sameid at google.com
Sat Apr 25 17:32:35 UTC 2020


Hi Nir,

I think they are latin1.

How do you think we should handle latin1 decoding errors, then? Use
"replace" with latin1, or "replace" with utf-8?

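def decode_with_fallback(b):
    # Try strict decoding first: utf-8, then latin1.
    for codec in ("utf-8", "latin1"):
        try:
            return b.decode(codec)
        except UnicodeDecodeError:
            pass
    # Last resort: utf-8 with U+FFFD substituted for invalid bytes.
    return b.decode("utf-8", errors="replace")

(Python sketch of the logic; it will be implemented in C)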
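
As a sanity check on the fallback order above, here is a minimal Python sketch. The sample bytes are an assumption made up for illustration (one stray latin1 byte next to a valid UTF-8 sequence), not the real package data, and try_codecs is a hypothetical helper name. It suggests that latin1 can never fail, so a final "replace" step placed after latin1 would not be reached.

```python
# Hypothetical sample: one stray latin1 byte (0xC7) next to valid UTF-8
# (0xC4 0x9F). These bytes are assumed for illustration only; they are not
# taken from the real package description.
SAMPLE = b"S.\xc7a\xc4\x9flar Onur"


def try_codecs(data, codecs=("utf-8", "latin1")):
    """Return (codec, text) for the first codec that decodes data strictly."""
    for codec in codecs:
        try:
            return codec, data.decode(codec)
        except UnicodeDecodeError:
            continue
    # Only reachable if every codec in the list can fail.
    return "utf-8+replace", data.decode("utf-8", errors="replace")


# latin1 maps each of the 256 byte values to a code point, so it never raises.
all_bytes = bytes(range(256))
latin1_never_fails = len(all_bytes.decode("latin1")) == 256

used_codec, latin1_text = try_codecs(SAMPLE)
replace_text = SAMPLE.decode("utf-8", errors="replace")

if __name__ == "__main__":
    print(used_codec, ascii(latin1_text))
    print("utf-8+replace", ascii(replace_text))
```

With these assumed bytes, the latin1 fallback keeps the stray byte readable but turns the valid UTF-8 sequence into two unrelated characters, while utf-8 with "replace" keeps the valid sequence and substitutes U+FFFD for the stray byte. Which of the two is less harmful probably depends on how mixed the real strings are.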



On Thu, Apr 23, 2020, 21:34 Nir Soffer <nsoffer at redhat.com> wrote:

> On Mon, Apr 20, 2020 at 3:38 PM Sam Eiderman <sameid at google.com> wrote:
> >
> > The python3 bindings create unicode objects from application strings
> > on the guest (i.e. installed rpm, deb packages).
> > It is documented that rpm package fields such as description should be
> > utf8 encoded - however in some cases they are not a valid unicode
> > string,
>
> So what are they? latin1 maybe?
>
> Maybe use:
>
>     try:
>         text = value.decode("utf-8")
>     except UnicodeDecodeError:
>         text = value.decode("latin1")
>
> This will always succeed, producing possibly garbage output but so is
> errors='replace'.
>
> > on SLES11 SP4 the following packages fail to be converted to
> > unicode using guestfs_int_py_fromstring() (which invokes
> > PyUnicode_FromString()):
> >
> >  PackageKit
> >  aaa_base
> >  coreutils
> >  dejavu
> >  desktop-data-SLED
> >  gnome-utils
> >  hunspell
> >  hunspell-32bit
> >  hunspell-tools
> >  libblocxx6
> >  libexif
> >  libgphoto2
> >  libgtksourceview-2_0-0
> >  libmpfr1
> >  libopensc2
> >  libopensc2-32bit
> >  liborc-0_4-0
> >  libpackagekit-glib10
> >  libpixman-1-0
> >  libpixman-1-0-32bit
> >  libpoppler-glib4
> >  libpoppler5
> >  libsensors3
> >  libtelepathy-glib0
> >  m4
> >  opensc
> >  opensc-32bit
> >  permissions
> >  pinentry
> >  poppler-tools
> >  python-gtksourceview
> >  splashy
> >  syslog-ng
> >  tar
> >  tightvnc
> >  xorg-x11
> >  xorg-x11-xauth
> >  yast2-mouse
> >
> > Fix this by globally changing guestfs_int_py_fromstring()
> > and guestfs_int_py_fromstringsize() to decode utf-8 with the "replace"
> > error handler:
> >
> >  https://docs.python.org/3/library/codecs.html#error-handlers
> >
> > For example, this will decode PackageKit's description on SLES11 SP4
> > the following way:
> >
> >     Backend: pisi
> >         S.�ağlar Onur <caglar at pardus.org.tr>
>
> What is the original text?
>
> Nir
>
> > Signed-off-by: Sam Eiderman <sameid at google.com>
> > ---
> >  python/handle.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/python/handle.c b/python/handle.c
> > index 2fb8c18f0..427424707 100644
> > --- a/python/handle.c
> > +++ b/python/handle.c
> > @@ -387,7 +387,7 @@ guestfs_int_py_fromstring (const char *str)
> >  #if PY_MAJOR_VERSION < 3
> >    return PyString_FromString (str);
> >  #else
> > -  return PyUnicode_FromString (str);
> > +  return PyUnicode_Decode(str, strlen(str), "utf-8", "replace");
> >  #endif
> >  }
> >
> > @@ -397,7 +397,7 @@ guestfs_int_py_fromstringsize (const char *str, size_t size)
> >  #if PY_MAJOR_VERSION < 3
> >    return PyString_FromStringAndSize (str, size);
> >  #else
> > -  return PyUnicode_FromStringAndSize (str, size);
> > +  return PyUnicode_Decode(str, size, "utf-8", "replace");
> >  #endif
> >  }
> >
> > --
> > 2.26.1.301.g55bc3eb7cb9-goog
> >
> >
> > _______________________________________________
> > Libguestfs mailing list
> > Libguestfs at redhat.com
> > https://www.redhat.com/mailman/listinfo/libguestfs
>
>