[Libosinfo] [PATCH] force a UTF-8 locale for python3 to avoid broken ascii codec

Daniel P. Berrangé berrange at redhat.com
Thu Sep 5 17:19:00 UTC 2019


On Thu, Sep 05, 2019 at 07:08:27PM +0200, Fabiano Fidêncio wrote:
> On Wed, Mar 27, 2019 at 10:57 AM Daniel P. Berrangé <berrange at redhat.com> wrote:
> >
> > The python3 ascii codec violates POSIX C locale requirements by not being
> > 8-bit clean in its text handling. It raises an error for any byte with
> > the top bit set:
> >
> >   >       return codecs.ascii_decode(input, self.errors)[0]
> >   E       UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 419: ordinal not in range(128)
> >
> > To avoid this python bug we must force use of a UTF-8 locale. Ideally we
> > would use the C.UTF-8 locale, however, that is not portable across OS,
> > only existing on certain Linux distros. Instead we use the en_US.UTF-8
> > locale, but only for the character set data.
> >
> > Signed-off-by: Daniel P. Berrangé <berrange at redhat.com>
> > ---
> >
> > Pushed as a CI build fix for FreeBSD distros
> >
> >  Makefile | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/Makefile b/Makefile
> > index c63cb6e..9d7f109 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -123,4 +123,4 @@ update-po:
> >          done
> >
> >  check: $(DATA_FILES) $(SCHEMA_FILES)
> > -       $(PYTHON) -m pytest $(PYTEST_LOG_LEVEL)
> > +       LC_ALL= LANG=C LC_CTYPE=en_US.UTF-8 $(PYTHON) -m pytest $(PYTEST_LOG_LEVEL)
> > --
> > 2.20.1
> 
> Daniel,
> 
> This commit is the cause of the following breakage (in my personal
> gitlab account):
> https://gitlab.com/fidencio/osinfo-db/-/jobs/288707257
> 
> It seems to happen because both debian & fedora (30+) containers do
> not have the required locale.
> I'd like to ask your suggestion on how to proceed here:
> - Shall we explicitly include glibc-langpack-en as part of the base packages?
>   - Its dependencies are: glibc, glibc-common
>   - Its size is: 6.0 M (on Fedora 30);
> - Shall we rework the osinfo-db tests so that they work without
> setting the locale?

In theory C.UTF-8 is our desired locale, but that is a non-standard
concept that is only carried as a downstream patch by certain distros.
Upstream glibc has not accepted it. It doesn't exist at all on *BSD.
Thus we picked en_US.UTF-8 as the only UTF-8 locale that is
portable across all known operating systems.
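
Whether a given UTF-8 locale actually exists on a build host or container
can be probed from Python itself. The following is a minimal sketch that
assumes only the standard `locale` module; the helper name
`first_available_ctype` is hypothetical and is not part of osinfo-db.

```python
import locale


def first_available_ctype(candidates=("C.UTF-8", "en_US.UTF-8")):
    """Return the first name setlocale() accepts for LC_CTYPE, or None.

    Hypothetical helper for illustration only. The process locale is
    restored before returning, so the probe has no lasting effect.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # this locale is not installed on this system
            return name
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # put the old value back


if __name__ == "__main__":
    print(first_available_ctype())
```

A result of None in a container would suggest that neither locale is
installed there, which is the situation installing the langpack is meant
to fix.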

If you can't set the locale, the only option is to mandate python
3.7 as the minimum python version, which I think is too strict.
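
The underlying failure can be reproduced without pytest or any locale
setup, which may help when checking a container. This is a minimal sketch;
`decode_with` is a hypothetical helper and the sample string is only an
illustration. Python 3.7 matters here because it introduced C locale
coercion (PEP 538) and a UTF-8 mode (PEP 540), so a C locale no longer
implies the ascii codec there.

```python
# "ê" (U+00EA) encodes to the bytes 0xc3 0xaa in UTF-8, so the data
# contains bytes with the top bit set, as in the traceback in the patch.
raw = "Fid\u00eancio".encode("utf-8")


def decode_with(codec, data=raw):
    """Decode data with the named codec, reporting failures as text.

    Hypothetical helper for illustration only.
    """
    try:
        return data.decode(codec)
    except UnicodeDecodeError as exc:
        return "error: " + exc.reason


if __name__ == "__main__":
    print(decode_with("ascii"))  # the codec Python < 3.7 picks in the C locale
    print(decode_with("utf-8"))  # the codec selected by LC_CTYPE=en_US.UTF-8
```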

IOW, we should just install the langpack.

FWIW, I'm proposing the exact same en_US.UTF-8 env var for libvirt
python code, so we'll need to deal with the same problem shortly
there too.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



