use of study() with utf8 enabled breaks regexps

Jason Vas Dias jvdias at redhat.com
Wed Nov 9 20:21:33 UTC 2005


To: perlbug at perl.org
Subject: use of study() with utf8 enabled breaks regexps
Cc: fedora-perl-devel-list at redhat.com
Reply-To: jvdias at redhat.com
Message-Id: <5.8.7_481_1131565834 at jvdias>

This is a bug report for perl from jvdias at redhat.com,
generated with the help of perlbug 1.35 running under perl v5.8.7.


-----------------------------------------------------------------
[Please enter your report here]

Use of study() with utf8 support enabled breaks perl-5.8.7's
regular expressions :

OK without UTF:
$  echo 'ABDCEFGHIJK' | 
   perl -pe 'study; s/HIJK/1234/;'
ABDCEFG1234

$ echo 'ABCDEFGHIJK' |
  perl -e '$_=<>; study; print /HIJK/,"\n";'
1

FAILS with UTF:
$ echo 'ABDCEFGHIJK' |
  PERL_UNICODE=31 perl -pe 'study; s/HIJK/1234/;'
ABDCEFGHIJK

$ echo 'ABCDEFGHIJK' | 
  PERL_UNICODE=31 perl -e '$_=<>; study; print /HIJK/,"\n";'

(re did not match)

Seems to be study() that is the culprit:
$ echo 'ABDCEFGHIJK' | 
  PERL_UNICODE=31 perl -pe 's/HIJK/1234/;'
ABDCEFG1234

And it is because $_ gets utf8-ness from STDIN:

$ echo 'ABDCEFGHIJK' |
  PERL_UNICODE=63 perl -e '$_=<>; study; print /HIJK/ ? "OK" : "FAIL","\n";'
FAIL

$ echo 'ABDCEFGHIJK' |
  perl -e '$_=<>; study; print /HIJK/ ? "OK" : "FAIL","\n";'
OK

$ echo 'ABDCEFGHIJK' | PERL_UNICODE=63
  perl -e '$_=<>; print /HIJK/ ? "OK" : "FAIL","\n";'
OK

This was in the 'en_US.UTF-8' locale. If I make utf-8 support
conditional on locale, the problem goes away for the C locale:

$ echo 'ABDCEFGHIJK' |
  PERL_UNICODE=127 LC_ALL=C perl -e '$_=<>; study; print /HIJK/ ? "OK" : "FAIL","\n";'
OK

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=medium
---
This perlbug was built using Perl v5.8.7 in the Red Hat build system.
It is being executed now by Perl v5.8.7 - Tue Nov  8 16:24:03 EST 2005.

Site configuration information for perl v5.8.7:

Configured by Red Hat, Inc. at Tue Nov  8 16:24:03 EST 2005.

Summary of my perl5 (revision 5 version 8 subversion 7) configuration:
  Platform:
    osname=linux, osvers=2.6.14-1.1655_fc5, archname=i386-linux-thread-multi
    uname='linux jvdias 2.6.14-1.1655_fc5 #1 tue nov 8 06:55:58 est 2005 i686 i686 i386 gnulinux '
    config_args='-des -Doptimize=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=pentium4 -fasynchronous-unwind-tables -Dversion=5.8.7 -Dmyhostname=localhost -Dperladmin=root at localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dinc_version_list=5.8.6 5.8.5 5.8.4 5.8.3 -Dscriptdir=/usr/bin'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i386 -mtune=pentium4 -fasynchronous-unwind-tables',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='4.0.2 20051007 (Red Hat 4.0.2-3)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.3.90.so, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.3.90'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib/perl5/5.8.7/i386-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:


---
@INC for perl v5.8.7:
    /usr/lib/perl5/site_perl/5.8.7/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.6/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.7
    /usr/lib/perl5/site_perl/5.8.6
    /usr/lib/perl5/site_perl/5.8.5
    /usr/lib/perl5/site_perl/5.8.4
    /usr/lib/perl5/site_perl/5.8.3
    /usr/lib/perl5/site_perl
    /usr/lib/perl5/vendor_perl/5.8.7/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.6/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.7
    /usr/lib/perl5/vendor_perl/5.8.6
    /usr/lib/perl5/vendor_perl/5.8.5
    /usr/lib/perl5/vendor_perl/5.8.4
    /usr/lib/perl5/vendor_perl/5.8.3
    /usr/lib/perl5/vendor_perl
    /usr/lib/perl5/5.8.7/i386-linux-thread-multi
    /usr/lib/perl5/5.8.7
    .

---
Environment for perl v5.8.7:
    HOME=/home/boston/jvdias
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/usr/games:/home/boston/jvdias/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash




More information about the Fedora-perl-devel-list mailing list