perl and UTF-8

Tue Jun 22 14:15:56 UTC 2004

Anand Buddhdev wrote:

> [arb at home arb]$ time grep zymology docs/sowpods.txt
> enzymology
> zymology
>  
> real    0m0.267s
> user    0m0.260s
> sys     0m0.000s
> 
> [arb at home arb]$ export LANG=C
> [arb at home arb]$ time grep zymology docs/sowpods.txt
> enzymology
> zymology
>  
> real    0m0.012s
> user    0m0.000s
> sys     0m0.000s
> 
> Grep is clearly still much slower in UTF8.
> 

I'm not sure that's quite a fair test, as the file will already be cached
by the time of the second run.

However, you are right: grep is slower under UTF-8. That's not surprising,
as any file it encounters could contain multibyte characters, so there is
extra overhead even if the files contain only plain ASCII.

rc.sysinit on FC2 still contains many lines with
LC_ALL=C grep
for just that reason!
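
For what it's worth, a fairer version of the comparison might look
something like the sketch below: read the file once first so that both
timed runs come from the cache, and set the locale per command (as
rc.sysinit does) rather than exporting LANG. The function names are made
up for illustration, and en_US.UTF-8 is only my guess at a UTF-8 locale
on your machine.

```shell
# warm_cache FILE (hypothetical helper): read FILE once and discard the
# data, so that later reads are served from the page cache.
warm_cache() {
    cat -- "$1" > /dev/null
}

# grep_in_locale LOCALE PATTERN FILE (hypothetical helper): run grep with
# LC_ALL set for that one command only, leaving the shell's own locale
# untouched.
grep_in_locale() {
    env LC_ALL="$1" grep -- "$2" "$3"
}

# Example usage with bash's `time` keyword. The file name is the one from
# the quoted session; the UTF-8 locale name is an assumption.
#   warm_cache docs/sowpods.txt
#   time grep_in_locale en_US.UTF-8 zymology docs/sowpods.txt
#   time grep_in_locale C           zymology docs/sowpods.txt
```

With the cache warmed beforehand, whatever difference is left between the
two timings should be down to the locale and not to disk reads.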

Jonathan