perl and UTF-8
Jonathan Rawle
jr36 at leicester.ac.uk
Tue Jun 22 14:15:56 UTC 2004
Anand Buddhdev wrote:
> [arb at home arb]$ time grep zymology docs/sowpods.txt
> enzymology
> zymology
>
> real 0m0.267s
> user 0m0.260s
> sys 0m0.000s
>
> [arb at home arb]$ export LANG=C
> [arb at home arb]$ time grep zymology docs/sowpods.txt
> enzymology
> zymology
>
> real 0m0.012s
> user 0m0.000s
> sys 0m0.000s
>
> Grep is clearly still much slower in UTF8.
>
I'm not sure that's quite a fair test, as the file will already be cached
by the time of the second run.
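
A fairer comparison would read the file once before timing anything, so that neither run pays for disk I/O. Below is a minimal sketch, assuming bash (for the `time` keyword) and a locale named en_US.UTF-8; the temporary word list is a hypothetical stand-in for docs/sowpods.txt.

```shell
#!/bin/bash
# Hypothetical stand-in for docs/sowpods.txt; substitute the real word list.
wordlist=$(mktemp)
printf '%s\n' enzymology zymology zebra > "$wordlist"

# Warm-up pass: run each variant once untimed. This pulls the file into the
# page cache and checks that both locales find the same number of lines.
# en_US.UTF-8 is an assumed locale name; see `locale -a` for what is installed.
utf8_count=$(env LC_ALL=en_US.UTF-8 grep -c zymology "$wordlist")
c_count=$(env LC_ALL=C grep -c zymology "$wordlist")
echo "match counts: UTF-8=$utf8_count C=$c_count"

# Timed pass: the file is now cached for both runs.
for loc in en_US.UTF-8 C; do
    echo "== LC_ALL=$loc =="
    time env LC_ALL="$loc" grep -c zymology "$wordlist"
done
```

With a three-line file the timings mean nothing; point `wordlist` at the real file to see any difference between the locales.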
However, you are right: grep is slower under UTF-8. That's not surprising,
as any file it encounters could contain multibyte Unicode characters, so
there is extra overhead even if the files contain only plain ASCII.
rc.sysinit on FC2 still contains many lines with
LC_ALL=C grep
for just that reason!
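
For anyone unfamiliar with that idiom, here is a small sketch of how the prefix form behaves. Unlike `export LANG=C`, it changes the locale for that single grep only and leaves the rest of the shell session alone. The sample file and variable names are hypothetical.

```shell
# Hypothetical sample data; any text file would do.
sample=$(mktemp)
printf '%s\n' enzymology zymology zebra > "$sample"

# Record the shell's own LC_ALL ("unset" if it is not set at all).
before=${LC_ALL-unset}

# The assignment prefix applies to this one grep invocation only.
count=$(LC_ALL=C grep -c zymology "$sample")

# The shell's own setting is unchanged afterwards.
after=${LC_ALL-unset}

echo "matches: $count"
echo "LC_ALL before: $before, after: $after"
rm -f "$sample"
```

LC_ALL takes precedence over LANG and the individual LC_* variables, which is presumably why the scripts set it rather than LANG.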
Jonathan