[Libvir] [PATCH] Enhance virBuffer code
Jim Meyering
jim at meyering.net
Fri Dec 14 13:17:31 UTC 2007
"Richard W.M. Jones" <rjones at redhat.com> wrote:
> Jim Meyering wrote:
>> "Richard W.M. Jones" <rjones at redhat.com> wrote:
>>
>>> Jim Meyering wrote:
>>>> "Richard W.M. Jones" <rjones at redhat.com> wrote:
>>>>> Jim Meyering wrote:
>>>>>> What do you think of using this?
>>>>>>
>>>>>> isascii (*p) && isalnum (*p)
>>>>> I'm not sure I'm qualified to say what this does on EBCDIC, but quite
>>>>> likely lots of other code breaks there too anyway. This is nicely
>>>>> self-documenting anyway.
>>>> As Daniel suggested, isalnum is locale-sensitive.
>>>> If there's a locale with an alphabetic byte that is outside
>>>> the logical a-zA-Z range, yet still within 0..127, then the above
>>>> expression will give a false-positive for that byte.
>>>>
>>>> I've been inclined to stop worrying about EBCDIC for years, but a quick
>>>> search on the web finds that people are still stuck using it, and do
>>>> report bugs in ASCII-assuming code.
>>>>
>>>> This is why autoconf goes to the trouble of doing this:
>>>> tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
>>>> not this:
>>>> tr a-z A-Z
>>>> to convert to upper case.
>>> Another factor to consider here is that it doesn't matter if this
>>> function over-escapes, but it does matter if the function
>>> under-escapes. That is to say, it could escape every character as a
>>> %xx hex code, which would be ugly and slightly inefficient but not
>>> wrong.
>>
>> IMHO, if you don't use the all-enumerating switch-based code that Daniel
>> objects to, it'd be good to document (in both loops) that the test is
>> inaccurate with EBCDIC, and explain why it's ok to get false positives.
>>
>> Without comments, people might be tempted to use a similar test in a
>> context where the differences matter.
>
> OK, how about this?
>
> Rich.
>
> + for (p = str; *p; ++p) {
> + /* Want to escape only A-Z and 0-9. This may not work on
> EBCDIC. */
> + if (isascii (*p) && isalnum (*p))
Actually, with that, the code is at the mercy of locale definitions
(which are notoriously unreliable), and it probably works with EBCDIC.
I wrote the following and tested a few systems:
#include <ctype.h>
#include <stdio.h>
#include <locale.h>

/* Locale-independent test: true only for the 62 characters listed. */
int
is_alphanum (char c)
{
  switch (c)
    {
      /* generated by LC_ALL=C perl -e \
         "print map {qq(case '\$_': )}('a'..'z', 'A'..'Z', '0'..'9')"|fmt */
    case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g':
    case 'h': case 'i': case 'j': case 'k': case 'l': case 'm': case 'n':
    case 'o': case 'p': case 'q': case 'r': case 's': case 't': case 'u':
    case 'v': case 'w': case 'x': case 'y': case 'z': case 'A': case 'B':
    case 'C': case 'D': case 'E': case 'F': case 'G': case 'H': case 'I':
    case 'J': case 'K': case 'L': case 'M': case 'N': case 'O': case 'P':
    case 'Q': case 'R': case 'S': case 'T': case 'U': case 'V': case 'W':
    case 'X': case 'Y': case 'Z': case '0': case '1': case '2': case '3':
    case '4': case '5': case '6': case '7': case '8': case '9':
      return 1;
    default:
      return 0;
    }
}

int
main (void)
{
  unsigned int i;

  /* Use the locale named by the environment (LC_ALL etc.). */
  setlocale (LC_ALL, "");
  /* Print each byte that the locale-based test accepts
     but the explicit switch rejects, one per line. */
  for (i = 0; i < 256; i++)
    if (isalnum (i) && isascii (i) && !is_alphanum (i))
      printf ("%d: %c\n", i, i);
  return 0;
}
-------------------------------
I compiled and ran it against all installed locales like this:
gcc -o k k.c && for i in $(locale -a); do \
test "$(LC_ALL=$i ./k|wc -l)" = 0 || echo $i;done
On RHEL4, RHEL5, and rawhide, it finds this exception:
vi_VN.tcvn
Running manually in that locale suggests something is fishy:
$ LC_ALL=vi_VN.tcvn ./k
1:
2:
4:
5:
6:
17:
18:
19:
20:
21:
22:
23:
Surprise, surprise...
So in this locale, "isascii (*p) && isalnum (*p)" treats those control
bytes as alphanumeric, and code using it would leave them unescaped,
i.e., it would *under*quote.
I didn't expect to find such a convincing argument.
I stand by my suggestion to use the switch statement.
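
For concreteness, here is a minimal sketch of how the switch-based test
might drive an escaping loop. The name uri_escape, its signature, and the
caller-supplied fixed-size buffer are assumptions made for illustration
only; this is not the virBuffer code from the patch.

```c
#include <stddef.h>

/* Locale-independent test: true only for the 62 characters listed. */
int
is_alphanum (char c)
{
  switch (c)
    {
    case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g':
    case 'h': case 'i': case 'j': case 'k': case 'l': case 'm': case 'n':
    case 'o': case 'p': case 'q': case 'r': case 's': case 't': case 'u':
    case 'v': case 'w': case 'x': case 'y': case 'z': case 'A': case 'B':
    case 'C': case 'D': case 'E': case 'F': case 'G': case 'H': case 'I':
    case 'J': case 'K': case 'L': case 'M': case 'N': case 'O': case 'P':
    case 'Q': case 'R': case 'S': case 'T': case 'U': case 'V': case 'W':
    case 'X': case 'Y': case 'Z': case '0': case '1': case '2': case '3':
    case '4': case '5': case '6': case '7': case '8': case '9':
      return 1;
    default:
      return 0;
    }
}

/* Hypothetical helper (not libvirt API): write the escaped form of STR
   into OUT, which has room for OUTSIZE bytes including the trailing NUL.
   Every byte that is not one of the 62 listed characters becomes %xx.
   Return 0 on success, -1 if OUT is too small.  */
int
uri_escape (const char *str, char *out, size_t outsize)
{
  static const char hex[] = "0123456789abcdef";
  size_t n = 0;
  const char *p;

  for (p = str; *p; ++p)
    {
      unsigned char uc = (unsigned char) *p;

      if (is_alphanum (*p))
        {
          /* One byte, plus room for the final NUL.  */
          if (n + 1 >= outsize)
            return -1;
          out[n++] = *p;
        }
      else
        {
          /* Three bytes ("%xx"), plus room for the final NUL.  */
          if (n + 3 >= outsize)
            return -1;
          out[n++] = '%';
          out[n++] = hex[uc >> 4];
          out[n++] = hex[uc & 0xf];
        }
    }

  if (n >= outsize)
    return -1;
  out[n] = '\0';
  return 0;
}
```

With this, a control byte such as 1 is always escaped (as "%01"),
whatever the current locale claims about it; the cost is only that the
list of safe characters is spelled out by hand.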