[libvirt] [PATCH v2 3/5] Extend nwfilter schema to accept comment attributes
Eric Blake
eblake at redhat.com
Tue Sep 28 19:26:48 UTC 2010
On 09/28/2010 04:28 AM, Stefan Berger wrote:
>> okay. It also leaves out 8-bit bytes - could that be a problem for i18n
>> where people want comments with native-language accented characters?
>> That is, are we being too strict here? Maybe a better pattern would be
>> to reject specific non-printing ASCII bytes we want to avoid, assuming
>> you can use escape sequences like [^\001]?
>
> Looking at
>
> http://www.asciitable.com/
>
> I should probably include 0x20-0x7E and 128-175, 224-238 - maybe even
> more? So the regex then becomes
>
> [ -~-¯à-î]{0,256}

True ASCII is strictly 7-bit; any locale where isprint() returns true on
8-bit bytes is a superset single-byte encoding, such as ISO-8859-1, or
'extended ascii' from the URL you posted above. But I'm also thinking
about multi-byte encodings, like UTF-8, where we cannot a priori write a
regex that will accept all valid Unicode printable characters, in part
because you have to look at more than one byte at a time to determine if
you have a printable character. Which goes back to my suggestion of an
inverse charset - rejecting bytes that are known to be non-printable
ASCII, and letting everything else through, whether or not it is a printable
byte sequence in the current locale. So what about this idea: exclude
control characters except for tab, and let space and everything after
through (I don't know if it needs to be adjusted to also reject DEL, \177):
[^\001-\010\012-\037]{0,256}
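
As a rough illustration only (this is not libvirt code), the inverse
character class can be tried out in Python. The sketch assumes the class
excludes 0x01-0x08 and 0x0A-0x1F, i.e. the control characters other than
tab, and keeps the 256-character limit. The names COMMENT_RE and
is_valid_comment are hypothetical, and Python's re module matches Unicode
code points, so it may not behave exactly like the schema validator.

```python
import re

# Assumed reading of the proposal: reject ASCII control characters
# 0x01-0x08 and 0x0A-0x1F; allow tab (0x09), space, and everything after.
COMMENT_RE = re.compile(r"[^\x01-\x08\x0a-\x1f]{0,256}")


def is_valid_comment(text: str) -> bool:
    """Return True if the whole string fits the inverse character class."""
    return COMMENT_RE.fullmatch(text) is not None


# 'é' is a single character but two bytes in UTF-8, which is why a
# whitelist of byte ranges cannot classify it one byte at a time.
E_ACUTE_UTF8_LEN = len("é".encode("utf-8"))

if __name__ == "__main__":
    for sample in ["plain text", "tab\there", "café", "bell\x07", "line\nbreak"]:
        print(repr(sample), is_valid_comment(sample))
```

With this class, accented text and tabs pass while bell and newline are
rejected; DEL (0x7F) still passes unless it is added to the excluded set.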
--
Eric Blake eblake at redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org