[libvirt] [PATCH v2 3/5] Extend nwfilter schema to accept comment attributes

Stefan Berger stefanb at us.ibm.com
Tue Sep 28 20:06:14 UTC 2010


Eric Blake <eblake at redhat.com> wrote on 09/28/2010 03:26:48 PM:

> On 09/28/2010 04:28 AM, Stefan Berger wrote:
> >> okay.  It also leaves out 8-bit bytes - could that be a problem for i18n
> >> where people want comments with native-language accented characters?
> >> That is, are we being too strict here?  Maybe a better pattern would be
> >> to reject specific non-printing ASCII bytes we want to avoid, assuming
> >> you can use escape sequences like [^\001]?
> >
> > Looking at
> >
> > http://www.asciitable.com/
> >
> > I should probably include 0x20-0x7E and 128-175, 224-238 - maybe even
> > more? So the regex then becomes
> >
> > [&#x20;-&#x7E;€-¯à-î]{0,256}
> 
> True ASCII is strictly 7-bit; any locale where isprint() returns true on
> 8-bit bytes is a superset single-byte encoding, such as ISO-8859-1, or
> 'extended ascii' from the URL you posted above.  But I'm also thinking
> about multi-byte encodings, like UTF-8, where we cannot a priori write a
> regex that will accept all valid Unicode printable characters, in part
> because you have to look at more than one byte at a time to determine if
> you have a printable character.  Which goes back to my suggestion of an
> inverse charset - rejecting bytes that are known to be non-printable
> ASCII, and letting everything else through, whether or not it is a
> printable byte sequence in the current locale.  So what about this idea:
> exclude control characters except for tab, and let space and everything
> after through (I don't know if it needs to be adjusted to also reject
> &#x00;):
> 
> [^&#x01;-&#x08;&#x0A;-&#x1F;]{0,256}

Fine by me. We may just give the impression of accepting Unicode while the 
code does not handle it.
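
To make the difference between the two patterns concrete, here is a minimal
Python sketch. It is not the schema code itself: it translates both patterns
into Python's re syntax and assumes the numeric ranges are Unicode code
points rather than bytes of a single-byte code page. The names WHITELIST_RE,
INVERSE_RE, accepts and samples are made up for this illustration.

```python
import re

# Whitelist from earlier in the thread: 0x20-0x7E, 128-175, 224-238.
# Assumption: the numbers are treated as Unicode code points here.
WHITELIST_RE = re.compile(r"[\x20-\x7e\x80-\xaf\xe0-\xee]{0,256}")

# Inverse charset: reject 0x01-0x08 and 0x0A-0x1F, keep tab (0x09),
# space and everything above. NUL (0x00) is not rejected by this pattern.
INVERSE_RE = re.compile(r"[^\x01-\x08\x0a-\x1f]{0,256}")


def accepts(pattern, text):
    """Return True if the whole string matches, as a schema pattern must."""
    return pattern.fullmatch(text) is not None


# Hypothetical comment strings, chosen to exercise each range.
samples = {
    "plain": "allow ssh from admin net",
    "tab": "col1\tcol2",
    "umlaut": "Regel f\u00fcr G\u00e4ste",  # U+00FC lies outside 224-238
    "newline": "line1\nline2",
    "escape": "bad\x1b[0m",
    "too_long": "x" * 257,
}

# Maps each sample name to (whitelist result, inverse result).
results = {
    name: (accepts(WHITELIST_RE, value), accepts(INVERSE_RE, value))
    for name, value in samples.items()
}

for name, (whitelist_ok, inverse_ok) in results.items():
    print(f"{name:10} whitelist={whitelist_ok} inverse={inverse_ok}")
```

Under these assumptions the whitelist turns away the tab and the umlaut
samples while the inverse charset lets both through; both patterns reject the
newline, the escape character and the over-long string. Python counts code
points, so the 256 limit here is not a byte limit.
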

   Stefan