[libvirt] [PATCH v2 3/5] Extend nwfilter schema to accept comment attributes

Thu Sep 30 11:31:56 UTC 2010

  On 09/28/2010 04:06 PM, Stefan Berger wrote:
>
> Eric Blake <eblake at redhat.com> wrote on 09/28/2010 03:26:48 PM:
>
> > On 09/28/2010 04:28 AM, Stefan Berger wrote:
> > >> okay.  It also leaves out 8-bit bytes - could that be a problem for i18n
> > >> where people want comments with native-language accented characters?
> > >> That is, are we being too strict here?  Maybe a better pattern would be
> > >> to reject specific non-printing ASCII bytes we want to avoid, assuming
> > >> you can use escape sequences like [^\001]?
> > >
> > > Looking at
> > >
> > > http://www.asciitable.com/
> > >
> > > I should probably include 0x20-0x7E and 128-175, 224-238 - maybe even
> > > more? So the regex then becomes
> > >
> > > [&#x20;-&#x7E;&#x80;-&#xAF;&#xE0;-&#xEE;]{0,256}
> >
> > True ASCII is strictly 7-bit; any locale where isprint() returns true on
> > 8-bit bytes is a superset single-byte encoding, such as ISO-8859-1, or
> > 'extended ascii' from the URL you posted above.  But I'm also thinking
> > about multi-byte encodings, like UTF-8, where we cannot a priori write a
> > regex that will accept all valid Unicode printable characters, in part
> > because you have to look at more than one byte at a time to determine if
> > you have a printable character.  Which goes back to my suggestion of an
> > inverse charset - rejecting bytes that are known to be non-printable
> > ASCII, and letting everything else through, whether or not it is a
> > printable byte sequence in the current locale.  So what about this idea:
> > exclude control characters except for tab, and let space and everything
> > after through (I don't know if it needs to be adjusted to also reject
> > &#x00):
> >
> > [^&#x01;-&#x08&#x0A-&#x1F]{0,256}
>
> Fine by me. We may just give the impression of accepting unicode while 
> the code does not handle it.
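
As an illustration of the inverse-charset idea quoted above, here is a minimal sketch in Python. It is an assumption-laden stand-in: Python's `re` syntax is not the XML Schema regex dialect that RELAX NG `pattern` parameters use, and the helper name `is_valid_comment` is hypothetical. It only shows which strings such a class would let through (tab, space, accented characters) and which it would reject (other control characters, over-long input).

```python
import re

# Inverse character class from the thread, written with Python escapes:
# reject 0x01-0x08 and 0x0A-0x1F; allow tab (0x09), space and everything above.
# NUL (0x00) is not excluded, mirroring the pattern as proposed.
COMMENT_RE = re.compile(r"[^\x01-\x08\x0a-\x1f]{0,256}")


def is_valid_comment(text: str) -> bool:
    """Return True if the whole string matches the proposed pattern."""
    return COMMENT_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["allow ssh", "caf\u00e9\tna\u00efve", "bell\x07", "two\nlines"]:
        print(repr(sample), is_valid_comment(sample))
```

Because the class works on code points rather than bytes, non-ASCII text passes the pattern whether or not the code behind it handles Unicode, which is the concern raised in the reply above.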

... except that xmllint does not like &#x01 with or without preceding ^ 
(among other things):

xmllint --relaxng ./docs/schemas/nwfilter.rng tests/nwfilterxml2xmlout/comment-test.xml
./docs/schemas/nwfilter.rng:862: parser error : xmlParseCharRef: invalid xmlChar value 1
<param name="pattern">[^&#x01;-&#x08&#x0A-&#x1F]{0,256}</param>
                                     ^
./docs/schemas/nwfilter.rng:862: parser error : CharRef: invalid hexadecimal value
<param name="pattern">[^&#x01;-&#x08&#x0A-&#x1F]{0,256}</param>
                                           ^
./docs/schemas/nwfilter.rng:862: parser error : xmlParseCharRef: invalid xmlChar value 0
<param name="pattern">[^&#x01;-&#x08&#x0A-&#x1F]{0,256}</param>
                                           ^
./docs/schemas/nwfilter.rng:862: parser error : CharRef: invalid hexadecimal value
<param name="pattern">[^&#x01;-&#x08&#x0A-&#x1F]{0,256}</param>
                                                ^
./docs/schemas/nwfilter.rng:862: parser error : xmlParseCharRef: invalid xmlChar value 0
<param name="pattern">[^&#x01;-&#x08&#x0A-&#x1F]{0,256}</param>
                                                ^
./docs/schemas/nwfilter.rng:862: parser error : CharRef: invalid hexadecimal value
<param name="pattern">[^&#x01;-&#x08&#x0A-&#x1F]{0,256}</param>
                                                      ^
./docs/schemas/nwfilter.rng:862: parser error : xmlParseCharRef: invalid xmlChar value 0
<param name="pattern">[^&#x01;-&#x08&#x0A-&#x1F]{0,256}</param>

    Stefan