<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
On 09/28/2010 04:06 PM, Stefan Berger wrote:
<blockquote
cite="mid:OF6D73EE01.7B8FACFA-ON852577AC.006DE338-852577AC.006E5F71@us.ibm.com"
type="cite">
<br>
<tt><font size="2">Eric Blake <a class="moz-txt-link-rfc2396E" href="mailto:eblake@redhat.com">&lt;eblake@redhat.com&gt;</a> wrote on
09/28/2010
03:26:48 PM:<br>
<br>
> Re: [libvirt] [PATCH v2 3/5] Extend nwfilter schema to accept comment attributes<br>
> To: Stefan Berger<br>
> Date: 09/28/2010 03:27 PM<br>
> Cc: libvir-list</font></tt>
<br>
<tt><font size="2">> <br>
> On 09/28/2010 04:28 AM, Stefan Berger wrote:<br>
> >> okay. It also leaves out 8-bit bytes - could
that be
a problem for i18n<br>
> ><br>
> >> where people want comments with native-language
accented
characters?<br>
> >> That is, are we being too strict here? Maybe a
better
pattern would be<br>
> >> to reject specific non-printing ASCII bytes we
want to avoid,
assuming<br>
> >> you can use escape sequences like [^\001]?<br>
> ><br>
> > Looking at<br>
> ><br>
> > </font></tt><a moz-do-not-send="true"
href="http://www.asciitable.com/"><tt><font size="2">http://www.asciitable.com/</font></tt></a><tt><font
size="2"><br>
> ><br>
> > I should probably include 0x20-0x7E and 128-175,
224-238 - maybe
even<br>
> > more? So the regex then becomes<br>
> ><br>
> >
[ -~€-¯à-î]{0,256}<br>
> <br>
> True ASCII is strictly 7-bit; any locale where isprint()
returns true
on <br>
> 8-bit bytes is a superset single-byte encoding, such as
ISO-8859-1,
or <br>
> 'extended ascii' from the URL you posted above. But I'm
also
thinking <br>
> about multi-byte encodings, like UTF-8, where we cannot a
priori write
a <br>
> regex that will accept all valid Unicode printable
characters, in
part <br>
> because you have to look at more than one byte at a time
to determine
if <br>
> you have a printable character. Which goes back to my
suggestion
of an <br>
> inverse charset - rejecting bytes that are known to be
non-printable
<br>
> ASCII, and letting everything else through, whether or not it
is a printable
<br>
> byte sequence in the current locale. So what about this
idea:
exclude <br>
> control characters except for tab, and let space and
everything after
<br>
> through (I don't know if it needs to be adjusted to also
reject �):<br>
> <br>
> [^&amp;#x1;-&amp;#x8;&amp;#xa;-&amp;#x1f;]{0,256}</font></tt>
<br>
<br>
<tt><font size="2">Fine by me. We may just give the impression of
accepting
unicode while the code does not handle it.</font></tt>
<br>
</blockquote>
<br>
... except that xmllint does not like &amp;#x1; with or without a
preceding ^ (among other things):<br>
<br>
<tt>xmllint --relaxng ./docs/schemas/nwfilter.rng
tests/nwfilterxml2xmlout/comment-test.xml<br>
./docs/schemas/nwfilter.rng:862: parser error : xmlParseCharRef:
invalid xmlChar value 1<br>
&lt;param
name="pattern"&gt;[^&amp;#x1;-&amp;#x8;&amp;#xa;-&amp;#x1f;]{0,256}&lt;/param&gt;<br>
^<br>
./docs/schemas/nwfilter.rng:862: parser error : CharRef: invalid
hexadecimal value<br>
&lt;param
name="pattern"&gt;[^&amp;#x1;-&amp;#x8;&amp;#xa;-&amp;#x1f;]{0,256}&lt;/param&gt;<br>
^<br>
./docs/schemas/nwfilter.rng:862: parser error : xmlParseCharRef:
invalid xmlChar value 0<br>
&lt;param
name="pattern"&gt;[^&amp;#x1;-&amp;#x8;&amp;#xa;-&amp;#x1f;]{0,256}&lt;/param&gt;<br>
^<br>
./docs/schemas/nwfilter.rng:862: parser error : CharRef: invalid
hexadecimal value<br>
&lt;param
name="pattern"&gt;[^&amp;#x1;-&amp;#x8;&amp;#xa;-&amp;#x1f;]{0,256}&lt;/param&gt;<br>
^<br>
./docs/schemas/nwfilter.rng:862: parser error : xmlParseCharRef:
invalid xmlChar value 0<br>
&lt;param
name="pattern"&gt;[^&amp;#x1;-&amp;#x8;&amp;#xa;-&amp;#x1f;]{0,256}&lt;/param&gt;<br>
^<br>
./docs/schemas/nwfilter.rng:862: parser error : CharRef: invalid
hexadecimal value<br>
&lt;param
name="pattern"&gt;[^&amp;#x1;-&amp;#x8;&amp;#xa;-&amp;#x1f;]{0,256}&lt;/param&gt;<br>
^<br>
./docs/schemas/nwfilter.rng:862: parser error : xmlParseCharRef:
invalid xmlChar value 0<br>
&lt;param
name="pattern"&gt;[^&amp;#x1;-&amp;#x8;&amp;#xa;-&amp;#x1f;]{0,256}&lt;/param&gt;<br>
</tt><br>
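For what it's worth, the first error looks consistent with the XML 1.0
Char production, which below 0x20 allows only tab, LF and CR, so a
character reference to a code point such as 1 is not well-formed in the
schema file at all. A rough Python sketch of that rule follows; the
helper name is_xml10_char is made up for illustration and is not part
of libvirt or libxml2:<br>
<pre>
```python
# Illustrative helper (hypothetical name): membership test for the
# XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
def is_xml10_char(cp):
    return (
        cp in (0x9, 0xA, 0xD)
        or cp in range(0x20, 0xD800)        # 0x20 .. 0xD7FF
        or cp in range(0xE000, 0xFFFE)      # 0xE000 .. 0xFFFD
        or cp in range(0x10000, 0x110000)   # 0x10000 .. 0x10FFFF
    )

# Code points in 0x01 .. 0x1F that XML 1.0 does not allow, and that
# therefore cannot be written as numeric character references either.
rejected = [cp for cp in range(0x1, 0x20) if not is_xml10_char(cp)]
print([hex(cp) for cp in rejected])
```
</pre>
If that reading is right, a reference to LF (0xA) is acceptable to the
parser, while references to 0x1, 0x8 and 0x1F are not, so the excluded
range would have to be expressed some other way.<br>
<br>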
Stefan<br>
</body>
</html>