
Re: recent-file-spec: possible design flow ?



Waldo Bastian wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Monday 14 July 2003 14:51, Oliver Braun wrote:


Hi *,

we - the SUN team working on OpenOffice.org - noticed a possible design
flaw in the recent-file-spec (or at least in the Gnome 2.2
implementation of it) when looking at the Ximian patches for
OpenOffice.org:

it seems that Gnome 2.2 converts the full local file path to UTF-8
before encoding the result as a file URL. It uses the text encoding
matching the current locale as the "from" encoding. This is not
reversible if the path contains bytes that are not valid characters in
this encoding (multi-encoding paths)!
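
A minimal Python sketch of why that conversion cannot be undone. The helper name `to_utf8_and_back` is hypothetical, and the `errors="replace"` handling is an assumption made for illustration; a real implementation might fail outright instead of substituting a replacement character.

```python
def to_utf8_and_back(raw_path: bytes, locale_encoding: str) -> bytes:
    """Convert a raw path to UTF-8 via the locale encoding, then back again."""
    # Step 1: interpret the file system bytes using the locale encoding.
    # Bytes that are invalid in that encoding become U+FFFD (assumption).
    text = raw_path.decode(locale_encoding, errors="replace")
    # Step 2: this UTF-8 form is what would be stored in the URL.
    utf8_form = text.encode("utf-8")
    # Step 3: the reader tries to get the on-disk name back.
    return utf8_form.decode("utf-8").encode(locale_encoding, errors="replace")


valid_name = b"caf\xc3\xa9.txt"   # valid UTF-8
legacy_name = b"caf\xe9.txt"      # Latin-1 byte, invalid as UTF-8

print(to_utf8_and_back(valid_name, "utf-8") == valid_name)
print(to_utf8_and_back(legacy_name, "utf-8") == legacy_name)
```

The first name survives the round trip; the second does not, so the path handed to the application no longer names the file on disk.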

The result will be that the application launched by the panel will not
be able to open such a file when the user chooses it from the "Open
Recent" menu. Unfortunately we made the same mistake in OpenOffice.org
1.x :(. The only way to handle multi-encoding paths correctly seems to
be to encode the byte sequence as returned by the file system layer.
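
As a rough sketch of encoding the byte sequence directly, assuming Python's standard `urllib.parse` module: the function names below are hypothetical, and the sketch ignores host parts and other file URL details.

```python
from urllib.parse import quote, unquote_to_bytes

FILE_PREFIX = "file://"


def path_bytes_to_file_url(raw_path: bytes) -> str:
    """Percent-encode the raw file system bytes without any text conversion."""
    # quote() accepts bytes and escapes every octet outside the unreserved set.
    return FILE_PREFIX + quote(raw_path, safe="/")


def file_url_to_path_bytes(url: str) -> bytes:
    """Recover exactly the bytes that were encoded."""
    if not url.startswith(FILE_PREFIX):
        raise ValueError("not a file URL: " + url)
    return unquote_to_bytes(url[len(FILE_PREFIX):])


# A multi-encoding path: one UTF-8 component and one Latin-1 byte.
raw = b"/home/user/\xe4\xb8\xad-\xfc.txt"
url = path_bytes_to_file_url(raw)
print(url)
print(file_url_to_path_bytes(url) == raw)
```

Because no character conversion happens at any point, the round trip is exact regardless of the locale in which the URL is written or read.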

The recent file spec says <QUOTE> All text in the file should be stored
in the UTF-8 encoding.</QUOTE>, which IMHO can easily be (mis?)understood
as "convert file names to UTF-8".

How does KDE expect file URLs to be encoded?



I'm not aware of any recent-file-spec or of KDE implementing it, but in general KDE converts filenames from the locale encoding to 16-bit Unicode, which is used internally and is typically stored on disk as UTF-8.


URLs are handled slightly differently: when filenames are stored as URLs, they are re-encoded using the locale encoding and then the non-ASCII part is %-encoded. That results in a URL that consists of ASCII characters only, and the octets of the URL match 1:1 with the octets of the original filename (assuming decoding/encoding with the locale encoding is reversible).
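
The scheme described above can be sketched in Python roughly as follows. This is not KDE code: the function names are hypothetical, and the locale encoding is passed in explicitly instead of being read from the environment.

```python
from urllib.parse import quote, unquote_to_bytes


def filename_to_url(name: str, locale_encoding: str) -> str:
    """Re-encode the internal Unicode name to locale bytes, then %-encode."""
    octets = name.encode(locale_encoding)
    return "file://" + quote(octets, safe="/")


def url_to_filename(url: str, locale_encoding: str) -> str:
    """Undo the %-encoding and decode the octets with the locale encoding."""
    octets = unquote_to_bytes(url[len("file://"):])
    return octets.decode(locale_encoding)


name = "/tmp/Gr\u00fc\u00dfe.txt"
url = filename_to_url(name, "iso-8859-1")
print(url)
print(url_to_filename(url, "iso-8859-1") == name)
```

The %-encoded octets are the same ones the file system holds, which is what makes the URL usable later, as long as the same locale encoding is in effect on both sides.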

We did identify a problem when using UTF-8 as the locale encoding. If a filename is not a valid UTF-8 sequence, then decoding and re-encoding that filename will change it. We intend to fix that by recording the invalid UTF-8 sequence in the 16-bit Unicode string so that we can still recover the original sequence when converting back to "UTF-8" (the result will not be valid UTF-8).
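
For comparison, Python's `surrogateescape` error handler follows a similar idea: each undecodable byte is recorded in the Unicode string as a lone surrogate and restored on encoding. The sketch below illustrates that Python mechanism only; how KDE would record the invalid sequence is not specified here.

```python
# A filename containing a Latin-1 byte that is invalid as UTF-8.
raw_name = b"caf\xe9.txt"

# Decoding keeps the bad byte as the lone surrogate U+DCE9 instead of
# replacing it with "?" or U+FFFD.
text = raw_name.decode("utf-8", errors="surrogateescape")
print(ascii(text))

# Encoding with the same handler restores the original octets. The result
# is deliberately not valid UTF-8; it is the original byte sequence.
restored = text.encode("utf-8", errors="surrogateescape")
print(restored == raw_name)
```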

Are you aware of encodings other than UTF-8 where this might be a problem?

Basically you can run into this problem with any encoding: let's say a Chinese user creates a file in the zh.BIG5 locale whose name contains non-ASCII characters. Later (s)he logs in with the "C" locale. In this case, the conversion from the locale encoding to UTF-8/UTF-16 may produce some "?" if the byte sequence that makes up the file name is not valid in that encoding.

At the latest when file URLs are stored on disk in one locale but read in another, the conversion from UTF-8/UTF-16 to the locale encoding will fail and the resulting file name will not match the one on disk.

You can even try to save a file whose name contains German umlauts, store the URL, and try to convert it back in the "C" locale: the conversion will fail.

Luckily, users don't do such things very often - or don't expect it to work ..
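
The umlaut case can be reproduced with a short Python sketch. It assumes the "C" locale behaves like plain ASCII, and the helper name `to_locale_bytes` is hypothetical.

```python
from typing import Optional


def to_locale_bytes(name: str, locale_encoding: str) -> Optional[bytes]:
    """Convert a stored Unicode name to locale bytes, or None if impossible."""
    try:
        return name.encode(locale_encoding)
    except UnicodeEncodeError:
        return None


stored_name = "Gr\u00fc\u00dfe.txt"  # name saved under a Latin-1 locale

# Same locale as when the file was saved: the on-disk bytes come back.
print(to_locale_bytes(stored_name, "iso-8859-1"))

# "C" locale, treated as ASCII here: the strict conversion fails.
print(to_locale_bytes(stored_name, "ascii"))

# A lossy conversion yields a name that does not exist on disk.
print(stored_name.encode("ascii", errors="replace"))
```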

- Oliver



Cheers,
Waldo
- -- bastian kde org -=|[ SuSE, The Linux Desktop Experts ]|=- bastian suse com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org


iD8DBQE/ErFFN4pvrENfboIRAupGAJkBGFpB16LhSxaQSA74TZXia/C6WwCbBA31
OWgDG7K3nYnh88dYKm2LD1M=
=C1Su
-----END PGP SIGNATURE-----





