pdftohtml encoding question

François Patte francois.patte at math-info.univ-paris5.fr
Mon Mar 10 22:27:13 UTC 2008


Good evening,

I am trying to convert a PDF file into HTML using the pdftohtml provided by Fedora 8.

I get an HTML file with "nice" characters like ’ instead of an apostrophe,
or é instead of é...

So I think there is some encoding problem.
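
The garbled characters above are what UTF-8 bytes look like when they are read with a single-byte codec. Here is a minimal Python sketch of that effect, assuming the wrong codec is Windows-1252 (the pages do not say which one is actually in play); the helper names garble and repair are made up for illustration:

```python
# Sketch: reproduce the garbled characters by decoding UTF-8 bytes
# with the wrong single-byte codec. Windows-1252 is an assumption.


def garble(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Undo garble(): re-encode as Windows-1252, then decode as UTF-8."""
    return text.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    # U+2019 is the typographic apostrophe, U+00E9 is e with acute accent.
    for ch in ("\u2019", "\u00e9"):
        bad = garble(ch)
        print(repr(ch), "->", repr(bad), "->", repr(repair(bad)))
```

If this is what is happening, the HTML file may already contain correct UTF-8 and only be displayed under the wrong charset. Note that a few byte values are undefined in Windows-1252, so garble() can raise an error for some other characters.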

Using man pdftohtml, I got this info:
-enc <string>
               output text encoding name


But I cannot work out what syntax to use to get correct UTF-8 output.
If I try:

pdftohtml -enc utf8 myfile.pdf

the only answer I get is:

Error: Couldn't find unicodeMap file for the 'utf8' encoding


I also tried utf-8, latin1, latin-1, ISO_8859-1, ... without any success.


If somebody knows... many thanks in advance.


--
François Patte
UFR de mathématiques et informatique
Université Paris Descartes
45, rue des Saints Pères
F-75270 Paris Cedex 06
Tél. +33 (0)1 44 55 35 61
http://www.math-info.univ-paris5.fr/~patte




More information about the fedora-list mailing list