pdftohtml encoding question
Andras Simon
szajmi at gmail.com
Tue Mar 11 12:40:21 UTC 2008
On 3/10/08, François Patte <francois.patte at math-info.univ-paris5.fr> wrote:
> Good evening,
>
> I am trying to convert a PDF file into HTML using the pdftohtml provided by F8.
>
> I get an HTML file with "nice" characters like ’ instead of an apostrophe,
> or Ã© instead of é...
>
> so I think there is some encoding problem.
>
> Using man pdftohtml, I got this info:
> -enc <string>
>     output text encoding name
>
>
> but I cannot work out what syntax to use to get correct UTF-8 output:
>
> Error: Couldn't find unicodeMap file for the 'utf8' encoding
>
> is the only response I get when I try:
>
> pdftohtml -enc utf8 myfile.pdf
>
>
> I tried utf-8, latin1, latin-1, ISO_8859-1, ... without any success.
>
>
> If somebody knows... many thanks in advance.
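For what it's worth, the garbled sequences in the question are what UTF-8 bytes look like when displayed as Windows-1252: ’ is the three UTF-8 bytes of ’ (U+2019) read one byte per character, and Ã© is the two bytes of é. The sketch below is only an illustration of that symptom, not anything pdftohtml itself does; `mojibake` and `repair` are made-up helper names.

```python
# Illustration only: reproduce the garbling described in the question.

def mojibake(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Undo mojibake() by re-encoding as Windows-1252 and decoding as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    for original in ["\u2019", "\u00e9"]:  # right single quote, e-acute
        garbled = mojibake(original)
        print(f"{original} -> {garbled} -> {repair(garbled)}")
```

Run as a script, it prints `’ -> ’ -> ’` and `é -> Ã© -> é`. So the text may be intact and only the charset it is read with may be wrong; that is an inference from the symptom, not something checked against pdftohtml's output.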
I don't, but
man pdftohtml
-> Pdftohtml was developed by Gueorgui Ovtcharov and Rainer Dorsch. It is
based and benefits a lot from Derek Noonburg's xpdf package.
man xpdf
-> -enc encoding-name
Sets the encoding to use for text output. The encoding-name
must be defined with the unicodeMap command (see xpdfrc(5)).
This defaults to "Latin1" (which is a built-in encoding). [config
file: textEncoding]
man xpdfrc
-> unicodeMap encoding-name map-file
[...]
The Latin1, ASCII7, Symbol, ZapfDingbats, UTF-8, and
UCS-2 encodings are predefined.
I'm afraid you'll have to read the elided part if you need an encoding
other than these six.
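Going by that excerpt, the predefined name is spelled `UTF-8`, so `pdftohtml -enc UTF-8 myfile.pdf` looks like the thing to try. If the name lookup is case-sensitive (an assumption, though it would explain why both `utf8` and `utf-8` failed), a small wrapper can map the spellings tried above onto the six predefined names. `canonical_encoding` and `pdftohtml_args` are hypothetical helpers for illustration; only the `-enc` option and the six names come from the man pages.

```python
import re

# The six encodings xpdfrc(5) lists as predefined, spelled as in the man page.
PREDEFINED = ["Latin1", "ASCII7", "Symbol", "ZapfDingbats", "UTF-8", "UCS-2"]


def _key(name: str) -> str:
    """Normalise a name: lowercase, with '-', '_' and whitespace removed."""
    return re.sub(r"[-_\s]", "", name).lower()


def canonical_encoding(name: str) -> str:
    """Map a loosely spelled name (e.g. 'utf8') onto a predefined one."""
    for enc in PREDEFINED:
        if _key(enc) == _key(name):
            return enc
    raise ValueError(
        f"{name!r} is not predefined; it would need a unicodeMap entry in xpdfrc"
    )


def pdftohtml_args(pdf: str, encoding: str = "UTF-8") -> list[str]:
    """Build the argv for pdftohtml using the predefined spelling of -enc."""
    return ["pdftohtml", "-enc", canonical_encoding(encoding), pdf]


if __name__ == "__main__":
    for tried in ["utf8", "utf-8", "latin-1"]:
        print(tried, "->", pdftohtml_args("myfile.pdf", tried))
```

With pdftohtml installed, `subprocess.run(pdftohtml_args("myfile.pdf"), check=True)` would run `pdftohtml -enc UTF-8 myfile.pdf`. Names such as `ISO_8859-1` deliberately raise an error, since anything outside the six would need its own unicodeMap entry.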
Hope this helps,
Andras