Copying text from a protected pdf file
Paul Smith
phhs80 at gmail.com
Wed Sep 21 09:35:00 UTC 2005
On 9/17/05, Paul Smith <phhs80 at gmail.com> wrote:
> > > I have got a pdf file, whose text I would like to copy to a word
> > > processor. However, it seems to be protected, as when I copy and paste
> > > a piece of text from there into a word processor, I only see garbage.
> > > Is there some way of getting clean text from the pdf file?
> >
> > The PDF format has many ways to display text. To be able to extract text
> > you need a file that stores strings and uses font information to render them
> > in the viewer. You may be seeing images that were rasterized long ago.
> > You should provide the output of the "pdffonts" command, preferably for a
> > minimal document (a big document could combine sections that use fonts with
> > images).
> >
> > For example, the simplest case is a document that uses the PostScript Type 1
> > fonts provided by the viewer:
> >
> > $ pdffonts /usr/share/doc/cups-1.1.20/ssr.pdf
> > name                                 type         emb sub uni object ID
> > ------------------------------------ ------------ --- --- --- ---------
> > Times-Roman                          Type 1       no  no  no       4  0
> > Helvetica                            Type 1       no  no  no       7  0
> > Helvetica-Bold                       Type 1       no  no  no       8  0
> > Times-Bold                           Type 1       no  no  no       5  0
> > Courier                              Type 1       no  no  no       3  0
> > Symbol                               Type 1       no  no  no       9  0
> > Times-Italic                         Type 1       no  no  no       6  0
>
> Thanks, George. In my case,
>
> $ pdffonts myfile.pdf
> name                                 type         emb sub uni object ID
> ------------------------------------ ------------ --- --- --- ---------
> DTUUBE+TTBC19E318t00                 TrueType     yes yes no      13  0
> URMVBE+TTBC18C910t00                 TrueType     yes yes no      16  0
> TOYVBE+Symbol                        Type 1C      yes yes no      19  0
> Helvetica                            Type 1C      yes no  no      22  0
> CLLUBE+TTBC1802E0t00                 TrueType     yes yes no      34  0
> Helvetica-Bold                       Type 1C      yes no  no      43  0
> Helvetica-Oblique                    Type 1C      yes no  no      58  0
> $
Is it possible to find the missing fonts and install them?
Paul
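
A note on reading a pdffonts listing like the one for myfile.pdf: the
"emb" column says whether a font is embedded in the file, "sub" whether
it is a subset, and "uni" whether the PDF carries an explicit ToUnicode
map for it. A subset font with "uni" set to "no" is a plausible (not
certain) source of garbled copy and paste, since a missing ToUnicode map
does not always prevent conversion to text. Below is a minimal Python
sketch that picks such fonts out of pdffonts output. The function name
is hypothetical, and the parsing assumes the row layout shown in this
message (font names without spaces, and each row ending in emb, sub,
uni, object number, generation).

```python
def subset_fonts_without_tounicode(pdffonts_output):
    """Return names of embedded subset fonts that have no ToUnicode map.

    Heuristic only: assumes each data row ends with the fields
    emb, sub, uni, object number, generation, as in the listing above.
    """
    flagged = []
    for line in pdffonts_output.splitlines():
        fields = line.split()
        # Skip the header, the dashed rule, prompts, and blank lines:
        # real data rows have at least 7 fields and end in two integers.
        if len(fields) < 7:
            continue
        if not (fields[-1].isdigit() and fields[-2].isdigit()):
            continue
        name = fields[0]
        # Count from the right, because the type column may contain
        # a space ("Type 1", "Type 1C").
        emb, sub, uni = fields[-5], fields[-4], fields[-3]
        if emb == "yes" and sub == "yes" and uni == "no":
            flagged.append(name)
    return flagged


# Rows copied from the listing for myfile.pdf in this message.
SAMPLE = """\
name type emb sub uni object ID
------------------------------------ ------------ --- --- --- ---------
DTUUBE+TTBC19E318t00 TrueType yes yes no 13 0
URMVBE+TTBC18C910t00 TrueType yes yes no 16 0
TOYVBE+Symbol Type 1C yes yes no 19 0
Helvetica Type 1C yes no no 22 0
CLLUBE+TTBC1802E0t00 TrueType yes yes no 34 0
Helvetica-Bold Type 1C yes no no 43 0
Helvetica-Oblique Type 1C yes no no 58 0
"""

if __name__ == "__main__":
    for font in subset_fonts_without_tounicode(SAMPLE):
        print(font)
```

On the sample rows this prints the three TTBC... TrueType subsets and
TOYVBE+Symbol; the three Helvetica fonts are embedded whole ("sub" is
"no") and are not flagged.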