Extracting ASCII text from a PDF Document

Martin McCormick martin at dc.cis.okstate.edu
Thu Aug 12 12:26:56 UTC 2010


I have a PDF document that does have embedded ASCII text in it.
It reads fine on a Macintosh that has no OCR software on it but
uses VoiceOver. VoiceOver works only from actual text, so the
ASCII must be there.

	I need to use the file on a Debian system so I hope I am
just using a2ps and pstotext wrong.

	If one uses pstotext on this document, it immediately
errors out. If I use a2ps and give it -o outfilename.ps, a2ps
runs, but I may be producing an image file, as the output
contains no text from the document. Talk about sound and fury
signifying nothing.

	If one runs pstotext on that output file, one gets a
single form feed for each page and nothing else.
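
	For reference, the three attempts described above can be
written out roughly as the sketch below. It is only a
reconstruction: document.pdf is a placeholder for the real file
name, the step and attempt helpers are made up for illustration,
and the exact options used may have differed. By default it only
prints the commands; setting RUN=1 actually runs them.

```shell
#!/bin/sh
# Sketch of the sequence described in the post.
# document.pdf is a hypothetical name; outfilename.ps is from the post.
pdf=document.pdf
ps=outfilename.ps

# Print each command; execute it only when RUN=1, so the script is
# harmless on a machine without a2ps or pstotext installed.
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    fi
}

attempt() {
    step pstotext "$pdf"        # 1: reported to error out immediately
    step a2ps -o "$ps" "$pdf"   # 2: runs and writes a PostScript file
    step pstotext "$ps"         # 3: reported to give one form feed per page
}

attempt
```

	Running it as RUN=1 sh script.sh on the Debian system
should reproduce the behavior described, assuming both tools
are installed.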

	The PDF document is not protected.

	Any suggestions as to how to extract the text are
welcome. Thanks.

Martin McCormick




More information about the Blinux-list mailing list