Extracting ASCII text from a PDF Document
Kirk Reiser
kirk at braille.uwo.ca
Thu Aug 12 12:37:43 UTC 2010
What happens when you run pdftotext on the file?
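
In case it helps, here is a rough Python sketch of that check. It assumes
pdftotext is installed (on Debian it normally comes from the poppler-utils
package) and that "document.pdf" stands in for your real file name. The
helper names are made up for illustration. Giving "-" as the output name
asks pdftotext to write to standard output, and -layout asks it to keep
the physical layout of the page.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, txt_path="-", layout=False):
    """Return the argument list for a pdftotext call.

    "-" as the output name sends the text to standard output.
    """
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")  # keep the original physical layout
    cmd += [pdf_path, txt_path]
    return cmd


def has_real_text(text):
    """True if the output holds more than whitespace.

    A form feed counts as whitespace, so output made up only of
    page breaks is reported as empty.
    """
    return bool(text.strip())


def extract_text(pdf_path):
    """Run pdftotext on pdf_path and return whatever text it finds."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext is not on the PATH")
    result = subprocess.run(
        build_pdftotext_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout
```

Calling has_real_text(extract_text("document.pdf")) should tell you whether
any text came out at all, or only page breaks like the ones you got from
pstotext.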
On Thu, 12 Aug 2010, Martin McCormick wrote:
> I have a PDF document that does have embedded ASCII text in it.
> It plays fine on a Macintosh that has no OCR software on it but
> uses VoiceOver. VoiceOver just runs on ASCII so the ASCII is
> there.
>
> I need to use the file on a Debian system so I hope I am
> just using a2ps and pstotext wrong.
>
> If one uses pstotext on this document, it immediately
> errors out. If I use a2ps and give it -o outfilename.ps, a2ps
> runs, but I may be producing an image file, as there is no text
> from the document. Talk about sound and fury signifying nothing.
>
> If one runs pstotext on that output file, one gets a
> single form feed for each page and nothing else.
>
> The PDF document is not protected.
>
> Any suggestions as to how to extract the text are
> welcome. Thanks.
>
> Martin McCormick
--
Kirk Reiser The Computer Braille Facility
e-mail: kirk at braille.uwo.ca University of Western Ontario
phone: (519) 661-3061