reading pictures of text in pdf

Paul Merrell marbux at gmail.com
Fri Nov 13 01:02:14 UTC 2015


On Thu, Nov 12, 2015 at 4:10 PM, Brian Tew <montanalag at gmail.com> wrote:
> Is there anything in linux that can convert a pdf file that is a picture of text
> into real actual plain text?

Assuming there's no DRM involved, Tesseract OCR is probably your best
bet: <https://code.google.com/p/tesseract-ocr/>. The source code has
moved to <https://github.com/tesseract-ocr>, but the documentation
still seems to be on code.google.com.
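
In case it helps, here is a rough sketch of how the conversion might be
scripted. As far as I know, tesseract takes image files as input rather
than PDFs, so the sketch first renders each page to a PNG. It assumes
pdftoppm (from poppler-utils) and tesseract are both installed and on
your PATH; pdftoppm is my own suggestion for the rendering step, and the
function names, the 300 dpi setting, and the "eng" language code are
placeholders, not anything tesseract requires.

```python
import subprocess
from pathlib import Path


def rasterize_command(pdf_path, prefix, dpi=300):
    """Build the pdftoppm command that renders each PDF page to a PNG.

    pdftoppm writes files named <prefix>-<page number>.png.
    """
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(prefix)]


def ocr_command(image_path, lang="eng"):
    """Build the tesseract command for one page image.

    tesseract appends ".txt" to the output base, so page-1.png
    produces page-1.txt next to the image.
    """
    image_path = Path(image_path)
    output_base = image_path.with_suffix("")
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def pdf_to_text(pdf_path, work_dir, dpi=300, lang="eng"):
    """Render the PDF to images, OCR each page, and return the joined text."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: one PNG per page, written into work_dir.
    subprocess.run(rasterize_command(pdf_path, work_dir / "page", dpi), check=True)

    # Step 2: OCR each page image. pdftoppm zero-pads the page numbers,
    # so a plain sort should keep the pages in order.
    pages = []
    for image in sorted(work_dir.glob("page-*.png")):
        subprocess.run(ocr_command(image, lang), check=True)
        pages.append(image.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(pages)
```

Calling pdf_to_text("scan.pdf", "ocr-work") should leave the page images
and per-page text files in ocr-work and return the combined text. The
quality of the result depends a lot on the scan, so a higher dpi may be
worth trying if the output looks poor.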

Best regards,

Paul

-- 
[Notice not included in the above original message:  The U.S. National
Security Agency neither confirms nor denies that it intercepted this
message.]




More information about the Blinux-list mailing list