reading pictures of text in pdf
marbux at gmail.com
Fri Nov 13 01:02:14 UTC 2015
On Thu, Nov 12, 2015 at 4:10 PM, Brian Tew <montanalag at gmail.com> wrote:
> Is there anything in linux that can convert a pdf file that is a picture of text
> into real actual plain text?
Assuming there's no DRM involved, tesseract-OCR is probably your best
bet: <https://code.google.com/p/tesseract-ocr/>. The source code has
moved to <https://github.com/tesseract-ocr>, but the documentation
still seems to be on code.google.com.
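
Tesseract reads image files rather than PDFs, so the pages have to be
turned into images first. Here is a rough sketch of one way to do that
in Python. It assumes the pdftoppm tool (from poppler-utils) and the
tesseract command are both installed and that English language data is
available. The function names and the 300 dpi default are my own
choices for illustration, not anything taken from the tesseract docs.

```python
import subprocess
import tempfile
from pathlib import Path


def rasterize_cmd(pdf, prefix, dpi=300):
    """Build the pdftoppm command that renders each page to <prefix>-<n>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def ocr_cmd(image, out_base, lang="eng"):
    """Build the tesseract command that writes recognized text to <out_base>.txt."""
    return ["tesseract", str(image), str(out_base), "-l", lang]


def pdf_to_text(pdf, lang="eng", dpi=300):
    """OCR every page of a scanned PDF and return the text as one string."""
    pages = []
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        # Step 1: one PNG per page (assumes pdftoppm is on PATH).
        subprocess.run(rasterize_cmd(pdf, prefix, dpi), check=True)
        # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
        for image in sorted(Path(tmp).glob("page-*.png")):
            out_base = image.with_suffix("")
            # Step 2: OCR the page image (assumes tesseract is on PATH).
            subprocess.run(ocr_cmd(image, out_base, lang), check=True)
            pages.append(out_base.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(pages)
```

Calling print(pdf_to_text("scan.pdf")), where scan.pdf stands in for
your own file, should print plain text that a screen reader can handle.
Accuracy will depend on the quality of the scan, so expect to proofread.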
[Notice not included in the above original message: The U.S. National
Security Agency neither confirms nor denies that it intercepted this