what software used for ocr on linux

Tony Baechler tony at baechler.net
Fri Jul 11 06:31:06 UTC 2014


You don't need a graphical environment for SANE, but you do for XSane.  I've
confirmed that SANE will let you scan from the command line.  It's been a
few years, so I don't remember the exact process, but I think you might need
a sane-utils package.  The problem I had is that it saved each page as a
separate .tif image.  I couldn't get those images to OCR very well, and I
couldn't easily find a way to combine them into a single file for faster,
more convenient processing.  I played around with "convert" from ImageMagick
but still didn't get very far, so I unfortunately went back to K1000 in
Windows.  Again, it's been a few years and the only engine I tried was
Tesseract, so you might get better results nowadays.

If you do install Tesseract, be prepared for a lot of dependencies, as it's
very big.  Many newer scanners are designed to work only in Windows and
don't have SANE drivers, so they won't be detected.  Don't be surprised if
your scanner doesn't appear to work.  I got lucky in that SANE found my
scanner automatically and it mostly just worked for me.
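
The scan-then-OCR steps described above could be scripted roughly as
follows.  This is only a minimal sketch under some assumptions: that the
scanimage program (shipped in sane-utils on Debian-style systems) and the
tesseract binary are both on the PATH, and that pages are saved under
zero-padded names such as page001.tif so they sort in order.  The function
names and file names are hypothetical.  It sidesteps the problem of
combining images by running OCR on each page separately and joining the
resulting text files instead.

```python
import subprocess
from pathlib import Path


def scan_command():
    """Build a scanimage call that writes one TIFF image to stdout."""
    return ["scanimage", "--format=tiff"]


def ocr_command(image, lang="eng"):
    """Build a tesseract call; tesseract appends .txt to the output base."""
    base = Path(image).with_suffix("")
    return ["tesseract", str(image), str(base), "-l", lang]


def scan_page(path):
    """Scan a single page and save it to the given file (assumed name)."""
    with open(path, "wb") as out:
        subprocess.run(scan_command(), stdout=out, check=True)


def ocr_pages(images, out_file, lang="eng"):
    """OCR each page image on its own, then join the text into one file."""
    texts = []
    for image in sorted(images):
        subprocess.run(ocr_command(image, lang), check=True)
        text_file = Path(image).with_suffix(".txt")
        texts.append(text_file.read_text(encoding="utf-8"))
    Path(out_file).write_text("\n".join(texts), encoding="utf-8")
```

To use it, one might call scan_page("page001.tif") once per sheet and then
ocr_pages(["page001.tif", "page002.tif"], "book.txt").  Whether the
recognition quality is acceptable will still depend on the scanner and the
engine.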

On 2014-07-10 04:35 AM, Doug Smith wrote:
> 
> 
> First of all, install all these so that you will have a choice: 
> 
> ocrad, tesseract, gocr and cuneiform 
> 
> These are the actual ocr engines which do the text recognition in the first place.  
> 
> After this, if you are in a graphical environment, install ocrfeeder, which is the basic open-book framework.  Also make sure you have sane installed if you 
> are using a scanner.  Make sure sane can recognize your scanner.  
> 
> Now, you are ready to try it.  I have no scanner so I have never used this but it might just do the trick for you.  
> 
> 
> 
> Hope this helps. 
> 
> 
> 
> 
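
The checks suggested in the quoted message (which OCR engines are installed,
and whether SANE can see a scanner) could be sketched like this.  It
assumes each engine installs a command with the same name as the package
listed above, and that scanimage from sane-utils is available; the function
names are hypothetical.

```python
import shutil
import subprocess

# Command names assumed to match the engines named in the quoted message.
ENGINES = ["ocrad", "tesseract", "gocr", "cuneiform"]


def available_engines(which=shutil.which):
    """Return the engines whose command can be found on the PATH."""
    return [name for name in ENGINES if which(name)]


def list_scanners():
    """Return the raw device listing printed by 'scanimage -L'."""
    result = subprocess.run(
        ["scanimage", "-L"], capture_output=True, text=True
    )
    return result.stdout
```

Calling available_engines() shows which engines can be tried, and
list_scanners() returns whatever SANE reports about attached devices.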

-- 
Have a good day,
Tony Baechler
tony at baechler.net




More information about the Blinux-list mailing list