What software is used for OCR on Linux?

Willem van der Walt wvdwalt at csir.co.za
Fri Jul 11 06:48:19 UTC 2014

I have written a front-end for doing OCR in the kies package.
It handles scanning through scanimage and then OCR through any of a number 
of engines.
It produces an HTML file with all the pages and links to each one.
It also can do OCR from an image file.
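The kies_p2t internals are not described here, so the following is only a rough Python sketch of the last step above: collecting per-page OCR text into one HTML file with a link to each page. The pageNNN.txt / pageNNN.tif naming and the build_index and index_directory helpers are hypothetical, not part of kies.

```python
import html
from pathlib import Path


def build_index(pages, title="Scanned document"):
    """Return one HTML document holding every page's OCR text.

    `pages` is a list of (image_name, text) pairs in page order. A link list
    at the top jumps to an anchor placed in front of each page's text.
    """
    links, bodies = [], []
    for n, (image, text) in enumerate(pages, start=1):
        anchor = f"page{n}"
        links.append(f'<li><a href="#{anchor}">Page {n}</a></li>')
        # Blank lines in the OCR output become paragraph breaks
        paras = [p.strip() for p in text.split("\n\n") if p.strip()]
        body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paras)
        bodies.append(f'<h2 id="{anchor}">Page {n}: {html.escape(image)}</h2>\n{body}')
    safe_title = html.escape(title)
    parts = [
        "<!DOCTYPE html>",
        f'<html><head><meta charset="utf-8"><title>{safe_title}</title></head>',
        f"<body><h1>{safe_title}</h1>",
        "<ul>", *links, "</ul>",
        *bodies,
        "</body></html>",
    ]
    return "\n".join(parts) + "\n"


def index_directory(directory="."):
    """Hypothetical layout: pageNNN.txt OCR output beside pageNNN.tif scans."""
    root = Path(directory)
    files = sorted(root.glob("page*.txt"))
    pages = [(f.with_suffix(".tif").name, f.read_text(encoding="utf-8")) for f in files]
    (root / "index.html").write_text(build_index(pages), encoding="utf-8")
```

Calling index_directory() in the folder holding the OCR output writes index.html there. Headings plus a link list were chosen so a screen reader can jump straight to any page.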
The engines are called through wrapper scripts, which are defined in a text file.
If anyone wants it, the kies package currently can be found at: 
As root:
tar jxvf kies-latest.tar.bz2
cd kies
The actual program for the scanning is called kies_p2t.
HTH, Willem

On Thu, 10 Jul 2014, Tony Baechler wrote:

> You don't need a graphical environment for sane, but you do for xsane.  I've
> confirmed that sane will let you scan from the command line.  It's been a
> few years, so I don't remember the exact process, but I think you might need
> a sane-utils package.  The problem I had is that it put each page in a
> single .tif image which I couldn't get to OCR very well and couldn't easily
> find a way to combine into a single file for more convenient and faster
> processing.  I played around with "convert" from ImageMagick but still
> didn't get very far, so I unfortunately went back to K1000 in Windows.
> Again, it's been a few years and I only played with Tesseract, so you might
> get better results nowadays.  If you do install Tesseract, be prepared for a
> lot of dependencies as it's very big.  Many newer scanners don't have sane
> drivers and won't be detected as they're designed to work in Windows, so
> don't be surprised if your scanner doesn't appear to work.  I got lucky in
> that it found the scanner automatically and mostly just worked for me.
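For the one-page-per-.tif problem, a rough Python sketch can script the whole sequence instead of merging by hand: scanimage in batch mode, ImageMagick's convert to bundle the pages into one multi-page TIFF for keeping, and Tesseract run page by page with the text appended to a single file. The file names (page%03d.tif, combined.tif, book.txt) are placeholders, and scan_command, combine_command, ocr_pages and main are hypothetical helpers, not part of any package.

```python
import glob
import subprocess
from pathlib import Path


def scan_command(pattern="page%03d.tif"):
    # --batch writes one file per page using a printf-style name pattern;
    # zero-padding keeps the files in page order when sorted.
    # --batch-prompt waits for the user before each page (useful on flatbeds).
    return ["scanimage", "--format=tiff", f"--batch={pattern}", "--batch-prompt"]


def combine_command(pages, output="combined.tif"):
    # ImageMagick's convert stores each input image as one frame
    # of a single multi-page TIFF.
    return ["convert", *pages, output]


def ocr_pages(pages, output="book.txt"):
    """Run tesseract on each page and append the results to one text file."""
    with open(output, "w", encoding="utf-8") as out:
        for page in pages:
            base = str(Path(page).with_suffix(""))
            # "tesseract IMAGE OUTBASE" writes its text to OUTBASE.txt
            subprocess.run(["tesseract", page, base], check=True)
            out.write(Path(base + ".txt").read_text(encoding="utf-8"))
            out.write("\f")  # form feed marks the page boundary


def main():
    # check=True is omitted here on the assumption that ending a batch
    # interactively may not return exit status 0.
    subprocess.run(scan_command())
    pages = sorted(glob.glob("page*.tif"))
    subprocess.run(combine_command(pages), check=True)
    ocr_pages(pages)
```

Calling main() from an empty working directory runs the full sequence. OCR quality also depends heavily on scan resolution; many backends expose a --resolution option (scanimage -A lists what your device supports), and around 300 dpi is a common recommendation for Tesseract.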
> On 2014-07-10 04:35 AM, Doug Smith wrote:
>> First of all, install all these so that you will have a choice:
>> ocrad, tesseract, gocr and cuneiform
>> These are the actual OCR engines, which do the text recognition in the first place.
>> After this, if you are in a graphical environment, install ocrfeeder, a graphical front end that provides the basic open-book framework.  Also make sure you have sane installed if you
>> are using a scanner, and make sure sane can recognize your scanner.
>> Now you are ready to try it.  I have no scanner, so I have never used this, but it might just do the trick for you.
>> Hope this helps.
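To compare the four engines listed above on the same page, a small Python sketch can skip whichever ones are not installed and write one text file per engine. The command lines are assumed from each tool's usual usage (ocrad -o, gocr -i/-o, cuneiform -o, and Tesseract's IMAGE OUTBASE form), so check the man page for your version. ocrad and gocr typically expect PNM input, so a TIFF may need converting first, for example with convert page.tif page.pnm. engine_command and run_all are hypothetical names.

```python
import shutil
import subprocess
from pathlib import Path


def engine_command(engine, image, out_txt):
    """Build the argv that makes `engine` write the text of `image` to `out_txt`."""
    if engine == "tesseract":
        # tesseract appends ".txt" to the output base name itself
        base = out_txt[:-4] if out_txt.endswith(".txt") else out_txt
        return ["tesseract", image, base]
    if engine == "ocrad":
        return ["ocrad", "-o", out_txt, image]
    if engine == "gocr":
        return ["gocr", "-i", image, "-o", out_txt]
    if engine == "cuneiform":
        return ["cuneiform", "-o", out_txt, image]
    raise ValueError(f"unknown engine: {engine}")


def run_all(image, engines=("ocrad", "tesseract", "gocr", "cuneiform")):
    """Run every installed engine on one image; return engine -> output path."""
    results = {}
    stem = Path(image).stem
    for engine in engines:
        if shutil.which(engine) is None:
            continue  # not installed, so skip it
        out_txt = f"{stem}.{engine}.txt"
        subprocess.run(engine_command(engine, image, out_txt), check=True)
        results[engine] = out_txt
    return results
```

run_all("page.pnm") leaves page.ocrad.txt, page.tesseract.txt and so on side by side, which makes it easy to read the results and keep whichever engine handles your documents best.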
> -- 
> Have a good day,
> Tony Baechler
> tony at baechler.net
> _______________________________________________
> Blinux-list mailing list
> Blinux-list at redhat.com
> https://www.redhat.com/mailman/listinfo/blinux-list
> -- 
> This message is subject to the CSIR's copyright terms and conditions, e-mail legal notice, and implemented Open Document Format (ODF) standard.
> The full disclaimer details can be found at http://www.csir.co.za/disclaimer.html.
