Any Accurate O C R Programs in DOS I can run in Linux?

John G. Heim jheim at math.wisc.edu
Fri Jun 28 20:03:58 UTC 2013


It looks as if tesseract can do something called "orientation and script
detection" (OSD), but it doesn't do it by default. I haven't been able to
try it since my scanner is at home. But here is a quote from the tesseract
man page. Note that it says the default (option 3) is to not do OSD.

--- begin quote ---
NAME
        tesseract - command-line OCR engine

SYNOPSIS
        tesseract imagename outbase [-l lang] [-psm N] [configfile ...]

OPTIONS
[...]
        -psm N
            Set Tesseract to only run a subset of layout analysis and
            assume a certain form of image. The options for N are:

                0 = Orientation and script detection (OSD) only.
                1 = Automatic page segmentation with OSD.
                2 = Automatic page segmentation, but no OSD, or OCR.
                3 = Fully automatic page segmentation, but no OSD. (Default)
                4 = Assume a single column of text of variable sizes.
                5 = Assume a single uniform block of vertically aligned text.
                6 = Assume a single uniform block of text.
                7 = Treat the image as a single text line.
                8 = Treat the image as a single word.
                9 = Treat the image as a single word in a circle.
                10 = Treat the image as a single character.
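
Going by that list, asking for mode 1 instead of the default 3 should turn OSD on. Here is an untested sketch that only builds and prints the command line, following the SYNOPSIS quoted above. The function name psm_cmd and the file names scan.tif and out are made up for illustration. I believe OSD also needs the osd trained data file to be installed, and that newer tesseract releases spell the option --psm, so adjust as needed.

```shell
#!/bin/sh
# Untested sketch: print a tesseract command line for a given page
# segmentation mode. psm_cmd, scan.tif and out are hypothetical names.

psm_cmd() {
    image=$1
    outbase=$2
    psm=${3:-1}    # default to 1 = automatic page segmentation with OSD

    # The quoted man page lists modes 0 through 10 only.
    case $psm in
        [0-9]|10) ;;
        *) echo "psm must be a number from 0 to 10" >&2; return 1 ;;
    esac

    # Argument order follows the SYNOPSIS: imagename outbase [-psm N]
    echo "tesseract $image $outbase -psm $psm"
}

# Print the command that would request OSD plus automatic segmentation.
psm_cmd scan.tif out 1
```

If the printed line looks right, it can be run as is. Going by the SYNOPSIS, the recognized text should end up in a file named after the outbase argument.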




More information about the Blinux-list mailing list