Fedora Linux OCR howto v0.1
Mitch Wiedemann
mc2 at lightlink.com
Sun Feb 22 02:43:30 UTC 2004
Below is the process I use to get good OCR results in Linux. Posted
here for posterity.
My Setup:
-------------
Fedora Core 1 (Linux)
Kooka ("Scan & OCR Program" in KDE Graphics menu)
gocr (gocr-0.37-0.rhfc1.dag.i386.rpm)
Canon CanoScan LIDE 30 USB scanner
Notes about input document quality:
1. The better the quality of the document, the better your OCR results
will be. If the characters on the page don't have at least some
whitespace all the way around, you've got an uphill battle. And you
know what Sun Tzu says about that!
2. Text that is of a uniform size and weight will yield better results.
3. If you're scanning text from non-white stock, you'll have to fiddle
with the brightness and contrast when scanning to try to get as near
true black and true white as you can.
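
To illustrate what the brightness/contrast fiddling in note 3 is aiming for, here is a minimal Python sketch of a linear "levels" adjustment on 8-bit gray values. The function name adjust_levels and the black_point/white_point parameters are made up for this example and are not part of Kooka or gocr. Anything at or below the black point becomes 0 (true black) and anything at or above the white point becomes 255 (true white).

```python
def adjust_levels(pixels, black_point, white_point):
    """Linearly rescale 8-bit gray values (hypothetical helper).

    Values <= black_point map to 0, values >= white_point map to 255,
    and everything in between is stretched across the full range.
    """
    if white_point <= black_point:
        raise ValueError("white_point must be greater than black_point")
    span = white_point - black_point
    adjusted = []
    for value in pixels:
        scaled = round((value - black_point) * 255 / span)
        # Clip to the valid 8-bit range.
        adjusted.append(max(0, min(255, scaled)))
    return adjusted


# Example: dark-gray ink (about 60) on tinted paper (about 200).
sample = [60, 60, 200, 200]
print(adjust_levels(sample, black_point=60, white_point=200))
```

In practice you would do this with the scanner's brightness and contrast sliders; the sketch only shows the effect you are trying to reach.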
Scanning
------------
1. Connect scanner
2. Place the document on the scanner glass, taking care to keep the lines
of text as horizontal as possible.
3. Open Kooka and set the following:
Scan Mode: "Gray"
Resolution: 150 - 300 dpi, depending on the size of the text on the
page.
4. Do a "Preview Scan"
5. Select the text area you want to scan for OCR
6. Do a "Final Scan" and select an output image format. I use "PNG" for
no good reason.
Note: For some odd reason, I have to close Kooka after scanning and
before OCR to get things to work the way they should. YMMV.
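
As a rough guide for picking a resolution in the 150 - 300 dpi range, you can estimate how tall the text will be in pixels: one point is 1/72 inch, so height in pixels is point size * dpi / 72. The Python sketch below does that arithmetic. The function names and the 20-pixel minimum are assumptions made for this example, not a gocr requirement; tune the minimum to whatever works for your documents.

```python
def glyph_height_px(point_size, dpi):
    """Approximate text height in pixels (1 point = 1/72 inch)."""
    return point_size * dpi / 72


def pick_resolution(point_size, min_px=20, choices=(150, 200, 300)):
    """Return the lowest dpi in `choices` that gives at least `min_px`.

    min_px=20 is an assumed rule of thumb for this sketch only.
    Falls back to the highest choice if none is enough.
    """
    for dpi in sorted(choices):
        if glyph_height_px(point_size, dpi) >= min_px:
            return dpi
    return max(choices)


for size in (12, 8, 6):
    dpi = pick_resolution(size)
    print(size, "pt ->", dpi, "dpi,", round(glyph_height_px(size, dpi), 1), "px")
```

Smaller text needs a higher resolution to end up with the same number of pixels per character, which is why the setting depends on the size of the text on the page.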
OCR
-----------
1. Select the image you saved in step 6 of the Scanning section, and
select a few words from the image to do a test OCR run.
2. Click the "OCR on Selection" button on the Kooka toolbar.
3. Use the default gocr settings for the first run.
4. Check results. Adjust gocr settings if needed to get better results.
5. Repeat as necessary.
6. Once the test run looks good, select all of the text in your image
and click the "OCR on Selection" button again.
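
The same test-and-adjust loop can also be tried outside Kooka by calling gocr directly. The Python sketch below is only an illustration under a few assumptions: that the gocr binary is on your PATH, that your version accepts the -i (input), -o (output), -l (gray level threshold) and -d (dust size) options described in the gocr man page (check "gocr -h" to be sure), and that the image is in a format your gocr build can read (PNM is the safe choice). The helper names and the file name page.pnm are made up for the example.

```python
import subprocess


def build_gocr_command(image_path, output_path=None,
                       gray_threshold=None, dust_size=None):
    """Build a gocr argument list (hypothetical helper).

    Options left as None are omitted, so gocr uses its defaults.
    """
    cmd = ["gocr", "-i", image_path]
    if output_path is not None:
        cmd += ["-o", output_path]
    if gray_threshold is not None:
        if not 0 <= gray_threshold <= 255:
            raise ValueError("gray_threshold must be between 0 and 255")
        cmd += ["-l", str(gray_threshold)]
    if dust_size is not None:
        cmd += ["-d", str(dust_size)]
    return cmd


def run_gocr(image_path, **settings):
    """Run gocr and return the recognized text from stdout.

    Assumes gocr is installed; raises CalledProcessError on failure.
    """
    result = subprocess.run(build_gocr_command(image_path, **settings),
                            capture_output=True, text=True, check=True)
    return result.stdout


# First run with defaults, then a second attempt with adjusted settings.
print(build_gocr_command("page.pnm"))
print(build_gocr_command("page.pnm", gray_threshold=160, dust_size=10))
```

Start with the defaults, look at the output, and only then change one setting at a time, the same as in steps 3 to 5.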
Good luck.
--
Mitch Wiedemann
mc Computer Consulting
mc2 at lightlink.com
http://www.lightlink.com/mc2