OCR in Fedora?

Tue Jul 22 22:21:50 UTC 2008

On Mon, Jul 21, 2008 at 3:54 PM, Valent Turkovic
<valent.turkovic at gmail.com> wrote:
> On Mon, Jul 21, 2008 at 12:13 PM, Paul Smith <phhs80 at gmail.com> wrote:
>> 2008/7/21 joachim.backes at rhrk.uni-kl.de <joachim.backes at rhrk.uni-kl.de>:
>>>> Does anybody do OCR using software available in Fedora? Which ones do
>>>> you use? How do you use them?
>>>> I saw an article about OCRopus [1] and how great an app it is, but there
>>>> is no ocropus in Fedora currently.
>>>>
>>>> [1]
>>>> http://arstechnica.com/news.ars/post/20071024-hands-on-with-googles-ocropus-open-source-scanning-software.html
>>>
>>> I use gocr-0.45-2.fc9.i386
>>>
>>> I think it comes from the Fedora repo.
>>
>> Tesseract is better:
>>
>> yum install tesseract
>>
>> Paul
>>
>
> Hi Joachim and Paul,
> do gocr and tesseract have GUIs? How are you using them? Do you get
> formatted text or just a plain text file? Do gocr and tesseract recognise
> columns? Is it possible to get a formatted OpenOffice Writer document that
> matches the original scanned page?
>
> I read the article about OCRopus that I posted the link to, and it seems
> that it uses tesseract but somehow improves on it.
>
> Cheers,
> Valent.

I've used both gocr and tesseract on the same text.  gocr has a GUI;
tesseract is command line only.  I've used both tools on various TIFF
files.  There is a good writeup on the net (I forget where) on using
tesseract on Ubuntu.  I got much better text recognition with
tesseract than with gocr from the same original scanned text.  Never
tried ocropus.
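
For anyone who has not run tesseract from the command line, here is a
minimal sketch of the basic invocation.  The function name ocr_page and
the file names page.tif and page are made up for illustration; the only
thing taken from the tool itself is the "tesseract INPUT OUTPUTBASE"
form, which writes the recognised text to OUTPUTBASE.txt.

```shell
# Sketch only: ocr_page is a hypothetical helper, not part of tesseract.
# usage: ocr_page INPUT.tif OUTPUT_BASE
ocr_page() {
    # Bail out early if tesseract is not installed.
    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract not found; try: yum install tesseract" >&2
        return 127
    fi
    # tesseract appends .txt to the output base name itself.
    tesseract "$1" "$2"
}

# Example call (file names are assumptions):
#   ocr_page page.tif page     # result lands in page.txt
```

The output is plain text only, so any columns or formatting from the
scanned page would have to be rebuilt by hand afterwards.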
gustav