png2txt -

Tue Jul 1 18:40:55 UTC 2008

Craig White wrote:
> On Mon, 2008-06-30 at 16:26 +0100, Paul Smith wrote:
>> On Sat, Jun 28, 2008 at 5:32 PM, Bob Goodwin USA
>> <bobgoodwin at wildblue.net> wrote:
>>> fred smith wrote:
>>>>>>> Is there an F8 application that will convert a .png copy of a text list
>>>>>>> to a text file?
>>>>>> ----
>>>>>> png is a picture file and there is no text.
>>>>>>
>>>>>> If you want OCR (optical character recognition - software that scans a
>>>>>> picture for recognizable text and saves the recognized text to a file),
>>>>>> I would suggest tesseract.
>>>>> Thanks, I will look at that.
>>>>>
>>>> I believe that Tesseract only understands TIF files, so you will need
>>>> to convert the png files before you can OCR them.
>>>>
>>>>
>>> Yes, I discovered that requirement but now I am stumped by -
>>>
>>>   The command line is:
>>>   tesseract <image.tif> <output> [-l langid]
>>>
>>> I thought "-l enUS" might work but no go there.
>>>
>>> There's no man page, only a README and that doesn't tell me about the langid
>>> other than it wants it.  Without it I get very strange looking text.
>> Unfortunately, the OCR programs working in Linux are not very good
>> yet. In case you have access to Acrobat Professional, use it instead;
>> the results are usually excellent.
> ----
> I've never used Acrobat Professional for OCR but I have gotten excellent
> results from tesseract on Linux.
> 
> OP should check out...
> 
> http://www.groklaw.net/article.php?story=20061210115516438&query=tesseract
> 
> http://www.linuxjournal.com/article/9676
> 
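The two steps discussed in the quoted thread (convert the .png to TIFF, then run tesseract with a language code) can be sketched as two shell helpers. This is a minimal sketch, not the posters' actual commands. The function names are hypothetical. The PNG-to-TIFF step assumes netpbm's pngtopnm and pnmtotiff are installed. The language code "eng" assumes tesseract's English data files are installed; tesseract names its languages with three-letter codes rather than forms like "enUS".

```shell
#!/bin/sh
# Sketch only: png_to_tif and ocr_tif are hypothetical helper names.
# Assumes netpbm (pngtopnm, pnmtotiff) and tesseract with English data.

# png_to_tif IN.png OUT.tif
# Convert a PNG to TIFF, because tesseract (per the thread) reads only TIFF.
png_to_tif() {
    pngtopnm "$1" | pnmtotiff > "$2"
}

# ocr_tif IN.tif OUTBASE
# Run OCR in English. tesseract appends ".txt" to OUTBASE,
# so "ocr_tif list.tif list" writes list.txt.
ocr_tif() {
    tesseract "$1" "$2" -l eng
}
```

For example, `png_to_tif list.png list.tif && ocr_tif list.tif list` should leave the recognized text in list.txt.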
I do something similar, not OCR but working with scanned text, and I
use the netpbm package. First I convert the original format to a
greyscale image (aka pgm), then convert that to a bilevel image (aka
black and white) with "pgmtopbm -thr", setting the threshold value
as needed with the -val option. Those images are then easily
converted to tif or whatever you need; in my case, JBIG images for the
best compression.
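The greyscale-then-bilevel conversion described above might look like this as a netpbm pipeline. This is a hypothetical sketch, not the exact commands used: the function name is made up, the full option names -threshold and -value stand in for the abbreviated -thr and -val, the 0.5 default is pgmtopbm's usual default for -value, and the output is TIFF rather than JBIG so it can be fed straight to tesseract.

```shell
#!/bin/sh
# Sketch only: png_to_bilevel_tif is a hypothetical helper name.
# Assumes the netpbm tools pngtopnm, ppmtopgm, pgmtopbm and pnmtotiff.

# png_to_bilevel_tif IN.png OUT.tif [THRESHOLD]
# THRESHOLD is a fraction between 0 and 1; it defaults to 0.5 here.
png_to_bilevel_tif() {
    pngtopnm "$1" |                              # PNG -> PNM
        ppmtopgm |                               # colour -> greyscale (pgm)
        pgmtopbm -threshold -value "${3:-0.5}" | # greyscale -> black and white (pbm)
        pnmtotiff > "$2"                         # pbm -> TIFF
}
```

Raising or lowering the threshold argument (for example `png_to_bilevel_tif scan.png scan.tif 0.6`) moves the grey level at which pixels turn black, which is the knob to adjust when faint text drops out or the background turns speckled.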

-- 
Bill Davidsen <davidsen at tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot