extracting text from png files

Linux for blind general discussion blinux-list at redhat.com
Mon Dec 17 22:56:00 UTC 2018


Howdy,

I use tesseract for this.
With version 4.0, which has just been released, I noticed that the results
improved a lot here (for German and English use cases).
Some official numbers can be found here:
https://github.com/tesseract-ocr/docs/raw/master/das_tutorial2016/7Building%20a%20Multi-Lingual%20OCR%20Engine.pdf
The results improve by between 10 and 80 percent, depending on the language
and its previous level of support.
It seems it has a new OCR engine based on a neural network.
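For anyone who wants to script this, here is a minimal sketch that calls the tesseract command-line tool from Python. It assumes tesseract 4.x is on PATH with the "deu" and "eng" language data installed. Passing "stdout" as the output base makes tesseract print the text instead of writing a .txt file, and "--oem 1" asks for the neural-net (LSTM) engine. The function names are hypothetical helpers, not part of tesseract itself.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_cmd(image: Path, langs: list[str]) -> list[str]:
    """Build a tesseract command line that prints recognized text to stdout.

    "stdout" as the output base sends the text to standard output instead
    of creating <outputbase>.txt; "--oem 1" selects the LSTM engine (4.x).
    """
    return ["tesseract", str(image), "stdout", "-l", "+".join(langs), "--oem", "1"]


def ocr_png(image: Path, langs: list[str]) -> str:
    """Run tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    result = subprocess.run(
        build_tesseract_cmd(image, langs),
        capture_output=True,  # collect the OCR text from stdout
        text=True,            # decode bytes to str
        check=True,           # raise if tesseract exits with an error
    )
    return result.stdout
```

Usage would be something like `ocr_png(Path("scan.png"), ["deu", "eng"])`; joining languages with "+" lets tesseract consider both on the same page.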

cheers chrys

On 17.12.18 at 16:57, Linux for blind general discussion wrote:
> Disclaimer: I don't know which image formats either program supports
> directly, nor do I know of a good way to convert between image
> formats, though I'm pretty sure cuneiform supports at least .jpg and
> .png files directly.
>
> I also remember at least one OCR tutorial recommending some
> preprocessing to make images easier for the OCR program to work with,
> and I believe they used the convert command provided by imagemagick to
> do so, but I forget the details.
>
> Also, it's been a while since I've attempted any OCR'ing myself (how
> often I had to manually clean up the output kind of put me off), so
> there might be others on this list who can provide better, and more
> specific advice on this subject.
>
> Still, I hope I've at least got you started on the right track.
>
>
>
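The quoted message mentions preprocessing with ImageMagick's convert command but not the exact steps. Below is a minimal sketch under my own assumptions (grayscale, 200% upscale, 50% black/white threshold), not the tutorial's recipe; the helper names are hypothetical. ImageMagick picks the output format from the destination file's extension, so the same call also converts between formats such as .jpg and .png. On ImageMagick 7 the preferred command name is "magick", though "convert" is usually still available.

```python
import subprocess
from pathlib import Path


def build_convert_cmd(src: Path, dst: Path, scale_percent: int = 200,
                      threshold_percent: int = 50) -> list[str]:
    """Build an ImageMagick 'convert' command that prepares an image for OCR.

    The output format follows dst's extension, so this also converts
    between image formats (e.g. .jpg -> .png).
    """
    return [
        "convert", str(src),
        "-colorspace", "Gray",                  # drop colour information
        "-resize", f"{scale_percent}%",         # upscale small text
        "-threshold", f"{threshold_percent}%",  # force pure black and white
        str(dst),
    ]


def preprocess(src: Path, dst: Path) -> Path:
    """Run convert and return the path of the cleaned-up image."""
    subprocess.run(build_convert_cmd(src, dst), check=True)
    return dst
```

The resulting image can then be handed to whichever OCR program you use; the threshold value in particular is worth tuning per scan.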



