[Jocr-devels] gocr and the blind (fwd)

Wed Mar 31 16:36:35 UTC 2004

The question is, how good is GOCR?

Pratik Patel
Managing Director
CUNY Assistive Technology Services
the City University of New York
(718) 997-3775
ppatel at qc.edu

-----Original Message-----
From: blinux-list-bounces at redhat.com
[mailto:blinux-list-bounces at redhat.com] On Behalf Of John J. Boyer
Sent: Wednesday, March 31, 2004 11:12 AM
To: Stoppard; Linux for blind general discussion
Subject: Re: [Jocr-devels] gocr and the blind (fwd)

Nigel,

GOCR is the GNU Optical Character Recognition program. It accepts an image
file as input and produces a text file as output. Both input and output
may be in various formats. If you have a scanner attached to your machine,
you can obtain an image with scanimage, pipe it to gocr, and either
redirect the output of gocr to a file or pipe it to another program. Below
is the README file from the gocr source download.
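
A minimal sketch of that scanner pipeline, wrapped in a shell function. The
function name scan_to_text is hypothetical, and the --resolution and --mode
options depend on your scanner's SANE backend, so treat them as assumptions
and check scanimage --help for your device. The "-" argument, which tells
gocr to read the image from standard input, is taken from the README.

```shell
# Sketch only: scan one page and write the recognized text to a file.
# scan_to_text is a hypothetical helper name, not part of gocr or SANE.
scan_to_text() {
    out_file=$1
    # --format=pnm makes scanimage emit PNM, which gocr reads directly.
    # --resolution and --mode are backend-specific (assumed available).
    scanimage --format=pnm --resolution 300 --mode Gray |
        gocr - > "$out_file"
}
```

Calling scan_to_text page.txt would leave the text in page.txt. To feed
another program instead, replace the redirect with a second pipe.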

				GOCR/JOCR

GOCR is an optical character recognition program, released under the GNU
General Public License. It reads images in many formats (pnm, pbm, pgm, ppm,
some pcx and tga image files, or PNM from stdin; if netpbm-progs is
installed and popen works on your system, you can also use pnm.gz, pnm.bz2,
png, jpg, tiff, gif, bmp and others) and outputs a text file.
You do not have to train the program or store large font bases.
Simply call gocr from the command line and get your results.

For installation instructions, see the INSTALL file.

How to start? (QUICK START)
---------------------------
You'll probably want to use one of the frontends available, such as the
Tcl/Tk or the GTK one. They make your life much easier.

Some examples of how you can use gocr:

  gocr -h                               # help
  gocr file.pbm                         # minimum options
  gocr -v 1 file.pbm >out.txt 2>out.log # generate text and log files
  djpeg -pnm -gray text.jpg | gocr -    # using JPEG files
  gzip -cd text.pbm.gz | gocr -         # using gzipped PBM files
  giftopnm text.gif | gocr -            # using GIF files
  gocr -v 1 -v 32 -m 4 file.pbm         # zoning and out30.bmp output
  xli -geometry 400x400 out30.bmp       # see details using xli (recommended viewer)
  wish gocr.tcl                         # X11 Tcl/Tk frontend (development version)

How to get image files?
-----------------------
Scan text pages and save them as PGM/PBM/PNM files. Use a program such as
The GIMP or SANE. You can also use netpbm-progs to convert several image
formats into PGM/PBM/PNM. The tool djpeg can be used to convert JPEG into
PGM. If you have a POSIX-compatible system like Linux, and the PNM tools,
gzip and bzip2 are installed, you are lucky and gocr will do the conversion
from [.pnm.gz, .pnm.bz2, .jpg, .jpeg, .bmp, .tiff, .png, .ps, .eps]
to [.pgm] for you. This list can easily be extended by editing src/pnm.c.

gocr also comes with some examples; try: make examples.

Memory limitations
------------------
WARNING!!!

If you use a 300dpi scan of an A4 page, the image is about 2500x3500
pixels and gocr requires 8.75MB to store the picture in memory. Not only
that, but gocr may create a second copy, using a total of about 17MB. This
is independent of using b/w or gray-scale images. Be sure that you have
enough RAM installed in your machine! Alternatively, you can cut the
picture into small pieces. You can use pnmcut from the netpbm package to
cut the file.
Example:

pnmcut -left 0 -right 2500 -top 0 -height 1000 bigfile.pnm > smallfile.pnm

And then use gocr on the cropped image as usual. Take care: if you chop
through characters, gocr won't be able to understand that line.

Future versions will take care of this issue automatically.
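
The arithmetic behind these figures, and one way to generate the pnmcut
calls for a whole page, can be sketched as follows. It assumes one byte per
pixel, which is what 2500 x 3500 = 8.75MB implies. The function names
image_bytes and strip_commands and the strip file names are hypothetical.
The pnmcut commands are only printed, not run, and the sketch does not
check that strip boundaries fall between text lines.

```shell
# Bytes needed for one in-memory copy, assuming one byte per pixel.
image_bytes() {
    echo $(( $1 * $2 ))
}

# Print one pnmcut command per horizontal strip of the page.
# Arguments: image width, image height, strip height, input file.
strip_commands() {
    width=$1
    height=$2
    strip=$3
    file=$4
    top=0
    n=0
    while [ "$top" -lt "$height" ]; do
        h=$strip
        # The last strip may be shorter than the others.
        if [ $(( top + h )) -gt "$height" ]; then
            h=$(( height - top ))
        fi
        echo "pnmcut -left 0 -width $width -top $top -height $h $file > strip$n.pnm"
        top=$(( top + strip ))
        n=$(( n + 1 ))
    done
}

image_bytes 2500 3500          # one copy: 8750000 bytes (8.75MB)
echo $(( 2 * 2500 * 3500 ))    # two copies: 17500000 bytes (17.5MB)
strip_commands 2500 3500 1000 bigfile.pnm
```

For a 2500x3500 page and 1000-pixel strips this prints four commands, the
last one covering the remaining 500 rows.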

Limitations
-----------
gocr is still in its early stages. Your images should meet these
requirements if you want good output:

- good scans (all chars well separated, one column, no tables etc., 12pt
  300dpi) should work well
- fonts 20-60 pixels ( 5pt * 1in/72pt * 300 dpi = 20 dots )
- output of image file for controlling detection
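
The point-to-pixel conversion in the list above can be checked with shell
integer arithmetic. This is only a sketch: pt_to_px is a hypothetical
helper, and the division truncates, which matches the README's rounding of
20.8 down to 20.

```shell
# Height in pixels of a font of a given point size at a given resolution.
# 1 point = 1/72 inch, so pixels = points * dpi / 72 (integer division).
pt_to_px() {
    echo $(( $1 * $2 / 72 ))
}

pt_to_px 5 300     # lower bound of the 20-60 pixel range
pt_to_px 12 300    # the 12pt, 300dpi case from the first list item
```

This prints 20 and 50, so 12pt text scanned at 300dpi falls inside the
20-60 pixel range.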

And note that speed is very slow (this will be changed when recognition
works well)
  12pt 300dpi 1700x950 16lines 700chars 22x28 P90=40s..90s v0.2.3 (gcc -O0)

You can try to optimize the results:
- make good scans/treat image
- try to change the critical gray level (option -l <n>)
- control the result on out30.bmp (option -v 32)
- enlarge option -d <n> for high resolution images which are noisy
- try different combinations for option -m <n>
- for thousands of documents with the same font
  you can use/create a database (-m 2/-m 130)
- use options -d 0 -m 8 on screen shots (font8x12)
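
One way to act on these tips is to run gocr several times with different
gray levels and compare the outputs. A sketch, assuming gocr is on your
PATH; the function name try_gray_levels, the levels chosen, and the output
file names are only illustrative. Only the -l option from the list above
is used.

```shell
# Run gocr once per gray level and keep each result in its own file.
# Usage: try_gray_levels file.pbm 100 130 160 190
try_gray_levels() {
    image=$1
    shift
    for level in "$@"; do
        gocr -l "$level" "$image" > "out_l$level.txt"
    done
}
```

Afterwards, compare the out_l*.txt files by eye or with diff and keep the
level that gives the fewest errors.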

What >> DOES NOT << work at the moment:
- complex layouts (try option -m 4)
- bad scans, noisy/snowy images, FAX-quality images
- serif fonts, italic fonts, slanted fonts
- handwritten texts (this is valid for the next ten years I guess)
   the existence of autotrace can shorten this
- rotated images (but slightly rotated images should be no problem)
- small fonts (fax like) or a mix of different font sizes
- colored images (use gray or black/white)
- Chinese, Arabic, Egyptian, Cyrillic or Klingon fonts

How it works or how it should work?
- put the entire file into RAM (300dpi grayscale recommended)
- remove dust and snow
- detect small angle (lines which are not horizontal)
- detect text boxes (option -m 4)
- detect text-lines
- detect characters
- first step recognition (every character has its own empirical procedure)
  - no neural network or similar general algorithms
- analyze not detected chars by comparison with detected ones
- try to divide overlapping chars
- testwise: compare all letters (like compression of pictures)
- for more details look to the gocr.html documentation

How can you help gocr?
----------------------
- Send comments, ideas and patches (diff -ru gocr_original/ gocr_changed/).
- If you have a lot of money, spend a bit (www.paypal.com).
- I always need example files (.pbm.gz or jpeg <100kB) for testing
  the behavior of the ocr engine under different conditions, 
  because scanning does take a lot of time which I do not have.
  But do not send files which are not convertible by commercial OCR programs
  or which are protected from copying and electronic processing by copyright.
  That will help to get the world's best open source OCR program. :) Thanks!
- Send me your results (errors,num_chars,dpi) and if possible results
  and name of professional OCR programs for statistics.
- Read OCR literature, extract the essentials and send a short report
  to me ;).
- If you have a good idea, how to manage some OCR-tasks, tell me!
- Tell your friends about gocr. Tell me about your success. Be happy.

After all, is it gocr or jocr?
------------------------------
The original name of this project is gocr, from GNU Optical Character
Recognition. Another project is using the same name, however; so the
name was changed to jocr. If you have a good idea for a name, please
send it.

Latest news
------------
  http://jocr.sourceforge.net

Authors: (see AUTHORS)

On Wed, 31 Mar 2004, Stoppard wrote:

> Forgive me for asking, but what is gocr?
> 
> Something, optical character reader?
> I thought screen readers and braille displays were expensive everywhere,
> not just in the UK.
> 
> Thanks,
> Nigel,
> Seven of Nine is trivial sublime.
> To read about injustice, please visit.
> http://www.stormbringer.tv/info/story1.html
> ----- Original Message -----
> From: "John J. Boyer" <director at chpi.org>
> To: <blinux-list at redhat.com>
> Sent: Wednesday, March 31, 2004 2:14 PM
> Subject: [Jocr-devels] gocr and the blind (fwd)
> 
> 
> > Some of you might want to contact the BBC as suggested in this message. I
> > might do so myself, but I don't have a scanner connected to my machine.
> >
> > John
> >
> >
> > --
> > John J. Boyer; Executive Director, Chief Software Developer
> > Computers to Help People, Inc.
> > http://www.chpi.org
> > 825 East Johnson; Madison, WI 53703
> >
> >
> > ---------- Forwarded message ----------
> > Date: Tue, 30 Mar 2004 12:21:29 -0800
> > From: Oliver D. Iberien <oliver.iberien at mindspring.com>
> > To: jocr-devels at lists.sourceforge.net
> > Subject: [Jocr-devels] gocr and the blind
> >
> > I was just listening to a BBC Radio 4 broadcast about the trouble the
> > blind in Britain have in locating books of any real quality to read. I
> > wondered why they weren't simply using scanners, OCR, and braille displays
> > and/or speech synthesizers along with printed books. It turns out that the
> > basic software sold to the blind is very expensive. This is $1000
> > (http://www.freedomscientific.com/fs_news/nr_OB7S.asp), not including a
> > speech synthesizer. (It's thousands more for a braille display.)
> >
> > It seems to me that someone could string something together using gocr, or
> > similar, to give the blind affordable software and hence better access. I
> > don't know if anyone is working on such a thing, but if so, you might want
> > to contact the BBC broadcasters at intouch at bbc.co.uk. The website is
> > http://www.bbc.co.uk/radio4/factual/intouch.shtml.
> >
> > Oliver
> >
> >
> > _______________________________________________
> > Jocr-devels mailing list
> > Jocr-devels at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/jocr-devels
> >
> >
> > _______________________________________________
> > Blinux-list mailing list
> > Blinux-list at redhat.com
> > https://www.redhat.com/mailman/listinfo/blinux-list
> 
> 

-- 
John J. Boyer; Executive Director, Chief Software Developer
Computers to Help People, Inc.
http://www.chpi.org
825 East Johnson; Madison, WI 53703
