ocr + fedora core and a big book..

Rudolf Kastl che666 at gmail.com
Mon Jan 16 09:51:03 UTC 2006


2006/1/16, Gregory Machin <gregory.machin at gmail.com>:
>
> I agree with you, but the boss wants OCR. I think I will leave him to
> figure it out; I have too much coding to do. lol ...
>
> Thanks for the input. Have a great day.
>
> On 1/13/06, Bill Rugolsky Jr. <brugolsky at telemetry-investments.com> wrote:
> >
> > On Fri, Jan 13, 2006 at 10:47:02AM +0000, Paul F. Johnson wrote:
> > > Grab a copy of gocr, compile it and install it (it's not in FE, Fedora
> > > Extras, which is odd). When you scan, use as high a resolution as
> > > possible (in my experience a minimum of 300 dpi) and greyscale.
> > >
> > > Use either GIMP or xsane to grab the scan and tell gocr to do its
> > > business.
> > >
> > > OCR is not an exact science, and you will really need to sit down and
> > > go through the scanned text to ensure that the numbers were read
> > > correctly (errors are very easy to spot: you may get @ instead of 0,
> > > l for 1 and the like). Save the generated file. You may then need
> > > either to write a script that splits the fields on " ", or to feed
> > > the file into emacs, search and replace " " with ",", and save.
> >
> > Sadly, in my (limited) experience, none of the free software solutions
> > such as Gocr or Clara OCR is really up to the task.  The leading
> > proprietary packages are vastly superior.  Some of them have free 30-day
> > evaluations.
> >
> > With a proper setup for lots of automated training, Clara might be able
> > to do the job, especially if you do some image morphology (using, e.g.,
> > GIMP) to clean up the scans. But you'll have to do some serious work.
> >
> > A tried and true technique that avoids using proprietary software
> > is to simply pay multiple people to type the whole thing, and then
> > reconcile the differences (or use majority voting). :-)
> >
> > Regards,
> >
> >         Bill Rugolsky
> >
>
>
>
> --
> Gregory Machin
> greg at linuxpro.co.za
> gregory.machin at gmail.com
> www.linuxpro.co.za
> www.exponent.co.za
> Web Hosting Solutions
> Scalable Linux Solutions
> www.iberry.info (support and admin)
>
> +27 72 524 8096
>
> --
> fedora-list mailing list
> fedora-list at redhat.com
> To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list



That's another reason to get the best available solution packaged into
Extras: if it's widely used, it's probably being improved at a faster
rate.

regards,
Rudolf Kastl
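A minimal Python sketch of the "write a script" option Paul F. Johnson mentions in the quoted thread. It assumes the saved OCR output is plain text with one table row per line and fields separated by runs of spaces (so no single field contains a space). The names `fix_field` and `ocr_text_to_csv` are hypothetical, and the misread table covers only the two examples from the thread (@ for 0, l for 1). A substitution is kept only when it turns the whole field into a plain number, so words such as "all" are left alone.

```python
import csv
import io
import re

# Hypothetical misread table: only the examples named in the thread.
MISREADS = str.maketrans({"@": "0", "l": "1"})
# A "plain number": optional sign, digits, optional decimal part.
NUMERIC = re.compile(r"[-+]?\d+(?:\.\d+)?")


def fix_field(field: str) -> str:
    """Apply the misread substitutions only if the result is a plain number."""
    candidate = field.translate(MISREADS)
    return candidate if NUMERIC.fullmatch(candidate) else field


def ocr_text_to_csv(text: str) -> str:
    """Turn whitespace-delimited OCR lines into CSV rows, skipping blank lines."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        fields = line.split()  # splits on any run of spaces or tabs
        if fields:
            writer.writerow(fix_field(f) for f in fields)
    return out.getvalue()


sample = "Item  Qty  Price\nbolts 1@ 2.5@\nnuts  l2 @.75\n"
print(ocr_text_to_csv(sample), end="")
# Item,Qty,Price
# bolts,10,2.50
# nuts,12,0.75
```

The script only catches the most obvious misreads; the manual proof-reading pass described in the thread is still needed, and the `csv` module quotes any field that happens to contain a comma.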
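Bill Rugolsky's suggestion of paying several people to type the text and reconciling by majority voting can be sketched in Python as a line-by-line vote. This sketch assumes every typist preserves the original line breaks, so line i of each transcript refers to the same line of the book. `majority_vote` is a hypothetical helper. Lines without a strict majority are reported so that a person can reconcile them by hand.

```python
from collections import Counter
from itertools import zip_longest


def majority_vote(transcripts: list[str]) -> tuple[list[str], list[int]]:
    """Merge independent typings line by line.

    Returns the merged lines and the 0-based indices of lines where no
    strict majority of typists agreed.
    """
    split = [t.splitlines() for t in transcripts]
    merged: list[str] = []
    disputed: list[int] = []
    # A missing line in a shorter transcript counts as an empty vote.
    for i, versions in enumerate(zip_longest(*split, fillvalue="")):
        # Normalise whitespace so spacing differences do not split the vote.
        counts = Counter(" ".join(v.split()) for v in versions)
        best, votes = counts.most_common(1)[0]
        merged.append(best)
        if votes * 2 <= len(versions):  # no strict majority
            disputed.append(i)
    return merged, disputed


typists = [
    "12 bolts 2.50\n40 nuts 0.75",
    "12 bolts 2.50\n40 nuts 0.57",
    "12 bo1ts 2.50\n40 nuts 0.75",
]
lines, check = majority_vote(typists)
print(lines)  # ['12 bolts 2.50', '40 nuts 0.75']
print(check)  # []
```

If a typist skips or splits a line, every later line shifts and the vote degrades; in that case the transcripts would need to be aligned first (for example with Python's `difflib`) before voting.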

