ocr success

Sun Dec 21 00:06:50 UTC 2008

On Sat, Dec 20, 2008 at 2:11 PM, Daniel Dalton <d.dalton at iinet.net.au> wrote:
> That's a good idea, I didn't think of that, I guess I should invest
> some time into writing something like this.

I did some checking and it sounds like Ocropus already does a lot of
the kinds of things I discussed and is under active development. So
you might study Ocropus further before deciding whether to develop a
script. Your time might be more productively spent contributing to
that project.

Here are some links that may assist:

Ocropus Project home page: <http://code.google.com/p/ocropus/>.

Ocropus Wiki: <http://sites.google.com/site/ocropus/>.

Ocropus mailing list/forum: <http://groups.google.com/group/ocropus>.

Ocropus documentation: <http://sites.google.com/site/ocropus/documentation>.

Ocropus development road map: <http://code.google.com/p/ocropus/wiki/Roadmap>.

Updated road map: <http://sites.google.com/site/ocropus/roadmap>
(extends roadmap beyond milestones identified in the first roadmap).

I checked because the major OCR apps on Windows have for many years
provided tools for this kind of stuff. Therefore, I thought it likely
that someone was already developing an open source solution.

Along the way, I learned that Ocropus includes disabled code for
handwriting recognition that may be repaired later. Google is
generously funding both Ocropus and Tesseract development, with
Tesseract being developed right now mainly for book conversions, in aid
of Google's Books initiative.

I hope this helps. OCR is one of those areas where free and open
source developers are still catching up with proprietary software.
The bright side of that situation is that there should be a lot of
progress made fairly quickly because the technology is well
understood.

Best regards,

Paul

-- 
Universal Interoperability Council
<http://www.universal-interop-council.org>