Convert PDF to Text?
fredex
fredex at fcshome.stoneham.ma.us
Mon Apr 23 10:40:39 UTC 2007
On Mon, Apr 23, 2007 at 06:38:01AM +0100, Keith G. Robertson-Turner wrote:
> Verily I say unto thee, that Akemi Yagi spake thusly:
> > On Sun, 22 Apr 2007 01:33:32 +0100, Keith G. Robertson-Turner wrote:
> >
> >> All it produces is "empty" html files, that is - they are proper html
> >> (head, body, etc.) but the actual content is not there.
> >>
> >> IOW it looks like it can only work if the content of the PDF really is
> >> text, and not a scanned image of text.
> >
> > This might be of help:
> >
> > http://www.groklaw.net/article.php?story=20061210115516438
>
> Thanks for the link. Looks good.
>
I must point out that the OCR output from a scanned PDF will certainly need
a fair amount of cleanup. While tesseract is pretty good, it is far from perfect.
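
For anyone who wants to try this on a scanned PDF, here is a rough sketch of one way to do it. It is not necessarily the procedure from the linked article. It assumes pdftoppm (from poppler-utils) and a reasonably recent tesseract are installed and on PATH. The function names, the "page" prefix and the 300 dpi default are my own choices for illustration. Expect to proofread the result by hand.

```python
import shutil
import subprocess
from pathlib import Path


def pdftoppm_command(pdf_path, image_prefix, dpi=300):
    """Build the command that renders each PDF page to <image_prefix>-<n>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(image_prefix)]


def tesseract_command(image_path):
    """Build the command that OCRs one image.

    tesseract takes an output base name and appends .txt to it itself.
    """
    image_path = Path(image_path)
    return ["tesseract", str(image_path), str(image_path.with_suffix(""))]


def ocr_pdf(pdf_path, work_dir):
    """Render the PDF to page images, OCR each page, return the joined text."""
    for tool in ("pdftoppm", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")

    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    # One PNG per page; pdftoppm zero-pads the page number, so sorting works.
    subprocess.run(pdftoppm_command(pdf_path, work_dir / "page"), check=True)

    pages = []
    for image in sorted(work_dir.glob("page-*.png")):
        subprocess.run(tesseract_command(image), check=True)
        pages.append(image.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(pages)
```

Something like ocr_pdf("scan.pdf", "ocr-work") would then return the raw text, which is where the manual cleanup starts.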
--
-------------------------------------------------------------------------------
.---- Fred Smith /
( /__ ,__. __ __ / __ : /
/ / / /__) / / /__) .+' Home: fredex at fcshome.stoneham.ma.us
/ / (__ (___ (__(_ (___ / :__ 781-438-5471
-------------------------------- Jude 1:24,25 ---------------------------------