quick unrtf question?
geoff at QuiteLikely.com
Sun Dec 8 18:17:05 UTC 2013
I'm not a Bookshare member as I'm not in the USA. But if what I've seen
is a typical representation of Bookshare books, it's trivial to convert
these to HTML.
Assuming your book has the daisyTransform.xsl file included with it, and
it's probably easy enough to get hold of if it doesn't, you can use
xsltproc to convert it like so:
xsltproc -o <outputfile.html> daisyTransform.xsl <inputfile.xml>
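If a book comes as several XML files, the same command can be wrapped in a loop. This is only a rough sketch, not something I've run against real Bookshare content: it assumes xsltproc is installed and that daisyTransform.xsl sits in the same directory as the XML files. The function names html_name and convert_dir are my own inventions.

```shell
#!/bin/sh
# Sketch: convert every DAISY XML file in a directory to HTML.
# Assumption: daisyTransform.xsl lives alongside the XML files.

# html_name FILE.xml -> prints FILE.html (same directory)
html_name() {
    printf '%s\n' "${1%.xml}.html"
}

# convert_dir DIR: run the stylesheet over each *.xml file in DIR
convert_dir() {
    dir=${1:-.}
    xsl="$dir/daisyTransform.xsl"
    if [ ! -f "$xsl" ]; then
        echo "no daisyTransform.xsl in $dir" >&2
        return 1
    fi
    for xml in "$dir"/*.xml; do
        [ -f "$xml" ] || continue    # the glob matched nothing
        xsltproc -o "$(html_name "$xml")" "$xsl" "$xml" || return 1
    done
}
```

Put a call such as convert_dir /path/to/book at the end of the script (the path is a placeholder) and each XML file gets an HTML file next to it.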
When I first saw this thread, I was wondering if you were wanting to
convert Word 2007/2010 docx files. These files are really zip files with
an XML document and a bunch of related files.
There is a transform called docx2html.xsl (I don't remember whether that's
its original name or just what I called it) which you can use with unzip
and xsltproc to convert docx files to HTML.
A search for docx2html xsl will turn up a bunch of results, and I'm of
course happy to send the XSL to anyone who wants it.
I have a one-line shell script that takes the docx file as an argument and
produces an HTML file with the same basename. The line of code is:
unzip -p "$1" word/document.xml | xsltproc -o "$(basename "$1" .docx).html" docx2html.xsl -
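For anyone who wants something a bit more forgiving than a one-liner, here is a sketch of the same pipeline with some error checking. It assumes unzip and xsltproc are installed and that the stylesheet is in the current directory unless the XSL variable says otherwise. The XSL and MEMBER variables and the function names are just my choices for illustration.

```shell
#!/bin/sh
# Sketch of a more defensive docx-to-HTML wrapper around the one-liner.
# Assumption: the stylesheet is ./docx2html.xsl unless XSL is set.

XSL=${XSL:-docx2html.xsl}
MEMBER=word/document.xml    # the path Word uses inside the archive

# out_name FILE.docx -> prints FILE.html, without the directory part
out_name() {
    printf '%s.html\n' "$(basename "$1" .docx)"
}

docx2html() {
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
    [ -f "$XSL" ] || { echo "stylesheet not found: $XSL" >&2; return 1; }
    # unzip exits non-zero if the archive has no such member
    if ! unzip -p "$1" "$MEMBER" >/dev/null 2>&1; then
        echo "$1 has no $MEMBER" >&2
        return 1
    fi
    unzip -p "$1" "$MEMBER" | xsltproc -o "$(out_name "$1")" "$XSL" -
}
```

Adding docx2html "$1" as the last line makes it behave like the one-liner, except that it complains instead of writing an empty or broken HTML file when something is missing.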
Note that this assumes a document created by Microsoft Word. Word always
calls the XML file word/document.xml but there's no reason for it to be
called this, and apparently some other software packages use different
names.
Finally, while installing xsltproc on this box just now to verify all
this, I also noticed a Debian package called xmlto which is apparently a
front-end to xsltproc and such that's meant to take some of the work out
of all this. I've not tried it though.