quick unrtf question?

Geoff Shang geoff at QuiteLikely.com
Sun Dec 8 18:17:05 UTC 2013


Hi,

I'm not a Bookshare member as I'm not in the USA.  But if what I've seen
is typical of Bookshare books, it's trivial to convert these to HTML.

Assuming your book has the daisyTransform.xsl file included with it (and
it's probably easy enough to get hold of if it doesn't), you can use
xsltproc to convert it like so:

xsltproc -o <outputfile.html> daisyTransform.xsl <inputfile.xml>

When I first saw this thread, I wondered whether you wanted to convert
Word 2007/2010 docx files.  These files are really zip files containing
an XML document and a bunch of related files.

There is a transform called docx2html.xsl (don't remember if that's just
what I called it or if it was originally called this) which you can use
with xsltproc and unzip to convert docx files to HTML.

A search for docx2html xsl will turn up a bunch of results, and I'm of 
course happy to send the XSL to anyone who wants it.

I have a one-line shell script that takes the docx file as an argument and 
produces an HTML file with the same basename.  The line of code is:

unzip -p "$1" word/document.xml | xsltproc -o "$(basename "$1" .docx).html" docx2html.xsl -
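
For anyone who wants something a little more defensive than the one-liner, here is a rough sketch of the same pipeline wrapped in shell functions.  The names html_name and docx_to_html and the XSL variable are made up for illustration, and it assumes docx2html.xsl is in the current directory unless XSL says otherwise.

```shell
#!/bin/sh
# Hypothetical wrapper around the one-liner; names are illustrative only.

# Derive the output name: drop the directory and the .docx suffix.
html_name() {
    printf '%s.html\n' "$(basename "$1" .docx)"
}

# Convert one docx file to HTML in the current directory.
# Returns non-zero if the input file does not exist.
docx_to_html() {
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
    # Assumption: the stylesheet is ./docx2html.xsl unless XSL is set.
    unzip -p "$1" word/document.xml |
        xsltproc -o "$(html_name "$1")" "${XSL:-docx2html.xsl}" -
}
```

With these defined, docx_to_html report.docx should leave report.html in the current directory, same as the one-liner.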

Note that this assumes a document created by Microsoft Word.  Word always
calls the XML file word/document.xml, but nothing requires that name, and
apparently some other software packages use different ones.
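
If word/document.xml isn't there, the package itself records where the main document lives: the _rels/.rels file inside the zip has a relationship whose Type ends in relationships/officeDocument, and its Target is the path to use.  A rough sketch follows; the function names are made up, and the parsing is a crude grep/sed job that assumes double-quoted attributes rather than doing real XML parsing.

```shell
# Hypothetical helpers; names are illustrative only.

# Read the contents of _rels/.rels on stdin and print the Target of the
# officeDocument relationship, with any leading slash removed.
rels_main_target() {
    tr '>' '\n' |                               # one element per line
        grep 'relationships/officeDocument"' |  # keep the main-document entry
        sed -n 's/.*Target="\([^"]*\)".*/\1/p' |
        sed 's|^/||'
}

# Print the path of the main document part inside a docx file.
main_part() {
    unzip -p "$1" _rels/.rels | rels_main_target
}
```

The result can then stand in for the hard-coded name, e.g. unzip -p "$1" "$(main_part "$1")" in place of unzip -p "$1" word/document.xml.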

Finally, while installing xsltproc on this box just now to verify all
this, I also noticed a Debian package called xmlto which is apparently a 
front-end to xsltproc and such that's meant to take some of the work out 
of all this.  I've not tried it though.

HTH,
Geoff.



