[publican-list] Possible alternative to wkhtmltopdf
Peter Moulder
peter.moulder at monash.edu
Tue Dec 6 13:42:41 UTC 2011
I mentioned earlier that I was working on an HTML renderer to do
pagination. Let's call it Morp. Although it isn't user-ready, the
output is starting to look like a tempting alternative, at least for
print usage.
Headline features from a Publican point of view:
- HTML/CSS styling.
- Doesn't fall apart when encountering a keep-together block larger
than a page.
- Allows glyph fallback font substitution for mixed-script documents.
- Proper shaping for Indic scripts (using Pango).
- Decent page breaking: honours 'widows' & 'orphans' and so on, but
also tries to avoid breaks that are merely undesirable, such as
breaking a short list item, or even splitting a paragraph if this
can be easily avoided. Conversely, it might allow a widow if the
alternatives seem worse.
(E.g. if I mark figures as page-break-before:avoid and
page-break-inside:avoid, then Morp chooses to give a widow on page 89 of
the below sample in preference to either breaking those constraints or
leaving the page only 60% full.)
- css3-page styling of page headings, page numbering (roman numerals
in preface), styling of the "blank" page before a chapter, different
margins between inside & outside edges, etc.
- Rounded borders for the <pre> things. (This is the most obvious
visual difference from FOP's page content; I guess it's due to
something missing from FOP.)
- Justified text good enough to actually use.
Web browsers and even word processors have taught people that
justified text can't be used satisfactorily, producing large gaps
and/or excessive hyphenation.
Morp may not apply every known technique, but already it's enough
that Publican-produced pages can look like a book rather than like
a web page or school project.
(I have a feeling that FOP can do quite good justified text too, btw.)
The most recent sample of wkhtmltopdf output that was posted to the list
was the Red Hat Enterprise Linux 6 Installation Guide (in English):
http://fedorapeople.org/~jfearn/Red_Hat_Enterprise_Linux-6-Installation_Guide-en-US-TEST.pdf
The corresponding document (though apparently a slightly different
version) as rendered by FOP is
http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/pdf/Installation_Guide/Red_Hat_Enterprise_Linux-6-Installation_Guide-en-US.pdf
while output from Morp is at
http://bowman.infotech.monash.edu.au/~pmoulder/Red_Hat_Enterprise_Linux-6-Installation_Guide-en-US-Morp.pdf
I've tried to make the page styling match FOP output. Given that
Publican output has lots of screenshots (so it's hard to fill every page
evenly no matter what you do), I've set the pagination not to try very
hard to fill pages exactly, letting it break pages in more logical places
(so usually breaking between paragraphs rather than within paragraphs).
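To illustrate the kind of trade-off described above (a widow or an
otherwise undesirable break weighed against an underfilled page, plus a
knob for how hard to try to fill pages), here is a minimal
cost-minimising pagination sketch in Python. It is not Morp's code: the
Block type, paginate(), the quadratic underfill cost and the
underfill_weight parameter are all hypothetical. It also assumes every
block fits on one page, unlike Morp, which copes with oversize
keep-together blocks.

```python
from dataclasses import dataclass
from math import inf


@dataclass
class Block:
    height: float         # vertical space the block occupies
    break_penalty: float  # cost of a page break right after this block
                          # (inf = forbidden, e.g. a keep-with-next figure)


def paginate(blocks, page_height, underfill_weight=100.0):
    """Hypothetical sketch: choose page breaks minimising total cost.

    Cost of a non-final page = underfill_weight * (unused fraction)**2
                               + penalty of the break that ends it.
    The final page costs nothing. Assumes no block is taller than a page.
    Returns indices i such that a page ends after blocks[i].
    """
    n = len(blocks)
    best = [inf] * (n + 1)  # best[j]: minimum cost of laying out blocks[:j]
    prev = [-1] * (n + 1)   # prev[j]: first block of the last page in that layout
    best[0] = 0.0
    for j in range(1, n + 1):
        used = 0.0
        # Try every start i for a page that ends with blocks[j - 1].
        for i in range(j - 1, -1, -1):
            used += blocks[i].height
            if used > page_height:
                break  # adding earlier blocks only makes it taller
            if j == n:
                cost = 0.0  # last page: no break penalty, no underfill cost
            else:
                unused = 1.0 - used / page_height
                cost = underfill_weight * unused ** 2 + blocks[j - 1].break_penalty
            if best[i] + cost < best[j]:
                best[j] = best[i] + cost
                prev[j] = i
    if best[n] == inf:
        raise ValueError("no feasible pagination")
    # Walk back through prev[] to recover where each page ends.
    breaks = []
    j = n
    while j > 0:
        if j < n:
            breaks.append(j - 1)
        j = prev[j]
    return sorted(breaks)
```

For example, with page_height=10, blocks of heights 6, 3 and 3, and a
widow-style penalty of 20 on a break after the second block, paginate()
returns [0]: it breaks after the first block and leaves that page only
60% full. Dropping the penalty to 10 makes it return [1], accepting the
widow instead. Lowering underfill_weight is the analogue of "not trying
very hard to fill pages": underfull pages get cheaper, so breaks at
low-penalty (more logical) places win more often.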
I've used SVG versions of the warning/note/important icons, whereas I
replaced the list-item bitmap images with simple glyph markers (diamond
and box).
Some notable omissions are:
- No page references yet (e.g. in tables of contents).
- No clickable document outline or clickable links. I'd do this if
Cairo made it convenient (someone was working on an interface for
that), but this isn't something my boss needs. Otherwise, the
output could be labelled "PDF for printing" or the like, steering
people to EPUB or HTML for on-screen use.
pjrm.