[publican-list] Possible alternative to wkhtmltopdf

Peter Moulder peter.moulder at monash.edu
Tue Dec 6 13:42:41 UTC 2011


I mentioned earlier that I was working on an HTML renderer to do
pagination.  Let's call it Morp.  Although it isn't user-ready, the
output is starting to look like a tempting alternative, at least for
print usage.


Headline features from a Publican point of view:

  - HTML/CSS styling.

  - Doesn't fall apart when encountering a keep-together block larger
    than a page.

  - Allows glyph fallback font substitution for mixed-script documents.

  - Proper shaping for Indic scripts (using Pango).

  - Decent page breaking: honours 'widows' & 'orphans' and so on, but
    also tries to avoid breaks that are merely undesirable, such as
    breaking a short list item, or even splitting a paragraph if this
    can be easily avoided.  Conversely, it might allow a widow if the
    alternatives seem worse.

    (E.g. if I mark figures as page-break-before:avoid and
    page-break-inside:avoid, then Morp chooses to give a widow on page 89 of
    the Morp sample linked below, in preference to either breaking those
    constraints or leaving the page only 60% full.)

  - css3-page styling of page headings, page numbering (roman numerals
    in preface), styling of the "blank" page before a chapter, different
    margins between inside & outside edges, etc.

  - Rounded borders for the <pre> blocks.  (This is the most obvious
    visual difference from FOP's page content that I guess is due to
    something missing from FOP.)

  - Justified text good enough to actually use.

    Web browsers and even word processors have taught people that
    justified text can't be used satisfactorily, producing large gaps
    and/or excessive hyphenation.

    Morp may not apply every known technique, but already it's enough
    that Publican-produced pages can look like a book rather than like
    a web page or school project.

(I have a feeling that FOP can do quite good justified text too, btw.)
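
For anyone unfamiliar with the CSS behind the features listed above, here
is a rough sketch of the kind of rules involved.  It is not the stylesheet
used for the sample: the selectors (div.figure, div.preface), the page name
"preface" and all the lengths and colours are assumptions chosen for
illustration.  The properties themselves come from CSS 2.1, css3-page and
css3-background.

```css
/* Illustrative sketch only; selectors and values are hypothetical. */

/* Paragraphs: justified, with at least two lines kept on each side
   of a page break. */
p {
  text-align: justify;
  widows: 2;
  orphans: 2;
}

/* Keep a figure in one piece and attached to what precedes it. */
div.figure {
  page-break-before: avoid;
  page-break-inside: avoid;
}

/* Rounded borders on <pre> blocks. */
pre {
  border: 1px solid #cccccc;
  border-radius: 4px;
}

/* Wider margin on the inside (binding) edge of each page. */
@page :left  { margin-left: 20mm; margin-right: 25mm; }
@page :right { margin-left: 25mm; margin-right: 20mm; }

/* Page number in the bottom margin; roman numerals in the preface. */
@page {
  @bottom-center { content: counter(page); }
}
@page preface {
  @bottom-center { content: counter(page, lower-roman); }
}
div.preface { page: preface; }
```

The break properties here are requests rather than guarantees, which is
why a renderer has room to trade a widow against a badly underfilled page.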


The most recent sample of wkhtmltopdf output that was posted to the list
was the Red Hat Enterprise Linux 6 Installation Guide (in English):

  http://fedorapeople.org/~jfearn/Red_Hat_Enterprise_Linux-6-Installation_Guide-en-US-TEST.pdf

The corresponding document (though apparently a slightly different
version) as rendered by FOP is

  http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/pdf/Installation_Guide/Red_Hat_Enterprise_Linux-6-Installation_Guide-en-US.pdf

while output from Morp is at

  http://bowman.infotech.monash.edu.au/~pmoulder/Red_Hat_Enterprise_Linux-6-Installation_Guide-en-US-Morp.pdf


I've tried to make the page styling match FOP output.  Given that
Publican output has lots of screenshots (so it's hard to fill every page
evenly no matter what you do), I've set the pagination not to try very
hard to fill pages exactly, letting it break pages in more logical places
(so usually between paragraphs rather than within them).

I've used SVG versions of the warning/note/important icons, and I've
replaced the list-item bitmap images with simple glyph markers (diamond
and box).
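
One way to get glyph markers like these with plain CSS 2.1 is generated
content.  The following is only a sketch under assumptions: the selectors
and the particular code points are my guesses for illustration, not
necessarily what the sample stylesheet does.

```css
/* Illustrative sketch only; selectors and code points are hypothetical. */

/* Suppress the default (or image) marker. */
ul > li { list-style-type: none; }

/* Top-level items: U+25C6 BLACK DIAMOND plus a no-break space. */
ul > li:before { content: "\25C6\00A0"; }

/* Nested items: U+25A1 WHITE SQUARE plus a no-break space. */
ul ul > li:before { content: "\25A1\00A0"; }
```

Unlike a real list marker, generated content of this kind sits inside the
item's text box, so the indentation may need adjusting to match.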

Some notable omissions are:

  - No page references yet (e.g. in tables of contents).

  - No clickable document outline or clickable links.  I'd do this if
    Cairo made it convenient (someone was working on an interface for
    that), but this isn't something my boss needs.  Otherwise, the
    output could be labelled as "PDF for printing" or the like, with
    people steered to EPUB or HTML for on-screen use.

pjrm.



