OOo documents look different

Caolan McNamara caolanm at redhat.com
Mon Jul 10 08:32:22 UTC 2006


On Sat, 2006-07-08 at 05:32 -0400, Benjy Grogan wrote:
> It's been this way for me since probably FC3 and still in FC5.
> Certain .doc files that fit perfectly on one page in Windows XP, will
> spread over into two pages on FC.  I think it has to do with fonts,
> but not sure.

There are a number of possible reasons, but indeed the most likely is
simply that you don't have the font that the original document was
written with.

Without the same font, unless the replacement font is metrically
equivalent, the text in the document is not going to end up in the same
place.
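
To make that concrete, here is a toy sketch (not OOo code) of greedy line
breaking where every character has the same advance width. The function
name count_lines and all the numbers are made up for illustration; real
fonts have per-glyph metrics, but the effect is the same.

```python
def count_lines(words, char_width, line_width):
    """Greedy line breaking with one fixed advance width per character.

    A crude stand-in for real font metrics: a word is len(word) * char_width
    wide and a space is char_width wide.
    """
    lines, used = 1, 0
    for word in words:
        w = len(word) * char_width
        if used == 0:
            used = w                      # first word on the line
        elif used + char_width + w <= line_width:
            used += char_width + w        # space plus word still fits
        else:
            lines += 1                    # wrap to a new line
            used = w
    return lines


words = ["aaaa"] * 10
same_metrics = count_lines(words, char_width=10, line_width=240)
wider_font = count_lines(words, char_width=11, line_width=240)
```

With these made-up numbers a replacement font that is 10% wider pushes
the same text from two lines to three, which is the one page vs two
pages effect in miniature.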

Now, there are two main fonts used in MS documents, Times New Roman and
Arial. If you buy RHEL, or if you buy StarOffice, you get "Thorndale
AMT/Thorndale" and "Albany AMT/Albany", which are metrically equivalent
to "Times New Roman" and "Arial" respectively. So if you have *those*
fonts your document is likely to be exactly the same in writer as in
msword, or on XP vs Fedora. If you have those fonts but still have
trouble, then you have something interesting that the OOo developers can
examine and fix. If you don't have those fonts then you're never going
to be a totally happy camper unless someone gives us some free
equivalently spaced and leaded fonts. (FWIW Cumberland AMT/Cumberland is
the Courier New equivalent font, IIRC.)

For completeness, what differentiates Fedora OOo from "stock" OOo in
this area is that we immediately ask fontconfig for the best
replacement font when the original font is not found. Stock OOo contains
its own list of guesses to fall back through. On Fedora both systems
use "Nimbus Roman No9 L" (snappy name) as the "Times New Roman" fallback
unless Thorndale is available, in which case that is the preferred
fallback.
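
The preference order can be sketched like this. It is only an
illustration of the idea, not what fontconfig or OOo actually run: the
FALLBACKS table and pick_fallback are hypothetical names, and the sets of
installed fonts are invented.

```python
# Hypothetical preference table: metrically equivalent fonts first,
# then (for Times New Roman) the generic fallback.
FALLBACKS = {
    "Times New Roman": ["Thorndale AMT", "Thorndale", "Nimbus Roman No9 L"],
    "Arial": ["Albany AMT", "Albany"],
}


def pick_fallback(requested, installed, fallbacks=FALLBACKS):
    """Return the requested font if installed, otherwise the first
    installed entry of its fallback list, otherwise None."""
    if requested in installed:
        return requested
    for candidate in fallbacks.get(requested, []):
        if candidate in installed:
            return candidate
    return None


without_thorndale = pick_fallback("Times New Roman", {"Nimbus Roman No9 L"})
with_thorndale = pick_fallback(
    "Times New Roman", {"Thorndale AMT", "Nimbus Roman No9 L"}
)
```

Only the second case gives you a metrically equivalent replacement; the
first one renders, but the text will not land in the same place.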

Some other interesting things:

Some fonts have "leading", as in the metal strips of lead which
typesetters would place between lines of type. It is a feature of the
font which describes how far apart the lines of text written in that
font should be placed. MS has honoured this since maybe word 97, and OOo
honours it from 2.0.1 onwards. Both applications have compatibility
flags to toggle this on and off.
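
As a rough sketch of why this matters for page counts, assuming the
usual model where the distance between baselines is ascent + descent
plus the font's leading. The function name line_pitch, the flag name and
the numbers are invented for the example.

```python
def line_pitch(ascent, descent, external_leading, honour_leading=True):
    """Baseline-to-baseline distance for single-spaced text."""
    pitch = ascent + descent
    if honour_leading:
        pitch += external_leading  # extra gap requested by the font
    return pitch


page_height = 700  # arbitrary units, same as the metrics below
lines_with_leading = page_height // line_pitch(10, 3, 1)
lines_without_leading = page_height // line_pitch(10, 3, 1, honour_leading=False)
```

A difference of one unit per line changes how many lines fit on the
page, so a document that just fits with one setting spills over with the
other.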

Both MS and writer have various compatibility flags which control some
aspects of spacing between paragraphs. E.g. some versions of word
combine the before/after spacing of touching paragraphs by adding them
together, and some by finding the max of before/after and using that
value instead. I think I got most of them working in writer from 2.0.2
onwards. These are part of the HTML compatibility flags which word
implemented to behave the same as web browsers for its HTML import
work.
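
The two ways of combining the spacing look like this in a minimal
sketch. The function name gap_between and the mode strings are
hypothetical, not the names of the real flags.

```python
def gap_between(space_after_prev, space_before_next, mode="add"):
    """Vertical gap between two touching paragraphs.

    mode="add": sum the previous paragraph's 'after' spacing and the next
                paragraph's 'before' spacing.
    mode="max": use only the larger of the two (how web browsers
                collapse margins).
    """
    if mode == "add":
        return space_after_prev + space_before_next
    if mode == "max":
        return max(space_after_prev, space_before_next)
    raise ValueError("mode must be 'add' or 'max'")


added = gap_between(12, 6, mode="add")
collapsed = gap_between(12, 6, mode="max")
```

With 12pt after and 6pt before, that is 18pt in one mode and 12pt in the
other, repeated at every paragraph boundary in the document.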

There is also a little feature in word where a line of text which
consists only of whitespace, when it appears at the top of a page, is
considered to be of less height than a line of normal text, or than a
line of whitespace appearing elsewhere on a page. This one is now
implemented in 2.0.3, I believe.

There is another aspect to this, which is WYSIWYG in relation to what
printer is selected. This is actually also controlled by compatibility
flags. Since around word 2000, MS no longer cares about what your
printer can and cannot do; it renders according to an idealized printer
backend and hands that to your printer to do the best it can with. That
means the doc is always the same on screen regardless of the printer.
Writer *also* now does this, and we hope that we have guessed the
properties of the idealized printer correctly. A little toggle in word
and writer reverts to the traditional "do what the printer can do"
behaviour.

Another little gotcha is with tabs. If there is a tiny amount of space
between the end of a character and the next tab stop, then when a tab
is inserted a decision needs to be made as to whether that space is
large enough for the tab to go to that tab stop, or whether to skip it
and move to the one after. The default value in word is much smaller
than the default value in writer (or maybe it was the other way around;
I think that compatibility was completed in 2.0.2, but maybe in 2.0.3).
Of course if your fonts aren't the same then this sort of tab
staircasing is going to happen regardless of flags.
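
The tab decision can be sketched as follows. The name next_tab_stop, the
min_gap parameter and the numbers are assumptions for the example; the
real thresholds in word and writer are not given here.

```python
def next_tab_stop(x, stops, min_gap):
    """Return the first tab stop to the right of x that is at least
    min_gap away, or None if there is no such stop."""
    for stop in sorted(stops):
        if stop > x and stop - x >= min_gap:
            return stop
    return None


stops = [100, 200]
small_threshold = next_tab_stop(99.5, stops, min_gap=0.25)
large_threshold = next_tab_stop(99.5, stops, min_gap=1.0)
wider_font = next_tab_stop(100.5, stops, min_gap=0.25)
```

The first two calls differ only in the threshold and land on different
stops. The third shows the font case: if the text ends a little further
right because the font is wider, the tab jumps to the following stop no
matter what the threshold is.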

There are minor things which don't affect anything, but can confuse.
E.g. there are often complaints that tables imported from word are
misplaced, because the guide lines in writer that show the text
boundaries show that imported tables are positioned with their left
corner outside the guide lines. That's because word positions tables by
default so that the contained text will appear on the text boundary
edge, and positions the table negatively to take into account the left
and right cell paddings. Only in the horizontal direction though, not
the vertical one.
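
In numbers, for the left edge only, under the assumption that the table
border sits one cell padding to the left of where the cell text starts.
The function name and the values are made up.

```python
def word_table_left(text_boundary_left, cell_padding_left):
    """Left edge of the table and of the text in its first column, when
    the cell text (not the table border) is aligned to the text boundary."""
    table_left = text_boundary_left - cell_padding_left
    cell_text_left = table_left + cell_padding_left
    return table_left, cell_text_left


table_left, cell_text_left = word_table_left(0.0, 0.19)
```

The table edge ends up at a negative offset, outside the guide line,
while the text inside the cell lines up exactly with the body text.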

For fun, here's another one. A graphic in writer is positioned and sized
according to the top left of the graphic + border. So if you have a
3x3cm graphic and put a 1cm border around it, the graphic itself is
shrunk so that the result remains the same size before and after the
border is added, and this overall result is still placed according to
the outside top left corner. In word the graphic content remains the
same size and the border is placed outside it, so the overall result is
larger, but the graphic is positioned according to the top left "inside"
corner of the graphic, i.e. position the graphic and then draw the
border outside it. This gets sort of crazy though with some of the
"striped" bordering styles where there are three stripes in the border:
black, white, black. Word positions according to the top left corner of
the *middle* stripe. Neither the inside of the graphic itself nor the
outside of the border, but according to some implementation quirk of
border handling.
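
The 3x3cm example works out like this. It is a sketch of the two models
as described above, for a plain single-line border only (the striped
case is not modelled); the function names are invented and each
rectangle is (left, top, side length) in cm.

```python
def writer_layout(x, y, size, border):
    """Writer: (x, y) is the outer corner and the overall size is fixed,
    so the content shrinks to make room for the border."""
    return {
        "outer": (x, y, size),
        "content": (x + border, y + border, size - 2 * border),
    }


def word_layout(x, y, size, border):
    """Word: (x, y) is the content corner and the content size is fixed,
    so the border grows outwards."""
    return {
        "outer": (x - border, y - border, size + 2 * border),
        "content": (x, y, size),
    }


writer = writer_layout(0, 0, 3, 1)
word = word_layout(0, 0, 3, 1)
```

Same position, same size, same border, and you get a 1x1cm picture in a
3x3cm box in one model and a 3x3cm picture in a 5x5cm box in the other,
which is why an import filter has to convert between the two.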

So.... you need metrically equivalent fonts, and if you still have
problems then some possibly unknown, but similar to above, "tricky to
figure out the complete rules" layout quirk is affecting us.

I find that the problems are endemic to an organization due to shared
templates, or the more informal basing of documents off each other, and
other shared habits. E.g. if a template from word 6 was migrated through
many word revisions it'll have the word 6 compatibility features toggled
on, which aren't the current word defaults, so some of the word 6 quirks
remain in use in a company but aren't prevalent enough for the writer
.doc filter developers to see as common. So an organization tends to
find writer uniformly "rubbish" or "good" depending on their word-using
history. Obviously with font related spacing issues, you generally get
"good" results when users use page breaks to create a new page, and
"bad" results when users hit return to get vertical white space until
they get to a new page. In the first case the cumulative error is reset
to 0 on every page break; in the second case the error accumulates over
the entire document.
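
A toy model of that last point, assuming every line is off by the same
small amount. The function name, the unit and the numbers are all made
up.

```python
def drift_at_page_ends(lines_per_page, pages, error_per_line, hard_breaks):
    """Accumulated vertical error at the end of each intended page.

    hard_breaks=True  models explicit page breaks (layout restarts at the
                      top of a page, so the error starts again from 0).
    hard_breaks=False models pages separated by runs of empty paragraphs.
    """
    drift = 0
    ends = []
    for _ in range(pages):
        if hard_breaks:
            drift = 0
        drift += lines_per_page * error_per_line
        ends.append(drift)
    return ends


with_page_breaks = drift_at_page_ends(40, 3, 2, hard_breaks=True)
with_returns = drift_at_page_ends(40, 3, 2, hard_breaks=False)
```

With page breaks the error never gets beyond one page's worth; with
returns it keeps growing until content intended for one page ends up on
the next.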

C.



