RFC: Page size on PPC/PPC64 builders

David Woodhouse dwmw2 at infradead.org
Mon Mar 3 09:41:51 UTC 2008


On Sun, 2008-03-02 at 18:46 -0500, Tom Lane wrote:
> David Woodhouse <dwmw2 at infradead.org> writes:
> > I already see people who should know better complaining about how
> > building for PPC is 'painful' -- and that kind of attitude has
> > contributed to the idiocy of letting 'secondary architecture' builds
> > fail _without_ aborting the main build. I don't want to make matters
> > worse by increasing the perception that building for PPC is hard.
> 
> I've used many non-Intel arches for long enough to not particularly
> worry about one versus another.  However, it took me darn near two
> months to puzzle out the mysql bug that started this thread, and that
> was way too painful.  The problems I see that we need to work on are:
> 
> 1. It's impossible to reproduce the Koji build environment accurately
> without access to PPC64 hardware; widely available stuff like Apple
> Macs isn't PPC64 and won't show page-size-related problems.

Mac G5s are PPC64, as is the PlayStation 3. Hardware isn't that much of
a problem.

The more interesting issue is that the current Fedora kernel doesn't use
64KiB pages on PPC64 hardware. Only the RHEL5 and FC6 kernels do. That
makes testing slightly harder, and is perhaps another argument on the
side of switching back to 4KiB pages.
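
For anyone wanting to check which case a given test box falls into, a
minimal sketch follows. It only asks the running kernel for its page
size; the helper names page_size_bytes and describe are made up for
illustration and are not part of any Fedora or Koji tooling.

```python
import os


def page_size_bytes() -> int:
    """Return the page size reported by the running kernel, in bytes."""
    # SC_PAGE_SIZE is the POSIX sysconf name for the page size.
    return os.sysconf("SC_PAGE_SIZE")


def describe(size: int) -> str:
    """Format a page size in bytes as a short KiB label."""
    return f"{size // 1024}KiB pages"


if __name__ == "__main__":
    # The result depends on how the kernel was configured, so no
    # particular output is assumed here.
    print(describe(page_size_bytes()))
```

A box that reports 4KiB pages will not reproduce a problem that depends
on 64KiB pages, whatever CPU it has.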

> 2. There is pitifully little opportunity for Fedora developers to
> get at such hardware.  As far as I've found out, there is exactly
> one PPC64 machine available, its location is documented nowhere
> public (eventually I found out that the magic incantation is "ask
> David Woodhouse"),

I don't think that's a very closely guarded secret -- and there are
other people who offer such accounts too. As a Red Hat employee, you
should also have access to a number of internal systems for that
purpose.

>  and it's down at the moment.

The machine I normally give accounts on is still working fine -- it's
only the FC6 box which is AWOL.

But yes, your points (1) and (2) mostly come down to the same thing --
we need a properly documented way for packagers to access machines of
all kinds. I have frequently said that this should be a requirement for
secondary architectures. But it should cover primary architectures too,
such as PPC and x86_64, which not everyone has access to.

> 3. It was not at all obvious that the problem stemmed from changing
> the build farm machines' underlying kernels from RHEL4 to RHEL5.

It was obvious to me -- and that was when I didn't even _know_ that the
builders had been changed. I asked about that, and when it was confirmed
that the builders _had_ changed to RHEL5, I was left in no doubt at all
that your build failure was due to 64KiB pages.

On 2008-01-09 I pointed you right at the problem (albeit largely by
luck) when I told you:
    "It'll certainly affect the way it _allocates_ stack space."

Looking back at that thread, I see you tried increasing the thread stack
to 512KiB without success. That doesn't seem consistent with what we now
believe to be the problem. Was there another issue too?
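
As a general illustration of why page size affects how stack space is
allocated (and not a diagnosis of the mysql failure itself), the sketch
below rounds a requested size up to whole pages, which is the
granularity at which the kernel hands out memory. The function name
round_up_to_pages and the example sizes are assumptions chosen for
illustration.

```python
KIB = 1024


def round_up_to_pages(requested: int, page_size: int) -> int:
    """Round a requested allocation up to a whole number of pages."""
    if requested < 0 or page_size <= 0:
        raise ValueError("requested must be >= 0 and page_size > 0")
    pages = -(-requested // page_size)  # ceiling division
    return pages * page_size


if __name__ == "__main__":
    # Compare the same requests under 4KiB and 64KiB pages.
    for request in (20 * KIB, 512 * KIB):
        for page in (4 * KIB, 64 * KIB):
            rounded = round_up_to_pages(request, page)
            print(f"{request // KIB}KiB request, {page // KIB}KiB pages:"
                  f" {rounded // KIB}KiB allocated")
```

A 512KiB request is already a multiple of both page sizes, whereas a
small request such as 20KiB grows to a full 64KiB page on a 64KiB-page
kernel.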

> I wasted a great deal of time on the assumption that I was looking
> for a consequence of a recent rawhide change, when in fact there was
> no such change. 
>
> Next time we make a change in the buildfarm's underlying kernels,
> I respectfully suggest that that be treated as forcing a mass
> rebuild, just like we do when there are other toolchain changes.
> If I'd seen the breakage first occur in a context like that, it
> would have been much clearer what to look for.

It was very clear to me. If I didn't convey that clarity to you, then I
apologise. I shall have to remember to be less subtle in future :)

-- 
dwmw2