BuildSystem questions

Sat Nov 12 15:28:58 UTC 2005

On Fri, 2005-11-11 at 10:18 -0700, Kevin Fenzi wrote:
> If you have 2 packages (say A and B) and queue up first A and then B,
> can you be sure that A will be finished and be available by the time B
> is building? This doesn't seem to be the case currently, or there is a
> window there when it's not true. So for packages with other packages
> as dependencies should we wait until they have gone to the 'needsign'
> area? Or longer? before building the package that depends on them?
> 
> I imagine there is a createrepo in there after A has completed, but B
> might have already started? 

Correct, there is no ordering guarantee at this time, mainly because
ordering would require depsolving on the build server, which isn't
implemented yet.  It's not impossible, though, and is something I'd like
to see in plague 0.5.
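
For illustration only, here is a minimal sketch of what dependency-based
ordering could look like once depsolving has produced a map of build
requirements. This is not plague code; the build_order function and the
queue mapping are hypothetical, and the example uses the Python standard
library's graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def build_order(build_requires):
    """Return package names ordered so each package follows its dependencies.

    build_requires maps a package name to the set of queued packages it
    needs built first (hypothetical output of a depsolving step).
    """
    # TopologicalSorter takes {node: predecessors}; predecessors come out first.
    return list(TopologicalSorter(build_requires).static_order())


# Hypothetical queue: B needs A to be built and in the repo first.
queue = {"A": set(), "B": {"A"}}
order = build_order(queue)
```

A real scheduler would also have to wait for the createrepo run after A
completes before starting B; the sketch covers only the ordering.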

> If a job fails due to something that doesn't require any changes to
> the package (ie, it couldn't find a dependent package that was just
> built, the devel repository was in an unstable state, the build
> machine got stuck, martians killed the job, etc) do we still need to
> bump the release of the package and request a new build? Or will
> 'plague-client requeue NNN' work to rebuild the job after the problem
> is (hopefully) gone? Since the job failed, that release was never
> released, so there shouldn't be an issue there. 

If any build fails, you do need to bump the release before you stick a
new build in.  We had a discussion of a 'make force-tag' (essentially,
cvs tag -F <tag>) a long time ago and people decided it wasn't a good
idea.  But technically, if you know the magic cvs command and the tag
format, you don't have to bump the release before a rebuild.

A tag and release really only matter when they've gotten through the
build system.  You never want two RPMs with the same NEVR that differ in
content, and you can't possibly have that if one hasn't gotten through
the build system successfully.
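
As a rough sketch of that rule, assuming a hypothetical registry that
records a content digest only for builds that finished successfully (the
PublishedBuilds class and its publish method are made up for illustration
and are not part of plague):

```python
import hashlib


class PublishedBuilds:
    """Hypothetical record of builds that got through the build system."""

    def __init__(self):
        # Maps a NEVR tuple (name, epoch, version, release) to a sha256 digest.
        self._digests = {}

    def publish(self, nevr, payload):
        """Record a successful build; reject same NEVR with different content."""
        digest = hashlib.sha256(payload).hexdigest()
        known = self._digests.get(nevr)
        if known is not None and known != digest:
            raise ValueError("NEVR %r already published with other content" % (nevr,))
        self._digests[nevr] = digest
```

A failed build never reaches publish(), so nothing is recorded for its
NEVR and a later rebuild under the same tag cannot conflict with it.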

> Some observations/comments: 
> 
> If you kill a job with 'plague-client kill NNN' it says it killed it,
> but the web page shows it still there and unaffected. It then gets
> killed when it reaches a builder and mails that it was killed. Perhaps
> plague-client should say "job NNN will be killed when it reaches the
> builder" or something. 

Yes, this was reported earlier this week and is a bug when jobs are in
the 'waiting' state.  I hope to look at it this afternoon too.

> The PPC machine seems to be somewhat of a bottleneck. A build of mine
> this morning took 4min on the i386/x86_64 arches and about 18min on
> the ppc machine. Should we look at adding another ppc machine? Or
> increasing memory in the existing one or something? If it's hard to
> get hardware allocated, perhaps we could stick up a donate button on
> the website to get more builder boxes?

Two issues here...  The PPC machine either has 4 processors or 2 x dual
core, I forget which.  So while it's got twice the number of CPUs as one
of the hammer boxes, they are slightly slower.  The real issue with the
PPC machine is the disks, which seem to be slower in general, and that
multiple mock/yum instances don't run in parallel.  I think we're still
seeing the issue where mock/yum lock the host machine's rpmdb even
though they install to the chroot.  That means only one 'prepping' job
can actually execute at any time, even if 4 jobs are 'prepping'.
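
A toy model of that effect, assuming a single shared lock stands in for
the host rpmdb lock (the names here are hypothetical; this is not how
mock or yum are implemented):

```python
import threading


def run_prepping_jobs(job_count):
    """Run job_count 'prepping' jobs and return the peak number running at once."""
    rpmdb_lock = threading.Lock()   # stand-in for the host machine's rpmdb lock
    state_lock = threading.Lock()   # protects the counters below
    state = {"active": 0, "peak": 0}

    def prep_job():
        # Every job must hold the shared lock for its whole install step.
        with rpmdb_lock:
            with state_lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            # The install into the chroot would happen here.
            with state_lock:
                state["active"] -= 1

    threads = [threading.Thread(target=prep_job) for _ in range(job_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["peak"]
```

With four jobs started together, the peak stays at one, which matches
the behaviour described above: extra CPUs do not help while the jobs
are serialized on one lock.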

Dan