BuildSystem questions

Sat Nov 12 20:06:51 UTC 2005

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

>>>>> "Dan" == Dan Williams <dcbw at redhat.com> writes:

Dan> On Fri, 2005-11-11 at 10:18 -0700, Kevin Fenzi wrote:
>> If you have 2 packages (say A and B) and queue up first A and then
>> B, can you be sure that A will be finished and be available by the
>> time B is building? This doesn't seem to be the case currently, or
>> there is a window there when it's not true. So for packages with
>> other packages as dependencies should we wait until they have gone
>> to the 'needsign' area? Or longer? before building the package that
>> depends on them?
>> 
>> I imagine there is a createrepo in there after A has completed, but
>> B might have already started?

Dan> Correct, there is no ordering guarantee at this time.  Mainly
Dan> because doing ordering would require depsolving on the build
Dan> server which isn't implemented yet.  It's not impossible though
Dan> and is something I'd like to see in plague 0.5.

It would be good to get something that would work for that asap. 

Currently, if you have a chain of packages where each one depends on
the next one up in the chain, you have to build one, wait for it to be
released and break the repository, then do the next one. Not very
optimal. ;( 

Are you sure this isn't working now? 

Looking at the build system's root.log files, I see it referring to
packages that are 'core', 'extras', or 'local'. Isn't 'local' the
local needsign repository? 

>> If a job fails due to something that doesn't require any changes to
>> the package (ie, it couldn't find a dependent package that was just
>> built, the devel repository was in an unstable state, the build
>> machine got stuck, martians killed the job, etc) do we still need
>> to bump the release of the package and request a new build? Or will
>> 'plague-client requeue NNN' work to rebuild the job after the
>> problem is (hopefully) gone? Since the job failed, that release was
>> never released, so there shouldn't be an issue there.

Dan> If any build fails, you do need to bump the release before you
Dan> stick a new build in.  We had a discussion of a 'make force-tag'
Dan> (essentially, cvs tag -F <tag>) a long time ago and people
Dan> decided it wasn't a good idea.  But technically, if you know the
Dan> magic cvs command and the tag format, you don't have to bump the
Dan> release before a rebuild.

But if the job just failed due to a build system issue you shouldn't
need to re-tag the package, right? Just requeue the job and it should
run the same build again.

Dan> A tag and release really only matter when they've gotten through
Dan> the build system.  You never want two RPMs with the same NEVR that
Dan> differ in content, and you can't possibly have that if one hasn't
Dan> gotten through the build system successfully.

But they _don't_ differ in content. 

- - Submit job for package A-1.0-1.fc5
- - Build fails or gets stuck in the build system. 
- - make NO CHANGES to the package in cvs. 
- - plague-client kill NNN
- - plague-client requeue NNN
- - package builds and goes to needsign. 

Is there anything wrong with that procedure? I find it nasty to bump
the release in a package just to get another build when nothing in the
package has changed. If you have to modify the package to get it to
build, of course you need to change the release, but in this case
nothing has changed except that the buildsystem didn't get stuck on
the job. 
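
Put another way, the rule I am proposing could be sketched like this.
It only encodes my suggestion above, not current policy (Dan's answer
is that a failed build needs a release bump). The FailedJob fields
and the next_step() helper are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailedJob:
    nevr: str               # e.g. "A-1.0-1.fc5"
    sources_changed: bool   # anything committed to cvs since the tag?
    reached_needsign: bool  # did an RPM with this NEVR get through?


def next_step(job):
    """Suggest what to do with a failed job under the proposed rule."""
    # Two RPMs with the same NEVR must never differ in content, so a
    # changed package, or one that already got through, needs a new release.
    if job.sources_changed or job.reached_needsign:
        return "bump release and submit a new build"
    # Nothing changed and nothing was released: rerun the same build.
    return "requeue"


stuck = FailedJob("A-1.0-1.fc5", sources_changed=False, reached_needsign=False)
print(next_step(stuck))
```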

>> Some observations/comments:
>> 
>> If you kill a job with 'plague-client kill NNN' it says it killed
>> it, but the web page shows it still there and unaffected. It then
>> gets killed when it reaches a builder and mails that it was
>> killed. Perhaps plague-client should say "job NNN will be killed
>> when it reaches the builder" or something.

Dan> Yes, this was reported earlier this week and is a bug when jobs
Dan> are in the 'waiting' state.  Hope to look at this this afternoon
Dan> too.

Excellent. Thanks. ;) 

>> The PPC machine seems to be somewhat of a bottleneck. A build of
>> mine this morning took 4min on the i386/x86_64 arches and about
>> 18min on the ppc machine. Should we look at adding another ppc
>> machine? Or increasing memory in the existing one or something? If
>> it's hard to get hardware allocated, perhaps we could stick up a
>> donate button on the website to get more builder boxes?

Dan> Two issues here...  The PPC machine either has 4 processors or 2
Dan> x dual core, I forget which.  So while it's got twice the number
Dan> of CPUs as one of the hammer boxes, they are slightly slower.
Dan> The real issue with the PPC machine is the disks, which seem to
Dan> be slower in general, and that multiple mock/yum instances don't
Dan> run in parallel.  I think we're still seeing the issue where
Dan> mock/yum lock the host machine's rpmdb even though they install
Dan> to the chroot.  That means only one 'prepping' job can actually
Dan> execute at any time, even if 4 jobs are 'prepping'.

Bummer. Perhaps we could add another ppc box to spread out the disk
load? Or move to a RAID setup on it? 
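
For anyone following along, the serialization Dan describes can be
pictured with a small simulation. This is a sketch only: a
threading.Lock stands in for the host rpmdb lock, the sleep stands in
for the mock/yum install, and simulate_prepping() is a made-up name.
It does not touch mock, yum, or a real rpmdb.

```python
import threading
import time


def simulate_prepping(job_count=4, prep_seconds=0.01):
    """Run job_count 'prepping' jobs and return the peak number that
    were installing packages at the same moment."""
    rpmdb_lock = threading.Lock()    # stand-in for the host rpmdb lock
    counter_lock = threading.Lock()  # protects the bookkeeping below
    stats = {"active": 0, "peak": 0}

    def prep():
        with rpmdb_lock:             # every job must take the same lock
            with counter_lock:
                stats["active"] += 1
                stats["peak"] = max(stats["peak"], stats["active"])
            time.sleep(prep_seconds)  # pretend to install into the chroot
            with counter_lock:
                stats["active"] -= 1

    threads = [threading.Thread(target=prep) for _ in range(job_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return stats["peak"]


print(simulate_prepping())
```

With one shared lock the peak stays at 1 no matter how many jobs are
'prepping', which matches what we see on the builder.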

FYI, two of my builds are currently sitting stuck on the build queue. 
- From what I can see in the log files, the build completed and the rpm
was written out, but then it's not seeing that and moving it to
needsign/addtorepo. 

Do you want to take a look at those jobs? 
Or should I just kill them and requeue? 

I'm in #fedora-extras on irc.freenode.net too.

Dan> Dan

kevin
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.8 <http://mailcrypt.sourceforge.net/>

iD8DBQFDdkte3imCezTjY0ERAqQ+AJ9IyX1E1aRvioPLhov5WluZs6P0xgCfSOVV
5n7rHx1m0QH5xZ3WwwXdU34=
=ArRq
-----END PGP SIGNATURE-----