More status on the Extras buildsystem

Dan Williams dcbw at redhat.com
Mon Aug 1 16:03:09 UTC 2005


On Mon, 1 Aug 2005, Paul Howarth wrote:
> I see that a number of jobs have now made it into the queue, including
> both of my requests (and some duplicates from other people too). I tried
> killing one of my duplicate jobs about 20 minutes ago by doing:
> 
> $ plague-client kill 282
> 
> Shortly afterwards I received an email stating that the job had been
> killed. However, the page
> http://buildsys.fedoraproject.org/build-status/job.psp?uid=282 still
> shows that job as "building" and in fact the plague-client command has
> still not exited. This doesn't seem right...

As of last night, the build server appears to have been stuck in SSL_BIO_read(),
waiting to receive data from hammer3.  I killed the plague-builder process on
hammer3, but the server never noticed because it was still blocked in that
function.

The fix for this is to use socket timeouts, which essentially make the sockets 
non-blocking, but that leads to other problems that need to be dealt with as 
well (i.e., socket.makefile() doesn't work well with socket.settimeout(), and we 
have to use makefile() because the SSL sockets don't have a dup2()).  I hope to 
come up with a non-blocking solution that deals with these issues.  The worst 
part is that these problems occur at random and cannot be reproduced on demand.
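
To illustrate the timeout idea in isolation, here is a minimal sketch using 
plain (non-SSL) Python sockets.  It is not plague's code: the helper names 
recv_or_give_up() and demo() are hypothetical, and a socketpair stands in for 
the server/builder connection.  It only shows that a read with a timeout 
returns control when the peer stays silent, instead of blocking forever.

```python
import socket


def recv_or_give_up(sock, nbytes, timeout_secs):
    """Return received bytes, or None if nothing arrives within timeout_secs.

    Hypothetical helper for illustration; it uses a plain socket, not the
    SSL sockets the build server actually uses.
    """
    sock.settimeout(timeout_secs)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        # The peer sent nothing in time; the caller can now decide to
        # retry or to mark the builder as unreachable.
        return None


def demo():
    # A connected pair of local sockets stands in for server <-> builder.
    server_side, builder_side = socket.socketpair()
    try:
        # The "builder" says nothing, so this read times out.
        silent = recv_or_give_up(server_side, 1024, 0.2)
        # Once the "builder" sends data, the same call returns it.
        builder_side.sendall(b"status: building")
        answered = recv_or_give_up(server_side, 1024, 0.2)
        return silent, answered
    finally:
        server_side.close()
        builder_side.close()


if __name__ == "__main__":
    print(demo())
```

This sketch calls recv() directly on the socket, so it sidesteps the 
makefile() interaction described above rather than solving it.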

The immediate solution is to restart the build server.

Dan



