More status on the Extras buildsystem

Dan Williams dcbw at redhat.com
Mon Aug 1 18:32:57 UTC 2005


On Mon, 1 Aug 2005, seth vidal wrote:
> On Mon, 2005-08-01 at 12:03 -0400, Dan Williams wrote:
> > On Mon, 1 Aug 2005, Paul Howarth wrote:
> > > I see that a number of jobs have now made it into the queue, including
> > > both of my requests (and some duplicates from other people too). I tried
> > > killing one of my duplicate jobs about 20 minutes ago by doing:
> > > 
> > > $ plague-client kill 282
> > > 
> > > Shortly afterwards I received an email stating that the job had been
> > > killed. However, the page
> > > http://buildsys.fedoraproject.org/build-status/job.psp?uid=282 still
> > > shows that job as "building" and in fact the plague-client command has
> > > still not exited. This doesn't seem right...
> > 
> > It appears that (as of last night) the build server was stuck in SSL_BIO_read() 
> > trying to receive data from hammer3.  I killed the hammer3 plague-builder 
> > process, but the server didn't notice that because it was stuck in that 
> > function.
> > 
> > Now the fix for this is to use socket timeouts, which essentially make the 
> > sockets non-blocking, but this leads to other problems (i.e., socket.makefile() 
> > doesn't work well with socket.settimeout(), but we have to use makefile because 
> > the SSL sockets don't have a dup2()) that need to be dealt with as well.  I hope 
> > that I can come up with some non-blocking solution here to deal with these 
> > issues.  The worst thing is that these problems cannot be reproduced on demand 
> > and occur at random.
> > 
> > The immediate solution is to restart the build server.
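The claim that a timeout "essentially makes the socket non-blocking" can be checked directly. The sketch below inspects the descriptor's O_NONBLOCK flag before and after socket.settimeout(). It assumes a POSIX system and a recent CPython 3, where a positive timeout switches the underlying descriptor to non-blocking mode and implements the wait internally. The helper names are hypothetical, and the 2005-era makefile() interaction mentioned above is not reproduced here.

```python
import fcntl
import os
import socket


def fd_is_nonblocking(sock):
    """Return True if the OS-level O_NONBLOCK flag is set on sock's descriptor."""
    flags = fcntl.fcntl(sock.fileno(), fcntl.F_GETFL)
    return bool(flags & os.O_NONBLOCK)


def check_modes():
    """Report the descriptor's blocking state under different timeout settings."""
    a, b = socket.socketpair()
    try:
        results = {}
        # Default (timeout None): the descriptor is in blocking mode.
        results["default"] = fd_is_nonblocking(a)
        # A positive timeout flips the descriptor to non-blocking mode.
        a.settimeout(5.0)
        results["settimeout(5.0)"] = fd_is_nonblocking(a)
        # Clearing the timeout restores blocking mode.
        a.settimeout(None)
        results["settimeout(None)"] = fd_is_nonblocking(a)
        return results
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    for mode, nonblocking in check_modes().items():
        print(mode, "-> O_NONBLOCK set" if nonblocking else "-> blocking")
```

Any code layered on the socket that assumes blocking semantics on the raw descriptor therefore sees different behaviour once a timeout is set.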

I think I've got a workable solution based on select() rather than socket 
timeouts, since setting a timeout also puts the socket into non-blocking mode.  I 
won't know all the error conditions until we deploy it, but we can fix those as 
we go.  Hopefully this will make the server simply drop an unresponsive builder 
rather than hanging and getting confused, as it does now.
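A select()-based read can be sketched roughly as follows. This is not the actual plague code: `recv_with_deadline` and `PeerTimeout` are hypothetical names, and a plain socketpair stands in for the SSL connection to a builder. The socket stays in blocking mode, and select() alone decides whether a read may proceed, so a silent peer produces an exception instead of a hang. One assumption worth flagging: with an SSL connection, decrypted bytes can already sit in the SSL layer's buffer where select() cannot see them, so a real version would probably check the connection's pending() count before calling select().

```python
import select
import socket


class PeerTimeout(Exception):
    """Hypothetical error raised when a peer sends nothing in time."""


def recv_with_deadline(sock, nbytes, timeout):
    """Receive up to nbytes from a blocking socket, giving up after timeout seconds.

    select() waits for readability without changing the socket's blocking
    mode, so code that relies on blocking semantics keeps working.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        raise PeerTimeout("no data from peer within %.1f s" % timeout)
    # The socket was reported readable, so this recv() should not block:
    # it returns data, or b"" if the peer closed the connection.
    return sock.recv(nbytes)


if __name__ == "__main__":
    server_side, builder_side = socket.socketpair()
    builder_side.sendall(b"status: building\n")
    print(recv_with_deadline(server_side, 1024, timeout=1.0))
    try:
        # The builder has gone quiet; the server gives up instead of hanging.
        recv_with_deadline(server_side, 1024, timeout=0.2)
    except PeerTimeout as exc:
        print("dropping builder:", exc)
    server_side.close()
    builder_side.close()
```

On PeerTimeout, the caller would mark the job's builder as dead and move on, which corresponds to the "drop the builder on the floor" behaviour described above.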

Dan

More information about the Fedora-maintainers mailing list