More status on the Extras buildsystem
Dan Williams
dcbw at redhat.com
Mon Aug 1 16:03:09 UTC 2005
On Mon, 1 Aug 2005, Paul Howarth wrote:
> I see that a number of jobs have now made it into the queue, including
> both of my requests (and some duplicates from other people too). I tried
> killing one of my duplicate jobs about 20 minutes ago by doing:
>
> $ plague-client kill 282
>
> Shortly afterwards I received an email stating that the job had been
> killed. However, the page
> http://buildsys.fedoraproject.org/build-status/job.psp?uid=282 still
> shows that job as "building" and in fact the plague-client command has
> still not exited. This doesn't seem right...
It appears that (as of last night) the build server was stuck in SSL_BIO_read()
trying to receive data from hammer3. I killed the hammer3 plague-builder
process, but the server didn't notice that because it was stuck in that
function.
The fix for this is to use socket timeouts, which essentially make the
sockets non-blocking, but that leads to other problems that need to be dealt
with as well (i.e., socket.makefile() doesn't work well with
socket.settimeout(), but we have to use makefile() because the SSL sockets
don't have a dup2()). I hope I can come up with a non-blocking solution that
deals with these issues. The worst part is that these hangs are completely
non-reproducible and occur at random.
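To illustrate the timeout idea, here is a minimal sketch, not the plague code. It assumes a plain (non-SSL) socket and reads with recv() directly rather than through makefile(), so it sidesteps the makefile()/settimeout() interaction instead of solving it. The helper name recv_with_timeout is made up for this example.

```python
import socket


def recv_with_timeout(sock, nbytes, timeout_secs):
    """Read up to nbytes from sock, waiting at most timeout_secs.

    Hypothetical helper for illustration only. Returns:
      - the bytes received, if the peer sent something in time
      - b"" if the peer closed the connection
      - None if nothing arrived before the timeout expired
    """
    # settimeout() makes blocking calls on this socket raise
    # socket.timeout instead of waiting forever.
    sock.settimeout(timeout_secs)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None
```

With something like this, a caller can tell a silent peer (None) from a closed one (b"") and decide whether to retry or give up on the builder, rather than sitting in a read indefinitely.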
The immediate solution is to restart the build server.
Dan