plague: Job waited too long for repo to unlock. Killing it...

Dan Williams dcbw at redhat.com
Fri Jan 18 14:56:10 UTC 2008


On Fri, 2008-01-18 at 03:56 +0100, Michael Schwendt wrote:
> On Fri, 4 Jan 2008 15:55:56 +0100, Michael Schwendt wrote:
> 
> > > Certainly related, job #37767 just failed
> > > 
> > >   Failed to copy /srv/rpmbuild/server_work/fedora-5-epel/37767-php-pecl-memcache-2.2.1-1.el5/ppc/php-pecl-memcache-2.2.1-1.el5.ppc.rpm to the repository directory.
> > > 
> > 
> > Status update:
> > 
> > A workaround and debugging aid have been applied. I've moved the "does the
> > downloaded file exist and is it readable?" check _in front_ of the
> > downloader callback. It now checks the downloaded file as early as
> > possible, and if the check fails, the download is marked as failed, so
> > plague retries downloading it up to 10 times. If all of those attempts
> > fail, there is of course a serious problem (and that's not what we've
> > seen). And if a good download becomes inaccessible later even though it
> > passed the check, that would be *very* interesting and will show up in
> > the logs. Perhaps this mysterious download/add_to_repo problem is just a
> > rare bug in threaded Python, and working around it may be the only thing
> > we can do.
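The check-and-retry flow described above can be sketched roughly as follows. This is a minimal illustration, not plague's actual code: `fetch`, `download_is_usable`, `download_with_check` and `MAX_ATTEMPTS` are hypothetical names, and `fetch` is assumed to download one file and return its local path.

```python
import os
from typing import Callable

MAX_ATTEMPTS = 10  # retry limit taken from the message; the name is hypothetical


def download_is_usable(path: str) -> bool:
    """Return True if the downloaded file exists and is readable."""
    return os.path.isfile(path) and os.access(path, os.R_OK)


def download_with_check(fetch: Callable[[], str],
                        max_attempts: int = MAX_ATTEMPTS) -> str:
    """Call the hypothetical fetch() until it yields a usable file.

    The existence/readability check runs right after each download, before
    any downloader callback would see the file, so a bad download counts as
    a failed attempt and is retried instead of failing later in the job.
    """
    for attempt in range(1, max_attempts + 1):
        path = fetch()
        if download_is_usable(path):
            return path
        print(f"attempt {attempt}: {path} missing or unreadable, retrying")
    raise RuntimeError(f"download failed {max_attempts} times")
```

Checking early means a missing file is reported at the download step, where a retry is cheap, rather than at the later copy into the repository.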
> 
> I've applied another patch that should fix the symptoms. (The cause may
> be related to threads+events.)
> 
> The workaround was good enough to keep the server from crashing, but
> according to the logs, build job 37923 (perl-String-CRC32 for el4) ran
> "copy to needsign repo" twice. That also explains why files downloaded
> from a builder suddenly go missing while they are being copied into the
> needsign repo. On the first run, all files were copied successfully.
> During the second run, one of the files was missing because another
> thread had already started the related cleanup. That made the job fail.
> According to the logs, the packager requeued the job, and it then built
> successfully.
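The double "copy to needsign repo" run described above is a check-then-act race between threads. One common guard, sketched here with a hypothetical `Job` class (plague's real job objects and method names are not shown in this thread), is a lock-protected flag that lets the copy step run at most once and keeps cleanup from starting while a copy is in progress.

```python
import threading
from typing import Callable


class Job:
    """Hypothetical build job whose repo-copy step must run at most once."""

    def __init__(self, copy_files: Callable[[], None],
                 remove_files: Callable[[], None]) -> None:
        self._copy_files = copy_files      # copies results into needsign
        self._remove_files = remove_files  # deletes the downloaded files
        self._lock = threading.Lock()
        self._copied = False

    def copy_to_needsign(self) -> bool:
        """Copy once; later or concurrent calls return False and do nothing.

        The lock is held for the whole copy, so cleanup cannot remove files
        mid-copy. If the copy raises, _copied stays False and it may be retried.
        """
        with self._lock:
            if self._copied:
                return False
            self._copy_files()
            self._copied = True
            return True

    def cleanup(self) -> None:
        """Remove downloaded files, waiting for any in-progress copy."""
        with self._lock:
            self._remove_files()
```

With this guard, a second trigger of the copy step after cleanup becomes a no-op instead of failing on files that are already gone.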

Sorry about all the threading :)  It was probably a bad idea (threads
usually are), but it did more or less do the job for a few years.
Here's hoping all the EPEL Koji issues can be ironed out so you don't
have to keep poking plague.

Dan





More information about the Fedora-buildsys-list mailing list