plague: Job waited too long for repo to unlock. Killing it...

Michael Schwendt bugs.michael at gmx.net
Fri Jan 18 02:56:39 UTC 2008


On Fri, 4 Jan 2008 15:55:56 +0100, Michael Schwendt wrote:

> > Certainly related, job #37767 just failed
> > 
> >   Failed to copy /srv/rpmbuild/server_work/fedora-5-epel/37767-php-pecl-memcache-2.2.1-1.el5/ppc/php-pecl-memcache-2.2.1-1.el5.ppc.rpm to the repository directory.
> > 
> 
> Status update:
> 
> A work-around and debug-helper has been applied. I've moved the "does the
> downloaded file exist and is readable?" check _in front_ of the downloader
> callback. It now checks the downloaded file as early as possible, and if
> the check fails, the download is marked as failed, so plague retries
> downloading it at most 10 times. If that fails, there is a serious problem
> of course (and that's not what we've seen). And if the good download turns
> inaccessible later although it had passed the check, that would be *very*
> interesting and will appear in the logs. Perhaps this mysterious
> download/add_to_repo problem is only a rare bug in threaded Python
> and working around it might be the only thing we can do.
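
To illustrate the check-before-callback idea quoted above, here is a minimal
sketch. It is not the actual plague code: `fetch`, `on_done`, `file_ok` and
`download_with_check` are hypothetical names, and reading "at most 10 times"
as 10 attempts in total is an assumption.

```python
import os

MAX_ATTEMPTS = 10  # assumed reading of "at most 10 times"


def file_ok(path):
    """Return True if path is an existing regular file we can read."""
    return os.path.isfile(path) and os.access(path, os.R_OK)


def download_with_check(fetch, path, on_done, max_attempts=MAX_ATTEMPTS):
    """Download a file and verify it *before* notifying the caller.

    fetch(path)   -- hypothetical downloader that writes the file to path
    on_done(path, ok, attempts) -- hypothetical callback, fired exactly once
    """
    for attempt in range(1, max_attempts + 1):
        try:
            fetch(path)
        except OSError:
            # A failed transfer counts as a failed attempt; try again.
            continue
        if file_ok(path):
            # The check passed, so only now is the callback allowed to run.
            on_done(path, True, attempt)
            return True
    # Every attempt failed the check: report a hard failure.
    on_done(path, False, max_attempts)
    return False
```

With this ordering the callback never sees a file that failed the check, so
a file that goes missing later must have been removed after a good download.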

I've applied another patch that should fix the symptoms. (The cause may
be related to threads+events.)

While the work-around was good enough to prevent the server from crashing,
the logs show that build-job 37923 (perl-String-CRC32 for el4) ran the
"copy to needsign repo" step twice. That also explains why files downloaded
from a builder suddenly go missing while they are being copied into the
needsign repo. During the first run, all files were copied successfully.
During the second run, one of the files was missing because another thread
had already started the related cleanup, and that made the job fail.
According to the logs, the packager requeued the job and the rebuild
succeeded.
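
One common way to keep a stage like "copy to needsign repo" from running
twice is a run-once guard. The sketch below only illustrates that general
technique. It is not the patch that was applied, and `CopyOnceGuard`,
`run_once` and `copy_func` are hypothetical names.

```python
import threading


class CopyOnceGuard:
    """Let a job's copy stage run at most once, even if triggered twice."""

    def __init__(self):
        self._lock = threading.Lock()
        self._done = False

    def run_once(self, copy_func):
        """Run copy_func() unless it already ran; return True if we ran it."""
        with self._lock:
            if self._done:
                # A second trigger (e.g. a duplicate event) is ignored.
                return False
            # Mark the stage as taken before copying; a copy that raises
            # is not retried by this guard.
            self._done = True
            copy_func()
            return True
```

Each job would own one guard. Cleanup code could take the same lock so that
it cannot start while the copy is still in progress, but that is left out
here.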




More information about the Fedora-buildsys-list mailing list