InstantMirror Proposal Re: ApacheMirror.py for a site-local Fedora mirror

Warren Togami wtogami at redhat.com
Tue Nov 20 21:40:24 UTC 2007


Ed Swierk wrote:
> 
>> I didn't read deeply into your code yet, but I imagine that it needs
>> improvement to handle unique synchronization and expiration issues that
>> yum repos and rawhide install trees create when file contents change
>> without changing filenames.
> 
> If a requested file already exists in the local mirror, the handler
> compares the Last-Modified time of the upstream file with the local
> file, and downloads the file if the upstream version is newer. I'm not
> familiar with rawhide, but this seems to work okay for the updates
> repos where metadata files are frequently regenerated. It doesn't
> remove files that no longer exist upstream, of course.

Ah, this works great as an initial implementation.  We can at least have 
something that works before we make it more efficient (and less 
unfriendly to the upstream server, which it would otherwise hit too 
many times).
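
For reference, the check Ed describes might look roughly like the sketch 
below.  This is not taken from ApacheMirror.py; the function names 
upstream_is_newer and should_download are made up for illustration, and 
it assumes the caller has already obtained the upstream Last-Modified 
header (for example from a HEAD request).

```python
import os
from email.utils import parsedate_to_datetime


def upstream_is_newer(last_modified_header, local_mtime):
    """Return True if the upstream Last-Modified time is later than local_mtime.

    last_modified_header: HTTP date string, e.g. 'Tue, 20 Nov 2007 21:40:24 GMT'
    local_mtime: local file modification time in seconds since the epoch
    """
    upstream_ts = parsedate_to_datetime(last_modified_header).timestamp()
    return upstream_ts > local_mtime


def should_download(local_path, last_modified_header):
    """Decide whether to fetch the file: always if missing, else only if stale."""
    if not os.path.exists(local_path):
        return True
    return upstream_is_newer(last_modified_header, os.path.getmtime(local_path))
```

Note that this still costs one upstream request per client request, which 
is the inefficiency to fix later.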

> 
>> That daemon could be configured to handle intelligent expiry of various
>> parts of the mirror tree in different ways.  For example:
>> - development (rawhide) repo changes at least once per day.  It also
>> contains install images (boot.iso, bootdisk.img, stage2, etc.) that need
>> to be expired every time the tree changes.  (We might need to add a
>> hashes file to the mirror tree to allow the tool to monitor these changes.)
>> - Released distros never change, so don't need to monitor their
>> repomd.xml for changes.
> 
> An even simpler approach is to have the daemon iterate through every
> local file, checking whether the file exists upstream and deleting the
> local copy if it doesn't. This requires no repodata parsing, but
> flooding the upstream server with HEAD requests might be considered
> unfriendly.

Why don't we implement the unfriendly approach first, since we can get 
that out quickly?  That way people can have something to run while we 
work on the proper version that substantially reduces the number of hits 
to the upstream server.
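
A minimal sketch of that unfriendly pass, assuming upstream answers a 
HEAD request with 404 (or 410) for files it no longer has.  The names 
exists_upstream and prune_mirror and the upstream_base parameter are 
hypothetical, not part of any existing tool.

```python
import os
import urllib.error
import urllib.parse
import urllib.request


def exists_upstream(url, timeout=30):
    """Send a HEAD request; return False only if upstream says 404 or 410."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        if err.code in (404, 410):
            return False
        # Any other failure propagates, so a transient upstream error
        # is never mistaken for a deleted file.
        raise


def prune_mirror(local_root, upstream_base, check=exists_upstream):
    """Delete local files that no longer exist upstream; return their paths."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Map the local path to the matching upstream URL.
            rel = os.path.relpath(path, local_root).replace(os.sep, "/")
            url = upstream_base.rstrip("/") + "/" + urllib.parse.quote(rel)
            if not check(url):
                os.remove(path)
                removed.append(rel)
    return removed
```

This issues one HEAD per local file on every pass, which is exactly the 
load the proper version should cut down.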

Warren Togami
wtogami at redhat.com



