InstantMirror Proposal Re: ApacheMirror.py for a site-local Fedora mirror

Warren Togami wtogami at redhat.com
Tue Nov 20 03:19:43 UTC 2007


Ed Swierk wrote:
> Having tired of babysitting the rsync cron job that was keeping my
> local Fedora mirror up-to-date, I tried the caching proxy approach
> suggested at http://fedoraproject.org/wiki/Infrastructure/Mirroring/SiteLocalMirrors
> for a few weeks. This, too, was unsatisfactory--I still want some
> control over the mirrored content and the ability to pre-populate the
> cache from a DVD ISO acquired via bittorrent when a new version of
> Fedora is released.
> 
> ApacheMirror.py is a mod_python request handler that behaves like a
> caching proxy, except it maps the URL path of a cached document
> directly to a local directory rather than hashing the URL, thus
> preserving the mirror directory structure.
> 
> Just drop ApacheMirror.py into /usr/lib/python*/site-packages, set
> your preferred upstream server and point it at a local directory on a
> nice big disk, and forget it:
> 
> <VirtualHost *:80>
>    ServerName mirrors.sample.com
>    ServerAlias mirrors
>    DocumentRoot /mirrors
> 
>    SetHandler mod_python
>    PythonHandler ApacheMirror
>    PythonDebug on
>    PythonOption ApacheMirror.upstream http://download.fedora.redhat.com
> </VirtualHost>
> 
> The implementation is by no means bulletproof--consider this release
> 0.1--but it's worked well enough to serve local yum needs for the past
> few days.
> 
> If there's interest, I could package up the script into an srpm (which
> seems overkill for 50 lines of Python) or submit it as a patch to some
> existing package.
> 
> --Ed
> 

Excellent, I was hoping for something like this!  I had played a bit 
with both squid and varnish, but neither was fully satisfactory because 
they can't easily store your cache in the original directory structure 
without writing your own backend storage engine.
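
A minimal sketch of that property in plain Python, outside mod_python, may help when reviewing the code: a request path maps straight onto a file under the mirror root, and a miss is filled from upstream. This is not ApacheMirror.py itself. `local_path`, `fetch_cached` and the injectable `fetch` callable are hypothetical names, and the mod_python `handler(req)` glue, directory listings and HTTP error handling are left out.

```python
import os
import tempfile
import urllib.request
from typing import Callable


def local_path(root: str, url_path: str) -> str:
    """Map a request path onto a file under root, preserving the mirror layout.

    Raises ValueError for paths that would escape root, e.g. "/../etc/passwd".
    """
    root = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(root, url_path.lstrip("/")))
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError(f"path escapes mirror root: {url_path!r}")
    return candidate


def http_fetch(url: str) -> bytes:
    """Download url from the upstream mirror on a cache miss."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def fetch_cached(root: str, upstream: str, url_path: str,
                 fetch: Callable[[str], bytes] = http_fetch) -> str:
    """Return the local file for url_path, downloading it from upstream on a miss."""
    path = local_path(root, url_path)
    if not os.path.isfile(path):
        data = fetch(upstream.rstrip("/") + "/" + url_path.lstrip("/"))
        directory = os.path.dirname(path)
        os.makedirs(directory, exist_ok=True)
        # Write a temp file next to the target, then rename it into place, so a
        # concurrent request never sees a half-written package.
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.chmod(tmp, 0o644)  # mkstemp creates 0600 files; mirror files are world-readable
        os.replace(tmp, path)
    return path
```

Writing to a temporary file and renaming it is one way to keep a half-downloaded RPM from being served to a second client while the first download is still running. The rename is atomic only within one filesystem, which is why the temporary file is created in the target directory.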

http://fedoraproject.org/wiki/Infrastructure/ProjectHosting/RequestingNewProject
Could you please create an "upstream" project for it at 
hosted.fedoraproject.org?  I think there are a number of improvements 
that can be made.

I haven't read your code closely yet, but I imagine that it needs 
improvement to handle the particular synchronization and expiration 
problems that yum repos and rawhide install trees create when file 
contents change without the filenames changing.

Perhaps a separate, asynchronous daemon can monitor upstream (via HTTP 
or whatever) for repomd.xml changes.  It should then parse the 
repomd.xml so it knows when to expire the repodata/* files.  Then it 
should parse the .xml files in repodata/ to compare them to local storage, 
and intelligently expire the packages if any changed (as happens during 
signing).  It can then know exactly which files to delete from the local 
cache because they are no longer in the upstream.  This daemon interacts 
with ApacheMirror.py only in deleting files from the local directories, 
effectively expiring the cache.  Very simple.

That daemon could be configured to handle intelligent expiry of various 
parts of the mirror tree in different ways.  For example:
- development (rawhide) repo changes at least once per day.  It also 
contains install images (boot.iso, bootdisk.img, stage2, etc.) that need 
to be expired every time the tree changes.  (We might need to add a 
hashes file to the mirror tree to allow the tool to monitor these changes.)
- Released distros never change, so there is no need to monitor their 
repomd.xml for changes.
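
The per-tree behaviour could live in a small policy table. This is one possible shape for it; the path prefixes, the hourly rawhide poll and the six-hour default are illustrative assumptions, not values from the proposal. `None` encodes "released, never poll", and `expire_images` marks trees whose install images must be dropped whenever repomd.xml changes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TreePolicy:
    prefix: str                  # path under the mirror root (hypothetical layout)
    poll_seconds: Optional[int]  # None: never poll, the tree is immutable
    expire_images: bool = False  # also drop boot.iso, stage2, etc. on change


# Illustrative values only.
POLICIES = [
    TreePolicy("fedora/linux/development/", 60 * 60, expire_images=True),
    TreePolicy("fedora/linux/updates/", 60 * 60),
    TreePolicy("fedora/linux/releases/", None),
]
DEFAULT = TreePolicy("", 6 * 60 * 60)


def policy_for(path: str) -> TreePolicy:
    """Pick the policy whose prefix is the longest match for path."""
    matches = [p for p in POLICIES if path.startswith(p.prefix)]
    return max(matches, key=lambda p: len(p.prefix), default=DEFAULT)
```

Longest-prefix matching lets a package override one subtree (for example a single architecture) without repeating the whole table.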

Please create an upstream project at hosted.fedoraproject.org and let's 
get started on this!  Here you get to choose a project name for your 
new "upstream" project.  I personally would choose something really 
obvious like InstantMirror... but you get to choose.

The default definitions for mirroring download.fedoraproject.org could 
be included in a Fedora/EPEL package that requires ApacheMirror.py and 
the monitor/expiry daemon.  That way a sysadmin who wants to create an 
instant Fedora mirror need only install that package and enable it in 
/etc/httpd/conf.d/.  Changes to the tree definitions (repo locations, how 
often to poll for repomd.xml changes, etc.) then arrive through yum update 
like any other package update.

Example:
yum install InstantMirror-fedora
vim /etc/httpd/conf.d/InstantMirror-fedora.conf
#(enable stuff)
service httpd restart
# http://fedora.localdomain.com
Instant Fedora mirror!

InstantMirror-fedora.noarch.rpm    : instant Fedora mirror
InstantMirror-centos.noarch.rpm    : instant CentOS mirror
InstantMirror-rpmfusion.noarch.rpm : instant RPMFusion mirror
InstantMirror-foo.noarch.rpm       : instant Foo mirror

Warren Togami
wtogami at redhat.com

p.s.
The same code could be used to create a public static-repos mirror. 
static-repos changes many times per hour, so it should be probed for 
changes every 2 minutes.  We need a few permanent public mirrors of it so 
people stop hitting the koji server directly.  Is any public mirror 
interested in hosting this?
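
For a fast-moving tree like static-repos, the probe can be as simple as fetching upstream's repomd.xml on a fixed interval and comparing it byte-for-byte with the cached copy. A minimal sketch, with hypothetical function names and a plain GET assumed as the transport:

```python
import time
import urllib.request
from typing import Callable


def http_get(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def repomd_changed(cached_path: str, upstream_repomd: bytes) -> bool:
    """True if a cached repomd.xml exists and differs from the upstream copy."""
    try:
        with open(cached_path, "rb") as f:
            return f.read() != upstream_repomd
    except FileNotFoundError:
        return False  # nothing cached yet, so there is nothing to expire


def poll_once(cached_path: str, url: str, on_change: Callable[[str], None],
              fetch: Callable[[str], bytes] = http_get) -> bool:
    """Fetch upstream repomd.xml once and call on_change(url) if it differs."""
    if repomd_changed(cached_path, fetch(url)):
        on_change(url)
        return True
    return False


def poll_forever(cached_path: str, url: str, on_change: Callable[[str], None],
                 interval: int = 120) -> None:
    """Probe every `interval` seconds; 120 s is the 2 minutes suggested above."""
    while True:
        try:
            poll_once(cached_path, url, on_change)
        except OSError as exc:  # upstream unreachable: try again next round
            print(f"poll of {url} failed: {exc}")
        time.sleep(interval)
```

In this sketch `on_change` is where the expiry step would run; it should also delete the cached repomd.xml, otherwise every later probe reports the same change. A conditional GET (If-Modified-Since or ETag) would be cheaper for upstream than re-downloading repomd.xml every two minutes.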

p.p.s.
Another idea before I forget about it:
Later, add configurable fallbacks to a different upstream if 
download.fedoraproject.org is down.  mirrors.kernel.org might be a good 
default alternative, for example.
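
Fallback could be an ordered list of upstreams tried in turn. In this hedged sketch the function name and both base URLs are assumptions (the mirror path layouts in particular need checking), and any `OSError`, including an HTTP 404, moves on to the next upstream:

```python
import urllib.request
from typing import Callable, Sequence

# Illustrative upstream list, tried in order; the exact paths are assumptions.
UPSTREAMS = [
    "http://download.fedoraproject.org/pub/fedora/linux",
    "http://mirrors.kernel.org/fedora",
]


def http_get(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def fetch_with_fallback(path: str, upstreams: Sequence[str] = UPSTREAMS,
                        fetch: Callable[[str], bytes] = http_get) -> bytes:
    """Return path from the first upstream that answers; raise if all fail."""
    errors = []
    for base in upstreams:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return fetch(url)
        except OSError as exc:  # URLError, HTTPError and timeouts all subclass OSError
            errors.append(f"{url}: {exc}")
    raise OSError("all upstreams failed: " + "; ".join(errors))
```

Treating a 404 as a reason to fall back is a design choice: it hides a lagging primary, but it also means a genuinely missing file is requested from every upstream before the error reaches the client.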



