mirrormanager future features
Matt_Domsch at dell.com
Mon Sep 3 23:05:52 UTC 2007
MirrorManager has been a success at what I really wanted to see by the
Fedora 7 release. But there are still several gotchas I'd like to iron
out before Fedora 8.
* The mirrorlist mod_python applet consumes too much memory on the app
servers. It basically reads in a 2MB mirrorlist_cache pickle file,
which lists, by directory, which mirrors hold which content. Handy to
have, but in mod_python that blows the RSS size out to ~27MB per
process, times every httpd process that has run that code, each with
its own private copy. Not pretty.
The mirrormanager TurboGears backend isn't fast enough to handle all
the client requests for mirrorlists, hence I exported the data for
mod_python to use. But the mod_python trick takes too much memory.
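A minimal sketch of what a per-directory cache like this could look like, assuming a plain dict that maps a directory path to a list of mirror base URLs. The real mirrorlist_cache layout is not described in this post; `build_cache`, `save_cache` and `load_cache` are hypothetical names. Every process that calls `load_cache` ends up with its own unpickled copy, which is exactly the RSS problem described above.

```python
import pickle
from collections import defaultdict


def build_cache(mirror_contents):
    """Invert {mirror_base_url: [directories it carries]} into
    {directory: [mirror_base_urls]}. Hypothetical layout."""
    by_dir = defaultdict(list)
    for mirror_url, directories in mirror_contents.items():
        for directory in directories:
            by_dir[directory].append(mirror_url)
    # Plain dict with sorted lists, so each directory's mirror order is stable.
    return {d: sorted(urls) for d, urls in by_dir.items()}


def save_cache(cache, path):
    """Backend side: write the cache once."""
    with open(path, "wb") as f:
        pickle.dump(cache, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_cache(path):
    """Applet side: read the whole cache into this process's memory."""
    with open(path, "rb") as f:
        return pickle.load(f)
```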
The way out? Split the mod_python applet into two pieces:
1) Yet another daemon, listening on a local UNIX socket, that holds a
copy of the mirrorlist cache and calculates the answers.
2) The mod_python applet connects to the daemon, passes along its list
of args, and gets back the answer list. It handles redirects too.
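The two-piece split can be sketched with Python's standard `socketserver` module. This is a sketch under stated assumptions, not MirrorManager's actual protocol: the socket path, the `path` request argument, the JSON encoding, and the framing (each side half-closes its end to mark the end of a message) are all hypothetical choices, and redirect handling is left out. `ForkingMixIn` forks one child per connection after the cache has already been loaded in the parent.

```python
import json
import os
import socket
import socketserver

SOCKET_PATH = "/tmp/mirrorlist.sock"  # hypothetical location


def answer(cache, args):
    """Compute the mirror list for one request (no I/O, easy to test)."""
    return cache.get(args.get("path", ""), [])


def recv_all(sock):
    """Read until the peer half-closes its side of the connection."""
    chunks = []
    while True:
        data = sock.recv(65536)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)


def serve_one(conn, cache):
    """Daemon side: read one JSON request, send back a JSON list."""
    args = json.loads(recv_all(conn).decode("utf-8"))
    conn.sendall(json.dumps(answer(cache, args)).encode("utf-8"))
    conn.shutdown(socket.SHUT_WR)  # tell the applet the reply is complete


def send_request(sock, args):
    """Applet side: send the request args, then half-close to mark the end."""
    sock.sendall(json.dumps(args).encode("utf-8"))
    sock.shutdown(socket.SHUT_WR)


def read_reply(sock):
    """Applet side: read the daemon's answer list."""
    return json.loads(recv_all(sock).decode("utf-8"))


def query_daemon(args, path=SOCKET_PATH):
    """What the mod_python handler would call for each client request."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        send_request(sock, args)
        return read_reply(sock)


class MirrorlistHandler(socketserver.BaseRequestHandler):
    def handle(self):
        serve_one(self.request, self.server.cache)


class MirrorlistServer(socketserver.ForkingMixIn, socketserver.UnixStreamServer):
    """Forks one child per connection. The cache is loaded once in the
    parent; children read it through copy-on-write pages."""

    def __init__(self, path, cache):
        self.cache = cache
        super().__init__(path, MirrorlistHandler)


def run_daemon(cache, path=SOCKET_PATH):
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket left by a previous run
    with MirrorlistServer(path, cache) as server:
        server.serve_forever()
```

Keeping `answer` free of I/O means the lookup logic can be tested without a running daemon; `serve_one` and `query_daemon` only move bytes.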
In this way, the daemon can fork() itself if necessary to handle the
traffic, but the forked children use copy-on-write memory, and they
will never touch the pickle, so they'll all share mostly the same
memory: one copy of the mirrorlist_cache, used by all of them.
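A back-of-the-envelope comparison of the two designs. The ~27MB per-process figure comes from above. The process count of 40 and the 2MB per-child overhead are made-up illustrative numbers. Real sharing also depends on how many pages each child dirties: in CPython, even reading an object updates its reference count, so "mostly shared" is the honest expectation.

```python
def private_copies_mb(per_process_mb, n_processes):
    """Every httpd process holds its own unpickled cache."""
    return per_process_mb * n_processes


def shared_cow_mb(cache_mb, per_child_mb, n_children):
    """One cache in the daemon parent; forked children add only their own
    (assumed) private overhead on top of the shared pages."""
    return cache_mb + per_child_mb * n_children


N = 40  # hypothetical number of processes
print(private_copies_mb(27, N))  # 1080
print(shared_cow_mb(27, 2, N))   # 107
```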
Since I'm saving so much RSS memory here, I can add back into the
mirrorlist_cache all the directories which are being omitted
now. So, we will be able to return the list for any dir or file that
the public mirrors know about, not just a few as we do now.
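One way to answer for "any dir or file" is to treat a request as a directory if the cache knows it, and otherwise as a file whose parent directory the cache knows. This is a sketch assuming a hypothetical cache dict that maps a directory path to a list of mirror base URLs; `urls_for` is an illustrative name.

```python
import posixpath


def urls_for(cache, requested_path):
    """Return full mirror URLs for a directory, or for a file inside a
    directory the cache knows about. Unknown paths give []."""
    path = requested_path.strip("/")
    # A known directory is used as-is; otherwise assume it names a file.
    directory = path if path in cache else posixpath.dirname(path)
    bases = cache.get(directory, [])
    return [base.rstrip("/") + "/" + path for base in bases]
```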
I've taken a stab at this, but am still working on the details. I'll
want to do some timing tests against the new code, to make sure it
isn't too much slower for clients, but a quick SWAG shows it'll be
OK: 0.3 sec or so per request, even in parallel, which IMHO is "good
enough".
* Mike's redirection stuff is included in the above already, so
that'll be online as soon as the rest is.
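The timing tests mentioned above could start from a small harness like this one, which issues requests in parallel and reports per-request latency. It is a sketch: `fn` stands for whatever call the applet makes to the daemon, and the default of 8 concurrent workers is an arbitrary assumption. Threads are adequate here because each call mostly waits on socket I/O.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def time_one(fn, arg):
    """Wall-clock seconds for a single call."""
    start = time.perf_counter()
    fn(arg)
    return time.perf_counter() - start


def time_parallel(fn, args, workers=8):
    """Issue one call per element of args, `workers` at a time, and
    return each call's latency in seconds."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda a: time_one(fn, a), args))


def summarize(latencies):
    ordered = sorted(latencies)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "max": ordered[-1],
    }
```

Usage would look like `summarize(time_parallel(fetch_mirrorlist, request_args, workers=20))`, where `fetch_mirrorlist` and `request_args` are hypothetical placeholders for the real client call and a sample of real request arguments.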
Now, to find the time before F8...
Still to come, provided I find a lot of time (unlikely), or someone
else steps up to help:
* Designate a way for mirrors to declare themselves always
up-to-date. This will probably require a sysadmin to set the bit, as
it's somewhat dangerous. But there are cases, e.g. a local
out-of-line squid proxy, where it makes sense. This change alters
the schema and has repercussions throughout the code, so I haven't
wanted to make it lightly.
* Some people want metalink support. Conceptually it's possible, and
even pretty easy once we've got the daemon above working right. But
as noted on f-d-l, it's been 10 weeks since someone asked for it and
even sent some code that doesn't quite integrate but was a starting
point, and I haven't had time to get to it. It's not looking good
for me to add that right now, but I'd be happy to review patches.
* I've wanted to add the libgeoip country->continent mappings, so we
can fall back netblock -> country -> continent -> global, but I
don't know C->Python bindings code at all, and I need that exported
in python-GeoIP for mm to use.
* I have a pending request to change the fedora.repo files to make
yum treat the list as being in priority order. I really want the
continent mappings in place before doing that, though...
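For the always-up-to-date bit, the lookup change itself is small: a flagged mirror is listed for every directory regardless of what the crawler found. This is a sketch only; the `Mirror` record and the `always_up2date` and `crawled_dirs` fields are hypothetical and are not MirrorManager's real schema.

```python
from dataclasses import dataclass, field


@dataclass
class Mirror:
    base_url: str
    # Hypothetical new column; only a sysadmin would set it.
    always_up2date: bool = False
    # Directories the crawler last found to be current on this mirror.
    crawled_dirs: set = field(default_factory=set)


def mirrors_for_dir(mirrors, directory):
    """Flagged mirrors always qualify; others only if crawled as current."""
    return [m.base_url for m in mirrors
            if m.always_up2date or directory in m.crawled_dirs]
```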
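Once the daemon can produce a mirror list for a single file, metalink output is mostly serialization. A rough sketch using `xml.etree.ElementTree`, loosely following the Metalink 3.0 layout (`metalink/files/file/resources/url`, namespace `http://www.metalinker.org/`). The `preference` scheme (first URL highest) is an assumption, hash/verification elements are omitted, and the details should be checked against the Metalink spec before use.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

METALINK_NS = "http://www.metalinker.org/"


def make_metalink(filename, urls):
    """Return a metalink document (str) listing `urls` for `filename`."""
    root = ET.Element("metalink", version="3.0", xmlns=METALINK_NS)
    files = ET.SubElement(root, "files")
    file_el = ET.SubElement(files, "file", name=filename)
    resources = ET.SubElement(file_el, "resources")
    for rank, url in enumerate(urls):
        url_el = ET.SubElement(
            resources, "url",
            type=urlsplit(url).scheme,                # e.g. "http", "ftp"
            preference=str(max(1, 100 - rank)),       # assumed: earlier = preferred
        )
        url_el.text = url
    return ET.tostring(root, encoding="unicode")
```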
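With continent data available, the netblock -> country -> continent -> global fallback and a priority-ordered list for yum can be the same operation: concatenate the tiers from nearest to farthest and drop duplicates, so the closest mirrors come first. This is a sketch; `COUNTRY_TO_CONTINENT` is a hypothetical stand-in for the libgeoip country->continent table and is not the python-GeoIP API.

```python
# Hypothetical stand-in for libgeoip's country -> continent data.
COUNTRY_TO_CONTINENT = {"US": "NA", "CA": "NA", "DE": "EU", "FR": "EU"}


def priority_mirrorlist(netblock_mirrors, country, mirrors_by_country, all_mirrors):
    """Nearest tier first: netblock, country, rest of continent, global."""
    continent = COUNTRY_TO_CONTINENT.get(country)
    same_continent = []
    if continent is not None:
        for other, mirrors in sorted(mirrors_by_country.items()):
            if other != country and COUNTRY_TO_CONTINENT.get(other) == continent:
                same_continent.extend(mirrors)
    tiers = [netblock_mirrors, mirrors_by_country.get(country, []),
             same_continent, all_mirrors]
    ordered, seen = [], set()
    for tier in tiers:
        for mirror in tier:
            if mirror not in seen:  # keep only the first (nearest) occurrence
                seen.add(mirror)
                ordered.append(mirror)
    return ordered
```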
Should we let countries with fewer than 3 mirrors return their own
lists? Right now, if a country has fewer than 3 mirrors, its users get
the global list back.
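The question comes down to one threshold. A sketch of the current rule as described, with the minimum exposed as a parameter; `country_or_global` and `min_country` are hypothetical names. Lowering `min_country` to 1 is the proposed change.

```python
def country_or_global(country_mirrors, global_mirrors, min_country=3):
    """Countries with fewer than min_country mirrors get the global list."""
    if len(country_mirrors) >= min_country:
        return country_mirrors
    return global_mirrors
```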
Anything else people really need to see?
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux
More information about the Fedora-infrastructure-list