Instant Mirror Status...?

Les Mikesell lesmikesell at gmail.com
Fri Sep 26 15:35:31 UTC 2008


James Antill wrote:
> >
>>>> Furthermore, I absolutely don't want to return the same mirror at the
>>>> top of the list _for everyone_ in a given country.
>>> Hash MM's "primary" IP address to select one of the various available
>>> mirrors, assuming they're returned in a consistent order?
>> If you are going to return a list of N mirrors, make N copies of that 
>> list, rotating one position for each.  Knock the last octet off the 
>> source IP and hash the remaining part with some consistent algorithm 
>> that will give you N values and use that to choose the copy of the list 
>> you send.
> 
>  Which is much harder than it sounds given that MM can't actually "make
> N copies" of each list of IPs it might send out. But...

If you can get the list in a fixed order, you just have to replace the 
code that randomizes it with something that isn't 'worst-possible-case' 
for a site with a caching proxy.  You could get some improvement simply 
by setting cache control headers on the list for some reasonable time - 
but then it is much harder to correct a mistake.

>>   Everything is as distributed and robust as before, but you 
>> don't defeat attempts to save your bandwidth with caching proxies.
> 
>  This is _only_ true if you are getting asked for the list from every
> single IP address, or that the subset of IP addresses you are getting
> asked from happen to be as random/distributed as what MM does now.

That's up to the hashing algorithm.  I'm not an expert, but someone 
should be able to pick one that can take the first 3 octets of an IP 
address as input and give an essentially random distribution.  For brute 
force you could convert the address to ASCII, MD5 it, then take the 
result modulo the number of list items as the starting point.  There's 
probably something much more efficient, but that should give you 
randomness.  I'd drop the last octet so clustered proxies in the same 
class C subnet or behind NAT gateways with multiple public addresses 
would get the same list.
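
To make the brute-force version concrete, here is a rough sketch in 
Python.  It is only an illustration, not anything taken from the MM code: 
the function names are made up, it assumes dotted-quad IPv4 addresses, 
and it assumes the mirror list already arrives in a fixed order.

```python
import hashlib


def rotation_offset(client_ip, n):
    """Map an IPv4 address to a start index in [0, n), ignoring the last octet.

    Hypothetical helper: assumes a dotted-quad IPv4 string and n > 0.
    """
    # Keep only the first three octets so every client in the same /24
    # (clustered proxies, multi-address NAT gateways) hashes identically.
    prefix = ".".join(client_ip.split(".")[:3])
    digest = hashlib.md5(prefix.encode("ascii")).digest()
    # Treat the 16-byte digest as one big integer and reduce it modulo n.
    return int.from_bytes(digest, "big") % n


def rotated_mirrors(mirrors, client_ip):
    """Return the full mirror list, rotated to a client-dependent start."""
    if not mirrors:
        return []
    k = rotation_offset(client_ip, len(mirrors))
    # Rotation keeps every mirror in the list; only the starting point moves.
    return mirrors[k:] + mirrors[:k]
```

Two clients such as 192.0.2.10 and 192.0.2.200 get the same rotation, 
so a caching proxy in front of them keeps seeing the same first mirror, 
while clients in other networks are spread over the other starting points.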

>  You might argue that it'll probably "random/distributed enough", but I
> find it much easier to believe that the above will solve your problem
> and you didn't get much further than that in your analysis.

It isn't 'my' problem.  It's everyone's problem that the mirrors have 
to send many times the number of copies that they would if you stopped 
going out of your way to defeat existing caching infrastructure.  And I 
intentionally left the choice of hashing algorithm up to someone who is 
more familiar with their nature.  Personally, I don't think it can get 
any worse than it is so I'm probably not qualified for the analysis 
you'd like.  As long as you keep giving the whole list, the clients will 
find something that works even if it isn't optimal.  Or maybe yum could 
look for proxy headers on the response and (optionally) randomize by 
itself if there are none.
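
The client-side idea could look something like the sketch below.  This 
is not how yum behaves, just an illustration: the function name is made 
up, and treating "Via" (the standard header proxies add) or the 
non-standard "X-Cache" as evidence of a proxy is an assumption, since a 
transparent proxy may add neither.

```python
import random

# Assumption: these headers hint that the response passed through a proxy.
# "Via" is standard HTTP; "X-Cache" is a common but non-standard addition.
PROXY_HEADERS = ("via", "x-cache")


def order_for_client(mirrors, response_headers, rng=random):
    """Keep the server's order behind a proxy, otherwise shuffle locally.

    Hypothetical helper: response_headers is a mapping of header names
    to values; rng is anything with a shuffle() method.
    """
    names = {name.lower() for name in response_headers}
    if any(header in names for header in PROXY_HEADERS):
        # A proxy was seen: keep the order stable so it can get cache hits.
        return list(mirrors)
    # No sign of a proxy: spread the load by randomizing on the client.
    shuffled = list(mirrors)
    rng.shuffle(shuffled)
    return shuffled
```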

-- 
   Les Mikesell
    lesmikesell at gmail.com






More information about the fedora-devel-list mailing list