[RFC]InstantMirror Redesign

Kulbir Saini kulbirsaini at students.iiit.ac.in
Tue Mar 11 16:40:27 UTC 2008


Hi!

> On 10/03/2008, Kulbir Saini <kulbirsaini at students.iiit.ac.in> wrote:
>
>>     1. InstantMirror to be used by a small group of people. In this case we
>>  have to get rid of all the dependencies like squid, apache because for a
>>  small setup nobody is going to configure squid and apache. So, we use a
>>  proxy server implemented in python for this kind of setup and integrate
>>  InstantMirror with it in caching mode. So that it becomes easy to setup
>>  and don't require squid, apache or whatever else.
>
> You've just described "squid in offline mode", more or less, above
> (offline being the "honour request from cache if we're unable to
> connect" part).

       I agree that when we can't connect to the remote host, we are not
doing anything better than squid. Still, I think that having something
is better than nothing: if you can't fetch a package from upstream,
serve whatever you have.
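
The fallback described above could look roughly like the sketch below. It is only an illustration: serve(), cache_dir and the fetch callable are hypothetical names, not part of any existing InstantMirror code, and a real fetcher would talk to the upstream mirror.

```python
from pathlib import Path
from typing import Callable, Optional


def serve(name: str, cache_dir: Path,
          fetch: Callable[[str], bytes]) -> Optional[bytes]:
    """Try upstream first; if it is unreachable, serve whatever we have."""
    cached = cache_dir / name
    try:
        # `fetch` is a hypothetical callable that downloads the file
        # from upstream and raises OSError when it cannot connect.
        data = fetch(name)
    except OSError:
        # Upstream is down: fall back to the cached copy, if there is one.
        return cached.read_bytes() if cached.exists() else None
    # Upstream answered: refresh the local copy and serve it.
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(data)
    return data
```

Passing the fetcher in as a parameter keeps the sketch independent of any particular HTTP library.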

>>     2. InstantMirror to be used by an organization. As almost all the
>>  organization ( i am focusing more on institutes/universities here) use a
>>  common proxy server to access the Internet, we will have the InstantMirror
>>  which can be integrated with squid. There will be no difficulty in setup
>>  as the people already use squid (assuming squid is widely used in
>>  Unix/Linux world) and know how to configure it. We can't use proxy server
>>  implemented in python here because no organization would ever agree to use
>>  a stripped down version of proxy server instead of squid.
>
> ... and in this case, you want squid. Basically.

       I disagree with this point. As far as I understand squid (its
actual behavior may differ from what I think), here we are doing more
than what squid does.

       1. Suppose I get a request for xyz-0.1.2.rpm from a repo mirror M1 of
repo R1. If we are using squid, then squid will fetch xyz-0.1.2.rpm from
upstream, cache it, and serve the client. But if another request comes
for xyz-0.1.2.rpm from a repo mirror M2 of repo R1 or a repo mirror M3
of repo R2, then squid will fetch xyz-0.1.2.rpm again, even though the
packages are the same.
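
To illustrate the difference, here is a small sketch comparing a cache key based on the full URL with one based only on the package file name. It assumes, as the paragraph above does, that the same file name means the same package. The mirror URLs and both function names are made up for the example.

```python
from posixpath import basename
from urllib.parse import urlsplit


def url_key(url: str) -> str:
    """Key by full URL: the same rpm on two mirrors gives two entries."""
    return url


def package_key(url: str) -> str:
    """Key by package file name only, ignoring which mirror served it."""
    return basename(urlsplit(url).path)


# Hypothetical mirror URLs for the same package.
m1 = "http://mirror1.example.org/fedora/updates/8/i386/xyz-0.1.2.rpm"
m2 = "http://mirror2.example.net/pub/linux/updates/8/i386/xyz-0.1.2.rpm"

print(url_key(m1) == url_key(m2))          # the URLs differ
print(package_key(m1) == package_key(m2))  # the package names match
```

With the second kind of key, a request through M2 would be answered from the copy already fetched through M1.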

       2. Squid stores cached packages on the hard disk in a cryptic layout
that can't be browsed. You can't really prioritize what you want to
store, as in the case of the Yum plugin below. We want to facilitate rpm
search on the local mirror in the long run, so we should know what is
stored where. Also, rpm packages may need to be stored on or transferred
to a separate server, which is not possible with squid.
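
As a rough idea of what "knowing what is stored where" could mean, the sketch below keeps each rpm under a plain directory named after the first letter of the package, so the cache can be browsed and searched with ordinary file tools. The layout and the names store_path() and search() are assumptions made for this example only.

```python
from pathlib import Path
from typing import List


def store_path(root: Path, rpm_name: str) -> Path:
    """Hypothetical layout: <root>/<first letter>/<rpm file name>."""
    return root / rpm_name[0].lower() / rpm_name


def search(root: Path, pattern: str) -> List[str]:
    """Find cached rpms whose file name matches a shell-style pattern."""
    return sorted(p.name for p in root.glob(f"*/{pattern}"))
```

Because the files sit in a normal directory tree, copying them to a separate server is just a matter of copying that tree.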

>>     Imagine a university with thousands of Linux users and everyone is
>>  updating their system weekly. GBs of bandwidth is being wasted every week
>>  due to subsequent downloads of the same package.
>
> You just have to update the "maximum object size" for the squid cache
> to cover the largest package in the distro. It's more or less plug and
> play after that. You will need to make all your clients use the same
> mirrorlist for all your clients, of course :))

       We can't go to thousands of people and ask them to use the same
mirrorlist. With InstantMirror, everybody is free to use whatever repo
or mirrorlist he/she wants, and our system will still cache the
relevant packages perfectly fine.

>>     If you have any suggestions for improvements, comments on the current
>>  design or you want to criticize the design, please reply back. They would
>>  really help me to improve.
>
> Seriously, you will end up rewriting squid. Just use it :o) it's
> surprisingly light on memory if you configure it right. You wouldn't
> mind a single-purpose proxy running on your machine? You would have
> the same benefit from actually using squid, *plus* that can be used
> for other purposes (i.e. shared cache for browsers running on the
> machine, too). I've done this for many years now.

        I don't think we are rewriting squid. It would be a very small
python module which does a very limited but relevant set of things: it
just intercepts connections for rpms and metadata and lets squid handle
everything else. We are not really interfering with squid much.
Actually, squid can't meet all our requirements, and that's the reason
we are forced to do something like this.
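
The decision about what to intercept can be very small. The sketch below is one possible rule, assuming that "metadata" means files under a repository's repodata/ directory. The function name and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def should_intercept(url: str) -> bool:
    """Return True for rpms and repo metadata, False for everything else."""
    path = urlsplit(url).path
    # Assumption: metadata lives under a "repodata" directory.
    return path.endswith(".rpm") or "/repodata/" in path


for u in ("http://mirror.example.org/fedora/xyz-0.1.2.rpm",
          "http://mirror.example.org/fedora/repodata/repomd.xml",
          "http://www.example.org/index.html"):
    print(u, "-> intercept" if should_intercept(u) else "-> leave to squid")
```

Anything the function rejects would simply be passed on to squid untouched.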

>>  PS : This is my first RFC, if i wrote it badly please forgive me :)
>
> It's not bad at all! You've covered all the bases I could see. You
> would still need a "mirror manager" process involved (to do any kind
> of prefetching). even with squid as a proxy; so please don't take the
> above as any kind of slap in the face. You've done a good job of the
> requirements doc :)

         Thanks a lot for noting down all those points. They helped me get
a clearer idea of InstantMirror. And thanks for the encouragement
as well :)



-------------------------------------------------------
Thank you,
Kulbir Saini,
Computer Science and Engineering,
International Institute of Information Technology,
Hyderabad, India - 500032.

My Home-Page: http://saini.co.in/
My Institute: http://www.iiit.ac.in/
My Linux-Blog: http://linux.saini.co.in/

IRC nick : generalBordeaux
Channels : #fedora, #fedora-devel, #yum on freenode
-------------------------------------------------------




