[Pulp-list] A repository download policy set to background means?

Michael Hrivnak mhrivnak at redhat.com
Wed May 11 12:34:12 UTC 2016

Using a proxy such as varnish or squid is a key part of the on-demand
workflow. Consider this common use case:

With some type of systems management software, a user initiates a "yum
update" or equivalent on 1000 machines at once. They're all the same or
similar, so they want to retrieve the same 20 RPMs from Pulp. Pulp has not
yet downloaded any of them, so the RPMs are retrieved through the streamer.
We would not want Pulp to download 1000 copies of each RPM from the remote
source. There needs to be some way to de-duplicate the requests, so Pulp
only downloads each RPM once. We also don't want to keep HTTP requests
waiting while the download completes (what if it takes a long time?), so as
bytes are retrieved from that one remote download, we want them streamed
out to each of the 1000 clients.
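
To make the de-duplication idea concrete, here is a minimal Python sketch of
the "many requests in, one download out" behaviour. It is not Pulp code and
not how squid or varnish are implemented; `CoalescingFetcher` and the
`download` callable are hypothetical names used only for illustration. It also
simplifies one thing: callers here wait for the whole download to finish,
whereas a real proxy streams bytes to every client as they arrive.

```python
import threading
from concurrent.futures import Future


class CoalescingFetcher:
    """Collapse concurrent requests for the same key into one download.

    Hypothetical illustration only; a real deployment would rely on a
    caching proxy such as squid or varnish instead.
    """

    def __init__(self, download):
        self._download = download   # assumed callable: key -> bytes
        self._lock = threading.Lock()
        self._futures = {}          # key -> Future shared by every caller

    def get(self, key):
        # Decide under the lock whether this caller starts the download
        # (the "leader") or waits on one that is already registered.
        with self._lock:
            future = self._futures.get(key)
            leader = future is None
            if leader:
                future = Future()
                self._futures[key] = future

        if leader:
            try:
                future.set_result(self._download(key))
            except Exception as exc:
                # Forget the failed attempt so a later request can retry.
                with self._lock:
                    self._futures.pop(key, None)
                future.set_exception(exc)

        # Followers block here until the leader has set a result.
        return future.result()
```

With this sketch, 1000 threads calling `fetcher.get("foo.rpm")` at the same
time cause `download("foo.rpm")` to run once, and all 1000 callers receive the
same bytes. Completed results stay in the dictionary, so it also acts as a
very naive cache with no expiry.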

This is a complex and specialized problem to solve, especially when you are
handling requests in multiple processes across multiple machines. Squid and
varnish are great at solving this particular problem, so we use them to do it.

On the graphic at the link below, you can see the de-duplication shown as
"many requests ---> [ squid cache ] ---> one request"


Hopefully that's helpful. Let me know if more explanation of anything would
make it clearer.


On Wed, May 11, 2016 at 3:56 AM, Lutchy Horace (Mailing List) <
mailinglist.subscriptions at lhprojects.net> wrote:

> On Wed, 11 May 2016 01:16:18 -0400
> "Lutchy Horace (Mailing List)"
> <mailinglist.subscriptions at lhprojects.net> wrote:
> > Hello,
> >
> > To avoid flooding the mailing list with multiple E-Mails, I'll
> > be collapsing a few questions into one E-Mail.
> >
> > 1. I had assumed a completed sync task meant it pulled remote packages
> > onto the pulp server. The documentation isn't quite clear on this
> > subject because this does not seem to be the case?
> >
> > 2. Going through bug reports regarding download policies, the
> > picture is clearer for the immediate and on_demand policies but
> > quite vague about what the background policy does?
> >
> > 3. Do you really need to download content units onto the pulp server?
> >       3.a. If not, how does this work? Do consumers contact the
> >       origin servers directly?
> >       3.b. If yes, what is the difference between scheduling a sync
> >       task and scheduling a "download" task?
> >
> > Regards
> >
> While resolving an entirely different issue regarding pulp, I stumbled
> on https://media.readthedocs.org/pdf/pulp/stable/pulp.pdf and
> http://pulp.readthedocs.io/en/latest/user-guide/deferred-download.html,
> which elaborate a bit more on what each download policy actually does.
> So far, I've installed python-pulp-streamer and varnish on the same
> box, although I am a bit confused as to why I would need an additional
> 'Reverse Proxy' in the stack. That at least fixes the 'No more mirrors
> left to try' problem I was facing on consumers.
> Regards
> --
> Lutchy Horace
> Owner/Operator/Administrator [http://www.lhprojects.net]
> Owner/Operator/Administrator [http://www.bombshellz.net]
> Owner/Operator/Administrator [http://www.animehouse.club]
> About Me [http://about.me/lhprojects]
> _______________________________________________
> Pulp-list mailing list
> Pulp-list at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-list