[Pulp-dev] Lazy for Pulp3

Brian Bouterse bbouters at redhat.com
Wed May 30 15:34:19 UTC 2018


Actually, what about these as names?

policy=immediate  -> content is downloaded now, while the task runs (no
lazy). This is also the default if unspecified.
policy=cache-and-save   -> all the steps in the diagram. Content that is
downloaded is saved so that it's only ever downloaded once.
policy=cache     -> all the steps in the diagram except step 14. If squid
pushes the bits out of its cache, they will be re-downloaded to serve
other clients requesting the same bits.
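
For illustration, creating a remote with one of these policies might look
roughly like this (a hypothetical sketch; the endpoint and field names are
placeholders, not a final API):

    import requests

    # Hypothetical Pulp3 REST call; URL, auth, and field names are made up.
    response = requests.post(
        "http://pulp.example.com/pulp/api/v3/remotes/file/",
        auth=("admin", "password"),
        json={
            "name": "lazy-remote",
            "url": "https://repos.example.com/pub/",
            # one of: "immediate" (default), "cache-and-save", "cache"
            "policy": "cache",
        },
    )
    response.raise_for_status()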

If ^ is better, I can update the stories. Other naming ideas and use cases
are welcome.

Thanks,
Brian

On Wed, May 30, 2018 at 10:50 AM, Brian Bouterse <bbouters at redhat.com>
wrote:

>
>
> On Wed, May 30, 2018 at 8:57 AM, Tom McKay <thomasmckay at redhat.com> wrote:
>
>> I think there is a use case for "proxy only" like the one being described
>> here. Several years ago there was a project called thumbslug[1] that was
>> used in a version of katello instead of pulp. Its job was to check
>> entitlements and then proxy content from a CDN. The same functionality
>> could be implemented in pulp. (Perhaps it's even as simple as telling
>> squid not to cache anything, so the content would never make it from
>> cache to pulp in current pulp-2.)
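>>
>> To sketch the squid side of that (untested, just to illustrate the idea),
>> a squid.conf fragment like this should make squid a pure pass-through
>> proxy that never caches anything:
>>
>>     # Hypothetical squid.conf fragment: disable caching entirely so
>>     # every client request is proxied straight through upstream.
>>     cache deny all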
>>
>
> What would you call this policy?
> policy=proxy?
> policy=stream-dont-save?
> policy=stream-no-save?
>
> Are the names 'on-demand' and 'immediate' clear enough? Are there better
> names?
>
>>
>> Overall I'm +1 to the idea of an only-squid version, if others think it
>> would be useful.
>>
>
> I understand describing this as an "only-squid" version, but for clarity,
> the streamer would still be required because it is what requests the bits
> with the correctly configured downloader (certs, proxy, etc.). The streamer
> streams the bits into squid, which provides caching and client multiplexing.
>
> To confirm my understanding, this "squid-only" policy would be the same as
> on-demand except that it would *not* perform step 14 from the diagram here
> (https://pulp.plan.io/issues/3693). Is that right?
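>
> In pseudo-Python, the difference might be as small as this inside the
> streamer (a rough sketch with made-up names, not the real streamer code):
>
>     import tempfile
>
>     import aiohttp
>     from aiohttp import web
>
>     async def stream_artifact(request):
>         # Hypothetical streamer handler: fetch the bits from upstream and
>         # relay them to the client (through squid) as they arrive.
>         url = request.query["url"]
>         policy = request.query.get("policy", "cache-and-save")
>         spool = None
>         if policy == "cache-and-save":
>             spool = tempfile.TemporaryFile()
>         response = web.StreamResponse()
>         await response.prepare(request)
>         async with aiohttp.ClientSession() as session:
>             async with session.get(url) as upstream:
>                 async for chunk in upstream.content.iter_chunked(8192):
>                     await response.write(chunk)  # stream to squid/client
>                     if spool is not None:
>                         spool.write(chunk)  # step 14: keep bytes to save
>         await response.write_eof()
>         # With policy="cache", nothing is persisted; a squid cache miss
>         # just triggers another pass through this handler.
>         return response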
>
>
>>
>> [1] https://github.com/candlepin/thumbslug
>>
>> On Wed, May 30, 2018 at 8:34 AM, Milan Kovacik <mkovacik at redhat.com>
>> wrote:
>>
>>> On Tue, May 29, 2018 at 9:31 PM, Dennis Kliban <dkliban at redhat.com>
>>> wrote:
>>> > On Tue, May 29, 2018 at 11:42 AM, Milan Kovacik <mkovacik at redhat.com>
>>> > wrote:
>>> >>
>>> >> On Tue, May 29, 2018 at 5:13 PM, Dennis Kliban <dkliban at redhat.com>
>>> >> wrote:
>>> >> > On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik <mkovacik at redhat.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> Good point!
>>> >> >> More the second; it might be a bit crazy to utilize Squid for that,
>>> >> >> but first, let's answer the why ;)
>>> >> >> So why does Pulp need to store the content here?
>>> >> >> Why don't we point users at Squid all the time (for the lazy repos)?
>>> >> >
>>> >> >
>>> >> > Pulp's Streamer needs to fetch and store the content because that's
>>> >> > Pulp's primary responsibility.
>>> >>
>>> >> Maybe not so much the storing, but rather the content view management?
>>> >> I mean the partitioning into repositories and promotion.
>>> >>
>>> >
>>> > Exactly this. We want Pulp users to be able to reuse content that was
>>> > brought in using the 'on_demand' download policy in other repositories.
>>> I see.
>>>
>>> >
>>> >>
>>> >> > If some of the content lived in Squid and some lived in Pulp, it
>>> >> > would be difficult for the user to know what content is actually
>>> >> > available in Pulp and what content needs to be fetched from a remote
>>> >> > repository.
>>> >>
>>> >> I'd say the rule of thumb would be: lazy -> squid, regular -> pulp,
>>> >> so it's not that difficult.
>>> >> Maybe Pulp could have a concept of Origin, where folks upload stuff to
>>> >> a Pulp repo, vs. Proxy for its repo storage policy?
>>> >>
>>> >
>>> > Squid removes things from the cache at some point. You can probably
>>> > configure it to never remove anything from the cache, but then we would
>>> > need to implement orphan cleanup that would work across two systems:
>>> > pulp and squid.
>>>
>>> Actually "remote" units wouldn't need orphan cleaning from the disk,
>>> just dropping them from the DB would suffice.
>>>
>>> >
>>> > Answering that question would still be difficult. Not all content that
>>> > is in the repository that was synced using the on_demand download policy
>>> > will be in Squid - only the content that has been requested by clients.
>>> > So it's still hard to know which of the content units have been
>>> > downloaded and which have not.
>>>
>>> But the beauty is exactly in that: we don't have to track whether the
>>> content is downloaded if it is reverse-proxied[1][2].
>>> Moreover, this would work both with and without a proxy between Pulp
>>> and the Origin of the remote unit.
>>> A "remote" content artifact might just need to carry its URL in a DB
>>> column for this to work; so the async artifact model, instead of the
>>> "policy=on-demand" attribute, would have a mandatory remote "URL"
>>> attribute. I wouldn't say that's more complex than tracking the
>>> "policy" attribute.
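>>>
>>> As a rough sketch (Django-style, with made-up field names), such a model
>>> might be as simple as:
>>>
>>>     from django.db import models
>>>
>>>     class RemoteArtifact(models.Model):
>>>         # Hypothetical "remote" artifact: no downloaded bits, no "policy"
>>>         # flag -- just where the bits live upstream, plus the expected
>>>         # digests and size for validation.
>>>         url = models.TextField()  # mandatory origin URL
>>>         size = models.BigIntegerField(null=True)
>>>         sha256 = models.CharField(max_length=64, blank=True)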
>>>
>>> >
>>> >
>>> >>
>>> >> >
>>> >> > As Pulp downloads an Artifact, it calculates all the checksums and
>>> >> > its size. It then performs validation based on information that was
>>> >> > provided by the RemoteArtifact. After validation is performed, the
>>> >> > Artifact is saved to the database and to its final place in
>>> >> > /var/lib/content/artifacts/.
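>>> >> >
>>> >> > Roughly like this simplified sketch (not the actual downloader code):
>>> >> >
>>> >> >     import hashlib
>>> >> >
>>> >> >     def validate_download(chunks, remote_artifact):
>>> >> >         # Compute every checksum and the size in one streaming pass,
>>> >> >         # then compare them to what the RemoteArtifact recorded.
>>> >> >         hashers = {n: hashlib.new(n) for n in ("md5", "sha1", "sha256")}
>>> >> >         size = 0
>>> >> >         for chunk in chunks:
>>> >> >             size += len(chunk)
>>> >> >             for hasher in hashers.values():
>>> >> >                 hasher.update(chunk)
>>> >> >         assert size == remote_artifact.size
>>> >> >         assert hashers["sha256"].hexdigest() == remote_artifact.sha256
>>> >> >         # Only after this is the Artifact saved to the database and
>>> >> >         # moved to /var/lib/content/artifacts/.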
>>> >>
>>> >> This could still be achieved by storing the content just temporarily
>>> >> in the Squid proxy, i.e. using Squid as the content source, not the disk.
>>> >>
>>> >> > Once this information is in the database, Pulp's web server can
>>> >> > serve the content without having to involve the Streamer or Squid.
>>> >>
>>> >> Pulp might serve just the API and the metadata, while the content
>>> >> might be redirected to the Proxy all the time, correct?
>>> >> Doesn't Crane do that, btw?
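>>> >>
>>> >> Something like this tiny redirect-only content handler (hypothetical
>>> >> names, squid host made up):
>>> >>
>>> >>     from aiohttp import web
>>> >>
>>> >>     async def content_handler(request):
>>> >>         # Never serve the bits directly; always bounce the client to
>>> >>         # the proxy, which fetches through the streamer on a cache miss.
>>> >>         path = request.match_info["path"]
>>> >>         raise web.HTTPFound(f"http://squid.example.com/{path}")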
>>> >
>>> >
>>> > Theoretically we could do this, but in practice we would run into
>>> > problems when we needed to scale out the Content app. Right now, when
>>> > the Content app needs to be scaled, a user can launch another machine
>>> > that will run the Content app. Squid does not support that kind of
>>> > scaling. Squid can only take advantage of additional cores in a single
>>> > machine.
>>>
>>> I don't think I understand; proxies are actually designed to scale[1]
>>> and are used as tools to scale the web too.
>>>
>>> This is all about the How question, but when it comes to my original
>>> Why, please correct me if I'm wrong, the answer so far has been:
>>> Pulp always downloads the content because that's what it is supposed
>>> to do.
>>>
>>> Cheers,
>>> milan
>>>
>>> [1] https://en.wikipedia.org/wiki/Reverse_proxy
>>> [2] https://paste.fedoraproject.org/paste/zkBTyxZjm330FsqvPP0lIA
>>> [3] https://wiki.squid-cache.org/Features/CacheHierarchy?highlight=%28faqlisted.yes%29
>>>
>>> >
>>> >>
>>> >>
>>> >> Cheers,
>>> >> milan
>>> >>
>>> >> >
>>> >> > -dennis
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >>
>>> >> >>
>>> >> >> --
>>> >> >> cheers
>>> >> >> milan
>>> >> >>
>>> >> >> On Tue, May 29, 2018 at 4:25 PM, Brian Bouterse <bbouters at redhat.com>
>>> >> >> wrote:
>>> >> >> >
>>> >> >> > On Mon, May 28, 2018 at 9:57 AM, Milan Kovacik <mkovacik at redhat.com>
>>> >> >> > wrote:
>>> >> >> >>
>>> >> >> >> Hi,
>>> >> >> >>
>>> >> >> >> Looking at the diagram[1] I'm wondering what's the reasoning
>>> >> >> >> behind Pulp having to actually fetch the content locally?
>>> >> >> >
>>> >> >> >
>>> >> >> > Is the question "why is Pulp doing the fetching and not Squid?" or
>>> >> >> > "why is Pulp storing the content after fetching it?" or both?
>>> >> >> >
>>> >> >> Couldn't Pulp just rely on the proxy with regard to the content
>>> >> >> streaming?
>>> >> >> >>
>>> >> >> >> Thanks,
>>> >> >> >> milan
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> [1] https://pulp.plan.io/attachments/130957
>>> >> >> >>
>>> >> >> >> On Fri, May 25, 2018 at 9:11 PM, Brian Bouterse
>>> >> >> >> <bbouters at redhat.com>
>>> >> >> >> wrote:
>>> >> >> >> > A mini-team of core devs** met to talk through lazy use cases
>>> >> >> >> > for Pulp3. It's effectively the same lazy from Pulp2 except:
>>> >> >> >> >
>>> >> >> >> > * it's now built into core (not just RPM)
>>> >> >> >> > * it excludes repo protection use cases because we haven't added
>>> >> >> >> > repo protection to Pulp3 yet
>>> >> >> >> > * it excludes the "background" policy, which, based on feedback
>>> >> >> >> > from stakeholders, provided very little value
>>> >> >> >> > * it will no longer depend on Twisted as a dependency. It will
>>> >> >> >> > use asyncio instead.
>>> >> >> >> >
>>> >> >> >> > While it is being built into core, it will require minimal
>>> >> >> >> > effort from a plugin writer to add support for it. Details are
>>> >> >> >> > in the epic below.
>>> >> >> >> >
>>> >> >> >> > The current use cases along with a technical plan are written
>>> >> >> >> > on this epic:
>>> >> >> >> > https://pulp.plan.io/issues/3693
>>> >> >> >> >
>>> >> >> >> > We're putting it out for comments, questions, and feedback
>>> >> >> >> > before we start on the code. I hope we are able to add this to
>>> >> >> >> > our next sprint.
>>> >> >> >> >
>>> >> >> >> > ** ipanova, jortel, ttereshc, dkliban, bmbouter
>>> >> >> >> >
>>> >> >> >> > Thanks!
>>> >> >> >> > Brian
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >>
>>> >> >
>>> >> >
>>> >
>>> >
>>>
>>>
>>
>>
>>
>>
>