[Pulp-dev] Lazy for Pulp3

Wed May 30 19:48:08 UTC 2018

On Wed, May 30, 2018 at 4:50 PM, Brian Bouterse <bbouters at redhat.com> wrote:
>
>
> On Wed, May 30, 2018 at 8:57 AM, Tom McKay <thomasmckay at redhat.com> wrote:
>>
>> I think there is a use case for "proxy only" like what is being described
>> here. Several years ago there was a project called thumbslug[1] that was
>> used in a version of katello instead of pulp. Its job was to check
>> entitlements and
>> then proxy content from a cdn. The same functionality could be implemented
>> in pulp. (Perhaps it's even as simple as telling squid not to cache anything
>> so the content would never make it from cache to pulp in current pulp-2.)
>
>
> What would you call this policy?
> policy=proxy?
> policy=stream-dont-save?
> policy=stream-no-save?
>
> Are the names 'on-demand' and 'immediate' clear enough? Are there better
> names?
>>
>>
>> Overall I'm +1 to the idea of an only-squid version, if others think it
>> would be useful.
>
>
> I understand describing this as an "only-squid" version, but for clarity, the
> streamer would still be required because it is what requests the bits with
> the correctly configured downloader (certs, proxy, etc). The streamer
> streams the bits into squid which provides caching and client multiplexing.

I have to admit I'm only now re-reading
https://docs.pulpproject.org/dev-guide/design/deferred-download.html#apache-reverse-proxy
because of the SSL termination. So is the new plan to use the
streamer to terminate SSL instead of the Apache reverse proxy?

W/r/t the construction of an artifact's URL, I thought it would be
stored in the DB, so the Remote would create it during the sync.
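
A minimal sketch of that idea, assuming a Remote that knows its base URL and
the relative path of each artifact. The names RemoteArtifactRecord,
build_artifact_url and sync are hypothetical and not part of Pulp; this is
only meant to illustrate building the URL once at sync time and storing it
alongside the artifact:

```python
from dataclasses import dataclass
from urllib.parse import urljoin


@dataclass(frozen=True)
class RemoteArtifactRecord:
    """Hypothetical DB row: an artifact known to a remote but not yet fetched."""
    relative_path: str
    url: str


def build_artifact_url(remote_base_url: str, relative_path: str) -> str:
    # urljoin replaces the last path segment unless the base ends with "/",
    # so normalise the base and strip a leading "/" from the relative path.
    base = remote_base_url.rstrip("/") + "/"
    return urljoin(base, relative_path.lstrip("/"))


def sync(remote_base_url: str, relative_paths: list[str]) -> list[RemoteArtifactRecord]:
    # At sync time only metadata is recorded; the bits stay at the remote.
    return [
        RemoteArtifactRecord(path, build_artifact_url(remote_base_url, path))
        for path in relative_paths
    ]
```

With the URL stored, whatever serves the content later would not need to
reconstruct it from the remote's configuration.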

>
> To confirm my understanding this "squid-only" policy would be the same as
> on-demand except that it would *not* perform step 14 from the diagram here
> (https://pulp.plan.io/issues/3693). Is that right?
yup
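
To spell out how the three policies would differ, here is a rough sketch. The
name STREAM_NO_SAVE is just one of the candidates floated above, the helper
functions are hypothetical, and I'm assuming step 14 in the diagram is the one
where the streamed content gets saved into Pulp:

```python
from enum import Enum


class DownloadPolicy(Enum):
    IMMEDIATE = "immediate"            # fetch and save everything at sync time
    ON_DEMAND = "on-demand"            # fetch on first client request, then save
    STREAM_NO_SAVE = "stream-no-save"  # hypothetical name: stream only, never save


def fetched_at_sync(policy: DownloadPolicy) -> bool:
    # Only the immediate policy downloads content during the sync itself.
    return policy is DownloadPolicy.IMMEDIATE


def saved_after_streaming(policy: DownloadPolicy) -> bool:
    # Assumed meaning of "step 14": persist the streamed content into Pulp.
    # The proxy-only variant is on-demand with this step skipped.
    return policy is DownloadPolicy.ON_DEMAND
```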
>
>>
>>
>> [1] https://github.com/candlepin/thumbslug
>>
>> On Wed, May 30, 2018 at 8:34 AM, Milan Kovacik <mkovacik at redhat.com>
>> wrote:
>>>
>>> On Tue, May 29, 2018 at 9:31 PM, Dennis Kliban <dkliban at redhat.com>
>>> wrote:
>>> > On Tue, May 29, 2018 at 11:42 AM, Milan Kovacik <mkovacik at redhat.com>
>>> > wrote:
>>> >>
>>> >> On Tue, May 29, 2018 at 5:13 PM, Dennis Kliban <dkliban at redhat.com>
>>> >> wrote:
>>> >> > On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik
>>> >> > <mkovacik at redhat.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> Good point!
>>> >> >> More the second; it might be a bit crazy to utilize Squid for
>>> >> >> that, but first, let's answer the why ;)
>>> >> >> So why does Pulp need to store the content here?
>>> >> >> Why don't we point the users to Squid all the time (for the
>>> >> >> lazy repos)?
>>> >> >
>>> >> >
>>> >> > Pulp's Streamer needs to fetch and store the content because
>>> >> > that's Pulp's primary responsibility.
>>> >>
>>> >> Maybe not that much the storing but rather the content views
>>> >> management?
>>> >> I mean the partitioning into repositories, promoting.
>>> >>
>>> >
>>> > Exactly this. We want Pulp users to be able to reuse content that was
>>> > brought in using the 'on_demand' download policy in other repositories.
>>> I see.
>>>
>>> >
>>> >>
>>> >> > If some of the content lived in Squid and some lived in Pulp, it
>>> >> > would be difficult for the user to know what content is actually
>>> >> > available in Pulp and what content needs to be fetched from a
>>> >> > remote repository.
>>> >>
>>> >> I'd say the rule of thumb would be: lazy -> squid, regular -> pulp
>>> >> so not that difficult.
>>> >> Maybe Pulp could have a concept of Origin, where folks upload stuff to
>>> >> a Pulp repo, vs. Proxy for its repo storage policy?
>>> >>
>>> >
>>> > Squid removes things from the cache at some point. You can probably
>>> > configure it to never remove anything from the cache, but then we
>>> > would need to implement orphan cleanup that would work across two
>>> > systems: pulp and squid.
>>>
>>> Actually "remote" units wouldn't need orphan cleaning from the disk,
>>> just dropping them from the DB would suffice.
>>>
>>> >
>>> > Answering that question would still be difficult. Not all content
>>> > in a repository that was synced using the on_demand download policy
>>> > will be in Squid - only the content that has been requested by
>>> > clients. So it's still hard to know which of the content units have
>>> > been downloaded and which have not.
>>>
>>> But the beauty is exactly in that: we don't have to track whether the
>>> content is downloaded if it is reverse-proxied[1][2].
>>> Moreover, this would work both with and without a proxy between Pulp
>>> and the Origin of the remote unit.
>>> A "remote" content artifact might just need to carry its URL in a DB
>>> column for this to work; so the async artifact model, instead of the
>>> "policy=on-demand"  would have a mandatory remote "URL" attribute; I
>>> wouldn't say it's more complex than tracking the "policy" attribute.
>>>
>>> >
>>> >
>>> >>
>>> >> >
>>> >> > As Pulp downloads an Artifact, it calculates all the checksums
>>> >> > and its size. It then performs validation based on information
>>> >> > that was provided from the RemoteArtifact. After validation is
>>> >> > performed, the Artifact is saved to the database and its final
>>> >> > place in /var/lib/content/artifacts/.
>>> >>
>>> >> This could be still achieved by storing the content just temporarily
>>> >> in the Squid proxy i.e use Squid as the content source, not the disk.
>>> >>
>>> >> > Once this information is in the database, Pulp's web server can
>>> >> > serve the content without having to involve the Streamer or Squid.
>>> >>
>>> >> Pulp might serve just the API and the metadata, the content might be
>>> >> redirected to the Proxy all the time, correct?
>>> >> Doesn't Crane do that btw?
>>> >
>>> >
>>> > Theoretically we could do this, but in practice we would run into
>>> > problems when we needed to scale out the Content app. Right now when
>>> > the Content app needs to be scaled, a user can launch another machine
>>> > that will run the Content app. Squid does not support that kind of
>>> > scaling. Squid can only take advantage of additional cores in a
>>> > single machine.
>>>
>>> I don't think I understand; proxies are actually designed to scale[1]
>>> and are used as tools to scale the web too.
>>>
>>> This is all about the How question, but when it comes to my original
>>> Why, please correct me if I'm wrong, the answer so far has been:
>>> Pulp always downloads the content because that's what it is supposed
>>> to do.
>>>
>>> Cheers,
>>> milan
>>>
>>> [1] https://en.wikipedia.org/wiki/Reverse_proxy
>>> [2] https://paste.fedoraproject.org/paste/zkBTyxZjm330FsqvPP0lIA
>>> [3] https://wiki.squid-cache.org/Features/CacheHierarchy?highlight=%28faqlisted.yes%29
>>>
>>> >
>>> >>
>>> >>
>>> >> Cheers,
>>> >> milan
>>> >>
>>> >> >
>>> >> > -dennis
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >>
>>> >> >>
>>> >> >> --
>>> >> >> cheers
>>> >> >> milan
>>> >> >>
>>> >> >> On Tue, May 29, 2018 at 4:25 PM, Brian Bouterse
>>> >> >> <bbouters at redhat.com>
>>> >> >> wrote:
>>> >> >> >
>>> >> >> > On Mon, May 28, 2018 at 9:57 AM, Milan Kovacik
>>> >> >> > <mkovacik at redhat.com>
>>> >> >> > wrote:
>>> >> >> >>
>>> >> >> >> Hi,
>>> >> >> >>
>>> >> >> >> Looking at the diagram[1] I'm wondering what's the reasoning
>>> >> >> >> behind Pulp having to actually fetch the content locally?
>>> >> >> >
>>> >> >> >
>>> >> >> > Is the question "why is Pulp doing the fetching and not Squid?"
>>> >> >> > or "why is Pulp storing the content after fetching it?" or both?
>>> >> >> >
>>> >> >> >> Couldn't Pulp just rely on the proxy with regard to the content
>>> >> >> >> streaming?
>>> >> >> >>
>>> >> >> >> Thanks,
>>> >> >> >> milan
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> [1] https://pulp.plan.io/attachments/130957
>>> >> >> >>
>>> >> >> >> On Fri, May 25, 2018 at 9:11 PM, Brian Bouterse
>>> >> >> >> <bbouters at redhat.com>
>>> >> >> >> wrote:
>>> >> >> >> > A mini-team of core devs** met to talk through lazy use cases
>>> >> >> >> > for Pulp3. It's effectively the same lazy from Pulp2 except:
>>> >> >> >> >
>>> >> >> >> > * It's now built into core (not just RPM).
>>> >> >> >> > * It excludes repo protection use cases because we haven't
>>> >> >> >> >   added repo protection to Pulp3 yet.
>>> >> >> >> > * It excludes the "background" policy, which, based on
>>> >> >> >> >   feedback from stakeholders, provided very little value.
>>> >> >> >> > * It will no longer depend on Twisted. It will use asyncio
>>> >> >> >> >   instead.
>>> >> >> >> >
>>> >> >> >> > While it is being built into core, a plugin writer will need
>>> >> >> >> > to do a minimal amount of work to support it. Details in the
>>> >> >> >> > epic below.
>>> >> >> >> >
>>> >> >> >> > The current use cases along with a technical plan are written
>>> >> >> >> > on this epic:
>>> >> >> >> > https://pulp.plan.io/issues/3693
>>> >> >> >> >
>>> >> >> >> > We're putting it out for comment, questions, and feedback
>>> >> >> >> > before we start into the code. I hope we are able to add this
>>> >> >> >> > into our next sprint.
>>> >> >> >> >
>>> >> >> >> > ** ipanova, jortel, ttereshc, dkliban, bmbouter
>>> >> >> >> >
>>> >> >> >> > Thanks!
>>> >> >> >> > Brian
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> > _______________________________________________
>>> >> >> >> > Pulp-dev mailing list
>>> >> >> >> > Pulp-dev at redhat.com
>>> >> >> >> > https://www.redhat.com/mailman/listinfo/pulp-dev
>>> >> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >>
>>> >> >
>>> >> >
>>> >
>>> >
>>>
>>
>>
>>
>>
>