[Pulp-dev] Lazy for Pulp3

Tue May 29 19:31:23 UTC 2018

On Tue, May 29, 2018 at 11:42 AM, Milan Kovacik <mkovacik at redhat.com> wrote:

> On Tue, May 29, 2018 at 5:13 PM, Dennis Kliban <dkliban at redhat.com> wrote:
> > On Tue, May 29, 2018 at 10:41 AM, Milan Kovacik <mkovacik at redhat.com>
> > wrote:
> >>
> >> Good point!
> >> More the second; it might be a bit crazy to utilize Squid for that but
> >> first, let's answer the why ;)
> >> So why does Pulp need to store the content here?
> >> Why don't we point the users to the Squid all the time (for the lazy
> >> repos)?
> >
> >
> > Pulp's Streamer needs to fetch and store the content because that's
> > Pulp's primary responsibility.
>
> Maybe not so much the storing as the content view management?
> I mean the partitioning into repositories and the promoting.
>
>
Exactly this. We want Pulp users to be able to reuse content that was
brought in using the 'on_demand' download policy in other repositories.

> > If some of the content lived in Squid and some lived in Pulp, it would be
> > difficult for the user to know what content is actually available in Pulp
> > and what content needs to be fetched from a remote repository.
>
> I'd say the rule of thumb would be: lazy -> Squid, regular -> Pulp,
> so not that difficult.
> Maybe Pulp could have a concept of Origin, where folks upload stuff to
> a Pulp repo, vs. Proxy for its repo storage policy?
>
>
Squid removes things from the cache at some point. You can probably
configure it to never remove anything from the cache, but then we would
need to implement orphan cleanup that would work across two systems: Pulp
and Squid.

Even then, it would still be difficult to know what content is actually
available. Not all content in a repository that was synced using the
on_demand download policy will be in Squid, only the content that clients
have requested. So it's still hard to know which content units have been
downloaded and which have not.

> >
> > As Pulp downloads an Artifact, it calculates all the checksums and its
> > size. It then performs validation based on information that was provided
> > from the RemoteArtifact. After validation is performed, the Artifact is
> > saved to the database and its final place in /var/lib/content/artifacts/.
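
As a rough illustration of the download-and-validate step described above,
here is a minimal Python sketch. It is not Pulp's actual code: the function
names, the ValidationError class, the choice of algorithms, and the shape of
the `expected` dictionary (standing in for what a RemoteArtifact provides) are
all assumptions made for the example.

```python
import hashlib

# Assumed set of algorithms; the thread only says "all the checksums".
ALGORITHMS = ("sha256", "sha384", "sha512")


class ValidationError(Exception):
    """Raised when downloaded bytes do not match the expected metadata."""


def digest_stream(chunks):
    """Read an iterable of byte chunks once; return (size, digests)."""
    hashers = {name: hashlib.new(name) for name in ALGORITHMS}
    size = 0
    for chunk in chunks:
        size += len(chunk)
        # Feed every hasher as the data streams by, so nothing is re-read.
        for hasher in hashers.values():
            hasher.update(chunk)
    return size, {name: h.hexdigest() for name, h in hashers.items()}


def validate(size, digests, expected):
    """Check computed values against whatever the remote metadata supplied.

    `expected` is a hypothetical dict such as {"size": 11, "sha256": "..."}.
    Only the keys that are present are checked.
    """
    for name, value in expected.items():
        actual = size if name == "size" else digests.get(name)
        if actual != value:
            raise ValidationError(
                "%s mismatch: expected %r, got %r" % (name, value, actual)
            )
```

Only after validate() returns without raising would the file be moved into
place and recorded in the database, so a corrupt download never becomes
servable content.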
>
> This could still be achieved by storing the content only temporarily
> in the Squid proxy, i.e. using Squid as the content source, not the disk.
>
> > Once this information is in the database, Pulp's web server can serve the
> > content without having to involve the Streamer or Squid.
>
> Pulp might serve just the API and the metadata, and content requests
> might be redirected to the proxy all the time, correct?
> Doesn't Crane do that btw?
>

Theoretically we could do this, but in practice we would run into problems
when we needed to scale out the Content app. Right now when the Content app
needs to be scaled, a user can launch another machine that will run the
Content app. Squid does not support that kind of scaling. Squid can only
take advantage of additional cores in a single machine.

>
> Cheers,
> milan
>
> >
> > -dennis
> >
> >
> >
> >
> >
> >>
> >>
> >> --
> >> cheers
> >> milan
> >>
> >> On Tue, May 29, 2018 at 4:25 PM, Brian Bouterse <bbouters at redhat.com>
> >> wrote:
> >> >
> >> > On Mon, May 28, 2018 at 9:57 AM, Milan Kovacik <mkovacik at redhat.com>
> >> > wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> Looking at the diagram[1] I'm wondering what's the reasoning behind
> >> >> Pulp having to actually fetch the content locally?
> >> >
> >> >
> >> > Is the question "why is Pulp doing the fetching and not Squid?" or
> >> > "why is Pulp storing the content after fetching it?" or both?
> >> >
> >> >> Couldn't Pulp just rely on the proxy with regards to the content
> >> >> streaming?
> >> >>
> >> >> Thanks,
> >> >> milan
> >> >>
> >> >>
> >> >> [1] https://pulp.plan.io/attachments/130957
> >> >>
> >> >> On Fri, May 25, 2018 at 9:11 PM, Brian Bouterse <bbouters at redhat.com>
> >> >> wrote:
> >> >> > A mini-team of core devs** met to talk through lazy use cases for
> >> >> > Pulp3. It's effectively the same lazy from Pulp2 except:
> >> >> >
> >> >> > * it's now built into core (not just RPM)
> >> >> > * it excludes repo protection use cases because we haven't added
> >> >> > repo protection to Pulp3 yet
> >> >> > * it excludes the "background" policy which, based on feedback from
> >> >> > stakeholders, provided very little value
> >> >> > * it will no longer depend on Twisted. It will use asyncio instead.
> >> >> >
> >> >> > While it is being built into core, it will require minimal work by
> >> >> > a plugin writer to support it. Details are in the epic below.
> >> >> >
> >> >> > The current use cases along with a technical plan are written on
> >> >> > this epic:
> >> >> > https://pulp.plan.io/issues/3693
> >> >> >
> >> >> > We're putting it out for comments, questions, and feedback before
> >> >> > we start on the code. I hope we are able to add this to our next
> >> >> > sprint.
> >> >> >
> >> >> > ** ipanova, jortel, ttereshc, dkliban, bmbouter
> >> >> >
> >> >> > Thanks!
> >> >> > Brian
> >> >> >
> >> >> >
> >> >> > _______________________________________________
> >> >> > Pulp-dev mailing list
> >> >> > Pulp-dev at redhat.com
> >> >> > https://www.redhat.com/mailman/listinfo/pulp-dev
> >> >> >
> >> >
> >> >
> >>
> >
> >
>