<div dir="ltr"><div>@ipanova, +1 to your names, I updated the epic.<br></div><div><br></div><div>FYI, I updated the epic in several ways to allow for the "cache_only" option in the design.</div><div><br></div><div>I added a new task to add "policy" also to ContentUnit so the streamer can know what to do: <a href="https://pulp.plan.io/issues/3763">https://pulp.plan.io/issues/3763</a></div><div><br></div><div>Other updates to allow for "cache_only":<br></div><div><a href="https://pulp.plan.io/issues/3695#note-2">https://pulp.plan.io/issues/3695#note-2</a></div><div><a href="https://pulp.plan.io/issues/3699#note-3">https://pulp.plan.io/issues/3699#note-3</a></div><div><a href="https://pulp.plan.io/issues/3693">https://pulp.plan.io/issues/3693</a></div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jun 7, 2018 at 5:10 AM, Ina Panova <span dir="ltr"><<a href="mailto:ipanova@redhat.com" target="_blank">ipanova@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">we could try to go with:<span class=""><br><br><div>policy=immediate -> downloads now while the task runs
(no lazy). Also the default if unspecified.</div>
</span><div>policy=on_demand -> All the steps in the
diagram. Content that is downloaded is saved so that it's
only ever downloaded once.<br>
</div>
<div>policy=cache_only -> All the steps in the diagram
except step 14. If squid pushes the bits out of the cache,
it will be downloaded again to serve other clients
requesting the same bits.</div></div><div class="gmail_extra"><br clear="all"><div><div class="m_-4554632907977466884gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><br><br>--------<br>Regards,<br><br>Ina Panova<br>Software Engineer| Pulp| Red Hat Inc.<br><br>"Do not go where the path may lead,<br> go instead where there is no path and leave a trail."<br></div></div></div><div><div class="h5">
<br><div class="gmail_quote">On Fri, Jun 1, 2018 at 12:36 AM, Jeff Ortel <span dir="ltr"><<a href="mailto:jortel@redhat.com" target="_blank">jortel@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span>
<br>
<br>
<div class="m_-4554632907977466884m_-146064033248386755moz-cite-prefix">On 05/31/2018 04:39 PM, Brian Bouterse
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>I updated the epic (<a href="https://pulp.plan.io/issues/3693" target="_blank">https://pulp.plan.io/issues/3<wbr>693</a>)
to use this new language.<br>
</div>
<div><br>
</div>
<div>
<div>policy=immediate -> downloads now while the task runs
(no lazy). Also the default if unspecified.</div>
<div>policy=cache-and-save -> All the steps in the
diagram. Content that is downloaded is saved so that it's
only ever downloaded once.<br>
</div>
<div>policy=cache -> All the steps in the diagram
except step 14. If squid pushes the bits out of the cache,
              it will be downloaded again to serve other clients
requesting the same bits.<br>
</div>
</div>
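<div>A minimal sketch of how the three policies listed above differ, assuming that step 14 of the diagram is the step that saves streamed content. The Policy enum and the helper names are hypothetical illustrations, not part of the epic or of Pulp's code.</div>

```python
from enum import Enum


class Policy(Enum):
    """Download policies as named in the list above (hypothetical enum)."""

    IMMEDIATE = "immediate"
    CACHE_AND_SAVE = "cache-and-save"
    CACHE = "cache"


def parse_policy(value=None):
    """Return the Policy for a string; 'immediate' is the default if unspecified."""
    if value is None:
        return Policy.IMMEDIATE
    return Policy(value)


def downloads_during_sync(policy):
    """Only 'immediate' downloads content while the sync task runs (no lazy)."""
    return policy is Policy.IMMEDIATE


def saves_streamed_content(policy):
    """Assumed meaning of step 14: keep streamed content so it is downloaded once.

    'cache' skips this step, so content evicted from squid is downloaded again.
    """
    return policy is Policy.CACHE_AND_SAVE
```

<div>For example, parse_policy() with no argument yields the immediate policy, and saves_streamed_content(Policy.CACHE) is False.</div>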
</div>
</blockquote>
<br></span>
These policy names strike me as an odd, non-intuitive mixture. I
think we need to brainstorm on policy names and/or additional
attributes to best capture this. Suggest the epic be updated to
describe the "modes" or use cases without the names for now. I'll
try to follow up with other suggestions.<div><div class="m_-4554632907977466884h5"><br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>
<div><br>
</div>
Also @milan, see inline for answers to your question.</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, May 30, 2018 at 3:48 PM,
Milan Kovacik <span dir="ltr"><<a href="mailto:mkovacik@redhat.com" target="_blank">mkovacik@redhat.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="m_-4554632907977466884m_-146064033248386755gmail-">On
Wed, May 30, 2018 at 4:50 PM, Brian Bouterse <<a href="mailto:bbouters@redhat.com" target="_blank">bbouters@redhat.com</a>>
wrote:<br>
><br>
><br>
> On Wed, May 30, 2018 at 8:57 AM, Tom McKay <<a href="mailto:thomasmckay@redhat.com" target="_blank">thomasmckay@redhat.com</a>>
wrote:<br>
>><br>
                  &gt;&gt; I think there is a use case for "proxy only"
like is being described here.<br>
>> Several years ago there was a project called
thumbslug[1] that was used in a<br>
                  &gt;&gt; version of katello instead of pulp. Its job
was to check entitlements and<br>
>> then proxy content from a cdn. The same
functionality could be implemented<br>
>> in pulp. (Perhaps it's even as simple as
telling squid not to cache anything<br>
>> so the content would never make it from cache
to pulp in current pulp-2.)<br>
><br>
><br>
> What would you call this policy?<br>
> policy=proxy?<br>
> policy=stream-dont-save?<br>
> policy=stream-no-save?<br>
><br>
> Are the names 'on-demand' and 'immediate' clear
enough? Are there better<br>
> names?<br>
>><br>
>><br>
>> Overall I'm +1 to the idea of an only-squid
version, if others think it<br>
>> would be useful.<br>
><br>
><br>
                  &gt; I understand describing this as an "only-squid"
version, but for clarity, the<br>
> streamer would still be required because it is what
requests the bits with<br>
> the correctly configured downloader (certs, proxy,
etc). The streamer<br>
> streams the bits into squid which provides caching
and client multiplexing.<br>
<br>
</span>I have to admit it's just now I'm reading<br>
<a href="https://docs.pulpproject.org/dev-guide/design/deferred-download.html#apache-reverse-proxy" rel="noreferrer" target="_blank">https://docs.pulpproject.org/d<wbr>ev-guide/design/deferred-downl<wbr>oad.html#apache-reverse-proxy</a><br>
again because of the SSL termination. So the new plan is
to use the<br>
streamer to terminate the SSL instead of the Apache
reverse proxy?<br>
</blockquote>
<div><br>
</div>
<div>The plan for right now is to not use a reverse proxy
and have the client's connection terminate at squid
directly either via http or https depending on how squid
              is configured. The reverse proxy in Pulp 2's design served
to validate the signed urls and rewrite them for squid.
This first implementation won't use signed urls. I believe
that means we don't need a reverse proxy here yet.<br>
</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
              W/r/t the construction of the URL of an artifact, I thought
it would be<br>
stored in the DB, so the Remote would create it during the
sync.<br>
</blockquote>
<div><br>
</div>
<div>This is correct. The inbound URL from the client after
the redirect will still be a reference that the "Pulp
content app" will resolve to a RemoteArtifact. Then the
streamer will use that RemoteArtifact data to correctly
build the downloader. That's the gist of it at least.<br>
</div>
<div><br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<span class="m_-4554632907977466884m_-146064033248386755gmail-"><br>
><br>
> To confirm my understanding this "squid-only"
policy would be the same as<br>
> on-demand except that it would *not* perform step
14 from the diagram here<br>
> (<a href="https://pulp.plan.io/issues/3693" rel="noreferrer" target="_blank">https://pulp.plan.io/issues/3<wbr>693</a>).
Is that right?<br>
</span>yup<br>
<div class="m_-4554632907977466884m_-146064033248386755gmail-HOEnZb">
<div class="m_-4554632907977466884m_-146064033248386755gmail-h5">><br>
>><br>
>><br>
>> [1] <a href="https://github.com/candlepin/thumbslug" rel="noreferrer" target="_blank">https://github.com/candlepin/t<wbr>humbslug</a><br>
>><br>
>> On Wed, May 30, 2018 at 8:34 AM, Milan
Kovacik <<a href="mailto:mkovacik@redhat.com" target="_blank">mkovacik@redhat.com</a>><br>
>> wrote:<br>
>>><br>
>>> On Tue, May 29, 2018 at 9:31 PM, Dennis
Kliban <<a href="mailto:dkliban@redhat.com" target="_blank">dkliban@redhat.com</a>><br>
>>> wrote:<br>
>>> > On Tue, May 29, 2018 at 11:42 AM,
Milan Kovacik <<a href="mailto:mkovacik@redhat.com" target="_blank">mkovacik@redhat.com</a>><br>
>>> > wrote:<br>
>>> >><br>
>>> >> On Tue, May 29, 2018 at 5:13 PM,
Dennis Kliban <<a href="mailto:dkliban@redhat.com" target="_blank">dkliban@redhat.com</a>><br>
>>> >> wrote:<br>
>>> >> > On Tue, May 29, 2018 at
10:41 AM, Milan Kovacik<br>
>>> >> > <<a href="mailto:mkovacik@redhat.com" target="_blank">mkovacik@redhat.com</a>><br>
>>> >> > wrote:<br>
>>> >> >><br>
>>> >> >> Good point!<br>
>>> >> >> More the second; it
might be a bit crazy to utilize Squid for that<br>
>>> >> >> but<br>
>>> >> >> first, let's answer the
why ;)<br>
>>> >> >> So why does Pulp need
to store the content here?<br>
>>> >> >> Why don't we point the
users to the Squid all the time (for the<br>
>>> >> >> lazy<br>
>>> >> >> repos)?<br>
>>> >> ><br>
>>> >> ><br>
>>> >> > Pulp's Streamer needs to
fetch and store the content because that's<br>
>>> >> > Pulp's<br>
>>> >> > primary responsibility.<br>
>>> >><br>
>>> >> Maybe not that much the storing
but rather the content views<br>
>>> >> management?<br>
>>> >> I mean the partitioning into
repositories, promoting.<br>
>>> >><br>
>>> ><br>
>>> > Exactly this. We want Pulp users to
be able to reuse content that was<br>
>>> > brought in using the 'on_demand'
download policy in other repositories.<br>
>>> I see.<br>
>>><br>
>>> ><br>
>>> >><br>
>>> >> If some of the content lived in
Squid and some lived<br>
>>> >> > in Pulp, it would be
difficult for the user to know what content is<br>
>>> >> > actually<br>
>>> >> > available in Pulp and what
content needs to be fetched from a remote<br>
>>> >> > repository.<br>
>>> >><br>
                    &gt;&gt;&gt; &gt;&gt; I'd say the rule of thumb
would be: lazy -> squid, regular -> pulp<br>
>>> >> so not that difficult.<br>
>>> >> Maybe Pulp could have a concept
of Origin, where folks upload stuff to<br>
                    &gt;&gt;&gt; &gt;&gt; a Pulp repo, vs. Proxy for its
repo storage policy?<br>
>>> >><br>
>>> ><br>
>>> > Squid removes things from the cache
at some point. You can probably<br>
>>> > configure it to never remove
anything from the cache, but then we would<br>
>>> > need<br>
>>> > to implement orphan cleanup that
would work across two systems: pulp<br>
>>> > and<br>
>>> > squid.<br>
>>><br>
>>> Actually "remote" units wouldn't need
orphan cleaning from the disk,<br>
>>> just dropping them from the DB would
suffice.<br>
>>><br>
>>> ><br>
>>> > Answering that question would still
be difficult. Not all content that<br>
>>> > is in<br>
>>> > the repository that was synced using
on_demand download policy will be<br>
>>> > in<br>
>>> > Squid - only the content that has
been requested by clients. So it's<br>
>>> > still<br>
>>> > hard to know which of the content
units have been downloaded and which<br>
>>> > have<br>
>>> > not been.<br>
>>><br>
>>> But the beauty is exactly in that: we
don't have to track whether the<br>
>>> content is downloaded if it is
reverse-proxied[1][2].<br>
>>> Moreover, this would work both with and
without a proxy between Pulp<br>
>>> and the Origin of the remote unit.<br>
>>> A "remote" content artifact might just
                    need to carry its URL in a DB<br>
>>> column for this to work; so the async
artifact model, instead of the<br>
>>> "policy=on-demand" would have a
mandatory remote "URL" attribute; I<br>
>>> wouldn't say it's more complex than
tracking the "policy" attribute.<br>
>>><br>
>>> ><br>
>>> ><br>
>>> >><br>
>>> >> ><br>
>>> >> > As Pulp downloads an
Artifact, it calculates all the checksums and<br>
                    &gt;&gt;&gt; &gt;&gt; &gt; its<br>
>>> >> > size. It then performs
validation based on information that was<br>
>>> >> > provided<br>
>>> >> > from the RemoteArtifact.
After validation is performed, the<br>
                    &gt;&gt;&gt; &gt;&gt; &gt; Artifact is<br>
>>> >> > saved to the database and
                    its final place in<br>
>>> >> >
/var/lib/content/artifacts/.<br>
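<div>A minimal sketch of the download-time validation described above: digests and size are computed while the bytes arrive, then compared with the expected values. The function names are hypothetical, and the expected digests stand in for whatever a RemoteArtifact would provide; this is an illustration, not Pulp's implementation.</div>

```python
import hashlib


def digest_and_size(chunks, algorithms=("sha256", "sha512")):
    """Compute hex digests and total size in one pass over byte chunks."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    size = 0
    for chunk in chunks:
        size += len(chunk)
        for hasher in hashers.values():
            hasher.update(chunk)
    digests = {name: hasher.hexdigest() for name, hasher in hashers.items()}
    return digests, size


def validate(chunks, expected_digests, expected_size=None):
    """Raise ValueError if the size or any expected digest does not match.

    expected_digests maps an algorithm name to its expected hex digest
    (assumed here to come from the remote's metadata).
    """
    digests, size = digest_and_size(chunks, tuple(expected_digests))
    if expected_size is not None and size != expected_size:
        raise ValueError(f"size mismatch: got {size}, expected {expected_size}")
    for name, expected in expected_digests.items():
        if digests[name] != expected:
            raise ValueError(f"{name} digest mismatch")
    return digests, size
```

<div>Only after validate() returns without raising would the bytes be saved to the database and moved to their final place.</div>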
>>> >><br>
>>> >> This could be still achieved by
storing the content just temporarily<br>
                    in the Squid proxy, i.e. use Squid
as the content source, not the disk.<br>
>>> >><br>
>>> >> > Once this information is in
the database, Pulp's web server can<br>
>>> >> > serve<br>
>>> >> > the<br>
>>> >> > content without having to
involve the Streamer or Squid.<br>
>>> >><br>
>>> >> Pulp might serve just the API
and the metadata, the content might be<br>
>>> >> redirected to the Proxy all the
time, correct?<br>
>>> >> Doesn't Crane do that btw?<br>
>>> ><br>
>>> ><br>
>>> > Theoretically we could do this, but
in practice we would run into<br>
>>> > problems<br>
>>> > when we needed to scale out the
Content app. Right now when the Content<br>
>>> > app<br>
>>> > needs to be scaled, a user can
launch another machine that will run the<br>
>>> > Content app. Squid does not support
that kind of scaling. Squid can<br>
>>> > only<br>
>>> > take advantage of additional cores
                    in a single machine.<br>
>>><br>
>>> I don't think I understand; proxies are
actually designed to scale[1]<br>
>>> and are used as tools to scale the web
too.<br>
>>><br>
>>> This is all about the How question but
when it comes to my original<br>
                    &gt;&gt;&gt; Why, please correct me if I'm
wrong, the answer so far has been:<br>
>>> Pulp always downloads the content
because that's what it is supposed to<br>
>>> do.<br>
>>><br>
>>> Cheers,<br>
>>> milan<br>
>>><br>
>>> [1] <a href="https://en.wikipedia.org/wiki/Reverse_proxy" rel="noreferrer" target="_blank">https://en.wikipedia.org/wiki/<wbr>Reverse_proxy</a><br>
>>> [2] <a href="https://paste.fedoraproject.org/paste/zkBTyxZjm330FsqvPP0lIA" rel="noreferrer" target="_blank">https://paste.fedoraproject.or<wbr>g/paste/zkBTyxZjm330FsqvPP0lIA</a><br>
>>> [3]<br>
>>> <a href="https://wiki.squid-cache.org/Features/CacheHierarchy?highlight=%28faqlisted.yes%29" rel="noreferrer" target="_blank">https://wiki.squid-cache.org/F<wbr>eatures/CacheHierarchy?highlig<wbr>ht=%28faqlisted.yes%29</a><br>
>>><br>
>>> ><br>
>>> >><br>
>>> >><br>
>>> >> Cheers,<br>
>>> >> milan<br>
>>> >><br>
>>> >> ><br>
>>> >> > -dennis<br>
>>> >> ><br>
>>> >> ><br>
>>> >> ><br>
>>> >> ><br>
>>> >> ><br>
>>> >> >><br>
>>> >> >><br>
>>> >> >> --<br>
>>> >> >> cheers<br>
>>> >> >> milan<br>
>>> >> >><br>
>>> >> >> On Tue, May 29, 2018 at
4:25 PM, Brian Bouterse<br>
>>> >> >> <<a href="mailto:bbouters@redhat.com" target="_blank">bbouters@redhat.com</a>><br>
>>> >> >> wrote:<br>
>>> >> >> ><br>
>>> >> >> > On Mon, May 28,
2018 at 9:57 AM, Milan Kovacik<br>
>>> >> >> > <<a href="mailto:mkovacik@redhat.com" target="_blank">mkovacik@redhat.com</a>><br>
>>> >> >> > wrote:<br>
>>> >> >> >><br>
>>> >> >> >> Hi,<br>
>>> >> >> >><br>
>>> >> >> >> Looking at the
diagram[1] I'm wondering what's the reasoning<br>
>>> >> >> >> behind<br>
>>> >> >> >> Pulp having to
actually fetch the content locally?<br>
>>> >> >> ><br>
>>> >> >> ><br>
>>> >> >> > Is the question
"why is Pulp doing the fetching and not Squid?"<br>
>>> >> >> > or<br>
>>> >> >> > "why<br>
>>> >> >> > is<br>
>>> >> >> > Pulp storing the
content after fetching it?" or both?<br>
>>> >> >> ><br>
>>> >> >> >> Couldn't Pulp
just rely on the proxy with regards to the content<br>
>>> >> >> >> streaming?<br>
>>> >> >> >><br>
>>> >> >> >> Thanks,<br>
>>> >> >> >> milan<br>
>>> >> >> >><br>
>>> >> >> >><br>
>>> >> >> >> [1] <a href="https://pulp.plan.io/attachments/130957" rel="noreferrer" target="_blank">https://pulp.plan.io/attachmen<wbr>ts/130957</a><br>
>>> >> >> >><br>
>>> >> >> >> On Fri, May
25, 2018 at 9:11 PM, Brian Bouterse<br>
>>> >> >> >> <<a href="mailto:bbouters@redhat.com" target="_blank">bbouters@redhat.com</a>><br>
>>> >> >> >> wrote:<br>
>>> >> >> >> > A
mini-team of core devs** met to talk through lazy use
cases<br>
>>> >> >> >> > for<br>
>>> >> >> >> > Pulp3.<br>
>>> >> >> >> > It's
effectively the same lazy from Pulp2 except:<br>
>>> >> >> >> ><br>
>>> >> >> >> > * it's
now built into core (not just RPM)<br>
>>> >> >> >> > * It
                    excludes repo protection use cases because we
haven't<br>
>>> >> >> >> > added<br>
>>> >> >> >> > repo<br>
>>> >> >> >> >
protection to Pulp3 yet<br>
>>> >> >> >> > * It
                    excludes the "background" policy which based on<br>
>>> >> >> >> > feedback<br>
>>> >> >> >> > from<br>
>>> >> >> >> >
stakeholders provided very little value<br>
>>> >> >> >> > * it will
                    no longer depend on Twisted. It<br>
>>> >> >> >> > will<br>
>>> >> >> >> > use<br>
>>> >> >> >> > asyncio
instead.<br>
>>> >> >> >> ><br>
>>> >> >> >> > While it
is being built into core, it will require minimal<br>
>>> >> >> >> > support<br>
>>> >> >> >> > by<br>
>>> >> >> >> > a<br>
>>> >> >> >> > plugin
writer to add support for it. Details in the epic<br>
>>> >> >> >> > below.<br>
>>> >> >> >> ><br>
>>> >> >> >> > The
current use cases along with a technical plan are
written<br>
>>> >> >> >> > on<br>
>>> >> >> >> > this<br>
>>> >> >> >> > epic:<br>
>>> >> >> >> > <a href="https://pulp.plan.io/issues/3693" rel="noreferrer" target="_blank">https://pulp.plan.io/issues/36<wbr>93</a><br>
>>> >> >> >> ><br>
>>> >> >> >> > We're
putting it out for comment, questions, and feedback<br>
>>> >> >> >> > before<br>
>>> >> >> >> > we<br>
>>> >> >> >> > start<br>
>>> >> >> >> > into the
code. I hope we are able to add this into our next<br>
>>> >> >> >> > sprint.<br>
>>> >> >> >> ><br>
>>> >> >> >> > **
ipanova, jortel, ttereshc, dkliban, bmbouter<br>
>>> >> >> >> ><br>
>>> >> >> >> > Thanks!<br>
>>> >> >> >> > Brian<br>
>>> >> >> >> ><br>
>>> >> >> >> ><br>
>>> >> >> >> >
______________________________<wbr>_________________<br>
>>> >> >> >> > Pulp-dev
mailing list<br>
>>> >> >> >> > <a href="mailto:Pulp-dev@redhat.com" target="_blank">Pulp-dev@redhat.com</a><br>
>>> >> >> >> > <a href="https://www.redhat.com/mailman/listinfo/pulp-dev" rel="noreferrer" target="_blank">https://www.redhat.com/mailman<wbr>/listinfo/pulp-dev</a><br>
>>> >> >> >> ><br>
>>> >> >> ><br>
>>> >> >> ><br>
>>> >> >><br>
>>> >> ><br>
>>> >> ><br>
>>> ><br>
>>> ><br>
>>><br>
>><br>
>><br>
>><br>
>><br>
><br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
<br>
</blockquote>
<br>
</div></div></div>
<br></blockquote></div><br></div></div></div>
<br></blockquote></div><br></div>