<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<br>
<div class="moz-cite-prefix">On 04/12/2018 10:01 AM, Brian Bouterse
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAAcvrTFm6Eugp7Jb_d0yNKZdk=yUiddjvgUWPmAetRPpzeDSFQ@mail.gmail.com">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, Apr 11, 2018 at 6:07 PM, Jeff
Ortel <span dir="ltr"><<a
href="mailto:jortel@redhat.com" target="_blank"
moz-do-not-send="true">jortel@redhat.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span> <br>
<br>
<div
class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278moz-cite-prefix">On
04/11/2018 03:29 PM, Brian Bouterse wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">I think we should look into this in
the near-term. Changing an interface on an object
used by all plugins will be significantly easier,
earlier.<br>
<br>
<div>
<div>
<div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, Apr 11,
2018 at 12:25 PM, Jeff Ortel <span
dir="ltr"><<a
href="mailto:jortel@redhat.com"
target="_blank"
moz-do-not-send="true">jortel@redhat.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span>
<br>
<br>
<div
class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045moz-cite-prefix">On
04/11/2018 10:59 AM, Brian
Bouterse wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On
Tue, Apr 10, 2018 at 10:43
AM, Jeff Ortel <span
dir="ltr"><<a
href="mailto:jortel@redhat.com"
target="_blank"
moz-do-not-send="true">jortel@redhat.com</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div
class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045HOEnZb">
<div
class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045h5">
<div text="#000000"
bgcolor="#FFFFFF">
<br>
<div
class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-forward-container"><br>
<br>
<table
class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-email-headers-table"
border="0"
height="86"
width="380"
cellspacing="0"
cellpadding="0">
<tbody>
<tr>
<th
valign="BASELINE"
align="RIGHT"
nowrap="nowrap"><br>
</th>
<td><br>
</td>
</tr>
<tr>
<th
valign="BASELINE"
align="RIGHT"
nowrap="nowrap"><br>
</th>
<td><br>
</td>
</tr>
<tr>
<th
valign="BASELINE"
align="RIGHT"
nowrap="nowrap"><br>
</th>
<td><br>
</td>
</tr>
<tr>
<th
valign="BASELINE"
align="RIGHT"
nowrap="nowrap"><br>
</th>
<td><br>
</td>
</tr>
</tbody>
</table>
<br>
<div
class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-cite-prefix">On
04/06/2018
09:15 AM,
Brian Bouterse
wrote:<br>
</div>
<blockquote
type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>Several
plugins have
started using
the Changesets
including
pulp_ansible,
pulp_python,
pulp_file, and
perhaps
others. The
Changesets
provide
several
distinct
points of
value which
are great, but
there are two
challenges I
want to bring
up. I want to
focus only on
the problem
statements
first.<br>
<br>
</div>
1. There is
redundant
"differencing"
code in all
plugins. The
Changeset
interface
requires the
plugin writer
to determine
what units
need to be
added and
those to be
removed. This
requires all
plugin writers
to write the
same
non-trivial
differencing
code over and
over. For
example, you
can see the
same
non-trivial
differencing
code present
in <a
href="https://github.com/pulp/pulp_ansible/blob/d0eb9d125f9a6cdc82e2807bcad38749967a1245/pulp_ansible/app/tasks/synchronizing.py#L217-L306"
target="_blank" moz-do-not-send="true">pulp_ansible</a>, <a
href="https://github.com/pulp/pulp_file/blob/30afa7cce667b57d8fe66d5fc1fe87fd77029210/pulp_file/app/tasks/synchronizing.py#L114-L193"
target="_blank" moz-do-not-send="true">pulp_file</a>, and <a
href="https://github.com/pulp/pulp_python/blob/066d33990e64b5781c8419b96acaf2acf1982324/pulp_python/app/tasks/sync.py#L172-L223"
target="_blank" moz-do-not-send="true">pulp_python</a>. Line-wise, this
"differencing"
code makes up
a large
portion (maybe
50%) of the
sync code
itself in each
plugin.<br>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
Ten lines of
trivial set
logic hardly
seems like a big
deal but any
duplication is
worth exploring.
<br>
</div>
</div>
</div>
</div>
</blockquote>
<div>It's more than ten
lines. Take pulp_ansible
for example. By my count
(the linked to section)
it's 89 lines, which out
of 306 lines of plugin
code for sync is 29% of
extra redundant code.
The other plugins have
similar numbers. So with
those numbers in mind,
what do you think?<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> I was counting the lines
(w/o comments) in find_delta() based
on the linked code. Which functions
are you counting?<span><br>
</span></div>
</blockquote>
<div><br>
</div>
<div>I was counting the find_delta,
build_additions, and build_removals
methods. Regardless of how the lines
are counted, that differencing code is
the duplication I'm talking about.
There isn't a way to use the
changesets without duplicating that
differencing code in a plugin.<br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> The differencing code is limited to find_delta()
and perhaps build_removals(). Agreed, the line count is
less useful than specifically identifying duplicate
code. Outside of find_delta(), I see similar code (in
part because it got copied from file plugin) but not
seeing actual duplication. Can you be more specific?<span><br>
</span></div>
</blockquote>
<div><br>
</div>
<div>Very similar code or identical code, I think it begs
the question why are we having plugin writer's do this at
all? What value are they creating with it? I don't have a
reasonable answer to that question, so the requirement for
plugin writer's to write that code brings me back to the
problem statement: "plugin writers have redundant
differencing code when using Changesets". More info on why
it is valuable for the plugin writer to do the
differencing code versus the Changesets would be helpful.<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
The ChangeSet abstraction (and API) is based on following division
of responsibility:<br>
<br>
The plugin (with an understanding of the remote and its content):<br>
- Download metadata.<br>
- Parse metadata<br>
- Based on the metadata:<br>
- determine content to be added to the repository.<br>
- define how artifacts are downloaded.<br>
- construct content<br>
- determine content to be removed to the repository.<br>
<br>
Core (without understand of specific remote or its content):<br>
- Provide low level API for plugin to affect the changes it has
determined need to be made to the repository. This is downloaders,
models etc.<br>
- Provide high(er) level API for plugin to affect the changes it
has determined need to be made to the repository. This is the
ChangeSet.<br>
<br>
Are you proposing that this is not the correct division?<br>
<blockquote type="cite"
cite="mid:CAAcvrTFm6Eugp7Jb_d0yNKZdk=yUiddjvgUWPmAetRPpzeDSFQ@mail.gmail.com">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span> <br>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
So a shorter, simpler problem
statement is: "to use the changesets
plugin writers have to do extra work
to compute additions and removals
parameters".<br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> This statement ^ is better but still too vague
to actually solve. Can we elaborate on specifically
what "to do extra work" means?<span><br>
</span></div>
</blockquote>
<div><br>
</div>
<div>Sure. Removing that vague language is one way to
resolve its vagueness. Here's a revised problem statement:
<span
class="m_-2307966803311662826m_-4677839867557154933gmail-">"to
use the changesets plugin writers have to compute
additions and removals parameters". This problem
statement would be resolved by a solution that causes
the plugin writer to never have to produce these
parameters and be replaced by an interface that would
require less effort from a plugin writer.</span><br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
I think it's the plugin's responsibility to determine the
difference. Aside from that: without an understanding of the
metadata and content type, how could the ChangeSet do this? What
might that looks like?<br>
<br>
<blockquote type="cite"
cite="mid:CAAcvrTFm6Eugp7Jb_d0yNKZdk=yUiddjvgUWPmAetRPpzeDSFQ@mail.gmail.com">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> <br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span> <br>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div
class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045HOEnZb">
<div
class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045h5">
<div text="#000000"
bgcolor="#FFFFFF">
<div
class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-forward-container">
<br>
<blockquote
type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div><br>
</div>
2. Plugins
can't do
end-to-end
stream
processing.
The Changesets
themselves do
stream
processing,
but when you
call into
changeset.apply_and_drain()
you have to
have fully
parsed the
metadata
already.
Currently when
fetching all
metadata from
Galaxy,
pulp_ansible
takes about
380 seconds
(6+ min). This
means that the
actual
Changeset
content
downloading
starts 380
seconds later
than it could.
At the heart
of the
problem, the
fetching+parsing
of the
metadata is
not part of
the stream
processing.<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
The
additions/removals
can be any
interable (like
generator) and
by using
ChangeSet.apply()
and iterating
the returned
object, the
pluign can "turn
the crank" while
downloading and
processing the
metadata. The
ChangeSet.apply_and_drain()
is just a
convenience
method. I don't
see how this is
a limitation of
the ChangeSet.<br>
</div>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>That is new info for
me (and maybe everyone).
OK so Changesets have
two interfaces. apply()
and apply_and_drain().
Why do we have two
interfaces when apply()
can support all existing
use cases (that I know
of) and do end-to-end
stream processing but
apply_and_drain()
cannot? I see all of our
examples (and all of our
new plugins) using
apply_and_drain().<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> The ChangeSet.apply() was
how I designed (and documented) it.
Not sure when/who added the
apply_and_drain(). +1 for removing
it.<span><br>
</span></div>
</blockquote>
<div><br>
</div>
<div>I read through the changeset docs.
I think this stream processing thing
is still a problem but perhaps in how
we're presenting the Changeset with
it's arguments. I don't think apply()
versus apply_and_drain() are at all
related. Regardless of if you are
using apply() or apply_and_drain(),
the Changeset requires an 'additions'
and 'removals' arguments. This sends a
clear message to the plugin writer
that they need to compute additions
and removals. They will fetch the
metadata to compute these which is
mostly how the changeset documentation
reads. To know that they could present
a generator that would correctly allow
the metdata from inside the Changeset
is I feel as non-obvious. I want the
high-performing implementation to be
the obvious one.<br>
<br>
So what about a problem statement like
this: "Changesets are presented such
that when you call into them you
should already have fetched the
metadata"?<br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> I'm not sure what is meant by "presented". If
this means that we should provide an example of how the
ChangeSet can be used by plugins (with large metadata)
in such a way that does not require downloading all the
metadata first - that sounds like a good idea. <br>
</div>
</blockquote>
<div><br>
</div>
<div>Cool so this is transitioning to ideas for resolution.
The solution to add documentation on how to do this with
the existing interface is one option. My concern with
adding additional docs on how to use the current interface
better is that if users choose to follow the existing docs
then they will have the stream processing problem once
again. To me, this suggests that this new example should
actually replace the existing documentation.<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
Seems like both example would be useful. I'm not convinced that all
plugins would benefit from this. For example: the File plugin
manifest is small and would likely not benefit from the extra
complexity. For complicated plugins (like RPM), can differencing
decision be made before analyzing the entire metadata (eg:
primary.xml)? Also, it's not clear to me how this would work using
the Downloader. Are you suggesting that the plugin would
parse/process metadata files while they're being downloaded?
Perhaps a better understanding of the flow to be supported would
help me understand this.<br>
<br>
<blockquote type="cite"
cite="mid:CAAcvrTFm6Eugp7Jb_d0yNKZdk=yUiddjvgUWPmAetRPpzeDSFQ@mail.gmail.com">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span> <br>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<div> <br>
</div>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span>
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px
#ccc
solid;padding-left:1ex">
<div
class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045HOEnZb">
<div
class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045h5">
<div text="#000000"
bgcolor="#FFFFFF">
<div
class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-forward-container">
<br>
<blockquote
type="cite">
<div dir="ltr">
<div>
<div>
<div><br>
</div>
Do you see the
same
challenges I
do? Are these
the right
problem
statements? I
think with
clear problem
statements a
solution will
be easy to see
and agree on.<br>
</div>
</div>
</div>
</blockquote>
<br>
I'm not
convinced that
these are actual
problems/challenges that need to be addressed in the near term.<br>
<br>
<blockquote
type="cite">
<div dir="ltr">
<div>
<div><br>
</div>
Thanks!<br>
</div>
Brian<br>
</div>
<br>
<fieldset
class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587mimeAttachmentHeader"></fieldset>
<br>
<pre>______________________________<wbr>_________________
Pulp-dev mailing list
<a class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-txt-link-abbreviated" href="mailto:Pulp-dev@redhat.com" target="_blank" moz-do-not-send="true">Pulp-dev@redhat.com</a>
<a class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-txt-link-freetext" href="https://www.redhat.com/mailman/listinfo/pulp-dev" target="_blank" moz-do-not-send="true">https://www.redhat.com/mailman<wbr>/listinfo/pulp-dev</a>
</pre>
</blockquote>
<br>
</div>
</div>
</div>
</div>
<br>
______________________________<wbr>_________________<br>
Pulp-dev mailing list<br>
<a
href="mailto:Pulp-dev@redhat.com"
target="_blank"
moz-do-not-send="true">Pulp-dev@redhat.com</a><br>
<a
href="https://www.redhat.com/mailman/listinfo/pulp-dev"
rel="noreferrer"
target="_blank"
moz-do-not-send="true">https://www.redhat.com/mailman<wbr>/listinfo/pulp-dev</a><br>
<br>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
</span></div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</span></div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
</body>
</html>