<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Apr 11, 2018 at 6:07 PM, Jeff Ortel <span dir="ltr"><<a href="mailto:jortel@redhat.com" target="_blank">jortel@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span>
<br>
<br>
<div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278moz-cite-prefix">On 04/11/2018 03:29 PM, Brian Bouterse
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">I think we should look into this in the near-term.
Changing an interface on an object used by all plugins will be
significantly easier, earlier.<br>
<br>
<div>
<div>
<div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, Apr 11, 2018 at 12:25
PM, Jeff Ortel <span dir="ltr"><<a href="mailto:jortel@redhat.com" target="_blank">jortel@redhat.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span> <br>
<br>
<div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045moz-cite-prefix">On
04/11/2018 10:59 AM, Brian Bouterse wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Apr 10,
2018 at 10:43 AM, Jeff Ortel <span dir="ltr"><<a href="mailto:jortel@redhat.com" target="_blank">jortel@redhat.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045HOEnZb">
<div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045h5">
<div text="#000000" bgcolor="#FFFFFF"> <br>
<div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-forward-container"><br>
<br>
<table class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-email-headers-table" border="0" width="380" cellspacing="0" cellpadding="0" height="86">
<tbody>
<tr>
<th align="RIGHT" valign="BASELINE" nowrap><br>
</th>
<td><br>
</td>
</tr>
<tr>
<th align="RIGHT" valign="BASELINE" nowrap><br>
</th>
<td><br>
</td>
</tr>
<tr>
<th align="RIGHT" valign="BASELINE" nowrap><br>
</th>
<td><br>
</td>
</tr>
<tr>
<th align="RIGHT" valign="BASELINE" nowrap><br>
</th>
<td><br>
</td>
</tr>
</tbody>
</table>
<br>
<div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-cite-prefix">On
04/06/2018 09:15 AM, Brian
Bouterse wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>Several
plugins have
started using
the Changesets
including
pulp_ansible,
pulp_python,
pulp_file, and
perhaps others.
The Changesets
provide several
distinct points
of value which
are great, but
there are two
challenges I
want to bring
up. I want to
focus only on
the problem
statements
first.<br>
<br>
</div>
1. There is
redundant
"differencing"
code in all
plugins. The
Changeset
interface requires
the plugin writer
to determine what
units need to be
added and those to
be removed. This
requires all
plugin writers to
write the same
non-trivial
differencing code
over and over. For
example, you can
see the same
non-trivial
differencing code
present in <a href="https://github.com/pulp/pulp_ansible/blob/d0eb9d125f9a6cdc82e2807bcad38749967a1245/pulp_ansible/app/tasks/synchronizing.py#L217-L306" target="_blank">pulp_ansible</a>, <a href="https://github.com/pulp/pulp_file/blob/30afa7cce667b57d8fe66d5fc1fe87fd77029210/pulp_file/app/tasks/synchronizing.py#L114-L193" target="_blank">pulp_file</a>, and <a href="https://github.com/pulp/pulp_python/blob/066d33990e64b5781c8419b96acaf2acf1982324/pulp_python/app/tasks/sync.py#L172-L223" target="_blank">pulp_python</a>. Line-wise, this "differencing"
code makes up a
large portion
(maybe 50%) of the
sync code itself
in each plugin.<br>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
Ten lines of trivial set logic
hardly seems like a big deal
but any duplication is worth
exploring. <br>
</div>
</div>
</div>
</div>
</blockquote>
<div>It's more than ten lines. Take
pulp_ansible for example. By my count
(the linked to section) it's 89 lines,
which out of 306 lines of plugin code
for sync is 29% of extra redundant
code. The other plugins have similar
numbers. So with those numbers in
mind, what do you think?<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> I was counting the lines (w/o comments) in
find_delta() based on the linked code. Which
functions are you counting?<span><br>
</span></div>
</blockquote>
<div><br>
</div>
<div>I was counting the find_delta, build_additions,
and build_removals methods. Regardless of how the
lines are counted, that differencing code is the
duplication I'm talking about. There isn't a way to
use the changesets without duplicating that
differencing code in a plugin.<br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br></span>
The differencing code is limited to find_delta() and perhaps
build_removals(). Agreed, the line count is less useful than
specifically identifying duplicate code. Outside of find_delta(), I
see similar code (in part because it got copied from file plugin)
but not seeing actual duplication. Can you be more specific?<span><br></span></div></blockquote><div><br></div><div>Very similar code or identical code, I think it begs the question why are we having plugin writer's do this at all? What value are they creating with it? I don't have a reasonable answer to that question, so the requirement for plugin writer's to write that code brings me back to the problem statement: "plugin writers have redundant differencing code when using Changesets". More info on why it is valuable for the plugin writer to do the differencing code versus the Changesets would be helpful.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><span>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
So a shorter, simpler problem statement is: "to use
the changesets plugin writers have to do extra work
to compute additions and removals parameters".<br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br></span>
This statement ^ is better but still too vague to actually solve.
Can we elaborate on specifically what "to do extra work" means?<span><br></span></div></blockquote><div><br></div><div>Sure. Removing that vague language is one way to resolve its vagueness. Here's a revised problem statement: <span class="m_-2307966803311662826m_-4677839867557154933gmail-">"to use
the changesets plugin writers have to compute additions and removals parameters". This problem statement would be resolved by a solution that causes the plugin writer to never have to produce these parameters and be replaced by an interface that would require less effort from a plugin writer.</span><br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><span>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span> <br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045HOEnZb">
<div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045h5">
<div text="#000000" bgcolor="#FFFFFF">
<div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-forward-container">
<br>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div><br>
</div>
2. Plugins can't do
end-to-end stream
processing. The
Changesets
themselves do stream
processing, but when
you call into
changeset.apply_and_drain()
you have to have
fully parsed the
metadata already.
Currently when
fetching all
metadata from
Galaxy, pulp_ansible
takes about 380
seconds (6+ min).
This means that the
actual Changeset
content downloading
starts 380 seconds
later than it could.
At the heart of the
problem, the
fetching+parsing of
the metadata is not
part of the stream
processing.<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
The additions/removals can be
any interable (like generator)
and by using ChangeSet.apply()
and iterating the returned
object, the pluign can "turn
the crank" while downloading
and processing the metadata.
The
ChangeSet.apply_and_drain() is
just a convenience method. I
don't see how this is a
limitation of the ChangeSet.<br>
</div>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
<div>That is new info for me (and maybe
everyone). OK so Changesets have two
interfaces. apply() and
apply_and_drain(). Why do we have two
interfaces when apply() can support
all existing use cases (that I know
of) and do end-to-end stream
processing but apply_and_drain()
cannot? I see all of our examples (and
all of our new plugins) using
apply_and_drain().<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</span> The ChangeSet.apply() was how I designed
(and documented) it. Not sure when/who added the
apply_and_drain(). +1 for removing it.<span><br>
</span></div>
</blockquote>
<div><br>
</div>
<div>I read through the changeset docs. I think this
stream processing thing is still a problem but
perhaps in how we're presenting the Changeset with
it's arguments. I don't think apply() versus
apply_and_drain() are at all related. Regardless of
if you are using apply() or apply_and_drain(), the
Changeset requires an 'additions' and 'removals'
arguments. This sends a clear message to the plugin
writer that they need to compute additions and
removals. They will fetch the metadata to compute
these which is mostly how the changeset
documentation reads. To know that they could present
a generator that would correctly allow the metdata
from inside the Changeset is I feel as non-obvious.
I want the high-performing implementation to be the
obvious one.<br>
<br>
So what about a problem statement like this:
"Changesets are presented such that when you call
into them you should already have fetched the
metadata"?<br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br></span>
I'm not sure what is meant by "presented". If this means that we
should provide an example of how the ChangeSet can be used by
plugins (with large metadata) in such a way that does not require
downloading all the metadata first - that sounds like a good idea. <br></div></blockquote><div><br></div><div>Cool so this is transitioning to ideas for resolution. The solution to add documentation on how to do this with the existing interface is one option. My concern with adding additional docs on how to use the current interface better is that if users choose to follow the existing docs then they will have the stream processing problem once again. To me, this suggests that this new example should actually replace the existing documentation.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><span>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div class="gmail_extra">
<div class="gmail_quote">
<div> <br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"><span> <br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045HOEnZb">
<div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045h5">
<div text="#000000" bgcolor="#FFFFFF">
<div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-forward-container">
<br>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>
<div><br>
</div>
Do you see the same
challenges I do? Are
these the right
problem statements? I
think with clear
problem statements a
solution will be easy
to see and agree on.<br>
</div>
</div>
</div>
</blockquote>
<br>
I'm not convinced that these
are actual problems/challenges
that need to be addressed in
the near term.<br>
<br>
<blockquote type="cite">
<div dir="ltr">
<div>
<div><br>
</div>
Thanks!<br>
</div>
Brian<br>
</div>
<br>
<fieldset class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587mimeAttachmentHeader"></fieldset>
<br>
<pre>______________________________<wbr>_________________
Pulp-dev mailing list
<a class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-txt-link-abbreviated" href="mailto:Pulp-dev@redhat.com" target="_blank">Pulp-dev@redhat.com</a>
<a class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-txt-link-freetext" href="https://www.redhat.com/mailman/listinfo/pulp-dev" target="_blank">https://www.redhat.com/mailman<wbr>/listinfo/pulp-dev</a>
</pre>
</blockquote>
<br>
</div>
</div>
</div>
</div>
<br>
______________________________<wbr>_________________<br>
Pulp-dev mailing list<br>
<a href="mailto:Pulp-dev@redhat.com" target="_blank">Pulp-dev@redhat.com</a><br>
<a href="https://www.redhat.com/mailman/listinfo/pulp-dev" rel="noreferrer" target="_blank">https://www.redhat.com/mailman<wbr>/listinfo/pulp-dev</a><br>
<br>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
</span></div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</span></div>
</blockquote></div><br></div></div>