<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Apr 11, 2018 at 6:07 PM, Jeff Ortel <span dir="ltr"><<a href="mailto:jortel@redhat.com" target="_blank">jortel@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div text="#000000" bgcolor="#FFFFFF"><span>
    <br>
    <br>
    <div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278moz-cite-prefix">On 04/11/2018 03:29 PM, Brian Bouterse
      wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr">I think we should look into this in the near-term.
        Changing an interface on an object used by all plugins will be
        significantly easier, earlier.<br>
        <br>
        <div>
          <div>
            <div>
              <div class="gmail_extra"><br>
                <div class="gmail_quote">On Wed, Apr 11, 2018 at 12:25
                  PM, Jeff Ortel <span dir="ltr"><<a href="mailto:jortel@redhat.com" target="_blank">jortel@redhat.com</a>></span>
                  wrote:<br>
                  <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    <div text="#000000" bgcolor="#FFFFFF"><span> <br>
                        <br>
                        <div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045moz-cite-prefix">On
                          04/11/2018 10:59 AM, Brian Bouterse wrote:<br>
                        </div>
                        <blockquote type="cite">
                          <div dir="ltr"><br>
                            <div class="gmail_extra"><br>
                              <div class="gmail_quote">On Tue, Apr 10,
                                2018 at 10:43 AM, Jeff Ortel <span dir="ltr"><<a href="mailto:jortel@redhat.com" target="_blank">jortel@redhat.com</a>></span>
                                wrote:<br>
                                <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                  <div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045HOEnZb">
                                    <div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045h5">
                                      <div text="#000000" bgcolor="#FFFFFF"> <br>
                                        <div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-forward-container"><br>
                                          <br>
                                          <table class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-email-headers-table" border="0" width="380" cellspacing="0" cellpadding="0" height="86">
                                            <tbody>
                                              <tr>
                                                <th align="RIGHT" valign="BASELINE" nowrap><br>
                                                </th>
                                                <td><br>
                                                </td>
                                              </tr>
                                              <tr>
                                                <th align="RIGHT" valign="BASELINE" nowrap><br>
                                                </th>
                                                <td><br>
                                                </td>
                                              </tr>
                                              <tr>
                                                <th align="RIGHT" valign="BASELINE" nowrap><br>
                                                </th>
                                                <td><br>
                                                </td>
                                              </tr>
                                              <tr>
                                                <th align="RIGHT" valign="BASELINE" nowrap><br>
                                                </th>
                                                <td><br>
                                                </td>
                                              </tr>
                                            </tbody>
                                          </table>
                                          <br>
                                          <div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-cite-prefix">On
                                            04/06/2018 09:15 AM, Brian
                                            Bouterse wrote:<br>
                                          </div>
                                          <blockquote type="cite">
                                            <div dir="ltr">
                                              <div>
                                                <div>
                                                  <div>
                                                    <div>
                                                      <div>Several
                                                        plugins have
                                                        started using
                                                        the Changesets
                                                        including
                                                        pulp_ansible,
                                                        pulp_python,
                                                        pulp_file, and
                                                        perhaps others.
                                                        The Changesets
                                                        provide several
                                                        distinct points
                                                        of value which
                                                        are great, but
                                                        there are two
                                                        challenges I
                                                        want to bring
                                                        up. I want to
                                                        focus only on
                                                        the problem
                                                        statements
                                                        first.<br>
                                                        <br>
                                                      </div>
                                                      1. There is
                                                      redundant
                                                      "differencing"
                                                      code in all
                                                      plugins. The
                                                      Changeset
                                                      interface requires
                                                      the plugin writer
                                                      to determine what
                                                      units need to be
                                                      added and those to
                                                      be removed. This
                                                      requires all
                                                      plugin writers to
                                                      write the same
                                                      non-trivial
                                                      differencing code
                                                      over and over. For
                                                      example, you can
                                                      see the same
                                                      non-trivial
                                                      differencing code
                                                      present in <a href="https://github.com/pulp/pulp_ansible/blob/d0eb9d125f9a6cdc82e2807bcad38749967a1245/pulp_ansible/app/tasks/synchronizing.py#L217-L306" target="_blank">pulp_ansible</a>, <a href="https://github.com/pulp/pulp_file/blob/30afa7cce667b57d8fe66d5fc1fe87fd77029210/pulp_file/app/tasks/synchronizing.py#L114-L193" target="_blank">pulp_file</a>, and <a href="https://github.com/pulp/pulp_python/blob/066d33990e64b5781c8419b96acaf2acf1982324/pulp_python/app/tasks/sync.py#L172-L223" target="_blank">pulp_python</a>. Line-wise, this "differencing"
                                                      code makes up a
                                                      large portion
                                                      (maybe 50%) of the
                                                      sync code itself
                                                      in each plugin.<br>
                                                    </div>
                                                  </div>
                                                </div>
                                              </div>
                                            </div>
                                          </blockquote>
                                          <br>
                                          Ten lines of trivial set logic
                                          hardly seems like a big deal
                                          but any duplication is worth
                                          exploring. <br>
                                        </div>
                                      </div>
                                    </div>
                                  </div>
                                </blockquote>
                                <div>It's more than ten lines. Take
                                  pulp_ansible for example. By my count
                                  (the linked to section) it's 89 lines,
                                  which out of 306 lines of plugin code
                                  for sync is 29% of extra redundant
                                  code. The other plugins have similar
                                  numbers. So with those numbers in
                                  mind, what do you think?<br>
                                </div>
                              </div>
                            </div>
                          </div>
                        </blockquote>
                        <br>
                      </span> I was counting the lines (w/o comments) in
                      find_delta() based on the linked code.  Which
                      functions are you counting?<span><br>
                      </span></div>
                  </blockquote>
                  <div><br>
                  </div>
                  <div>I was counting the find_delta, build_additions,
                    and build_removals methods. Regardless of how the
                    lines are counted, that differencing code is the
                    duplication I'm talking about. There isn't a way to
                    use the changesets without duplicating that
                    differencing code in a plugin.<br>
                  </div>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br></span>
    The differencing code is limited to find_delta() and perhaps
    build_removals().  Agreed, the line count is less useful than
    specifically identifying duplicate code.  Outside of find_delta(), I
    see similar code (in part because it got copied from file plugin)
    but not seeing actual duplication.  Can you be more specific?<span><br></span></div></blockquote><div><br></div><div>Very similar code or identical code, I think it begs the question why are we having plugin writer's do this at all? What value are they creating with it? I don't have a reasonable answer to that question, so the requirement for plugin writer's to write that code brings me back to the problem statement: "plugin writers have redundant differencing code when using Changesets". More info on why it is valuable for the plugin writer to do the differencing code versus the Changesets would be helpful.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><span>
    <br>
    <blockquote type="cite">
      <div dir="ltr">
        <div>
          <div>
            <div>
              <div class="gmail_extra">
                <div class="gmail_quote">
                  <div><br>
                    So a shorter, simpler problem statement is: "to use
                    the changesets plugin writers have to do extra work
                    to compute additions and removals parameters".<br>
                  </div>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br></span>
    This statement ^ is better but still too vague to actually solve. 
    Can we elaborate on specifically what "to do extra work" means?<span><br></span></div></blockquote><div><br></div><div>Sure. Removing that vague language is one way to resolve its vagueness. Here's a revised problem statement: <span class="m_-2307966803311662826m_-4677839867557154933gmail-">"to use
                    the changesets plugin writers have to compute additions and removals parameters". This problem statement would be resolved by a solution that causes the plugin writer to never have to produce these parameters and be replaced by an interface that would require less effort from a plugin writer.</span><br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><span>
    <br>
    <blockquote type="cite">
      <div dir="ltr">
        <div>
          <div>
            <div>
              <div class="gmail_extra">
                <div class="gmail_quote">
                  <div> </div>
                  <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    <div text="#000000" bgcolor="#FFFFFF"><span> <br>
                        <blockquote type="cite">
                          <div dir="ltr">
                            <div class="gmail_extra">
                              <div class="gmail_quote">
                                <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                  <div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045HOEnZb">
                                    <div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045h5">
                                      <div text="#000000" bgcolor="#FFFFFF">
                                        <div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-forward-container">
                                          <br>
                                          <blockquote type="cite">
                                            <div dir="ltr">
                                              <div>
                                                <div>
                                                  <div>
                                                    <div><br>
                                                    </div>
                                                    2. Plugins can't do
                                                    end-to-end stream
                                                    processing. The
                                                    Changesets
                                                    themselves do stream
                                                    processing, but when
                                                    you call into
                                                    changeset.apply_and_drain()
                                                    you have to have
                                                    fully parsed the
                                                    metadata already.
                                                    Currently when
                                                    fetching all
                                                    metadata from
                                                    Galaxy, pulp_ansible
                                                    takes about 380
                                                    seconds (6+ min).
                                                    This means that the
                                                    actual Changeset
                                                    content downloading
                                                    starts 380 seconds
                                                    later than it could.
                                                    At the heart of the
                                                    problem, the
                                                    fetching+parsing of
                                                    the metadata is not
                                                    part of the stream
                                                    processing.<br>
                                                  </div>
                                                </div>
                                              </div>
                                            </div>
                                          </blockquote>
                                          <br>
                                          The additions/removals can be
                                          any interable (like generator)
                                          and by using ChangeSet.apply()
                                          and iterating the returned
                                          object, the pluign can "turn
                                          the crank" while downloading
                                          and processing the metadata. 
                                          The
                                          ChangeSet.apply_and_drain() is
                                          just a convenience method.  I
                                          don't see how this is a
                                          limitation of the ChangeSet.<br>
                                        </div>
                                      </div>
                                    </div>
                                  </div>
                                </blockquote>
                                <div><br>
                                </div>
                                <div>That is new info for me (and maybe
                                  everyone). OK so Changesets have two
                                  interfaces. apply() and
                                  apply_and_drain(). Why do we have two
                                  interfaces when apply() can support
                                  all existing use cases (that I know
                                  of) and do end-to-end stream
                                  processing but apply_and_drain()
                                  cannot? I see all of our examples (and
                                  all of our new plugins) using
                                  apply_and_drain().<br>
                                </div>
                              </div>
                            </div>
                          </div>
                        </blockquote>
                        <br>
                      </span> The ChangeSet.apply() was how I designed
                      (and documented) it.  Not sure when/who added the
                      apply_and_drain().  +1 for removing it.<span><br>
                      </span></div>
                  </blockquote>
                  <div><br>
                  </div>
                  <div>I read through the changeset docs. I think this
                    stream processing thing is still a problem but
                    perhaps in how we're presenting the Changeset with
                    it's arguments. I don't think apply() versus
                    apply_and_drain() are at all related. Regardless of
                    if you are using apply() or apply_and_drain(), the
                    Changeset requires an 'additions' and 'removals'
                    arguments. This sends a clear message to the plugin
                    writer that they need to compute additions and
                    removals. They will fetch the metadata to compute
                    these which is mostly how the changeset
                    documentation reads. To know that they could present
                    a generator that would correctly allow the metdata
                    from inside the Changeset is I feel as non-obvious.
                    I want the high-performing implementation to be the
                    obvious one.<br>
                    <br>
                    So what about a problem statement like this:
                    "Changesets are presented such that when you call
                    into them you should already have fetched the
                    metadata"?<br>
                  </div>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br></span>
    I'm not sure what is meant by "presented".  If this means that we
    should provide an example of how the ChangeSet can be used by
    plugins (with large metadata) in such a way that does not require
    downloading all the metadata first - that sounds like a good idea. <br></div></blockquote><div><br></div><div>Cool so this is transitioning to ideas for resolution. The solution to add documentation on how to do this with the existing interface is one option. My concern with adding additional docs on how to use the current interface better is that if users choose to follow the existing docs then they will have the stream processing problem once again. To me, this suggests that this new example should actually replace the existing documentation.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><span>
    <br>
    <blockquote type="cite">
      <div dir="ltr">
        <div>
          <div>
            <div>
              <div class="gmail_extra">
                <div class="gmail_quote">
                  <div> <br>
                  </div>
                  <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    <div text="#000000" bgcolor="#FFFFFF"><span> <br>
                        <blockquote type="cite">
                          <div dir="ltr">
                            <div class="gmail_extra">
                              <div class="gmail_quote">
                                <div> </div>
                                <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                  <div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045HOEnZb">
                                    <div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045h5">
                                      <div text="#000000" bgcolor="#FFFFFF">
                                        <div class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-forward-container">
                                          <br>
                                          <blockquote type="cite">
                                            <div dir="ltr">
                                              <div>
                                                <div>
                                                  <div><br>
                                                  </div>
                                                  Do you see the same
                                                  challenges I do? Are
                                                  these the right
                                                  problem statements? I
                                                  think with clear
                                                  problem statements a
                                                  solution will be easy
                                                  to see and agree on.<br>
                                                </div>
                                              </div>
                                            </div>
                                          </blockquote>
                                          <br>
                                          I'm not convinced that these
                                          are actual problems/challenges
                                          that need to be addressed in
                                          the near term.<br>
                                          <br>
                                          <blockquote type="cite">
                                            <div dir="ltr">
                                              <div>
                                                <div><br>
                                                </div>
                                                Thanks!<br>
                                              </div>
                                              Brian<br>
                                            </div>
                                            <br>
                                            <fieldset class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587mimeAttachmentHeader"></fieldset>
                                            <br>
                                            <pre>______________________________<wbr>_________________
Pulp-dev mailing list
<a class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-txt-link-abbreviated" href="mailto:Pulp-dev@redhat.com" target="_blank">Pulp-dev@redhat.com</a>
<a class="m_-2307966803311662826m_-4677839867557154933m_-6005010689108662278m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-txt-link-freetext" href="https://www.redhat.com/mailman/listinfo/pulp-dev" target="_blank">https://www.redhat.com/mailman<wbr>/listinfo/pulp-dev</a>
</pre>
                                          </blockquote>
                                          <br>
                                        </div>
                                      </div>
                                    </div>
                                  </div>
                                  <br>
                                  ______________________________<wbr>_________________<br>
                                  Pulp-dev mailing list<br>
                                  <a href="mailto:Pulp-dev@redhat.com" target="_blank">Pulp-dev@redhat.com</a><br>
                                  <a href="https://www.redhat.com/mailman/listinfo/pulp-dev" rel="noreferrer" target="_blank">https://www.redhat.com/mailman<wbr>/listinfo/pulp-dev</a><br>
                                  <br>
                                </blockquote>
                              </div>
                              <br>
                            </div>
                          </div>
                        </blockquote>
                        <br>
                      </span></div>
                  </blockquote>
                </div>
                <br>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </span></div>

</blockquote></div><br></div></div>