<div dir="ltr">I think we should look into this in the near-term. Changing an interface on an object used by all plugins will be significantly easier, earlier.<br><br><div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Apr 11, 2018 at 12:25 PM, Jeff Ortel <span dir="ltr"><<a href="mailto:jortel@redhat.com" target="_blank">jortel@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div text="#000000" bgcolor="#FFFFFF"><span>
    <br>
    <br>
    <div class="m_2987425865952863938m_-2372350557339798045moz-cite-prefix">On 04/11/2018 10:59 AM, Brian Bouterse
      wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Tue, Apr 10, 2018 at 10:43 AM,
            Jeff Ortel <span dir="ltr"><<a href="mailto:jortel@redhat.com" target="_blank">jortel@redhat.com</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div class="m_2987425865952863938m_-2372350557339798045HOEnZb">
                <div class="m_2987425865952863938m_-2372350557339798045h5">
                  <div text="#000000" bgcolor="#FFFFFF"> <br>
                    <div class="m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-forward-container"><br>
                      <br>
                      <table class="m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-email-headers-table" border="0" width="380" cellspacing="0" cellpadding="0" height="86">
                        <tbody>
                          <tr>
                            <th align="RIGHT" valign="BASELINE" nowrap><br>
                            </th>
                            <td><br>
                            </td>
                          </tr>
                          <tr>
                            <th align="RIGHT" valign="BASELINE" nowrap><br>
                            </th>
                            <td><br>
                            </td>
                          </tr>
                          <tr>
                            <th align="RIGHT" valign="BASELINE" nowrap><br>
                            </th>
                            <td><br>
                            </td>
                          </tr>
                          <tr>
                            <th align="RIGHT" valign="BASELINE" nowrap><br>
                            </th>
                            <td><br>
                            </td>
                          </tr>
                        </tbody>
                      </table>
                      <br>
                      <div class="m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-cite-prefix">On
                        04/06/2018 09:15 AM, Brian Bouterse wrote:<br>
                      </div>
                      <blockquote type="cite">
                        <div dir="ltr">
                          <div>
                            <div>
                              <div>
                                <div>
                                  <div>Several plugins have started
                                    using the Changesets including
                                    pulp_ansible, pulp_python,
                                    pulp_file, and perhaps others. The
                                    Changesets provide several distinct
                                    points of value which are great, but
                                    there are two challenges I want to
                                    bring up. I want to focus only on
                                    the problem statements first.<br>
                                    <br>
                                  </div>
                                  1. There is redundant "differencing"
                                  code in all plugins. The Changeset
                                  interface requires the plugin writer
                                  to determine what units need to be
                                  added and those to be removed. This
                                  requires all plugin writers to write
                                  the same non-trivial differencing code
                                  over and over. For example, you can
                                  see the same non-trivial differencing
                                  code present in <a href="https://github.com/pulp/pulp_ansible/blob/d0eb9d125f9a6cdc82e2807bcad38749967a1245/pulp_ansible/app/tasks/synchronizing.py#L217-L306" target="_blank">pulp_ansible</a>,
                                  <a href="https://github.com/pulp/pulp_file/blob/30afa7cce667b57d8fe66d5fc1fe87fd77029210/pulp_file/app/tasks/synchronizing.py#L114-L193" target="_blank">pulp_file</a>,
                                  and <a href="https://github.com/pulp/pulp_python/blob/066d33990e64b5781c8419b96acaf2acf1982324/pulp_python/app/tasks/sync.py#L172-L223" target="_blank">pulp_python</a>.
                                  Line-wise, this "differencing" code
                                  makes up a large portion (maybe 50%)
                                  of the sync code itself in each
                                  plugin.<br>
                                </div>
                              </div>
                            </div>
                          </div>
                        </div>
                      </blockquote>
                      <br>
                      Ten lines of trivial set logic hardly seems like a
                      big deal but any duplication is worth exploring. <br>
                    </div>
                  </div>
                </div>
              </div>
            </blockquote>
            <div>It's more than ten lines. Take pulp_ansible for
              example. By my count (the linked to section) it's 89
              lines, which out of 306 lines of plugin code for sync is
              29% of extra redundant code. The other plugins have
              similar numbers. So with those numbers in mind, what do
              you think?<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br></span>
    I was counting the lines (w/o comments) in find_delta() based on the
    linked code.  Which functions are you counting?<span><br></span></div></blockquote><div><br></div><div>I was counting the find_delta, build_additions, and build_removals methods. Regardless of how the lines are counted, that differencing code is the duplication I'm talking about. There isn't a way to use the changesets without duplicating that differencing code in a plugin.<br><br>So a shorter, simpler problem statement is: "to use the changesets plugin writers have to do extra work to compute additions and removals parameters".<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><span>
    <br>
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div class="m_2987425865952863938m_-2372350557339798045HOEnZb">
                <div class="m_2987425865952863938m_-2372350557339798045h5">
                  <div text="#000000" bgcolor="#FFFFFF">
                    <div class="m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-forward-container">
                      <br>
                      <blockquote type="cite">
                        <div dir="ltr">
                          <div>
                            <div>
                              <div>
                                <div><br>
                                </div>
                                2. Plugins can't do end-to-end stream
                                processing. The Changesets themselves do
                                stream processing, but when you call
                                into changeset.apply_and_drain() you
                                have to have fully parsed the metadata
                                already. Currently when fetching all
                                metadata from Galaxy, pulp_ansible takes
                                about 380 seconds (6+ min). This means
                                that the actual Changeset content
                                downloading starts 380 seconds later
                                than it could. At the heart of the
                                problem, the fetching+parsing of the
                                metadata is not part of the stream
                                processing.<br>
                              </div>
                            </div>
                          </div>
                        </div>
                      </blockquote>
                      <br>
                      The additions/removals can be any interable (like
                      generator) and by using ChangeSet.apply() and
                      iterating the returned object, the pluign can
                      "turn the crank" while downloading and processing
                      the metadata.  The ChangeSet.apply_and_drain() is
                      just a convenience method.  I don't see how this
                      is a limitation of the ChangeSet.<br>
                    </div>
                  </div>
                </div>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>That is new info for me (and maybe everyone). OK so
              Changesets have two interfaces. apply() and
              apply_and_drain(). Why do we have two interfaces when
              apply() can support all existing use cases (that I know
              of) and do end-to-end stream processing but
              apply_and_drain() cannot? I see all of our examples (and
              all of our new plugins) using apply_and_drain().<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br></span>
    The ChangeSet.apply() was how I designed (and documented) it.  Not
    sure when/who added the apply_and_drain().  +1 for removing it.<span><br></span></div></blockquote><div><br></div><div>I read through the changeset docs. I think this stream processing thing is still a problem but perhaps in how we're presenting the Changeset with it's arguments. I don't think apply() versus apply_and_drain() are at all related. Regardless of if you are using apply() or apply_and_drain(), the Changeset requires an 'additions' and 'removals' arguments. This sends a clear message to the plugin writer that they need to compute additions and removals. They will fetch the metadata to compute these which is mostly how the changeset documentation reads. To know that they could present a generator that would correctly allow the metdata from inside the Changeset is I feel as non-obvious. I want the high-performing implementation to be the obvious one.<br><br>So what about a problem statement like this: "Changesets are presented such that when you call into them you should already have fetched the metadata"?<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><span>
    <br>
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div class="m_2987425865952863938m_-2372350557339798045HOEnZb">
                <div class="m_2987425865952863938m_-2372350557339798045h5">
                  <div text="#000000" bgcolor="#FFFFFF">
                    <div class="m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-forward-container">
                      <br>
                      <blockquote type="cite">
                        <div dir="ltr">
                          <div>
                            <div>
                              <div><br>
                              </div>
                              Do you see the same challenges I do? Are
                              these the right problem statements? I
                              think with clear problem statements a
                              solution will be easy to see and agree on.<br>
                            </div>
                          </div>
                        </div>
                      </blockquote>
                      <br>
                      I'm not convinced that these are actual
                      problems/challenges that need to be addressed in
                      the near term.<br>
                      <br>
                      <blockquote type="cite">
                        <div dir="ltr">
                          <div>
                            <div><br>
                            </div>
                            Thanks!<br>
                          </div>
                          Brian<br>
                        </div>
                        <br>
                        <fieldset class="m_2987425865952863938m_-2372350557339798045m_-5307126087037616587mimeAttachmentHeader"></fieldset>
                        <br>
                        <pre>______________________________<wbr>_________________
Pulp-dev mailing list
<a class="m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-txt-link-abbreviated" href="mailto:Pulp-dev@redhat.com" target="_blank">Pulp-dev@redhat.com</a>
<a class="m_2987425865952863938m_-2372350557339798045m_-5307126087037616587moz-txt-link-freetext" href="https://www.redhat.com/mailman/listinfo/pulp-dev" target="_blank">https://www.redhat.com/mailman<wbr>/listinfo/pulp-dev</a>
</pre>
                      </blockquote>
                      <br>
                    </div>
                  </div>
                </div>
              </div>
              <br>
              ______________________________<wbr>_________________<br>
              Pulp-dev mailing list<br>
              <a href="mailto:Pulp-dev@redhat.com" target="_blank">Pulp-dev@redhat.com</a><br>
              <a href="https://www.redhat.com/mailman/listinfo/pulp-dev" rel="noreferrer" target="_blank">https://www.redhat.com/mailman<wbr>/listinfo/pulp-dev</a><br>
              <br>
            </blockquote>
          </div>
          <br>
        </div>
      </div>
    </blockquote>
    <br>
  </span></div>

</blockquote></div><br></div></div></div></div></div>