<html>
  <head>

    <meta http-equiv="content-type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <br>
    <div class="moz-forward-container"><br>
      <br>
      <br>
      <div class="moz-cite-prefix">On 04/06/2018 09:15 AM, Brian
        Bouterse wrote:<br>
      </div>
      <blockquote type="cite"
cite="mid:CAAcvrTH4qGmbpS7wst2TaGmDTd38eTadEfEVG5gpsvN1cSHteQ@mail.gmail.com">
        <div dir="ltr">
          <div>
            <div>
              <div>
                <div>
                  <div>Several plugins have started using the Changesets
                    including pulp_ansible, pulp_python, pulp_file, and
                    perhaps others. The Changesets provide several
                    distinct points of value which are great, but there
                    are two challenges I want to bring up. I want to
                    focus only on the problem statements first.<br>
                    <br>
                  </div>
                  1. There is redundant "differencing" code in all
                  plugins. The Changeset interface requires the plugin
                  writer to determine what units need to be added and
                  those to be removed. This requires all plugin writers
                  to write the same non-trivial differencing code over
                  and over. For example, you can see the same
                  non-trivial differencing code present in <a
href="https://github.com/pulp/pulp_ansible/blob/d0eb9d125f9a6cdc82e2807bcad38749967a1245/pulp_ansible/app/tasks/synchronizing.py#L217-L306"
                    moz-do-not-send="true">pulp_ansible</a>, <a
href="https://github.com/pulp/pulp_file/blob/30afa7cce667b57d8fe66d5fc1fe87fd77029210/pulp_file/app/tasks/synchronizing.py#L114-L193"
                    moz-do-not-send="true">pulp_file</a>, and <a
href="https://github.com/pulp/pulp_python/blob/066d33990e64b5781c8419b96acaf2acf1982324/pulp_python/app/tasks/sync.py#L172-L223"
                    moz-do-not-send="true">pulp_python</a>. Line-wise,
                  this "differencing" code makes up a large portion
                  (maybe 50%) of the sync code itself in each plugin.<br>
                </div>
              </div>
            </div>
          </div>
        </div>
      </blockquote>
      <br>
      Ten lines of trivial set logic hardly seem like a big deal, but
      any duplication is worth exploring. <br>
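      <br>
      A minimal sketch of the kind of set logic meant here, assuming
      content units can be compared by a natural key. The function name
      and the example keys are hypothetical and are not taken from any
      of the plugins linked above:<br>
      <pre>
```python
def difference(remote_keys, local_keys):
    """Return (additions, removals) for two collections of natural keys.

    Hypothetical helper: 'remote_keys' are keys found in the remote
    metadata, 'local_keys' are keys already in the repository.
    """
    remote = set(remote_keys)
    local = set(local_keys)
    additions = remote - local   # in the remote but not yet local
    removals = local - remote    # local but no longer in the remote
    return additions, removals


# Made-up keys, for illustration only.
remote = {"a-1.0", "b-2.0", "c-3.0"}
local = {"b-2.0", "d-4.0"}
additions, removals = difference(remote, local)
```
</pre>
      Real plugins also have to build the keys and map them back to
      content units, which this sketch leaves out.<br>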
      <br>
      <blockquote type="cite"
cite="mid:CAAcvrTH4qGmbpS7wst2TaGmDTd38eTadEfEVG5gpsvN1cSHteQ@mail.gmail.com">
        <div dir="ltr">
          <div>
            <div>
              <div>
                <div><br>
                </div>
                2. Plugins can't do end-to-end stream processing. The
                Changesets themselves do stream processing, but when you
                call into changeset.apply_and_drain() you have to have
                fully parsed the metadata already. Currently when
                fetching all metadata from Galaxy, pulp_ansible takes
                about 380 seconds (6+ min). This means that the actual
                Changeset content downloading starts 380 seconds later
                than it could. At the heart of the problem, the
                fetching+parsing of the metadata is not part of the
                stream processing.<br>
              </div>
            </div>
          </div>
        </div>
      </blockquote>
      <br>
      The additions/removals can be any iterable (such as a generator),
      and by using ChangeSet.apply() and iterating the returned object,
      the plugin can "turn the crank" while downloading and processing
      the metadata. ChangeSet.apply_and_drain() is just a convenience
      method. I don't see how this is a limitation of the ChangeSet. <br>
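      <br>
      To illustrate the general idea only, here is a sketch using plain
      Python generators. Both fetch_metadata_pages() and apply() are
      hypothetical stand-ins written for this example (this is not the
      real ChangeSet API); the point is that when additions are
      supplied lazily, metadata parsing and downloading interleave:<br>
      <pre>
```python
events = []  # records the order in which work actually happens


def fetch_metadata_pages(pages):
    """Hypothetical stand-in: parse remote metadata one page at a time."""
    for number, page in enumerate(pages):
        events.append(f"parsed page {number}")
        for name in page:
            yield name  # hand each unit over as soon as it is known


def apply(additions):
    """Hypothetical stand-in for an apply() that consumes additions lazily
    and yields one report per processed unit."""
    for name in additions:
        events.append(f"downloaded {name}")
        yield name


# Made-up metadata: two pages of unit names.
pages = [["role-a", "role-b"], ["role-c"]]

# Iterating the returned object is what "turns the crank".
reports = list(apply(fetch_metadata_pages(pages)))
```
</pre>
      In this sketch the second page is not parsed until the units from
      the first page have been handled, so downloading does not wait
      for all of the metadata.<br>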
      <br>
      <blockquote type="cite"
cite="mid:CAAcvrTH4qGmbpS7wst2TaGmDTd38eTadEfEVG5gpsvN1cSHteQ@mail.gmail.com">
        <div dir="ltr">
          <div>
            <div>
              <div><br>
              </div>
              Do you see the same challenges I do? Are these the right
              problem statements? I think with clear problem statements
              a solution will be easy to see and agree on.<br>
            </div>
          </div>
        </div>
      </blockquote>
      <br>
      I'm not convinced that these are actual problems/challenges that
      need to be addressed in the near term.<br>
      <br>
      <blockquote type="cite"
cite="mid:CAAcvrTH4qGmbpS7wst2TaGmDTd38eTadEfEVG5gpsvN1cSHteQ@mail.gmail.com">
        <div dir="ltr">
          <div>
            <div><br>
            </div>
            Thanks!<br>
          </div>
          Brian<br>
        </div>
        <br>
        <fieldset class="mimeAttachmentHeader"></fieldset>
        <br>
        <pre wrap="">_______________________________________________
Pulp-dev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Pulp-dev@redhat.com" moz-do-not-send="true">Pulp-dev@redhat.com</a>
<a class="moz-txt-link-freetext" href="https://www.redhat.com/mailman/listinfo/pulp-dev" moz-do-not-send="true">https://www.redhat.com/mailman/listinfo/pulp-dev</a>
</pre>
      </blockquote>
      <br>
    </div>
  </body>
</html>