[Pulp-dev] RPM plugin Copy API discussion

Daniel Alley dalley at redhat.com
Thu Dec 12 19:49:40 UTC 2019

In the coming weeks we will need to settle on a strategy for the Pulp 3
advanced copy APIs for the RPM plugin. RPM is one of the most complicated
plugins, if not the most complicated, so there are a lot of factors to
consider. We'd like to invite the community to join the discussion: tell us
which patterns and workflows you would like to have, help elaborate on the
pros and cons of each approach, and suggest new approaches we haven't
thought of.

Here are the basic use cases that we have come up with, and some points of
interest/concern that we noted during the RPM meeting today:

Use cases


   As a user, I can copy all content from one repository to another

   As a user, I can copy content matching certain "search criteria" from
   one repository to another repository

      Search criteria for copying multiple content types are a difficult
      problem -- which content types do the criteria correspond to?
      - Possible solutions:
         - Allow criteria to be specified as a JSON data structure so that
         they can be kept organized by type
         - Only allow copying one content type at a time -- but couple this
         with a feature that allows "incomplete" repository versions to be
         built up over the course of many different operations
            - This was suggested as a possibly necessary use case, but one
            that we need more feedback on

   As a user, when copying content that directly references other content,
   the referenced content is *always* copied (if present)

      e.g. Modules referencing Modules, Modules referencing RPMs, Erratum
      referencing RPMs, and {{other types}}
      - Can't think of a reason to allow bypassing this -- the manual UX if
      you wanted to bypass this would be excruciatingly painful and require
      looking at the details of the individual content units
      - Modules declare debug packages as artifacts, but debug packages are
      not always present in the same repo, so we can only copy referenced
      units "if present"


   As a user, I can optionally choose to copy all indirect dependencies of
   content that is being copied (recursive copy)

      Should this be the default?

      There are arguments for both 'yes' and 'no'

      Before deciding, let’s see what performance looks like in real-world
      scenarios

   Some content types create a new content unit in the destination repo
   instead of just copying the existing unit

      e.g. yum_repo_metadata_file

   Multi-repo copy is needed for modularity
   - Multiple source repositories are used for depsolving, the search
      criteria apply only to the primary source repo, and there are multiple
      target repositories, each with a matching source repo
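
To make the difference between direct references (always copied, if
present) and indirect dependencies (copied only with recursive copy) more
concrete, here is a minimal, purely illustrative Python sketch. It does not
use Pulp's models or APIs: the Unit class, its references/dependencies
fields and the copy_units() function are hypothetical names made up for this
example, and a "repository" is just a dict keyed by unit ID. Real recursive
copy for RPM would need a depsolver working on Requires/Provides; the
precomputed dependency IDs below only stand in for that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical content unit (NOT a Pulp model)."""
    uid: str
    content_type: str
    # Units this one directly references, e.g. a module's RPMs or an
    # erratum's RPMs. These are always followed.
    references: tuple = ()
    # Units this one needs indirectly, e.g. an RPM's runtime dependencies.
    # A real implementation would get these from a depsolver; here they
    # are precomputed IDs. Only followed when recursive=True.
    dependencies: tuple = ()


def copy_units(source, dest, selected, recursive=False):
    """Copy the `selected` unit IDs from `source` into `dest`.

    `source` and `dest` are dicts mapping uid -> Unit. Directly referenced
    units are always copied if present in `source`. Returns the set of
    requested, referenced or dependent IDs that were not found in `source`.
    """
    missing = set()
    seen = set()
    stack = list(selected)
    while stack:
        uid = stack.pop()
        if uid in seen:
            continue
        seen.add(uid)
        unit = source.get(uid)
        if unit is None:
            # e.g. a debug package declared by a module but absent here
            missing.add(uid)
            continue
        dest[uid] = unit
        stack.extend(unit.references)
        if recursive:
            stack.extend(unit.dependencies)
    return missing


# Example: a module referencing an RPM plus a debuginfo package that the
# source repo does not contain; the RPM has one indirect dependency.
src = {
    "mod:ripgrep": Unit("mod:ripgrep", "modulemd",
                        references=("rpm:ripgrep", "rpm:ripgrep-debuginfo")),
    "rpm:ripgrep": Unit("rpm:ripgrep", "package", dependencies=("rpm:glibc",)),
    "rpm:glibc": Unit("rpm:glibc", "package"),
}

dst = {}
missing = copy_units(src, dst, ["mod:ripgrep"])
print(sorted(dst), sorted(missing))
# ['mod:ripgrep', 'rpm:ripgrep'] ['rpm:ripgrep-debuginfo']

dst_recursive = {}
copy_units(src, dst_recursive, ["mod:ripgrep"], recursive=True)
print(sorted(dst_recursive))
# ['mod:ripgrep', 'rpm:glibc', 'rpm:ripgrep']
```

Note that references are followed transitively even without recursive copy
(Modules referencing Modules referencing RPMs), while a missing referenced
unit is reported rather than treated as an error.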

What should the API look like?


   If we want to support all of the use cases with one API endpoint and one
   task, then we might need to use ggainey’s proposal for a complex filter
   provided as a JSON blob (proposal #3 on the issue):
   - {
         "package": ["name=firefox, arch=x86_64, version>=70",
                     "name=chrome, version==72.0.1"],
         "modulemd": ["name=ripgrep, stream=master"]
     }

      The Python plugin does something like this, but there the criteria
      matching library is provided for us by the Python packaging utils. We
      would have to implement this ourselves for RPM.

      Brian proposed a modification of this idea where the search criteria
      can be saved and reused between tasks rather than provided each and
      every time.

      Ina’s concern: how would we process these queries?

         Would we perform a search on each content type separately and
         return a result only if both queries return a result?

         A repo may contain package foo but no matching modulemd
         - Do we fail if any are missing, or succeed regardless of what
           matched?

   If we had a feature where the user can progressively build up a complete
   repository version by modifying an incomplete repository version, then a
   single, very complex search-criteria layout would be unnecessary. You could
   run several copy tasks, one for each content type you want to copy, each
   with criteria corresponding only to that type.

      BUT, with recursive copy, that might lead to a lot of overhead, since
      the depsolving would have to be set up again for each individual task

         BUT, there are ways to mitigate that overhead, albeit very
         challenging ones
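
To illustrate what implementing the per-type criteria blob ourselves might
involve, here is a rough Python sketch. It is an assumption-laden
illustration, not a proposed implementation: the criteria syntax is taken
from the example blob in proposal #3 ("field=value" or "field<op>value",
comma-separated), terms within one string are assumed to be ANDed and the
strings in a list ORed, units are plain dicts, and version comparison is a
naive dotted-segment compare that is NOT RPM's real rpmvercmp (epoch and
release are ignored). The names parse_criterion(), matches() and select()
are hypothetical. Ina's question shows up as a `strict` flag: fail when a
criterion matches nothing, or just return whatever matched.

```python
import operator
import re

# Operators allowed in a term; a bare "=" is treated as equality.
OPS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">": operator.gt, "<": operator.lt, "=": operator.eq,
}
# Two-character operators are listed first so ">=" is not read as ">".
TERM = re.compile(r"^\s*(\w+)\s*(>=|<=|==|!=|>|<|=)\s*(.+?)\s*$")


def version_key(version):
    """Naive sort key (NOT rpmvercmp): numeric segments compare as ints,
    other segments as strings, and numeric sorts before alphabetic."""
    return [(0, int(p), "") if p.isdigit() else (1, 0, p)
            for p in version.split(".")]


def parse_criterion(text):
    """Parse "name=firefox, version>=70" into (field, op, value) tuples."""
    terms = []
    for part in text.split(","):
        m = TERM.match(part)
        if not m:
            raise ValueError(f"bad term: {part!r}")
        terms.append(m.groups())
    return terms


def matches(unit, terms):
    """True if the unit (a dict of fields) satisfies every term."""
    for field, op, value in terms:
        actual = unit.get(field)
        if actual is None:
            return False
        if field == "version":
            actual, value = version_key(actual), version_key(value)
        if not OPS[op](actual, value):
            return False
    return True


def select(units_by_type, criteria, strict=False):
    """Apply per-type criteria such as {"package": [...], "modulemd": [...]}.

    A unit is selected if it matches ANY criterion string for its type.
    With strict=True, a criterion that matches nothing raises LookupError.
    """
    selected = []
    for content_type, criterion_list in criteria.items():
        units = units_by_type.get(content_type, [])
        for text in criterion_list:
            hits = [u for u in units if matches(u, parse_criterion(text))]
            if strict and not hits:
                raise LookupError(f"{content_type}: nothing matches {text!r}")
            selected.extend(h for h in hits if h not in selected)
    return selected


units = {
    "package": [
        {"name": "firefox", "arch": "x86_64", "version": "70.0.1"},
        {"name": "firefox", "arch": "x86_64", "version": "68.2"},
        {"name": "chrome", "arch": "x86_64", "version": "72.0.1"},
    ],
    "modulemd": [],  # this repo has package foo-style content but no module
}
criteria = {
    "package": ["name=firefox, arch=x86_64, version>=70",
                "name=chrome, version==72.0.1"],
    "modulemd": ["name=ripgrep, stream=master"],
}

picked = select(units, criteria)
print([(u["name"], u["version"]) for u in picked])
# [('firefox', '70.0.1'), ('chrome', '72.0.1')]

try:
    select(units, criteria, strict=True)
except LookupError as err:
    print("strict mode:", err)
```

Under the alternative of progressively building an incomplete repository
version, each copy task would only pass a single-type dict such as
{"package": [...]} to something like select(), so the question of which
content type a criterion belongs to disappears; the cost moves to
repeating the per-task setup instead.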

Hopefully it is clear from this summary that the topic is complicated and
that it could be accomplished in several very different ways. We would love
your feedback!