[Pulp-dev] Thoughts needed on Yet Another Advisory-Merge Oddness

Wed Apr 21 12:03:36 UTC 2021

Hey Grant,

Let's start with thoughts without modularity :)
I think the main reasons for the existing behavior are semantical and also
aim to preserve collections as they are.
The situation you describe (when I change only pkglist and nothing else at
all) usually means one of the following:
 1. A user wants to combine the same advisory which came from different
repos, e.g RHEL base and RHEL debuginfo.
 2. A user tries to fix an advisory in a bad way. This should  never rarely
happen. If they change advisory, they need to update the `updated`
timestamp and or version, and no merge will be done.

For situation #1, if it happens that collection names are the same (which
is often not the case in the wild), we are not modifying the existing one,
and you can clearly see that it was combined, it also preserves grouping of
packages in it which might be relevant, e.g. debuginfo vs others.
For situation #2, I agree it looks a bit weird but they should not do it in
the first place.

Having said that, there is no problem functionality wise, one or two
collections will work fine if their names are different.
I personally would leave it as is. It gives me more confidence that we are
not modifying a collection package list but just adding a new one and
feeding it to createrepo_c as is. Plus if there is a problem, it's easier
to see that it was a merge.

Now the modularity part :)
This might require some experimenting with dnf or just asking them how the
situation with multiple collections for the same modules is handled.
Here is how modular collection looks:

Collection
    - name_0
    - module NSVCA_0
    - list of packages

If we look at 2 modular collections with the same name but different
modules, those can not be merged.
Current behaviour covers that case. If we choose to put all packages into
one collection at the merge time, we need to check the modularity aspect of
the collection.

If we look at 2 modular collections with the same name and same modules, we
technically can merge them.
And here is the unknown part for me, if we leave 2 collections with the
same modules and make names different, will dnf handle it correctly, or
does it expect only one collection per module.
I'd think that dnf doesn't care for module uniqueness if the collection
names are different but it's worth checking.

Thanks for sorting it all out,
Tanya

P.S. +1 to give a better name to the unnamed collections, fwiw, in pulp 2
it was something like "collection_0", collection_1", etc.

On Tue, Apr 20, 2021 at 3:03 PM Grant Gainey <ggainey at redhat.com> wrote:

> Hey gang,
>
> In the process of fixing issue 7282 <https://pulp.plan.io/issues/7282>, I
> have encountered a behavior from advisory-merge that appears to be the
> deliberately-desired behavior - and I'm not sure it's really what we want
> to do? I need some thoughts on this. Figured I'd start in email, because
> it's advisory-merge, and complicated, and trying to follow a verbal
> description is confusing. More confusing.
>
> The PR with the proposed fix is https://github.com/pulp/pulp_rpm/pull/1973
>
> The scenario is as follows:
>
> 1) Upload an advisory that has a package-list with one RPM, say 'bear', to
> a repository.
> 2) Upload an *identical* advisory, except for having a different package
> list with one RPM, say 'camel', to the same repository.
> 3) The resulting advisory-conflict-resolution falls into the "date and
> version identical, pkg-list disjoint" case, which calls for an advisory
> merge here:
>
> https://github.com/pulp/pulp_rpm/blob/master/pulp_rpm/app/advisory.py#L222
>
> 4) advisory_merge() finds two 'collections' - one for each
> pkglist.packages in each advisory, neither of which is named.
> 5) advisory_merge() results in an advisory with two collections. One has
> no name, and is a package-list containing 'bear'. The other is named
> "None_0"[0], and is a package-list containing "camel".
>
> You can see where this decision is made here:
>
> https://github.com/pulp/pulp_rpm/blob/master/pulp_rpm/app/advisory.py#L273-L284
>
> That result was surprising to me. My assumption going in was, if we were
> merging two advisories, we would merge their same-named collections into
> one collection with elements from both. But we're very carefully choosing
> to not-do that.
>
> So, my question here is, can someone explain to me the rationale for the
> current behavior? I'm sure it has something to do with modularity, because
> half of the weird things I run into in RPM-land are because of modularity
> :) But I'd like to understand why we're doing what we're doing, so I can
> add some commentary in the test or in the merge_advisories() code itself.
>
> If this turns out to be undesirable emergent behavior, then I can address
> it as part of this PR.
>
> The test I added for all this fun is here:
>
> https://github.com/pulp/pulp_rpm/pull/1973/files#diff-781abba15eb7e61b9ae644ae112bf1292dfd7eb78e7162b74830d4ddb43e1868R274
>
> Thoughts very much appreciated!
> G
>
> [0] "None_0" is Rude. Even if this is the right behavior, we prob could
> stand to notice 'unnamed' collections and do something a little less
> painful...
> --
> Grant Gainey
> Principal Software Engineer, Red Hat System Management Engineering
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20210421/3fa371ba/attachment.htm>