[Pulp-dev] Duplicate nevra but not pkgId (suse repos)

Daniel Alley dalley at redhat.com
Fri Mar 20 04:42:52 UTC 2020


I discussed this a little bit on the #rpm.org channel.  Here is the gist of
that discussion:

   - The metadata is "crazy, but technically valid"
   - "the entire SUSE ecosystem tends to do this a lot, anything using OBS,
   including nvidia and dell and friends"
   - "also, SUSE packages can have the same NEVRA with being completely
   different packages because of how their build system makes packages"

I'm not sure what the best means to fix it would be.  Perhaps the
uniqueness constraint should be on the location_href, instead of on the
NEVRA?  Or on NEVRA + location_href?
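
To make the options above concrete, here is a rough sketch in plain Python (not Pulp code) that checks which candidate keys would still collide for the two glibc entries from the primary.xml excerpt quoted further down in this thread. The `collisions` helper and the dictionary layout are made up for illustration; only the field values come from the quoted metadata.

```python
from collections import Counter

# The two glibc entries from the quoted primary.xml excerpt, reduced to the
# fields that matter for the uniqueness discussion.
packages = [
    {"name": "glibc", "epoch": "0", "version": "2.19", "release": "20.3",
     "arch": "src", "location_href": "nosrc/glibc-2.19-20.3.nosrc.rpm"},
    {"name": "glibc", "epoch": "0", "version": "2.19", "release": "20.3",
     "arch": "src", "location_href": "src/glibc-2.19-20.3.src.rpm"},
]

NEVRA = ("name", "epoch", "version", "release", "arch")


def collisions(pkgs, key_fields):
    """Return every key (as a tuple) shared by more than one package.

    Hypothetical helper for this illustration, not part of any Pulp API.
    """
    counts = Counter(tuple(p[f] for f in key_fields) for p in pkgs)
    return [key for key, n in counts.items() if n > 1]


nevra_only = collisions(packages, NEVRA)
href_only = collisions(packages, ("location_href",))
nevra_plus_href = collisions(packages, NEVRA + ("location_href",))

print("NEVRA only:           ", nevra_only)
print("location_href only:   ", href_only)
print("NEVRA + location_href:", nevra_plus_href)
```

With these two entries, only the NEVRA-only key reports a collision; either variant that includes location_href keeps them apart. This says nothing about side effects elsewhere, which would still need to be evaluated.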

On Wed, Mar 18, 2020 at 9:47 AM Ina Panova <ipanova at redhat.com> wrote:

> Pavel,
> I meant to say that pulp3 does not have the limitation pulp2 had (when
> saving rpms with the same nevra on the filesystem).
> The error is raised in pulp3 [0] when a repo version is created: because
> of the repo key [1], we cannot have 2 rpms with the same NEVRA.
>
> We can enable that, if we decide to, by adding location_href to the
> repo_key, *but* this needs to be evaluated. It can have side effects, and we
> should ask our stakeholders to weigh in.
>
> [0]
> https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/repository.py#L570
> [1]
> https://github.com/pulp/pulp_rpm/blob/master/pulp_rpm/app/models/package.py#L188
>
> --------
> Regards,
>
> Ina Panova
> Senior Software Engineer| Pulp| Red Hat Inc.
>
> "Do not go where the path may lead,
>  go instead where there is no path and leave a trail."
>
>
> On Wed, Mar 18, 2020 at 2:24 PM Pavel Picka <ppicka at redhat.com> wrote:
>
>> True, in the opensuse repository there are two possibilities, 'src' and
>> 'nosrc' (the latter should be legacy, without source code); both are
>> recognized by createrepo_c as arch 'src'.
>>
>> The pulp2 code I mentioned is here [0] (the base rpm package model, as far
>> as I understand).
>>
>> The error in pulp3 is raised here [1], in pulpcore, when adding packages to
>> a repository version.
>> So, as Ina mentioned, the issue may lie not with the packages themselves
>> but with the sync logic.
>>
>> [0]
>> https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/db/models.py#L779
>> [1]
>> https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/repository.py#L570
>>
>> On Wed, Mar 18, 2020 at 1:55 PM Ina Panova <ipanova at redhat.com> wrote:
>>
>>> Tanya and Pavel,
>>> in this issue it is explained why we cannot keep 2 packages with same
>>> NEVRA but different checksums within a repo
>>> https://pulp.plan.io/issues/494
>>>
>>> Pulp2 had a limitation where it was not able to save 2 rpms with the same
>>> filename on the filesystem, which led to a primary.xml that could point to
>>> an rpm that did not actually get saved.
>>> I believe in Pulp3 we could allow rpms with the same NEVRA within a repo
>>> if they have different location_href values.
>>>
>>> --------
>>> Regards,
>>>
>>> Ina Panova
>>> Senior Software Engineer| Pulp| Red Hat Inc.
>>>
>>> "Do not go where the path may lead,
>>>  go instead where there is no path and leave a trail."
>>>
>>>
>>> On Wed, Mar 18, 2020 at 10:47 AM Tatiana Tereshchenko <
>>> ttereshc at redhat.com> wrote:
>>>
>>>> Hi Pavel,
>>>>
>>>> On Tue, Mar 17, 2020 at 7:31 PM Pavel Picka <ppicka at redhat.com> wrote:
>>>>
>>>>> Hello, I would like to ask how to proceed with an issue with duplicate
>>>>> (but not really) packages.
>>>>>
>>>>> I am syncing a suse repository (opensuse42 and SLE12) and get a
>>>>> duplicate error. But when checking the packages [0] (from primary.xml),
>>>>> glibc and glibc, they have the same nevra but different checksums (and a
>>>>> few more differences, such as size), so they don't look like real
>>>>> duplicates.
>>>>>
>>>> Those are weird: they have the same nevra, but look at the location_href,
>>>> one is src and the other one is nosrc! :/ :
>>>> <location href="nosrc/glibc-2.19-20.3.nosrc.rpm"/>
>>>> <location href="src/glibc-2.19-20.3.src.rpm"/>
>>>>
>>>> It looks like something OpenSUSE specific. I'm not sure if it's a valid
>>>> way to create a repo with such metadata, we need to figure it out at some
>>>> point.
>>>>
>>>>
>>>>> I've checked Pulp2, and it uses nevra + checksum for repository
>>>>> uniqueness. In pulp3 we use only nevra.
>>>>>
>>>> Why do you think that in pulp 2 we use NEVRA + checksum? Have you
>>>> tested it? Please point to the code.
>>>> I believe Pulp 2 as well as Pulp 3 allows packages with the same NEVRA
>>>> but different checksums in Pulp storage.
>>>> I don't think we allow having the same packages with different
>>>> checksums in the same repo.
>>>> FWIW, in pulp 2 the most recently added package is chosen to stay in a
>>>> repo, and no packages with duplicate NEVRA are left after sync, see
>>>> https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/importers/yum/purge.py#L285-L333
>>>>
>>>>
>>>>>
>>>>> My suggestion is to extend repo_key_fields for the rpm package with
>>>>> pkgId (checksum), as in pulp2. I don't think they are really duplicates,
>>>>> and other software can rely on a specific version of a package.
>>>>>
>>>>
>>>> Unfortunately, I don't remember the main reason to remove duplicates
>>>> based on nevra. Was it because some tooling will complain, or was it just
>>>> to avoid duplicates at resync time? Does anyone know?
>>>> We should not change it unless we know for sure that it's needed + we
>>>> would need to have an agreement from all our stakeholders for that change.
>>>>
>>>> For now, I think we can move on and ensure that no duplicates are in a
>>>> repo version. To my understanding, the behaviour will be the same as in
>>>> pulp 2.
>>>> Feel free to share where you get the duplicate error, to see if it's a bug
>>>> or not. I wonder why duplicates are not removed automatically. Maybe
>>>> because the first version contains duplicates due to this bug
>>>> https://pulp.plan.io/issues/6217 ?
>>>>
>>>> Tanya
>>>>
>>>>
>>>>>
>>>>> What do you think?
>>>>>
>>>>>
>>>>> [0]
>>>>>
>>>>>> <package type="rpm">
>>>>>>   <name>glibc</name>
>>>>>>   <arch>src</arch>
>>>>>>   <version epoch="0" ver="2.19" rel="20.3"/>
>>>>>>   <checksum type="sha256" pkgid="YES">00d36c0f741b0c01a77ce318a2bbcfa59cb4dd0b24ce61f57c6205e4fa1bb310</checksum>
>>>>>>   <summary>Standard Shared Libraries (from the GNU C Library)</summary>
>>>>>>   <description>The GNU C Library provides the most important standard libraries used
>>>>>> by nearly all programs: the standard C library, the standard math
>>>>>> library, and the POSIX thread library. A system is not functional
>>>>>> without these libraries.</description>
>>>>>>   <packager>https://www.suse.com/</packager>
>>>>>>   <url>http://www.gnu.org/software/libc/libc.html</url>
>>>>>>   <time file="1426696882" build="1425645307"/>
>>>>>>   <size package="591662" installed="13047428" archive="974464"/>
>>>>>>   <location href="nosrc/glibc-2.19-20.3.nosrc.rpm"/>
>>>>>>   <format>
>>>>>>     <rpm:license>LGPL-2.1+ and SUSE-LGPL-2.1+-with-GCC-exception and GPL-2.0+</rpm:license>
>>>>>>     <rpm:vendor>SUSE LLC &lt;https://www.suse.com/&gt;</rpm:vendor>
>>>>>>     <rpm:group>System/Libraries</rpm:group>
>>>>>>     <rpm:buildhost>sheep16</rpm:buildhost>
>>>>>>     <rpm:sourcerpm/>
>>>>>>     <rpm:header-range start="872" end="144403"/>
>>>>>>     <rpm:requires>
>>>>>>       <rpm:entry name="pwdutils"/>
>>>>>>       <rpm:entry name="xz"/>
>>>>>>       <rpm:entry name="fdupes"/>
>>>>>>       <rpm:entry name="systemd-rpm-macros"/>
>>>>>>       <rpm:entry name="libselinux-devel"/>
>>>>>>       <rpm:entry name="makeinfo"/>
>>>>>>     </rpm:requires>
>>>>>>   </format>
>>>>>> </package>
>>>>>>
>>>>>> <package type="rpm">
>>>>>>   <name>glibc</name>
>>>>>>   <arch>src</arch>
>>>>>>   <version epoch="0" ver="2.19" rel="20.3"/>
>>>>>>   <checksum type="sha256" pkgid="YES">353e1dc85eab8d434be83160eca4fcee11a72eec345385df125ca0835abd6068</checksum>
>>>>>>   <summary>Standard Shared Libraries (from the GNU C Library)</summary>
>>>>>>   <description>The GNU C Library provides the most important standard libraries used
>>>>>> by nearly all programs: the standard C library, the standard math
>>>>>> library, and the POSIX thread library. A system is not functional
>>>>>> without these libraries.</description>
>>>>>>   <packager>https://www.suse.com/</packager>
>>>>>>   <url>http://www.gnu.org/software/libc/libc.html</url>
>>>>>>   <time file="1426696883" build="1423750734"/>
>>>>>>   <size package="12678975" installed="13047285" archive="13057760"/>
>>>>>>   <location href="src/glibc-2.19-20.3.src.rpm"/>
>>>>>>   <format>
>>>>>>     <rpm:license>LGPL-2.1+ and SUSE-LGPL-2.1+-with-GCC-exception and GPL-2.0+</rpm:license>
>>>>>>     <rpm:vendor>SUSE LLC &lt;https://www.suse.com/&gt;</rpm:vendor>
>>>>>>     <rpm:group>System/Libraries</rpm:group>
>>>>>>     <rpm:buildhost>sheep02</rpm:buildhost>
>>>>>>     <rpm:sourcerpm/>
>>>>>>     <rpm:header-range start="872" end="144334"/>
>>>>>>     <rpm:requires>
>>>>>>       <rpm:entry name="pwdutils"/>
>>>>>>       <rpm:entry name="xz"/>
>>>>>>       <rpm:entry name="fdupes"/>
>>>>>>       <rpm:entry name="systemd-rpm-macros"/>
>>>>>>       <rpm:entry name="libselinux-devel"/>
>>>>>>       <rpm:entry name="makeinfo"/>
>>>>>>     </rpm:requires>
>>>>>>   </format>
>>>>>> </package>
>>>>>
>>>>>
>>>>> --
>>>>> Pavel Picka
>>>>> Red Hat
>>>>> _______________________________________________
>>>>> Pulp-dev mailing list
>>>>> Pulp-dev at redhat.com
>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>
>>>>
>>>
>>
>> --
>> Pavel Picka
>> Red Hat
>>
>