[Pulp-dev] Duplicate nevra but not pkgId (suse repos)

David Davis daviddavis at redhat.com
Fri Mar 20 11:24:31 UTC 2020


I think using pkgid is problematic though. Consider the case where you have
two packages with the same location_href but different pkgIds. Since the
pulp_rpm code uses location_href (which also gets stored as relative_path)
as the filename, which one will get published when a repo version is
published?
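A minimal Python sketch of that ambiguity, assuming (as described above) that location_href is reused as the relative_path at publish time. The `Package` tuple, the `find_path_collisions` helper, and the sample "foo" packages are hypothetical and not taken from pulp_rpm. The point is that two entries that are distinct under a NEVRA + pkgId key can still claim the same published path.

```python
from collections import defaultdict
from typing import NamedTuple


class Package(NamedTuple):
    # Hypothetical, trimmed-down view of one package's metadata.
    name: str
    epoch: str
    version: str
    release: str
    arch: str
    pkgid: str          # checksum from primary.xml
    location_href: str  # assumed to double as the publish-time relative_path


def find_path_collisions(packages):
    """Map each relative_path to the pkgIds that would be published there,
    keeping only paths claimed by more than one distinct package."""
    by_path = defaultdict(set)
    for pkg in packages:
        by_path[pkg.location_href].add(pkg.pkgid)
    return {path: ids for path, ids in by_path.items() if len(ids) > 1}


# Made-up packages: unique under NEVRA + pkgId, but sharing one location_href.
a = Package("foo", "0", "1.0", "1", "x86_64", "aaa111", "Packages/foo-1.0-1.x86_64.rpm")
b = Package("foo", "0", "1.0", "1", "x86_64", "bbb222", "Packages/foo-1.0-1.x86_64.rpm")
print(find_path_collisions([a, b]))
```

Any non-empty result is a path for which the publisher would have to pick one file.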

PS - Don't tell me that two different packages will never have the same
location_href. If there's one thing I've learned working on RPM, it's that
things that will never happen sometimes do happen.

David


On Fri, Mar 20, 2020 at 4:46 AM Pavel Picka <ppicka at redhat.com> wrote:

> I think we should keep NEVRA in the unique constraint, but, as I mentioned
> earlier in this thread, your idea is similar to mine: my suggestion
> was NEVRA + checksum (pkgId).
> I've already tested it with pkgId and it works well.
>
> On Fri, Mar 20, 2020 at 5:43 AM Daniel Alley <dalley at redhat.com> wrote:
>
>> I discussed this a little bit on the #rpm.org channel.  Here is the gist
>> of that discussion:
>>
>>    - The metadata is "crazy, but technically valid"
>>    - "the entire SUSE ecosystem tends to do this a lot, anything using
>>    OBS, including nvidia and dell and friends"
>>    - "also, SUSE packages can have the same NEVRA with being completely
>>    different packages because of how their build system makes packages"
>>
>> I'm not sure what the best way to fix it would be.  Perhaps the
>> uniqueness constraint should be on the location_href, instead of on the
>> NEVRA?  Or on NEVRA + location_href?
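For comparison, a small Python sketch that applies each candidate key discussed in this thread to the two glibc entries from the SUSE primary.xml quoted further down, reduced to the relevant fields. The dictionaries and the `distinct_keys` helper are illustrative assumptions, not pulp_rpm's models.

```python
from typing import Dict, List, Tuple

# The two glibc entries from the quoted primary.xml, reduced to key-relevant fields.
GLIBC_ENTRIES: List[Dict[str, str]] = [
    {"name": "glibc", "epoch": "0", "version": "2.19", "release": "20.3", "arch": "src",
     "pkgid": "00d36c0f741b0c01a77ce318a2bbcfa59cb4dd0b24ce61f57c6205e4fa1bb310",
     "location_href": "nosrc/glibc-2.19-20.3.nosrc.rpm"},
    {"name": "glibc", "epoch": "0", "version": "2.19", "release": "20.3", "arch": "src",
     "pkgid": "353e1dc85eab8d434be83160eca4fcee11a72eec345385df125ca0835abd6068",
     "location_href": "src/glibc-2.19-20.3.src.rpm"},
]

NEVRA: Tuple[str, ...] = ("name", "epoch", "version", "release", "arch")

# Candidate uniqueness keys mentioned in the thread.
CANDIDATE_KEYS: Dict[str, Tuple[str, ...]] = {
    "NEVRA": NEVRA,
    "location_href": ("location_href",),
    "NEVRA + location_href": NEVRA + ("location_href",),
    "NEVRA + pkgId": NEVRA + ("pkgid",),
}


def distinct_keys(entries, fields):
    """Count distinct key tuples; fewer than len(entries) means a conflict."""
    return len({tuple(entry[f] for f in fields) for entry in entries})


for label, fields in CANDIDATE_KEYS.items():
    n = distinct_keys(GLIBC_ENTRIES, fields)
    status = "conflict" if n < len(GLIBC_ENTRIES) else "ok"
    print(f"{label:<22} {status}")
```

Only the NEVRA-only key conflicts for this pair; the other three keys all separate it.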
>>
>> On Wed, Mar 18, 2020 at 9:47 AM Ina Panova <ipanova at redhat.com> wrote:
>>
>>> Pavel,
>>> What I meant is that pulp3 does not have the limitation pulp2 had
>>> (saving rpms with the same NEVRA on the filesystem).
>>> The error is raised in pulp3 [0] when a repo version is created, because
>>> of the repo key [1]: we cannot have 2 rpms with the same NEVRA.
>>>
>>> We can enable that, if we decide to, by adding location_href to the
>>> repo_key, *but* this needs to be evaluated: it can have side effects, and we
>>> should ask our stakeholders to weigh in.
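A rough illustration of the kind of check that rejects a repository version: group content by its repo key and fail if any key is shared. This is a hypothetical helper, not the pulpcore implementation linked above. The field tuple stands in for `repo_key_fields`, and adding `location_href` to it is the change being weighed here.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def validate_repo_key(units: Iterable[Dict[str, str]], repo_key_fields: Sequence[str]) -> None:
    """Raise ValueError if two units in one repository version share a repo key."""
    counts = Counter(tuple(unit[field] for field in repo_key_fields) for unit in units)
    duplicates = [key for key, count in counts.items() if count > 1]
    if duplicates:
        raise ValueError(f"duplicate repo keys: {duplicates}")


NEVRA_FIELDS = ("name", "epoch", "version", "release", "arch")
units = [
    {"name": "glibc", "epoch": "0", "version": "2.19", "release": "20.3",
     "arch": "src", "location_href": "nosrc/glibc-2.19-20.3.nosrc.rpm"},
    {"name": "glibc", "epoch": "0", "version": "2.19", "release": "20.3",
     "arch": "src", "location_href": "src/glibc-2.19-20.3.src.rpm"},
]

# NEVRA-only key: the two glibc source packages collide.
try:
    validate_repo_key(units, NEVRA_FIELDS)
except ValueError as exc:
    print("rejected:", exc)

# NEVRA + location_href: both packages are accepted.
validate_repo_key(units, NEVRA_FIELDS + ("location_href",))
print("accepted with location_href added to the key")
```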
>>>
>>> [0]
>>> https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/repository.py#L570
>>> [1]
>>> https://github.com/pulp/pulp_rpm/blob/master/pulp_rpm/app/models/package.py#L188
>>>
>>> --------
>>> Regards,
>>>
>>> Ina Panova
>>> Senior Software Engineer| Pulp| Red Hat Inc.
>>>
>>> "Do not go where the path may lead,
>>>  go instead where there is no path and leave a trail."
>>>
>>>
>>> On Wed, Mar 18, 2020 at 2:24 PM Pavel Picka <ppicka at redhat.com> wrote:
>>>
>>>> True, in the openSUSE repository there are two possibilities, 'src' and
>>>> 'nosrc' (the latter is supposedly a legacy variant without source code), and
>>>> both are recognized by createrepo_c as arch 'src'.
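Because both entries report arch 'src', the src/nosrc distinction in the metadata above is still visible in the file name in location_href. A minimal Python sketch, assuming location_href ends in the conventional NAME-VERSION-RELEASE.ARCH.rpm form; `filename_arch` is a hypothetical helper, not part of pulp_rpm or createrepo_c.

```python
def filename_arch(location_href: str) -> str:
    """Return the ARCH token of a file name shaped like NAME-VERSION-RELEASE.ARCH.rpm."""
    filename = location_href.rsplit("/", 1)[-1]
    if not filename.endswith(".rpm"):
        raise ValueError(f"not an RPM file name: {filename!r}")
    # Drop ".rpm", then take whatever follows the last remaining dot.
    return filename[: -len(".rpm")].rsplit(".", 1)[-1]


# Both entries report <arch>src</arch> in primary.xml; their file names differ.
for href in ("nosrc/glibc-2.19-20.3.nosrc.rpm", "src/glibc-2.19-20.3.src.rpm"):
    print(f"{href}: metadata arch 'src', file name arch {filename_arch(href)!r}")
```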
>>>>
>>>> The pulp2 code I mentioned is here [0] (the base rpm package model, as far
>>>> as I understand).
>>>>
>>>> The error in pulp3 is raised here [1] in pulpcore, when adding
>>>> packages to a repository version.
>>>> So, as Ina mentioned, the problem may lie not in the packages themselves
>>>> but in the sync logic.
>>>>
>>>> [0]
>>>> https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/db/models.py#L779
>>>> [1]
>>>> https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/repository.py#L570
>>>>
>>>> On Wed, Mar 18, 2020 at 1:55 PM Ina Panova <ipanova at redhat.com> wrote:
>>>>
>>>>> Tanya and Pavel,
>>>>> this issue explains why we cannot keep 2 packages with the same
>>>>> NEVRA but different checksums within a repo:
>>>>> https://pulp.plan.io/issues/494
>>>>>
>>>>> Pulp2 had a limitation: it was not able to save 2 rpms with the same
>>>>> filename on the filesystem, which led to a primary.xml that could point
>>>>> to an rpm that did not actually get saved.
>>>>> I believe in Pulp3 we could allow rpms with the same NEVRA within a repo
>>>>> if they have different location_href values.
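To illustrate the Pulp2 limitation described above, a minimal Python sketch that stores package files keyed only by file name and then lists the metadata entries that point at content that was never saved. Whether the first or the last file wins is an assumption here (the sketch keeps the first); either policy leaves one entry unbacked. All names and sample data are hypothetical.

```python
from typing import Dict, List


def store_by_filename(packages: List[Dict[str, str]]) -> Dict[str, str]:
    """Simulate storage keyed only by file name.

    Assumption for illustration: the first file saved under a name is kept,
    and later files with the same name are not stored.
    """
    storage: Dict[str, str] = {}
    for pkg in packages:
        storage.setdefault(pkg["filename"], pkg["checksum"])
    return storage


def unbacked_entries(packages, storage):
    """Metadata entries whose checksum does not match the file actually stored."""
    return [p for p in packages if storage[p["filename"]] != p["checksum"]]


# Hypothetical packages: same file name, different content.
packages = [
    {"filename": "foo-1.0-1.noarch.rpm", "checksum": "checksum-A"},
    {"filename": "foo-1.0-1.noarch.rpm", "checksum": "checksum-B"},
]
storage = store_by_filename(packages)
print(unbacked_entries(packages, storage))
```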
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 18, 2020 at 10:47 AM Tatiana Tereshchenko <
>>>>> ttereshc at redhat.com> wrote:
>>>>>
>>>>>> Hi Pavel,
>>>>>>
>>>>>> On Tue, Mar 17, 2020 at 7:31 PM Pavel Picka <ppicka at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello, I would like to ask how to proceed with an issue with
>>>>>>> duplicate (but not really duplicate) packages.
>>>>>>>
>>>>>>> I am syncing SUSE repositories (opensuse42 and SLE12) and get a
>>>>>>> duplicate error. But when checking the two glibc packages [0] (from
>>>>>>> primary.xml), they have the same NEVRA but different checksums (and a few
>>>>>>> other differences, such as size), so they don't look like real duplicates.
>>>>>>>
>>>>>> Those are weird: they have the same NEVRA, but look at the location_href,
>>>>>> one is src and the other one is nosrc! :/ :
>>>>>> <location href="nosrc/glibc-2.19-20.3.nosrc.rpm"/>
>>>>>> <location href="src/glibc-2.19-20.3.src.rpm"/>
>>>>>>
>>>>>> It looks like something OpenSUSE-specific. I'm not sure whether it's a
>>>>>> valid way to create a repo with such metadata; we need to figure it out at
>>>>>> some point.
>>>>>>
>>>>>>
>>>>>>> I've checked Pulp2, and it uses NEVRA + checksum for repository
>>>>>>> uniqueness. In pulp3 we use only NEVRA.
>>>>>>>
>>>>>> Why do you think that in pulp 2 we use NEVRA + checksum? Have you
>>>>>> tested it? Please point to the code.
>>>>>> I believe in Pulp 2, as well as in Pulp 3, we allow packages with
>>>>>> different checksums in Pulp storage.
>>>>>> I don't think we allow having the same packages with different
>>>>>> checksums in the same repo.
>>>>>> FWIW, in pulp 2 the most recently added package is chosen to stay in
>>>>>> a repo; no packages with duplicate NEVRA are left after sync, see
>>>>>> https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/importers/yum/purge.py#L285-L333
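A simplified Python sketch of the "most recently added wins" rule described above, applied to the glibc pair. It is not the purge.py logic linked above; it assumes the input list is ordered from oldest to newest addition, and the order used here is made up. It also shows that one of the two SUSE packages would be dropped.

```python
from typing import Dict, Hashable, List


def keep_newest_per_nevra(units: List[Dict]) -> List[Dict]:
    """Keep only the most recently added unit for each NEVRA.

    `units` must be ordered from oldest to newest addition.
    """
    latest: Dict[Hashable, Dict] = {}
    for unit in units:
        latest[unit["nevra"]] = unit  # a later addition replaces an earlier one
    return list(latest.values())


GLIBC_SRC = ("glibc", "0", "2.19", "20.3", "src")
units = [
    {"nevra": GLIBC_SRC, "location_href": "nosrc/glibc-2.19-20.3.nosrc.rpm"},
    {"nevra": GLIBC_SRC, "location_href": "src/glibc-2.19-20.3.src.rpm"},
]
print(keep_newest_per_nevra(units))
```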
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> My suggestion is to extend repo_key_fields for the rpm package with
>>>>>>> pkgId (checksum), as in pulp2, since I don't think they are really
>>>>>>> duplicates and other software can rely on a specific version of a package.
>>>>>>>
>>>>>>
>>>>>> Unfortunately, I don't remember the main reason for removing duplicates
>>>>>> based on NEVRA. Was it because some tooling would complain, or was it just
>>>>>> to avoid duplicates at resync time? Does anyone know?
>>>>>> We should not change it unless we know for sure that it's needed, and we
>>>>>> would need agreement from all our stakeholders for that change.
>>>>>>
>>>>>> For now, I think we can move on and ensure that no duplicates are in
>>>>>> a repo version. To my understanding, the behaviour will be the same as in
>>>>>> pulp 2.
>>>>>> Feel free to share where you get the duplicate error, so we can see
>>>>>> whether it's a bug. I wonder why duplicates are not removed automatically. Maybe
>>>>>> because the first version contains duplicates due to this bug
>>>>>> https://pulp.plan.io/issues/6217 ?
>>>>>>
>>>>>> Tanya
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>>
>>>>>>> [0]
>>>>>>>
>>>>>>>> <package type="rpm">
>>>>>>>>   <name>glibc</name>
>>>>>>>>   <arch>src</arch>
>>>>>>>>   <version epoch="0" ver="2.19" rel="20.3"/>
>>>>>>>>   <checksum type="sha256" pkgid="YES">00d36c0f741b0c01a77ce318a2bbcfa59cb4dd0b24ce61f57c6205e4fa1bb310</checksum>
>>>>>>>>   <summary>Standard Shared Libraries (from the GNU C Library)</summary>
>>>>>>>>   <description>The GNU C Library provides the most important
>>>>>>>> standard libraries used
>>>>>>>> by nearly all programs: the standard C library, the standard math
>>>>>>>> library, and the POSIX thread library. A system is not functional
>>>>>>>> without these libraries.</description>
>>>>>>>>   <packager>https://www.suse.com/</packager>
>>>>>>>>   <url>http://www.gnu.org/software/libc/libc.html</url>
>>>>>>>>   <time file="1426696882" build="1425645307"/>
>>>>>>>>   <size package="591662" installed="13047428" archive="974464"/>
>>>>>>>>   <location href="nosrc/glibc-2.19-20.3.nosrc.rpm"/>
>>>>>>>>   <format>
>>>>>>>>     <rpm:license>LGPL-2.1+ and SUSE-LGPL-2.1+-with-GCC-exception and GPL-2.0+</rpm:license>
>>>>>>>>     <rpm:vendor>SUSE LLC &lt;https://www.suse.com/&gt;</rpm:vendor>
>>>>>>>>     <rpm:group>System/Libraries</rpm:group>
>>>>>>>>     <rpm:buildhost>sheep16</rpm:buildhost>
>>>>>>>>     <rpm:sourcerpm/>
>>>>>>>>     <rpm:header-range start="872" end="144403"/>
>>>>>>>>     <rpm:requires>
>>>>>>>>       <rpm:entry name="pwdutils"/>
>>>>>>>>       <rpm:entry name="xz"/>
>>>>>>>>       <rpm:entry name="fdupes"/>
>>>>>>>>       <rpm:entry name="systemd-rpm-macros"/>
>>>>>>>>       <rpm:entry name="libselinux-devel"/>
>>>>>>>>       <rpm:entry name="makeinfo"/>
>>>>>>>>     </rpm:requires>
>>>>>>>>   </format>
>>>>>>>> </package>
>>>>>>>>
>>>>>>>> <package type="rpm">
>>>>>>>>   <name>glibc</name>
>>>>>>>>   <arch>src</arch>
>>>>>>>>   <version epoch="0" ver="2.19" rel="20.3"/>
>>>>>>>>   <checksum type="sha256" pkgid="YES">353e1dc85eab8d434be83160eca4fcee11a72eec345385df125ca0835abd6068</checksum>
>>>>>>>>   <summary>Standard Shared Libraries (from the GNU C Library)</summary>
>>>>>>>>   <description>The GNU C Library provides the most important
>>>>>>>> standard libraries used
>>>>>>>> by nearly all programs: the standard C library, the standard math
>>>>>>>> library, and the POSIX thread library. A system is not functional
>>>>>>>> without these libraries.</description>
>>>>>>>>   <packager>https://www.suse.com/</packager>
>>>>>>>>   <url>http://www.gnu.org/software/libc/libc.html</url>
>>>>>>>>   <time file="1426696883" build="1423750734"/>
>>>>>>>>   <size package="12678975" installed="13047285" archive="13057760"/>
>>>>>>>>   <location href="src/glibc-2.19-20.3.src.rpm"/>
>>>>>>>>   <format>
>>>>>>>>     <rpm:license>LGPL-2.1+ and SUSE-LGPL-2.1+-with-GCC-exception and GPL-2.0+</rpm:license>
>>>>>>>>     <rpm:vendor>SUSE LLC &lt;https://www.suse.com/&gt;</rpm:vendor>
>>>>>>>>     <rpm:group>System/Libraries</rpm:group>
>>>>>>>>     <rpm:buildhost>sheep02</rpm:buildhost>
>>>>>>>>     <rpm:sourcerpm/>
>>>>>>>>     <rpm:header-range start="872" end="144334"/>
>>>>>>>>     <rpm:requires>
>>>>>>>>       <rpm:entry name="pwdutils"/>
>>>>>>>>       <rpm:entry name="xz"/>
>>>>>>>>       <rpm:entry name="fdupes"/>
>>>>>>>>       <rpm:entry name="systemd-rpm-macros"/>
>>>>>>>>       <rpm:entry name="libselinux-devel"/>
>>>>>>>>       <rpm:entry name="makeinfo"/>
>>>>>>>>     </rpm:requires>
>>>>>>>>   </format>
>>>>>>>> </package>
>>>>>>>
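A short Python sketch, using only the standard library's xml.etree.ElementTree, that reads a trimmed copy of the two entries above and groups them by NEVRA. The namespace URI is the one used for package entries in primary.xml; the `nevra_collisions` helper is hypothetical and independent of pulp_rpm's own metadata parsing.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Package entries in primary.xml live in this default namespace.
NS = {"common": "http://linux.duke.edu/metadata/common"}

# The two glibc entries quoted above, trimmed to the fields used below.
PRIMARY_XML = """\
<metadata xmlns="http://linux.duke.edu/metadata/common"
          xmlns:rpm="http://linux.duke.edu/metadata/rpm" packages="2">
  <package type="rpm">
    <name>glibc</name>
    <arch>src</arch>
    <version epoch="0" ver="2.19" rel="20.3"/>
    <checksum type="sha256" pkgid="YES">00d36c0f741b0c01a77ce318a2bbcfa59cb4dd0b24ce61f57c6205e4fa1bb310</checksum>
    <location href="nosrc/glibc-2.19-20.3.nosrc.rpm"/>
  </package>
  <package type="rpm">
    <name>glibc</name>
    <arch>src</arch>
    <version epoch="0" ver="2.19" rel="20.3"/>
    <checksum type="sha256" pkgid="YES">353e1dc85eab8d434be83160eca4fcee11a72eec345385df125ca0835abd6068</checksum>
    <location href="src/glibc-2.19-20.3.src.rpm"/>
  </package>
</metadata>
"""


def nevra_collisions(xml_text):
    """Return {NEVRA: [entries]} for every NEVRA that appears more than once."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for pkg in root.findall("common:package", NS):
        version = pkg.find("common:version", NS)
        nevra = (
            pkg.findtext("common:name", namespaces=NS),
            version.get("epoch"),
            version.get("ver"),
            version.get("rel"),
            pkg.findtext("common:arch", namespaces=NS),
        )
        groups[nevra].append({
            "pkgid": pkg.findtext("common:checksum", namespaces=NS),
            "location_href": pkg.find("common:location", NS).get("href"),
        })
    return {nevra: entries for nevra, entries in groups.items() if len(entries) > 1}


for nevra, entries in nevra_collisions(PRIMARY_XML).items():
    print(nevra)
    for entry in entries:
        print("   ", entry["pkgid"][:12], entry["location_href"])
```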
>>>>>>>
>>>>>>> --
>>>>>>> Pavel Picka
>>>>>>> Red Hat
>>>>>>> _______________________________________________
>>>>>>> Pulp-dev mailing list
>>>>>>> Pulp-dev at redhat.com
>>>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
>