[Pulp-dev] Duplicate nevra but not pkgId (suse repos)

Ina Panova ipanova at redhat.com
Wed Mar 18 12:54:46 UTC 2020


Tanya and Pavel,
this issue explains why we cannot keep 2 packages with the same NEVRA
but different checksums within a repo: https://pulp.plan.io/issues/494

Pulp2 had a limitation: it could not save 2 rpms with the same filename
on the filesystem, which led to a primary.xml that could point to an rpm
that was never actually saved.
I believe in Pulp3 we could allow rpms with the same NEVRA within a repo
if they have different location_href.
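
As a minimal sketch of that idea (illustrative Python, not Pulp code), the
two glibc entries quoted below collide on NEVRA alone but not once pkgId or
location_href is added to the key. The `Package` class and the
`count_conflicts` helper are hypothetical names; the field values are copied
from the primary.xml snippet in this thread.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    # Hypothetical container; fields mirror the primary.xml entries in this thread.
    name: str
    epoch: str
    version: str
    release: str
    arch: str
    pkg_id: str
    location_href: str

    @property
    def nevra(self):
        return (self.name, self.epoch, self.version, self.release, self.arch)


PACKAGES = [
    Package("glibc", "0", "2.19", "20.3", "src",
            "00d36c0f741b0c01a77ce318a2bbcfa59cb4dd0b24ce61f57c6205e4fa1bb310",
            "nosrc/glibc-2.19-20.3.nosrc.rpm"),
    Package("glibc", "0", "2.19", "20.3", "src",
            "353e1dc85eab8d434be83160eca4fcee11a72eec345385df125ca0835abd6068",
            "src/glibc-2.19-20.3.src.rpm"),
]


def count_conflicts(packages, key):
    """Return how many packages share their key with at least one other package."""
    counts = Counter(key(p) for p in packages)
    return sum(n for n in counts.values() if n > 1)


# Candidate uniqueness keys for a repository version.
by_nevra = count_conflicts(PACKAGES, lambda p: p.nevra)
by_nevra_pkgid = count_conflicts(PACKAGES, lambda p: p.nevra + (p.pkg_id,))
by_nevra_href = count_conflicts(PACKAGES, lambda p: p.nevra + (p.location_href,))
print(by_nevra, by_nevra_pkgid, by_nevra_href)
```

Which key Pulp3 should actually use is the open question here; the sketch
only shows how the candidates behave on this data.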

--------
Regards,

Ina Panova
Senior Software Engineer| Pulp| Red Hat Inc.

"Do not go where the path may lead,
 go instead where there is no path and leave a trail."


On Wed, Mar 18, 2020 at 10:47 AM Tatiana Tereshchenko <ttereshc at redhat.com>
wrote:

> Hi Pavel,
>
> On Tue, Mar 17, 2020 at 7:31 PM Pavel Picka <ppicka at redhat.com> wrote:
>
>> Hello, I would like to ask you how to proceed with an issue with duplicate
>> (but not really) packages.
>>
>> I am syncing suse repositories (opensuse42 and SLE12) and get a duplicate
>> error. But when checking the packages [0] (from primary.xml), glibc and glibc
>> have the same nevra but different checksums (and a few other fields, such as
>> size), so they don't look like real duplicates.
>>
> Those are weird: they have the same nevra, but see the location_href; one is
> src and the other one is nosrc! :/
> <location href="nosrc/glibc-2.19-20.3.nosrc.rpm"/>
> <location href="src/glibc-2.19-20.3.src.rpm"/>
>
> It looks like something OpenSUSE specific. I'm not sure if it's a valid
> way to create a repo with such metadata, we need to figure it out at some
> point.
>
>
>> I've checked Pulp2, and it uses nevra+checksum for repository uniqueness.
>> In pulp3 we use only nevra.
>>
> Why do you think that in pulp 2 we use NEVRA + checksum? Have you tested
> it? Please point to the code.
> I believe Pulp 2 as well as Pulp 3 allows packages with
> different checksums in Pulp storage.
> I don't think we allow having the same packages with different checksums
> in the same repo.
> FWIW, in pulp 2 the most recently added package is chosen to stay in a
> repo, so no packages with duplicate NEVRA are left after sync, see
> https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/importers/yum/purge.py#L285-L333
>
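
The behaviour described above (the most recently added package wins for each
NEVRA) can be sketched as follows. This is not the code from purge.py, only an
illustration; the `purge_duplicate_nevra` function, the dict keys, and the
`added` values are hypothetical.

```python
def purge_duplicate_nevra(units):
    """Keep only the most recently added unit for each NEVRA.

    `units` is an iterable of dicts with the hypothetical keys "nevra"
    (a tuple) and "added" (any comparable timestamp).
    """
    newest = {}
    for unit in units:
        current = newest.get(unit["nevra"])
        # Replace the stored unit only if this one was added later.
        if current is None or unit["added"] > current["added"]:
            newest[unit["nevra"]] = unit
    return list(newest.values())


GLIBC = ("glibc", "0", "2.19", "20.3", "src")
units = [
    # "added" values are made up for the example, not taken from a real sync.
    {"nevra": GLIBC, "href": "nosrc/glibc-2.19-20.3.nosrc.rpm", "added": 1},
    {"nevra": GLIBC, "href": "src/glibc-2.19-20.3.src.rpm", "added": 2},
]
remaining = purge_duplicate_nevra(units)
print([u["href"] for u in remaining])
```

With a NEVRA-only key, one of the two glibc packages is always dropped; which
one depends purely on the order in which they were added.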
>
>>
>> My suggestion is to extend repo_key_fields for the rpm package with pkgId
>> (checksum), as in pulp2, because I don't think they are really duplicates and
>> other software can rely on a specific version of a package.
>>
>
> Unfortunately, I don't remember the main reason to remove duplicates based
> on nevra. Was it because some tooling will complain, or was it just to
> avoid duplicates at resync time? Does anyone know?
> We should not change it unless we know for sure that it's needed + we
> would need to have an agreement from all our stakeholders for that change.
>
> For now, I think we can move on and ensure that no duplicates are in a
> repo version. To my understanding, the behaviour will be the same as in
> pulp 2.
> Feel free to share where you get the duplicate error so we can see if it's a
> bug or not. I wonder why duplicates are not removed automatically. Maybe because
> the first version contains duplicates due to this bug
> https://pulp.plan.io/issues/6217 ?
>
> Tanya
>
>
>>
>> What do you think?
>>
>>
>> [0]
>>
>>> <package type="rpm">
>>>   <name>glibc</name>
>>>   <arch>src</arch>
>>>   <version epoch="0" ver="2.19" rel="20.3"/>
>>>   <checksum type="sha256" pkgid="YES">00d36c0f741b0c01a77ce318a2bbcfa59cb4dd0b24ce61f57c6205e4fa1bb310</checksum>
>>>   <summary>Standard Shared Libraries (from the GNU C Library)</summary>
>>>   <description>The GNU C Library provides the most important standard libraries used
>>> by nearly all programs: the standard C library, the standard math
>>> library, and the POSIX thread library. A system is not functional
>>> without these libraries.</description>
>>>   <packager>https://www.suse.com/</packager>
>>>   <url>http://www.gnu.org/software/libc/libc.html</url>
>>>   <time file="1426696882" build="1425645307"/>
>>>   <size package="591662" installed="13047428" archive="974464"/>
>>>   <location href="nosrc/glibc-2.19-20.3.nosrc.rpm"/>
>>>   <format>
>>>     <rpm:license>LGPL-2.1+ and SUSE-LGPL-2.1+-with-GCC-exception and GPL-2.0+</rpm:license>
>>>     <rpm:vendor>SUSE LLC &lt;https://www.suse.com/&gt;</rpm:vendor>
>>>     <rpm:group>System/Libraries</rpm:group>
>>>     <rpm:buildhost>sheep16</rpm:buildhost>
>>>     <rpm:sourcerpm/>
>>>     <rpm:header-range start="872" end="144403"/>
>>>     <rpm:requires>
>>>       <rpm:entry name="pwdutils"/>
>>>       <rpm:entry name="xz"/>
>>>       <rpm:entry name="fdupes"/>
>>>       <rpm:entry name="systemd-rpm-macros"/>
>>>       <rpm:entry name="libselinux-devel"/>
>>>       <rpm:entry name="makeinfo"/>
>>>     </rpm:requires>
>>>   </format>
>>> </package>
>>>
>>> <package type="rpm">
>>>   <name>glibc</name>
>>>   <arch>src</arch>
>>>   <version epoch="0" ver="2.19" rel="20.3"/>
>>>   <checksum type="sha256" pkgid="YES">353e1dc85eab8d434be83160eca4fcee11a72eec345385df125ca0835abd6068</checksum>
>>>   <summary>Standard Shared Libraries (from the GNU C Library)</summary>
>>>   <description>The GNU C Library provides the most important standard libraries used
>>> by nearly all programs: the standard C library, the standard math
>>> library, and the POSIX thread library. A system is not functional
>>> without these libraries.</description>
>>>   <packager>https://www.suse.com/</packager>
>>>   <url>http://www.gnu.org/software/libc/libc.html</url>
>>>   <time file="1426696883" build="1423750734"/>
>>>   <size package="12678975" installed="13047285" archive="13057760"/>
>>>   <location href="src/glibc-2.19-20.3.src.rpm"/>
>>>   <format>
>>>     <rpm:license>LGPL-2.1+ and SUSE-LGPL-2.1+-with-GCC-exception and GPL-2.0+</rpm:license>
>>>     <rpm:vendor>SUSE LLC &lt;https://www.suse.com/&gt;</rpm:vendor>
>>>     <rpm:group>System/Libraries</rpm:group>
>>>     <rpm:buildhost>sheep02</rpm:buildhost>
>>>     <rpm:sourcerpm/>
>>>     <rpm:header-range start="872" end="144334"/>
>>>     <rpm:requires>
>>>       <rpm:entry name="pwdutils"/>
>>>       <rpm:entry name="xz"/>
>>>       <rpm:entry name="fdupes"/>
>>>       <rpm:entry name="systemd-rpm-macros"/>
>>>       <rpm:entry name="libselinux-devel"/>
>>>       <rpm:entry name="makeinfo"/>
>>>     </rpm:requires>
>>>   </format>
>>> </package>
>>
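
Collisions like the one in [0] can be found mechanically. The sketch below
groups the packages in a primary.xml by NEVRA and reports every NEVRA that
appears with more than one pkgId. It uses only the Python standard library;
the `find_nevra_collisions` function and the trimmed `SAMPLE` document are
hypothetical, and the sample assumes the usual `<metadata>` root with the
common-metadata namespace.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Default namespace of the <metadata> root in a primary.xml (assumed here).
NS = {"c": "http://linux.duke.edu/metadata/common"}

# Trimmed copy of the two glibc entries from this thread.
SAMPLE = """<metadata xmlns="http://linux.duke.edu/metadata/common" packages="2">
<package type="rpm">
  <name>glibc</name>
  <arch>src</arch>
  <version epoch="0" ver="2.19" rel="20.3"/>
  <checksum type="sha256" pkgid="YES">00d36c0f741b0c01a77ce318a2bbcfa59cb4dd0b24ce61f57c6205e4fa1bb310</checksum>
  <location href="nosrc/glibc-2.19-20.3.nosrc.rpm"/>
</package>
<package type="rpm">
  <name>glibc</name>
  <arch>src</arch>
  <version epoch="0" ver="2.19" rel="20.3"/>
  <checksum type="sha256" pkgid="YES">353e1dc85eab8d434be83160eca4fcee11a72eec345385df125ca0835abd6068</checksum>
  <location href="src/glibc-2.19-20.3.src.rpm"/>
</package>
</metadata>"""


def find_nevra_collisions(primary_xml):
    """Map each NEVRA seen with more than one pkgId to its (pkgId, href) pairs."""
    root = ET.fromstring(primary_xml)
    seen = defaultdict(set)
    for pkg in root.findall("c:package", NS):
        ver = pkg.find("c:version", NS)
        nevra = (
            pkg.findtext("c:name", namespaces=NS),
            ver.get("epoch"),
            ver.get("ver"),
            ver.get("rel"),
            pkg.findtext("c:arch", namespaces=NS),
        )
        pkg_id = pkg.findtext("c:checksum", namespaces=NS)
        href = pkg.find("c:location", NS).get("href")
        seen[nevra].add((pkg_id, href))
    return {nevra: entries for nevra, entries in seen.items() if len(entries) > 1}


collisions = find_nevra_collisions(SAMPLE)
for nevra, entries in collisions.items():
    print(nevra, sorted(href for _, href in entries))
```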
>>
>> --
>> Pavel Picka
>> Red Hat
>> _______________________________________________
>> Pulp-dev mailing list
>> Pulp-dev at redhat.com
>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>