[Pulp-dev] Duplicate nevra but not pkgId (suse repos)

Bryan Kearney bkearney at redhat.com
Fri Mar 20 12:33:19 UTC 2020


Is this just about where to store the files on disk?

-- bk

On 3/20/20 7:24 AM, David Davis wrote:
> I think using pkgid is problematic though. Consider the case where you have two
> packages with the same location_href but different pkgIds. Since the pulp_rpm code
> uses location_href (which also gets stored as relative_path) as the filename, which
> one will get published when a repo version is published?
> 
> PS - Don't tell me that two different packages will never have the same
> location_href. If there's one thing I've learned working on RPM, it's that things
> that will never happen sometimes do happen.
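
The collision described above can be illustrated with a minimal Python sketch. It is not pulp_rpm code: the Package tuple, the publish helper, and the sample values are all hypothetical, and the only assumption is that a publication can hold one file per relative_path.

```python
from collections import namedtuple

# Hypothetical stand-in for a package record; not the pulp_rpm model.
Package = namedtuple("Package", ["nevra", "pkg_id", "relative_path"])


def publish(packages):
    """Map each relative_path to one package and report collisions.

    Assumption: a published repo can hold only one file per path,
    so a later package with the same path replaces an earlier one.
    """
    published = {}
    collisions = []
    for pkg in packages:
        previous = published.get(pkg.relative_path)
        if previous is not None and previous.pkg_id != pkg.pkg_id:
            collisions.append((previous, pkg))
        published[pkg.relative_path] = pkg
    return published, collisions


# Made-up data: same location_href, different pkgIds.
packages = [
    Package("foo-0:1.0-1.x86_64", "aaa111", "x86_64/foo-1.0-1.x86_64.rpm"),
    Package("foo-0:1.0-1.x86_64", "bbb222", "x86_64/foo-1.0-1.x86_64.rpm"),
]

published, collisions = publish(packages)
print(len(published), "published file(s),", len(collisions), "collision(s)")
```

In this sketch the second package silently wins, so a key that includes pkgId would keep both records in the repo version while only one of them could be served at that path.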
> 
> David
> 
> 
> On Fri, Mar 20, 2020 at 4:46 AM Pavel Picka <ppicka at redhat.com> wrote:
> 
>     I think we should keep NEVRA in the uniqueness constraint. As I mentioned
>     earlier in this thread, your idea is similar to mine: my suggestion was NEVRA +
>     checksum (pkgId).
>     I've already tested it with pkgId and it works well.
> 
>     On Fri, Mar 20, 2020 at 5:43 AM Daniel Alley <dalley at redhat.com> wrote:
> 
>         I discussed this a little bit on the #rpm.org channel. Here is the gist of
>         that discussion:
> 
>           * The metadata is "crazy, but technically valid"
>           * "the entire SUSE ecosystem tends to do this a lot, anything using OBS,
>             including nvidia and dell and friends"
>           * "also, SUSE packages can have the same NEVRA with being completely
>             different packages because of how their build system makes packages"
> 
>         I'm not sure what the best means to fix it would be.  Perhaps the uniqueness
>         constraint should be on the location_href, instead of on the NEVRA?  Or on
>         NEVRA + location_href?
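
To make the options concrete, here is a small Python sketch that applies several candidate uniqueness keys to the two glibc entries quoted at the bottom of this thread. CANDIDATE_KEYS and count_distinct are illustrative names, not part of pulp_rpm; only the field values come from the primary.xml excerpt.

```python
# Field values copied from the two glibc entries in the primary.xml excerpt.
packages = [
    {
        "name": "glibc", "epoch": "0", "version": "2.19",
        "release": "20.3", "arch": "src",
        "pkgId": "00d36c0f741b0c01a77ce318a2bbcfa59cb4dd0b24ce61f57c6205e4fa1bb310",
        "location_href": "nosrc/glibc-2.19-20.3.nosrc.rpm",
    },
    {
        "name": "glibc", "epoch": "0", "version": "2.19",
        "release": "20.3", "arch": "src",
        "pkgId": "353e1dc85eab8d434be83160eca4fcee11a72eec345385df125ca0835abd6068",
        "location_href": "src/glibc-2.19-20.3.src.rpm",
    },
]

NEVRA = ("name", "epoch", "version", "release", "arch")

# Candidate keys raised in the thread (labels are illustrative).
CANDIDATE_KEYS = {
    "NEVRA": NEVRA,
    "location_href": ("location_href",),
    "NEVRA + location_href": NEVRA + ("location_href",),
    "NEVRA + pkgId": NEVRA + ("pkgId",),
}


def count_distinct(packages, fields):
    """Count how many distinct key tuples the packages produce."""
    return len({tuple(pkg[f] for f in fields) for pkg in packages})


results = {
    label: count_distinct(packages, fields)
    for label, fields in CANDIDATE_KEYS.items()
}
for label, distinct in results.items():
    print(f"{label}: {distinct} distinct key(s) for {len(packages)} packages")
```

For this pair only plain NEVRA collapses the two entries into one key; every other candidate keeps both. The sketch says nothing about the side effects of each choice, which is the part that still needs evaluation.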
> 
>         On Wed, Mar 18, 2020 at 9:47 AM Ina Panova <ipanova at redhat.com> wrote:
> 
>             Pavel,
>             I meant to say that pulp3 does not have the limitation pulp2 had
>             (saving rpms with the same NEVRA on the filesystem).
>             The error is raised in pulp3 [0] when a repo version is created:
>             because of the repo key [1], we cannot have 2 rpms with the same NEVRA.
> 
>             We can enable that, if we decide to, by adding location_href to the
>             repo_key, *but* this needs to be evaluated. It can have side effects, and
>             we should involve our stakeholders to weigh in.
> 
>             [0]
>             https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/repository.py#L570
>             [1]
>             https://github.com/pulp/pulp_rpm/blob/master/pulp_rpm/app/models/package.py#L188
> 
>             --------
>             Regards,
> 
>             Ina Panova
>             Senior Software Engineer| Pulp| Red Hat Inc.
> 
>             "Do not go where the path may lead,
>              go instead where there is no path and leave a trail."
> 
> 
>             On Wed, Mar 18, 2020 at 2:24 PM Pavel Picka <ppicka at redhat.com> wrote:
> 
>                 True, in the opensuse repository there are two possibilities, 'src' and
>                 'nosrc' (the latter should be legacy, without source code); both are
>                 recognized by createrepo_c as arch 'src'.
> 
>                 The pulp2 code I mentioned is here [0] (the base rpm package model,
>                 as far as I understand).
> 
>                 The error in pulp3 is raised here [1], in pulpcore, when adding
>                 packages to a repository version.
>                 So, as Ina mentioned, the issue may not be with the packages
>                 themselves but with the sync logic.
> 
>                 [0] https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/db/models.py#L779
>                 [1] https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/repository.py#L570
> 
>                 On Wed, Mar 18, 2020 at 1:55 PM Ina Panova <ipanova at redhat.com> wrote:
> 
>                     Tanya and Pavel,
>                     this issue explains why we cannot keep 2 packages with the
>                     same NEVRA but different checksums within a repo:
>                     https://pulp.plan.io/issues/494
> 
>                     Pulp2 had a limitation where it could not save 2 rpms with the
>                     same filename on the filesystem, which led to a primary.xml
>                     that could point to an rpm that did not actually get saved.
>                     I believe in Pulp3 we could allow rpms with the same NEVRA if
>                     they have different location_href within a repo.
> 
>                     --------
>                     Regards,
> 
>                     Ina Panova
>                     Senior Software Engineer| Pulp| Red Hat Inc.
> 
>                     "Do not go where the path may lead,
>                      go instead where there is no path and leave a trail."
> 
> 
>                     On Wed, Mar 18, 2020 at 10:47 AM Tatiana Tereshchenko <ttereshc at redhat.com> wrote:
> 
>                         Hi Pavel,
> 
>                         On Tue, Mar 17, 2020 at 7:31 PM Pavel Picka <ppicka at redhat.com> wrote:
> 
>                             Hello, I would like to ask how to proceed with an issue
>                             with duplicate (but not really) packages.
> 
>                             I am syncing suse repositories (opensuse42 and SLE12) and
>                             get a duplicate error. But when checking the packages
>                             [0] (from primary.xml), glibc and glibc, they have the same
>                             nevra but a different checksum (and a few other fields, such
>                             as size), so they don't look like real duplicates.
> 
>                         Those are weird: they have the same nevra, but see the
>                         location_href, one is src and the other one is nosrc! :/ :
>                         <location href="nosrc/glibc-2.19-20.3.nosrc.rpm"/>
>                         <location href="src/glibc-2.19-20.3.src.rpm"/>
>                          
>                         It looks like something OpenSUSE specific. I'm not sure if
>                         it's a valid way to create a repo with such metadata, we need
>                         to figure it out at some point.
> 
> 
>                             I've checked Pulp2, and it uses nevra+sum for
>                             repository uniqueness. In pulp3 we use only nevra.
> 
>                         Why do you think that in pulp 2 we use NEVRA + checksum? Have
>                         you tested it? Please point to the code.
>                         I believe in Pulp 2 as well as in Pulp 3 we allow
>                         packages with different checksums in Pulp storage.
>                         I don't think we allow having the same packages with
>                         different checksums in the same repo.
>                         FWIW, in pulp 2 the most recently added package is chosen to
>                         stay in a repo, so no packages with duplicate NEVRA are left
>                         after sync,
>                         see https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/importers/yum/purge.py#L285-L333
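
The "most recently added package stays" behaviour described above can be sketched in a few lines of Python. This is not the purge.py implementation linked above; the added field, the keep_most_recent helper, and the sample records are assumptions made for illustration.

```python
def keep_most_recent(packages):
    """Keep one package per NEVRA: the one with the largest 'added' value."""
    latest = {}
    for pkg in packages:
        current = latest.get(pkg["nevra"])
        if current is None or pkg["added"] > current["added"]:
            latest[pkg["nevra"]] = pkg
    return list(latest.values())


# Made-up records; 'added' is a hypothetical, increasing insertion counter.
packages = [
    {"nevra": "glibc-0:2.19-20.3.src", "pkgId": "aaa111", "added": 1},
    {"nevra": "glibc-0:2.19-20.3.src", "pkgId": "bbb222", "added": 2},
    {"nevra": "bash-0:4.3-1.src", "pkgId": "ccc333", "added": 3},
]

kept = keep_most_recent(packages)
for pkg in kept:
    print(pkg["nevra"], pkg["pkgId"])
```

With this rule the repo never contains two packages with the same NEVRA, at the cost of dropping whichever one was added earlier.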
>                          
> 
> 
>                             My suggestion is to extend repo_key_fields for the rpm
>                             package with pkgId (checksum), as in pulp2. I don't
>                             think they are really duplicates, and other software can
>                             rely on a specific version of a package.
> 
> 
>                         Unfortunately, I don't remember the main reason to remove
>                         duplicates based on nevra. Was it because some tooling will
>                         complain, or was it just to avoid duplicates at resync time?
>                         Does anyone know?
>                         We should not change it unless we know for sure that it's
>                         needed + we would need to have an agreement from all our
>                         stakeholders for that change.
> 
>                         For now, I think we can move on and ensure that no duplicates
>                         are in a repo version. To my understanding, the behaviour
>                         will be the same as in pulp 2.
>                         Feel free to share where you get the duplicate error so we can
>                         see if it's a bug or not. I wonder why duplicates are not removed
>                         automatically. Maybe because the first version contains
>                         duplicates due to this bug https://pulp.plan.io/issues/6217 ?
> 
>                         Tanya
>                          
> 
> 
>                             What do you think?
> 
> 
>                             [0]
> 
>                                 <package type="rpm">
>                                   <name>glibc</name>
>                                   <arch>src</arch>
>                                   <version epoch="0" ver="2.19" rel="20.3"/>
>                                   <checksum type="sha256" pkgid="YES">00d36c0f741b0c01a77ce318a2bbcfa59cb4dd0b24ce61f57c6205e4fa1bb310</checksum>
>                                   <summary>Standard Shared Libraries (from the GNU C Library)</summary>
>                                   <description>The GNU C Library provides the most important standard libraries used
>                                 by nearly all programs: the standard C library, the standard math
>                                 library, and the POSIX thread library. A system is not functional
>                                 without these libraries.</description>
>                                   <packager>https://www.suse.com/</packager>
>                                   <url>http://www.gnu.org/software/libc/libc.html</url>
>                                   <time file="1426696882" build="1425645307"/>
>                                   <size package="591662" installed="13047428" archive="974464"/>
>                                   <location href="nosrc/glibc-2.19-20.3.nosrc.rpm"/>
>                                   <format>
>                                     <rpm:license>LGPL-2.1+ and SUSE-LGPL-2.1+-with-GCC-exception and GPL-2.0+</rpm:license>
>                                     <rpm:vendor>SUSE LLC &lt;https://www.suse.com/&gt;</rpm:vendor>
>                                     <rpm:group>System/Libraries</rpm:group>
>                                     <rpm:buildhost>sheep16</rpm:buildhost>
>                                     <rpm:sourcerpm/>
>                                     <rpm:header-range start="872" end="144403"/>
>                                     <rpm:requires>
>                                       <rpm:entry name="pwdutils"/>
>                                       <rpm:entry name="xz"/>
>                                       <rpm:entry name="fdupes"/>
>                                       <rpm:entry name="systemd-rpm-macros"/>
>                                       <rpm:entry name="libselinux-devel"/>
>                                       <rpm:entry name="makeinfo"/>
>                                     </rpm:requires>
>                                   </format>
>                                 </package>
> 
>                                 <package type="rpm">
>                                   <name>glibc</name>
>                                   <arch>src</arch>
>                                   <version epoch="0" ver="2.19" rel="20.3"/>
>                                   <checksum type="sha256" pkgid="YES">353e1dc85eab8d434be83160eca4fcee11a72eec345385df125ca0835abd6068</checksum>
>                                   <summary>Standard Shared Libraries (from the GNU C Library)</summary>
>                                   <description>The GNU C Library provides the most important standard libraries used
>                                 by nearly all programs: the standard C library, the standard math
>                                 library, and the POSIX thread library. A system is not functional
>                                 without these libraries.</description>
>                                   <packager>https://www.suse.com/</packager>
>                                   <url>http://www.gnu.org/software/libc/libc.html</url>
>                                   <time file="1426696883" build="1423750734"/>
>                                   <size package="12678975" installed="13047285" archive="13057760"/>
>                                   <location href="src/glibc-2.19-20.3.src.rpm"/>
>                                   <format>
>                                     <rpm:license>LGPL-2.1+ and SUSE-LGPL-2.1+-with-GCC-exception and GPL-2.0+</rpm:license>
>                                     <rpm:vendor>SUSE LLC &lt;https://www.suse.com/&gt;</rpm:vendor>
>                                     <rpm:group>System/Libraries</rpm:group>
>                                     <rpm:buildhost>sheep02</rpm:buildhost>
>                                     <rpm:sourcerpm/>
>                                     <rpm:header-range start="872" end="144334"/>
>                                     <rpm:requires>
>                                       <rpm:entry name="pwdutils"/>
>                                       <rpm:entry name="xz"/>
>                                       <rpm:entry name="fdupes"/>
>                                       <rpm:entry name="systemd-rpm-macros"/>
>                                       <rpm:entry name="libselinux-devel"/>
>                                       <rpm:entry name="makeinfo"/>
>                                     </rpm:requires>
>                                   </format>
>                                 </package>
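
A check for this situation can be sketched with the Python standard library: parse primary.xml, group packages by NEVRA, and report groups that contain more than one pkgId. The find_nevra_conflicts helper is a hypothetical name, and the sample document is a trimmed version of the two entries above wrapped in the metadata root element (with its usual namespace) that the excerpt omits.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Default namespace of the <metadata> root in repodata primary.xml files.
NS = {"c": "http://linux.duke.edu/metadata/common"}

# Trimmed sample: only the fields needed for the check are kept.
PRIMARY_XML = """<metadata xmlns="http://linux.duke.edu/metadata/common" packages="2">
  <package type="rpm">
    <name>glibc</name>
    <arch>src</arch>
    <version epoch="0" ver="2.19" rel="20.3"/>
    <checksum type="sha256" pkgid="YES">00d36c0f741b0c01a77ce318a2bbcfa59cb4dd0b24ce61f57c6205e4fa1bb310</checksum>
    <location href="nosrc/glibc-2.19-20.3.nosrc.rpm"/>
  </package>
  <package type="rpm">
    <name>glibc</name>
    <arch>src</arch>
    <version epoch="0" ver="2.19" rel="20.3"/>
    <checksum type="sha256" pkgid="YES">353e1dc85eab8d434be83160eca4fcee11a72eec345385df125ca0835abd6068</checksum>
    <location href="src/glibc-2.19-20.3.src.rpm"/>
  </package>
</metadata>
"""


def find_nevra_conflicts(xml_text):
    """Return {nevra: [(pkgId, location_href), ...]} for NEVRAs with >1 pkgId."""
    root = ET.fromstring(xml_text)
    by_nevra = defaultdict(list)
    for pkg in root.findall("c:package", NS):
        version = pkg.find("c:version", NS)
        nevra = (
            pkg.findtext("c:name", namespaces=NS),
            version.get("epoch"),
            version.get("ver"),
            version.get("rel"),
            pkg.findtext("c:arch", namespaces=NS),
        )
        pkg_id = pkg.findtext("c:checksum", namespaces=NS)
        href = pkg.find("c:location", NS).get("href")
        by_nevra[nevra].append((pkg_id, href))
    return {
        nevra: entries
        for nevra, entries in by_nevra.items()
        if len({pkg_id for pkg_id, _ in entries}) > 1
    }


conflicts = find_nevra_conflicts(PRIMARY_XML)
for nevra, entries in conflicts.items():
    print(nevra, [href for _, href in entries])
```

Running something like this against a repository's primary.xml before a sync would show how many packages are affected, which may help when weighing a change to the repo key.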
> 
> 
>                             -- 
>                             Pavel Picka
>                             Red Hat
>                             _______________________________________________
>                             Pulp-dev mailing list
>                             Pulp-dev at redhat.com
>                             https://www.redhat.com/mailman/listinfo/pulp-dev
> 
> 
> 
> 
>                 -- 
>                 Pavel Picka
>                 Red Hat
> 
> 
> 
> 
>     -- 
>     Pavel Picka
>     Red Hat
> 
> 
> 

