[Pulp-dev] Duplicate nevra but not pkgId (suse repos)

Pavel Picka ppicka at redhat.com
Fri Mar 20 13:12:10 UTC 2020


David, that is a good point, and I now agree that location_href fits this
case much better. If two files have the same path, it is definitely an
issue or an incorrect repository (even considering that anything can
happen with RPMs).

Bryan, in short, this is more about validating the metadata, storing it
correctly, and recognizing duplicates.
In detail: some packages in the SUSE repo have the same NEVRA but are
different packages with different uses (as we learned recently, one can
contain some optimisation or legacy code, while the other is the original
package from the developers).
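
To make the trade-off concrete, here is a minimal sketch in plain Python. It is not pulp_rpm code: `Package`, `collisions`, and `CLASH` are hypothetical names used only for illustration. It takes the two glibc entries from the primary.xml quoted below and checks which candidate repo keys treat them as duplicates. `CLASH` is an invented third package (same path, made-up checksum) that models David's scenario.

```python
from collections import defaultdict
from typing import NamedTuple


class Package(NamedTuple):
    name: str
    epoch: str
    version: str
    release: str
    arch: str
    pkg_id: str
    location_href: str


# The two glibc entries from the quoted primary.xml.
PACKAGES = [
    Package("glibc", "0", "2.19", "20.3", "src",
            "00d36c0f741b0c01a77ce318a2bbcfa59cb4dd0b24ce61f57c6205e4fa1bb310",
            "nosrc/glibc-2.19-20.3.nosrc.rpm"),
    Package("glibc", "0", "2.19", "20.3", "src",
            "353e1dc85eab8d434be83160eca4fcee11a72eec345385df125ca0835abd6068",
            "src/glibc-2.19-20.3.src.rpm"),
]

# Hypothetical package (NOT from the thread): same path as the second
# entry, but a different, made-up checksum.
CLASH = Package("glibc", "0", "2.19", "20.3", "src", "f" * 64,
                "src/glibc-2.19-20.3.src.rpm")

NEVRA = ("name", "epoch", "version", "release", "arch")


def collisions(packages, key_fields):
    """Group packages by the given key and return only groups with >1 member."""
    groups = defaultdict(list)
    for pkg in packages:
        key = tuple(getattr(pkg, field) for field in key_fields)
        groups[key].append(pkg)
    return {key: pkgs for key, pkgs in groups.items() if len(pkgs) > 1}


if __name__ == "__main__":
    candidates = {
        "NEVRA": NEVRA,
        "NEVRA + pkgId": NEVRA + ("pkg_id",),
        "NEVRA + location_href": NEVRA + ("location_href",),
    }
    for label, fields in candidates.items():
        print(label, "->", len(collisions(PACKAGES, fields)), "colliding key(s)")
    # David's case: a pkgId-based key accepts CLASH, yet two packages
    # would then want the same relative path at publish time.
    print("same path:", len(collisions(PACKAGES + [CLASH], ("location_href",))))
```

With a NEVRA-only key the two glibc entries collide. Adding either pkgId or location_href separates them, but only a key that includes location_href also rules out two packages competing for the same path.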


On Fri, Mar 20, 2020 at 1:33 PM Bryan Kearney <bkearney at redhat.com> wrote:

> Is this just about where to store the files on disk?
>
> -- bk
>
> On 3/20/20 7:24 AM, David Davis wrote:
> > I think using pkgid is problematic though. Consider the case where you have two
> > packages with the same location_href but different pkgIds. Since the pulp_rpm code
> > uses location_href (which also gets stored as relative_path) as the filename, which
> > one will get published when a repo version is published?
> >
> > PS - Don't tell me that two different packages will never have the same
> > location_href. If it's one thing I've learned working on RPM, things that will never
> > happen sometimes do happen.
> >
> > David
> >
> >
> > On Fri, Mar 20, 2020 at 4:46 AM Pavel Picka <ppicka at redhat.com
> > <mailto:ppicka at redhat.com>> wrote:
> >
> >     I think we should keep NEVRA in the uniqueness constraint, but as I mentioned
> >     earlier in this thread, your idea is similar to mine: my suggestion was NEVRA +
> >     checksum (pkgId).
> >     I have already tested it with pkgId and it works well.
> >
> >     On Fri, Mar 20, 2020 at 5:43 AM Daniel Alley <dalley at redhat.com
> >     <mailto:dalley at redhat.com>> wrote:
> >
> >         I discussed this a little bit on the #rpm.org <http://rpm.org> channel.  Here
> >         is the gist of that discussion:
> >
> >           * The metadata is "crazy, but technically valid"
> >           * "the entire SUSE ecosystem tends to do this a lot, anything using OBS,
> >             including nvidia and dell and friends"
> >           * "also, SUSE packages can have the same NEVRA with being completely
> >             different packages because of how their build system makes packages"
> >
> >         I'm not sure what the best means to fix it would be.  Perhaps the uniqueness
> >         constraint should be on the location_href, instead of on the NEVRA?  Or on
> >         NEVRA + location_href?
> >
> >         On Wed, Mar 18, 2020 at 9:47 AM Ina Panova <ipanova at redhat.com
> >         <mailto:ipanova at redhat.com>> wrote:
> >
> >             Pavel,
> >             I meant to say that pulp3 does not have the limitation that pulp2 had
> >             (saving rpms with the same nevra on the filesystem).
> >             The error is raised in pulp3 [0] when a repo version is created: because
> >             of the repo key [1], we cannot have 2 rpms with the same NEVRA.
> >
> >             We can enable that, if we decide to, by adding location_href to the
> >             repo_key, *but* this needs to be evaluated; it can have side effects and
> >             we should involve our stakeholders to weigh in.
> >
> >             [0] https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/repository.py#L570
> >             [1] https://github.com/pulp/pulp_rpm/blob/master/pulp_rpm/app/models/package.py#L188
> >
> >             --------
> >             Regards,
> >
> >             Ina Panova
> >             Senior Software Engineer| Pulp| Red Hat Inc.
> >
> >             "Do not go where the path may lead,
> >              go instead where there is no path and leave a trail."
> >
> >
> >             On Wed, Mar 18, 2020 at 2:24 PM Pavel Picka <
> ppicka at redhat.com
> >             <mailto:ppicka at redhat.com>> wrote:
> >
> >                 True, in the opensuse repository there are two possibilities, 'src' and
> >                 'nosrc' (the latter should be legacy, without source code); both are
> >                 recognized by createrepo_c as arch 'src'.
> >
> >                 The pulp2 code I mentioned is here [0] (the base rpm package, as far
> >                 as I understood).
> >
> >                 The error in pulp3 is raised here [1], in pulpcore, when adding
> >                 packages to a repository version.
> >                 So, as Ina mentioned, the issue may lie in the sync logic rather than
> >                 in the packages themselves.
> >
> >                 [0] https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/db/models.py#L779
> >                 [1] https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/repository.py#L570
> >
> >                 On Wed, Mar 18, 2020 at 1:55 PM Ina Panova <
> ipanova at redhat.com
> >                 <mailto:ipanova at redhat.com>> wrote:
> >
> >                     Tanya and Pavel,
> >                     this issue explains why we cannot keep 2 packages with the
> >                     same NEVRA but different checksums within a repo:
> >                     https://pulp.plan.io/issues/494
> >
> >                     Pulp2 had a limitation where it was not able to save 2 rpms
> >                     with the same filename on the filesystem, which led to a
> >                     primary.xml that could point to an rpm that did not actually
> >                     get saved.
> >                     I believe in Pulp3 we could allow rpms with the same NEVRA
> >                     within a repo if they have different location_href.
> >
> >                     --------
> >                     Regards,
> >
> >                     Ina Panova
> >                     Senior Software Engineer| Pulp| Red Hat Inc.
> >
> >                     "Do not go where the path may lead,
> >                      go instead where there is no path and leave a
> trail."
> >
> >
> >                     On Wed, Mar 18, 2020 at 10:47 AM Tatiana Tereshchenko
> >                     <ttereshc at redhat.com <mailto:ttereshc at redhat.com>>
> wrote:
> >
> >                         Hi Pavel,
> >
> >                         On Tue, Mar 17, 2020 at 7:31 PM Pavel Picka
> >                         <ppicka at redhat.com <mailto:ppicka at redhat.com>>
> wrote:
> >
> >                             Hello, I would like to ask you how to proceed with an
> >                             issue with duplicate (but not really) packages.
> >
> >                             I am syncing a suse repository (opensuse42 and SLE12)
> >                             and get a duplicate error. But when checking the packages
> >                             [0] (from primary.xml), glibc and glibc, they have the same
> >                             nevra but different checksums (and a few more fields, such
> >                             as size), so they don't look like real duplicates.
> >
> >                         Those are weird: they have the same nevra, but see the
> >                         location_href, one is src and the other one is nosrc! :/ :
> >                         <location href="nosrc/glibc-2.19-20.3.nosrc.rpm"/>
> >                         <location href="src/glibc-2.19-20.3.src.rpm"/>
> >
> >                         It looks like something OpenSUSE specific. I'm not sure if
> >                         it's a valid way to create a repo with such metadata; we need
> >                         to figure it out at some point.
> >
> >
> >                             I've checked Pulp2, and it uses nevra+sum for
> >                             repository uniqueness. In pulp3 we use only nevra.
> >
> >                         Why do you think that in pulp 2 we use NEVRA + checksum? have
> >                         you tested it?  please point to the code.
> >                         I believe in Pulp 2 as well as in Pulp 3 we allow to have
> >                         packages with different checksums in Pulp storage.
> >                         I don't think we allow having the same packages with
> >                         different checksums in the same repo.
> >                         FWIW, in pulp 2 the most recently added package is chosen to
> >                         stay in a repo, no packages with duplicate NEVRA left after
> >                         sync,
> >                         see https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/importers/yum/purge.py#L285-L333
> >
> >
> >
> >                             My suggestion is to extend repo_key_fields for the rpm
> >                             package with pkgId (checksum), as in pulp2. I don't
> >                             think they are really duplicates, and other software can
> >                             rely on a specific version of a package.
> >
> >
> >                         Unfortunately, I don't remember the main reason to remove
> >                         duplicates based on nevra. Was it because some tooling will
> >                         complain, or was it just to avoid duplicates at resync time?
> >                         Does anyone know?
> >                         We should not change it unless we know for sure that it's
> >                         needed + we would need to have an agreement from all our
> >                         stakeholders for that change.
> >
> >                         For now, I think we can move on and ensure that no duplicates
> >                         are in a repo version. To my understanding, the behaviour
> >                         will be the same as in pulp 2.
> >                         Feel free to share where you get duplicate error to see if
> >                         it's a bug or not. I wonder why duplicates are not removed
> >                         automatically. Maybe because the first version contains
> >                         duplicates due to this bug https://pulp.plan.io/issues/6217 ?
> >
> >                         Tanya
> >
> >
> >
> >                             What do you think?
> >
> >
> >                             [0]
> >
> >                                 <package type="rpm">
> >                                   <name>glibc</name>
> >                                   <arch>src</arch>
> >                                   <version epoch="0" ver="2.19" rel="20.3"/>
> >                                   <checksum type="sha256" pkgid="YES">00d36c0f741b0c01a77ce318a2bbcfa59cb4dd0b24ce61f57c6205e4fa1bb310</checksum>
> >                                   <summary>Standard Shared Libraries (from the GNU C Library)</summary>
> >                                   <description>The GNU C Library provides the most important standard libraries used
> >                                 by nearly all programs: the standard C library, the standard math
> >                                 library, and the POSIX thread library. A system is not functional
> >                                 without these libraries.</description>
> >                                   <packager>https://www.suse.com/</packager>
> >                                   <url>http://www.gnu.org/software/libc/libc.html</url>
> >                                   <time file="1426696882" build="1425645307"/>
> >                                   <size package="591662" installed="13047428" archive="974464"/>
> >                                   <location href="nosrc/glibc-2.19-20.3.nosrc.rpm"/>
> >                                   <format>
> >                                     <rpm:license>LGPL-2.1+ and SUSE-LGPL-2.1+-with-GCC-exception and GPL-2.0+</rpm:license>
> >                                     <rpm:vendor>SUSE LLC &lt;https://www.suse.com/&gt;</rpm:vendor>
> >                                     <rpm:group>System/Libraries</rpm:group>
> >                                     <rpm:buildhost>sheep16</rpm:buildhost>
> >                                     <rpm:sourcerpm/>
> >                                     <rpm:header-range start="872" end="144403"/>
> >                                     <rpm:requires>
> >                                       <rpm:entry name="pwdutils"/>
> >                                       <rpm:entry name="xz"/>
> >                                       <rpm:entry name="fdupes"/>
> >                                       <rpm:entry name="systemd-rpm-macros"/>
> >                                       <rpm:entry name="libselinux-devel"/>
> >                                       <rpm:entry name="makeinfo"/>
> >                                     </rpm:requires>
> >                                   </format>
> >                                 </package>
> >
> >                                 <package type="rpm">
> >                                   <name>glibc</name>
> >                                   <arch>src</arch>
> >                                   <version epoch="0" ver="2.19" rel="20.3"/>
> >                                   <checksum type="sha256" pkgid="YES">353e1dc85eab8d434be83160eca4fcee11a72eec345385df125ca0835abd6068</checksum>
> >                                   <summary>Standard Shared Libraries (from the GNU C Library)</summary>
> >                                   <description>The GNU C Library provides the most important standard libraries used
> >                                 by nearly all programs: the standard C library, the standard math
> >                                 library, and the POSIX thread library. A system is not functional
> >                                 without these libraries.</description>
> >                                   <packager>https://www.suse.com/</packager>
> >                                   <url>http://www.gnu.org/software/libc/libc.html</url>
> >                                   <time file="1426696883" build="1423750734"/>
> >                                   <size package="12678975" installed="13047285" archive="13057760"/>
> >                                   <location href="src/glibc-2.19-20.3.src.rpm"/>
> >                                   <format>
> >                                     <rpm:license>LGPL-2.1+ and SUSE-LGPL-2.1+-with-GCC-exception and GPL-2.0+</rpm:license>
> >                                     <rpm:vendor>SUSE LLC &lt;https://www.suse.com/&gt;</rpm:vendor>
> >                                     <rpm:group>System/Libraries</rpm:group>
> >                                     <rpm:buildhost>sheep02</rpm:buildhost>
> >                                     <rpm:sourcerpm/>
> >                                     <rpm:header-range start="872" end="144334"/>
> >                                     <rpm:requires>
> >                                       <rpm:entry name="pwdutils"/>
> >                                       <rpm:entry name="xz"/>
> >                                       <rpm:entry name="fdupes"/>
> >                                       <rpm:entry name="systemd-rpm-macros"/>
> >                                       <rpm:entry name="libselinux-devel"/>
> >                                       <rpm:entry name="makeinfo"/>
> >                                     </rpm:requires>
> >                                   </format>
> >                                 </package>
> >
> >
> >                             --
> >                             Pavel Picka
> >                             Red Hat
> >
> >                             _______________________________________________
> >                             Pulp-dev mailing list
> >                             Pulp-dev at redhat.com <mailto:Pulp-dev at redhat.com>
> >                             https://www.redhat.com/mailman/listinfo/pulp-dev
> >
> >
> >
> >                 --
> >                 Pavel Picka
> >                 Red Hat
> >
> >
> >
> >     --
> >     Pavel Picka
> >     Red Hat
> >
> >
> >
>
>
>

-- 
Pavel Picka
Red Hat
