[Pulp-dev] Duplicate nevra but not pkgId (suse repos)
Bryan Kearney
bkearney at redhat.com
Fri Mar 20 12:33:19 UTC 2020
Is this just about where to store the files on disk?
-- bk
On 3/20/20 7:24 AM, David Davis wrote:
> I think using pkgid is problematic though. Consider the case where you have two
> packages with the same location_href but different pkgIds. Since the pulp_rpm code
> uses location_href (which also gets stored as relative_path) as the filename, which
> one will get published when a repo version is published?
>
> PS - Don't tell me that two different packages will never have the same
> location_href. If there's one thing I've learned working on RPM, it's that things
> that will never happen sometimes do happen.
>
> David
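Here is a minimal sketch of the collision David describes. It rests on a hypothetical assumption: publishing places each package at a path taken only from its location_href, so a later write replaces an earlier one. `Pkg`, `publish` and `href_collisions` are illustrative names, not pulp_rpm APIs.

```python
from typing import Dict, List, NamedTuple


class Pkg(NamedTuple):
    """Illustrative package record (hypothetical, not a pulp_rpm model)."""
    pkgid: str
    location_href: str


def publish(packages: List[Pkg]) -> Dict[str, str]:
    """Map each published relative path to the pkgId stored there.

    Assumes the path comes only from location_href, so when two packages
    share it the later one silently replaces the earlier one.
    """
    published: Dict[str, str] = {}
    for pkg in packages:
        published[pkg.location_href] = pkg.pkgid
    return published


def href_collisions(packages: List[Pkg]) -> Dict[str, List[str]]:
    """Return each location_href claimed by more than one distinct pkgId."""
    claims: Dict[str, List[str]] = {}
    for pkg in packages:
        ids = claims.setdefault(pkg.location_href, [])
        if pkg.pkgid not in ids:
            ids.append(pkg.pkgid)
    return {href: ids for href, ids in claims.items() if len(ids) > 1}


# Hypothetical pair: same location_href, different content.
pkgs = [
    Pkg("pkgid-a", "Packages/foo-1.0-1.noarch.rpm"),
    Pkg("pkgid-b", "Packages/foo-1.0-1.noarch.rpm"),
]
print(publish(pkgs))          # which pkgId "wins" depends only on order
print(href_collisions(pkgs))
```

Detecting the collision up front, as `href_collisions` does, is one way to turn a silent overwrite into an explicit error.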
>
>
> On Fri, Mar 20, 2020 at 4:46 AM Pavel Picka <ppicka at redhat.com> wrote:
>
> I think we should keep NEVRA in the unique constraint, but, as I mentioned earlier
> in this thread, your idea is similar to mine, since my suggestion was NEVRA +
> checksum (pkgId).
> I've already tested it with pkgId, and it works well.
>
> On Fri, Mar 20, 2020 at 5:43 AM Daniel Alley <dalley at redhat.com> wrote:
>
> I discussed this a little bit on the #rpm.org channel. Here
> is the gist of that discussion:
>
> * The metadata is "crazy, but technically valid"
> * "the entire SUSE ecosystem tends to do this a lot, anything using OBS,
> including nvidia and dell and friends"
> * "also, SUSE packages can have the same NEVRA with being completely
> different packages because of how their build system makes packages"
>
> I'm not sure what the best means to fix it would be. Perhaps the uniqueness
> constraint should be on the location_href, instead of on the NEVRA? Or on
> NEVRA + location_href?
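The sketch below compares the uniqueness keys floated in this thread by applying each candidate to the two glibc entries from the primary.xml excerpt quoted further down. The field names mirror the metadata. `CANDIDATE_KEYS` and `has_conflict` are hypothetical helpers, not pulp_rpm code.

```python
from collections import Counter
from typing import Dict, List, Tuple

# The two glibc source entries quoted later in this thread (fields trimmed).
GLIBC_ENTRIES: List[Dict[str, str]] = [
    {"name": "glibc", "epoch": "0", "version": "2.19", "release": "20.3",
     "arch": "src",
     "pkgId": "00d36c0f741b0c01a77ce318a2bbcfa59cb4dd0b24ce61f57c6205e4fa1bb310",
     "location_href": "nosrc/glibc-2.19-20.3.nosrc.rpm"},
    {"name": "glibc", "epoch": "0", "version": "2.19", "release": "20.3",
     "arch": "src",
     "pkgId": "353e1dc85eab8d434be83160eca4fcee11a72eec345385df125ca0835abd6068",
     "location_href": "src/glibc-2.19-20.3.src.rpm"},
]

NEVRA: Tuple[str, ...] = ("name", "epoch", "version", "release", "arch")

# Candidate uniqueness keys discussed on the list (illustrative only).
CANDIDATE_KEYS: Dict[str, Tuple[str, ...]] = {
    "NEVRA": NEVRA,
    "NEVRA + pkgId": NEVRA + ("pkgId",),
    "location_href": ("location_href",),
    "NEVRA + location_href": NEVRA + ("location_href",),
}


def has_conflict(entries: List[Dict[str, str]], fields: Tuple[str, ...]) -> bool:
    """True if two or more entries produce the same value for the key fields."""
    counts = Counter(tuple(entry[f] for f in fields) for entry in entries)
    return any(n > 1 for n in counts.values())


for label, fields in CANDIDATE_KEYS.items():
    status = "conflict" if has_conflict(GLIBC_ENTRIES, fields) else "ok"
    print(f"{label}: {status}")
```

For this pair, only NEVRA on its own produces a conflict. As David points out elsewhere in the thread, NEVRA + pkgId would still allow two packages to claim the same location_href. That is why location_href-based keys came up as well.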
>
> On Wed, Mar 18, 2020 at 9:47 AM Ina Panova <ipanova at redhat.com> wrote:
>
> Pavel,
> I meant to say that pulp3 does not have the limitation pulp2 had
> (saving RPMs with the same NEVRA on the filesystem).
> The error is raised in pulp3 [0] when a repo version is created: because
> of the repo key [1], we cannot have 2 RPMs with the same NEVRA.
>
> We can enable that, if we decide to, by adding location_href to the
> repo_key, *but* this needs to be evaluated, it can have side effects and
> we should involve our stakeholders to weigh in.
>
> [0]
> https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/repository.py#L570
> [1]
> https://github.com/pulp/pulp_rpm/blob/master/pulp_rpm/app/models/package.py#L188
>
> --------
> Regards,
>
> Ina Panova
> Senior Software Engineer| Pulp| Red Hat Inc.
>
> "Do not go where the path may lead,
> go instead where there is no path and leave a trail."
>
>
> On Wed, Mar 18, 2020 at 2:24 PM Pavel Picka <ppicka at redhat.com> wrote:
>
> True, in the openSUSE repository there are two variants, 'src' and
> 'nosrc' (the latter is supposedly a legacy variant without source code); both are
> recognized by createrepo_c as arch 'src'.
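The two entries can still be told apart by their location_href values ("nosrc/glibc-2.19-20.3.nosrc.rpm" and "src/glibc-2.19-20.3.src.rpm" in the primary.xml excerpt in this thread). Here is a minimal sketch that relies on the conventional name-version-release.arch.rpm file naming. That naming is a convention, not something the metadata guarantees. `filename_arch` is a hypothetical helper, not part of createrepo_c or pulp_rpm.

```python
import posixpath


def filename_arch(location_href: str) -> str:
    """Return the arch token from a name-version-release.arch.rpm filename.

    Relies on the conventional RPM file-naming scheme, not on anything the
    repository metadata guarantees.
    """
    base = posixpath.basename(location_href)
    if not base.endswith(".rpm"):
        raise ValueError(f"not an RPM filename: {base!r}")
    # Drop ".rpm", then take the text after the last remaining dot.
    return base[: -len(".rpm")].rsplit(".", 1)[-1]


for href in ("nosrc/glibc-2.19-20.3.nosrc.rpm", "src/glibc-2.19-20.3.src.rpm"):
    # Both entries carry <arch>src</arch> in primary.xml.
    print(href, "->", filename_arch(href))
```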
>
> The pulp2 code I mentioned is here [0] (the base RPM package model, as far
> as I understand).
>
> In pulp3 the error is raised here [1] in pulpcore, when packages are added
> to a repository version.
> So, as Ina mentioned, it is not necessarily an issue with the packages
> themselves, but rather with the sync logic.
>
> [0] https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/db/models.py#L779
> [1] https://github.com/pulp/pulpcore/blob/master/pulpcore/app/models/repository.py#L570
>
> On Wed, Mar 18, 2020 at 1:55 PM Ina Panova <ipanova at redhat.com> wrote:
>
> Tanya and Pavel,
> this issue explains why we cannot keep 2 packages with the
> same NEVRA but different checksums within a repo:
> https://pulp.plan.io/issues/494
>
> Pulp2 had a limitation: it was not able to save 2 RPMs with the same
> filename on the filesystem, which could lead to a primary.xml
> pointing to an RPM that never actually got saved.
> I believe in Pulp3 we could allow RPMs with the same NEVRA if
> they have different location_href values within a repo.
>
>
>
> On Wed, Mar 18, 2020 at 10:47 AM Tatiana Tereshchenko
> <ttereshc at redhat.com> wrote:
>
> Hi Pavel,
>
> On Tue, Mar 17, 2020 at 7:31 PM Pavel Picka
> <ppicka at redhat.com> wrote:
>
> Hello, I would like to ask how to proceed with an issue
> involving duplicate (but not really duplicate) packages.
>
> I am syncing SUSE repositories (openSUSE 42 and SLE12) and
> getting a duplicate error. But when checking the packages
> [0] (from primary.xml), the two glibc entries have the same NEVRA
> but different checksums (and a few other differences, such as size),
> so they don't look like real duplicates.
>
> Those are weird: they have the same NEVRA, but look at the
> location_href; one is src and the other one is nosrc! :/ :
> <location href="nosrc/glibc-2.19-20.3.nosrc.rpm"/>
> <location href="src/glibc-2.19-20.3.src.rpm"/>
>
> It looks like something openSUSE-specific. I'm not sure if
> it's a valid way to create a repo with such metadata; we need
> to figure it out at some point.
>
>
> I've checked Pulp2, and there NEVRA + checksum is used for
> repository uniqueness. In pulp3 we use only NEVRA.
>
> Why do you think that in Pulp 2 we use NEVRA + checksum? Have
> you tested it? Please point to the code.
> I believe that in Pulp 2, as well as in Pulp 3, we allow
> packages with different checksums in Pulp storage.
> I don't think we allow having the same package with
> different checksums in the same repo.
> FWIW, in Pulp 2 the most recently added package is the one kept
> in a repo, so no packages with duplicate NEVRA are left after
> sync;
> see https://github.com/pulp/pulp_rpm/blob/2-master/plugins/pulp_rpm/plugins/importers/yum/purge.py#L285-L333
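Here is a minimal sketch of the Pulp 2 behaviour described above: among units that share a NEVRA, keep only the one added most recently. It is an illustration built on assumed data structures (`Unit`, `added_order` and the pkgid labels are hypothetical). It does not transcribe the linked purge.py code.

```python
from typing import Dict, List, NamedTuple, Tuple

Nevra = Tuple[str, str, str, str, str]  # (name, epoch, version, release, arch)


class Unit(NamedTuple):
    """Illustrative content unit; added_order is a hypothetical counter
    where a larger value means the unit was added more recently."""
    nevra: Nevra
    pkgid: str
    added_order: int


def keep_newest_per_nevra(units: List[Unit]) -> List[Unit]:
    """Keep only the most recently added unit for each NEVRA."""
    newest: Dict[Nevra, Unit] = {}
    for unit in units:
        current = newest.get(unit.nevra)
        if current is None or unit.added_order > current.added_order:
            newest[unit.nevra] = unit
    # Return the survivors in the order they were added, for readability.
    return sorted(newest.values(), key=lambda u: u.added_order)


GLIBC: Nevra = ("glibc", "0", "2.19", "20.3", "src")
units = [
    Unit(GLIBC, "pkgid-nosrc", 1),  # hypothetical: synced first
    Unit(GLIBC, "pkgid-src", 2),    # hypothetical: synced later
]
print([u.pkgid for u in keep_newest_per_nevra(units)])
```

The outcome depends entirely on arrival order. This makes the result deterministic, but it may not keep the package a user actually wanted.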
>
>
>
> My suggestion is to extend repo_key_fields for the RPM
> package with pkgId (checksum), as in pulp2, because I don't
> think they are really duplicates, and other software may
> rely on a specific version of a package.
>
>
> Unfortunately, I don't remember the main reason for removing
> duplicates based on NEVRA. Was it because some tooling would
> complain, or was it just to avoid duplicates at resync time?
> Does anyone know?
> We should not change it unless we know for sure that it's
> needed, and we would also need agreement from all our
> stakeholders for that change.
>
> For now, I think we can move on and ensure that no duplicates
> are in a repo version. As I understand it, the behaviour
> will be the same as in Pulp 2.
> Feel free to share where you get the duplicate error, so we can
> see whether it's a bug. I wonder why duplicates are not removed
> automatically. Maybe it's because the first version contains
> duplicates due to this bug: https://pulp.plan.io/issues/6217 ?
>
> Tanya
>
>
>
> What do you think?
>
>
> [0]
>
> <package type="rpm">
>   <name>glibc</name>
>   <arch>src</arch>
>   <version epoch="0" ver="2.19" rel="20.3"/>
>   <checksum type="sha256" pkgid="YES">00d36c0f741b0c01a77ce318a2bbcfa59cb4dd0b24ce61f57c6205e4fa1bb310</checksum>
>   <summary>Standard Shared Libraries (from the GNU C Library)</summary>
>   <description>The GNU C Library provides the most important standard libraries used
> by nearly all programs: the standard C library, the standard math
> library, and the POSIX thread library. A system is not functional
> without these libraries.</description>
>   <packager>https://www.suse.com/</packager>
>   <url>http://www.gnu.org/software/libc/libc.html</url>
>   <time file="1426696882" build="1425645307"/>
>   <size package="591662" installed="13047428" archive="974464"/>
>   <location href="nosrc/glibc-2.19-20.3.nosrc.rpm"/>
>   <format>
>     <rpm:license>LGPL-2.1+ and SUSE-LGPL-2.1+-with-GCC-exception and GPL-2.0+</rpm:license>
>     <rpm:vendor>SUSE LLC &lt;https://www.suse.com/&gt;</rpm:vendor>
>     <rpm:group>System/Libraries</rpm:group>
>     <rpm:buildhost>sheep16</rpm:buildhost>
>     <rpm:sourcerpm/>
>     <rpm:header-range start="872" end="144403"/>
>     <rpm:requires>
>       <rpm:entry name="pwdutils"/>
>       <rpm:entry name="xz"/>
>       <rpm:entry name="fdupes"/>
>       <rpm:entry name="systemd-rpm-macros"/>
>       <rpm:entry name="libselinux-devel"/>
>       <rpm:entry name="makeinfo"/>
>     </rpm:requires>
>   </format>
> </package>
>
> <package type="rpm">
>   <name>glibc</name>
>   <arch>src</arch>
>   <version epoch="0" ver="2.19" rel="20.3"/>
>   <checksum type="sha256" pkgid="YES">353e1dc85eab8d434be83160eca4fcee11a72eec345385df125ca0835abd6068</checksum>
>   <summary>Standard Shared Libraries (from the GNU C Library)</summary>
>   <description>The GNU C Library provides the most important standard libraries used
> by nearly all programs: the standard C library, the standard math
> library, and the POSIX thread library. A system is not functional
> without these libraries.</description>
>   <packager>https://www.suse.com/</packager>
>   <url>http://www.gnu.org/software/libc/libc.html</url>
>   <time file="1426696883" build="1423750734"/>
>   <size package="12678975" installed="13047285" archive="13057760"/>
>   <location href="src/glibc-2.19-20.3.src.rpm"/>
>   <format>
>     <rpm:license>LGPL-2.1+ and SUSE-LGPL-2.1+-with-GCC-exception and GPL-2.0+</rpm:license>
>     <rpm:vendor>SUSE LLC &lt;https://www.suse.com/&gt;</rpm:vendor>
>     <rpm:group>System/Libraries</rpm:group>
>     <rpm:buildhost>sheep02</rpm:buildhost>
>     <rpm:sourcerpm/>
>     <rpm:header-range start="872" end="144334"/>
>     <rpm:requires>
>       <rpm:entry name="pwdutils"/>
>       <rpm:entry name="xz"/>
>       <rpm:entry name="fdupes"/>
>       <rpm:entry name="systemd-rpm-macros"/>
>       <rpm:entry name="libselinux-devel"/>
>       <rpm:entry name="makeinfo"/>
>     </rpm:requires>
>   </format>
> </package>
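Such entries can be spotted before they reach a repository version by grouping the primary.xml packages by NEVRA. Here is a minimal sketch using only the standard library. It assumes the usual primary.xml default namespace and uses a trimmed copy of the two entries above. It is not how pulp_rpm's sync pipeline is implemented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Tuple

# Default namespace of primary.xml in yum/createrepo_c repositories.
NS = {"common": "http://linux.duke.edu/metadata/common"}

# Trimmed copy of the two entries above: only the fields used below.
SAMPLE_PRIMARY = """<metadata xmlns="http://linux.duke.edu/metadata/common" packages="2">
  <package type="rpm">
    <name>glibc</name>
    <arch>src</arch>
    <version epoch="0" ver="2.19" rel="20.3"/>
    <checksum type="sha256" pkgid="YES">00d36c0f741b0c01a77ce318a2bbcfa59cb4dd0b24ce61f57c6205e4fa1bb310</checksum>
    <location href="nosrc/glibc-2.19-20.3.nosrc.rpm"/>
  </package>
  <package type="rpm">
    <name>glibc</name>
    <arch>src</arch>
    <version epoch="0" ver="2.19" rel="20.3"/>
    <checksum type="sha256" pkgid="YES">353e1dc85eab8d434be83160eca4fcee11a72eec345385df125ca0835abd6068</checksum>
    <location href="src/glibc-2.19-20.3.src.rpm"/>
  </package>
</metadata>"""

Nevra = Tuple[str, str, str, str, str]


def duplicate_nevras(primary_xml: str) -> Dict[Nevra, List[Tuple[str, str]]]:
    """Group packages by NEVRA and return only NEVRAs seen more than once,
    each mapped to its list of (pkgId, location_href) pairs."""
    root = ET.fromstring(primary_xml)
    groups: Dict[Nevra, List[Tuple[str, str]]] = defaultdict(list)
    for pkg in root.findall("common:package", NS):
        version = pkg.find("common:version", NS)
        nevra = (
            pkg.findtext("common:name", namespaces=NS),
            version.get("epoch"),
            version.get("ver"),
            version.get("rel"),
            pkg.findtext("common:arch", namespaces=NS),
        )
        pkgid = pkg.findtext("common:checksum", namespaces=NS)
        href = pkg.find("common:location", NS).get("href")
        groups[nevra].append((pkgid, href))
    return {nevra: found for nevra, found in groups.items() if len(found) > 1}


for nevra, found in duplicate_nevras(SAMPLE_PRIMARY).items():
    print(nevra)
    for pkgid, href in found:
        print("   ", pkgid[:12], href)
```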
>
>
> --
> Pavel Picka
> Red Hat
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>