[Pulp-dev] Concerns about bulk_create and PostgreSQL

David Davis daviddavis at redhat.com
Wed Jan 9 13:46:18 UTC 2019


The RubyGems API includes a SHA as part of the metadata for a gem. Couldn't
you use that as part of the natural key?

I'm surprised that Chef's Supermarket API doesn't include this as well.
Maybe we could open a feature request?

David


On Tue, Jan 8, 2019 at 2:50 PM Simon Baatz <gmbnomis at gmail.com> wrote:

> On 08.01.2019 17:16, Jeff Ortel wrote:
> >
> >
> > On 1/3/19 1:28 PM, Simon Baatz wrote:
> >> On Thu, Jan 03, 2019 at 01:02:57PM -0500, David Davis wrote:
> >>>     I don't think that using integer ids with bulk_create and supporting
> >>>     mysql/mariadb are necessarily mutually exclusive. I think there might
> >>>     be a way to find the records created using bulk_create if we know the
> >>>     natural key. It might be more performant than using UUIDs as well.
> >> This assumes that there is a natural key.  For content types with no
> >> digest information in the meta data, there may be a natural key
> >> for content within a repo version only, but no natural key for the
> >> overall content.  (If we want to support non-immediate modes for such
> >> content.  In immediate mode, a digest can be computed from the
> >> associated artifact(s)).
> >
> > Can you give some examples of Content without a natural key?
>
> For example, the meta-data obtained for Cookbooks is "version" and
> "name" (the same seems to apply to Ruby Gems). With immediate sync
> policy, we can add a digest to each content unit as we know the digest
> of the associated artifact. Thus, the natural key is "version", "name",
> and "digest".
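A minimal sketch of the immediate-mode key described above, in which the digest is computed from the downloaded artifact and combined with "name" and "version". The `NaturalKey` type, the `natural_key` helper, and the choice of SHA-256 are assumptions made for illustration; the thread does not name a digest algorithm, and this is not Pulp's actual model code.

```python
import hashlib
from typing import NamedTuple


class NaturalKey(NamedTuple):
    """Hypothetical immediate-mode natural key: name, version and digest."""
    name: str
    version: str
    digest: str


def natural_key(name: str, version: str, artifact_bytes: bytes) -> NaturalKey:
    # Assumption: SHA-256 is used as the digest of the artifact content.
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    return NaturalKey(name, version, digest)


# Two units with the same name and version but different artifacts
# end up with different keys, so they are not treated as the same content.
upstream = natural_key("apache2", "1.0.0", b"upstream tarball")
patched = natural_key("apache2", "1.0.0", b"locally patched tarball")
upstream_again = natural_key("apache2", "1.0.0", b"upstream tarball")
```

Only identical artifact bytes produce an equal key here, which is what makes "name" plus "version" safe to share across repositories once the digest is known.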
>
> In "non-immediate mode", we only have "version" and "name" to work with
> during sync. Now, there is a trade-off (I think) and we have the
> following possibilities:
>
> 1. Just pretend that "version" and "name" are unique. We have a natural
> key, but it may lead to the cross-repo effects that I described a while
> ago on the list.
> 2. Use "version" and "name" as natural key within a repo version, but
> not globally. In this scenario, it may turn out that two content units
> are actually the same after downloading.
>
> I prefer option 2: Content sharing is not perfect, but as a user, I
> don't have to fear that repositories mix up content that happens to have
> the same name and version.
>
> There is also an extension of 2., which allows content sharing during
> sync for immediate mode. Define a "pseudo" natural key on global
> content level: "version", "name" and "digest". "digest" may be null. Two
> content units are considered the same if they match in all three
> attributes and these attributes are not null. But even in immediate
> mode, the artifact will not be downloaded if "name" and "version" are
> already present in the repository version the sync is based on. A
> pipeline for this could look like:
>
>     def pipeline_stages(self, new_version):
>         pipeline = [
>             self.first_stage,
>             # Match against the repo version by "name" and "version".
>             QueryExistingContentUnits(new_version=new_version),
>             ExistingContentNeedsNoArtifacts(),
>         ]
>         if self.download_artifacts:
>             pipeline.extend([
>                 ArtifactDownloader(),
>                 ArtifactSaver(),
>                 UpdateContentWithDownloadResult(),
>                 # Match globally; requires the digest to be set.
>                 QueryExistingContentUnits(),
>             ])
>         pipeline.append(ContentUnitSaver())
>         return pipeline
>
> QueryExistingContentUnits(new_version=new_version) associates based on
> the "repo version key", while QueryExistingContentUnits() associates
> globally based on the "pseudo natural key" (the digest must be set for a
> match to occur at all).
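The two matching rules described above could be sketched roughly as follows. This is plain Python over in-memory lists rather than Django querysets, and `ContentUnit`, `match_in_repo_version`, and `match_globally` are hypothetical names used only for illustration. It assumes "name" and "version" are always set and only "digest" may be null.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ContentUnit:
    name: str
    version: str
    digest: Optional[str] = None  # None until the artifact is downloaded


def match_in_repo_version(
    unit: ContentUnit, repo_version_units: Iterable[ContentUnit]
) -> Optional[ContentUnit]:
    """Repo version key: compare "name" and "version" only."""
    for existing in repo_version_units:
        if (existing.name, existing.version) == (unit.name, unit.version):
            return existing
    return None


def match_globally(
    unit: ContentUnit, all_units: Iterable[ContentUnit]
) -> Optional[ContentUnit]:
    """Pseudo natural key: all three attributes must match and be non-null."""
    if unit.digest is None:
        return None  # a null digest never matches anything
    for existing in all_units:
        if (existing.name, existing.version, existing.digest) == (
            unit.name, unit.version, unit.digest
        ):
            return existing
    return None


base_units = [ContentUnit("apache2", "1.0.0")]
all_units = base_units + [ContentUnit("nginx", "2.0.0", "abc123")]

incoming = ContentUnit("apache2", "1.0.0")
in_repo = match_in_repo_version(incoming, base_units)  # found by name/version
in_global = match_globally(incoming, all_units)        # None: digest unknown
```

In this sketch a unit without a digest can only ever be associated within a repository version, which is the property that avoids cross-repo mix-ups in non-immediate mode.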
>