[Pulp-dev] Concerns about bulk_create and PostgreSQL

Matthias Dellweg dellweg at atix.de
Wed Jan 9 10:20:09 UTC 2019


I also want to vote for solution 2.
The only thing I want to add is multi-tenancy capability. From a
functional point of view, one user's repository should really not be
affected by actions on another repository. Sharing the actual files for
performance is desirable, but sharing content units might be a problem
in any case.

On Tue, 8 Jan 2019 20:49:41 +0100
Simon Baatz <gmbnomis at gmail.com> wrote:

> On 08.01.2019 17:16, Jeff Ortel wrote:
> >
> >
> > On 1/3/19 1:28 PM, Simon Baatz wrote:  
> >> On Thu, Jan 03, 2019 at 01:02:57PM -0500, David Davis wrote:  
> >>>     I don't think that using integer ids with bulk_create and
> >>>     supporting mysql/mariadb are necessarily mutually exclusive.
> >>>     I think there might be a way to find the records created
> >>>     using bulk_create if we know the natural key. It might be
> >>>     more performant than using UUIDs as well.
> >> This assumes that there is a natural key. For content types with
> >> no digest information in the metadata, there may be a natural key
> >> for content within a repo version only, but no natural key for the
> >> overall content. (This matters if we want to support non-immediate
> >> modes for such content; in immediate mode, a digest can be computed
> >> from the associated artifact(s).)
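David's idea of recovering integer ids after a bulk insert by looking rows up on their natural key can be sketched with Python's built-in sqlite3 module standing in for the database. This is a minimal sketch under assumptions: the content table, make_db() and bulk_insert_and_fetch_ids() are hypothetical, not Pulp's schema or Django's bulk_create(). The batch insert gives us no per-row ids here, so the rows are found again in one query on (name, version):

```python
import sqlite3


def make_db():
    """Create an in-memory DB with a hypothetical content table.

    The integer id is a surrogate key; (name, version) is the natural key.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " name TEXT NOT NULL,"
        " version TEXT NOT NULL,"
        " UNIQUE (name, version))"
    )
    return conn


def bulk_insert_and_fetch_ids(conn, units):
    """Insert (name, version) tuples in one batch, then map each to its id.

    The rows are found again by their natural key, which only works because
    (name, version) is declared unique.
    """
    if not units:
        return {}
    conn.executemany("INSERT INTO content (name, version) VALUES (?, ?)", units)
    # One lookup query for the whole batch: (name = ? AND version = ?) OR ...
    clause = " OR ".join(["(name = ? AND version = ?)"] * len(units))
    params = [value for unit in units for value in unit]
    rows = conn.execute(f"SELECT id, name, version FROM content WHERE {clause}", params)
    return {(name, version): row_id for row_id, name, version in rows}


conn = make_db()
ids = bulk_insert_and_fetch_ids(
    conn, [("nginx", "1.0.0"), ("nginx", "1.1.0"), ("apache2", "5.0.0")]
)
```

Without a unique natural key (Simon's point about content with no digest in non-immediate mode), the lookup could match rows that don't belong to this batch, and the mapping from input unit to id would be ambiguous. Very large batches would also need to be split to stay under the database's bound-parameter limit.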
> >
> > Can you give some examples of Content without a natural key?  
> 
> For example, the metadata obtained for Cookbooks is "version" and
> "name" (the same seems to apply to Ruby Gems). With the immediate sync
> policy, we can add a digest to each content unit because we know the
> digest of the associated artifact. Thus, the natural key is "version",
> "name", and "digest".
> 
> In "non-immediate mode", we only have "version" and "name" to work
> with during sync. Now, there is a trade-off (I think) and we have the
> following possibilities:
> 
> 1. Just pretend that "version" and "name" are unique. We have a
> natural key, but it may lead to the cross-repo effects that I
> described a while ago on the list.
> 2. Use "version" and "name" as the natural key within a repo version,
> but not globally. In this scenario, it may turn out that two content
> units are actually the same after downloading.
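The difference between the two options can be modelled in plain Python. This is a minimal in-memory sketch, not Pulp code: ContentUnit, remote_url and the two sync functions are hypothetical, and the cross-repo effect shown (one repository silently reusing another repository's unit that merely shares name and version) is an assumed example, since the earlier thread Simon refers to isn't quoted here:

```python
from dataclasses import dataclass


@dataclass
class ContentUnit:
    name: str
    version: str
    remote_url: str  # hypothetical: where the artifact would be fetched from later


def sync_with_global_key(store, declared):
    """Option 1: treat (name, version) as a global natural key.

    store: dict shared by all repositories, (name, version) -> ContentUnit
    """
    # setdefault returns an existing unit with the same key, even if it was
    # created by a sync from a different remote.
    return [store.setdefault((u.name, u.version), u) for u in declared]


def sync_with_repo_version_key(base_version, declared):
    """Option 2: (name, version) is unique only within one repository version.

    base_version: units of the repository version this sync is based on
    """
    by_key = {(u.name, u.version): u for u in base_version}
    return [by_key.get((u.name, u.version), u) for u in declared]


# Two repositories sync different cookbooks that happen to share name and version.
store = {}
repo_a = sync_with_global_key(store, [ContentUnit("nginx", "1.0.0", "https://a.example/nginx")])
repo_b = sync_with_global_key(store, [ContentUnit("nginx", "1.0.0", "https://b.example/nginx")])
# Option 1: repo_b[0] is repo A's unit, so it points at a.example.

repo_b2 = sync_with_repo_version_key([], [ContentUnit("nginx", "1.0.0", "https://b.example/nginx")])
# Option 2: repo B, starting from an empty version, keeps its own unit.
```

Option 2 gives up some sharing: two units that later turn out to have the same artifact still exist as separate rows, which is the imperfection Simon accepts below.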
> 
> I prefer option 2: Content sharing is not perfect, but as a user, I
> don't have to fear that repositories mix up content that happens to
> have the same name and version.
> 
> There is also an extension of option 2 that allows content sharing
> during sync in immediate mode. Define a "pseudo" natural key at the
> global content level: "version", "name" and "digest", where "digest"
> may be null. Two content units are considered the same if they match
> in all three attributes and none of these attributes is null. But even
> in immediate mode, the artifact will not be downloaded if "name" and
> "version" are already present in the repository version the sync is
> based on. A pipeline for this could look like:
> 
>     def pipeline_stages(self, new_version):
>         pipeline = [
>             self.first_stage,
>             QueryExistingContentUnits(new_version=new_version),
>             ExistingContentNeedsNoArtifacts()
>         ]
>         if self.download_artifacts:
>             pipeline.extend([ArtifactDownloader(), ArtifactSaver(),
>                              UpdateContentWithDownloadResult(),
>                              QueryExistingContentUnits()])
>         pipeline.extend([ContentUnitSaver()])
>         return pipeline
> 
> QueryExistingContentUnits(new_version=new_version) associates based on
> the "repo version key", whereas QueryExistingContentUnits() associates
> globally based on the "pseudo natural key" (the digest must be set for
> a unit to match at all).
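The two association passes, and the rule that units are equal only when all three key attributes are set and equal, can be modelled in plain Python. This is a minimal sketch under assumptions: Unit, pseudo_key(), associate() and the download callback are hypothetical stand-ins, not the pulpcore-plugin stages API, and "saving" is reduced to inserting into a dict:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Unit:
    name: str
    version: str
    digest: Optional[str] = None  # None until an artifact has been downloaded


PseudoKey = Tuple[str, str, str]


def pseudo_key(unit: Unit) -> Optional[PseudoKey]:
    """Global "pseudo natural key"; None means the unit can't match globally."""
    if unit.name is None or unit.version is None or unit.digest is None:
        return None
    return (unit.name, unit.version, unit.digest)


def associate(
    declared: List[Unit],
    base_version: List[Unit],
    global_store: Dict[PseudoKey, Unit],
    download: Optional[Callable[[Unit], str]] = None,
) -> List[Unit]:
    """Two-pass association modelled on the proposed pipeline.

    download is None in non-immediate mode; otherwise it returns the digest
    of the unit's downloaded artifact.
    """
    repo_index = {(u.name, u.version): u for u in base_version}
    result = []
    for unit in declared:
        # Pass 1 (repo version key): reuse the unit from the base version;
        # no artifact needs to be downloaded for it.
        existing = repo_index.get((unit.name, unit.version))
        if existing is not None:
            result.append(existing)
            continue
        if download is not None:
            unit.digest = download(unit)
        # Pass 2 (pseudo natural key): only possible once a digest is known.
        key = pseudo_key(unit)
        if key is not None and key in global_store:
            result.append(global_store[key])
            continue
        # New content unit; register it globally only if it has a full key.
        if key is not None:
            global_store[key] = unit
        result.append(unit)
    return result
```

Passing download=None models non-immediate mode: units not found in the base version keep a null digest, never match globally, and are kept as new units.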
> 
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev

