[Pulp-dev] Concerns about bulk_create and PostgreSQL
dellweg at atix.de
Wed Jan 9 10:20:09 UTC 2019
I also want to vote for solution 2.
The only thing I want to add is multi-tenancy capability. From a
functional point of view, one user's repository should not be affected
by actions on another repository. Sharing the actual files for
performance is desirable, but sharing content units might be a problem
in any case.
On Tue, 8 Jan 2019 20:49:41 +0100
Simon Baatz <gmbnomis at gmail.com> wrote:
> On 08.01.2019 17:16, Jeff Ortel wrote:
> > On 1/3/19 1:28 PM, Simon Baatz wrote:
> >> On Thu, Jan 03, 2019 at 01:02:57PM -0500, David Davis wrote:
> >>> I don't think that using integer ids with bulk_create and
> >>> supporting mysql/mariadb are necessarily mutually exclusive. I
> >>> think there might be a way to find the records created using
> >>> bulk_create if we know the natural key. It might be more
> >>> performant than using UUIDs as well.
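The lookup suggested above can be sketched roughly as follows. This is only an illustration, not Pulp code: it uses the standard-library sqlite3 module instead of the Django ORM so that it stays self-contained, and the table name "content", its columns, and the helper bulk_insert_and_fetch_ids are all hypothetical. It assumes every row has a complete natural key.

```python
import sqlite3


def bulk_insert_and_fetch_ids(conn, units):
    """Insert rows in one bulk statement, then recover their integer ids
    by querying on the natural key (name, version, digest)."""
    conn.executemany(
        "INSERT INTO content (name, version, digest) VALUES (?, ?, ?)", units
    )
    ids = {}
    for unit in units:
        # One query per unit keeps the sketch simple; a real implementation
        # would probably batch these lookups.
        row = conn.execute(
            "SELECT id FROM content "
            "WHERE name = ? AND version = ? AND digest = ?",
            unit,
        ).fetchone()
        ids[unit] = row[0]
    return ids


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "name TEXT NOT NULL, version TEXT NOT NULL, digest TEXT NOT NULL, "
    "UNIQUE (name, version, digest))"
)
units = [("apache", "1.0.0", "sha256:aa"), ("nginx", "2.1.0", "sha256:bb")]
ids = bulk_insert_and_fetch_ids(conn, units)
```

The approach only works when the natural key is unique and fully populated, which is exactly the assumption questioned in the reply below.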
> >> This assumes that there is a natural key. For content types with
> >> no digest information in the metadata, there may be a natural key
> >> for content within a repo version only, but no natural key for the
> >> overall content. (This applies if we want to support non-immediate
> >> modes for such content; in immediate mode, a digest can be computed
> >> from the associated artifact(s).)
> > Can you give some examples of Content without a natural key?
> For example, the metadata obtained for Cookbooks is "version" and
> "name" (the same seems to apply to Ruby Gems). With immediate sync
> policy, we can add a digest to each content unit as we know the digest
> of the associated artifact. Thus, the natural key is "version",
> "name", and "digest".
> In "non-immediate mode", we only have "version" and "name" to work
> with during sync. Now, there is a trade-off (I think) and we have the
> following possibilities:
> 1. Just pretend that "version" and "name" are unique. We have a
> natural key, but it may lead to the cross-repo effects that I
> described a while ago on the list.
> 2. Use "version" and "name" as natural key within a repo version, but
> not globally. In this scenario, it may turn out that two content units
> are actually the same after downloading.
> I prefer option 2: Content sharing is not perfect, but as a user, I
> don't have to fear that repositories mix up content that happens to
> have the same name and version.
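To make the trade-off between the two options concrete, here is a minimal sketch. Everything in it is hypothetical (the Unit tuple, the "source" field standing in for the actual content, and the dictionary-based stores); it only models which unit a sync ends up associating, not how Pulp stores content.

```python
from collections import namedtuple

# "source" stands in for the real content, which is unknown during a
# non-immediate sync.
Unit = namedtuple("Unit", ["name", "version", "source"])


def sync_global_key(store, unit):
    """Option 1: ("name", "version") is treated as globally unique."""
    return store.setdefault((unit.name, unit.version), unit)


def sync_repo_version_key(store, repo_version, unit):
    """Option 2: ("name", "version") is unique only within a repo version."""
    return store.setdefault((repo_version, unit.name, unit.version), unit)


a = Unit("apache", "1.0.0", "upstream-a")
b = Unit("apache", "1.0.0", "upstream-b")  # same name/version, other content

global_store = {}
sync_global_key(global_store, a)
got_global = sync_global_key(global_store, b)  # silently reuses unit a

scoped_store = {}
sync_repo_version_key(scoped_store, "repo-a/1", a)
got_scoped = sync_repo_version_key(scoped_store, "repo-b/1", b)  # keeps b
```

With the global key, the second repository ends up with the first repository's unit; with the scoped key, each repository keeps its own unit at the cost of possibly storing the same content twice.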
> There is also an extension of 2., which allows content sharing during
> sync for immediate mode. Define a "pseudo" natural key on global
> content level: "version", "name" and "digest". "digest" may be null.
> Two content units are considered the same if they match in all three
> attributes and these attributes are not null. But even in immediate
> mode, the artifact will not be downloaded if "name" and "version" are
> already present in the repository version the sync is based on. A
> pipeline for this could look like:
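>     def pipeline_stages(self, new_version):
>         pipeline = [
>             ...,
>             QueryExistingContentUnits(new_version=new_version),
>             ...,
>         ]
>         if self.download_artifacts:
>             pipeline.extend([ArtifactDownloader(), ArtifactSaver(),
>                              ..., QueryExistingContentUnits()])
>         return pipeline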
> QueryExistingContentUnits(new_version=new_version) associates based on
> the "repo version key"; QueryExistingContentUnits() associates globally
> based on the "pseudo natural key" (digest must be set to match at all).
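The two matching rules described above could be expressed as plain predicates. This is a sketch under assumptions: the ContentUnit tuple and both function names are made up for illustration and are not part of the Pulp plugin API.

```python
from typing import NamedTuple, Optional


class ContentUnit(NamedTuple):
    name: str
    version: str
    digest: Optional[str] = None  # None until the artifact is downloaded


def same_globally(a, b):
    """Pseudo natural key: all three attributes match and none is null."""
    key_a = (a.name, a.version, a.digest)
    key_b = (b.name, b.version, b.digest)
    return None not in key_a and key_a == key_b


def same_in_repo_version(a, b):
    """Repo version key: only "name" and "version" have to match."""
    return (a.name, a.version) == (b.name, b.version)


downloaded = ContentUnit("apache", "1.0.0", "sha256:aa")
duplicate = ContentUnit("apache", "1.0.0", "sha256:aa")
pending = ContentUnit("apache", "1.0.0")  # digest not known yet
```

Note that a unit without a digest never matches globally, not even itself, which mirrors how NULL compares in SQL unique constraints.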