[Pulp-dev] Concerns about bulk_create and PostgreSQL

Patrick Creech pcreech at redhat.com
Tue Nov 20 14:59:33 UTC 2018


On Mon, 2018-11-19 at 17:08 -0500, Brian Bouterse wrote:
> When we switched from UUID to integers for the PK we lost the ability to have the primary key set during bulk_create
> with databases other than PostgreSQL [0].
> 
> With a goal of database agnosticism for Pulp3, if plugin writers plan to use bulk_create with any object inherited
> from one of ours, they will get different behaviors on different databases and they won't have PKs that they may
> require. bulk_create is a normal django thing, so plugin writers making a django plugin should be able to use it. This
> concerned me already, but today it was also brought up by non-RH plugin writers [1] in a PR.
> 
> The tradeoffs between UUIDs and integer PKs are pretty well summed up in our ticket where we discussed that change [2].
> Note, we did not consider this bulk_create downside at that time, which I think is the most significant downside to
> consider.
> 
> Having bulk_create effectively not available for plugin writers (since we can't rely on its pks being returned) I
> think is a non-starter for me. I love how short the UUIDs made our URLs so that's the tradeoff mainly in my mind.
> Those balanced against each other, I think we should switch back.
> 
> Another option is to become PostgreSQL only which (though I love psql) I think would be the wrong choice for Pulp from
> what I've heard from its users.
> 
> What do you think? What should we do?

So, my mind immediately goes to this question, which might be useful for others to help make decisions, so I'll ask:

When you say: 

"we lost the ability to have the primary key set during bulk_create"

Can you clarify what you mean by this?

My mind immediately goes to this chain of events:

	When you use bulk_create, the existing in-memory model objects representing the data to create do not get
updated with the primary key values that are created in the database.  

	Upon a subsequent query of the database, for the exact same set of objects just added, those objects _will_ have
the primary key populated.

In other words, 

	The database records themselves get the auto-increment IDs added, they just don't get reported back in that
query to the ORM layer, therefore it takes a subsequent query to get those ids out.

Does that about sum it up?
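
To make that chain of events concrete, here is a minimal sketch of what I mean. It deliberately uses Python's built-in
sqlite3 module rather than Django's ORM, so it only imitates the behavior I'm describing and is not a statement of what
bulk_create does on any particular backend. The Artifact class, the artifact table, the unique name column, and the
bulk_insert / refresh_pks helpers are all hypothetical names made up for illustration.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Artifact:
    """Hypothetical stand-in for an in-memory model instance."""
    name: str
    pk: Optional[int] = None  # stays None until we read the id back


def bulk_insert(conn: sqlite3.Connection, items: List[Artifact]) -> None:
    """Insert all rows in one batch; the in-memory objects are not touched."""
    conn.executemany(
        "INSERT INTO artifact (name) VALUES (?)",
        [(item.name,) for item in items],
    )
    conn.commit()


def refresh_pks(conn: sqlite3.Connection, items: List[Artifact]) -> None:
    """Subsequent query: look the rows up by a natural key and copy the ids over."""
    names = [item.name for item in items]
    placeholders = ", ".join("?" for _ in names)
    rows = conn.execute(
        f"SELECT name, id FROM artifact WHERE name IN ({placeholders})",
        names,
    ).fetchall()
    pk_by_name = dict(rows)
    for item in items:
        item.pk = pk_by_name[item.name]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE artifact ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "name TEXT UNIQUE NOT NULL)"
)

items = [Artifact("a"), Artifact("b"), Artifact("c")]
bulk_insert(conn, items)
pks_before = [item.pk for item in items]  # the database has ids, the objects do not

refresh_pks(conn, items)
pks_after = [item.pk for item in items]   # ids are now populated from the database
```

Note that the second step only works here because the hypothetical table has a unique natural key (name) to match the
rows on; without something like that, there is no reliable way to tie the new rows back to the in-memory objects.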


> 
> [0]: https://docs.djangoproject.com/en/2.1/ref/models/querysets/#bulk-create
> [1]: https://github.com/pulp/pulp/pull/3764#discussion_r234780702
> [2]: https://pulp.plan.io/issues/3848