[Pulp-dev] Concerns about bulk_create and PostgreSQL

Patrick Creech pcreech at redhat.com
Wed Dec 5 16:30:16 UTC 2018


On Wed, 2018-12-05 at 09:34 -0500, Daniel Alley wrote:
> Perhaps, but it's not a -1 so much as a call for more experimentation and testing.  I wouldn't feel comfortable saying
> Pulp is MySQL "compatible" if (if!) it was an order of magnitude slower than Pulp on Postgres, and we never found out
> about that because we never tested it... I think that kind of "compatibility" would be a net negative to Pulp.  So I
> just want us to make sure these bases are covered if we're going to make that claim.
> 
> s/it's chart chart/it's a pretty nasty chart

It shows the beginnings of an exponential curve of performance degradation as your record count grows.  Yes, nasty.

> 
> On Wed, Dec 5, 2018 at 9:27 AM Dennis Kliban <dkliban at redhat.com> wrote:
> > It looks like the chart was generated using MySQL 5.0.45 which was released at least 10 years ago[0]. I don't think
> > we can rely on such old results. 
> > 
> > [0] https://en.wikipedia.org/wiki/MySQL#Milestones

https://mariadb.com/kb/en/library/guiduuid-performance/

MariaDB (the MySQL fork created after Oracle took over) still has concerns about GUID/UUID performance.  The article
here is about 1-2 years old, and it spells out some of the still current-ish concerns with using UUIDs/GUIDs in the
MySQL world.

I would argue that yes, you should at least pay attention to these old results, in the sense of "this used to be a
problem, let's make sure it isn't one anymore".
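
As a rough starting point for that kind of check, here is a minimal timing sketch.  It is only an illustration: it uses
Python's built-in sqlite3 as a stand-in so that it runs anywhere, and the function name time_inserts and the table
layout are made up for this example.  Numbers from SQLite say nothing about MySQL/MariaDB or PostgreSQL; the same loop
would have to be pointed at those servers to learn anything real.

```python
import sqlite3
import time
import uuid


def time_inserts(key_kind, n_rows):
    """Insert n_rows rows keyed by 'int' or 'uuid'; return (seconds, row_count).

    Hypothetical helper for illustration only. SQLite is a stand-in here,
    not a proxy for MySQL/MariaDB or PostgreSQL behaviour.
    """
    conn = sqlite3.connect(":memory:")
    if key_kind == "int":
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
        rows = [(i, "x") for i in range(n_rows)]
    elif key_kind == "uuid":
        conn.execute("CREATE TABLE t (id TEXT PRIMARY KEY, payload TEXT)")
        # Keys are generated up front so only the inserts are timed.
        rows = [(str(uuid.uuid4()), "x") for _ in range(n_rows)]
    else:
        raise ValueError("key_kind must be 'int' or 'uuid'")

    start = time.perf_counter()
    with conn:  # commits the transaction on success
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    elapsed = time.perf_counter() - start

    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return elapsed, count
```

Running it at increasing values of n_rows for both key kinds, and plotting the results, would be the equivalent of the
chart being discussed, just for whichever database the connection targets.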

Most of these int vs UUID/GUID concerns in the database world still apply today, and each option has nuanced tradeoffs
on each technology (PostgreSQL, MySQL/MariaDB, Oracle, SQL Server, etc.) that will affect which choice is best.

FWIW, _generally_ the compromise solutions I've seen are ones where the actual PK is an int that isn't exposed outside
of the db/model layer.  Alongside it is a code-generated UUID/GUID that is treated as the object reference in the
API/code layers.  There are other solutions out there as well.
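
A minimal sketch of that compromise, assuming a made-up "content" table and hypothetical helper names (make_db,
create_content, get_content).  It uses the stdlib sqlite3 module rather than Django so that it is self-contained, and
it is not how Pulp does or would do it:

```python
import sqlite3
import uuid


def make_db():
    """Create an in-memory database with an int PK plus a unique UUID column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # internal key, never exposed
        " ref TEXT NOT NULL UNIQUE,"              # code-generated UUID, the public reference
        " name TEXT NOT NULL)"
    )
    return conn


def create_content(conn, name):
    """Insert a row and return only its UUID reference."""
    ref = str(uuid.uuid4())  # generated in code, not by the database
    with conn:
        conn.execute("INSERT INTO content (ref, name) VALUES (?, ?)", (ref, name))
    return ref


def get_content(conn, ref):
    """Look a row up by its UUID reference; the int PK stays inside this layer."""
    row = conn.execute(
        "SELECT ref, name FROM content WHERE ref = ?", (ref,)
    ).fetchone()
    return None if row is None else {"ref": row[0], "name": row[1]}
```

Joins and foreign keys would use the int id, while anything handed to API consumers carries only the ref value.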

> > 
> > On Wed, Dec 5, 2018 at 9:18 AM Daniel Alley <dalley at redhat.com> wrote:
> > > I just want to point out that using UUID PKs works perfectly fine on PostgreSQL but is considered a Bad Idea™ on
> > > MySQL for performance reasons.  
> > > 
> > > http://kccoder.com/mysql/uuid-vs-int-insert-performance/
> > > 
> > > 
> > > 
> > > It's hard to notice at first, but the blue and red lines (representing integer PKs) are tracking near the bottom.
> > > 
> > > I did my testing with PostgreSQL, and I would completely agree that the tiny performance hit we noticed there
> > > would take a backseat to the functional benefits Brian is pointing out.  But if we really, truly want to be
> > > database agnostic, we should put more thought into this change (and others going forwards).
> > > 
> > > Another factor that makes this a more complicated decision is that the limitations on using bulk_create() with
> > > multi-table models are more of a "simplification" on the Django side than a fundamental limitation.  According to
> > > this comment [0] in the Django source code, and this issue [1] it's likely possible on PostgreSQL as-is, if we
> > > were willing to mess around inside the ORM a bit.  And it could be possible on MySQL also *if* we used UUID PKs. 
> > > And maybe the performance benefits of being able to use bulk_create() would override or reduce the performance
> > > downsides of using UUID with MySQL.  I don't know about that though... that's chart chart and without some
> > > experimentation this is all speculation.
> > > 
> > > TL;DR If we want to stay DB agnostic it needs to be worked into our decision making process and not be an
> > > afterthought

This ^.  While on the surface the ORM layer lets you say you're DB agnostic, truly having the same experience across
database technologies will require up-front research and informed choices.

> > > 
> > > [0] https://github.com/django/django/blob/master/django/db/models/query.py#L438
> > > [1] https://code.djangoproject.com/ticket/28821
> > > 
> > > 



