[Pulp-dev] Performance testing results, autoincrement ID vs UUID primary keys

David Davis daviddavis at redhat.com
Fri Mar 1 14:14:36 UTC 2019

I just want to bump this thread. If we hope to make the Pulp 3 RC date, we
need feedback today.


On Wed, Feb 27, 2019 at 5:09 PM Matt Pusateri <mpusater at redhat.com> wrote:

> Not sure if https://www.webyog.com/ Monyog will give a free opensource
> project license.  But that might help diagnose the MariaDB performance.
> Monyog is really nice, I wish it supported Postgres.
> Matt P.
> On Tue, Feb 26, 2019 at 7:23 PM Daniel Alley <dalley at redhat.com> wrote:
>> Hello all,
>> We've had an ongoing discussion about whether Pulp would be able to
>> perform acceptably if we switched back to UUID primary keys.  I've finished
>> doing the performance testing and I *think* the answer is yes.  Although to
>> be honest, I'm not sure that I understand why, in the case of MariaDB.
>> I linked my testing methodology and results here:
>> https://pulp.plan.io/issues/4290#note-18
>> To summarize, I tested the following:
>> * How long it takes to perform subsequent large (lazy) syncs, with lots
>> of content in the database (100-400k content units)
>> * How long it takes to perform various small but important database
>> queries
>> The results were weirdly in contrast in some cases.
>> The first four syncs (202,000 content total) behaved mostly the same on
>> PostgreSQL whether it used an autoincrement or UUID primary key.
>> Subsequent syncs had a performance drop of between 30-40%.  Likewise, the
>> code snippets performed 30+% worse.  Sync time scaled linearly"ish" with
>> the amont of content in the repository in both cases, which was a bit
>> surprising to me.  The size of the database at the end was 30-40% larger
>> with UUID primary keys, 736 MB vs 521 MB.  The gap would be smaller in
>> typical usage when you consider that most content types have more metadata
>> than FileContent (what I was testing).
>> Autoincrement PostgreSQL (left) vs. UUID PostgreSQL (right) in diff form
>> https://www.diffchecker.com/40AF8vvM
>> With MariaDB the first sync was almost 80% slower than the first sync w/
>> PostgreSQL, but every subsequent sync was as fast or faster, despite the
>> tests of specific queries performing multiple times worse.  Additionally
>> the sync performance did not decrease as rapidly as it did under
>> PostgreSQL.  With MariaDB, one of my test queries that worked fine when
>> backed by PostgreSQL ended up hanging endlessly and I had to cut it off
>> after 25 or so minutes. [0]  I would consider that a blocker to claiming we
>> support MariaDB / MySQL.
>> But overall I'm not sure how to interpret the fact that on one hand the
>> real-usage performance is equal or better better, and on the performance of
>> some of the underlying queries is noticably worse.  Maybe there's some
>> weird caching going on in the backend, or the generated indexes are
>> different?
>> UUID PostgreSQL (left) vs. UUID MariaDB (right) in diff form
>> https://www.diffchecker.com/W1nnIQgj
>> I'd like to invite some discussion on this, but nothing I've mentioned
>> seems like it would be a problem for going forwards with using UUID primary
>> keys in a general sense.  If we're all in agreement about that engineering
>> decision then we can move forwards with that work.
>> [0] for *some* but not all repository versions.  No idea what's up there.
>> _______________________________________________
>> Pulp-dev mailing list
>> Pulp-dev at redhat.com
>> https://www.redhat.com/mailman/listinfo/pulp-dev
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20190301/e43142d4/attachment.htm>

More information about the Pulp-dev mailing list