[Pulp-dev] Performance testing results, autoincrement ID vs UUID primary keys
daviddavis at redhat.com
Fri Mar 1 14:14:36 UTC 2019
I just want to bump this thread. If we hope to make the Pulp 3 RC date, we
need feedback today.
On Wed, Feb 27, 2019 at 5:09 PM Matt Pusateri <mpusater at redhat.com> wrote:
> Not sure if Webyog (https://www.webyog.com/) will give a free Monyog
> license to an open source project, but that might help diagnose the MariaDB
> performance. Monyog is really nice; I wish it supported Postgres.
> Matt P.
> On Tue, Feb 26, 2019 at 7:23 PM Daniel Alley <dalley at redhat.com> wrote:
>> Hello all,
>> We've had an ongoing discussion about whether Pulp would be able to
>> perform acceptably if we switched back to UUID primary keys. I've finished
>> doing the performance testing and I *think* the answer is yes. Although to
>> be honest, I'm not sure that I understand why, in the case of MariaDB.
>> I linked my testing methodology and results here:
>> To summarize, I tested the following:
>> * How long it takes to perform subsequent large (lazy) syncs, with lots
>> of content in the database (100-400k content units)
>> * How long it takes to perform various small but important database
>> queries
>> The results were oddly contradictory in some cases.
>> The first four syncs (202,000 content total) behaved mostly the same on
>> PostgreSQL whether it used an autoincrement or UUID primary key.
>> Subsequent syncs were between 30% and 40% slower with UUID primary keys.
>> Likewise, the code snippets performed 30+% worse. Sync time scaled roughly
>> linearly with the amount of content in the repository in both cases, which
>> was a bit surprising to me. The database at the end was about 40% larger
>> with UUID primary keys, 736 MB vs 521 MB. The gap would be smaller in
>> typical usage when you consider that most content types have more metadata
>> than FileContent (what I was testing).
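For reference, the relative size difference implied by those two totals can be checked directly; it works out to a little over 40%. This is a minimal sketch that uses only the 736 MB and 521 MB figures quoted above. The variable names are illustrative and are not taken from the test scripts.

```python
# Database sizes reported in the test results (in MB).
uuid_db_mb = 736     # with UUID primary keys
autoinc_db_mb = 521  # with autoincrement primary keys

# Absolute and relative growth when switching to UUID primary keys.
extra_mb = uuid_db_mb - autoinc_db_mb
overhead_pct = extra_mb / autoinc_db_mb * 100

print(f"UUID database is {extra_mb} MB ({overhead_pct:.1f}%) larger")
```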
>> Autoincrement PostgreSQL (left) vs. UUID PostgreSQL (right) in diff form
>> With MariaDB the first sync was almost 80% slower than the first sync with
>> PostgreSQL, but every subsequent sync was as fast or faster, despite the
>> tests of specific queries performing several times worse. Additionally
>> the sync performance did not decrease as rapidly as it did under
>> PostgreSQL. With MariaDB, one of my test queries that worked fine when
>> backed by PostgreSQL ended up hanging endlessly and I had to cut it off
>> after 25 or so minutes. I would consider that a blocker to claiming we
>> support MariaDB / MySQL.
>> But overall I'm not sure how to interpret the fact that on one hand the
>> real-usage performance is equal or better, and on the other the performance
>> of some of the underlying queries is noticeably worse. Maybe there's some
>> weird caching going on in the backend, or the generated indexes are
>> UUID PostgreSQL (left) vs. UUID MariaDB (right) in diff form
>> I'd like to invite some discussion on this, but nothing I've mentioned
>> seems like it would be a problem for going forward with UUID primary
>> keys in a general sense. If we're all in agreement about that engineering
>> decision then we can move forward with that work.
>>  for *some* but not all repository versions. No idea what's up there.
>> Pulp-dev mailing list
>> Pulp-dev at redhat.com
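For anyone who wants to poke at the general shape of this comparison locally, here is a rough sketch of timing bulk inserts with an autoincrement key versus a UUID key. It is not the methodology used for the results above: it assumes an in-memory SQLite database from the Python standard library rather than PostgreSQL or MariaDB, and the table names (`content_auto`, `content_uuid`) and the `time_inserts` helper are hypothetical. Absolute numbers from it will not carry over to Pulp.

```python
import sqlite3
import time
import uuid


def time_inserts(n=10_000):
    """Insert n rows per table; return {key_type: (seconds, row_count)}."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical tables: one with an integer autoincrement key,
    # one with a UUID stored as text.
    conn.execute(
        "CREATE TABLE content_auto "
        "(id INTEGER PRIMARY KEY AUTOINCREMENT, path TEXT)"
    )
    conn.execute("CREATE TABLE content_uuid (id TEXT PRIMARY KEY, path TEXT)")

    results = {}

    # Autoincrement: the database assigns the key.
    start = time.perf_counter()
    with conn:  # commits the transaction on success
        conn.executemany(
            "INSERT INTO content_auto (path) VALUES (?)",
            ((f"file-{i}",) for i in range(n)),
        )
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM content_auto").fetchone()[0]
    results["autoincrement"] = (elapsed, count)

    # UUID: the client generates a random (version 4) key per row.
    start = time.perf_counter()
    with conn:
        conn.executemany(
            "INSERT INTO content_uuid (id, path) VALUES (?, ?)",
            ((str(uuid.uuid4()), f"file-{i}") for i in range(n)),
        )
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM content_uuid").fetchone()[0]
    results["uuid"] = (elapsed, count)

    conn.close()
    return results


if __name__ == "__main__":
    for key_type, (seconds, rows) in time_inserts().items():
        print(f"{key_type}: {rows} rows in {seconds:.3f}s")
```

Running it several times with increasing `n` gives a feel for how insert time grows with table size for each key type, which is the same kind of question the sync tests were asking at a much larger scale.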