[Pulp-dev] Performance testing results, autoincrement ID vs UUID primary keys

David Davis daviddavis at redhat.com
Wed Feb 27 09:10:08 UTC 2019


Daniel,

Thanks for the work on this. I'm wondering where you got the times from.
The task timestamps? I'm asking because when you say 30-40% slowdown, I am
wondering if that's the overall time it takes to sync or if that's just
part of the sync. I think it's the former, which I do find a bit troubling.
That said, I think I agree with your conclusion that we should probably
switch to UUIDs anyway. Perhaps we can find other ways to speed up sync
times.

David


On Wed, Feb 27, 2019 at 1:23 AM Daniel Alley <dalley at redhat.com> wrote:

> Hello all,
>
> We've had an ongoing discussion about whether Pulp would be able to
> perform acceptably if we switched back to UUID primary keys.  I've finished
> doing the performance testing and I *think* the answer is yes.  Although to
> be honest, I'm not sure that I understand why, in the case of MariaDB.
>
> I linked my testing methodology and results here:
> https://pulp.plan.io/issues/4290#note-18
>
> To summarize, I tested the following:
>
> * How long it takes to perform subsequent large (lazy) syncs, with lots of
> content in the database (100-400k content units)
> * How long it takes to perform various small but important database queries
>
> The results were weirdly contradictory in some cases.
>
> The first four syncs (202,000 content units total) behaved mostly the same on
> PostgreSQL whether it used an autoincrement or UUID primary key.
> Subsequent syncs had a performance drop of between 30% and 40%.  Likewise, the
> code snippets performed 30+% worse.  Sync time scaled roughly linearly with
> the amount of content in the repository in both cases, which was a bit
> surprising to me.  The size of the database at the end was about 40% larger
> with UUID primary keys, 736 MB vs 521 MB.  The gap would be smaller in
> typical usage when you consider that most content types have more metadata
> than FileContent (what I was testing).
>
> Autoincrement PostgreSQL (left) vs. UUID PostgreSQL (right) in diff form
> https://www.diffchecker.com/40AF8vvM
>
> With MariaDB the first sync was almost 80% slower than the first sync w/
> PostgreSQL, but every subsequent sync was as fast or faster, despite the
> tests of specific queries performing multiple times worse.  Additionally
> the sync performance did not decrease as rapidly as it did under
> PostgreSQL.  With MariaDB, one of my test queries that worked fine when
> backed by PostgreSQL ended up hanging endlessly and I had to cut it off
> after 25 or so minutes. [0]  I would consider that a blocker to claiming we
> support MariaDB / MySQL.
>
> But overall I'm not sure how to interpret the fact that on one hand the
> real-usage performance is equal or better, and on the other hand the
> performance of some of the underlying queries is noticeably worse.  Maybe
> there's some weird caching going on in the backend, or the generated
> indexes are different?
>
> UUID PostgreSQL (left) vs. UUID MariaDB (right) in diff form
> https://www.diffchecker.com/W1nnIQgj
>
> I'd like to invite some discussion on this, but nothing I've mentioned
> seems like it would be a problem for moving forward with UUID primary
> keys in a general sense.  If we're all in agreement about that engineering
> decision then we can move forward with that work.
>
> [0] for *some* but not all repository versions.  No idea what's up there.
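For anyone checking the storage figures in Daniel's message, here is a quick arithmetic sketch (not part of the original thread). It computes the relative size increase from the reported 521 MB (autoincrement) and 736 MB (UUID) databases, which works out to roughly 41%. It then illustrates the point that the gap should shrink for content types with more metadata, under an explicitly assumed model: the extra bytes from UUID keys stay fixed in absolute terms while the rest of the data grows. The scale factors are hypothetical, not measurements.

```python
# Database sizes reported for the FileContent test runs, in MB.
AUTOINCREMENT_MB = 521
UUID_MB = 736


def relative_increase(baseline: float, new: float) -> float:
    """Return how much larger `new` is than `baseline`, as a fraction."""
    return (new - baseline) / baseline


# Absolute extra space in the UUID run (215 MB).
overhead_mb = UUID_MB - AUTOINCREMENT_MB

print(f"UUID database is {relative_increase(AUTOINCREMENT_MB, UUID_MB):.1%} larger")

# Hypothetical model (an assumption, not measured data): the UUID overhead
# stays constant in MB while the rest of the database grows, e.g. because
# each content unit carries more metadata than FileContent does.
for scale in (1, 2, 4):
    baseline = AUTOINCREMENT_MB * scale
    pct = relative_increase(baseline, baseline + overhead_mb)
    print(f"baseline x{scale}: {pct:.1%} larger")
```

Running it prints 41.3% for the measured case, then 41.3%, 20.6% and 10.3% for the hypothetical 1x, 2x and 4x baselines. This only shows why a fixed key-size overhead is diluted by larger rows; real numbers would depend on the actual schema and indexes.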

