<div dir="ltr"><div></div><div>Yes, I used the "started_at" and "finished_at" timestamps. And there are definitely things we can do to speed up sync times, since sync performance has dropped by a sizable amount since the last time I did this testing. I'm not sure where that slowdown could have come from but I'm sure we can figure it out.<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Feb 27, 2019 at 4:10 AM David Davis &lt;<a href="mailto:daviddavis@redhat.com">daviddavis@redhat.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Daniel,<div><br></div><div>Thanks for the work on this. I'm wondering where you got the times from. The task timestamps? I'm asking because when you say 30-40% slowdown, I am wondering if that's the overall time it takes to sync or if that's just part of the sync. I think it's the former, which I do find a bit troubling. That said, I think I agree with your conclusion that we should probably switch to UUIDs anyway. Perhaps we can find other ways to speed up sync times.<br clear="all"><div><div dir="ltr" class="gmail-m_-6456657613207418953gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><br></div><div>David<br></div></div></div></div></div></div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Feb 27, 2019 at 1:23 AM Daniel Alley &lt;<a href="mailto:dalley@redhat.com" target="_blank">dalley@redhat.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div>Hello all,</div><div><br></div><div>We've had an ongoing discussion about whether Pulp would be able to perform acceptably if we switched back to UUID primary keys. I've finished doing the performance testing and I *think* the answer is yes.
Although to be honest, I'm not sure that I understand why, in the case of MariaDB.</div><div><br></div><div>I linked my testing methodology and results here: <a href="https://pulp.plan.io/issues/4290#note-18" target="_blank">https://pulp.plan.io/issues/4290#note-18</a></div><div><br></div><div>To summarize, I tested the following:</div><div><br></div><div>* How long it takes to perform subsequent large (lazy) syncs, with lots of content in the database (100-400k content units)<br></div><div>* How long it takes to perform various small but important database queries<br></div><div><br></div><div>The results were weirdly contradictory in some cases.</div><div><br></div><div>The first four syncs (202,000 content units total) behaved mostly the same on PostgreSQL whether it used an autoincrement or UUID primary key. Subsequent syncs had a performance drop of 30-40% with UUID primary keys. Likewise, the code snippets performed 30+% worse. Sync time scaled roughly linearly with the amount of content in the repository in both cases, which was a bit surprising to me. The size of the database at the end was about 40% larger with UUID primary keys, 736 MB vs. 521 MB. The gap would be smaller in typical usage when you consider that most content types have more metadata than FileContent (what I was testing).<br></div><div><br></div><div>Autoincrement PostgreSQL (left) vs. UUID PostgreSQL (right) in diff form<br></div><div><a href="https://www.diffchecker.com/40AF8vvM" target="_blank">https://www.diffchecker.com/40AF8vvM</a></div><div><br></div><div>With MariaDB, the first sync was almost 80% slower than the first sync with PostgreSQL, but every subsequent sync was as fast or faster, despite the tests of specific queries performing several times worse. Additionally, the sync performance did not decrease as rapidly as it did under PostgreSQL. With MariaDB, one of my test queries that worked fine when backed by PostgreSQL ended up hanging endlessly and I had to cut it off after 25 or so minutes.
[0] I would consider that a blocker to claiming we support MariaDB / MySQL.<br></div><div><br></div><div>But overall I'm not sure how to interpret the fact that on one hand the real-usage performance is equal or better, and on the other hand the performance of some of the underlying queries is noticeably worse. Maybe there's some weird caching going on in the backend, or the generated indexes are different?<br></div><div><br></div><div>UUID PostgreSQL (left) vs. UUID MariaDB (right) in diff form</div><div><a href="https://www.diffchecker.com/W1nnIQgj" target="_blank">https://www.diffchecker.com/W1nnIQgj</a></div><div><br></div><div>I'd like to invite some discussion on this, but nothing I've mentioned seems like it would be a problem for going forward with using UUID primary keys in a general sense. If we're all in agreement about that engineering decision then we can move forward with that work.<br></div><div><br></div><div>[0] for *some* but not all repository versions. No idea what's up there.<br></div><div><br></div></div></div></div></div> _______________________________________________<br> Pulp-dev mailing list<br> <a href="mailto:Pulp-dev@redhat.com" target="_blank">Pulp-dev@redhat.com</a><br> <a href="https://www.redhat.com/mailman/listinfo/pulp-dev" rel="noreferrer" target="_blank">https://www.redhat.com/mailman/listinfo/pulp-dev</a><br> </blockquote></div> </blockquote></div>