<div dir="ltr">The changes to switch to UUIDs have been merged. I opened issues against all the Pulp 3 plugins I could think of to update their docs. Other changes may be needed too, though.<br clear="all"><div><div dir="ltr" class="m_8951786006495689380gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><br></div><div>David<br></div></div></div></div></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Mar 6, 2019 at 9:18 AM David Davis <<a href="mailto:daviddavis@redhat.com" target="_blank">daviddavis@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr">Since there seem to be no objections to switching to UUIDs, I’d like to propose that tomorrow (in 24 hours) we merge the PRs[0][1] that switch core to UUID PKs. 
After that, we'll open redmine issues to update plugins to use UUIDs.<div><br></div><div>[0] <a href="https://github.com/pulp/pulpcore/pull/16" target="_blank">https://github.com/pulp/pulpcore/pull/16</a></div><div>[1] <a href="https://github.com/pulp/pulpcore-plugin/pull/69" target="_blank">https://github.com/pulp/pulpcore-plugin/pull/69</a><br clear="all"><div><div dir="ltr" class="gmail-m_8951786006495689380gmail-m_7893141517108178299gmail-m_2498969762049501438gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><br></div><div>David<br></div></div></div></div></div></div></div></div><br></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Mar 5, 2019 at 5:15 PM Jeff Ortel <<a href="mailto:jortel@redhat.com" target="_blank">jortel@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
+1 to switching back to UUIDs for the reasons Brian gave.<br>
<br>
On 3/1/19 2:23 PM, Brian Bouterse wrote:<br>
<blockquote type="cite">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div>I've finally gotten to read through the numbers and
this thread. It is a tradeoff but I am +1 for switching to
UUIDs. I focus on the PostgreSQL UUID vs int case because
that is our default database. I don't think too much about
how things perform on MariaDB because they can improve
their own performance to catch up to PostgreSQL which
regularly is performing better afaict. I agree with the
assessment of 30% ish slowdown in the large unit cases for
PostgreSQL. Still, I believe the advantages of switching
to UUIDs are worth it. Two main reasons stick out in my
mind.<br>
</div>
<div><br>
</div>
<div>1. Our core code and all plugin code will always be
compatible with common db backends even when using
bulk_create()<br>
</div>
<div>2. We get database sharding with PostgreSQL, which you
can only do with UUID PKs. I was advised of this years ago by
jcline.<br>
</div>
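<div><br>
</div>
<div>A minimal sketch of the idea behind point 1, under stated
assumptions: it uses Python's built-in sqlite3 module and a
hypothetical "content" table as stand-ins, not Django's
bulk_create() or Pulp's real schema. The point it illustrates is
that when the application generates UUID keys itself, every
row's primary key is known before the bulk insert runs, so
nothing depends on the database backend handing generated ids
back afterwards.</div>

```python
import sqlite3
import uuid

# Hypothetical table standing in for a content model; not Pulp's real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content (id TEXT PRIMARY KEY, relative_path TEXT NOT NULL)"
)

paths = ["a.iso", "b.iso", "c.iso"]

# Keys are generated in the application, so each row's primary key is
# known before the INSERT is issued.
rows = [(str(uuid.uuid4()), path) for path in paths]
conn.executemany("INSERT INTO content (id, relative_path) VALUES (?, ?)", rows)
conn.commit()

# The keys held in memory match what the database stored, without
# asking the backend to report generated ids after the bulk insert.
known_ids = {row_id for row_id, _ in rows}
stored_ids = {row[0] for row in conn.execute("SELECT id FROM content")}
print(known_ids == stored_ids)
```

<div>With an auto-increment key the ids only exist after the
database assigns them, which is where backend-specific behavior
can creep in.</div>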
<div><br>
</div>
<div>Performance and compatibility are a pretty classic
trade-off. Overall I've found that initial releases launch
with less performance and improve (often significantly)
over time. Consider the interpreter PyPy (not PyPI). It
started "roughly 2000x slower [at initial launch] than
CPython, to roughly 7x faster [now]" [0]. Launching a Pulp
3.0 that is 30% slower in the worst case but runs
everywhere with zero "db-behavior surprises" is, I think,
worth it. Conversely, if we don't adopt UUIDs, how
will we address item 1 before the RC?</div>
<div><br>
</div>
<div>@dawalker for the "can we have both" option, we
probably can have some db-specific codepaths, but I don't
think doing an application wide PK type change as a
setting is feasible to support. The db specific codepaths
are one way performance improves over time. For the
initial release, to keep things simple I hope we don't
have conditional database codepaths (for now).</div>
<div><br>
</div>
<div>More discussion on this change is encouraged. Thanks
@dalley so much for all the detailed investigation!</div>
<div><br>
</div>
<div>[0]: <a href="https://morepypy.blogspot.com/2018/09/the-first-15-years-of-pypy.html" target="_blank">https://morepypy.blogspot.com/2018/09/the-first-15-years-of-pypy.html</a><br>
</div>
<div><br>
</div>
<div>Thank you,</div>
<div>Brian<br>
</div>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, Mar 1, 2019 at 2:51 PM
Dana Walker <<a href="mailto:dawalker@redhat.com" target="_blank">dawalker@redhat.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div>As I brought up on IRC, I don't know how manageable the
added maintenance burden would be going forward, but I
would prefer that we use some sort of setting to
choose UUID or integer id depending on whether the database is
MariaDB or PostgreSQL. I want us to work everywhere, but I'm really
concerned about the impact on our users of a 30-40%
drop in speed and storage efficiency.</div>
<div><br>
</div>
<div>David wrote up a quick Proof of Concept after I brought
this up but wasn't necessarily advocating it himself. I
think Daniel and Dennis expressed some concerns. I'd like
to see more people discuss it here, with
reasoning or examples of how doable something like this could
be.</div>
<div><br>
</div>
<div>If it's not on the table, I understand, but want to
make sure we've considered all reasonable options, and
that might not be a simple binary of either/or.</div>
<div><br>
</div>
<div>Thanks,</div>
<div><br>
</div>
<div>--Dana<br>
</div>
<div><br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, Mar 1, 2019 at
9:15 AM David Davis <<a href="mailto:daviddavis@redhat.com" target="_blank">daviddavis@redhat.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div dir="ltr">I just want to bump this thread. If we hope
to make the Pulp 3 RC date, we need feedback today.<br clear="all">
<div>
<div dir="ltr" class="gmail-m_8951786006495689380gmail-m_7893141517108178299gmail-m_2498969762049501438gmail-m_7916255516304310688gmail-m_5851459133451292743gmail-m_-1256802369304774127gmail-m_-3039835796394319797gmail-m_-1650898562000539570gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div><br>
</div>
<div>David<br>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Wed, Feb 27, 2019
at 5:09 PM Matt Pusateri <<a href="mailto:mpusater@redhat.com" target="_blank">mpusater@redhat.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div dir="ltr">Not sure if <a href="https://www.webyog.com/" target="_blank">https://www.webyog.com/</a>
Monyog will give a free opensource project
license. But that might help diagnose the MariaDB
performance. Monyog is really nice, I wish it
supported Postgres.</div>
<div dir="ltr"><br>
</div>
<div>Matt P. <br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, Feb 26,
2019 at 7:23 PM Daniel Alley <<a href="mailto:dalley@redhat.com" target="_blank">dalley@redhat.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div>Hello all,</div>
<div><br>
</div>
<div>We've had an ongoing discussion about
whether Pulp would be able to perform
acceptably if we switched back to UUID
primary keys. I've finished doing the
performance testing and I *think* the
answer is yes. Although to be honest,
I'm not sure that I understand why, in
the case of MariaDB.</div>
<div><br>
</div>
<div>I linked my testing methodology and
results here: <a href="https://pulp.plan.io/issues/4290#note-18" target="_blank">https://pulp.plan.io/issues/4290#note-18</a></div>
<div><br>
</div>
<div>To summarize, I tested the following:</div>
<div><br>
</div>
<div>* How long it takes to perform
subsequent large (lazy) syncs, with lots
of content in the database (100-400k
content units)<br>
</div>
<div>* How long it takes to perform
various small but important database
queries<br>
</div>
<div><br>
</div>
<div>The results were weirdly in contrast
in some cases.</div>
<div><br>
</div>
<div>The first four syncs (202,000 content units
total) behaved mostly the same on
PostgreSQL whether it used an
autoincrement or UUID primary key.
Subsequent syncs had a performance drop
of 30-40%. Likewise, the code
snippets performed 30+% worse. Sync
time scaled linear-ish with the amount
of content in the repository in both
cases, which was a bit surprising to
me. The database at the end
was about 40% larger with UUID primary
keys, 736 MB vs 521 MB. The gap would
be smaller in typical usage when you
consider that most content types have
more metadata than FileContent (what I
was testing).<br>
</div>
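<div><br>
</div>
<div>As a quick check on the storage figures, the two database
sizes quoted above can be turned into a relative growth number.
This is only arithmetic on the reported values; the variable
names are illustrative.</div>

```python
# Database sizes reported above for the FileContent test, in MB.
uuid_db_mb = 736
autoincrement_db_mb = 521

# Absolute and relative growth when switching to UUID primary keys.
extra_mb = uuid_db_mb - autoincrement_db_mb
growth_pct = 100 * extra_mb / autoincrement_db_mb

print(f"extra: {extra_mb} MB, growth: {growth_pct:.1f}%")
```

<div>That works out to 215 MB extra, or roughly 41% growth, for
this FileContent-only data set.</div>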
<div><br>
</div>
<div>Autoincrement PostgreSQL (left) vs.
UUID PostgreSQL (right) in diff form<br>
</div>
<div><a href="https://www.diffchecker.com/40AF8vvM" target="_blank">https://www.diffchecker.com/40AF8vvM</a></div>
<div><br>
</div>
<div>With MariaDB the first sync was
almost 80% slower than the first sync w/
PostgreSQL, but every subsequent sync
was as fast or faster, despite the tests
of specific queries performing multiple
times worse. Additionally the sync
performance did not decrease as rapidly
as it did under PostgreSQL. With
MariaDB, one of my test queries that
worked fine when backed by PostgreSQL
ended up hanging endlessly and I had to
cut it off after 25 or so minutes. [0]
I would consider that a blocker to
claiming we support MariaDB / MySQL.<br>
</div>
<div><br>
</div>
<div>But overall I'm not sure how to
interpret the fact that on one hand the
real-usage performance is equal or
better, and on the other the performance of
some of the underlying queries is
noticeably worse. Maybe there's some
weird caching going on in the backend,
or the generated indexes are different?<br>
</div>
<div><br>
</div>
<div>UUID PostgreSQL (left) vs. UUID
MariaDB (right) in diff form</div>
<div><a href="https://www.diffchecker.com/W1nnIQgj" target="_blank">https://www.diffchecker.com/W1nnIQgj</a></div>
<div><br>
</div>
<div>I'd like to invite some discussion on
this, but nothing I've mentioned seems
like it would block going
forward with UUID primary keys in
general. If we're all in
agreement about that engineering
decision then we can move forward with
that work.<br>
</div>
<div><br>
</div>
<div>[0] for *some* but not all repository
versions. No idea what's up there.<br>
</div>
<div><br>
</div>
</div>
</div>
</div>
</div>
_______________________________________________<br>
Pulp-dev mailing list<br>
<a href="mailto:Pulp-dev@redhat.com" target="_blank">Pulp-dev@redhat.com</a><br>
<a href="https://www.redhat.com/mailman/listinfo/pulp-dev" rel="noreferrer" target="_blank">https://www.redhat.com/mailman/listinfo/pulp-dev</a><br>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
<br>
</blockquote>
<br>
</div>
</blockquote></div>
</blockquote></div>