<div dir="ltr">The changes to switch to UUIDs have been merged. I opened issues against all the Pulp 3 plugins I could think of to update their docs. There may be some other changes needed too though.<br clear="all"><div><div dir="ltr" class="m_8951786006495689380gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><br></div><div>David<br></div></div></div></div></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Mar 6, 2019 at 9:18 AM David Davis <<a href="mailto:daviddavis@redhat.com" target="_blank">daviddavis@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr">Since there seem to be no objections to switching to UUIDs, I’d like to propose we merge the PRs[0][1] that will switch core to use UUID PKs tomorrow (in 24 hours). 
After that, we'll open redmine issues to update plugins to use UUIDs.<div><br></div><div>[0] <a href="https://github.com/pulp/pulpcore/pull/16" target="_blank">https://github.com/pulp/pulpcore/pull/16</a></div><div>[1] <a href="https://github.com/pulp/pulpcore-plugin/pull/69" target="_blank">https://github.com/pulp/pulpcore-plugin/pull/69</a><br clear="all"><div><div dir="ltr" class="gmail-m_8951786006495689380gmail-m_7893141517108178299gmail-m_2498969762049501438gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><br></div><div>David<br></div></div></div></div></div></div></div></div><br></div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Mar 5, 2019 at 5:15 PM Jeff Ortel <<a href="mailto:jortel@redhat.com" target="_blank">jortel@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF">
    +1 to switching back to UUIDs for the reasons Brian gave.<br>
    <br>
    On 3/1/19 2:23 PM, Brian Bouterse wrote:<br>
    <blockquote type="cite">
      
      <div dir="ltr">
        <div dir="ltr">
          <div dir="ltr">
            <div>I've finally gotten to read through the numbers and
              this thread. It is a tradeoff, but I am +1 for switching to
              UUIDs. I focus on the PostgreSQL UUID vs. int case because
              that is our default database. I'm less concerned about
              how things perform on MariaDB, because it can improve
              its own performance to catch up with PostgreSQL, which
              regularly performs better afaict. I agree with the
              assessment of a roughly 30% slowdown in the large-unit cases
              for PostgreSQL. Still, I believe the advantages of switching
              to UUIDs are worth it. Two main reasons stick out in my
              mind.<br>
            </div>
            <div><br>
            </div>
            <div>1. Our core code and all plugin code will always be
              compatible with common db backends even when using
              bulk_create()<br>
            </div>
            <div>2. We get database sharding with PostgreSQL, which you
              can only do with UUID PKs. I was advised of this years ago
              by jcline.<br>
            </div>
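            <div>A minimal sketch of why reason 1 holds, using only the
              Python standard library rather than Django: with UUID PKs
              the application picks each key before the insert, so a bulk
              insert never has to ask the database which ids it assigned.
              The <code>content</code> table and the
              <code>bulk_insert_with_uuids</code> helper are hypothetical
              names for illustration, not Pulp code.</div>

```python
import sqlite3
import uuid


def bulk_insert_with_uuids(conn, names):
    """Insert many rows in one call and return the client-chosen PKs.

    Hypothetical helper: the keys are generated in Python, so the caller
    knows them without reading anything back from the database.
    """
    rows = [(str(uuid.uuid4()), name) for name in names]
    conn.executemany("INSERT INTO content (id, name) VALUES (?, ?)", rows)
    return [pk for pk, _ in rows]


# In-memory database standing in for any backend.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (id TEXT PRIMARY KEY, name TEXT NOT NULL)")

pks = bulk_insert_with_uuids(conn, ["a.iso", "b.iso", "c.iso"])

# The keys stored in the table are exactly the ones generated above.
stored = [row[0] for row in conn.execute("SELECT id FROM content ORDER BY name")]
```

            <div>With autoincrement keys the database assigns the ids, and
              whether a bulk insert can hand them back depends on the
              backend, which is the kind of "db-behavior surprise" UUIDs
              avoid.</div>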
            <div><br>
            </div>
            <div>Performance and compatibility are a pretty classic
              trade-off. Overall I've found that initial releases launch
              with less performance and improve (often significantly)
              over time. Consider the interpreter PyPy (not PyPI).  It
              started "roughly 2000x slower [at initial launch] than
              CPython, to roughly 7x faster [now]" [0]. Launching a Pulp
              3.0 that is 30% slower in the worst case but runs
              everywhere with zero "db-behavior surprises" is, I think,
              worth it. Conversely, if we don't adopt UUIDs, how
              will we address item 1 before the RC?</div>
            <div><br>
            </div>
            <div>@dawalker for the "can we have both" option, we
              probably can have some db-specific codepaths, but I don't
              think an application-wide PK type change controlled by a
              setting is feasible to support. Db-specific codepaths
              are one way performance improves over time. For the
              initial release, to keep things simple, I hope we don't
              have conditional database codepaths (for now).</div>
            <div><br>
            </div>
            <div>More discussion on this change is encouraged. Thanks
              @dalley so much for all the detailed investigation!</div>
            <div><br>
            </div>
            <div>[0]: <a href="https://morepypy.blogspot.com/2018/09/the-first-15-years-of-pypy.html" target="_blank">https://morepypy.blogspot.com/2018/09/the-first-15-years-of-pypy.html</a><br>
            </div>
            <div><br>
            </div>
            <div>Thank you,</div>
            <div>Brian<br>
            </div>
          </div>
        </div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Fri, Mar 1, 2019 at 2:51 PM
          Dana Walker <<a href="mailto:dawalker@redhat.com" target="_blank">dawalker@redhat.com</a>>
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
          <div dir="ltr">
            <div>As I brought up on IRC, I don't know how much this would
              complicate maintenance going forward, but I
              would prefer that we use some sort of setting to
              choose UUID or integer ids depending on MariaDB or
              PostgreSQL.  I want us to work everywhere, but I'm really
              concerned about the impact on our users of a 30-40%
              drop in speed and storage efficiency.</div>
            <div><br>
            </div>
            <div>David wrote up a quick Proof of Concept after I brought
              this up but wasn't necessarily advocating it himself.  I
              think Daniel and Dennis expressed some concerns.  I'd like
              to see more people discussing it here with
              reasoning/examples on how doable something like this could
              be?</div>
            <div><br>
            </div>
            <div>If it's not on the table, I understand, but want to
              make sure we've considered all reasonable options, and
              that might not be a simple binary of either/or.</div>
            <div><br>
            </div>
            <div>Thanks,</div>
            <div><br>
            </div>
            <div>--Dana<br>
            </div>
            <div><br>
            </div>
          </div>
          <br>
          <div class="gmail_quote">
            <div dir="ltr" class="gmail_attr">On Fri, Mar 1, 2019 at
              9:15 AM David Davis <<a href="mailto:daviddavis@redhat.com" target="_blank">daviddavis@redhat.com</a>>
              wrote:<br>
            </div>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
              <div dir="ltr">I just want to bump this thread. If we hope
                to make the Pulp 3 RC date, we need feedback today.<br clear="all">
                <div>
                  <div dir="ltr" class="gmail-m_8951786006495689380gmail-m_7893141517108178299gmail-m_2498969762049501438gmail-m_7916255516304310688gmail-m_5851459133451292743gmail-m_-1256802369304774127gmail-m_-3039835796394319797gmail-m_-1650898562000539570gmail_signature">
                    <div dir="ltr">
                      <div>
                        <div dir="ltr">
                          <div>
                            <div dir="ltr">
                              <div><br>
                              </div>
                              <div>David<br>
                              </div>
                            </div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </div>
                </div>
                <br>
              </div>
              <br>
              <div class="gmail_quote">
                <div dir="ltr" class="gmail_attr">On Wed, Feb 27, 2019
                  at 5:09 PM Matt Pusateri <<a href="mailto:mpusater@redhat.com" target="_blank">mpusater@redhat.com</a>>
                  wrote:<br>
                </div>
                <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
                  <div dir="ltr">
                    <div dir="ltr">Not sure if Monyog (<a href="https://www.webyog.com/" target="_blank">https://www.webyog.com/</a>)
                      offers a free license for open source
                      projects, but it might help diagnose the MariaDB
                      performance.  Monyog is really nice; I wish it
                      supported Postgres.</div>
                    <div dir="ltr"><br>
                    </div>
                    <div>Matt P. <br>
                    </div>
                  </div>
                  <br>
                  <div class="gmail_quote">
                    <div dir="ltr" class="gmail_attr">On Tue, Feb 26,
                      2019 at 7:23 PM Daniel Alley <<a href="mailto:dalley@redhat.com" target="_blank">dalley@redhat.com</a>>
                      wrote:<br>
                    </div>
                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">
                      <div dir="ltr">
                        <div dir="ltr">
                          <div dir="ltr">
                            <div dir="ltr">
                              <div>Hello all,</div>
                              <div><br>
                              </div>
                              <div>We've had an ongoing discussion about
                                whether Pulp would be able to perform
                                acceptably if we switched back to UUID
                                primary keys.  I've finished doing the
                                performance testing and I *think* the
                                answer is yes.  Although to be honest,
                                I'm not sure that I understand why, in
                                the case of MariaDB.</div>
                              <div><br>
                              </div>
                              <div>I linked my testing methodology and
                                results here: <a href="https://pulp.plan.io/issues/4290#note-18" target="_blank">https://pulp.plan.io/issues/4290#note-18</a></div>
                              <div><br>
                              </div>
                              <div>To summarize, I tested the following:</div>
                              <div><br>
                              </div>
                              <div>* How long it takes to perform
                                subsequent large (lazy) syncs, with lots
                                of content in the database (100-400k
                                content units)<br>
                              </div>
                              <div>* How long it takes to perform
                                various small but important database
                                queries<br>
                              </div>
                              <div><br>
                              </div>
                              <div>The results were weirdly in contrast
                                in some cases.</div>
                              <div><br>
                              </div>
                              <div>The first four syncs (202,000 content
                                total) behaved mostly the same on
                                PostgreSQL whether it used an
                                autoincrement or UUID primary key. 
                                Subsequent syncs had a performance drop
                                of between 30-40%.  Likewise, the code
                                snippets performed 30+% worse.  Sync
                                time scaled roughly linearly with the amount
                                of content in the repository in both
                                cases, which was a bit surprising to
                                me.  The size of the database at the end
                                was 30-40% larger with UUID primary
                                keys, 736 MB vs 521 MB.  The gap would
                                be smaller in typical usage when you
                                consider that most content types have
                                more metadata than FileContent (what I
                                was testing).<br>
                              </div>
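                              <div>As a quick check of the size figures
                                above (736 MB with UUID keys vs 521 MB
                                with autoincrement keys), the relative
                                growth can be worked out directly. The
                                variable names are illustrative only.</div>

```python
# Database sizes reported for the FileContent test, in MB.
autoincrement_mb = 521
uuid_mb = 736

# Relative growth of the UUID database over the autoincrement one.
growth = (uuid_mb - autoincrement_mb) / autoincrement_mb
growth_percent = round(growth * 100, 1)
print(f"UUID database is {growth_percent}% larger")
```

                              <div>That comes to about 41%, at the upper
                                end of the quoted 30-40% range.</div>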
                              <div><br>
                              </div>
                              <div>Autoincrement PostgreSQL (left) vs.
                                UUID PostgreSQL (right) in diff form<br>
                              </div>
                              <div><a href="https://www.diffchecker.com/40AF8vvM" target="_blank">https://www.diffchecker.com/40AF8vvM</a></div>
                              <div><br>
                              </div>
                              <div>With MariaDB the first sync was
                                almost 80% slower than the first sync w/
                                PostgreSQL, but every subsequent sync
                                was as fast or faster, despite the tests
                                of specific queries performing multiple
                                times worse.  Additionally the sync
                                performance did not decrease as rapidly
                                as it did under PostgreSQL.  With
                                MariaDB, one of my test queries that
                                worked fine when backed by PostgreSQL
                                ended up hanging endlessly and I had to
                                cut it off after 25 or so minutes. [0] 
                                I would consider that a blocker to
                                claiming we support MariaDB / MySQL.<br>
                              </div>
                              <div><br>
                              </div>
                              <div>But overall I'm not sure how to
                                interpret the fact that on one hand the
                                real-usage performance is equal or
                                better, and on the other hand the performance
                                of some of the underlying queries is
                                noticeably worse.  Maybe there's some
                                weird caching going on in the backend,
                                or the generated indexes are different?<br>
                              </div>
                              <div><br>
                              </div>
                              <div>UUID PostgreSQL (left) vs. UUID
                                MariaDB (right) in diff form</div>
                              <div><a href="https://www.diffchecker.com/W1nnIQgj" target="_blank">https://www.diffchecker.com/W1nnIQgj</a></div>
                              <div><br>
                              </div>
                              <div>I'd like to invite some discussion on
                                this, but nothing I've mentioned seems
                                like it would be a problem for going
                                forward with UUID primary keys in
                                a general sense.  If we're all in
                                agreement about that engineering
                                decision then we can move forward with
                                that work.<br>
                              </div>
                              <div><br>
                              </div>
                              <div>[0] for *some* but not all repository
                                versions.  No idea what's up there.<br>
                              </div>
                            </div>
                          </div>
                        </div>
                      </div>
                      _______________________________________________<br>
                      Pulp-dev mailing list<br>
                      <a href="mailto:Pulp-dev@redhat.com" target="_blank">Pulp-dev@redhat.com</a><br>
                      <a href="https://www.redhat.com/mailman/listinfo/pulp-dev" rel="noreferrer" target="_blank">https://www.redhat.com/mailman/listinfo/pulp-dev</a><br>
                    </blockquote>
                  </div>
                </blockquote>
              </div>
            </blockquote>
          </div>
        </blockquote>
      </div>
      <br>
    </blockquote>
    <br>
  </div>

</blockquote></div>
</blockquote></div>