<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    The PK for a task record in the db does not need to be the same as
    the job ID in rq/redis.  Consistency is good.  Let's make Task.id
    an int (like the rest of the tables) and add a job_id field to
    correlate with rq/redis.<br>
    <br>
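    A minimal sketch of that split, using a plain dataclass rather than
    the real Django model. The counter standing in for the database
    sequence and the find_by_job_id helper are hypothetical names for
    illustration, not existing Pulp code:<br>
    <pre>
```python
import itertools
import uuid
from dataclasses import dataclass, field

# Hypothetical stand-in for the database's auto-incrementing PK sequence.
_next_id = itertools.count(1)


@dataclass
class Task:
    # Integer PK, like the rest of the tables.
    id: int = field(default_factory=lambda: next(_next_id))
    # Separate UUID used only to correlate the record with the rq/redis job.
    job_id: uuid.UUID = field(default_factory=uuid.uuid4)


def find_by_job_id(tasks, job_id):
    """Return the task record that corresponds to an rq/redis job ID."""
    return next(t for t in tasks if t.job_id == job_id)


tasks = [Task(), Task()]
print(tasks[0].id, tasks[1].id)
print(find_by_job_id(tasks, tasks[1].job_id) is tasks[1])
```
    </pre>
    The point of the sketch is only that the two identifiers are
    independent: the int is the PK, and the UUID is what gets handed to
    rq.<br>
    <br>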
    <div class="moz-cite-prefix">On 07/11/2018 03:20 PM, David Davis
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAHa=2W=1UMfkNYB0s=DKiA_AUodkrW=LGapBtM7FN=xxxy0vJw@mail.gmail.com">
      <div dir="ltr">I actually started working on converting IDs from
        UUIDs to integer IDs. It was pretty easy with one exception.
        Jobs in rq/redis are created using task id[0] and this job id
        needs to be a uuid. I see two possible solutions:
        <div><br>
        </div>
        <div>1. We leave task id as a UUID but every other id is an
          integer</div>
        <div>2. We add a job uuid field on task<br>
          <div><br>
            <div>With the hard numbers that show that integer IDs are
              significantly faster, I think we should proceed unless
              anyone has a major objection.</div>
            <div><br>
            </div>
            <div>Great work on this btw.</div>
            <div><br>
            </div>
            <div>[0] <a
href="https://github.com/pulp/pulp/blob/9bfc50d90a24c9d0ac4a93f5718187515b947058/pulpcore/pulpcore/tasking/tasks.py#L187"
                target="_blank" moz-do-not-send="true">https://github.com/pulp/pulp/blob/9bfc50d90a24c9d0ac4a93f5718187515b947058/pulpcore/pulpcore/tasking/tasks.py#L187</a></div>
          </div>
        </div>
        <div>
          <div dir="ltr" class="gmail_signature">
            <div dir="ltr">
              <div>
                <div dir="ltr">
                  <div>
                    <div dir="ltr">
                      <div><br>
                      </div>
                      <div>David<br>
                      </div>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </div>
        </div>
        <br>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr">On Wed, Jul 11, 2018 at 3:56 PM Daniel Alley &lt;<a
            href="mailto:dalley@redhat.com" moz-do-not-send="true">dalley@redhat.com</a>&gt;
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0 0 0
          .8ex;border-left:1px #ccc solid;padding-left:1ex">
          <div dir="ltr">
            <div>w/ creating 400,000 units, the non-uuid PK is 30%
              faster at 42.22 seconds vs. 55.98 seconds.</div>
            <div><br>
            </div>
            <div>w/ searching through the same 400,000 units,
              performance is still about 30% faster.  Doing a filter for
              file content units that have a
              relative_path__startswith={some random letter} (I put
              UUIDs in all the fields) takes about 0.44 seconds if the
              model has a UUID pk and about 0.33 seconds if the model
              has a default Django auto-incrementing PK.</div>
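            <div><br>
            </div>
            <div>A rough sketch of the arithmetic behind "about 30%",
              using only the timings quoted above. The two helper names
              are made up for illustration. "Faster" can mean less
              elapsed time or higher throughput, and the two readings
              bracket the 30% figure:</div>
            <pre>
```python
def time_saved_pct(slow, fast):
    """Percentage of elapsed time saved by the faster run."""
    return (slow - fast) / slow * 100


def speedup_pct(slow, fast):
    """Percentage gain in throughput (work per second) of the faster run."""
    return (slow / fast - 1) * 100


# (UUID PK seconds, integer PK seconds) as quoted in this message.
create = (55.98, 42.22)
search = (0.44, 0.33)

for label, (slow, fast) in (("create", create), ("search", search)):
    print(label,
          round(time_saved_pct(slow, fast), 1),
          round(speedup_pct(slow, fast), 1))
```
            </pre>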
          </div>
          <div class="gmail_extra"><br>
            <div class="gmail_quote">On Wed, Jul 11, 2018 at 11:03 AM,
              Daniel Alley <span dir="ltr">&lt;<a
                  href="mailto:dalley@redhat.com" target="_blank"
                  moz-do-not-send="true">dalley@redhat.com</a>&gt;</span>
              wrote:<br>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <div dir="ltr">
                  <div>So, since I've already been working on some Pulp
                    3 benchmarking I decided to go ahead and benchmark
                    this to get some actual data.</div>
                  <div><br>
                  </div>
                  <div>Disclaimer:  The following data comes from a
                    modified, flat, non-inheriting content model, not
                    the multi-table inherited content model we're
                    currently using.  It also uses bulk_create(), which
                    we are not currently using in Pulp 3 but likely
                    will end up using eventually.<br>
                  </div>
                  <div><br>
                  </div>
                  <div>Using normal IDs instead of UUIDs was between 13%
                    and 25% faster with 15,000 units.  15,000 units
                    isn't really a sufficient value to actually test
                    index performance, so I'm rerunning it with a few
                    hundred thousand units, but that will take a
                    substantial amount of time to run.  I'll follow up
                    later.<br>
                  </div>
                  <div><br>
                  </div>
                  <div>As far as search/update performance goes, that
                    probably has better margins than just insert
                    performance, but I'll need to write new code to
                    benchmark that properly.<br>
                  </div>
                </div>
                <div class="m_-3769914636103084512HOEnZb">
                  <div class="m_-3769914636103084512h5">
                    <div class="gmail_extra"><br>
                      <div class="gmail_quote">On Thu, May 24, 2018 at
                        11:52 AM, David Davis <span dir="ltr">&lt;<a
                            href="mailto:daviddavis@redhat.com"
                            target="_blank" moz-do-not-send="true">daviddavis@redhat.com</a>&gt;</span>
                        wrote:<br>
                        <blockquote class="gmail_quote" style="margin:0
                          0 0 .8ex;border-left:1px #ccc
                          solid;padding-left:1ex">
                          <div dir="ltr">Agreed on performance. Some
                            more Googling turns up mixed opinions on
                            whether UUID performance is worse or not.
                            If this is a significant reason to switch,
                            I agree we should test out the
                            performance.<br>
                            <div><br>
                            </div>
                            <div>Regarding the disk size, I think the
                              cost of using UUIDs is cumulative. Larger
                              PKs mean bigger index sizes, bigger FKs,
                              etc. I agree that it’s probably not a
                              major concern but I wouldn’t say it’s
                              trivial.</div>
                            <div class="gmail_extra"><span
                                class="m_-3769914636103084512m_8458828713642419313HOEnZb"><font
                                  color="#888888">
                                  <div>
                                    <div
class="m_-3769914636103084512m_8458828713642419313m_5408261673236411045m_7778541513043329500gmail_signature"
                                      data-smartmail="gmail_signature">
                                      <div dir="ltr">
                                        <div>
                                          <div dir="ltr">
                                            <div>
                                              <div dir="ltr">
                                                <div><br>
                                                </div>
                                                <div>David<br>
                                                </div>
                                              </div>
                                            </div>
                                          </div>
                                        </div>
                                      </div>
                                    </div>
                                  </div>
                                  <br>
                                </font></span>
                              <div class="gmail_quote">
                                <div>
                                  <div
                                    class="m_-3769914636103084512m_8458828713642419313h5">On
                                    Thu, May 24, 2018 at 11:27 AM, Sean
                                    Myers <span dir="ltr">&lt;<a
                                        href="mailto:sean.myers@redhat.com"
                                        target="_blank"
                                        moz-do-not-send="true">sean.myers@redhat.com</a>&gt;</span>
                                    wrote:<br>
                                  </div>
                                </div>
                                <blockquote class="gmail_quote"
                                  style="margin:0 0 0
                                  .8ex;border-left:1px #ccc
                                  solid;padding-left:1ex">
                                  <div>
                                    <div
                                      class="m_-3769914636103084512m_8458828713642419313h5">Responses
                                      inline.<br>
                                      <span><br>
                                        On 05/23/2018 02:26 PM, David
                                        Davis wrote:<br>
                                        > Before the release of Pulp
                                        3.0 GA, I think it’s worth just
                                        checking in to<br>
                                        > make sure we want to use
                                        UUIDs over integer based IDs.
                                        Changing from UUIDs<br>
                                        > to ints would be a very
                                        easy change at this point  (1-2
                                        lines of code) but<br>
                                        > after GA ships, it would be
                                        hard if not impossible to
                                        switch.<br>
                                        > <br>
                                        > I think there are a number
                                        of reasons why we might want to
                                        consider integer<br>
                                        > IDs:<br>
                                        > <br>
                                        > - Better performance all
                                        around for inserts[0], searches,
                                        indexing, etc<br>
                                        <br>
                                      </span>I don't really care either
                                      way, but it's worth pointing out
                                      that UUIDs are<br>
                                      integers (in the sense that the
                                      entire internet can be reduced to
                                      a single<br>
                                      integer since it's all just bits).
                                      To the best of my knowledge they
                                      are equally<br>
                                      performant to integers and stored
                                      in similar ways in Postgres.<br>
                                      <br>
                                      You linked a MySQL experiment,
                                      done using a version of MySQL that
                                      is nearly 10<br>
                                      years old. If there are concerns
                                      about the performance of UUID PKs
                                      vs. int PKs<br>
                                      in Pulp, we should compare apples
                                      to apples and profile Pulp using
                                      UUID PKs,<br>
                                      profile Pulp using integer PKs,
                                      and then compare the two.<br>
                                      <br>
                                      In my small-scale testing (100,000
                                      randomly generated content rows of
                                      a<br>
                                      proto-RPM content model, 1000
                                      repositories randomly related to
                                      each, no db funny<br>
                                      business beyond enforced
                                      uniqueness constraints), there was
                                      either no<br>
                                      difference, or what difference
                                      there was fell into the margin of
                                      error.<br>
                                      <span><br>
                                        > - Less storage required (4
                                        bytes for int vs 16 bytes for
                                        UUIDs)<br>
                                        <br>
                                      </span>Well, okay...UUIDs are
                                      *huge* integers. But it's the
                                      length of an IPv6 address<br>
                                      vs. the length of an IPv4 address.
                                      While it's true that 4 &lt; 16,
                                      both are still<br>
                                      pretty small. Trivially so, I
                                      think.<br>
                                      <br>
                                      Without taking relations into
                                      account, a table with a million
                                      rows should be a<br>
                                      little less than twelve
                                      mega(mebi)bytes larger. Even at
                                      scale, the size<br>
                                      difference is negligible,
                                      especially when compared to the
                                      size on disk of the<br>
                                      actual content you'd need to be
                                      storing that those million rows
                                      represent.<br>
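                                      <br>
                                      As a rough check of that estimate,
                                      assuming only the 4-byte and
                                      16-byte key sizes quoted above and
                                      ignoring indexes, FKs, and
                                      per-row overhead (the variable
                                      names are illustrative):<br>
                                      <pre>
```python
INT_PK_BYTES = 4     # 32-bit integer key
UUID_PK_BYTES = 16   # 128-bit UUID key
ROWS = 1_000_000

# Extra bytes spent on the PK column alone when using UUIDs.
extra_bytes = (UUID_PK_BYTES - INT_PK_BYTES) * ROWS

# Convert to mebibytes (2**20 bytes each).
extra_mib = extra_bytes / 2**20

print(extra_bytes, round(extra_mib, 2))
```
                                      </pre>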
                                      <span><br>
                                        > - Hrefs would be shorter
                                        (e.g.
                                        /pulp/api/v3/repositories/1/)<br>
                                        > - In line with other apps
                                        like Katello<br>
                                        <br>
                                      </span>I think these two are
                                      definitely worth considering,
                                      though.<br>
                                      <span><br>
                                        > There are some downsides to
                                        consider though:<br>
                                        > <br>
                                        > - Integer ids expose info
                                        like how many records there are<br>
                                        <br>
                                      </span>This was the main intent,
                                      if I recall correctly. UUID PKs
                                      are not:<br>
                                      - monotonically increasing<br>
                                      - variably sized (string length,
                                      not bit length)<br>
                                      <br>
                                      So an object's PK doesn't give you
                                      any indication of how many other
                                      objects may<br>
                                      be in the same collection, and
                                      while the Hrefs are long, for any
                                      given resource<br>
                                      they will always be a predictable
                                      size.<br>
                                      <br>
                                      The major downside is really that
                                      they're a pain in the butt to type
                                      out when<br>
                                      compared to int PKs, so if users
                                      are in a situation where they do
                                      have to type<br>
                                      these things out, I think
                                      something has gone wrong.<br>
                                      <br>
                                      If users typing in PKs can't be
                                      avoided, UUIDs probably should be
                                      avoided. I<br>
                                      recognize that this is effectively
                                      a restatement of "Hrefs would be
                                      shorter" in<br>
                                      the context of how that impacts
                                      the user.<br>
                                      <span><br>
                                        > - Can’t support sharding or
                                        multiple dbs (are we ever going
                                        to need this?)<br>
                                        <br>
                                      </span>A very good question. To
                                      the best of my recollection this
                                      was never stated as a<br>
                                      hard requirement; it was only ever
                                      mentioned like it is here, as a
                                      potential<br>
                                      positive side-effect of UUID keys.
                                      If collision-avoidance is not
                                      desired, and<br>
                                      will certainly never be desired,
                                      then a normal integer field would
                                      likely be a<br>
                                      less astonishing[0] user
                                      experience, and therefore a better
                                      user experience.<br>
                                      <br>
                                      [0]: <a
                                        href="https://en.wikipedia.org/wiki/Principle_of_least_astonishment"
                                        rel="noreferrer" target="_blank"
                                        moz-do-not-send="true">https://en.wikipedia.org/wiki/Principle_of_least_astonishment</a><br>
                                      <br>
                                      <br>
                                    </div>
                                  </div>
                                  <span>_______________________________________________<br>
                                    Pulp-dev mailing list<br>
                                    <a href="mailto:Pulp-dev@redhat.com"
                                      target="_blank"
                                      moz-do-not-send="true">Pulp-dev@redhat.com</a><br>
                                    <a
                                      href="https://www.redhat.com/mailman/listinfo/pulp-dev"
                                      rel="noreferrer" target="_blank"
                                      moz-do-not-send="true">https://www.redhat.com/mailman/listinfo/pulp-dev</a><br>
                                    <br>
                                  </span></blockquote>
                              </div>
                              <br>
                            </div>
                          </div>
                          <br>
                        </blockquote>
                      </div>
                      <br>
                    </div>
                  </div>
                </div>
              </blockquote>
            </div>
            <br>
          </div>
        </blockquote>
      </div>
      <!--'"--><br>
    </blockquote>
    <br>
  </body>
</html>