[Pulp-dev] Integer IDs in Pulp 3

Thu May 24 15:52:15 UTC 2018

Agreed on performance. Some more Googling turns up mixed opinions on
whether UUID performance is worse or not. If this is a significant reason
to switch, I agree we should test the performance.
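
To make "test the performance" a bit more concrete, here is a rough sketch
of a side-by-side insert timing. It uses Python's sqlite3 purely as a
portable stand-in (my assumption; SQLite is not Postgres, so the numbers say
nothing about Pulp itself), and the `content` table and `time_inserts`
helper are made up for illustration. A real comparison would profile Pulp
against Postgres with each PK type.

```python
import sqlite3
import time
import uuid


def time_inserts(pk_kind, n_rows=20_000):
    """Insert n_rows into a fresh in-memory table.

    Returns (elapsed_seconds, row_count). pk_kind is "int" or "uuid".
    """
    conn = sqlite3.connect(":memory:")
    if pk_kind == "int":
        conn.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, name TEXT)")
        rows = [(i, f"pkg-{i}") for i in range(1, n_rows + 1)]
    else:
        # 16 raw bytes per key, the same width as a Postgres uuid value.
        conn.execute("CREATE TABLE content (id BLOB PRIMARY KEY, name TEXT)")
        rows = [(uuid.uuid4().bytes, f"pkg-{i}") for i in range(1, n_rows + 1)]

    start = time.perf_counter()
    with conn:  # commits the transaction on success
        conn.executemany("INSERT INTO content VALUES (?, ?)", rows)
    elapsed = time.perf_counter() - start

    count = conn.execute("SELECT COUNT(*) FROM content").fetchone()[0]
    conn.close()
    return elapsed, count


if __name__ == "__main__":
    for kind in ("int", "uuid"):
        seconds, count = time_inserts(kind)
        print(f"{kind:>4}: {count} rows in {seconds:.4f}s")
```

Running it a few times and comparing the spread between runs is a quick way
to see whether any difference falls inside the margin of error.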

Regarding the disk size, I think the cost of using UUIDs is cumulative.
Larger PKs mean bigger indexes, bigger FKs, etc. I agree that it’s probably
not a major concern, but I wouldn’t say it’s trivial.
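
A back-of-the-envelope sketch of what I mean by cumulative. It only counts
raw key bytes (4 for an int, 16 for a UUID, as quoted below) and ignores
page overhead and alignment padding. The `key_copies` parameter is my own
simplification: it counts how many times a key value ends up stored per row
once you include index entries and FK columns, and the value 4 is a
hypothetical, not a measurement.

```python
INT_PK_BYTES = 4    # size of an int key, per the thread
UUID_PK_BYTES = 16  # size of a UUID key, per the thread


def extra_bytes(rows, key_copies=1):
    """Extra raw bytes from storing 16-byte UUID keys instead of 4-byte ints.

    key_copies is a simplifying assumption: the number of places a key value
    is stored per row (PK column, PK index entry, FK columns, FK indexes).
    """
    return rows * key_copies * (UUID_PK_BYTES - INT_PK_BYTES)


def to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / (1024 * 1024)


if __name__ == "__main__":
    rows = 1_000_000
    # PK column only: 12,000,000 bytes, a little under twelve MiB.
    print(extra_bytes(rows), round(to_mib(extra_bytes(rows)), 1))
    # Hypothetical: PK column, PK index, one FK column, and that FK's index.
    print(extra_bytes(rows, key_copies=4))
```

With `key_copies=1` this matches the "little less than twelve mebibytes"
figure for a million rows; each additional stored copy of the key adds the
same amount again.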

David

On Thu, May 24, 2018 at 11:27 AM, Sean Myers <sean.myers at redhat.com> wrote:

> Responses inline.
>
> On 05/23/2018 02:26 PM, David Davis wrote:
> > Before the release of Pulp 3.0 GA, I think it’s worth just checking in to
> > make sure we want to use UUIDs over integer based IDs. Changing from UUIDs
> > to ints would be a very easy change at this point (1-2 lines of code) but
> > after GA ships, it would be hard if not impossible to switch.
> >
> > I think there are a number of reasons why we might want to consider
> > integer IDs:
> >
> > - Better performance all around for inserts[0], searches, indexing, etc
>
> I don't really care either way, but it's worth pointing out that UUIDs are
> integers (in the sense that the entire internet can be reduced to a single
> integer since it's all just bits). To the best of my knowledge they are
> equally performant to integers and stored in similar ways in Postgres.
>
> You linked a MySQL experiment, done using a version of MySQL that is
> nearly 10 years old. If there are concerns about the performance of UUID
> PKs vs. int PKs in Pulp, we should compare apples to apples and profile
> Pulp using UUID PKs, profile Pulp using integer PKs, and then compare the
> two.
>
> In my small-scale testing (100,000 randomly generated content rows of a
> proto-RPM content model, 1000 repositories randomly related to each, no db
> funny business beyond enforced uniqueness constraints), there was either no
> difference, or what difference there was fell into the margin of error.
>
> > - Less storage required (4 bytes for int vs 16 bytes for UUIDs)
>
> Well, okay...UUIDs are *huge* integers. But it's the length of an IPv6
> address vs. the length of an IPv4 address. While it's true that 4 < 16,
> both are still pretty small. Trivially so, I think.
>
> Without taking relations into account, a table with a million rows should
> be a little less than twelve mega(mebi)bytes larger. Even at scale, the
> size difference is negligible, especially when compared to the size on disk
> of the actual content you'd need to be storing that those million rows
> represent.
>
> > - Hrefs would be shorter (e.g. /pulp/api/v3/repositories/1/)
> > - In line with other apps like Katello
>
> I think these two are definitely worth considering, though.
>
> > There are some downsides to consider though:
> >
> > - Integer ids expose info like how many records there are
>
> This was the main intent, if I recall correctly. UUID PKs are not:
> - monotonically increasing
> - variably sized (string length, not bit length)
>
> So an object's PK doesn't give you any indication of how many other
> objects may be in the same collection, and while the Hrefs are long, for
> any given resource they will always be a predictable size.
>
> The major downside is really that they're a pain in the butt to type out
> when compared to int PKs, so if users are in a situation where they do have
> to type these things out, I think something has gone wrong.
>
> If users typing in PKs can't be avoided, UUIDs probably should be avoided.
> I recognize that this is effectively a restatement of "Hrefs would be
> shorter" in the context of how that impacts the user.
>
> > - Can’t support sharding or multiple dbs (are we ever going to need this?)
>
> A very good question. To the best of my recollection this was never stated
> as a hard requirement; it was only ever mentioned like it is here, as a
> potential positive side-effect of UUID keys. If collision-avoidance is not
> desired, and will certainly never be desired, then a normal integer field
> would likely be a less astonishing[0] user experience, and therefore a
> better user experience.
>
> [0]: https://en.wikipedia.org/wiki/Principle_of_least_astonishment