[Pulp-dev] 'id' versus 'pulp_id' on Content

Brian Bouterse bbouters at redhat.com
Tue May 29 13:24:47 UTC 2018


We can't go forward if there are -1s, especially from core devs. That is
unfortunate because in the roughly 4-5 plugins being developed, 2 have
raised concerns that they can't model their content as they expect to. I
don't believe documentation is a viable resolution (see inline), so can the
blocking votes talk about what they would like to do instead?

On Fri, May 25, 2018 at 7:39 PM, Dana Walker <dawalker at redhat.com> wrote:

> I'm basically -1 for the reasons Jeff enumerated but if he is ok with
> this, I'm happy to go ahead with it.
>
> [Jeff]:
>> In classic relational modeling, using ID as the primary key is common
>> practice.  Especially when ORMs are involved.  The "id" added by plugin
>> writers is a natural key so naming it ID goes against convention.
>>
>
> This is echoed here, for further reading (though perhaps this article is
> overly simplified for our needs) in the sections "Key Fields" and "Prefixes
> and Suffixes (are bad)":
> https://launchbylunch.com/posts/2014/Feb/16/sql-naming-conventions/
>

That is true, but this article also talks about avoiding reserved words as
well. I think we're hearing 'id' is a commonly reserved word for content
types being modeled by plugin writers.


> Plugin writers could use a surrogate key instead (unless I am
> misunderstanding your example on Errata, Brian):
> http://www.vertabelo.com/blog/technical-articles/natural-
> and-surrogate-primary-keys
>

I'm not seeing how this is related. This article questions natural vs
surrogate primary keys, but we've already adopted surrogate keys. We know
the primary key is a UUID (not a natural key) and it will never change. The
question I see here is different: what do we name the surrogate pk?


>
>
[Jeff]:
>> Every field in base models used by plugins has potential for name
>> collisions.  Where does it end?  Every column having a pulp_ or _ prefix?
>>
>
> Exactly.  How far should we obfuscate our code base to prevent namespace
> collisions in derivative work?  As David already pointed out, if we want to
> do this for id, then maybe also for created and for last_updated.  Will we
> want to continue on to adding pulp or _ before everything in Pulp (no)?  It
> seems unnecessary.  So are we sure it's needed in these three cases?
>

It feels like a slippery slope, but it's scoped. We probably define 50 -
100 fields across all of our models. This change is being proposed for 3
only. The reason it's right for these three is that these fields inherit
onto the Content model, and that is the model plugin writers subclass and
define lots and lots of fields on. The other areas of the plugin API either
aren't subclassed, or not subclassed with a large numbers of fields being
defined.


>
> [Austin]:
>> In Pulp 2, having id fields bit us really badly. The reason may have been
>> specific to Mongoengine, but my understanding is that it is bad practice
>> anyway.
>>
>
> Now, I will give the caveat that there's no "one true way" to the naming
> conventions, there are proponents for pulp (app) prefix, _ prefix, or
> leaving the id alone, and as we already saw with the yaml file issue
> recently, sometimes it's better to break a convention than to create bigger
> problems elsewhere.  If we really were burned by the id fields in Pulp 2,
> that's a valid enough reason to change approaches in Pulp 3.
>
> Can anyone elaborate on what these problems were?  Was it related to
> MongoDB being non-relational and not something we're likely to run into on
> relational PostgreSQL?
>
> [David]:
>> At the very least, it might be nice to document which field names are
>> reserved on the Content model.
>>
>
> +1 on this.  We want to encourage plugin writers by making things easy for
> them, but there are relatively few instances where they would need to do a
> workaround here, so documentation may suffice.  Either way we go, we should
> document reserved names clearly for plugin writers to be aware of potential
> namespacing issues (you know? just document the heck out of everything :P).
>

I don't think documentation is a viable resolution here. Both of the plugin
writers I've seen experience this, knew right away what the problem was.
When encountered, what they expressed is "Why would Pulp reserve common
field that I need to define on my subclass?" My feeling here is that we
don't actually have a good reason.


> And on that note...happy weekend folks!
>
> --Dana
>
>
> Dana Walker
>
> Associate Software Engineer
>
> Red Hat
>
> <https://www.redhat.com>
> <https://red.ht/sig>
>
> On Fri, May 25, 2018 at 12:15 PM, Brian Bouterse <bbouters at redhat.com>
> wrote:
>
>> I wrote up a Redmine issue to make this change here:
>> https://pulp.plan.io/issues/3704  Please look at it and groom it if it
>> looks ready.
>>
>> @jortel I especially want to make sure you're comfortable with this
>> change.
>>
>> If anyone is -1 on this change please reply saying so.
>>
>>
>> On Thu, May 24, 2018 at 5:06 PM, Brian Bouterse <bbouters at redhat.com>
>> wrote:
>>
>>> +1 to using _id, _created, and _last_updated only on MasterModel. It
>>> looks like leading underscores for field names are fine:
>>> https://stackoverflow.com/a/25509372 That will resolve this issue. Also
>>> everyone can still use .pk instead of ._id because Django automatically
>>> makes a .pk attribute on every model that also acts as the primary key
>>> regardless of the primary key's name. Also I don't think we have this issue
>>> elsewhere, only because Content is so heavily subclassed so I don't think
>>> we'll do leading _ to field names in other places.
>>>
>>> @jortel I don't understand what you wrote; maybe try explaining it some
>>> more? The field "id" a plugin writer would define would contain the data
>>> they are obligated to hold for that content type. They didn't select the
>>> name "id"; it's the name the designers of that content type selected when
>>> they modeled that content type. So the designers of Errata itself chose
>>> Errata.id and what data should go inside there.
>>>
>>> On Wed, May 23, 2018 at 10:33 AM, Jeff Ortel <jortel at redhat.com> wrote:
>>>
>>>> In classic relational modeling, using ID as the primary key is common
>>>> practice.  Especially when ORMs are involved.  The "id" added by plugin
>>>> writers is a natural key so naming it ID goes against convention.  Every
>>>> field in base models used by plugins has potential for name collisions.
>>>> Where does it end?  Every column having a pulp_ or _ prefix?  Plugins
>>>> create relatively few tables and it doesn't seem unreasonable for plugin
>>>> writers to select other names to resolve naming conflicts.
>>>>
>>>>
>>>> On 05/23/2018 07:50 AM, Brian Bouterse wrote:
>>>>
>>>> Currently the Content model [0] has 'id' as it's primary key which is
>>>> inherited from MasterModel here [1]. By naming our pk 'id', we are
>>>> preventing plugin writers from also using that field. That field name is
>>>> common for content types. For example: both RPM and Nuget content also
>>>> expect to use the 'id' field to store data about the content type itself
>>>> (not Pulp's pk). We learned about the Nuget incompatibility at
>>>> ConfigMgmgtCamp from a community member. I learned about this issue with
>>>> RPM from @dalley.
>>>>
>>>> The only workaround a plugin writer has is to call their field 'rpm_id'
>>>> or something like that. I don't see how it's unavoidable that this renaming
>>>> won't be passed directly onto the user for things like filtering, creating
>>>> units, etc. I think that is an undesirable outcome just so that the Pulp pk
>>>> can be named 'id'.
>>>>
>>>> One option would be to rename 'id' to 'pulp_id' at the MasterModel.
>>>> This is also somewhat ugly for Pulp developers, but it would be (a) crystal
>>>> clear to the user in all cases and (b) allow Content writers to model their
>>>> content types correctly.
>>>>
>>>> Another option would be to rename the pk for 'Content' specifically and
>>>> not at the MasterModel level. I think that would create more confusion than
>>>> benefit so I recommend doing it at the MasterModel level.
>>>>
>>>> What do you all think?
>>>>
>>>> [0]: https://github.com/pulp/pulp/blob/6f492ee8fac94b8562dc62d87e
>>>> 6886869e052e7e/pulpcore/pulpcore/app/models/content.py#L106
>>>> [1]: https://github.com/pulp/pulp/blob/d1dc089890f167617fe9917af0
>>>> 87d5587708296b/pulpcore/pulpcore/app/models/base.py#L25
>>>>
>>>> -Brian
>>>>
>>>>
>>>> _______________________________________________
>>>> Pulp-dev mailing listPulp-dev at redhat.comhttps://www.redhat.com/mailman/listinfo/pulp-dev
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Pulp-dev mailing list
>>>> Pulp-dev at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>
>>>>
>>>
>>
>> _______________________________________________
>> Pulp-dev mailing list
>> Pulp-dev at redhat.com
>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20180529/f84739ed/attachment.htm>


More information about the Pulp-dev mailing list