[Pulp-dev] 'id' versus 'pulp_id' on Content

Dana Walker dawalker at redhat.com
Fri May 25 23:39:37 UTC 2018


I'm basically -1 for the reasons Jeff enumerated but if he is ok with this,
I'm happy to go ahead with it.

[Jeff]:
> In classic relational modeling, using ID as the primary key is common
> practice.  Especially when ORMs are involved.  The "id" added by plugin
> writers is a natural key so naming it ID goes against convention.
>

This is echoed here, for further reading (though perhaps this article is
overly simplified for our needs) in the sections "Key Fields" and "Prefixes
and Suffixes (are bad)":
https://launchbylunch.com/posts/2014/Feb/16/sql-naming-conventions/

Plugin writers could use a surrogate key instead (unless I am
misunderstanding your example on Errata, Brian):
http://www.vertabelo.com/blog/technical-articles/natural-and-surrogate-primary-keys

[Jeff]:
> Every field in base models used by plugins has potential for name
> collisions.  Where does it end?  Every column having a pulp_ or _ prefix?
>

Exactly.  How far should we obfuscate our code base to prevent namespace
collisions in derivative work?  As David already pointed out, if we want to
do this for id, then maybe also for created and for last_updated.  Will we
want to continue on to adding pulp or _ before everything in Pulp (no)?  It
seems unnecessary.  So are we sure it's needed in these three cases?

[Austin]:
> In Pulp 2, having id fields bit us really badly. The reason may have been
> specific to Mongoengine, but my understanding is that it is bad practice
> anyway.
>

Now, I will give the caveat that there's no "one true way" to the naming
conventions, there are proponents for pulp (app) prefix, _ prefix, or
leaving the id alone, and as we already saw with the yaml file issue
recently, sometimes it's better to break a convention than to create bigger
problems elsewhere.  If we really were burned by the id fields in Pulp 2,
that's a valid enough reason to change approaches in Pulp 3.

Can anyone elaborate on what these problems were?  Was it related to
MongoDB being non-relational and not something we're likely to run into on
relational PostgreSQL?

[David]:
> At the very least, it might be nice to document which field names are
> reserved on the Content model.
>

+1 on this.  We want to encourage plugin writers by making things easy for
them, but there are relatively few instances where they would need to do a
workaround here, so documentation may suffice.  Either way we go, we should
document reserved names clearly for plugin writers to be aware of potential
namespacing issues (you know? just document the heck out of everything :P).

And on that note...happy weekend folks!

--Dana


Dana Walker

Associate Software Engineer

Red Hat

<https://www.redhat.com>
<https://red.ht/sig>

On Fri, May 25, 2018 at 12:15 PM, Brian Bouterse <bbouters at redhat.com>
wrote:

> I wrote up a Redmine issue to make this change here:
> https://pulp.plan.io/issues/3704  Please look at it and groom it if it
> looks ready.
>
> @jortel I especially want to make sure you're comfortable with this change.
>
> If anyone is -1 on this change please reply saying so.
>
>
> On Thu, May 24, 2018 at 5:06 PM, Brian Bouterse <bbouters at redhat.com>
> wrote:
>
>> +1 to using _id, _created, and _last_updated only on MasterModel. It
>> looks like leading underscores for field names are fine:
>> https://stackoverflow.com/a/25509372 That will resolve this issue. Also
>> everyone can still use .pk instead of ._id because Django automatically
>> makes a .pk attribute on every model that also acts as the primary key
>> regardless of the primary key's name. Also I don't think we have this issue
>> elsewhere, only because Content is so heavily subclassed so I don't think
>> we'll do leading _ to field names in other places.
>>
>> @jortel I don't understand what you wrote; maybe try explaining it some
>> more? The field "id" a plugin writer would define would contain the data
>> they are obligated to hold for that content type. They didn't select the
>> name "id"; it's the name the designers of that content type selected when
>> they modeled that content type. So the designers of Errata itself chose
>> Errata.id and what data should go inside there.
>>
>> On Wed, May 23, 2018 at 10:33 AM, Jeff Ortel <jortel at redhat.com> wrote:
>>
>>> In classic relational modeling, using ID as the primary key is common
>>> practice.  Especially when ORMs are involved.  The "id" added by plugin
>>> writers is a natural key so naming it ID goes against convention.  Every
>>> field in base models used by plugins has potential for name collisions.
>>> Where does it end?  Every column having a pulp_ or _ prefix?  Plugins
>>> create relatively few tables and it doesn't seem unreasonable for plugin
>>> writers to select other names to resolve naming conflicts.
>>>
>>>
>>> On 05/23/2018 07:50 AM, Brian Bouterse wrote:
>>>
>>> Currently the Content model [0] has 'id' as it's primary key which is
>>> inherited from MasterModel here [1]. By naming our pk 'id', we are
>>> preventing plugin writers from also using that field. That field name is
>>> common for content types. For example: both RPM and Nuget content also
>>> expect to use the 'id' field to store data about the content type itself
>>> (not Pulp's pk). We learned about the Nuget incompatibility at
>>> ConfigMgmgtCamp from a community member. I learned about this issue with
>>> RPM from @dalley.
>>>
>>> The only workaround a plugin writer has is to call their field 'rpm_id'
>>> or something like that. I don't see how it's unavoidable that this renaming
>>> won't be passed directly onto the user for things like filtering, creating
>>> units, etc. I think that is an undesirable outcome just so that the Pulp pk
>>> can be named 'id'.
>>>
>>> One option would be to rename 'id' to 'pulp_id' at the MasterModel. This
>>> is also somewhat ugly for Pulp developers, but it would be (a) crystal
>>> clear to the user in all cases and (b) allow Content writers to model their
>>> content types correctly.
>>>
>>> Another option would be to rename the pk for 'Content' specifically and
>>> not at the MasterModel level. I think that would create more confusion than
>>> benefit so I recommend doing it at the MasterModel level.
>>>
>>> What do you all think?
>>>
>>> [0]: https://github.com/pulp/pulp/blob/6f492ee8fac94b8562dc62d87e
>>> 6886869e052e7e/pulpcore/pulpcore/app/models/content.py#L106
>>> [1]: https://github.com/pulp/pulp/blob/d1dc089890f167617fe9917af0
>>> 87d5587708296b/pulpcore/pulpcore/app/models/base.py#L25
>>>
>>> -Brian
>>>
>>>
>>> _______________________________________________
>>> Pulp-dev mailing listPulp-dev at redhat.comhttps://www.redhat.com/mailman/listinfo/pulp-dev
>>>
>>>
>>>
>>> _______________________________________________
>>> Pulp-dev mailing list
>>> Pulp-dev at redhat.com
>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>
>>>
>>
>
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20180525/578d14a4/attachment.htm>


More information about the Pulp-dev mailing list