[Pulp-dev] Partially constructed data in the DB

Jeff Ortel jortel at redhat.com
Thu Dec 14 21:39:33 UTC 2017

On 12/14/2017 12:55 PM, Brian Bouterse wrote:
> The behavior brings me back to an attribute name like 'user_visible' 
> and it would default to False. Thus you have to explicitly take the 
> step to make it user visible. Whatever the name, I think this would 
> this apply to both RepoVersion and Publication objects. Plugin writers 
> who produce these objects also need docs that identify they need to 
> set user_visible=True.

Agreed, except for field name.

My concern with the name user_visible is that rather than describing the 
incomplete state of the resource it describes only one aspect of how the 
resources should be handled.  That is, that a non-visible resource 
should be hidden from the user.  But there's more to it. For example, 
associating a publication to a distribution should be prevented by the 
viewset.  Not based on user visibility but the incomplete state of the 

> If an exception is raised while creating the repo_version or 
> publication, or from the plugin code, the core catches it, deletes the 
> repo_version/publication and re-raises the exception. This will cause 
> the task the user is tracking to error and report the error.


> We had some challenges on irc in finding a working design for the 
> crash case. If a crash occurs though the db record will just be there 
> with user_visible=False. We need some way to clean those up. We can't 
> assume that there will be just one outstanding one for us to cleanup 
> next time for a variety of reasons I won't recap here. During the irc 
> convo, @jortel suggested we consider if the tasking system can help 
> cleanup the work like it cleans up other areas and I think that is a 
> good idea. We could record on the Task model a list of objects to 
> deleted if the tasking system cleans up a task that crashed while 
> running. For example, when a publication is made, the first thing done 
> it to associate it with the running task as an object that needs to be 
> deleted if the task crashes. We would also hide this objects_to_delete 
> list from the user in the Task serializer which would omit this data. 
> If we don't omit that data from a Task serialization when the user 
> tries to load the url they will get a 404 because that object has 
> user_visible=False.

I think it would be best to omit from the task serializer.

All seems reasonable but want to note that for this to be crash proof it 
is imperative that the resource insert and the insert into 
/things-to-be-deleted-when-the-task-crashes/ must be committed in the 
same transaction in order to be crash proof.  The same is true for when 
the task completes successfully.  Updating the (valid|visible|?) field 
on the resource, inserting into CreatedResources and deleting from 
/things-to-be-deleted-when-the-task-crashes/ needs to be done in the 
same transaction.  This is trivial for the core because it can be done 
in the task code.  Relying on plugin writers to do this is a little 

Perhaps we can do something simpler.  Given the frequency of crash or 
worker restart, I wonder if we could delete incomplete things based on 
another event that ensures that no tasks are running.  Kind of like how 
//tmp /is cleaned up on system reboot.  I don't think having some of 
these things hanging around in the DB is a problem.  It's mainly that we 
don't want to leak them indefinitely. Any ideas?

> What are thoughts on these approaches, behaviors, and the attribute 
> name? Should this be moved into Redmine?
> On Thu, Dec 14, 2017 at 11:14 AM, Jeff Ortel <jortel at redhat.com 
> <mailto:jortel at redhat.com>> wrote:
>     On 12/13/2017 01:54 PM, Brian Bouterse wrote:
>>     Defining the field's behaivor a bit more could help us with the
>>     name. Will it actually be shown to the user in viewsets and
>>     filter results?
>>     I think the answer should be no, not until it's fully finished. I
>>     can't think of a reason why a user would want to see inconsistent
>>     content during a sync or publish.
>     Agreed.
>>     There are some downsides when users thinking things are done when
>>     they aren't. For instance, the user could mistakenly think the
>>     publish is done when its not, trigger package updates, and many
>>     machines will still receive the old content because it hasn't
>>     been switched over to auto-publish for the expected distribution.
>>     Also how is this related to when the 'created_resources' field is
>>     set on a Task? I had imagined core would set that at as the last
>>     thing it does so that when the user sees it everthing is
>>     "consistent" and "live" already.
>     Agreed.
>>     -Brian
>>     On Wed, Dec 13, 2017 at 2:42 PM, David Davis
>>     <daviddavis at redhat.com <mailto:daviddavis at redhat.com>> wrote:
>>         Thanks for answering my questions. I agree on not using an
>>         “is_” prefix and avoiding “visible.”
>>         Your suggestion of “valid” sounds fine. Maybe some other
>>         options: finished, complete[d], ready.
>>         David
>>         On Wed, Dec 13, 2017 at 2:15 PM, Jeff Ortel
>>         <jortel at redhat.com <mailto:jortel at redhat.com>> wrote:
>>             On 12/13/2017 12:46 PM, David Davis wrote:
>>>             A few questions. First, what is meant by incomplete? I’m
>>>             assuming it refers to a version in the process of being
>>>             created or one that was not successfully created?
>>             Both.
>>>             Also, what’s the motivation behind storing this
>>>             information? Is there something in Pulp that needs to
>>>             know this or is this so that the user can know?
>>             There may be others but an importer needs to be passed
>>             the new version so it can add/remove content.  It needs
>>             to exist in the DB so that it can add/remove content in
>>             separate transaction(s).
>>>             Lastly, I imagine that a task will be associated with
>>>             the creation of a version. Does knowing its state not
>>>             suffice for determining if a version is visible/valid?
>>             IMHO, absolutely not. That is not what tasks records in
>>             the DB are for.  Completed task records can be deleted at
>>             any time.
>>>             David
>>>             On Wed, Dec 13, 2017 at 12:16 PM, Jeff Ortel
>>>             <jortel at redhat.com <mailto:jortel at redhat.com>> wrote:
>>>                 There has been discussion on IRC about a matter
>>>                 related to versioned repositories that needs to be
>>>                 broadened.  It dealt with situations whereby a new
>>>                 repository version exists in the DB in an incomplete
>>>                 state.  The incomplete state exists because
>>>                 conceptually a repository version includes both the
>>>                 version record itself and all of the DB records that
>>>                 associate content.  For several reasons, the entire
>>>                 version cannot be constructed in the DB in a single
>>>                 DB transaction. The problem of /Incomplete State/ is
>>>                 not unique to repository versions.  It applies to
>>>                 publications as well.  I would like to discuss and
>>>                 decide on a standard approach to resolving this
>>>                 throughout the data model.
>>>                 The IRC discussion (as related to me) suggested we
>>>                 use a common approach of having a field in the DB
>>>                 that indicates this state.  This seems reasonable to
>>>                 me.  As noted, it's a common approach. Thoughts?
>>>                 Assume we did use a field, let's discuss name.  It's
>>>                 my understanding that a field named /is_visible/ or
>>>                 just /visible/ was discussed. I would argue two
>>>                 things.  1) the is_ prefix is redundant to the fact
>>>                 it's a boolean field and we have not used this
>>>                 convention anywhere else in the model.  2)
>>>                 Historically, the term /"visible"/ has strong ties
>>>                 to user interfaces and is used to mask fields or
>>>                 records from being displayed to users.  I propose we
>>>                 use a more appropriate field name. Perhaps
>>>                 /"valid"/. Thoughts?
>>>                 _______________________________________________
>>>                 Pulp-dev mailing list
>>>                 Pulp-dev at redhat.com <mailto:Pulp-dev at redhat.com>
>>>                 https://www.redhat.com/mailman/listinfo/pulp-dev
>>>                 <https://www.redhat.com/mailman/listinfo/pulp-dev>
>>             _______________________________________________
>>             Pulp-dev mailing list
>>             Pulp-dev at redhat.com <mailto:Pulp-dev at redhat.com>
>>             https://www.redhat.com/mailman/listinfo/pulp-dev
>>             <https://www.redhat.com/mailman/listinfo/pulp-dev>
>>         _______________________________________________
>>         Pulp-dev mailing list
>>         Pulp-dev at redhat.com <mailto:Pulp-dev at redhat.com>
>>         https://www.redhat.com/mailman/listinfo/pulp-dev
>>         <https://www.redhat.com/mailman/listinfo/pulp-dev>
>     _______________________________________________
>     Pulp-dev mailing list
>     Pulp-dev at redhat.com <mailto:Pulp-dev at redhat.com>
>     https://www.redhat.com/mailman/listinfo/pulp-dev
>     <https://www.redhat.com/mailman/listinfo/pulp-dev>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20171214/9e2d4c86/attachment.htm>

More information about the Pulp-dev mailing list