[Pulp-dev] Partially constructed data in the DB

Brian Bouterse bbouters at redhat.com
Mon Dec 18 22:40:54 UTC 2017


Thanks for the idea. I think we're pretty close to a design. More thoughts
would be great. I wrote back some inline.

On Thu, Dec 14, 2017 at 4:39 PM, Jeff Ortel <jortel at redhat.com> wrote:

>
>
> On 12/14/2017 12:55 PM, Brian Bouterse wrote:
>
> The behavior brings me back to an attribute name like 'user_visible' and
> it would default to False. Thus you have to explicitly take the step to
> make it user visible. Whatever the name, I think this would this apply to
> both RepoVersion and Publication objects. Plugin writers who produce these
> objects also need docs that identify they need to set user_visible=True.
>
>
> Agreed, except for field name.
>
> My concern with the name user_visible is that rather than describing the
> incomplete state of the resource it describes only one aspect of how the
> resources should be handled.  That is, that a non-visible resource should
> be hidden from the user.  But there's more to it.  For example, associating
> a publication to a distribution should be prevented by the viewset.  Not
> based on user visibility but the incomplete state of the publication.
>

I'm convinced by this. What about having it be called 'complete' and having
it default to False. Only complete Publications and RepositoryVersion
objects are returned in the list, detail, or filter views.


>
> If an exception is raised while creating the repo_version or publication,
> or from the plugin code, the core catches it, deletes the
> repo_version/publication and re-raises the exception. This will cause the
> task the user is tracking to error and report the error.
>
>
> Agreed.
>

>
> We had some challenges on irc in finding a working design for the crash
> case. If a crash occurs though the db record will just be there with
> user_visible=False. We need some way to clean those up. We can't assume
> that there will be just one outstanding one for us to cleanup next time for
> a variety of reasons I won't recap here. During the irc convo, @jortel
> suggested we consider if the tasking system can help cleanup the work like
> it cleans up other areas and I think that is a good idea. We could record
> on the Task model a list of objects to deleted if the tasking system cleans
> up a task that crashed while running. For example, when a publication is
> made, the first thing done it to associate it with the running task as an
> object that needs to be deleted if the task crashes. We would also hide
> this objects_to_delete list from the user in the Task serializer which
> would omit this data. If we don't omit that data from a Task serialization
> when the user tries to load the url they will get a 404 because that object
> has user_visible=False.
>
>
> I think it would be best to omit from the task serializer.
>

I agree leaving them out for now I think makes sense. It's more a
relationship for internal correctness.


> All seems reasonable but want to note that for this to be crash proof it
> is imperative that the resource insert and the insert into
> *things-to-be-deleted-when-the-task-crashes* must be committed in the
> same transaction in order to be crash proof.  The same is true for when the
> task completes successfully.  Updating the (valid|visible|?) field on the
> resource, inserting into CreatedResources and deleting from
> *things-to-be-deleted-when-the-task-crashes* needs to be done in the same
> transaction.  This is trivial for the core because it can be done in the
> task code.  Relying on plugin writers to do this is a little concerning.
>

I agree completely that the creation of the Publication needs to be saved
in the same transaction that we update the Task with a link to the
Publication to be cleaned up. Similarly for a RepositoryVersion. We could
probably do this transaction and the updating of the Tasks' foreign key
automatically in a fancy save() method on both Publication and
RepositoryVersion. Something like: in save() start the transaction, save
the self object, if in a task, update the Task with the FK reference, end
transaction and return control back from save(). This would cause anyone
who creates one of them to receive the "safe" transactional behavior
automatically. At that point the only thing people need to know is that
they have to later call a:

my_publication.complete = True
my_publication.save()

This would be in exceptional situations where plugin code has to make the
RepositoryVersion on their own (i.e. depsolving copy). Also for the sync()
and publish() methods, core would be handling the setting of complete=True
for plugin writers.


> Perhaps we can do something simpler.  Given the frequency of crash or
> worker restart, I wonder if we could delete incomplete things based on
> another event that ensures that no tasks are running.  Kind of like how */tmp
> *is cleaned up on system reboot.  I don't think having some of these
> things hanging around in the DB is a problem.  It's mainly that we don't
> want to leak them indefinitely.  Any ideas?
>

Not cleaning them up could be ok in the near term. In terms of a simple
cleanup mechanism I think the one proposed is almost as simple as we can
afford. Otherwise we'll end up in the "singleton concurrency" design, which
I think we should avoid at almost all costs. The "singleton concurrency" is
an idea; it's the pattern is one where the resource manager stops routing
work, waits for all workers to finish their tasks, and then runs code on
itself. That is the only way the pulp cluster can be safely guaranteed that
some set of code is the *only* thing running. The work will backup
significantly while waiting for all existing work to finish, especially if
one of those sync's is long running which is probable.


>
>
>
> What are thoughts on these approaches, behaviors, and the attribute name?
> Should this be moved into Redmine?
>
>
>
>
> On Thu, Dec 14, 2017 at 11:14 AM, Jeff Ortel <jortel at redhat.com> wrote:
>
>>
>>
>> On 12/13/2017 01:54 PM, Brian Bouterse wrote:
>>
>> Defining the field's behaivor a bit more could help us with the name.
>> Will it actually be shown to the user in viewsets and filter results?
>>
>> I think the answer should be no, not until it's fully finished. I can't
>> think of a reason why a user would want to see inconsistent content during
>> a sync or publish.
>>
>>
>> Agreed.
>>
>> There are some downsides when users thinking things are done when they
>> aren't. For instance, the user could mistakenly think the publish is done
>> when its not, trigger package updates, and many machines will still receive
>> the old content because it hasn't been switched over to auto-publish for
>> the expected distribution.
>>
>> Also how is this related to when the 'created_resources' field is set on
>> a Task? I had imagined core would set that at as the last thing it does so
>> that when the user sees it everthing is "consistent" and "live" already.
>>
>>
>> Agreed.
>>
>>
>>
>> -Brian
>>
>> On Wed, Dec 13, 2017 at 2:42 PM, David Davis <daviddavis at redhat.com>
>> wrote:
>>
>>> Thanks for answering my questions. I agree on not using an “is_” prefix
>>> and avoiding “visible.”
>>>
>>> Your suggestion of “valid” sounds fine. Maybe some other options:
>>> finished, complete[d], ready.
>>>
>>>
>>> David
>>>
>>> On Wed, Dec 13, 2017 at 2:15 PM, Jeff Ortel <jortel at redhat.com> wrote:
>>>
>>>>
>>>>
>>>> On 12/13/2017 12:46 PM, David Davis wrote:
>>>>
>>>> A few questions. First, what is meant by incomplete? I’m assuming it
>>>> refers to a version in the process of being created or one that was not
>>>> successfully created?
>>>>
>>>>
>>>> Both.
>>>>
>>>>
>>>> Also, what’s the motivation behind storing this information? Is there
>>>> something in Pulp that needs to know this or is this so that the user can
>>>> know?
>>>>
>>>>
>>>> There may be others but an importer needs to be passed the new version
>>>> so it can add/remove content.  It needs to exist in the DB so that it can
>>>> add/remove content in separate transaction(s).
>>>>
>>>>
>>>> Lastly, I imagine that a task will be associated with the creation of a
>>>> version. Does knowing its state not suffice for determining if a version is
>>>> visible/valid?
>>>>
>>>>
>>>> IMHO, absolutely not.  That is not what tasks records in the DB are
>>>> for.  Completed task records can be deleted at any time.
>>>>
>>>>
>>>>
>>>> David
>>>>
>>>> On Wed, Dec 13, 2017 at 12:16 PM, Jeff Ortel <jortel at redhat.com> wrote:
>>>>
>>>>> There has been discussion on IRC about a matter related to versioned
>>>>> repositories that needs to be broadened.  It dealt with situations whereby
>>>>> a new repository version exists in the DB in an incomplete state.  The
>>>>> incomplete state exists because conceptually a repository version includes
>>>>> both the version record itself and all of the DB records that associate
>>>>> content.  For several reasons, the entire version cannot be constructed in
>>>>> the DB in a single DB transaction.  The problem of *Incomplete State*
>>>>> is not unique to repository versions.  It applies to publications as well.
>>>>> I would like to discuss and decide on a standard approach to resolving this
>>>>> throughout the data model.
>>>>>
>>>>> The IRC discussion (as related to me) suggested we use a common
>>>>> approach of having a field in the DB that indicates this state.  This seems
>>>>> reasonable to me.  As noted, it's a common approach.  Thoughts?
>>>>>
>>>>> Assume we did use a field, let's discuss name.  It's my understanding
>>>>> that a field named *is_visible* or just *visible* was discussed.  I
>>>>> would argue two things.  1) the is_ prefix is redundant to the fact it's a
>>>>> boolean field and we have not used this convention anywhere else in the
>>>>> model.  2) Historically, the term *"visible"* has strong ties to user
>>>>> interfaces and is used to mask fields or records from being displayed to
>>>>> users.  I propose we use a more appropriate field name.  Perhaps
>>>>> *"valid"*.  Thoughts?
>>>>>
>>>>> _______________________________________________
>>>>> Pulp-dev mailing list
>>>>> Pulp-dev at redhat.com
>>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>>
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Pulp-dev mailing list
>>>> Pulp-dev at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Pulp-dev mailing list
>>> Pulp-dev at redhat.com
>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>
>>>
>>
>>
>> _______________________________________________
>> Pulp-dev mailing list
>> Pulp-dev at redhat.com
>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>
>>
>
>
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20171218/bcb313a9/attachment.htm>


More information about the Pulp-dev mailing list