[Pulp-dev] 3.0 Validation update

Jeff Ortel jortel at redhat.com
Wed Apr 26 13:55:21 UTC 2017

On 04/25/2017 02:47 PM, Brian Bouterse wrote:
> There are a few use cases and behaviors that I think only an asynchronous update/delete for importers,
> publishers, and repositories will satisfy. Thanks to @asmacdo for covering some of these also.
> * As a plugin writer, I am in control when my code is running. The data layer is not expected to change other
> than what my code is doing or what I've asked platform to do for me. This same reasoning applies to unit
> associations too (as a similar example of this principle). With unit associations for example, I don't think
> we want new units showing up as part of a repo halfway through the publish operation.
> * As a plugin writer, I can modify and save the importer without being concerned that I am overwriting data
> the user just updated. This is a write-write race concern. We could say in documentation "don't do that", but
> they may have good reasons to and I don't see a good reason why they can't call save() during a sync() for
> example. Consider that they are defining fields on these subclassed importers and they want to write data into
> them at sync-time.

This applies to previous use case as well.

Even with update/delete being asynchronous, I don't think there is any guarantee as to what the data will be
when the task actually runs in a multi-user application.  There is a race condition on the task queue.  Other
users' updates and syncs can be queue in any order around what this user is trying to do.  If this is a
requirement, we may need to include a snapshot of the data in the task.

> * A user should be able to submit a [sync], [modify the importer], [sync] and only have the update apply to
> the second sync. Synchronous updates make it difficult for a user to determine the impact an update has when
> applied while sync code is running for example.
> * If users want the ability to apply an importer update apply immediately then can cancel any running syncs
> and their update will apply immediately then, so we're not limiting them in any way.
> * Deleting data while code is expected to use that data is running will likely create negative user
> experiences. This is a rephrasing of the "don't change the data later while plugin code is running" point from
> earlier but with a twist. If the publisher is deleted while the plugin writer's publish() is running, if
> plugin code tries to read or save, the task will fail at some random point with a traceback, which is not a
> good experience. Contrast that against what you would get with an asynchronous design which would allow a
> clean PulpException to be emitted by the publish() task just before calling the publish() code. Having the
> PulpException which checks in exactly one place will be a better user experience with a nice message instead
> of a deep traceback and easier to maintain (in one place).

Seems this can be handled cleanly/gracefully for synchronous delete as well.  The plugin should protect
updates with try/catch and raise a PulpException when the DB exception is raised.  The sync would terminate
immediately and the task would report that the importer had been deleted which seems reasonable.

> Thank you to everyone who is participating in this convo. We don't have agreement, but we do have a good
> dialogue going! What are you thoughts about ^?
> -Brian
> On Fri, Apr 21, 2017 at 2:16 PM, Austin Macdonald <austin at redhat.com <mailto:austin at redhat.com>> wrote:
>     I think we have arrived at a consensus around these points:
>     * Django's ModelForm validations that are used during `full_clean` is an anti-pattern in
>     DjangoRestFramework and should not be used
>     * Validation, by our definition, will be done by the serializers in the API layer
>     * The Task layer should run business logic checks at the time of use
>     We have not arrived at a consensus on whether Updates/Deletes for Importers/Publishers will be synchronous
>     or asynchronous. I've tried to think through each way, and what it would need to work.
>     *Synchronous Update and Delete:
>     *
>     _Validation Heuristic_
>     The serializers validate in the API layer only, and it is desirable to have that validation occur at the
>     time of the write. 
>     *
>     *
>     _Concurrent Writes:_
>     Let's assume synchronous Updates and Deletes. There is a possibility of concurrent writes because
>     sync/publish tasks write to the importer/publisher tables respectively and these tables can also be
>     written by the Update API at any time. If there is a collision the last write wins. 
>     Workaround:
>     Concurrent writes may not be a big deal because they should be not affect each other. In platform, sync
>     and publish only write to the last_sync/last_publish fields, which should never be written by the update
>     API. If the writes are restricted to only the affected fields, then there would be no clobbering. As long
>     as plugin writers' implementations of sync and publish do not write to the same fields allowed in an API
>     update, this concurrency is safe. We would obviously need to be clearly documented for plugin writers.
>     _Dealing With Broken Configurations_
>     For example, while a sync task is WAITING, the user updates an importer in a way that breaks sync. The
>     importer configuration should be read only once at the start of a sync. If this configuration has been
>     deleted or is not syncable, business logic checks catch this and fail immediately. Even if the
>     configuration is changed during the sync task, the sync task will continue with the original
>     configuration. The sync task will update last_sync, but will not write to other fields. I don't see a
>     problem here, as long as this is documented for the plugin writer.
>     _Summary_:
>     I don't see any concurrency problems here. The downside of this design is that sync and publish,
>     (implemented by plugin writers), should not write to the same fields as API updates.
>     *
>     *
>     *Asynchronous Update and Delete*
>     _Validation Heuristic_
>     Assuming async updates, the data comes into the API layer where it is validated by the serializers. The
>     update reserves a task (reservation is based on repository) and waits to be performed. It is usually
>     undesirable to separate validation from use. There could be concern that data validated in the API layer
>     could become unacceptable by the time the Task layer actually performs the write.
>     Practically, this separation may not be an issue. Valid data becoming invalid involves validations that
>     are dependent on state. We can split these validations into two categories, uniqueness validations and
>     multi-field validations. Fortunately, the Django ORM will enforce uniqueness validation, so the database
>     is protected, but we might have to handle write failures. Multi-field validations are probably not
>     validations by our definition, but are business logic, which could be checked in the Task layer.
>     _
>     _
>     _Concurrent write prevention_
>     Leveraging the reserved resource system, concurrent writes could not be a problem because writes could
>     never be concurrent at all. Syncs, Updates, Deletes, Publishes all make reservations based on the
>     repository (this is an assumption based on what we do in Pulp 2), so each would run in the order that they
>     were called. 
>     _User Controlled Order_
>     A user can issue a [sync][update][sync] and be confident that the first sync will use the original
>     importer configuration, and the second sync will use the updated configuration. 
>     _Uncertain State_
>     The configuration of an importer is unpredictable if there is a waiting update task from another user.
>     _______________________________________________
>     Pulp-dev mailing list
>     Pulp-dev at redhat.com <mailto:Pulp-dev at redhat.com>
>     https://www.redhat.com/mailman/listinfo/pulp-dev <https://www.redhat.com/mailman/listinfo/pulp-dev>
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 847 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20170426/961eb473/attachment.sig>

More information about the Pulp-dev mailing list