[Pulp-dev] 3.0 Validation update

Tue Apr 25 19:47:25 UTC 2017

There are a few use cases and behaviors that I think only an asynchronous
update/delete for importers, publishers, and repositories will satisfy.
Thanks to @asmacdo for covering some of these also.

* As a plugin writer, I am in control when my code is running. The data
layer is not expected to change other than what my code is doing or what
I've asked platform to do for me. This same reasoning applies to unit
associations too (as a similar example of this principle). With unit
associations for example, I don't think we want new units showing up as
part of a repo halfway through the publish operation.

* As a plugin writer, I can modify and save the importer without being
concerned that I am overwriting data the user just updated. This is a
write-write race concern. We could say in documentation "don't do that",
but they may have good reasons to and I don't see a good reason why they
can't call save() during a sync() for example. Consider that they are
defining fields on these subclassed importers and they want to write data
into them at sync-time.

* A user should be able to submit a [sync], [modify the importer], [sync]
and only have the update apply to the second sync. Synchronous updates make
it difficult for a user to determine the impact an update has when applied
while sync code is running for example.

* If users want the ability to apply an importer update apply immediately
then can cancel any running syncs and their update will apply immediately
then, so we're not limiting them in any way.

* Deleting data while code is expected to use that data is running will
likely create negative user experiences. This is a rephrasing of the "don't
change the data later while plugin code is running" point from earlier but
with a twist. If the publisher is deleted while the plugin writer's
publish() is running, if plugin code tries to read or save, the task will
fail at some random point with a traceback, which is not a good experience.
Contrast that against what you would get with an asynchronous design which
would allow a clean PulpException to be emitted by the publish() task just
before calling the publish() code. Having the PulpException which checks in
exactly one place will be a better user experience with a nice message
instead of a deep traceback and easier to maintain (in one place).

Thank you to everyone who is participating in this convo. We don't have
agreement, but we do have a good dialogue going! What are you thoughts
about ^?

-Brian

On Fri, Apr 21, 2017 at 2:16 PM, Austin Macdonald <austin at redhat.com> wrote:

> I think we have arrived at a consensus around these points:
>
> * Django's ModelForm validations that are used during `full_clean` is an
> anti-pattern in DjangoRestFramework and should not be used
> * Validation, by our definition, will be done by the serializers in the
> API layer
> * The Task layer should run business logic checks at the time of use
>
> We have not arrived at a consensus on whether Updates/Deletes for
> Importers/Publishers will be synchronous or asynchronous. I've tried to
> think through each way, and what it would need to work.
>
>
> *Synchronous Update and Delete:*
> *Validation Heuristic*
> The serializers validate in the API layer only, and it is desirable to
> have that validation occur at the time of the write.
>
> *Concurrent Writes:*
> Let's assume synchronous Updates and Deletes. There is a possibility of
> concurrent writes because sync/publish tasks write to the
> importer/publisher tables respectively and these tables can also be written
> by the Update API at any time. If there is a collision the last write wins.
>
> Workaround:
> Concurrent writes may not be a big deal because they should be not affect
> each other. In platform, sync and publish only write to the
> last_sync/last_publish fields, which should never be written by the update
> API. If the writes are restricted to only the affected fields, then there
> would be no clobbering. As long as plugin writers' implementations of sync
> and publish do not write to the same fields allowed in an API update, this
> concurrency is safe. We would obviously need to be clearly documented for
> plugin writers.
>
> *Dealing With Broken Configurations*
> For example, while a sync task is WAITING, the user updates an importer in
> a way that breaks sync. The importer configuration should be read only once
> at the start of a sync. If this configuration has been deleted or is not
> syncable, business logic checks catch this and fail immediately. Even if
> the configuration is changed during the sync task, the sync task will
> continue with the original configuration. The sync task will update
> last_sync, but will not write to other fields. I don't see a problem here,
> as long as this is documented for the plugin writer.
>
> *Summary*:
> I don't see any concurrency problems here. The downside of this design is
> that sync and publish, (implemented by plugin writers), should not write to
> the same fields as API updates.
>
>
> *Asynchronous Update and Delete*
>
> *Validation Heuristic*
> Assuming async updates, the data comes into the API layer where it is
> validated by the serializers. The update reserves a task (reservation is
> based on repository) and waits to be performed. It is usually undesirable
> to separate validation from use. There could be concern that data validated
> in the API layer could become unacceptable by the time the Task layer
> actually performs the write.
>
> Practically, this separation may not be an issue. Valid data becoming
> invalid involves validations that are dependent on state. We can split
> these validations into two categories, uniqueness validations and
> multi-field validations. Fortunately, the Django ORM will enforce
> uniqueness validation, so the database is protected, but we might have to
> handle write failures. Multi-field validations are probably not validations
> by our definition, but are business logic, which could be checked in the
> Task layer.
>
> *Concurrent write prevention*
> Leveraging the reserved resource system, concurrent writes could not be a
> problem because writes could never be concurrent at all. Syncs, Updates,
> Deletes, Publishes all make reservations based on the repository (this is
> an assumption based on what we do in Pulp 2), so each would run in the
> order that they were called.
>
> *User Controlled Order*
> A user can issue a [sync][update][sync] and be confident that the first
> sync will use the original importer configuration, and the second sync will
> use the updated configuration.
>
> *Uncertain State*
> The configuration of an importer is unpredictable if there is a waiting
> update task from another user.
>
>
>
>
>
>
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20170425/3c39af46/attachment.htm>