[Pulp-dev] 3.0 Validation update

Tue Apr 25 19:56:00 UTC 2017

I wrote a response inline.

On Mon, Apr 24, 2017 at 3:12 PM, Jeff Ortel <jortel at redhat.com> wrote:

>
>
> On 04/21/2017 01:16 PM, Austin Macdonald wrote:
> > I think we have arrived at a consensus around these points:
>
> +1
>
> >
> > * Django's ModelForm validations that are used during `full_clean` is an
> anti-pattern in DjangoRestFramework
> > and should not be used
> > * Validation, by our definition, will be done by the serializers in the
> API layer
> > * The Task layer should run business logic checks at the time of use
> >
> > We have not arrived at a consensus on whether Updates/Deletes for
> Importers/Publishers will be synchronous or
> > asynchronous. I've tried to think through each way, and what it would
> need to work.
> >
> > *Synchronous Update and Delete:
> > *
> > _Validation Heuristic_
> > The serializers validate in the API layer only, and it is desirable to
> have that validation occur at the time
> > of the write.
> > *
> > *
> > _Concurrent Writes:_
> > Let's assume synchronous Updates and Deletes. There is a possibility of
> concurrent writes because sync/publish
> > tasks write to the importer/publisher tables respectively and these
> tables can also be written by the Update
> > API at any time. If there is a collision the last write wins.
> >
> > Workaround:
> > Concurrent writes may not be a big deal because they should be not
> affect each other. In platform, sync and
> > publish only write to the last_sync/last_publish fields, which should
> never be written by the update API. If
> > the writes are restricted to only the affected fields, then there would
> be no clobbering. As long as plugin
> > writers' implementations of sync and publish do not write to the same
> fields allowed in an API update, this
> > concurrency is safe. We would obviously need to be clearly documented
> for plugin writers.
> >
> > _Dealing With Broken Configurations_
> > For example, while a sync task is WAITING, the user updates an importer
> in a way that breaks sync. The
> > importer configuration should be read only once at the start of a sync.
> If this configuration has been deleted
> > or is not syncable, business logic checks catch this and fail
> immediately. Even if the configuration is
> > changed during the sync task, the sync task will continue with the
> original configuration. The sync task will
> > update last_sync, but will not write to other fields. I don't see a
> problem here, as long as this is
> > documented for the plugin writer.
> >
> > _Summary_:
> > I don't see any concurrency problems here. The downside of this design
> is that sync and publish, (implemented
> > by plugin writers), should not write to the same fields as API updates.
> >
> > *
> > *
> > *Asynchronous Update and Delete*
> >
> > _Validation Heuristic_
> > Assuming async updates, the data comes into the API layer where it is
> validated by the serializers. The update
> > reserves a task (reservation is based on repository) and waits to be
> performed. It is usually undesirable to
> > separate validation from use. There could be concern that data validated
> in the API layer could become
> > unacceptable by the time the Task layer actually performs the write.
> >
> > Practically, this separation may not be an issue. Valid data becoming
> invalid involves validations that are
> > dependent on state. We can split these validations into two categories,
> uniqueness validations and multi-field
> > validations. Fortunately, the Django ORM will enforce uniqueness
> validation, so the database is protected, but
> > we might have to handle write failures. Multi-field validations are
> probably not validations by our
> > definition, but are business logic, which could be checked in the Task
> layer.
> > _
> > _
> > _Concurrent write prevention_
> > Leveraging the reserved resource system, concurrent writes could not be
> a problem because writes could never
> > be concurrent at all. Syncs, Updates, Deletes, Publishes all make
> reservations based on the repository (this
> > is an assumption based on what we do in Pulp 2), so each would run in
> the order that they were called.
> >
> > _User Controlled Order_
> > A user can issue a [sync][update][sync] and be confident that the first
> sync will use the original importer
> > configuration, and the second sync will use the updated configuration.
>
> This is not deterministic (from user point of view) even with task
> reservation.  You still have a race
> condition queuing tasks.  In the following example, the update made by
> user B will win and the update made by
> user A will not be used for the 2nd sync.  By making this async, seems we
> have done is moved the race
> condition from the DB to the tasking queue.
>
> queue
> -------
> 1. user-A [sync]
> 2. user-A [update]
> 3. user-B [update]  (winner)
> 4. user-A [sync]
>

I'm not concerned about ^ because eventually Pulp3 will get authZ features
which will allow administrators to prevent multi-user write-write races.
Two users competing to write shared data is a normal, multi-user system
problem. Also with the User A and User B update example, both synchronous
and asynchronous update approaches would produce the same outcomes, so I
don't think this point motivates either one as being better.

>
> >
> > _Uncertain State_
> > The configuration of an importer is unpredictable if there is a waiting
> update task from another user.
> >
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > Pulp-dev mailing list
> > Pulp-dev at redhat.com
> > https://www.redhat.com/mailman/listinfo/pulp-dev
> >
>
>
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/pulp-dev/attachments/20170425/a121149d/attachment.htm>