[Pulp-dev] 3.0 Validation update

Jeff Ortel jortel at redhat.com
Mon Apr 24 19:12:21 UTC 2017



On 04/21/2017 01:16 PM, Austin Macdonald wrote:
> I think we have arrived at a consensus around these points:

+1

> 
> * Django's ModelForm validations that are run during `full_clean` are an anti-pattern in Django REST Framework
> and should not be used
> * Validation, by our definition, will be done by the serializers in the API layer
> * The Task layer should run business logic checks at the time of use
> 
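To make that concrete, here is a minimal sketch of what serializer-layer validation might look like -- the
ImporterSerializer name, its fields, and the multi-field rule are all made up for illustration, not the
actual Pulp 3 code:

from rest_framework import serializers


class ImporterSerializer(serializers.Serializer):
    # Illustrative fields only -- not the real Pulp 3 importer model.
    name = serializers.CharField(max_length=255)
    feed_url = serializers.URLField()
    ssl_validation = serializers.BooleanField(default=True)

    def validate(self, data):
        # Multi-field checks live here in the API layer, not in
        # ModelForm.full_clean() on the model.
        if data['feed_url'].startswith('https://') and not data['ssl_validation']:
            raise serializers.ValidationError(
                "ssl_validation cannot be disabled for an https feed.")
        return data
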
> We have not arrived at a consensus on whether Updates/Deletes for Importers/Publishers will be synchronous or
> asynchronous. I've tried to think through each approach and what it would need in order to work.
> 
> *Synchronous Update and Delete*
> _Validation Heuristic_
> The serializers validate in the API layer only, and it is desirable to have that validation occur at the time
> of the write. 
> 
> _Concurrent Writes:_
> Let's assume synchronous Updates and Deletes. There is a possibility of concurrent writes because sync/publish
> tasks write to the importer/publisher tables respectively and these tables can also be written by the Update
> API at any time. If there is a collision, the last write wins.
> 
> Workaround:
> Concurrent writes may not be a big deal because they should not affect each other. In platform, sync and
> publish only write to the last_sync/last_publish fields, which should never be written by the update API. If
> the writes are restricted to only the affected fields, then there would be no clobbering. As long as plugin
> writers' implementations of sync and publish do not write to the same fields allowed in an API update, this
> concurrency is safe. This would obviously need to be clearly documented for plugin writers.
> 
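For reference, restricting the write like that is one line with Django's update_fields; a small sketch, with
the model and helper names made up:

from django.utils import timezone


def record_sync_completion(importer):
    # Only the last_sync column is written, so a concurrent API update to
    # other fields on the same row is not clobbered.
    importer.last_sync = timezone.now()
    importer.save(update_fields=['last_sync'])
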
> _Dealing With Broken Configurations_
> For example, while a sync task is WAITING, the user updates an importer in a way that breaks sync. The
> importer configuration should be read only once at the start of a sync. If this configuration has been deleted
> or is not syncable, business logic checks catch this and fail immediately. Even if the configuration is
> changed during the sync task, the sync task will continue with the original configuration. The sync task will
> update last_sync, but will not write to other fields. I don't see a problem here, as long as this is
> documented for the plugin writer.
> 
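A sketch of that read-once, fail-fast pattern -- Importer and feed_url are stand-ins for whatever the plugin
actually defines:

def sync(importer_id):
    # Read the configuration exactly once, at the start of the task.
    try:
        importer = Importer.objects.get(pk=importer_id)
    except Importer.DoesNotExist:
        raise RuntimeError("Importer was deleted before the sync started.")

    # Business logic check at time of use: fail immediately if not syncable.
    if not importer.feed_url:
        raise RuntimeError("Importer has no feed configured; cannot sync.")

    # The task keeps working from this in-memory configuration even if the
    # row changes underneath it, and only writes back to last_sync at the end.
    ...
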
> _Summary_:
> I don't see any concurrency problems here. The downside of this design is that sync and publish (implemented
> by plugin writers) should not write to the same fields as API updates.
> 
> *Asynchronous Update and Delete*
> 
> _Validation Heuristic_
> Assuming async updates, the data comes into the API layer where it is validated by the serializers. The update
> dispatches a task (the reservation is based on the repository) and waits to be performed. It is usually undesirable to
> separate validation from use. There could be concern that data validated in the API layer could become
> unacceptable by the time the Task layer actually performs the write.
> 
> Practically, this separation may not be an issue. Valid data becoming invalid involves validations that are
> dependent on state. We can split these validations into two categories: uniqueness validations and multi-field
> validations. Fortunately, uniqueness is enforced by the database's unique constraints, so the database is protected, but
> we might have to handle write failures. Multi-field validations are probably not validations by our
> definition, but are business logic, which could be checked in the Task layer.
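
If we go that route, the task layer mostly just has to turn a uniqueness violation into a sensible task
failure. Roughly (the names are illustrative, not the actual Pulp 3 code):

from django.db import IntegrityError, transaction


def update_importer(importer_id, delta):
    try:
        with transaction.atomic():
            importer = Importer.objects.get(pk=importer_id)
            for field, value in delta.items():
                setattr(importer, field, value)
            importer.save()
    except IntegrityError:
        # The data was valid when the API accepted it but became non-unique
        # by the time the task ran; surface that as a task failure.
        raise RuntimeError("Update failed: the new values violate a uniqueness constraint.")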
> 
> _Concurrent write prevention_
> Leveraging the reserved resource system, concurrent writes would not be a problem because writes would never
> be concurrent at all. Syncs, Updates, Deletes, and Publishes all make reservations based on the repository (this
> is an assumption based on what we do in Pulp 2), so each would run in the order in which it was called.
> 
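Just to illustrate the serialization property (this is a toy, not the Pulp tasking code): if every write
against a repository goes through that repository's own FIFO queue, the writes can never overlap:

from collections import defaultdict, deque

_queues = defaultdict(deque)


def dispatch(repository_id, task_func, *args):
    # The reservation is keyed on the repository, so sync, update, delete,
    # and publish against the same repository all land in the same FIFO.
    _queues[repository_id].append((task_func, args))


def run_next(repository_id):
    # A worker pops and runs one task at a time per reservation, in the
    # order the tasks were dispatched.
    task_func, args = _queues[repository_id].popleft()
    task_func(*args)
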
> _User Controlled Order_
> A user can issue a [sync][update][sync] and be confident that the first sync will use the original importer
> configuration, and the second sync will use the updated configuration. 

This is not deterministic (from the user's point of view) even with task reservation.  You still have a race
condition when queuing tasks.  In the following example, the update made by user B will win and the update made by
user A will not be used for the 2nd sync.  By making this async, it seems all we have done is move the race
condition from the DB to the tasking queue.

queue
-------
1. user-A [sync]
2. user-A [update]
3. user-B [update]  (winner)
4. user-A [sync]


> 
> _Uncertain State_
> The configuration of an importer is unpredictable if there is a waiting update task from another user.
> 
