<div dir="ltr"><div>tldr; REST calls to Update or Delete Importers, Publishers, and Repositories will return *asynchronously*.</div><div> </div>We met last week to discuss the question. The case of a Delete called during a sync demonstrates why this is necessary. If Deletes were synchronous, each plugin would have to handle the possibility that their models have been removed.<div> </div><div>By deciding that Updates and Deletes are asynchronous, we are able to make a general promise to the plugin writers: The repository and its associated importer and publishers will not be changed by platform Pulp except during reserved calls. This means that when plugin code is running it has a sort of write lock on the relevant data.</div></div><div class="gmail_extra"> <div class="gmail_quote">On Wed, Apr 26, 2017 at 9:55 AM, Jeff Ortel <<a href="mailto:jortel@redhat.com" target="_blank">jortel@redhat.com</a>> wrote: <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> On 04/25/2017 02:47 PM, Brian Bouterse wrote: > There are a few use cases and behaviors that I think only an asynchronous update/delete for importers, > publishers, and repositories will satisfy. Thanks to @asmacdo for covering some of these also. > > * As a plugin writer, I am in control when my code is running. The data layer is not expected to change other > than what my code is doing or what I've asked platform to do for me. This same reasoning applies to unit > associations too (as a similar example of this principle). With unit associations for example, I don't think > we want new units showing up as part of a repo halfway through the publish operation. > > * As a plugin writer, I can modify and save the importer without being concerned that I am overwriting data > the user just updated. This is a write-write race concern. We could say in documentation "don't do that", but > they may have good reasons to and I don't see a good reason why they can't call save() during a sync() for > example. Consider that they are defining fields on these subclassed importers and they want to write data into > them at sync-time. This applies to previous use case as well. Even with update/delete being asynchronous, I don't think there is any guarantee as to what the data will be when the task actually runs in a multi-user application. There is a race condition on the task queue. Other users' updates and syncs can be queue in any order around what this user is trying to do. If this is a requirement, we may need to include a snapshot of the data in the task. > > * A user should be able to submit a [sync], [modify the importer], [sync] and only have the update apply to > the second sync. Synchronous updates make it difficult for a user to determine the impact an update has when > applied while sync code is running for example. > > * If users want the ability to apply an importer update apply immediately then can cancel any running syncs > and their update will apply immediately then, so we're not limiting them in any way. > > * Deleting data while code is expected to use that data is running will likely create negative user > experiences. This is a rephrasing of the "don't change the data later while plugin code is running" point from > earlier but with a twist. If the publisher is deleted while the plugin writer's publish() is running, if > plugin code tries to read or save, the task will fail at some random point with a traceback, which is not a > good experience. Contrast that against what you would get with an asynchronous design which would allow a > clean PulpException to be emitted by the publish() task just before calling the publish() code. Having the > PulpException which checks in exactly one place will be a better user experience with a nice message instead > of a deep traceback and easier to maintain (in one place). Seems this can be handled cleanly/gracefully for synchronous delete as well. The plugin should protect updates with try/catch and raise a PulpException when the DB exception is raised. The sync would terminate immediately and the task would report that the importer had been deleted which seems reasonable. > > Thank you to everyone who is participating in this convo. We don't have agreement, but we do have a good > dialogue going! What are you thoughts about ^? > > -Brian > > On Fri, Apr 21, 2017 at 2:16 PM, Austin Macdonald <<a href="mailto:austin@redhat.com">austin@redhat.com</a> <mailto:<a href="mailto:austin@redhat.com">austin@redhat.com</a>>> wrote: > > I think we have arrived at a consensus around these points: > > * Django's ModelForm validations that are used during `full_clean` is an anti-pattern in > DjangoRestFramework and should not be used > * Validation, by our definition, will be done by the serializers in the API layer > * The Task layer should run business logic checks at the time of use > > We have not arrived at a consensus on whether Updates/Deletes for Importers/Publishers will be synchronous > or asynchronous. I've tried to think through each way, and what it would need to work. > > *Synchronous Update and Delete: > * > _Validation Heuristic_ > The serializers validate in the API layer only, and it is desirable to have that validation occur at the > time of the write. > * > * > _Concurrent Writes:_ > Let's assume synchronous Updates and Deletes. There is a possibility of concurrent writes because > sync/publish tasks write to the importer/publisher tables respectively and these tables can also be > written by the Update API at any time. If there is a collision the last write wins. > > Workaround: > Concurrent writes may not be a big deal because they should be not affect each other. In platform, sync > and publish only write to the last_sync/last_publish fields, which should never be written by the update > API. If the writes are restricted to only the affected fields, then there would be no clobbering. As long > as plugin writers' implementations of sync and publish do not write to the same fields allowed in an API > update, this concurrency is safe. We would obviously need to be clearly documented for plugin writers. > > _Dealing With Broken Configurations_ > For example, while a sync task is WAITING, the user updates an importer in a way that breaks sync. The > importer configuration should be read only once at the start of a sync. If this configuration has been > deleted or is not syncable, business logic checks catch this and fail immediately. Even if the > configuration is changed during the sync task, the sync task will continue with the original > configuration. The sync task will update last_sync, but will not write to other fields. I don't see a > problem here, as long as this is documented for the plugin writer. > > _Summary_: > I don't see any concurrency problems here. The downside of this design is that sync and publish, > (implemented by plugin writers), should not write to the same fields as API updates. > > * > * > *Asynchronous Update and Delete* > > _Validation Heuristic_ > Assuming async updates, the data comes into the API layer where it is validated by the serializers. The > update reserves a task (reservation is based on repository) and waits to be performed. It is usually > undesirable to separate validation from use. There could be concern that data validated in the API layer > could become unacceptable by the time the Task layer actually performs the write. > > Practically, this separation may not be an issue. Valid data becoming invalid involves validations that > are dependent on state. We can split these validations into two categories, uniqueness validations and > multi-field validations. Fortunately, the Django ORM will enforce uniqueness validation, so the database > is protected, but we might have to handle write failures. Multi-field validations are probably not > validations by our definition, but are business logic, which could be checked in the Task layer. > _ > _ > _Concurrent write prevention_ > Leveraging the reserved resource system, concurrent writes could not be a problem because writes could > never be concurrent at all. Syncs, Updates, Deletes, Publishes all make reservations based on the > repository (this is an assumption based on what we do in Pulp 2), so each would run in the order that they > were called. > > _User Controlled Order_ > A user can issue a [sync][update][sync] and be confident that the first sync will use the original > importer configuration, and the second sync will use the updated configuration. > > _Uncertain State_ > The configuration of an importer is unpredictable if there is a waiting update task from another user. > > > > > > > _______________________________________________ > Pulp-dev mailing list > <a href="mailto:Pulp-dev@redhat.com">Pulp-dev@redhat.com</a> <mailto:<a href="mailto:Pulp-dev@redhat.com">Pulp-dev@redhat.com</a>> > <a href="https://www.redhat.com/mailman/listinfo/pulp-dev" rel="noreferrer" target="_blank">https://www.redhat.com/mailman/listinfo/pulp-dev</a> <<a href="https://www.redhat.com/mailman/listinfo/pulp-dev" rel="noreferrer" target="_blank">https://www.redhat.com/mailman/listinfo/pulp-dev</a>> > > > > <div class="HOEnZb"><div class="h5">> _______________________________________________ > Pulp-dev mailing list > <a href="mailto:Pulp-dev@redhat.com">Pulp-dev@redhat.com</a> > <a href="https://www.redhat.com/mailman/listinfo/pulp-dev" rel="noreferrer" target="_blank">https://www.redhat.com/mailman/listinfo/pulp-dev</a> > </div></div> _______________________________________________ Pulp-dev mailing list <a href="mailto:Pulp-dev@redhat.com">Pulp-dev@redhat.com</a> <a href="https://www.redhat.com/mailman/listinfo/pulp-dev" rel="noreferrer" target="_blank">https://www.redhat.com/mailman/listinfo/pulp-dev</a> </blockquote></div> </div>