<div dir="ltr"><div><div>There are a few use cases and behaviors that I think only an
asynchronous update/delete for importers, publishers, and repositories
will satisfy. Thanks to @asmacdo for covering some of these also.<br></div></div><div><br></div><div>*
As a plugin writer, I am in control when my code is running. The data
layer is not expected to change other than what my code is doing or what
I've asked platform to do for me. This same reasoning applies to unit
associations too (as a similar example of this principle). With unit
associations for example, I don't think we want new units showing up as
part of a repo halfway through the publish operation.<br><br></div><div>*
As a plugin writer, I can modify and save the importer without being
concerned that I am overwriting data the user just updated. This is a
write-write race concern. We could say in documentation "don't do that",
but they may have good reasons to and I don't see a good reason why
they can't call save() during a sync() for example. Consider that they
are defining fields on these subclassed importers and they want to write
data into them at sync-time.<br><br>* A user should be able to submit a [sync], [modify the importer],
[sync] and only have the update apply to the second sync. Synchronous
updates make it difficult for a user to determine the impact an update
has when applied while sync code is running for example.<br><br>*
If users want the ability to apply an importer update apply immediately
then can cancel any running syncs and their update will apply
immediately then, so we're not limiting them in any way.</div><div><br></div><div>*
Deleting data while code is expected to use that data is running will
likely create negative user experiences. This is a rephrasing of the
"don't change the data later while plugin code is running" point from
earlier but with a twist. If the publisher is deleted while the plugin
writer's publish() is running, if plugin code tries to read or save, the
task will fail at some random point with a traceback, which is not a
good experience. Contrast that against what you would get with an
asynchronous design which would allow a clean PulpException to be
emitted by the publish() task just before calling the publish() code.
Having the PulpException which checks in exactly one place will be a
better user experience with a nice message instead of a deep traceback
and easier to maintain (in one place).<br><br></div><div>Thank you to
everyone who is participating in this convo. We don't have agreement,
but we do have a good dialogue going! What are you thoughts about ^?<br><br></div>-Brian</div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Apr 21, 2017 at 2:16 PM, Austin Macdonald <span dir="ltr"><<a href="mailto:austin@redhat.com" target="_blank">austin@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>I think we have arrived at a consensus around these points:</div><div><br></div><div>* Django's ModelForm validations that are used during `full_clean` is an anti-pattern in DjangoRestFramework and should not be used<br></div><div><div>* Validation, by our definition, will be done by the serializers in the API layer<br></div></div><div>* The Task layer should run business logic checks at the time of use</div><div><br></div><div>We have not arrived at a consensus on whether Updates/Deletes for Importers/Publishers will be synchronous or asynchronous. I've tried to think through each way, and what it would need to work.</div><div><br></div><div><b>Synchronous Update and Delete:<br></b></div><div><u>Validation Heuristic</u></div><div>The serializers validate in the API layer only, and it is desirable to have that validation occur at the time of the write. </div><div><b><br></b></div><div><u>Concurrent Writes:</u></div><div>Let's assume synchronous Updates and Deletes. There is a possibility of concurrent writes because sync/publish tasks write to the importer/publisher tables respectively and these tables can also be written by the Update API at any time. If there is a collision the last write wins. </div><div><br></div><div>Workaround:</div><div>Concurrent writes may not be a big deal because they should be not affect each other. In platform, sync and publish only write to the last_sync/last_publish fields, which should never be written by the update API. If the writes are restricted to only the affected fields, then there would be no clobbering. As long as plugin writers' implementations of sync and publish do not write to the same fields allowed in an API update, this concurrency is safe. We would obviously need to be clearly documented for plugin writers.</div><div><br></div><div><u>Dealing With Broken Configurations</u></div><div>For example, while a sync task is WAITING, the user updates an importer in a way that breaks sync. The importer configuration should be read only once at the start of a sync. If this configuration has been deleted or is not syncable, business logic checks catch this and fail immediately. Even if the configuration is changed during the sync task, the sync task will continue with the original configuration. The sync task will update last_sync, but will not write to other fields. I don't see a problem here, as long as this is documented for the plugin writer.</div><div><br></div><div><u>Summary</u>:</div><div>I don't see any concurrency problems here. The downside of this design is that sync and publish, (implemented by plugin writers), should not write to the same fields as API updates.</div><div><br></div><div><b><br></b></div><div><b>Asynchronous Update and Delete</b></div><div><br></div><div><u>Validation Heuristic</u></div><div>Assuming async updates, the data comes into the API layer where it is validated by the serializers. The update reserves a task (reservation is based on repository) and waits to be performed. It is usually undesirable to separate validation from use. There could be concern that data validated in the API layer could become unacceptable by the time the Task layer actually performs the write.</div><div><br></div><div>Practically, this separation may not be an issue. Valid data becoming invalid involves validations that are dependent on state. We can split these validations into two categories, uniqueness validations and multi-field validations. Fortunately, the Django ORM will enforce uniqueness validation, so the database is protected, but we might have to handle write failures. Multi-field validations are probably not validations by our definition, but are business logic, which could be checked in the Task layer.</div><div><u><br></u></div><div><u>Concurrent write prevention</u></div><div>Leveraging the reserved resource system, concurrent writes could not be a problem because writes could never be concurrent at all. Syncs, Updates, Deletes, Publishes all make reservations based on the repository (this is an assumption based on what we do in Pulp 2), so each would run in the order that they were called. </div><div><br></div><div><u>User Controlled Order</u></div><div>A user can issue a [sync][update][sync] and be confident that the first sync will use the original importer configuration, and the second sync will use the updated configuration. </div><div><br></div><div><u>Uncertain State</u></div><div>The configuration of an importer is unpredictable if there is a waiting update task from another user.</div><div><br></div><div><br></div><div><br></div><div><br></div><div><br></div></div>
<br>______________________________<wbr>_________________<br>
Pulp-dev mailing list<br>
<a href="mailto:Pulp-dev@redhat.com">Pulp-dev@redhat.com</a><br>
<a href="https://www.redhat.com/mailman/listinfo/pulp-dev" rel="noreferrer" target="_blank">https://www.redhat.com/<wbr>mailman/listinfo/pulp-dev</a><br>
<br></blockquote></div><br></div>