[Pulp-dev] 3.0 Validation update
mhrivnak at redhat.com
Tue Apr 11 21:08:12 UTC 2017
I like thinking about this as business logic. Data may be valid, but it may
not be usable in a particular context.
To help figure out where such logic should live, it may help to think about
where the check is most important. I've described that time as "at the time
of use" earlier in this discussion (maybe just on IRC). With sync as an
example, a workflow will load an importer's config from the database, check
it for problems, and then immediately use the values it just inspected.
This is the critical moment where it must gracefully handle unusable data.
This check ensures correct behavior and avoids an unhandled exception or
We can and should also check for problems at earlier opportunities, such as
at the time a user tries to queue a sync. This improves the user
experience, but it is not required for correct and safe operation.
Given that, I think it makes sense to put the check close to the data. A
method on the model seems reasonable. In terms of polluting the model with
business logic, it isn't that different from defining a custom query set on
the model, which Django encourages.
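
A minimal sketch of that idea, using a plain Python class as a stand-in for the model so it runs without Django. The names Importer, NotSyncable, ensure_syncable, queue_sync and run_sync are assumptions made for illustration, not existing Pulp code. The same method is called at the early opportunity (queue time) and again at the time of use.

```python
class NotSyncable(Exception):
    """Raised when an importer's current state does not allow a sync."""


class Importer:
    """Plain-Python stand-in for an importer model (hypothetical, not Pulp's)."""

    def __init__(self, name, feed_url=None):
        self.name = name
        self.feed_url = feed_url

    def ensure_syncable(self):
        """Business-logic check kept next to the data it inspects."""
        if not self.feed_url:
            raise NotSyncable(f"importer {self.name!r} has no feed_url")


def queue_sync(importer, queue):
    """Early check: refuse the request before a doomed task is queued."""
    importer.ensure_syncable()
    queue.append(importer)


def run_sync(importer):
    """Check at the time of use, immediately before the values are read."""
    importer.ensure_syncable()
    return f"syncing {importer.name} from {importer.feed_url}"
```

If the feed_url is removed between queue_sync and run_sync, the second call to ensure_syncable raises NotSyncable, which the task code can handle gracefully instead of hitting an unhandled exception.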
As a slight tangent, some applications take this sort of checking even
further. An admirable approach in REST API design, which may not be a good
idea for us at this time but is interesting to note, is to make a behavior
such as "sync" only available via a link accessed via a known name in an
object's representation. That's a mouthful, so here's an example:
Consider that the link for starting a sync is not part of the published
API, except that it must be obtained from this representation. There are
two advantages here.
The main advantage I'm pointing out is that when the server creates this
representation of an Importer, it would only include the "sync" link if the
current state of the object would allow for a sync. If there were no feed,
there would be no sync link, and thus the client would be unable to even
try starting one. So this is a third opportunity to check whether the
object's state is suitable for a sync. It even allows the client to show or
hide a "sync" button without having to re-implement the business logic
that's already present on the server side. Neat, huh?
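
A rough sketch of how a server could build such a representation, written as plain Python rather than DRF code. The "_links" key, the URL layout, and the represent_importer function are all hypothetical and only illustrate the conditional link.

```python
def represent_importer(importer_id, feed_url=None):
    """Return a dict representation of an importer.

    The "sync" link is included only when the current state allows a sync.
    Key names and URL paths are made up for this illustration.
    """
    links = {"self": f"/importers/{importer_id}/"}
    if feed_url:
        # Same business rule the server would apply before syncing.
        links["sync"] = f"/importers/{importer_id}/sync/"
    return {"id": importer_id, "feed_url": feed_url, "_links": links}


def can_sync(representation):
    """Client-side check: show a "sync" button only if the link is present."""
    return "sync" in representation["_links"]
```

A client would call can_sync on whatever representation it received and follow the link by name, without knowing why the server did or did not include it.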
Another advantage to this kind of approach is a smaller API surface area.
We could theoretically change the sync URL schema at any time. We could
even move it to a new, separate service. We'd still need to document how to
use it, but its actual location can change. In practice I don't think this
aspect is all that valuable unless you are 100% bought in to this design.
But it's fun to think about.
And to re-state, this kind of thing may not be worth our time to actually
do right now, and I'm not proposing it. I don't know to what extent DRF
would make this easy. But I wanted to bring it up for interest's sake as
yet another place in the workflow, even closer to the end user than the
other two we've discussed, where applications have an opportunity to
utilize context checking of data.
On Tue, Apr 11, 2017 at 3:49 PM, Austin Macdonald <amacdona at redhat.com> wrote:
> Where should business logic live? As an example, I want to consider the
> sync task and the need to ensure that an importer is syncable. For now,
> let's say that an importer is syncable if it has a feed_url.
> Since the call to sync an importer comes from the API, the earliest we can
> check that the configuration is syncable is in a not-yet-written SyncView.
> Checking syncability here is ideal because it allows us to immediately
> return a meaningful http response instead of happily returning information
> about a Task that we already know is doomed to fail.
> Performing the check in the API layer is not enough. We have discussed
> edge cases that lead to an importer's feed_url being removed while the sync
> task is in the WAITING state. To make an assertion to the plugin that the
> feed_url exists, we have to check syncability again when the Task moves
> into an active state.
> My thinking is that we should put this business logic on the model.
> Admittedly, it is not a clean fit with the separation of concerns
> philosophy but we have already violated the concept by putting the sync
> method on the model. If sync is on the model, it seems like ensure_syncable
> should be too.
> If we write the platform API layer to use this kind of business logic,
> then the plugins can add double checking business logic without modifying
> the API and Task layers.
> : https://pulp.plan.io/issues/2399
> On Fri, Apr 7, 2017 at 2:14 PM, Sean Myers <sean.myers at redhat.com> wrote:
>> On 04/07/2017 12:08 PM, Brian Bouterse wrote:
>> > == questions ==
>> > * Where should ^ terms be documented?
>> I'm not really sure, but recommend the wiki as a good starting point for
>> putting information that we should probably "officially" document
>> but at the moment we aren't quite sure where.
>> > * Take the case of a sync which has overrides provided? This isn't in
>> > MVP, but in the future it could be. In that case, does the serializer
>> > associated with the importer validate the database data with the
>> > "added" on top of it?
>> My question here is "validate against what?". It makes sense to validate
>> against database data, but as long as the overrides aren't themselves
>> in the database, what does this really stop?
>> For example, what prevents two simultaneous syncs of repos using overrides
>> that would trigger a constraint violation if they were saved, but don't do
>> this because we don't save the overrides?
>> > * For plugin writers writing a serializer for a subclassed Importer, do
>> > they also need to express validations for fields defined on the base
>> > Importer?
>> It depends on the validation. If it's just validating an individual field,
>> no. If it's validation of a combination of values in multiple fields, and
>> one of those fields in this case was defined in a subclass, the subclass
>> will need to add the proper validation support.
>> > * The database still rejects data that doesn't adhere to the data layer
>> > definition right? That occurs even without the DRF serializer correct?
>> Again, this depends. For example, attempting to store an int in a
>> will work, because Django will coerce that int to string on save.
>> Attempting to store a string in an IntegerField will fail, though, because Django is
>> not able to coerce str to int prior to saving. Generally, though, your
>> understanding is correct. Anything that the database can't handle will be
>> rejected.
>> > * In cases where data is created in the backend, do we need to validate
>> > that as a general practice? If we do, do we call the DRF serializer
>> > regularly in the backend or just let the database reject "bad" data at
>> > db level?
>> As a general practice, I don't think so. Specifically, though, when we're
>> passing data around, like when a bit of platform code is taking incoming
>> plugin data and passing it into some standard workflow that platform
>> (like running sync on an importer, say) I think it's going to be a good
>> idea and an all-around gentlemanly thing to do to validate that data in some
>> way that is appropriate to the process/workflow being invoked.
>> I'm concerned about finding the balance between making things
>> for plugin writers and having our checking code that provides that user-
>> friendly-ness itself be difficult to maintain and end up being
>> Pulp-dev mailing list
>> Pulp-dev at redhat.com
Principal Software Engineer, RHCE