[Pulp-dev] 3.0 Validation update

Jeff Ortel jortel at redhat.com
Mon Apr 17 19:09:31 UTC 2017



On 04/14/2017 10:23 AM, Brian Bouterse wrote:
> Did we ever decide on if delete and modify operations on an importer would be synchronous API operations or
> tasking system tasks?
> 
> From what I'm reading, there isn't an MVP need (and maybe never will be) to verify the feed is set at the sync API call
> and in the task. We definitely need to do that business logic, i.e. verify the feed != None, in the task for
> the MVP. If we don't need to do it in two places now, then I think doing it procedurally in the sync task here
> [0] would be good enough. That means we would not add it to the model as a method. I think this business logic
> in the sync task would be:
> 
> ```
> if importer.feed is None:
>     raise ValueError("The 'feed' attribute on the importer cannot be None when syncing")  # <--- this should
> probably be a descendant of PulpException since it's a predictable/common error
> ```

+1
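Something like this minimal sketch, for the record (the exception name and the PulpException stand-in here are assumed for illustration, not actual Pulp code):

```python
# Sketch only: MissingFeed and the PulpException stand-in are assumed
# names, not taken from the actual Pulp codebase.

class PulpException(Exception):
    """Stand-in for Pulp's common base exception class."""


class MissingFeed(PulpException):
    """A sync was requested for an importer that has no feed configured."""


def validate_syncable(importer):
    """The procedural check in the sync task that Brian describes."""
    if importer.feed is None:
        raise MissingFeed(
            "The 'feed' attribute on the importer cannot be None when syncing")
```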

> 
> [0]: https://github.com/pulp/pulp/pull/2991/files#diff-b9ddc3416cf3909687419960d59316bbR30
> 
> -Brian
> 
> 
> 
> On Wed, Apr 12, 2017 at 10:11 AM, Jeff Ortel <jortel at redhat.com> wrote:
> 
> 
> 
>     On 04/11/2017 04:08 PM, Michael Hrivnak wrote:
>     > I like thinking about this as business logic. Data may be valid, but it may not be usable in a particular context.
>     >
>     > To help figure out where such logic should live, it may help to think about where the check is most important.
>     > I've described that time as "at the time of use" earlier in this discussion (maybe just on IRC). With sync as
>     > an example, a workflow will load an importer's config from the database, check it for problems, and then
>     > immediately use the values it just inspected. This is the critical moment where it must gracefully handle
>     > unusable data. This check ensures correct behavior and avoids an unhandled exception or crash.
> 
>     +1
> 
>     >
>     > We can and should also check for problems at earlier opportunities, such as at the time a user tries to queue
>     > a sync. This improves the user experience, but it is not required for correct and safe operation.
> 
>     The data can change between the time when the task is queued and when it is executed.  This adds complexity
>     and the value added is not deterministic.  -1 for a check here.
> 
>     >
>     > Given that, I think it makes sense to put the check close to the data. A method on the model seems reasonable.
>     > In terms of polluting the model with business logic, it isn't that different from defining a custom query set
>     > on the model, which django encourages.
> 
>     Agreed, but only for data integrity. Each field should be validated.  And, the record (cross field) should be
>     validated.  We don't want to store things in the DB that are invalid.  Seems like the django ORM already
>     provides for this, right?  But, the model should not have methods to validate itself for a particular
>     operation (usage).  For example, ensure_syncable() that validates the feed URL would not be appropriate.  This
>     should be checked at run-time by the object doing the sync.  Perhaps a double dispatch model in
>     plugin.Importer could do:
> 
>     def synchronize(self):
>         # do validation and preparation.
>         return self._synchronize()
> 
>     def _synchronize(self):
>         # overridden by plugins.
>         raise NotImplementedError()
> 
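Spelled out as a runnable sketch of that double-dispatch idea (class names illustrative, not the actual plugin API):

```python
# Runnable sketch of the double-dispatch idea above; names are
# illustrative and not the actual Pulp plugin API.

class Importer:
    def __init__(self, feed=None):
        self.feed = feed

    def synchronize(self):
        # Platform-side validation and preparation, so the check runs
        # at the time of use regardless of which plugin is syncing.
        if self.feed is None:
            raise ValueError("cannot sync an importer with no feed")
        return self._synchronize()

    def _synchronize(self):
        # Overridden by plugins with type-specific sync logic.
        raise NotImplementedError()


class ExampleImporter(Importer):
    def _synchronize(self):
        return "synced from {}".format(self.feed)
```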
> 
>     >
>     > As a slight tangent, some applications take this sort of checking even further. An admirable approach in REST
>     > API design, which may not be a good idea for us at this time but is interesting to note, is to make a behavior
>     > such as "sync" only available via a link accessed via a known name in an object's representation. That's a
>     > mouthful, so here's an example:
>     >
>     > {
>     >   "id": "foo",
>     >   "feed": "http://cdn.redhat.com/stuff/",
>     >   "_links":
>     >   {
>     >     "self": "http://here.com/importers/foo",
>     >     "sync": "http://here.com/importers/foo/sync"
>     >   }
>     > }
>     >
>     > Consider that the link for starting a sync is not part of the published API, except that it must be obtained
>     > from this representation. There are two advantages here.
>     >
>     > The main advantage I'm pointing out is that when the server creates this representation of an Importer, it
>     > would only include the "sync" link if the current state of the object would allow for a sync. If there were no
>     > feed, there would be no sync link, and thus the client would be unable to even try starting one. So this is a
>     > third opportunity to check whether the object's state is suitable for a sync. It even allows the client to
>     > show or hide a "sync" button without having to re-implement the business logic that's already present on the
>     > server side. Neat, huh?
> 
>     Interesting.
> 
>     However, how long the 'sync' link remains valid is not deterministic.  1 ms after the link is
>     included in the returned document, a DB commit could set the feed to NULL.  The link could be invalid by
>     the time the user even receives the REST reply.
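That said, building the conditional link itself is straightforward; something like this sketch (URLs and field names illustrative, plain dicts standing in for the serializer):

```python
# Sketch of the representation Michael describes: the "sync" link is
# included only when the importer's current state allows a sync.
# URLs and field names are illustrative.

def importer_representation(importer, base_url="http://here.com/importers"):
    self_url = "{}/{}".format(base_url, importer["id"])
    links = {"self": self_url}
    if importer.get("feed"):  # syncable only when a feed is configured
        links["sync"] = self_url + "/sync"
    return {"id": importer["id"], "feed": importer.get("feed"), "_links": links}
```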
> 
>     >
>     > Another advantage to this kind of approach is a smaller API surface area. We could theoretically change the
>     > sync URL schema at any time. We could even move it to a new, separate service. We'd still need to document how
>     > to use it, but its actual location can change. In practice I don't think this aspect is all that valuable
>     > unless you are 100% bought in to this design. But it's fun to think about.
>     >
>     > And to re-state, this kind of thing may not be worth our time to actually do right now, and I'm not proposing
>     > it. I don't know to what extent DRF would make this easy. But I wanted to bring it up for interest's sake as
>     > yet another place in the workflow, even closer to the end user than the other two we've discussed, where
>     > applications have an opportunity to utilize context checking of data.
>     >
>     > On Tue, Apr 11, 2017 at 3:49 PM, Austin Macdonald <amacdona at redhat.com> wrote:
>     >
>     >     Where should business logic live? As an example, I want to consider the sync task [0] and the need to
>     >     ensure that an importer is syncable. For now, let's say that an importer is syncable if it has a feed_url.
>     >
>     >     Since the call to sync an importer comes from the API, the earliest we can check that the configuration is
>     >     syncable is in a not-yet-written SyncView. Checking syncability here is ideal because it allows us to
>     >     immediately return a meaningful http response instead of happily returning information about a Task that
>     >     we already know is doomed to fail.
>     >
>     >     Performing the check in the API layer is not enough. We have discussed edge cases that lead to an
>     >     importer's feed_url being removed while the sync task is in the WAITING state. To make an assertion to the
>     >     plugin that the feed_url exists, we have to check syncability again when the Task moves into an active state.
>     >
>     >     My thinking is that we should put this business logic on the model.
>     >
>     >     Admittedly, it is not a clean fit with the separation of concerns philosophy but we have already violated
>     >     the concept by putting the sync method on the model. If sync is on the model, it seems like
>     >     ensure_syncable should be too.
>     >
>     >     If we write the platform API layer to use this kind of business logic, then the plugins can add double
>     >     checking business logic without modifying the API and Task layers.
>     >
>     >
>     >     [0]: https://pulp.plan.io/issues/2399
>     >
>     >     On Fri, Apr 7, 2017 at 2:14 PM, Sean Myers <sean.myers at redhat.com> wrote:
>     >
>     >         On 04/07/2017 12:08 PM, Brian Bouterse wrote:
>     >         > == questions ==
>     >         > * Where should ^ terms be documented?
>     >
>     >         I'm not really sure, but recommend the wiki as a good starting point for
>     >         putting information that we should probably "officially" document *somewhere*,
>     >         but at the moment we aren't quite sure where.
>     >
>     >         https://pulp.plan.io/projects/pulp/wiki/Pulp_3_Developer_Notes
>     >
>     >         > * Take the case of a sync which has overrides provided? This isn't in the
>     >         > MVP, but in the future it could be. In that case, does the serializer
>     >         > associated with the importer validate the database data with the overrides
>     >         > "added" on top of it?
>     >
>     >         My question here is "validate against what?". It makes sense to validate
>     >         against database data, but as long as the overrides aren't themselves stored
>     >         in the database, what does this really stop?
>     >
>     >         For example, what prevents two simultaneous syncs of repos using overrides
>     >         that would trigger a constraint violation if they were saved, but don't do
>     >         this because we don't save the overrides?
>     >
>     >         > * For plugin writers writing a serializer for a subclassed Importer, do
>     >         > they also need to express validations for fields defined on the base
>     >         > Importer?
>     >
>     >         It depends on the validation. If it's just validating an individual field,
>     >         no. If it's validation of a combination of values in multiple fields, and
>     >         one of those fields in this case was defined in a subclass, the subclass
>     >         will need to add the proper validation support.
>     >
>     >         > * The database still rejects data that doesn't adhere to the data layer
>     >         > definition right? That occurs even without the DRF serializer correct?
>     >
>     >         Again, this depends. For example, attempting to store an int in a charfield
>     >         will work, because Django will coerce that int to string on save. Attempting
>     >         to store a string in an IntegerField will fail, though, because Django is
>     >         not able to coerce str to int prior to saving. Generally, though, your
>     >         understanding is correct. Anything that the database can't handle will be
>     >         rejected.
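As a plain-Python analogy for the coercion Sean describes (simplified; Django's real logic lives in each field class's get_prep_value()):

```python
# Simplified analogy of Django field coercion on save; the real logic
# lives in each field class's get_prep_value().

def charfield_prep(value):
    return str(value)   # an int is happily stringified

def integerfield_prep(value):
    return int(value)   # a non-numeric string raises ValueError
```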
>     >
>     >         > * In cases where data is created in the backend, do we need to validate
>     >         > that as a general practice? If we do, do we call the DRF serializer
>     >         > regularly in the backend or just let the database reject "bad" data at the
>     >         > db level?
>     >
>     >         As a general practice, I don't think so. Specifically, though, when we're
>     >         passing data around, like when a bit of platform code is taking incoming
>     >         plugin data and passing it into some standard workflow that platform provides
>     >         (like running sync on an importer, say) I think it's going to be a good idea
>     >         and an all-around gentlemanly thing to do to validate that data in some way
>     >         that is appropriate to the process/workflow being invoked.
>     >
>     >         I'm concerned about finding the balance between making things user-friendly
>     >         for plugin writers and having our checking code that provides that user-
>     >         friendly-ness itself be difficult to maintain and end up being pulp-developer-
>     >         unfriendly.
>     >
>     >
>     >         _______________________________________________
>     >         Pulp-dev mailing list
>     >         Pulp-dev at redhat.com
>     >         https://www.redhat.com/mailman/listinfo/pulp-dev
>     >
>     >
>     >
>     >
>     >
>     >
>     >
>     > --
>     >
>     > Michael Hrivnak
>     >
>     > Principal Software Engineer, RHCE
>     >
>     > Red Hat
>     >
>     >
>     >
>     >
> 
> 
> 
> 
