[Pulp-dev] 3.0 Validation update

Jeff Ortel jortel at redhat.com
Mon Apr 17 19:09:08 UTC 2017



On 04/17/2017 09:50 AM, Michael Hrivnak wrote:
> That MVP scoping sounds reasonable.
> 
> I'm not sure we decided 100% on whether those operations need to be tasks, but everyone seems to be leaning
> in the direction that update does not. But it will partly come down to a plugin API question. If
> updates can happen any time, then the plugin should load its config exactly once and not re-read it during the
> task. The plugin API should encourage that workflow, if not enforce it.

The Importer instance is fetched from the DB and its attributes are the /config/.  So the config would only be
fetched from the DB once for any task.
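
For illustration, a minimal sketch of that flow (the function name and import path are assumptions, not settled API):

    from pulp.app.models import Importer  # assumed import path, for illustration only

    def sync(importer_id):
        # single DB fetch for the whole task; the instance's attributes are the config
        importer = Importer.objects.get(pk=importer_id)
        if importer.feed is None:
            raise ValueError("The 'feed' attribute on the importer cannot be None when syncing")
        # the plugin works against this in-memory instance; the config is not re-read during the task
        importer.synchronize()  # hand off to the plugin's sync code (see the double-dispatch sketch below)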

> 
> Delete could be a little more challenging, because an importer or publisher may want to update attributes on
> itself, like the last time a sync succeeded. Maybe we can manage to avoid the need
> for a plugin to write to its config, which also seems like a very reasonable goal, but that again could be
> part of the plugin API discussion. Although thinking about relationships that are likely to exist in the
> database, having an importer or publisher disappear while a corresponding task is running sounds like it could
> cause a lot of potential problems.

What kind of problems?

> 
> On Fri, Apr 14, 2017 at 11:23 AM, Brian Bouterse <bbouters at redhat.com> wrote:
> 
>     Did we ever decide on if delete and modify operations on an importer would be synchronous API operations
>     or tasking system tasks?
> 
>     From what I'm reading, there isn't an MVP need (and maybe never will be) to verify the feed is set both at
>     the sync API call and in the task. We definitely need to do that business logic, i.e. verify the feed != None, in the
>     task for the MVP. If we don't need to do it in two places now, then I think doing it procedurally in the
>     sync task here [0] would be good enough. That means we would not add it to the model as a method. I think
>     this business logic in the sync task would be:
> 
>     ```
>     if importer.feed is None:
>         # should probably raise a descendant of PulpException since it's a predictable/common error
>         raise ValueError("The 'feed' attribute on the importer cannot be None when syncing")
>     ```
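> 
>     A descendant might look like this (the class name and import path are only assumptions, for illustration):
> 
>     ```
>     from pulp.exceptions import PulpException  # assumed import path
> 
>     class MissingFeed(PulpException):
>         """Raised when a sync is requested for an importer that has no feed configured."""
> 
>     # and in the sync task:
>     #     if importer.feed is None:
>     #         raise MissingFeed("The 'feed' attribute on the importer cannot be None when syncing")
>     ```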
> 
>     [0]: https://github.com/pulp/pulp/pull/2991/files#diff-b9ddc3416cf3909687419960d59316bbR30
> 
>     -Brian
> 
> 
> 
>     On Wed, Apr 12, 2017 at 10:11 AM, Jeff Ortel <jortel at redhat.com> wrote:
> 
> 
> 
>         On 04/11/2017 04:08 PM, Michael Hrivnak wrote:
>         > I like thinking about this as business logic. Data may be valid, but it may not be usable in a particular context.
>         >
>         > To help figure out where such logic should live, it may help to think about where the check is most important.
>         > I've described that time as "at the time of use" earlier in this discussion (maybe just on IRC). With sync as
>         > an example, a workflow will load an importer's config from the database, check it for problems, and then
>         > immediately use the values it just inspected. This is the critical moment where it must gracefully handle
>         > unusable data. This check ensures correct behavior and avoids an unhandled exception or crash.
> 
>         +1
> 
>         >
>         > We can and should also check for problems at earlier opportunities, such as at the time a user tries to queue
>         > a sync. This improves the user experience, but it is not required for correct and safe operation.
> 
>         The data can change between the time when the task is queued and when it is executed.  This adds
>         complexity, and the value it adds is not deterministic.  -1 for a check here.
> 
>         >
>         > Given that, I think it makes sense to put the check close to the data. A method on the model seems reasonable.
>         > In terms of polluting the model with business logic, it isn't that different from defining a custom query set
>         > on the model, which django encourages.
> 
>         Agreed, but only for data integrity. Each field should be validated.  And, the record (cross-field)
>         should be validated.  We don't want to store things in the DB that are invalid.  Seems like the Django
>         ORM already provides for this, right?  But, the model should not have methods to validate itself for a
>         particular operation (usage).  For example, an ensure_syncable() method that validates the feed URL
>         would not be appropriate.  This should be checked at run-time by the object doing the sync.  Perhaps a
>         double dispatch model in plugin.Importer could do:
> 
>         def synchronize(self):
>             # do validation and preparation.
>             return self._synchronize()
> 
>         def _synchronize(self):
>             # overridden by plugins.
>             raise NotImplementedError()
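> 
>         A plugin subclass would then override only the private hook, e.g. (the plugin class name is
>         hypothetical):
> 
>         class FileImporter(Importer):
> 
>             def _synchronize(self):
>                 # plugin-specific sync using self.feed and the other attributes (the config)
>                 ...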
> 
> 
>         >
>         > As a slight tangent, some applications take this sort of checking even further. An admirable approach in REST
>         > API design, which may not be a good idea for us at this time but is interesting to note, is to make a behavior
>         > such as "sync" only available via a link accessed via a known name in an object's representation. That's a
>         > mouthful, so here's an example:
>         >
>         > {
>         >   "id": "foo",
>         >   "feed": "http://cdn.redhat.com/stuff/",
>         >   "_links":
>         >   {
>         >     "self": "http://here.com/importers/foo",
>         >     "sync": "http://here.com/importers/foo/sync"
>         >   }
>         > }
>         >
>         > Consider that the link for starting a sync is not part of the published API, except that it must be obtained
>         > from this representation. There are two advantages here.
>         >
>         > The main advantage I'm pointing out is that when the server creates this representation of an Importer, it
>         > would only include the "sync" link if the current state of the object would allow for a sync. If there were no
>         > feed, there would be no sync link, and thus the client would be unable to even try starting one. So this is a
>         > third opportunity to check whether the object's state is suitable for a sync. It even allows the client to
>         > show or hide a "sync" button without having to re-implement the business logic that's already present on the
>         > server side. Neat, huh?
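>         >
>         > For illustration only (the serializer, URL names, and model here are assumptions, not a proposal), a
>         > DRF serializer could build that representation like this:
>         >
>         >     from rest_framework import serializers
>         >     from rest_framework.reverse import reverse
>         >
>         >     class ImporterSerializer(serializers.ModelSerializer):
>         >         class Meta:
>         >             model = Importer  # hypothetical model
>         >             fields = ('id', 'feed')
>         >
>         >         def to_representation(self, instance):
>         >             rep = super().to_representation(instance)
>         >             rep['_links'] = {'self': reverse('importer-detail', args=[instance.pk])}
>         >             if instance.feed:  # only advertise 'sync' when the current state allows one
>         >                 rep['_links']['sync'] = reverse('importer-sync', args=[instance.pk])
>         >             return rep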
> 
>         Interesting.
> 
>         However, how long the 'sync' link remains valid is not deterministic.  1 ms after the link is included
>         in the returned document, a DB commit could set the feed to NULL.  The link could be invalid by the
>         time the user even receives the REST reply.
> 
>         >
>         > Another advantage to this kind of approach is a smaller API surface area. We could theoretically change the
>         > sync URL schema at any time. We could even move it to a new, separate service. We'd still need to document how
>         > to use it, but its actual location can change. In practice I don't think this aspect is all that valuable
>         > unless you are 100% bought in to this design. But it's fun to think about.
>         >
>         > And to re-state, this kind of thing may not be worth our time to actually do right now, and I'm not proposing
>         > it. I don't know to what extent DRF would make this easy. But I wanted to bring it up for interest's sake as
>         > yet another place in the workflow, even closer to the end user than the other two we've discussed, where
>         > applications have an opportunity to utilize context checking of data.
>         >
>         > On Tue, Apr 11, 2017 at 3:49 PM, Austin Macdonald <amacdona at redhat.com> wrote:
>         >
>         >     Where should business logic live? As an example, I want to consider the sync task [0] and the need to
>         >     ensure that an importer is syncable. For now, let's say that an importer is syncable if it has a feed_url.
>         >
>         >     Since the call to sync an importer comes from the API, the earliest we can check that the configuration is
>         >     syncable is in a not-yet-written SyncView. Checking syncability here is ideal because it allows us to
>         >     immediately return a meaningful http response instead of happily returning information about a Task that
>         >     we already know is doomed to fail.
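>         >
>         >     For illustration (the view, field, model, and task names here are hypothetical), that early
>         >     check might look like:
>         >
>         >         from django.shortcuts import get_object_or_404
>         >         from rest_framework import status
>         >         from rest_framework.response import Response
>         >         from rest_framework.views import APIView
>         >
>         >         class SyncView(APIView):
>         >             def post(self, request, pk):
>         >                 # the Importer model and sync task are assumed to exist elsewhere
>         >                 importer = get_object_or_404(Importer, pk=pk)
>         >                 if importer.feed_url is None:
>         >                     # fail fast with a meaningful response instead of queuing a doomed task
>         >                     return Response({'detail': "Importer has no feed_url; it cannot be synced."},
>         >                                     status=status.HTTP_400_BAD_REQUEST)
>         >                 task = sync.apply_async(args=[pk])
>         >                 return Response({'task_id': task.id}, status=status.HTTP_202_ACCEPTED)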
>         >
>         >     Performing the check in the API layer is not enough. We have discussed edge cases that lead to an
>         >     importer's feed_url being removed while the sync task is in the WAITING state. To make an assertion to the
>         >     plugin that the feed_url exists, we have to check syncability again when the Task moves into an active state.
>         >
>         >     My thinking is that we should put this business logic on the model.
>         >
>         >     Admittedly, it is not a clean fit with the separation of concerns philosophy, but we have already violated
>         >     the concept by putting the sync method on the model. If sync is on the model, it seems like
>         >     ensure_syncable should be too.
>         >
>         >     If we write the platform API layer to use this kind of business logic, then the plugins can add double
>         >     checking business logic without modifying the API and Task layers.
>         >
>         >
>         >     [0]: https://pulp.plan.io/issues/2399
>         >
>         >     On Fri, Apr 7, 2017 at 2:14 PM, Sean Myers <sean.myers at redhat.com> wrote:
>         >
>         >         On 04/07/2017 12:08 PM, Brian Bouterse wrote:
>         >         > == questions ==
>         >         > * Where should ^ terms be documented?
>         >
>         >         I'm not really sure, but recommend the wiki as a good starting point for
>         >         putting information that we should probably "officially" document *somewhere*,
>         >         but at the moment we aren't quite sure where.
>         >
>         >         https://pulp.plan.io/projects/pulp/wiki/Pulp_3_Developer_Notes
>         >
>         >         > * Take the case of a sync which has overrides provided? This isn't in the
>         >         > MVP, but in the future it could be. In that case, does the serializer
>         >         > associated with the importer validate the database data with the overrides
>         >         > "added" on top of it?
>         >
>         >         My question here is "validate against what?". It makes sense to validate
>         >         against database data, but as long as the overrides aren't themselves stored
>         >         in the database, what does this really stop?
>         >
>         >         For example, what prevents two simultaneous syncs of repos using overrides
>         >         that would trigger a constraint violation if they were saved, but don't do
>         >         this because we don't save the overrides?
>         >
>         >         > * For plugin writers writing a serializer for a subclassed Importer, do
>         >         > they also need to express validations for fields defined on the base
>         >         > Importer?
>         >
>         >         It depends on the validation. If it's just validating an individual field,
>         >         no. If it's validation of a combination of values in multiple fields, and
>         >         one of those fields in this case was defined in a subclass, the subclass
>         >         will need to add the proper validation support.
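>         >
>         >         For example (the field names and the base ImporterSerializer are hypothetical), the subclass
>         >         might add something like:
>         >
>         >             from rest_framework import serializers
>         >
>         >             class MyImporterSerializer(ImporterSerializer):
>         >                 # single-field checks on the base Importer fields are inherited as-is;
>         >                 # only the cross-field rule involving the new field is added here
>         >                 ssl_client_key = serializers.CharField(required=False, allow_blank=True)
>         >
>         >                 def validate(self, data):
>         >                     data = super().validate(data)
>         >                     if data.get('ssl_client_cert') and not data.get('ssl_client_key'):
>         >                         raise serializers.ValidationError(
>         >                             "ssl_client_key is required when ssl_client_cert is set")
>         >                     return data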
>         >
>         >         > * The database still rejects data that doesn't adhere to the data layer
>         >         > definition right? That occurs even without the DRF serializer correct?
>         >
>         >         Again, this depends. For example, attempting to store an int in a CharField
>         >         will work, because Django will coerce that int to string on save. Attempting
>         >         to store a string in an IntegerField will fail, though, because Django is
>         >         not able to coerce str to int prior to saving. Generally, though, your
>         >         understanding is correct. Anything that the database can't handle will be
>         >         rejected.
>         >
>         >         > * In cases where data is created in the backend, do we need to validate
>         >         > that as a general practice? If we do, do we call the DRF serializer
>         >         > regularly in the backend or just let the database reject "bad" data at the
>         >         > db level?
>         >
>         >         As a general practice, I don't think so. Specifically, though, when we're
>         >         passing data around, like when a bit of platform code is taking incoming
>         >         plugin data and passing it into some standard workflow that the platform provides
>         >         (like running sync on an importer, say), I think it's going to be a good idea
>         >         and an all-around gentlemanly thing to do to validate that data in some way
>         >         that is appropriate to the process/workflow being invoked.
>         >
>         >         I'm concerned about finding the balance between making things user-friendly
>         >         for plugin writers and having the checking code that provides that
>         >         user-friendliness become difficult to maintain and end up being
>         >         pulp-developer-unfriendly.
>         >
>         >
>         >
>         >
>         >
>         >
>         >
>         >
>         >
>         >
>         >
> 
> 
> 
> 
> 
> 
> 
> 
> 
> -- 
> 
> Michael Hrivnak
> 
> Principal Software Engineer, RHCE 
> 
> Red Hat
> 
