[Pulp-dev] 3.0 Validation update
Jeff Ortel
jortel at redhat.com
Mon Apr 17 19:09:08 UTC 2017
On 04/17/2017 09:50 AM, Michael Hrivnak wrote:
> That MVP scoping sounds reasonable.
>
> I'm not sure we decided 100% on whether those operations need to be tasks, but I think everyone seems to be
> leaning in the direction that update does not. But it partly will come down to a plugin API question. If
> updates can happen any time, then the plugin should load its config exactly once and not re-read it during the
> task. The plugin API should encourage that workflow, if not enforce it.
The Importer instance is fetched from the DB and its attributes are the /config/. So, it would only be
fetched once from the DB for any task.
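
A minimal sketch of that single-fetch shape (the task body and fetch helper here are illustrative, not actual Pulp code):

```python
def sync_task(importer_id, fetch_importer):
    """Hypothetical task body: the Importer row is read exactly once."""
    # One DB read at task start; the instance's attributes ARE the config.
    importer = fetch_importer(importer_id)
    # The plugin works only against this in-memory snapshot, so later DB
    # updates to the importer cannot change a task that is already running.
    return importer
```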
>
> Delete could be a little more challenging, because of the potential desire for an importer or publisher to
> want to update attributes on it, like the last time a sync succeeded. Maybe we can manage to avoid the need
> for a plugin to write to its config, which also seems like a very reasonable goal, but that again could be
> part of the plugin API discussion. Although thinking about relationships that are likely to exist in the
> database, having an importer or publisher disappear while a corresponding task is running sounds like it could
> cause a lot of potential problems.
What kind of problems?
>
> On Fri, Apr 14, 2017 at 11:23 AM, Brian Bouterse <bbouters at redhat.com <mailto:bbouters at redhat.com>> wrote:
>
> Did we ever decide on if delete and modify operations on an importer would be synchronous API operations
> or tasking system tasks?
>
> From what I'm reading, there isn't an MVP need (and maybe never will be) to verify the feed is set at the sync API
> call and in the task. We definitely need to do that business logic, i.e. verify the feed != None, in the
> task for the MVP. If we don't need to do it in two places now, then I think doing it procedurally in the
> sync task here [0] would be good enough. That means we would not add it to the model as a method. I think
> this business logic in the sync task would be:
>
> ```
> if importer.feed is None:
>     # This should probably be a descendant of PulpException since it's a
>     # predictable/common error.
>     raise ValueError("The 'feed' attribute on the importer cannot be None when syncing")
> ```
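
For instance, such a descendant might look like this (the names here are hypothetical; PulpException is the base class the thread assumes):

```python
class PulpException(Exception):
    """Stand-in for the platform's base exception class."""


class MissingFeed(PulpException):
    """Hypothetical subclass for the predictable no-feed error."""

    def __init__(self):
        super().__init__(
            "The 'feed' attribute on the importer cannot be None when syncing")
```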
>
> [0]: https://github.com/pulp/pulp/pull/2991/files#diff-b9ddc3416cf3909687419960d59316bbR30
> <https://github.com/pulp/pulp/pull/2991/files#diff-b9ddc3416cf3909687419960d59316bbR30>
>
> -Brian
>
>
>
> On Wed, Apr 12, 2017 at 10:11 AM, Jeff Ortel <jortel at redhat.com <mailto:jortel at redhat.com>> wrote:
>
>
>
> On 04/11/2017 04:08 PM, Michael Hrivnak wrote:
> > I like thinking about this as business logic. Data may be valid, but it may not be usable in a particular context.
> >
> > To help figure out where such logic should live, it may help to think about where the check is most important.
> > I've described that time as "at the time of use" earlier in this discussion (maybe just on IRC). With sync as
> > an example, a workflow will load an importer's config from the database, check it for problems, and then
> > immediately use the values it just inspected. This is the critical moment where it must gracefully handle
> > unusable data. This check ensures correct behavior and avoids an unhandled exception or crash.
>
> +1
>
> >
> > We can and should also check for problems at earlier opportunities, such as at the time a user tries to queue
> > a sync. This improves the user experience, but it is not required for correct and safe operation.
>
> The data can change between the time when the task is queued and when it is executed. This adds
> complexity and the value added is not deterministic. -1 for a check here.
>
> >
> > Given that, I think it makes sense to put the check close to the data. A method on the model seems reasonable.
> > In terms of polluting the model with business logic, it isn't that different from defining a custom query set
> > on the model, which django encourages.
>
> Agreed, but only for data integrity. Each field should be validated. And, the record (cross field)
> should be validated. We don't want to store things in the DB that are invalid. Seems like the
> django ORM already provides for this, right? But, the model should not have methods to validate
> itself for a particular operation (usage). For example, ensure_syncable() that validates the feed
> URL would not be appropriate. This should be checked at run-time by the object doing the sync.
> Perhaps a double dispatch model in plugin.Importer could do:
>
> def synchronize(self):
>     # do validation and preparation.
>     return self._synchronize()
>
> def _synchronize(self):
>     # overridden by plugins.
>     raise NotImplementedError()
>
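A plugin would then override only the private hook, e.g. (a sketch; the class and plugin names are illustrative, not actual Pulp code):

```python
class Importer:
    feed = None

    def synchronize(self):
        # Platform-side validation and preparation happen here, once,
        # before dispatching to the plugin.
        if self.feed is None:
            raise ValueError("cannot sync without a feed")
        return self._synchronize()

    def _synchronize(self):
        # overridden by plugins.
        raise NotImplementedError()


class RpmImporter(Importer):
    """Hypothetical plugin subclass."""
    feed = "http://cdn.example.com/stuff/"

    def _synchronize(self):
        return "synced from " + self.feed
```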
>
> >
> > As a slight tangent, some applications take this sort of checking even further. An admirable approach in REST
> > API design, which may not be a good idea for us at this time but is interesting to note, is to make a behavior
> > such as "sync" only available via a link accessed via a known name in an object's representation. That's a
> > mouthful, so here's an example:
> >
> > {
> >     "id": "foo",
> >     "feed": "http://cdn.redhat.com/stuff/",
> >     "_links": {
> >         "self": "http://here.com/importers/foo",
> >         "sync": "http://here.com/importers/foo/sync"
> >     }
> > }
> >
> > Consider that the link for starting a sync is not part of the published API, except that it must be obtained
> > from this representation. There are two advantages here.
> >
> > The main advantage I'm pointing out is that when the server creates this representation of an Importer, it
> > would only include the "sync" link if the current state of the object would allow for a sync. If there were no
> > feed, there would be no sync link, and thus the client would be unable to even try starting one. So this is a
> > third opportunity to check whether the object's state is suitable for a sync. It even allows the client to
> > show or hide a "sync" button without having to re-implement the business logic that's already present on the
> > server side. Neat, huh?
>
> Interesting.
>
> However, how long the 'sync' link remains valid is not deterministic. 1 ms after the link is
> included in the returned document, a DB commit could set the feed to NULL. The link could be
> invalid by the time the user even receives the REST reply.
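
Server-side, building that representation could look like the following (a sketch; the URL shapes and field names are assumed, not actual Pulp code):

```python
def importer_representation(importer, base="http://here.com/importers"):
    """Build a dict representation with conditional action links."""
    doc = {
        "id": importer["id"],
        "feed": importer["feed"],
        "_links": {"self": "%s/%s" % (base, importer["id"])},
    }
    # Only advertise the "sync" action when the current state allows it.
    if importer["feed"] is not None:
        doc["_links"]["sync"] = "%s/%s/sync" % (base, importer["id"])
    return doc
```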
>
> >
> > Another advantage to this kind of approach is a smaller API surface area. We could theoretically change the
> > sync URL schema at any time. We could even move it to a new, separate service. We'd still need to document how
> > to use it, but its actual location can change. In practice I don't think this aspect is all that valuable
> > unless you are 100% bought in to this design. But it's fun to think about.
> >
> > And to re-state, this kind of thing may not be worth our time to actually do right now, and I'm not proposing
> > it. I don't know to what extent DRF would make this easy. But I wanted to bring it up for interest's sake as
> > yet another place in the workflow, even closer to the end user than the other two we've discussed, where
> > applications have an opportunity to utilize context checking of data.
> >
> > On Tue, Apr 11, 2017 at 3:49 PM, Austin Macdonald <amacdona at redhat.com <mailto:amacdona at redhat.com>> wrote:
> >
> > Where should business logic live? As an example, I want to consider the sync task [0] and the need to
> > ensure that an importer is syncable. For now, let's say that an importer is syncable if it has a feed_url.
> >
> > Since the call to sync an importer comes from the API, the earliest we can check that the configuration is
> > syncable is in a not-yet-written SyncView. Checking syncability here is ideal because it allows us to
> > immediately return a meaningful http response instead of happily returning information about a Task that
> > we already know is doomed to fail.
> >
> > Performing the check in the API layer is not enough. We have discussed edge cases that lead to an
> > importer's feed_url being removed while the sync task is in the WAITING state. To make an assertion to the
> > plugin that the feed_url exists, we have to check syncability again when the Task moves into an active state.
> >
> > My thinking is that we should put this business logic on the model.
> >
> > Admittedly, it is not a clean fit with the separation of concerns philosophy but we have already violated
> > the concept by putting the sync method on the model. If sync is on the model, it seems like
> > ensure_syncable should be too.
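
In plain-Python terms, the model-level check being proposed might look like this (field and method names follow the thread; the exception type is an assumed stand-in):

```python
class SyncError(Exception):
    """Hypothetical stand-in for a PulpException descendant."""


class Importer:
    """Plain-Python sketch of the model-level business-logic check."""

    def __init__(self, feed_url=None):
        self.feed_url = feed_url

    def ensure_syncable(self):
        # Called once by the API layer at dispatch time for a fast failure,
        # and again by the task when it moves into an active state.
        if self.feed_url is None:
            raise SyncError("importer has no feed_url; cannot sync")
```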
> >
> > If we write the platform API layer to use this kind of business logic, then the plugins can add double
> > checking business logic without modifying the API and Task layers.
> >
> >
> > [0]: https://pulp.plan.io/issues/2399 <https://pulp.plan.io/issues/2399>
> >
> > On Fri, Apr 7, 2017 at 2:14 PM, Sean Myers <sean.myers at redhat.com <mailto:sean.myers at redhat.com>> wrote:
> >
> > On 04/07/2017 12:08 PM, Brian Bouterse wrote:
> > > == questions ==
> > > * Where should ^ terms be documented?
> >
> > I'm not really sure, but recommend the wiki as a good starting point for
> > putting information that we should probably "officially" document *somewhere*,
> > but at the moment we aren't quite sure where.
> >
> > https://pulp.plan.io/projects/pulp/wiki/Pulp_3_Developer_Notes
> > <https://pulp.plan.io/projects/pulp/wiki/Pulp_3_Developer_Notes>
> >
> > > * Take the case of a sync which has overrides provided? This isn't in the
> > > MVP, but in the future it could be. In that case, does the serializer
> > > associated with the importer validate the database data with the overrides
> > > "added" on top of it?
> >
> > My question here is "validate against what?". It makes sense to validate
> > against database data, but as long as the overrides aren't themselves stored
> > in the database, what does this really stop?
> >
> > For example, what prevents two simultaneous syncs of repos using overrides
> > that would trigger a constraint violation if they were saved, but don't do
> > this because we don't save the overrides?
> >
> > > * For plugin writers writing a serializer for a subclassed Importer, do
> > > they also need to express validations for fields defined on the base
> > > Importer?
> >
> > It depends on the validation. If it's just validating an individual field,
> > no. If it's validation of a combination of values in multiple fields, and
> > one of those fields in this case was defined in a subclass, the subclass
> > will need to add the proper validation support.
> >
> > > * The database still rejects data that doesn't adhere to the data layer
> > > definition right? That occurs even without the DRF serializer correct?
> >
> > Again, this depends. For example, attempting to store an int in a charfield
> > will work, because Django will coerce that int to string on save. Attempting
> > to store a string in an IntegerField will fail, though, because Django is
> > not able to coerce str to int prior to saving. Generally, though, your
> > understanding is correct. Anything that the database can't handle will be
> > rejected.
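
The asymmetry comes down to which coercions succeed; roughly (plain Python mimicking the save-time behavior described above, not Django's actual code):

```python
def save_charfield(value):
    # Django coerces the value to str on save, so an int is accepted.
    return str(value)


def save_integerfield(value):
    # int("abc") raises ValueError, so a non-numeric str is rejected.
    return int(value)
```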
> >
> > > * In cases where data is created in the backend, do we need to validate
> > > that as a general practice? If we do, do we call the DRF serializer
> > > regularly in the backend or just let the database reject "bad" data at the
> > > db level?
> >
> > As a general practice, I don't think so. Specifically, though, when we're
> > passing data around, like when a bit of platform code is taking incoming
> > plugin data and passing it into some standard workflow that platform provides
> > (like running sync on an importer, say) I think it's going to be a good idea
> > and an all-around gentlemanly thing to do to validate that data in some way
> > that is appropriate to the process/workflow being invoked.
> >
> > I'm concerned about finding the balance between making things user-friendly
> > for plugin writers and having our checking code that provides that
> > user-friendliness itself be difficult to maintain and end up being
> > pulp-developer-unfriendly.
> >
> >
> > _______________________________________________
> > Pulp-dev mailing list
> > Pulp-dev at redhat.com <mailto:Pulp-dev at redhat.com>
> > https://www.redhat.com/mailman/listinfo/pulp-dev
> >
> >
> >
> >
> >
> >
> >
> > --
> >
> > Michael Hrivnak
> >
> > Principal Software Engineer, RHCE
> >
> > Red Hat
> >
> >
> >
> >
>
>
>
>
>
>
>
>
>
>