[Pulp-dev] Webserver owning the entire url namespace?

Fri Nov 10 13:54:57 UTC 2017

On Wed, Nov 8, 2017 at 11:05 AM, Dennis Kliban <dkliban at redhat.com> wrote:

> Please see my comments inline.
>
> On Tue, Nov 7, 2017 at 3:28 PM, Michael Hrivnak <mhrivnak at redhat.com>
> wrote:
>
>>
>>
>> On Mon, Nov 6, 2017 at 9:34 AM, Brian Bouterse <bbouters at redhat.com>
>> wrote:
>>
>>> Yes the REST API can be scoped to a base path. Pulp can also serve
>>> content even if it's scoped to a base path. So Pulp itself will work great
>>> even if scoped to a base path.
>>>
>>> The issue is 100% around the "content serving apps" like Crane, Forge,
>>> etc. I call those things "live content APIs". The current plan AIUI is that
>>> "live content APIs" will be satisfied using a custom viewset so the plugin
>>> developer does not need to package+ship+version+configure a separate app,
>>> e.g. crane, forge, etc.
>>>
>>
>> That may work in some cases, but I don't think it's a good fit for cases
>> like the docker registry API.
>>
>> The registry API has enough path complexity that a viewset would not be
>> sufficient, so it would need to provide a mix of routers and viewsets. It's
>> an entire app worth of routes and views, including its own auth and search.
>> DRF is not a great tool for that job, and it's valuable to enable plugin
>> writers to use whatever tools/frameworks/languages make sense. For example,
>> right now there is an effort underway to replace crane with an app that
>> uses the "docker distribution" code to serve the API, but can still read
>> crane's data files and serve Pulp publications. That level of flexibility
>> is important.
>>
>
> I believe you are suggesting that a Pulp backend could be built for a
> Docker registry. This backend would know how to consume information about
> docker content published by Pulp. This would indeed be a separate
> application. However, until such a registry backend exists, it would be
> good to allow the Docker plugin authors to provide a docker API as part of
> the same application.
>

I think you're describing crane as it exists today. Just looking at crane
itself, I don't think it makes sense to rewrite it using DRF. Even if we
had to start from scratch for some reason, I don't think DRF would make
sense. Crane doesn't use django (nor is django obviously the best fit), it
doesn't use a database at all, it isn't particularly RESTful, etc. Today
crane is implemented using flask, which meets its needs well. We would
probably benefit logistically from converting it to a django app, just to
reduce the number of frameworks we need to keep up with, but I'm not aware
of any other reason to do so.

With my comment in the previous email, I was only trying to point out that
someone is trying to re-implement crane using yet another technology
stack, one even further from DRF than flask is.
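
To make the "no database, not particularly RESTful" point concrete, here is
a rough sketch of the general shape of such an app: it loads a data file
into memory and answers requests with redirects. This is not crane's code.
The data format, the repo name, the CDN URL and every function name below
are assumptions made up for illustration, and it uses only the Python
standard library rather than flask.

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical stand-in for a published data file. Real crane data files
# have their own format, which this does not try to reproduce.
DATA_FILE_CONTENT = json.dumps({"myrepo": "https://cdn.example.com/myrepo/"})


def load_index(raw):
    """Parse the data file into a dict of repo name -> base URL."""
    return json.loads(raw)


def make_app(index):
    """Build a WSGI app that redirects /<repo>/<rest> to the repo's base URL."""
    def app(environ, start_response):
        parts = environ.get("PATH_INFO", "").lstrip("/").split("/", 1)
        repo = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        base = index.get(repo)
        if base is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"unknown repository\n"]
        # No database lookup: everything comes from the in-memory index.
        start_response("302 Found", [("Location", base + rest)])
        return [b""]
    return app


def call(app, path):
    """Invoke a WSGI app once and return (status, headers dict, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Nothing here needs an ORM, serializers or viewsets, which is why a small
framework (or none) fits this kind of service.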

>
>
>> From a deployment perspective, it's been a key use case to deploy crane
>> at the perimeter, rsync published image files out to a file or CDN service,
>> and run the rest of Pulp on a well-protected internal network.
>>
>
> Pulp can also be installed at the perimeter. Core should support a setting
> that enables/disables the REST API. Each plugin could support a setting
> that enables/disables its content API.
>

I think we're envisioning a similar goal, but with a different mechanism. I
like the idea of a user selecting which components should be active. Making
each component a WSGI app is very easy for us and very convenient for
users. You can see Pulp 2's WSGI apps defined here:

https://github.com/pulp/pulp/tree/master/server/usr/share/pulp/wsgi

Depending on whether a user wants to run each component embedded in normal
httpd processes, or in separate daemon processes, it's just a matter of
enabling or disabling a small httpd config file like this one:

https://github.com/pulp/pulp/blob/master/server/etc/httpd/conf.d/pulp_content.conf

This gives the most flexibility. A user won't need to deploy the entire
stack of Pulp dependencies with all of their plugins at the perimeter if
they don't want to; we can choose to deliver each WSGI app separately, or
not, depending on what is convenient.

This separation has worked very well in Pulp 2, and as far as I know there
have been no complaints about it.
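
As a rough sketch of what "each component is a WSGI app" buys us: the apps
stay independent callables, and a deployment decides which ones to mount.
This is not Pulp's code. The two placeholder apps, the mount() helper and
the choice of prefixes are assumptions for illustration only; in practice
httpd config files would do the mounting.

```python
from wsgiref.util import setup_testing_defaults


def make_text_app(label):
    """Placeholder WSGI app that reports which component handled the path."""
    def app(environ, start_response):
        text = "%s handled %s\n" % (label, environ.get("PATH_INFO", ""))
        body = text.encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return app


# Hypothetical stand-ins for the REST API app and the content app.
api_app = make_text_app("api")
content_app = make_text_app("content")


def mount(apps):
    """Return a WSGI app that dispatches to `apps` by path prefix."""
    # Try longer prefixes first so the most specific mount wins.
    routes = sorted(apps.items(), key=lambda item: len(item[0]), reverse=True)

    def dispatcher(environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix, app in routes:
            if path == prefix or path.startswith(prefix + "/"):
                # Move the matched prefix from PATH_INFO to SCRIPT_NAME.
                environ = dict(environ)
                environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                environ["PATH_INFO"] = path[len(prefix):]
                return app(environ, start_response)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]
    return dispatcher


def request(app, path):
    """Invoke a WSGI app once and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


# A perimeter host mounts only content; an internal host mounts both.
perimeter = mount({"/content": content_app})
internal = mount({"/api/v2": api_app, "/content": content_app})
```

With that, request(perimeter, "/api/v2/repositories/") gets a 404 because
the REST API app was never mounted there, while the same request against
internal reaches it. Neither app knows or cares which combination it was
deployed in.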

>
>
>>
>>
>>>
>>> So we want to simplify the common cases and allow for complex cases to
>>> still work. To me that is:
>>>
>>> * allow plugin developers to deliver live content APIs in the form of
>>> viewsets. They are free to root them anywhere in the url namespace they
>>> want to. Their requirements demand that.
>>> * Recommend that Pulp be run not scoped to a base path (simplest). If
>>> users follow this recommendation 100% of their live APIs will work.
>>>
>>> Then for allowing scoping Pulp to a base path:
>>>
>>> * Pulp can be scoped to a base path and it will work without any extra
>>> config. The docs should state this is possible, but that "live APIs" may
>>> not work.
>>> * Users will need to figure out how to make the live APIs work. That's
>>> really between plugin writers and users at that point.
>>>
>>> Note that currently one WSGI process is serving the REST API, the
>>> Content APIs, and the "live content APIs". I don't see a use case to
>>> separate them at this point. If there is a belief that we will have
>>> more than 1 WSGI process, please share that and why.
>>>
>>
>> We should definitely keep the REST API separate from content serving, as
>> it is in Pulp 2. They are very different services with different goals,
>> needs and characteristics. The streamer is a third independent service that
>> likely makes sense to keep separate.
>>
>> The REST API and content apps have different resource needs. Content
>> serving can use read-only access to a DB and filesystem, and it does not
>> need message broker access. We could probably get away with only giving it
>> access to a few tables in the DB. It does not need access to much of the
>> config or secrets that the REST API needs. The REST API app probably needs
>> a lot more memory and CPU than the content app.
>>
>> They have different audience/access needs also. A small group of humans
>> and/or automation need to infrequently use the REST API to manage what
>> content Pulp makes available. A much larger audience of content consumers
>> needs to access publications. The two audiences often exist on different
>> networks. More downtime can be tolerated from the REST API than the content
>> app.
>>
>> Related to the access differences, the two apps have different
>> scalability needs. The amount of traffic likely to be handled by the REST
>> API vs content app are very different. And on the uptime issue, we
>> definitely have a use case for continuing to serve publications while Pulp
>> is being upgraded or is otherwise down for maintenance.
>>
>> All of that said, there's no reason why a user couldn't use a web server
>> like httpd to run all three WSGI apps in the same process, multiplied
>> across its normal pool of processes. We should make the apps available as
>> separate WSGI apps, and users can deploy them in whatever combinations meet
>> their needs.
>>
>
>
> As mentioned above, Pulp should use configuration settings to disable and
> enable the REST API and the individual content APIs. Separate WSGI
> applications make the deployment process more complicated.
>
>
>>
>> For example, Pulp 2 defaults to running the REST API as a separate set of
>> daemon processes within httpd (see WSGIDaemonProcess for details) to
>> isolate them from the rest of the httpd processes, which serve content (and
>> potentially other apps like katello).
>>
>>
>>>
>>> In Pulp2 we matched on /api/v2/ and maybe /content/ and just those two
>>> urls. This required plugin developers who need live APIs (docker, puppet,
>>> etc) to ship a separate application (crane, forge, etc).
>>>
>>> There is a middle ground where we recommend Pulp run from / but they can
>>> bury it deeper in the url structure if they want, but their stuff may not
>>> work. Overall though, if we are bundling live APIs as plugin viewsets then I
>>> don't see how it will work if we don't recommend owning /.
>>>
>>
>> If we advocate that plugin writers add endpoints somewhere to support
>> type-specific content access APIs, that should go in the content-serving
>> app. It's important that such APIs only serve content that is part of an
>> active publication, which is a role well-matched to the content app. The
>> access, scalability and reliability needs are also a match.
>>
>
>
> I don't see that there is a real difference in these needs. Pulp should be
> scalable and reliable.
>
>
>>
>> A challenge with that pattern is tracking what path space is claimed by a
>> plugin's live API, and making sure other Distributions don't use that path
>> space. I'm sure that could be done, but it adds complexity that's worth
>> thinking through.
>>
>> --
>>
>> Michael Hrivnak
>>
>> Principal Software Engineer, RHCE
>>
>> Red Hat
>>
>> _______________________________________________
>> Pulp-dev mailing list
>> Pulp-dev at redhat.com
>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>
>>
>
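
On the path-space challenge quoted above, the core check could be as small
as the sketch below: compare paths segment by segment and reject a
Distribution whose path overlaps a prefix a plugin has claimed. The function
names and the example prefixes are hypothetical, and how plugins would
register their claims is left open.

```python
def normalize(path):
    """Split a URL path into its non-empty segments."""
    return [segment for segment in path.split("/") if segment]


def overlaps(path_a, path_b):
    """True if one path equals, or is a segment-wise prefix of, the other."""
    a, b = normalize(path_a), normalize(path_b)
    shorter = min(len(a), len(b))
    return a[:shorter] == b[:shorter]


def find_conflicts(claimed_prefixes, distribution_paths):
    """Return (distribution_path, claimed_prefix) pairs that collide."""
    return [(dist, claimed)
            for dist in distribution_paths
            for claimed in claimed_prefixes
            if overlaps(dist, claimed)]
```

Comparing whole segments matters: "/v2" and "/v2/zoo" collide, but "/v2" and
"/v20/zoo" do not. A claim on "/" overlaps everything, which is the
"owning the entire url namespace" case.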

-- 

Michael Hrivnak

Principal Software Engineer, RHCE

Red Hat