[Pulp-dev] Webserver owning the entire url namespace?

Brian Bouterse bbouters at redhat.com
Wed Nov 8 17:06:07 UTC 2017


These are good conversations. Thanks for the writing. I pretty much agree,
but in each area there are even more requirements I think about. I'll try
to identify some of them here to see what everyone thinks. These are also
good prep convos for Friday's "Live APIs" call.

# Live APIs
If plugin writers are shipping separate applications outside of Pulp, they
can do that. There are reasons why they may want to. There are also reasons
why they may want to ship live APIs inside of Pulp too. What about these?

* As a plugin writer, I need to support additional URL endpoints, and I don't
want to package and distribute a second application in addition to the
plugin code I am required to ship.
* As a plugin writer, the URL endpoints I need to provide would benefit from
being able to query the database directly.

Also note that in terms of plugin writers having django discover views and
viewsets the plugin writer is adding, practically speaking, there isn't
anything we can do to stop them. Given that, everyone would benefit from
clarifying what our requirements are from them. This would be the kind of
documentation you are talking about, which I completely agree we need.

In cases where DRF isn't a good fit for a Live API implementation,
underneath it's Django, so the plugin writer can use Django to meet the
needs of any Live API.


# Deploying REST API and Content API separately

I'm also +1 on having users be able to deploy the content serving of Pulp
separate from the REST API. That is a deployment model we should support.

Don't we also want to support having a single WSGI process serve both
content and the REST API? There are practical reasons why users may want to
deploy this way too. For example having a smaller memory footprint by
having fewer process groups, or maybe they just want a simpler deployment.
If we ship one WSGI application with Pulp then we allow for both deployment
models (together or separate REST API and content serving). Users who want
to use separated WSGI processes should configure the webserver to
instantiate the one WSGI application Pulp would ship twice, and route
content urls to one WSGIProcessGroup and the REST API calls to another
WSGIProcessGroup. We could document that in a deployments page which would
be cool.

So for ^ reasons I think having Pulp ship one WSGI application that could
handle all Pulp urls would allow for the most deployment models.
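
To make the "one WSGI application" idea concrete, here is a rough sketch using only the Python standard library. It is not Pulp code: the handler names (rest_api_app, content_app), the ROUTES table, and the /api/ and /content/ prefixes are all assumptions for illustration. The point is only that a single callable can dispatch every url, so the webserver can load it once (together) or twice (separate process groups) and route by path in front of it.

```python
from wsgiref.util import setup_testing_defaults


def rest_api_app(environ, start_response):
    """Hypothetical stand-in for the REST API handler."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"rest api"]


def content_app(environ, start_response):
    """Hypothetical stand-in for the content-serving handler."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"content"]


def not_found(environ, start_response):
    """Fallback for urls that no handler claims."""
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


# Assumed url prefixes, for illustration only.
ROUTES = [
    ("/api/", rest_api_app),
    ("/content/", content_app),
]


def application(environ, start_response):
    """The single WSGI entry point: dispatch on the request path prefix."""
    path = environ.get("PATH_INFO", "")
    for prefix, handler in ROUTES:
        if path.startswith(prefix):
            return handler(environ, start_response)
    return not_found(environ, start_response)


def call(path):
    """Invoke the application in-process and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], body
```

With this shape, the together-or-separate decision lives entirely in the webserver config; the application itself does not change between the two deployment models.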


On Tue, Nov 7, 2017 at 3:28 PM, Michael Hrivnak <mhrivnak at redhat.com> wrote:

>
>
> On Mon, Nov 6, 2017 at 9:34 AM, Brian Bouterse <bbouters at redhat.com>
> wrote:
>
>> Yes the REST API can be scoped to a base path. Pulp can also serve
>> content even if it's scoped to a base path. So Pulp itself will work great
>> even if scoped to a base path.
>>
>> The issue is 100% around the "content serving apps" like Crane, Forge,
>> etc. I call those things "live content APIs". The current plan AIUI is that
>> "live content APIs" will be satisfied using a custom viewset so the plugin
>> developer does not need to package+ship+version+configure a separate app,
>> e.g. crane, forge, etc.
>>
>
> That may work in some cases, but I don't think it's a good fit for cases
> like the docker registry API.
>
> The registry API has enough path complexity that a viewset would not be
> sufficient, so it would need to provide a mix of routers and viewsets. It's
> an entire app worth of routes and views, including its own auth and search.
> DRF is not a great tool for that job, and it's valuable to enable plugin
> writers to use whatever tools/frameworks/languages make sense. For example,
> right now there is an effort underway to replace crane with an app that
> uses the "docker distribution" code to serve the API, but can still read
> crane's data files and serve Pulp publications. That level of flexibility
> is important.
>
> From a deployment perspective, it's been a key use case to deploy crane at
> the perimeter, rsync published image files out to a file or CDN service,
> and run the rest of Pulp on a well-protected internal network.
>
>
>>
>> So we want to simplify the common cases and allow for complex cases to
>> still work. To me that is:
>>
>> * allow plugin developers to deliver live content APIs in the form of
>> viewsets. They are free to root them anywhere in the url namespace they
>> want to. Their requirements require that.
>> * Recommend that Pulp be run not scoped to a base path (simplest). If
>> users follow this recommendation 100% of their live APIs will work.
>>
>> Then for allowing scoping Pulp to a base path:
>>
>> * Pulp can be scoped to a base path and it will work without any extra
>> config. The docs should state this is possible, but that "live APIs" may
>> not work.
>> * Users will need to figure out how to make the live APIs work. That's really
>> between plugin writers and users at that point.
>>
>> Note that currently one WSGI process is serving the REST API, the
>> Content APIs, and the "live content APIs". I don't see a use case to
>> separate them at this point. If there is a belief that (a) we will have
>> more than 1 WSGI process and (b) why, please share those thoughts.
>>
>
> We should definitely keep the REST API separate from content serving, as
> it is in Pulp 2. They are very different services with different goals,
> needs and characteristics. The streamer is a third independent service that
> likely makes sense to keep separate.
>
> The REST API and content apps have different resource needs. Content
> serving can use read-only access to a DB and filesystem, and it does not
> need message broker access. We could probably get away with only giving it
> access to a few tables in the DB. It does not need access to much of the
> config or secrets that the REST API needs. The REST API app probably needs
> a lot more memory and CPU than the content app.
>
> They have different audience/access needs also. A small group of humans
> and/or automation need to infrequently use the REST API to manage what
> content Pulp makes available. A much larger audience of content consumers
> needs to access publications. The two audiences often exist on different
> networks. More downtime can be tolerated from the REST API than the content
> app.
>
> Related to the access differences, the two apps have different scalability
> needs. The amount of traffic likely to be handled by the REST API vs
> content app are very different. And on the uptime issue, we definitely have
> a use case for continuing to serve publications while Pulp is being
> upgraded or is otherwise down for maintenance.
>
> All of that said, there's no reason why a user couldn't use a web server
> like httpd to run all three WSGI apps in the same process, multiplied
> across its normal pool of processes. We should make the apps available as
> separate WSGI apps, and users can deploy them in whatever combinations meet
> their needs.
>
> For example, Pulp 2 defaults to running the REST API as a separate set of
> daemon processes within httpd (see WSGIDaemonProcess for details) to
> isolate them from the rest of the httpd processes, which serve content (and
> potentially other apps like katello).
>
>
>>
>> In Pulp2 we matched on /api/v2/ and maybe /content/ and just those two
>> urls. This required plugin developers who need live APIs (docker, puppet,
>> etc) to ship a separate application (crane, forge, etc).
>>
>> There is a middleground where we recommend Pulp run from / but they can
>> bury it deeper in the url structure if they want, but their stuff may not
>> work. Overall though, if we are bundling live APIs as plugin viewsets then I
>> don't see how it will work if we don't recommend owning /.
>>
>
> If we advocate that plugin writers add endpoints somewhere to support
> type-specific content access APIs, that should go in the content-serving
> app. It's important that such APIs only serve content that is part of an
> active publication, which is a role well-matched to the content app. The
> access, scalability and reliability needs are also a match.
>
> A challenge with that pattern is tracking what path space is claimed by a
> plugin's live API, and making sure other Distributions don't use that path
> space. I'm sure that could be done, but it adds complexity that's worth
> thinking through.
>
> --
>
> Michael Hrivnak
>
> Principal Software Engineer, RHCE
>
> Red Hat
>