[Pulp-dev] Webserver owning the entire url namespace?

Michael Hrivnak mhrivnak at redhat.com
Tue Nov 7 20:28:24 UTC 2017

On Mon, Nov 6, 2017 at 9:34 AM, Brian Bouterse <bbouters at redhat.com> wrote:

> Yes the REST API can be scoped to a base path. Pulp can also serve content
> even if it's scoped to a base path. So Pulp itself will work great even if
> scoped to a base path.
> The issue is 100% around the "content serving apps" like Crane, Forge,
> etc. I call those things "live content APIs". The current plan AIUI is that
> "live content APIs" will be satisfied using a custom viewset so the plugin
> developer does not need to package+ship+version+configure a separate app,
> e.g. crane, forge, etc.

That may work in some cases, but I don't think it's a good fit for cases
like the docker registry API.

The registry API has enough path complexity that a viewset would not be
sufficient, so it would need to provide a mix of routers and viewsets. It's
an entire app worth of routes and views, including its own auth and search.
DRF is not a great tool for that job, and it's valuable to enable plugin
writers to use whatever tools/frameworks/languages make sense. For example,
right now there is an effort underway to replace crane with an app that
uses the "docker distribution" code to serve the API, but can still read
crane's data files and serve Pulp publications. That level of flexibility
is important.

From a deployment perspective, it's been a key use case to deploy crane at
the perimeter, rsync published image files out to a file or CDN service,
and run the rest of Pulp on a well-protected internal network.

> So we want to simplify the common cases and allow for complex cases to
> still work. To me that is:
> * allow plugin developers to deliver live content APIs in the form of
> viewsets. They are free to root them anywhere in the url namespace they
> want to. Their requirements require that.
> * Recommend that Pulp be run not scoped to a base path (simplest). If
> users follow this recommendation 100% of their live APIs will work.
> Then for allowing scoping Pulp to a base path:
> * Pulp can be scoped to a base path and it will work without any extra
> config. The docs should state this is possible, but that "live APIs" may
> not work.
> * Users will need to figure out how to make the live APIs work. That's really
> between plugin writers and users at that point.
> Note that currently one WSGI process is serving the REST API, the
> Content APIs, and the "live content APIs". I don't see a use case to
> separate them at this point. If there is a belief that (a) we will have
> more than 1 WSGI process and (b) why, please share those thoughts.

We should definitely keep the REST API separate from content serving, as it
is in Pulp 2. They are very different services with different goals, needs
and characteristics. The streamer is a third independent service that
likely makes sense to keep separate.

The REST API and content apps have different resource needs. Content
serving can use read-only access to a DB and filesystem, and it does not
need message broker access. We could probably get away with only giving it
access to a few tables in the DB. It does not need access to much of the
config or secrets that the REST API needs. The REST API app probably needs
a lot more memory and CPU than the content app.

They have different audience/access needs also. A small group of humans
and/or automation need to infrequently use the REST API to manage what
content Pulp makes available. A much larger audience of content consumers
needs to access publications. The two audiences often exist on different
networks. More downtime can be tolerated from the REST API than the content
app.

Related to the access differences, the two apps have different scalability
needs. The amounts of traffic likely to be handled by the REST API and the
content app are very different. And on the uptime issue, we definitely have
a use case for continuing to serve publications while Pulp is being
upgraded or is otherwise down for maintenance.

All of that said, there's no reason why a user couldn't use a web server
like httpd to run all three WSGI apps in the same process, multiplied
across its normal pool of processes. We should make the apps available as
separate WSGI apps, and users can deploy them in whatever combinations meet
their needs.
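
To make the "separate WSGI apps, deployed in whatever combination" idea
concrete, here is a minimal sketch using only the standard library. The
three apps are trivial stand-ins, and the mount prefixes ("/api",
"/content", "/streamer") and the combine() helper are names I made up for
illustration; none of this is existing Pulp code.

```python
from wsgiref.util import setup_testing_defaults


def make_app(name):
    """Return a trivial WSGI app standing in for one service (hypothetical)."""
    def app(environ, start_response):
        body = f"{name}: {environ.get('PATH_INFO', '')}".encode()
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return app


# Stand-ins for the three independent services discussed above.
rest_api_app = make_app("rest-api")
content_app = make_app("content")
streamer_app = make_app("streamer")


def combine(mounts):
    """Build one WSGI app that dispatches by path prefix to the given apps."""
    # Try longer prefixes first so nested mounts resolve to the deepest match.
    ordered = sorted(mounts.items(), key=lambda kv: len(kv[0]), reverse=True)

    def dispatcher(environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix, app in ordered:
            if path == prefix or path.startswith(prefix + "/"):
                # Shift the matched prefix from PATH_INFO to SCRIPT_NAME,
                # as WSGI expects for a mounted application.
                sub_environ = dict(environ)
                sub_environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                sub_environ["PATH_INFO"] = path[len(prefix):]
                return app(sub_environ, start_response)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]

    return dispatcher


# One deployment runs everything together; another exposes only content,
# e.g. at the perimeter. Both reuse the same independent apps.
all_in_one = combine({"/api": rest_api_app,
                      "/content": content_app,
                      "/streamer": streamer_app})
content_only = combine({"/content": content_app})


def call(app, path):
    """Invoke a WSGI app in-process and return (status, body) for inspection."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body.decode()
```

With this shape, call(content_only, "/api/v3/") yields a 404 while
call(all_in_one, "/api/v3/") reaches the REST API stand-in, so the choice of
what runs where stays a deployment decision rather than something baked into
the apps.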

For example, Pulp 2 defaults to running the REST API as a separate set of
daemon processes within httpd (see WSGIDaemonProcess for details) to
isolate them from the rest of the httpd processes, which serve content (and
potentially other apps like katello).

> In Pulp2 we matched on /api/v2/ and maybe /content/ and just those two
> urls. This required plugin developers who need live APIs (docker, puppet,
> etc) to ship a separate application (crane, forge, etc).
> There is a middleground where we recommend Pulp run from / but they can
> bury it deeper in the url structure if they want, but their stuff may not
> work. Overall though, if we are bundling live APIs as plugin viewsets then I
> don't see how it will work if we don't recommend owning /.

If we advocate that plugin writers add endpoints somewhere to support
type-specific content access APIs, that should go in the content-serving
app. It's important that such APIs only serve content that is part of an
active publication, which is a role well-matched to the content app. The
access, scalability and reliability needs are also a match.

A challenge with that pattern is tracking what path space is claimed by a
plugin's live API, and making sure other Distributions don't use that path
space. I'm sure that could be done, but it adds complexity that's worth
thinking through.


Michael Hrivnak

Principal Software Engineer, RHCE

Red Hat