[Pulp-dev] Webserver owning the entire url namespace?

Mon Nov 13 19:04:58 UTC 2017

OK, after looking at this more, the mod_wsgi docs resolved my concerns about
shipping multiple WSGI applications. From the docs [0], we can serve
multiple WSGI applications as easily as we can serve one, even within one
process group. In Pulp2 we just defaulted to 1 process group per WSGI app,
but one process group can serve multiple WSGI apps. Here are some example
configs. Say we have two WSGI applications (content + live APIs, and the
REST API); call them content.wsgi and api.wsgi respectively. An Apache
config could then be:

# the API WSGI app serving the status API
WSGIScriptAlias /status /path/to/installed/wsgi/files/api.wsgi
# the API WSGI app serving the rest of the API
WSGIScriptAlias /api/v3 /path/to/installed/wsgi/files/api.wsgi
# the "content" app in Pulp
WSGIScriptAlias /content /path/to/installed/wsgi/files/content.wsgi
# any live APIs you need
WSGIScriptAlias /v1/ /path/to/installed/wsgi/files/content.wsgi
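
For illustration, here is a minimal sketch of what a file like api.wsgi might
contain. It assumes only the mod_wsgi convention that the script exposes a
callable named `application`; the response body and the `call` helper are
made up for this example and are not Pulp code. It shows that one script
mounted at several prefixes can tell which mount point a request came
through, since the prefix arrives as SCRIPT_NAME and the rest of the URL as
PATH_INFO.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Hypothetical stand-in for api.wsgi; not the real Pulp application."""
    # Under WSGIScriptAlias, the mount point is passed as SCRIPT_NAME and
    # the remainder of the URL as PATH_INFO.
    mount = environ.get("SCRIPT_NAME", "")
    rest = environ.get("PATH_INFO", "")
    body = ("mount=%s path=%s\n" % (mount, rest)).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call(app, script_name, path_info):
    """Invoke a WSGI app in-process (no web server) for a quick check."""
    environ = {}
    setup_testing_defaults(environ)
    environ["SCRIPT_NAME"] = script_name
    environ["PATH_INFO"] = path_info
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body.decode("utf-8")
```

Calling `call(application, "/status", "/")` and
`call(application, "/api/v3", "/repositories/")` exercises the same callable
through two different mount points, which is what the two api.wsgi aliases
above would do.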

Other webservers can be configured similarly. So does shipping multiple WSGI
apps sound good to everyone? I think this is what I've heard from others in
this thread and now I agree.

What about shipping 3 WSGI apps versus 2? One for the REST API (status and
API), one for the content Pulp serves, and a third for any "other urls"? If
we have only two WSGI apps, we probably have to have plugin writers identify
which WSGI app a url should be for. Sometimes they may want them deployed
with the content (live API), or with the REST API (complex upload use
cases).
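
To make the two-app option concrete, here is a rough sketch of how a plugin
writer might declare which WSGI app a url belongs to. Everything in it is
hypothetical: the `UrlRegistry` class, the app names "api" and "content", and
the example prefixes are assumptions for illustration, not an existing Pulp
interface.

```python
# Hypothetical names for the two WSGI apps in the two-app layout.
VALID_APPS = ("api", "content")


class UrlRegistry:
    """Hypothetical registry mapping plugin url prefixes to a WSGI app."""

    def __init__(self):
        self._routes = {name: [] for name in VALID_APPS}

    def register(self, prefix, app):
        # The plugin writer has to state the target app explicitly.
        if app not in VALID_APPS:
            raise ValueError("unknown WSGI app: %r" % (app,))
        self._routes[app].append(prefix)

    def prefixes_for(self, app):
        # Sorted so the result does not depend on registration order.
        return sorted(self._routes.get(app, []))


registry = UrlRegistry()
# A live API deployed alongside the content app.
registry.register("/v1/", "content")
# A complex upload endpoint deployed alongside the REST API.
registry.register("/upload/", "api")
```

With a third "other urls" app, a declaration like this might not be needed,
which is the tradeoff raised above.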

@ehelms, I'll try to share some responses. The config above could be mixed
with other applications on the same vhost. It would also allow them to be
deployed together or separately. The content WSGI app still needs database
access to serve that content so running them over a WAN is not something I
would recommend. Pushing content to be served "at the edge", "in another
geo", or via a "CDN" are a whole group of important use cases. This won't
help with those use cases unfortunately.

[0]: http://modwsgi.readthedocs.io/en/develop/user-guides/configuration-guidelines.html#the-wsgiscriptalias-directive

On Fri, Nov 10, 2017 at 11:17 AM, Eric Helms <ehelms at redhat.com> wrote:

> First, I appreciate this being a proposal and discussion before going this
> route given its implications for applications used to consuming Pulp
> heavily. Secondly, I believe some of my questions and concerns have been
> asked and addressed throughout the thread, but I do feel like it's reached
> a point where a summary would be useful for those just now entering the
> conversation to parse.
>
> I know some of these concerns start to quickly broach into more advanced
> architecture discussion than the original proposal raised, Brian. I am happy
> to break those out into other threads, but for now since someone else
> mentioned it I am including them.
>
> My generalized concerns are:
>
>  * How do I deploy Pulp alongside another Apache based application (aka
> the current Katello use case) ?
>  * Can I deploy Pulp and the Content separately? From two perspectives,
> splitting load across multiple hosts or separating concerns into
> independent pieces that can be scaled per differing demands? (e.g. the Pulp
> API itself may get a little traffic, whereas my 50k hosts might hammer the
> content API)
>  * If the Content is separate from the main Pulp API, does that mean that
> I can scale my content delivery more easily horizontally and across
> geographies? That is, in the new world, what does the current
> setup of Pulp talking to Pulp to create replicated data endpoints for
> geography and scaling look like, given that it is affected by how the URL
> namespace is consumed?
>
>
> Eric
>
> On Fri, Nov 10, 2017 at 11:03 AM, Patrick Creech <pcreech at redhat.com>
> wrote:
>
>> On Fri, 2017-11-10 at 10:49 -0500, Brian Bouterse wrote:
>>
>> From a deployment perspective, it's been a key use case to deploy crane
>> at the perimeter, rsync published image files out to a file or CDN service,
>> and run the rest of Pulp on a well-protected internal network.
>>
>>
>> Pulp can also be installed at the perimeter. Core should support a
>> setting that enables/disables the REST API. Each plugin could support a
>> setting that enables/disables its content API.
>>
>>
>> I think we're envisioning a similar goal, but with a different mechanism.
>> I like the idea of a user selecting which components should be active.
>> Making each component a WSGI app is very easy for us and very convenient
>> for users. You can see Pulp 2's WSGI apps defined here:
>>
>> https://github.com/pulp/pulp/tree/master/server/usr/share/pulp/wsgi
>>
>> Depending on whether a user wants to run each component embedded in
>> normal httpd processes, or in separate daemon processes, it's just a matter
>> of enabling or not a small httpd config file like this one:
>>
>> https://github.com/pulp/pulp/blob/master/server/etc/httpd/conf.d/pulp_content.conf
>>
>> This gives the most flexibility. A user won't need to deploy the entire
>> stack of Pulp dependencies with all of their plugins at the perimeter if
>> they don't want to; we can choose to deliver each WSGI app separately, or
>> not, depending on what is convenient.
>>
>>
>> How can one process group serve multiple WSGI applications? I don't think
>> it can, so it requires the user to deploy it with multiple process groups
>> (one for each WSGI application). This prevents a use case that goes like:
>> "As a user, I can deploy Pulp to serve content, REST API, and live APIs all
>> from one WSGI process". This use case is valuable because it's both a
>> simple deployment model (fewer WSGI apps) and it uses less memory because
>> there are fewer process groups. This is why I'm suggesting we ship all urls
>> to be handled by one WSGI application, which also allows for the deployment
>> that you outline. So shipping one WSGI app makes Pulp the most
>> flexible.
>>
>>
>> This separation has worked very well in Pulp 2, and as far as I know
>> there have been no complaints about it.
>>
>>
>> There are complaints that Pulp is hard to deploy (multiple WSGI apps),
>> and that it uses too much memory.
>>
>>
>> There are some important isolation concerns from the security and
>> reliability standpoints here that should be weighed as well. You are talking
>> about having general 'user facing/unauthenticated' services like content
>> serving share the same process and memory space as your management
>> interface (rest api). There should probably be some thought given to what
>> your acceptable level of risk and exposure is when someone finds a flaw in
>> your content serving code and can now see your management interface's
>> memory footprint. Or say you have some unstable code that can crash in your
>> management interface, which will end up bringing down your entire
>> application, instead of just the management interface. Just some food for
>> thought while thinking about all the ins and outs of this tradeoff.
>>
>