[Pulp-dev] Webserver owning the entire url namespace?

Mon Nov 13 19:32:35 UTC 2017

Here's an issue to split the WSGI app into two WSGI apps:
https://pulp.plan.io/issues/3132

On Mon, Nov 13, 2017 at 2:04 PM, Brian Bouterse <bbouters at redhat.com> wrote:

> OK in looking at this more, the mod_wsgi docs resolved my concerns in
> shipping multiple WSGI applications. From the docs [0] we can serve
> multiple WSGI applications as easily as we can serve one, even within one
> process group. In Pulp2 we just defaulted to 1 process group per WSGI app,
> but one process group can serve multiple WSGI apps. Here are some example
> configs. Say we have two WSGI applications (content + live APIs, and the
> REST API); call them content.wsgi and api.wsgi respectively. An Apache
> config could then be:
>
> # the API WSGI app serving the status API
> WSGIScriptAlias /status /path/to/installed/wsgi/files/api.wsgi
> # the API WSGI app serving the rest of the API
> WSGIScriptAlias /api/v3 /path/to/installed/wsgi/files/api.wsgi
> # the "content" app in Pulp
> WSGIScriptAlias /content /path/to/installed/wsgi/files/content.wsgi
> # any live APIs you need
> WSGIScriptAlias /v1/ /path/to/installed/wsgi/files/content.wsgi
>
> Other webservers can be configured similarly. So does shipping multiple WSGI
> apps sound good to everyone? I think this is what I've heard from others in
> this thread and now I agree.
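
As a rough illustration of the same routing outside Apache, here is a minimal sketch of a pure-Python WSGI dispatcher that mounts two apps under the prefixes from the WSGIScriptAlias lines above. The names api_app, content_app, MOUNTS and dispatcher are assumptions made for this example, not Pulp code, and the two apps are trivial stand-ins for whatever api.wsgi and content.wsgi would expose.

```python
# Hypothetical stand-ins for the two applications. The real api.wsgi and
# content.wsgi would each expose their own WSGI callable.
def api_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"api"]


def content_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"content"]


# Same prefix-to-app mapping as the Apache example ("/v1/" is written
# without its trailing slash here to keep the matching logic uniform).
MOUNTS = [
    ("/status", api_app),
    ("/api/v3", api_app),
    ("/content", content_app),
    ("/v1", content_app),
]


def dispatcher(environ, start_response):
    """Route a request to the first app whose prefix matches the path."""
    path = environ.get("PATH_INFO", "")
    for prefix, app in MOUNTS:
        if path == prefix or path.startswith(prefix + "/"):
            # Move the matched prefix from PATH_INFO to SCRIPT_NAME, as a
            # web server does when it mounts an app at a sub-URL.
            child = dict(environ)
            child["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
            child["PATH_INFO"] = path[len(prefix):]
            return app(child, start_response)
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

For a quick local try, dispatcher could be handed to the standard library's wsgiref.simple_server.make_server. Both apps then live in one Python process, which is the single-process deployment discussed in this thread.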
>
> What about shipping 3 WSGI apps versus 2? One for the REST API (status and
> API), one for the content Pulp serves, and a third for any "other urls"? If
> we have only two WSGI apps, we probably have to have plugin writers identify
> which WSGI app a url should be for. Sometimes they may want them deployed
> with the content (live API), or with the REST API (complex upload use
> cases).
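
One way plugin writers might identify the target app is a small registry in which each URL prefix is declared together with the WSGI app that should serve it. The following is only a sketch under that assumption; register_url, urls_for, the app names "api" and "content", and the example prefixes are hypothetical and not part of any Pulp API.

```python
from collections import defaultdict

# Hypothetical names for the two shipped WSGI apps.
VALID_APPS = {"api", "content"}

# Maps an app name to the URL prefixes registered for it.
_registry = defaultdict(list)


def register_url(prefix, target_app):
    """Record that `prefix` should be served by `target_app`."""
    if target_app not in VALID_APPS:
        raise ValueError(f"unknown WSGI app: {target_app!r}")
    _registry[target_app].append(prefix)


def urls_for(target_app):
    """Return the prefixes a given WSGI app should serve."""
    return list(_registry[target_app])


# A hypothetical plugin declaring one live API and one upload endpoint.
register_url("/v1/", "content")
register_url("/api/v3/uploads/", "api")
```

With only two apps the plugin has to pick one of the two names. A third "other urls" app would simply be one more entry in VALID_APPS.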
>
> @ehelms, I'll try to share some responses. The config above could be mixed
> with other applications on the same vhost. It would also allow them to be
> deployed together or separately. The content WSGI app still needs database
> access to serve that content so running them over a WAN is not something I
> would recommend. Pushing content to be served "at the edge", "in another
> geo", or via a "CDN" are a whole group of important use cases. This won't
> help with those use cases unfortunately.
>
> [0]: http://modwsgi.readthedocs.io/en/develop/user-guides/configuration-guidelines.html#the-wsgiscriptalias-directive
>
> On Fri, Nov 10, 2017 at 11:17 AM, Eric Helms <ehelms at redhat.com> wrote:
>
>> First, I appreciate this being a proposal and discussion before going
>> this route given its implications for applications used to consuming Pulp
>> heavily. Secondly, I believe some of my questions and concerns have been
>> asked and addressed throughout the thread, but I do feel like it's reached
>> a point where a summary would be useful for those just now entering the
>> conversation to parse.
>>
>> I know some of these concerns quickly start to veer into more advanced
>> architecture discussion than the original proposal raised, Brian. I am happy
>> to break those out into other threads, but for now since someone else
>> mentioned it I am including them.
>>
>> My generalized concerns are:
>>
>>  * How do I deploy Pulp alongside another Apache-based application (aka
>> the current Katello use case)?
>>  * Can I deploy Pulp and the Content separately? From two perspectives,
>> splitting load across multiple hosts or separating concerns into
>> independent pieces that can be scaled per differing demands? (e.g. the Pulp
>> API itself may get a little traffic, whereas my 50k hosts might hammer the
>> content API)
>>  * If the Content is separate from the main Pulp API, does that mean that
>> I can scale my content delivery more easily horizontally and across
>> geographies? This gets at how, in the new world, the current setup of
>> Pulp talking to Pulp to create replicated data endpoints for geography and
>> scaling would look, given that it is affected by how the URL namespace is
>> consumed.
>>
>>
>> Eric
>>
>> On Fri, Nov 10, 2017 at 11:03 AM, Patrick Creech <pcreech at redhat.com>
>> wrote:
>>
>>> On Fri, 2017-11-10 at 10:49 -0500, Brian Bouterse wrote:
>>>
>>> From a deployment perspective, it's been a key use case to deploy crane
>>> at the perimeter, rsync published image files out to a file or CDN service,
>>> and run the rest of Pulp on a well-protected internal network.
>>>
>>>
>>> Pulp can also be installed at the perimeter. Core should support a
>>> setting that enables/disables the REST API. Each plugin could support a
>>> setting that enables/disables its content API.
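
A minimal sketch of how such enable/disable settings could decide which URL prefixes a given install serves. The setting names REST_API_ENABLED and content_api_enabled, the build_mounts helper, and the plugin prefixes are all assumptions made for illustration. None of them come from Pulp.

```python
def build_mounts(settings, plugins):
    """Return the URL prefixes to serve, given enable/disable settings."""
    mounts = []
    # Hypothetical core setting: whether this install exposes the REST API.
    if settings.get("REST_API_ENABLED", True):
        mounts.append("/api/v3")
    # Hypothetical per-plugin setting: whether its content API is exposed.
    for plugin in plugins.values():
        if plugin.get("content_api_enabled", True):
            mounts.append(plugin["content_prefix"])
    return mounts


# A perimeter install: REST API off, only one plugin's content API on.
perimeter_settings = {"REST_API_ENABLED": False}
plugins = {
    "docker": {"content_api_enabled": True, "content_prefix": "/v1"},
    "rpm": {"content_api_enabled": False, "content_prefix": "/content/rpm"},
}
print(build_mounts(perimeter_settings, plugins))  # prints ['/v1']
```

Note that this only controls which URLs are routed. The disabled code is still installed and loaded, which is the difference from shipping each component as its own WSGI app.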
>>>
>>>
>>> I think we're envisioning a similar goal, but with a different
>>> mechanism. I like the idea of a user selecting which components should be
>>> active. Making each component a WSGI app is very easy for us and very
>>> convenient for users. You can see Pulp 2's WSGI apps defined here:
>>>
>>> https://github.com/pulp/pulp/tree/master/server/usr/share/pulp/wsgi
>>>
>>> Depending on whether a user wants to run each component embedded in
>>> normal httpd processes, or in separate daemon processes, it's just a matter
>>> of enabling or not a small httpd config file like this one:
>>>
>>> https://github.com/pulp/pulp/blob/master/server/etc/httpd/conf.d/pulp_content.conf
>>>
>>> This gives the most flexibility. A user won't need to deploy the entire
>>> stack of Pulp dependencies with all of their plugins at the perimeter if
>>> they don't want to; we can choose to deliver each WSGI app separately, or
>>> not, depending on what is convenient.
>>>
>>>
>>> How can one process group serve multiple WSGI applications? I don't
>>> think it can, so it requires the user to deploy it with multiple process
>>> groups (one for each WSGI application). This prevents a use case that goes
>>> like: "As a user, I can deploy Pulp to serve content, REST API, and live
>>> APIs all from one WSGI process". This use case is valuable because it's
>>> both a simple deployment model (fewer WSGI apps) and it uses less memory
>>> because there are fewer process groups. This is why I'm suggesting we ship
>>> all urls to be handled by one WSGI application, which also allows for the
>>> deployment that you outline. So shipping one WSGI app makes Pulp the
>>> most flexible.
>>>
>>>
>>> This separation has worked very well in Pulp 2, and as far as I know
>>> there have been no complaints about it.
>>>
>>>
>>> There are complaints that Pulp is hard to deploy (multiple WSGI apps),
>>> and that it uses too much memory.
>>>
>>>
>>> There are some important isolation concerns from the security and
>>> reliability standpoints here that should be weighed as well. You are talking
>>> about having general 'user facing/unauthenticated' services like content
>>> serving share the same process and memory space as your management
>>> interface (REST API). There should probably be some thought given to what
>>> your acceptable level of risk and exposure is when someone finds a flaw in
>>> your content serving code and can now see your management interface's
>>> memory footprint. Or say you have some unstable code that can crash in your
>>> management interface, which will end up bringing down your entire
>>> application, instead of just the management interface. Just some food for
>>> thought while thinking about all the ins and outs of this tradeoff.
>>>
>>> _______________________________________________
>>> Pulp-dev mailing list
>>> Pulp-dev at redhat.com
>>> https://www.redhat.com/mailman/listinfo/pulp-dev
>>>
>>>
>>
>