[Pulp-dev] Webserver owning the entire url namespace?

Eric Helms ehelms at redhat.com
Fri Nov 10 16:17:46 UTC 2017


First, I appreciate this being a proposal and discussion before going this
route, given its implications for applications that consume Pulp heavily.
Secondly, I believe some of my questions and concerns have been asked and
addressed throughout the thread, but I do feel it has reached a point where
a summary would be useful for those just now entering the conversation.

I know some of these concerns quickly move into more advanced architecture
discussion than the original proposal raised, Brian. I am happy to break
those out into other threads, but for now, since someone else mentioned
them, I am including them here.

My generalized concerns are:

 * How do I deploy Pulp alongside another Apache-based application (aka the
current Katello use case)?
 * Can I deploy Pulp and the Content separately? From two perspectives:
splitting load across multiple hosts, and separating concerns into
independent pieces that can be scaled to meet differing demands (e.g. the
Pulp API itself may get only a little traffic, whereas my 50k hosts might
hammer the content API).
 * If the Content is separate from the main Pulp API, does that mean that I
can scale my content delivery more easily, horizontally and across
geographies? In the new world, what does the current setup of Pulp talking
to Pulp to create replicated data endpoints for geography and scaling look
like, given that it is affected by how the URL namespace is consumed?


Eric

On Fri, Nov 10, 2017 at 11:03 AM, Patrick Creech <pcreech at redhat.com> wrote:

> On Fri, 2017-11-10 at 10:49 -0500, Brian Bouterse wrote:
>
> From a deployment perspective, it's been a key use case to deploy crane at
> the perimeter, rsync published image files out to a file or CDN service,
> and run the rest of Pulp on a well-protected internal network.
>
>
> Pulp can also be installed at the perimeter. Core should support a setting
> that enables/disables the REST API. Each plugin could support a setting
> that enables/disables its content API.
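>
> A minimal sketch, assuming a Django-style urlconf, of how such settings
> could gate which URL namespaces a deployment exposes. The setting names
> and module paths below are hypothetical, not actual Pulp code:
>
>     # urls.py (illustrative only)
>     from django.conf import settings
>     from django.urls import include, path
>
>     urlpatterns = []
>
>     # Core setting gating the REST API as a whole (hypothetical flag).
>     if getattr(settings, "REST_API_ENABLED", True):
>         urlpatterns.append(path("pulp/api/v3/", include("pulp_api.urls")))
>
>     # Each plugin could consult its own flag for its content API
>     # (hypothetical flag and module path).
>     if getattr(settings, "FILE_CONTENT_ENABLED", True):
>         urlpatterns.append(
>             path("pulp/content/file/", include("pulp_file.content_urls")))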
>
>
> I think we're envisioning a similar goal, but with a different mechanism.
> I like the idea of a user selecting which components should be active.
> Making each component a WSGI app is very easy for us and very convenient
> for users. You can see Pulp 2's WSGI apps defined here:
>
> https://github.com/pulp/pulp/tree/master/server/usr/share/pulp/wsgi
>
> Depending on whether a user wants to run each component embedded in the
> normal httpd processes or in separate daemon processes, it's just a matter
> of enabling (or not) a small httpd config file like this one:
>
> https://github.com/pulp/pulp/blob/master/server/etc/httpd/conf.d/pulp_content.conf
>
> This gives the most flexibility. A user won't need to deploy the entire
> stack of Pulp dependencies with all of their plugins at the perimeter if
> they don't want to; we can choose to deliver each WSGI app separately, or
> not, depending on what is convenient.
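>
> As a purely illustrative sketch of what one of those separately
> deliverable WSGI modules might look like (the module and settings names
> here are hypothetical, not the files linked above):
>
>     # pulp_content.wsgi (illustrative only)
>     import os
>
>     # Point Django at whatever settings module the deployment uses
>     # (hypothetical name).
>     os.environ.setdefault("DJANGO_SETTINGS_MODULE", "pulp_settings")
>
>     from django.core.wsgi import get_wsgi_application
>
>     # "application" is the callable mod_wsgi looks for by default, so a
>     # small httpd config file only needs a WSGIScriptAlias pointing here.
>     application = get_wsgi_application()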
>
>
> How can one process group serve multiple WSGI applications? I don't think
> it can, so it requires the user to deploy Pulp with multiple process groups
> (one for each WSGI application). This prevents a use case that goes like:
> "As a user, I can deploy Pulp to serve content, REST API, and live APIs all
> from one WSGI process". This use case is valuable because it is both a
> simpler deployment model (fewer WSGI apps) and uses less memory (fewer
> process groups). This is why I'm suggesting we ship all URLs to be handled
> by one WSGI application, which also allows for the deployment that you
> outline. So shipping one WSGI app makes Pulp the most flexible.
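>
> For illustration, a single WSGI callable can dispatch to per-component
> apps by path prefix, so one process group serves every URL namespace while
> the components stay separable. This is only a sketch; the prefixes and app
> names are hypothetical placeholders, not Pulp code:
>
>     # dispatcher.py (illustrative only)
>     def make_dispatcher(mounts, default):
>         """mounts: dict mapping a URL prefix to a WSGI app."""
>         def application(environ, start_response):
>             path = environ.get("PATH_INFO", "")
>             for prefix, app in mounts.items():
>                 if path.startswith(prefix):
>                     # Shift the prefix into SCRIPT_NAME so the mounted app
>                     # sees paths relative to its own root.
>                     environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
>                     environ["PATH_INFO"] = path[len(prefix):]
>                     return app(environ, start_response)
>             return default(environ, start_response)
>         return application
>
>     # application = make_dispatcher(
>     #     {"/pulp/api/v3": rest_api_app, "/pulp/content": content_app},
>     #     default=not_found_app,
>     # )
>
> Deploying everything in one process group is then just a matter of pointing
> httpd at this one callable instead of at several per-component modules.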
>
>
> This separation has worked very well in Pulp 2, and as far as I know there
> have been no complaints about it.
>
>
> There are complaints that Pulp is hard to deploy (multiple WSGI apps), and
> that it uses too much memory.
>
>
> There are some important isolation concerns, from both the security and
> reliability points of view, that should be weighed here as well. You are
> talking about having general user-facing/unauthenticated services, like
> content serving, share the same process and memory space as your management
> interface (REST API). There should probably be some thought given to what
> your acceptable level of risk and exposure is when someone finds a flaw in
> your content serving code and can now see your management interface's
> memory footprint. Or say you have some unstable code that can crash in your
> management interface, which will end up bringing down your entire
> application instead of just the management interface. Just some food for
> thought while thinking through the ins and outs of this tradeoff.
>
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>
>