[Pulp-list] Pulp 3.0 Technology Stack Justifications

Tue May 17 03:23:47 UTC 2016

FWIW, as a consumer I'm not excited about seeing ES make its way back into
the Katello ecosystem.  My opinion is based entirely on the fact that I use
Pulp inside Katello.

Recent PostgreSQL versions have added NoSQL-style features (such as the
JSON and JSONB column types) that can be very performant.  Simply googling
"PostgreSQL NoSQL" turns up lots of articles on it.
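
As a rough illustration of the kind of NoSQL-style query this means, here
is a minimal sketch that builds a parameterized JSONB containment query
(the `@>` operator, available on jsonb columns since PostgreSQL 9.4).  The
table name `units`, the column name `metadata`, and the helper
`jsonb_contains_query` are all hypothetical; nothing here reflects Pulp's
actual schema.

```python
import json


def jsonb_contains_query(table, column, fragment):
    """Build (sql, params) matching rows whose JSONB column contains `fragment`.

    Hypothetical helper for illustration only; not part of Pulp or Katello.
    """
    # Identifiers cannot be bound as query parameters, so only accept
    # plain names here instead of interpolating arbitrary strings.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError("unsafe identifier: %r" % name)
    # `@>` is PostgreSQL's JSONB containment operator; %s is the
    # placeholder style used by DB-API drivers such as psycopg2.
    sql = "SELECT id FROM {} WHERE {} @> %s::jsonb".format(table, column)
    params = (json.dumps(fragment, sort_keys=True),)
    return sql, params


# Assumed example data: find units whose metadata contains these keys.
sql, params = jsonb_contains_query(
    "units", "metadata", {"arch": "x86_64", "type": "rpm"}
)
print(sql)
print(params)
```

The resulting pair can be handed to a DB-API cursor's execute() call, and
a GIN index on the JSONB column is the usual way to keep such lookups fast.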

Having supported several large applications that didn't scale well due
to ORMs, I'm not a huge fan of them myself, but I'd rather have an ORM than
have ES on my systems.

Since I keep saying I don't want ES but haven't given any reasons, here are
my top 3:

1: Bundled everything - we are big on packages (hey Pulp!), and having a big
bundled package from an open source project just rubs me the wrong way and
has other fun issues that y'all will be stuck dealing with.

2: Scaling - Katello's install isn't really designed to easily be built
across multiple systems, even if ES is.  Not that you can't do it, but
breaking things out can be...interesting.  On top of that, ES requires you
to think about scale from the get-go.  If Pulp or Katello sets up a default
configuration, there is strong pressure to oversize everything up front,
but even that is dangerous.
https://www.elastic.co/guide/en/elasticsearch/guide/master/scale.html

3: Data integrity (I've had supposedly recoverable shards that I had to
lose completely because they would not come back online, no matter what the
documentation tells you)
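
To make point 2 a bit more concrete: in Elasticsearch the number of primary
shards is fixed when an index is created, and a document is routed with
shard = hash(routing) % number_of_primary_shards.  The sketch below shows
why changing the shard count later means reindexing.  It is only an
illustration: zlib.crc32 stands in for Elasticsearch's real hash function,
and the document IDs and helper names are made up.

```python
import zlib


def shard_for(doc_id, num_primary_shards):
    """Pick a shard as hash(routing) % number_of_primary_shards."""
    # crc32 is a stand-in hash for illustration; Elasticsearch uses
    # its own hash function internally.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def moved_fraction(doc_ids, old_shards, new_shards):
    """Fraction of documents whose shard changes with the shard count."""
    moved = sum(
        1
        for doc_id in doc_ids
        if shard_for(doc_id, old_shards) != shard_for(doc_id, new_shards)
    )
    return moved / len(doc_ids)


# Made-up document IDs, purely for the demonstration.
doc_ids = ["unit-%d" % i for i in range(10000)]

# Same shard count: nothing moves.
print(moved_fraction(doc_ids, 5, 5))
# One extra shard: most documents would land on a different shard.
print(moved_fraction(doc_ids, 5, 6))
```

Because most documents would map to a different shard, the shard count has
to be chosen (and usually oversized) before the first document is indexed.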

-greg

On Thu, May 12, 2016 at 1:09 PM Eric Helms <ehelms at redhat.com> wrote:

> On Thu, May 12, 2016 at 10:51 AM, Sean Myers <sean.myers at redhat.com>
> wrote:
>
>> Early planning for Pulp 3.0 is building up some steam, and it's
>> a good time to go over the proposed technology stack that we're
>> looking at building on right now. For all of these choices, once
>> Pulp's basic needs are met, the major deciding factors for which
>> library to use are "meta" factors, like community support,
>> release processes, etc. Special
>> thanks to Jeff Ortel for making sure my assumptions about these
>> tools got challenged so the right choices get made.
>>
>> We're using postgres as the DB for 3.0. Since we're going
>> relational, the next thing we'd want is a good ORM. Several team
>> members have experience with the Django ORM, and Pulp is actually
>> already using it in its views. It has a fantastic community, is
>> well documented, and comes with a vast multitude of third-party
>> plugins to help us fill in any gaps in functionality that may be
>> found. Our current tasking system is built on Celery[0], which is
>> among those third-party plugins with excellent Django support,
>> which potentially means that using Django with a relational DB
>> can help us get rid of code where we overlap functionality that
>> may be provided by django-celery.
>>
>> Other ORM options were considered, but only SQLAlchemy (another
>> very good ORM) stood out as something we could use if there was
>> a compelling reason to switch from Django, but at this time there
>> is no such reason. Django does the job well. Most other ORMs are
>> either not robust enough in their feature-set or apparently not
>> being actively maintained, and were rejected as alternatives.
>> Also rejected outright was not using an ORM (or other form of
>> data mapper) at all, since my sense is that we all agree that
>> we don't want to manually be writing SQL. :)
>>
>> This leads to the next big building block, which is the tool we
>> should use to build our REST APIs. I've used django-tastypie in
>> the past, as have a few other team members, and it was my front-
>> runner for this job. After looking around though, it looks like
>> django-rest-framework (DRF) is currently dominating this space
>> in the Django community[1]. Going through some of their tutorials
>> and examples, it's looking like tastypie is out of the running,
>> and DRF is the winner. Both would be adequate for Pulp's needs
>> when it comes to putting a REST API on top of our data model, so
>> it makes sense to go with the more "popular" option. In addition,
>> I think its documentation and API are easier to work with than
>> tastypie's, so it's simultaneously easier to use and easier to
>> *learn how* to use.
>>
>> Finally, we're looking at bringing in a search engine for the
>> search views in the API. We're currently doing search using
>> mongodb, using mongo-specific search criteria, but will be
>> decoupling the search API from the search engine. As with Django,
>> a few team members have experience using elasticsearch (myself
>> included). Elasticsearch is Java-based, running on top of the
>> Lucene indexer, with a simple REST API on top of it, and so at
>> the moment it's my preferred search engine.
>>
>> I looked at a few other search engines in recent testing, including
>> the pure-Python engine "Whoosh", Solr (also uses Lucene), Xapian,
>> and Sphinx (the search engine, not the document builder). Of these,
>> only Whoosh and Elasticsearch have first-party support by the
>> django-haystack project[2], which is both my preferred and the most
>> commonly used django search plugin[3]. Given my previous positive
>> experience with Elasticsearch, I think it's probably the best choice
>> for a search indexer at this time.
>>
>
> Can you expand on why a separate search service is needed and how Postgres
> won't fill your needs?
>
> Thanks,
> Eric
>
>
>> The Whoosh plugin for Haystack currently doesn't support a very
>> useful feature that Whoosh itself does support, which is faceting.
>> This feature gap is something that would need to be closed (likely
>> by us) to get feature parity between the elasticsearch and whoosh
>> backends.
>>
>> While there are other libraries that appear to live in the same space
>> as haystack (integrate a search indexer with Django models, providing
>> Django QuerySet/Model results), none of them have the robust features
>> and community support seen in haystack. Again, though, decoupling the
>> search interface from the search implementation means that this piece
>> is likely to be easy to change out if we find better options in the
>> future (especially if we write it with this in mind).
>>
>> Summary:
>> - Django ORM on postgres
>> - django-rest-framework to build API views
>> - django-haystack to provide search capabilities, using Elasticsearch
>>   to start, possibly switching to Whoosh after some development -- this
>>   switch should occur before any release of 3.0
>>
>> [0]: http://docs.celeryproject.org/en/latest/django/
>> [1]: https://www.djangopackages.com/grids/g/rest/
>> [2]: http://django-haystack.readthedocs.io/en/stable/backend_support.html
>> [3]: https://www.djangopackages.com/grids/g/search/
>>
>>
>> _______________________________________________
>> Pulp-list mailing list
>> Pulp-list at redhat.com
>> https://www.redhat.com/mailman/listinfo/pulp-list
>>