[Pulp-list] Pulp 3.0 Technology Stack Justifications

Eric Helms ehelms at redhat.com
Thu May 12 18:09:04 UTC 2016

On Thu, May 12, 2016 at 10:51 AM, Sean Myers <sean.myers at redhat.com> wrote:

> Early planning for Pulp 3.0 is building up some steam, and it's
> a good time to go over the proposed technology stack that we're
> looking at building on. For all of these choices, once Pulp's
> basic needs are met, the major deciding factors for which library
> to use are "meta" factors, like community support, release
> processes, etc. Special
> thanks to Jeff Ortel for making sure my assumptions about these
> tools got challenged so the right choices get made.
>
> We're using postgres as the DB for 3.0. Since we're going
> relational, the next thing we'd want is a good ORM. Several team
> members have experience with the Django ORM, and Pulp is actually
> already using it in its views. It has a fantastic community, is
> well documented, and comes with a vast multitude of third-party
> plugins to help us fill in any gaps in functionality that may be
> found. Our current tasking system is built on Celery[0], which is
> among those third-party plugins with excellent Django support.
> Using Django with a relational DB could therefore help us get rid
> of code that duplicates functionality django-celery may already
> provide.
>
> Other ORM options were considered, but only SQLAlchemy (another
> very good ORM) stood out as something we could use if there was
> a compelling reason to switch from Django, but at this time there
> is no such reason. Django does the job well. Most other ORMs are
> either not robust enough in their feature-set or apparently not
> being actively maintained, and were rejected as alternatives.
> Also rejected outright was not using an ORM (or other form of
> data mapper) at all, since my sense is that we all agree that
> we don't want to manually be writing SQL. :)
>
> This leads to the next big building block, which is the tool we
> should use to build our REST APIs. I've used django-tastypie in
> the past, as have a few other team members, and it was my front-
> runner for this job. After looking around though, it looks like
> django-rest-framework (DRF) is currently dominating this space
> in the Django community[1]. Going through some of their tutorials
> and examples, it's looking like tastypie is out of the running,
> and DRF is the winner. Both would be adequate for Pulp's needs
> when it comes to putting a REST API on top of our data model, so
> it makes sense to go with the more "popular" option. In addition,
> I think its documentation and API are easier to work with than
> tastypie's, so it's simultaneously easier to use and easier to
> *learn how* to use.
>
> Finally, we're looking at bringing in a search engine for the
> search views in the API. Search is currently done in MongoDB,
> with Mongo-specific search criteria, but we will be decoupling
> the search API from the search engine. As with Django, a few team
> members have experience using Elasticsearch (myself included).
> Elasticsearch is Java-based, running on top of the Lucene indexer,
> with a simple REST API on top of it, and so at the moment it's my
> preferred search engine.
>
> I looked at a few other search engines in recent testing, including
> the pure-Python engine "Whoosh", Solr (also uses Lucene), Xapian,
> and Sphinx (the search engine, not the document builder). Of these,
> only Whoosh and Elasticsearch have first-party support by the
> django-haystack project[2], which is both my preferred and the most
> commonly used Django search plugin[3]. Given my previous positive
> experience with Elasticsearch, I think it's probably the best choice
> for a search indexer at this time.

Can you expand on why a separate search service is needed and why Postgres
won't meet your needs?


> The Whoosh plugin for Haystack currently doesn't support a very
> useful feature that Whoosh itself does support, which is faceting.
> This feature gap is something that would need to be closed (likely
> by us) to get feature parity between the elasticsearch and whoosh
> backends.
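
For anyone unfamiliar with the term: faceting means returning, alongside
the hits, a count of how many hits fall under each value of a field. A
minimal sketch in plain Python follows. The documents and field names are
made up for illustration, and nothing here is Haystack's or Whoosh's API.

```python
from collections import Counter


def facet_counts(documents, field):
    """Count how many documents carry each value of `field`.

    Documents that lack the field are skipped rather than counted.
    """
    return Counter(doc[field] for doc in documents if field in doc)


# Hypothetical search hits; the field names are illustrative only.
hits = [
    {"name": "httpd", "type": "rpm"},
    {"name": "nginx", "type": "rpm"},
    {"name": "apache", "type": "puppet_module"},
]

counts = facet_counts(hits, "type")
for value, count in sorted(counts.items()):
    print(value, count)
```

A search backend with facet support does this counting inside the index,
so the client gets the buckets without fetching every hit.
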
> While there are other libraries that appear to live in the same space
> as haystack (integrate a search indexer with Django models, providing
> Django QuerySet/Model results), none of them have the robust features
> and community support seen in haystack. Again, though, decoupling the
> search interface from the search implementation means that this piece
> is likely to be easy to change out if we find better options in the
> future (especially if we write it with this in mind).
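
The decoupling described above could look roughly like the sketch below.
It is an assumption about shape only: SearchBackend, InMemoryBackend and
find_units are hypothetical names, not Pulp or Haystack code, and a real
backend would talk to Elasticsearch or Whoosh instead of a dict.

```python
from abc import ABC, abstractmethod


class SearchBackend(ABC):
    """Hypothetical engine-neutral interface the API layer would code against."""

    @abstractmethod
    def index(self, doc_id, fields):
        """Store or update one document under `doc_id`."""

    @abstractmethod
    def search(self, **criteria):
        """Return the ids of documents matching every field=value pair."""


class InMemoryBackend(SearchBackend):
    """Stand-in engine for illustration; keeps documents in a dict."""

    def __init__(self):
        self._docs = {}

    def index(self, doc_id, fields):
        self._docs[doc_id] = dict(fields)

    def search(self, **criteria):
        return sorted(
            doc_id
            for doc_id, fields in self._docs.items()
            if all(fields.get(key) == value for key, value in criteria.items())
        )


def find_units(backend, **criteria):
    """API-side helper: it only knows the interface, never the engine."""
    return backend.search(**criteria)


backend = InMemoryBackend()
backend.index("httpd", {"type": "rpm", "arch": "x86_64"})
backend.index("nginx", {"type": "rpm", "arch": "noarch"})
backend.index("apache", {"type": "puppet_module"})

rpm_ids = find_units(backend, type="rpm")
```

Swapping engines then means writing another SearchBackend subclass and
leaving find_units and its callers alone.
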
> Summary:
> - Django ORM on postgres
> - django-rest-framework to build API views
> - django-haystack to provide search capabilities, using Elasticsearch
>   to start, possibly switching to Whoosh after some development -- this
>   switch should occur before any release of 3.0
>
> [0]: http://docs.celeryproject.org/en/latest/django/
> [1]: https://www.djangopackages.com/grids/g/rest/
> [2]: http://django-haystack.readthedocs.io/en/stable/backend_support.html
> [3]: https://www.djangopackages.com/grids/g/search/