[Pulp-list] Pulp 3.0 Technology Stack Justifications
lzap at redhat.com
Wed May 18 07:05:17 UTC 2016
> FWIW, As a consumer I'm not excited about seeing ES make its way back into
> the katello ecosystem. All of my opinion is based on the fact that I use
> pulp inside Katello.
> PostgreSQL's more recent versions extended to NoSQL feature sets that can
> be very performant. Simply googling PostgreSQL NoSQL points to lots of
> articles on it.
Actually, full text search has been available in PostgreSQL for years
as a plugin, and I think it was included in core somewhere in the 8.x
series. I am not sure if this fulfills the NoSQL buzzword, but it's
something that works just fine with gigabytes of data (which I tested
myself). It integrates with ispell for stemming (which is a really great
feature that Lucene didn't have on par for years) and configuration is
trivial.
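To give a rough idea of how little is needed on the query side, here is a minimal sketch in Python. The table `errata`, its columns `id`, `title` and `description`, and the helper `search_params` are made up for illustration and are not anything from Pulp. The SQL functions (`to_tsvector`, `plainto_tsquery`, `ts_rank`) and the `@@` match operator are the built-in PostgreSQL ones.

```python
# Sketch only: table and column names are hypothetical, not Pulp's schema.
# "english" is a built-in text search configuration; a custom configuration
# (for example one built on an ispell dictionary) would be named here instead.
SEARCH_SQL = """
SELECT id, title,
       ts_rank(to_tsvector(%(config)s::regconfig, description),
               plainto_tsquery(%(config)s::regconfig, %(terms)s)) AS rank
FROM errata
WHERE to_tsvector(%(config)s::regconfig, description)
      @@ plainto_tsquery(%(config)s::regconfig, %(terms)s)
ORDER BY rank DESC
LIMIT %(limit)s
"""


def search_params(terms, config="english", limit=20):
    """Build the named parameters expected by SEARCH_SQL."""
    return {"terms": terms, "config": config, "limit": limit}
```

With a DB-API driver that accepts named `%(name)s` placeholders (psycopg2 does), this would be run as `cursor.execute(SEARCH_SQL, search_params("kernel security fix"))`.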
Having the search integrated in one database is a huge benefit. Separate
indexing components tend to be slow on updates, with the possibility of
getting out of sync. Data can be reindexed, but that does not solve the
root cause of the problem. I expect Pulp will be indexing only some parts
of the data. I can imagine package names do not need to be indexed for
full text at all since they have their own index already, and with the
PostgreSQL integrated solution you can use them both (package name index
plus full text for, let's say, errata texts, if I understand your
motivation correctly). Also, having all the data under one roof (and in
one transaction) can be a really big deal for data integrity, backup and
security.
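A small sketch of what "both indexes, one transaction" could look like. Everything named here is hypothetical (the tables `erratum` and `erratum_package`, their columns, the index names and the helper `update_erratum_text`); only the SQL constructs themselves (a plain index, a GIN expression index over `to_tsvector`, the `@@` operator) are standard PostgreSQL. The point is that the full text index is maintained by the same `UPDATE` that changes the row, so there is nothing left to get out of sync.

```python
# Sketch only: schema and names are assumptions, not Pulp's data model.
INDEX_DDL = [
    # Ordinary B-tree index for exact package name lookups.
    "CREATE INDEX erratum_package_name_idx ON erratum_package (package_name)",
    # GIN expression index for full text search over errata descriptions.
    "CREATE INDEX erratum_description_fts_idx ON erratum "
    "USING GIN (to_tsvector('english', description))",
]

# One query using both: exact match on package name, full text on the text.
COMBINED_SQL = """
SELECT e.id, e.title
FROM erratum e
JOIN erratum_package ep ON ep.erratum_id = e.id
WHERE ep.package_name = %(name)s
  AND to_tsvector('english', e.description)
      @@ plainto_tsquery('english', %(terms)s)
"""


def update_erratum_text(conn, erratum_id, description):
    """Change the text and commit; roll back on any error.

    `conn` is any DB-API connection. The expression index is updated by the
    database as part of the same transaction as the row itself.
    """
    cur = conn.cursor()
    try:
        cur.execute(
            "UPDATE erratum SET description = %s WHERE id = %s",
            (description, erratum_id),
        )
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

With a separate indexing service, the same change would need a second write to another system after the commit, and that second step is where the drift comes from.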
As a (small) Lucene contributor with experience in Lucene, ES and
PostgreSQL full text search, I'd try to evaluate the PostgreSQL option
for real. The search API is usually quite easy; the most difficult part
is preparing the data, and you will be doing that regardless of the
chosen technology stack. Therefore I think the missing Django plugin for
PostgreSQL full text search might not be the biggest issue at all.
Google found some links if you want to see some comparisons:
PostgreSQL full text outperforms Lucene by a factor of 4 in this one and
takes less index space on disk. This was just a quick search, but I want
to show that Lucene/ES won't be faster than PostgreSQL by an order of
magnitude. And RDBMS scaling has not been a *real* issue for decades.
I am really happy you are back in the RDBMS business, folks :-)
Lukas #lzap Zapletal