[Pulp-dev] Transition from Mongo to Postgre

Tue Sep 13 15:01:42 UTC 2016

In addition to what @mhrivnak said, for me the big motivation is 
transaction support. A single Pulp sync or publish can issue thousands 
of writes to the database. A failure in the middle leaves the database 
"half-updated" and Pulp has no feasible way to roll back these changes. 
This creates a major problem for data correctness in the face of 
failures. Transaction support at the database layer will give Pulp an 
opportunity to recover from these failures and preserve correctness.
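
To make that concrete, here is a minimal sketch of the all-or-nothing 
behavior I mean. It is not Pulp code. It uses SQLite from the Python 
standard library as a stand-in for PostgreSQL, and the repo_unit table, 
the sync_units function, and the sample data are all hypothetical names 
made up for illustration.

```python
import sqlite3


def sync_units(conn, repo_id, unit_names):
    """Write every unit of one (hypothetical) sync in a single transaction."""
    # Used as a context manager, the connection commits if the block
    # finishes normally and rolls back if the block raises.
    with conn:
        for name in unit_names:
            if name is None:
                # Simulates a failure partway through the sync.
                raise ValueError("malformed unit in upstream metadata")
            conn.execute(
                "INSERT INTO repo_unit (repo_id, name) VALUES (?, ?)",
                (repo_id, name),
            )


def count_units(conn):
    return conn.execute("SELECT COUNT(*) FROM repo_unit").fetchone()[0]


# In-memory SQLite database standing in for the real database server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE repo_unit (repo_id TEXT, name TEXT)")
conn.commit()

# The third unit is bad, so the two earlier inserts are rolled back too.
try:
    sync_units(conn, "repo-a", ["pkg-1", "pkg-2", None, "pkg-4"])
except ValueError:
    pass
count_after_failure = count_units(conn)

# A clean sync commits all of its writes together.
sync_units(conn, "repo-a", ["pkg-1", "pkg-2"])
count_after_success = count_units(conn)
```

After the failed call the table is still empty rather than 
"half-updated", and only the clean sync leaves rows behind.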

From a high level, Pulp's transition to PostgreSQL is about correctness, 
not performance. We don't want to give up performance, but it is a 
secondary concern behind correctness. Pulp 2.y does little to take 
advantage of "the mongodb way"[0] for read and write performance, so 
after switching I expect to see "similar" performance. We would need to 
benchmark 2.y against 3.y to really know. We are not planning to do 
that, so we may never know, but here is an outline of a plan for 
tracking performance [1].

[0]: loosening write/read consistency and deployments that use sharding
[1]: https://etherpad.net/p/pulp_performance_test_plan

-Brian

On 09/13/2016 09:11 AM, Michael Hrivnak wrote:
> We have a thread here about a lot of the 3.0 stack choices, although it
> seems to skip past the assumption that we're moving to postgres:
>
> https://www.redhat.com/archives/pulp-list/2016-May/msg00042.html
>
> I can't quickly find another summary of why, so I'll describe the
> highlights here:
>
> - Pulp has highly relational data. The core use case is managing the
> relationships between content and repositories. Using a relational DB
> makes that a lot easier.
> - A schemaless DB makes it easy to do writes, but you have to be very
> careful when doing reads that your software is prepared for whatever
> data structure comes out. If you want to enforce a schema, it has to be
> done in software. It's doable, but requires great care.
> - Transactions!
> - The HA story with mongodb is more complex than most people realize
> (certainly more complex than we expected). To get real HA with data
> safety, you have to do a lot of the work in your own software.
>
> MongoDB is great at what it does and a good fit for some use cases, but
> we learned that it's not the best fit for Pulp.
>
> Michael
>
> On Tue, Sep 13, 2016 at 3:21 AM, Filip Nguyen <fnguyen at redhat.com> wrote:
>
>     I heard that Pulp is switching from Mongo to Postgre. Just out of
>     curiosity, I would like to learn more about the reasons why you
>     decided to go this direction. Is there any document/email thread
>     about it?
>
>     _______________________________________________
>     Pulp-dev mailing list
>     Pulp-dev at redhat.com
>     https://www.redhat.com/mailman/listinfo/pulp-dev