[Pulp-list] RFC: Should the migration system apply all migrations to new systems?

Randy Barlow rbarlow at redhat.com
Tue Feb 19 16:33:38 UTC 2013


I have an RFC for you around this question:

Should our migration system be altered to apply historical migrations to 
new installations?

I'll give you some background about how the migration system currently 
works, and why. Then I'll discuss why I think we might want to change the 
behavior. I'd like each of you to carefully consider this change, to try 
to see if you can think of any problems that it might lead to.

Background

As currently implemented, the migration system skips all existing 
migrations for a specific migration package (e.g., pulp-rpm) if it detects 
that pulp-manage-db has never been run before while that package was 
installed.

It works this way because we wanted to skip the application of migrations 
to new systems, since new systems shouldn't need migrations.
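To make the current behavior concrete, here is a minimal sketch of the 
skip logic described above. It is not Pulp's actual code: the function 
name run_migrations, the trackers dict, and the (version, callable) pairs 
are all assumptions made up for illustration.

```python
# Hypothetical sketch of the current behavior; none of these names are
# Pulp's real API.

def run_migrations(trackers, package, migrations):
    """Apply pending migrations for one package.

    trackers:   dict mapping package name -> last applied migration version
    migrations: list of (version, callable) pairs, sorted by version
    Returns the list of versions that were applied on this run.
    """
    latest = migrations[-1][0] if migrations else 0
    if package not in trackers:
        # First time this package is seen: assume a new installation and
        # record the latest version without running anything.
        trackers[package] = latest
        return []
    applied = []
    for version, migrate in migrations:
        if version > trackers[package]:
            migrate()
            trackers[package] = version
            applied.append(version)
    return applied


trackers = {}
calls = []
migrations = [(1, lambda: calls.append(1)), (2, lambda: calls.append(2))]

# First run with the package installed: both migrations are skipped.
first_run = run_migrations(trackers, "pulp-rpm", migrations)

# A later release adds migration 3; only that one runs.
migrations.append((3, lambda: calls.append(3)))
second_run = run_migrations(trackers, "pulp-rpm", migrations)
```

In this sketch the first run records version 2 without calling anything, 
and only migration 3 is ever executed.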

Why We Might Want to Change

If a plugin writer forgets to configure his or her plugin to advertise 
itself to the migration system, this behavior can lead to problems. In 
fact, we have just such a state right now with the pulp-rpm-plugins 
package[0]. In that bug report, it is noted that we had forgotten to 
include the egg-info in our RPM for the pulp-rpm-plugins package, which 
means that the package's migrations (and ISO plugins) are not advertised 
to Pulp. Because of this, the migration system will not record that any 
migrations have been applied on those systems, or even that the package 
was ever installed. This means that once we correct the issue and users 
upgrade from 2.0.z to X.Y.Z, the package will look newly installed, so any 
migrations that we wrote in between 2.0.z and X.Y.Z will be skipped rather 
than applied.

A Proposed Change

I propose that we alter the migration system to not behave this way 
anymore, but to always start with migration version 0 and apply all the 
way to the latest available version. This will allow us to resolve 
#909366[0], and I believe it will be safe if the migration writers are 
careful to detect whether or not their migration should be applied.
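As a rough sketch of the proposal, assuming the same kind of per-package 
version tracking: an unknown package starts at version 0 instead of being 
marked as up to date. The names run_all_migrations and trackers, and the 
(version, callable) pairs, are hypothetical and not Pulp's real API.

```python
# Hypothetical sketch of the proposed behavior; not Pulp's real API.

def run_all_migrations(trackers, package, migrations):
    """Apply every migration newer than the recorded version.

    A package that has never been seen starts at version 0, so all of its
    migrations run. Returns the list of versions applied on this run.
    """
    trackers.setdefault(package, 0)
    applied = []
    for version, migrate in migrations:
        if version > trackers[package]:
            migrate()
            trackers[package] = version
            applied.append(version)
    return applied


# The package was never advertised on 2.0.z, so no version was recorded.
trackers = {}
calls = []
migrations = [(1, lambda: calls.append(1)), (2, lambda: calls.append(2))]

# After the packaging fix, the first run applies everything from 0 upward.
upgrade_run = run_all_migrations(trackers, "pulp-rpm-plugins", migrations)

# Running again is a no-op because the tracker is now at the latest version.
repeat_run = run_all_migrations(trackers, "pulp-rpm-plugins", migrations)
```

Under these assumptions a system upgraded from the broken packaging gets 
both migrations, at the cost of also running them on genuinely new 
installations.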

Potential Problems

Most migrations are probably along the lines of looping over database 
objects and renaming a field to another field, or computing a new field 
based on some kind of state. These sorts of migrations should be safe to 
apply to new installations because the loop will execute 0 times as there 
are no objects in the DB yet.
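For illustration, here is a sketch of that kind of migration. A plain 
list of dicts stands in for a database collection, and the field names 
old_name and new_name are made up; a real migration would talk to the 
database instead.

```python
# Hypothetical rename-a-field migration; the list of dicts is a stand-in
# for a database collection.

def migrate(units):
    """Rename the field 'old_name' to 'new_name' on every unit.

    Returns the number of units that were changed.
    """
    changed = 0
    for unit in units:
        # Only touch units that still carry the old field, so running the
        # migration a second time changes nothing.
        if "old_name" in unit:
            unit["new_name"] = unit.pop("old_name")
            changed += 1
    return changed


empty_db = []
populated_db = [{"old_name": "a"}, {"new_name": "b"}]

changed_on_new_install = migrate(empty_db)      # loop body never runs
changed_on_upgrade = migrate(populated_db)      # one unit is renamed
```

On the empty list the loop body never executes, which is why this style 
of migration should be harmless on a new installation.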

However, migrations don't have to loop over database objects. In fact, 
they aren't constrained in any way. Each one is just a Python method that 
can do anything, and Pulp just tracks whether it has been called or not.

If a migration did something that made it tricky to detect whether it had 
already been applied, that would be problematic for this approach. I 
cannot think of such a use case myself, which is why I am writing this 
RFC. Here is an unrealistic example that illustrates a case that might be 
tough to detect; it is not something we would ever do, so consider it 
purely for illustration. Suppose that I wrote a migration that inserted an 
RPM into the DB named example-<todays_date>.rpm. Obviously this is silly, 
but you can see that it would be tough to detect whether this migration 
had run before in order to avoid running it again.
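The silly example can be sketched as follows. Again, a list of dicts 
stands in for the database, and the date is passed in as a parameter only 
so that the sketch is repeatable; all names are hypothetical.

```python
# Hypothetical, deliberately silly migration whose effect depends on the
# day it runs. The list of dicts stands in for the database.
import datetime


def migrate(units, today):
    """Insert an RPM record named after the given date."""
    name = "example-%s.rpm" % today.isoformat()
    units.append({"name": name})


units = []
migrate(units, datetime.date(2013, 2, 19))
migrate(units, datetime.date(2013, 2, 20))

# Each run leaves a differently named record behind, so there is no fixed
# name the migration could look for to decide that it has already run.
names = [unit["name"] for unit in units]
```

Running it on two different days leaves two records, which is the kind of 
side effect that would be hard to guard against from inside the migration 
itself.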

Can any of you see a problem with this plan? Are there any examples you 
can think of that are real world use cases for the migration system that 
this would be a problem for?

Thanks for reading, and for your consideration!

[0] https://bugzilla.redhat.com/show_bug.cgi?id=909366

-- 
Randy Barlow



