[Pulp-list] Snapshotting support for a pulp repo

Mihai Ibanescu mihai.ibanescu at gmail.com
Fri Jan 22 00:01:51 UTC 2016


Hi,

I am investigating how I can implement snapshotting support for Pulp repos
(not with a plugin, at least not for now, but as a client).

Essentially, I need to make a copy of the pulp repo after each "logical
write" operation (a set of updates/copies from other repos), before the
publish action.

One way I've thought about it:

* before calling publish on repo-1, create a repo-1__<timestamp> repo;
here the timestamp has millisecond (or finer) resolution, so the chance of a
clash with another snapshot operation is slim
* copy everything from repo-1 onto repo-1__<timestamp>
* read the previous timestamp from the repo's notes section (if present)
* if present, compare the contents of the previous timestamped repo with the
newly created one. If they are the same, delete the newly created
repo-1__<timestamp> and do nothing else
* if not present, or if the contents are different, write
repo-1__<timestamp> in the repo's notes section
* periodically clean up older repo-1__<timestamp> repos
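
The steps above can be sketched in Python against a hypothetical in-memory model. The `Repo` class, the `snapshot`/`cleanup` helpers and the `"snapshot"` notes key are made up for illustration and are not Pulp's client API; in a real client each dictionary operation would become a REST call (create repo, copy units, read/update notes, delete repo). Units are modeled as plain string keys, which assumes a key uniquely identifies a unit.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Hypothetical in-memory stand-in for a Pulp repository (not Pulp's API)."""
    units: set = field(default_factory=set)    # unit keys, e.g. "pkg-1.0-1.noarch"
    notes: dict = field(default_factory=dict)  # free-form repo notes


def snapshot(repos, repo_id, now_ms=None):
    """Snapshot repos[repo_id] before publish; return the snapshot id recorded in its notes."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # millisecond timestamp
    snap_id = f"{repo_id}__{now_ms}"

    # Copy first, so the comparison below never looks at the live (changing) repo.
    repos[snap_id] = Repo(units=set(repos[repo_id].units))

    # Read the previous snapshot id from the notes ("snapshot" key is hypothetical).
    prev_id = repos[repo_id].notes.get("snapshot")

    # Same contents as the previous snapshot: drop the new copy, change nothing.
    if prev_id is not None and prev_id in repos and repos[prev_id].units == repos[snap_id].units:
        del repos[snap_id]
        return prev_id

    # No previous snapshot, or contents differ: record the new one as the latest.
    repos[repo_id].notes["snapshot"] = snap_id
    return snap_id


def cleanup(repos, repo_id, keep):
    """Delete all but the `keep` newest snapshots of repo_id, never the current one."""
    prefix = f"{repo_id}__"
    snaps = sorted(
        (r for r in repos if r.startswith(prefix) and r[len(prefix):].isdigit()),
        key=lambda r: int(r[len(prefix):]),
    )
    current = repos[repo_id].notes.get("snapshot")
    for old in snaps[: max(len(snaps) - keep, 0)]:
        if old != current:
            del repos[old]
```

With this model, calling `snapshot(repos, "repo-1")` twice without any change in between leaves only the first timestamped copy, and `cleanup(repos, "repo-1", keep=1)` removes every older copy while keeping the one the notes point to. Note that the read-notes/write-notes pair is still a plain read-then-write, which is exactly where the race described below comes in.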

There are clearly major concurrency issues/race conditions here.
* What happens if the contents of repo-1 change while I am performing the
comparison? Nothing bad in this case; that is why I do the copy first (so
that the comparison is between the new timestamped copy and the previous
one, never against the live repo)
* What happens if another "snapshot" operation runs while I am doing
those calculations? If I could guarantee that the updates to the notes
section happen in the same order as the snapshots, nothing bad; at worst I
end up with two identical timestamped copies created very shortly one after
the other. But if snapshot 1 starts, then snapshot 2 starts and updates
repo-1's notes, and then snapshot 1 updates repo-1's notes, I have just
overwritten a newer snapshot with an older one. If pulp had ETag support and
the PUT operation that updates the notes honored conditional headers (such as
If-Match), I would be able to detect that case.
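
The conditional update described above is optimistic concurrency control: read the notes together with an ETag, then write only if the ETag is unchanged (in HTTP terms, a PUT with If-Match that fails with 412 Precondition Failed otherwise). The sketch below models that with a hypothetical `NotesStore` class; it is not Pulp's API, only an illustration of what ETag support would enable. The helper re-reads on conflict and backs off if the recorded snapshot is already newer, assuming the `repo-1__<timestamp>` naming from the steps above.

```python
import itertools
import threading


class PreconditionFailed(Exception):
    """Raised when the supplied ETag no longer matches (HTTP 412 equivalent)."""


class NotesStore:
    """Hypothetical notes store with ETag-style versioning (not Pulp's API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = None
        self._etag = "0"
        self._versions = itertools.count(1)

    def get(self):
        """Return (value, etag), like a GET that exposes an ETag header."""
        with self._lock:
            return self._value, self._etag

    def put_if_match(self, value, etag):
        """Write only if etag is still current, like a PUT with If-Match."""
        with self._lock:
            if etag != self._etag:
                raise PreconditionFailed(f"stale ETag {etag!r}, current is {self._etag!r}")
            self._value = value
            self._etag = str(next(self._versions))
            return self._etag


def snapshot_ts(snap_id):
    """Extract the millisecond timestamp from 'repo-1__<timestamp>'."""
    return int(snap_id.rsplit("__", 1)[1])


def publish_snapshot_id(store, snap_id, max_retries=5):
    """Point the notes at snap_id unless a newer snapshot is already recorded."""
    for _ in range(max_retries):
        current, etag = store.get()
        if current is not None and snapshot_ts(current) >= snapshot_ts(snap_id):
            return False  # a newer (or identical) snapshot is recorded; leave it
        try:
            store.put_if_match(snap_id, etag)
            return True
        except PreconditionFailed:
            continue  # someone wrote in between; re-read and decide again
    raise RuntimeError("too much contention updating repo notes")
```

Replaying the interleaving from the second bullet: snapshot 1 reads the notes, snapshot 2 reads and writes, and snapshot 1's write with its stale ETag is rejected instead of silently overwriting the newer snapshot; on retry it sees the newer id and stops.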

Has anyone ever tried something like this?

It is, in a sense, like git or another SCM: each commit gets its own
changeset id, and HEAD is always updated to point to the latest changeset.
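
Pushing the git analogy one step further, a snapshot's changeset id could be a digest of the repo's contents, so "are the contents the same?" becomes a string comparison rather than listing and diffing two repos. This is a hypothetical sketch: `content_digest` and `commit` are illustrative names, and it assumes each unit is identified by a stable string key (for example name-version-release-arch plus checksum).

```python
import hashlib


def content_digest(unit_keys):
    """Order-independent SHA-256 digest of a repo's unit keys (hypothetical changeset id)."""
    h = hashlib.sha256()
    for key in sorted(unit_keys):
        h.update(key.encode("utf-8"))
        h.update(b"\0")  # separator so ["ab", "c"] and ["a", "bc"] differ
    return h.hexdigest()


def commit(history, head, unit_keys):
    """Record a snapshot like a git commit; return the (possibly unchanged) HEAD."""
    digest = content_digest(unit_keys)
    if digest == head:
        return head  # contents unchanged: no new changeset, HEAD stays put
    history[digest] = {"parent": head, "units": frozenset(unit_keys)}
    return digest  # HEAD moves to the new changeset
```

Storing such a digest alongside the snapshot id in the repo's notes would let a client skip creating a copy at all when nothing changed, at the cost of computing the digest over the full unit list.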

This could probably be implemented much more efficiently as a distributor
that doesn't create any distributable content but only snapshots the state
of the pulp repo. Maybe such a thing already exists?

Thank you!
Mihai