[Pulp-list] Synchronising Disconnected Pulp Servers

ben.stanley at exemail.com.au
Fri Sep 18 00:42:40 UTC 2015


Synchronising Disconnected Pulp Servers
=======================================

Situation
---------

I maintain two pulp servers:

pulp-A is connected to the internet, and downloads 583 repos.

pulp-B is disconnected from the rest of the world.

The goal is to transfer the 583 repos from pulp-A to pulp-B using only a
USB HDD.

Naive Solution
--------------

Initially, I tried to solve this problem using:

    pulp-admin rpm repo export run --repo-id=${REPO_ID} --export-dir=${DIR}

I wrote a script to run this command for each of the 583 repo-ids. This had
the following problems:
1) It was not going to finish copying within a week.
2) It was going to need more space than the entire HDD provides, because
   every RPM package that is shared between repositories gets exported as a
   separate binary copy for each repository.

Clearly the export command is a poor solution to this problem.
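
For reference, the per-repo loop described above can be sketched as follows.
This is only an illustration, not the script I actually ran: it assumes the
repo-ids arrive one per line on standard input, and PULP_ADMIN is a
hypothetical override that exists purely so the loop can be dry-run with echo.

```shell
#!/usr/bin/env bash
# Sketch only: run the export command once for every repo-id read from stdin.
# PULP_ADMIN is a hypothetical override for dry runs (e.g. PULP_ADMIN=echo).
export_all() {
    local dir=$1 repo_id
    while IFS= read -r repo_id; do
        [ -n "$repo_id" ] || continue   # skip blank lines
        # </dev/null stops the command from swallowing the remaining repo-ids
        "${PULP_ADMIN:-pulp-admin}" rpm repo export run \
            --repo-id="$repo_id" --export-dir="$dir" </dev/null || return 1
    done
}
```

It would be called as `export_all "${DIR}" < repo-ids.txt`, where repo-ids.txt
is a hypothetical file holding the 583 repo-ids.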

Workable Solution
-----------------

After various other ideas and false starts, I have come up with the
following solution:
1) rsync the pulp internal representation to the USB HDD:

       rsync --recursive --no-inc-recursive --links --hard-links --delete \
           --times --progress /var/lib/pulp ${EXPORT_DIR}/var_lib_pulp

   The complete rsync takes about a day the first time.
2) All the rpms are symlinked to absolute paths starting with
   /var/lib/pulp/content. This prevents the USB HDD from being used as a
   repository itself, because those paths point into the source machine's
   filesystem rather than into the copy on the HDD. The symlinks must
   therefore be rewritten from absolute paths to relative paths.
   This process takes 4 days to complete with a bash script, perhaps less
   with a custom C program.
   Furthermore, on subsequent synchronisations (for updates), rsync
   converts all the relative symlinks back to absolute symlinks, so the
   symlink conversion must be repeated from scratch every time. This is a
   big waste of time.

After performing the two steps above, the USB HDD becomes usable as a
bunch of repositories in its own right. It can also be used as the feed to
update the disconnected machine pulp-B. I have written some scripts to
achieve this.
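
The symlink conversion in step 2 can be sketched roughly as below. This is a
minimal illustration rather than my production script. It assumes GNU
coreutils (realpath with --relative-to) and GNU find, and the function name
and its two arguments are made up for the example: the first is the absolute
prefix the links currently point at, the second is the directory on the USB
HDD that now holds the copy of that tree.

```shell
#!/usr/bin/env bash
# relativise_links OLD_PREFIX NEW_ROOT   (hypothetical helper, sketch only)
# Rewrite every symlink under NEW_ROOT whose target starts with OLD_PREFIX/
# so that it points, via a relative path, at the same file under NEW_ROOT.
relativise_links() {
    local old_prefix=$1 new_root=$2
    local link target mapped rel
    find "$new_root" -type l -print0 |
    while IFS= read -r -d '' link; do
        target=$(readlink "$link")
        case $target in
            "$old_prefix"/*) ;;      # absolute link into the old tree
            *) continue ;;           # leave every other link untouched
        esac
        # Where the target lives inside the copied tree
        mapped=$new_root${target#"$old_prefix"}
        # Path to that file, relative to the directory holding the link
        rel=$(realpath --no-symlinks --canonicalize-missing \
                  --relative-to="$(dirname "$link")" "$mapped")
        ln -sfn "$rel" "$link"       # replace the link in place
    done
}
```

A call would look like `relativise_links /var/lib/pulp <copy-of-var-lib-pulp>`,
with the second argument adjusted to wherever the copy of /var/lib/pulp
actually landed under ${EXPORT_DIR}. The loop starts several processes per
link, which is part of why a bash approach is slow on a tree this size.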

Improving the Solution by Patching Pulp
---------------------------------------

Now, the critical observation is this: if pulp stored its internal symlinks
as relative paths instead of absolute paths, or perhaps even as hardlinks,
then the second step of converting absolute symlinks to relative symlinks
by bash script would be unnecessary, saving much time in the
synchronisation process.

Is there any objection to submitting a patch to amend pulp to use relative
symlinks instead of absolute symlinks internally?

Perhaps it would be better to use hardlinks? This implies that
/var/lib/pulp must be stored entirely within the same filesystem. I don't
see this as a big problem, but others might.
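
The same-filesystem requirement for hardlinks is easy to check in advance. A
small sketch, assuming GNU stat (the -c %d format prints the device number a
path lives on); the function name is made up for the example:

```shell
#!/usr/bin/env bash
# same_filesystem PATH_A PATH_B   (hypothetical helper, sketch only)
# Succeeds when both paths live on the same device, which is what a hardlink
# between them requires. Returns 2 if either path cannot be examined.
same_filesystem() {
    local a b
    a=$(stat -c %d -- "$1") || return 2
    b=$(stat -c %d -- "$2") || return 2
    [ "$a" = "$b" ]
}
```

For example, `same_filesystem /var/lib/pulp/content /var/lib/pulp/published`
would report whether links between those two directories could be hardlinks;
the second directory name is only an illustration.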

Ideal Solution
--------------

I agree that the method of disconnected synchronisation outlined above is
not ideal, but it is the only thing I have that works. Feel free to
propose a more comprehensive solution for later, but I need something that
works now.

Ben Stanley.




