[Pulp-list] Package path enhancements in pulp

John Morris john at zultron.com
Thu Mar 29 22:07:30 UTC 2012


Hi Pradeep,

> We recently ran into an issue where in some situations package paths in
> pulp could collide. The relevant bug is here #798656. Due to this we
> decided to change the package path location to include the whole package
> checksum instead of first three characters. Though the change sounds
> simple, the path for migration is involved. The following wiki page
> illustrates the changes in detail

I'm curious: under what situations can that collision occur?

One nice thing about the old directory structure is that the
%{name}/%{version}/%{release}/%{arch} pattern matches koji's.  I
previously had some thoughts on why using the same structure for both
would be beneficial, but now I've forgotten them.  :P

There's another minor concern about the directory structure in general,
with or without the extra level.  Because the repo packages directory
uses symlinks that point into grinder's multi-level structure, any scan
that does a stat() on each RPM takes a lot of disk activity.  Following
each symlink requires traversing 4 or 5 directories that are unlikely to
be in the fs cache.  For example, compare times of '/bin/ls' and
'/bin/ls -l' in the repo packages directory.
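
To put numbers on that, here is a rough Python sketch of the kind of
scan I mean: it times a plain directory read against a pass that calls
stat() through each symlink.  The function name time_scan and the
directory argument are just placeholders of mine, not anything in pulp
or grinder, and the result only means much on a cold fs cache.

```python
import os
import time


def time_scan(packages_dir):
    """Time a names-only listing versus a stat() of every entry.

    Returns (listing_seconds, stat_seconds, entry_count).
    Hypothetical helper for illustration; not part of pulp or grinder.
    """
    # Names only: reads the directory itself, never touches the targets.
    start = time.perf_counter()
    names = os.listdir(packages_dir)
    listing_seconds = time.perf_counter() - start

    # stat() follows each symlink, so every target path is resolved.
    start = time.perf_counter()
    for name in names:
        try:
            os.stat(os.path.join(packages_dir, name))
        except OSError:
            pass  # dangling symlink or entry removed mid-scan
    stat_seconds = time.perf_counter() - start

    return listing_seconds, stat_seconds, len(names)
```

Calling time_scan() on a repo's packages directory right after a
reboot (or after dropping caches) should show the gap; a second run
will mostly measure the cache.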

Daily grinder syncs of large repos, like Fedora, can take quite a long 
time even when there are few changes.  I suspect this to be a 
contributing factor.  Has there been any thought about making this more 
efficient, perhaps by creating hard links, or by updating a database 
with grinder's sync status?
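
As an illustration of the hard link idea only (this is not how grinder
works today, and symlink_to_hardlink is a name I made up), a sketch in
Python might look like the following.  It assumes the repo directory
and grinder's package store are on the same filesystem, since hard
links cannot cross filesystems.

```python
import os


def symlink_to_hardlink(link_path):
    """Replace a symlink with a hard link to the file it points at.

    Returns True if the symlink was replaced, False if link_path is not
    a symlink or its target is not a regular file.
    Hypothetical helper for illustration; not part of pulp or grinder.
    """
    if not os.path.islink(link_path):
        return False

    target = os.path.realpath(link_path)
    if not os.path.isfile(target):
        return False  # dangling symlink or non-regular target

    # Create the hard link under a temporary name, then rename it over
    # the symlink so the package name never disappears from the repo.
    tmp_path = link_path + ".hardlink-tmp"
    os.link(target, tmp_path)  # raises OSError across filesystems
    os.replace(tmp_path, link_path)
    return True
```

With hard links, a stat() on the repo entry reads the file's inode
directly instead of walking the store's directory levels.  The
trade-off is that it is no longer obvious from the repo directory
where the shared copy lives.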

The list archives don't have anything on the thinking behind this 
structure.  File de-duping and bandwidth savings are clear benefits, but 
I'd like to hear thoughts on whether others have this same concern, or 
more likely whether I'm just not doing something right.  ;)

	John



