[Pulp-list] Restructuring proposal for /var/lib/pulp

Dennis Gregorovic dgregor at redhat.com
Tue Aug 31 18:16:55 UTC 2010


On Tue, 2010-08-31 at 14:05 -0400, Pradeep Kilambi wrote:
> ----- Original Message -----
> From: "Dennis Gregorovic" <dgregor at redhat.com>
> To: "Pradeep Kilambi" <pkilambi at redhat.com>
> Cc: "pulp-list" <pulp-list at redhat.com>
> Sent: Tuesday, August 31, 2010 12:55:42 PM GMT -05:00 US/Canada Eastern
> Subject: Re: [Pulp-list] Restructuring proposal for /var/lib/pulp
> 
> On Tue, 2010-08-31 at 11:47 -0400, Pradeep Kilambi wrote:
> > I'm currently working on a task to make our packages globally unique
> > across the pulp server. Meaning, we'll have one and only one package
> > with the same NVREA + checksum at a time, linked to multiple repos. For
> > this I need to restructure our content directory a bit. This is my
> > proposal:
> > 
> > Current Structure,
> > 
> > /var/lib/pulp/
> >              |_ RepoA/
> >              |
> >              |_ RepoB/
> > 
> > Proposed New Structure,
> > 
> > /var/lib/pulp/
> >              |_Packages/checksum[:3]/*.rpm
> >              |
> >              |_Repos/
> >                     |_RepoA
> >                     |_RepoB
> > 
> > Packages in RepoA and RepoB will be symlinks into the Packages directory.
> > Apache will now only expose /var/lib/pulp/Repos and will not include
> > the Packages directory.
> > 
> > Considering this is kind of a major change, I'm working off a branch and
> > will see how stable it is before I merge it into this sprint. Overall, the
> > change is gonna be to both grinder and pulp.
> > 
> > Lemme know if you have any concerns with this change,
> 
> What's the rationale for making NVREA+checksum globally unique?  I would
> think either NVRA+signature or just NVRA would be unique.  Also, note
> that I've dropped the epoch from the clause.  Brew, koji, and RHN all
> prevent multiple RPMs with the same NVRA and different epoch.  I think
> that's a good practice to continue.
> 
> The directory structure that we are using for the CDN is:
> 
> /<root>/
>         |_origin/
>         |       |
>         |        <name>/
>         |              |_<version>/
>         |                         |_<release>/
>         |                                    |_<signature key>/*.rpm
>         |_content/<...>/<symlinks>
> 
> This path makes it quick and easy to find a package.  The one issue I
> have is that the <name> directory has around 8k subdirectories.  So,
> opening that directory takes some time, but it's an operation that is
> rarely performed.
> 
> Also, I would suggest lowercase directory names.
> 
> Cheers
> -- Dennis
> 
> 
> 
> The uniqueness on NVRA + checksum basically covers the case you mentioned,
> where the packages are from different vendors but have the same NVRA. RHN
> does ignore epoch from a filesystem standpoint, but it does account for
> it in the DB. So we do keep epoch in the DB. I'm fine with splitting the
> path with /name/version/release/.

It sounds like one of the requirements, then, is to support different
packages that have the same NVRA but different checksum as a result of
coming from different vendors.  In that case, I would make NVRA+vendor
the unique key.  Alternatively, if checksum uniqueness really is the
requirement (i.e. a single vendor can have different copies of the same
NVRA), then you could just make the unique constraint the checksum and
drop NVRA.  If you end up with two packages that have the same checksum
but are named with different NVRA, you have much bigger problems. ;)

I agree that epoch needs to be tracked, but my point is that it
shouldn't be included in any unique constraints.  
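
To make the two options concrete, here is a minimal Python sketch. It is
not pulp's actual schema: the Package class, its field names, and the
helper functions are all hypothetical, made up for illustration. It shows
the two candidate unique keys, with epoch stored on the record but left
out of both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    # Hypothetical record; field names are illustrative, not pulp's schema.
    name: str
    version: str
    release: str
    arch: str
    epoch: str      # tracked on the record, but never part of a unique key
    vendor: str
    checksum: str


def key_nvra_vendor(pkg):
    """Option 1: a given NVRA may exist once per vendor."""
    return (pkg.name, pkg.version, pkg.release, pkg.arch, pkg.vendor)


def key_checksum(pkg):
    """Option 2: the checksum alone identifies the package."""
    return pkg.checksum


def index_packages(packages, key):
    """Build a key -> package index, rejecting any duplicate key."""
    index = {}
    for pkg in packages:
        k = key(pkg)
        if k in index:
            raise ValueError("duplicate package for key %r" % (k,))
        index[k] = pkg
    return index
```

With key_nvra_vendor, two packages from the same vendor that differ only
in epoch collide, which is the behaviour described above for brew, koji,
and RHN. With key_checksum they would both be accepted as long as their
checksums differ.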

> 
> What do you guys use for the signature key subdirectory if a package is
> not signed?
Use the word "none".
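
As a rough sketch of how such a path could be assembled, assuming the
origin/<name>/<version>/<release>/<signature key>/ layout quoted above:
the function name, its arguments, and the RPM filename format are
assumptions for illustration, not the CDN's actual code.

```python
import posixpath


def origin_path(root, name, version, release, arch, signature_key=None):
    """Return <root>/origin/<name>/<version>/<release>/<signature key>/<file>.rpm.

    Unsigned packages get the literal directory name "none".
    """
    key_dir = signature_key if signature_key else "none"
    # Assumed filename convention: name-version-release.arch.rpm
    filename = "%s-%s-%s.%s.rpm" % (name, version, release, arch)
    return posixpath.join(root, "origin", name, version, release,
                          key_dir, filename)
```

For example, origin_path("/srv", "bash", "4.1.2", "3.el6", "x86_64")
gives /srv/origin/bash/4.1.2/3.el6/none/bash-4.1.2-3.el6.x86_64.rpm.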

> 
> ~ Prad




