Noarch subpackage problem

Toshio Kuratomi a.badger at gmail.com
Thu Feb 26 06:45:22 UTC 2009


Toshio Kuratomi wrote:
> Florian Festi wrote:
>>> But building things as arch specific subpackages when they could be
>>> noarch is a feature that costs us what exactly?  A bit of space?
>> I think you underestimate the amount of noarch data in the distribution.
>> From approximately 40 GB of content (in files) about 29 GB are not arch
>> dependent while there are only 10.7GB of binaries and libraries. Even if
>> you assume that the noarch content compresses twice as well as the
>> binaries the larger part of the distribution is still packaged noarch
>> content. Right now about half of the noarch content already is in
>> regular noarch packages.
>>
>> So even if we only save about 10GB of mirror space for one release for
>> now, it adds up over time across updates and new releases. I even expect
>> that the percentage of noarch content will increase in the future, and
>> every newly supported architecture (if there are going to be any) will
>> automatically gain from the work done.
>>
> 
> Ah... but I'm not talking about turning noarch subpackages off. I'm
> talking about increasing the level of checking that happens before a
> noarch subpackage is allowed.  So how much of the 29GB of content that
> you list is %doc?  How much is scripting languages that we could decide
> to select on via path and filename extension?  How much of it is headers
> that do not have timestamps or build hosts embedded into them?  We have
> the capability to make noarch subpackages of all of those if we turn on
> an rpmdiff that does md5sum checking but can exclude those properties.
> 
> I think we'll see substantial savings from allowing through things that
> meet a heuristic while still placing the burden of checking this onto an
> automated tool instead of a human.
> 
And so that you can run a test...

I took the rpmdiff that is currently in the koji repo and modified it to
have a --lenient-hash option.  --lenient-hash currently compares a hash
of files that are:

* not %doc (won't matter for program execution)
* not *.pyc or *.pyo (a change that makes the builds of a noarch
  sub-package incompatible should show up as a difference in the
  corresponding *.py file)
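
To make the filtering concrete, here is a rough sketch of the idea.  It is
not the code from the attached rpmdiff; FileEntry, lenient_entries and
lenient_hash are made-up names, the file list is a plain Python list
instead of an rpm header, and sha256 is just an assumption for the
combining hash:

```python
import hashlib
from collections import namedtuple

# Hypothetical stand-in for one entry of a package's file list.  A real
# tool would read the path, per-file digest and %doc flag from the rpm
# header; here they are plain values so the sketch is self-contained.
FileEntry = namedtuple("FileEntry", ["path", "digest", "is_doc"])

# Byte-compiled Python files are skipped; their *.py source is still hashed.
SKIPPED_SUFFIXES = (".pyc", ".pyo")


def lenient_entries(entries):
    """Return only the entries whose contents should be compared."""
    return [
        e for e in entries
        if not e.is_doc and not e.path.endswith(SKIPPED_SUFFIXES)
    ]


def lenient_hash(entries):
    """Combine the (path, digest) pairs of the surviving entries into one hash."""
    combined = hashlib.sha256()  # assumption: any stable hash would do here
    for e in sorted(lenient_entries(entries), key=lambda e: e.path):
        combined.update(e.path.encode("utf-8"))
        combined.update(b"\0")
        combined.update(e.digest.encode("ascii"))
        combined.update(b"\n")
    return combined.hexdigest()
```

With this, two builds whose file lists differ only in %doc files or in
*.pyc/*.pyo digests get the same lenient_hash(), while a differing *.py
digest changes it.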

So to test the effect, compare the output of:

rpmdiff -iT -iS -i5 [noarch package built on x86_64] [noarch package built on i386]

rpmdiff -iT -iS --lenient-hash [noarch package built on x86_64] [noarch package built on i386]

If we can identify other classes of files that create false positives
and can be filtered safely, we can add a heuristic for them to get this
down even more.

-Toshio
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: rpmdiff.mine
URL: <http://listman.redhat.com/archives/fedora-devel-list/attachments/20090225/733cab2d/attachment.ksh>