Change to bzip2?

Jeff Johnson n3npq at nc.rr.com
Wed Feb 2 21:00:32 UTC 2005


Steve G wrote:

>>Of course, when you're talking about man and info pages, what
>>are the chances that you actually save significant space when
>>you take the filesystem block size into account?
>>    
>>
>
>To me, it's not just about disk space. It's also about bandwidth. I don't know
>how block size affects the following data, but here it is:
>
>Uncompressed man pages:
>du -sh /usr/share/man/man?
>26M     /usr/share/man/man1
>2.9M    /usr/share/man/man2
>52M     /usr/share/man/man3
>2.5M    /usr/share/man/man4
>3.7M    /usr/share/man/man5
>16K     /usr/share/man/man6
>1.7M    /usr/share/man/man7
>5.7M    /usr/share/man/man8
>8.0K    /usr/share/man/man9
>1.3M    /usr/share/man/mann
>
>Using gzip:
>du -sh /usr/share/man/man?
>17M     /usr/share/man/man1
>2.5M    /usr/share/man/man2
>40M     /usr/share/man/man3
>640K    /usr/share/man/man4
>2.2M    /usr/share/man/man5
>16K     /usr/share/man/man6
>1016K   /usr/share/man/man7
>4.2M    /usr/share/man/man8
>8.0K    /usr/share/man/man9
>684K    /usr/share/man/mann
>Total 82M
>
>Using bzip2:
>du -sh /usr/share/man/man? 
>16M     /usr/share/man/man1
>2.5M    /usr/share/man/man2
>40M     /usr/share/man/man3
>588K    /usr/share/man/man4
>2.1M    /usr/share/man/man5
>16K     /usr/share/man/man6
>976K    /usr/share/man/man7
>4.2M    /usr/share/man/man8
>8.0K    /usr/share/man/man9
>680K    /usr/share/man/mann
>Total 81M
>
>One thing that skews the results is that some files were not compressed with
>bzip2 because they were symlinked.
>  
>

Calculating sizes may seem useful, but you're honking the wrong horn imho,
if for no other reason than that, with both payload *AND* man page
compression settable, it really makes no difference counting man pages
and summing sizes.
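
For what it's worth, the block size question quoted above is easy to
check. A rough sketch, assuming 4 KiB filesystem blocks and no tail
packing (both assumptions; the page sizes are made up for illustration):

```python
import math

def on_disk_size(apparent_bytes, block_size=4096):
    """Bytes allocated on disk when files occupy whole blocks (assumed model)."""
    return math.ceil(apparent_bytes / block_size) * block_size

# Hypothetical sizes of one man page: uncompressed, gzip'd, bzip2'd.
for label, size in [("plain", 9000), ("gzip", 3500), ("bzip2", 3100)]:
    print(label, size, on_disk_size(size))
# plain 9000 12288
# gzip 3500 4096
# bzip2 3100 4096
```

Under that model a small page costs one block whether gzip or bzip2
compressed it, which is why du shows so little difference.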

What is really needed is to change package transport, not diddle with
package guts: use an rsync-like transport rather than raw http.

For starters, all the mirroring of distros is rather simple-minded atm.

So a new package is added.

rsync is fired up, and the remote site does not have that path.

What does rsync do? Copies the entire file.

Each additional rsync invocation verifies that, indeed, the client and
server have identical content. Well, duh.
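
To make the cost concrete, here is a toy sketch of block matching in the
spirit of rsync. It is not rsync's code: real rsync pairs a rolling weak
checksum with a strong one, while this version just hashes every window
with md5 for brevity. The function name and block size are made up.

```python
import hashlib

def delta_literal_bytes(basis: bytes, new: bytes, block: int = 64) -> int:
    """Count bytes of `new` that must be sent literally, given `basis` on the receiver."""
    # Receiver side: hash every full fixed-size block of the basis file.
    known = {hashlib.md5(basis[i:i + block]).digest()
             for i in range(0, len(basis) - block + 1, block)}
    literal = 0
    pos = 0
    while pos < len(new):
        window = new[pos:pos + block]
        if len(window) == block and hashlib.md5(window).digest() in known:
            pos += block   # match: only a block reference goes over the wire
        else:
            literal += 1   # no match at this offset: send this byte as-is
            pos += 1
    return literal

old = bytes(range(256)) * 8                   # stand-in for the previous package
new = old[:1000] + b"PATCH" + old[1000:]      # stand-in for the updated package

print(delta_literal_bytes(b"", new))   # no basis on the remote: everything is literal
print(delta_literal_bytes(old, new))   # with a basis: only bytes near the change
```

With no basis file the whole thing is copied; with a similar file to
diff against, only the neighbourhood of the change is sent.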

There is a fuzzy patch to rsync that matches on path, looking at a suffix
like .rpm first, then choosing the closest similar path on the remote as
reference.

That patch needs to be wired into the rsync package, with whatever sanity
hardening is necessary to restrict the functionality to *only* rpm
packages, as the fuzzy patch is perhaps too risky as is.
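
The basis selection could look roughly like the sketch below. This is
not the fuzzy patch itself, only an illustration of the idea: the
function name, the 0.6 similarity cutoff, and the file names are all
hypothetical, and difflib stands in for whatever similarity measure the
patch uses.

```python
import difflib

def pick_basis(new_path, existing_paths, suffix=".rpm"):
    """Return the existing path most similar to new_path, or None."""
    if not new_path.endswith(suffix):
        return None  # hardening: never guess a basis for anything but rpm packages
    candidates = [p for p in existing_paths
                  if p.endswith(suffix) and p != new_path]
    best = difflib.get_close_matches(new_path, candidates, n=1, cutoff=0.6)
    return best[0] if best else None

# Hypothetical contents of the remote directory.
existing = ["RPMS/rsync-2.6.3-1.i386.rpm",
            "RPMS/rpm-4.3.3-8.i386.rpm",
            "RPMS/README"]

print(pick_basis("RPMS/rsync-2.6.4-1.i386.rpm", existing))
print(pick_basis("RPMS/notes.txt", existing))
```

The first call picks the older rsync package as the reference; the
second refuses to guess because the path is not an rpm.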

Then -- since rsync is known to be sub-optimal with compressed payloads --
Rusty Russell's gzip.rsync.patch2 needs to be added to rpm. That patch
was in rpm-4.0.4, but alas, got blown out of rpm sources by the zlib
double free errata fire drill several years ago.

The patch is now (again) in rpm-4.4.1 and later.
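
The idea behind an rsync-friendly gzip, as I understand it, is to reset
the compressor at boundaries chosen from the content itself, so a small
change in the input only disturbs the compressed output locally. A
minimal sketch of that idea, not the patch: the function name, the
4096 byte window, and the modulus are assumptions, and a zlib full flush
stands in for whatever reset the patch performs.

```python
import zlib

def rsync_friendly_deflate(data: bytes, window: int = 4096, modulus: int = 4096) -> bytes:
    """Deflate `data`, doing a full flush wherever a rolling sum hits a boundary."""
    comp = zlib.compressobj()
    out = []
    rolling = 0   # sum of the last `window` input bytes
    start = 0     # first input byte not yet handed to the compressor
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]
        if i + 1 >= window and rolling % modulus == 0:
            # Content-defined boundary: emit what we have and reset the
            # compressor state, so later output does not depend on earlier input.
            out.append(comp.compress(data[start:i + 1]))
            out.append(comp.flush(zlib.Z_FULL_FLUSH))
            start = i + 1
    out.append(comp.compress(data[start:]))
    out.append(comp.flush())
    return b"".join(out)

sample = b"".join(str(i).encode() for i in range(20000))
packed = rsync_friendly_deflate(sample)
print(len(sample), len(packed), zlib.decompress(packed) == sample)
```

The output is still an ordinary zlib stream, slightly larger than a
plain compress because of the extra flushes.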

There are quite promising hints of bandwidth savings (for apt, dunno rpm yet):

https://svn.uhulinux.hu/packages/dev/zlib/patches/02-rsync.patch

Explicit objective metrics of bandwidth savings for mirrors, when both
package payload end-points include Rusty Russell's voodoo, will
only help stimulate development of better client transport protocols.

Or keep honking about man pages compressed with either bzip2 or gzip if
that floats your boat.

73 de Jeff





