Change to bzip2?

Jeff Johnson n3npq at nc.rr.com
Wed Feb 2 16:20:10 UTC 2005


Jindrich Novy wrote:

>On Wed, 2005-02-02 at 15:45 +0100, Florian La Roche wrote:
>  
>
>>On Wed, Feb 02, 2005 at 06:03:24AM -0800, Steve G wrote:
>>    
>>
>>>Hi,
>>>
>>>With the discussion about trimming specfile changelogs to save space and improve
>>>downloads...why not go one step further? Mandrake has been using bzip2 for a
>>>while and it works just as well and files are significantly smaller. The
>>>conversion could be done in several steps:
>>>
>>>1) man pages - less already handles bzipped man pages
>>>2) info pages - I submitted a patch in bz #128637 to try to get it working
>>>3) tar
>>>4) rpms - I'm sure the patch is in Mandrake's version
>>>
>>>Thoughts?
>>>      
>>>
>>bzip2 is only used for the cpio-packed file data; the rpm header is
>>not compressed. For the repo data the changelog can also be trimmed.
>>That only becomes a problem if you need to copy the rpm header
>>unmodified (e.g. if you later want to verify that the md5sum is the
>>same as in the full rpms you download, or similar things).
>>
>>I think staying with gzip is ok as it really is a good middle ground
>>between speed and disk compression ratio. bzip2 "feels" noticeably slower.
>>    
>>
>
>In my opinion a conversion to bzip2 is the right thing to do. I'm also
>trying to keep almost everything compressed with bzip2 because of its
>significantly better compression scheme and performance. I'll illustrate
>this on the mc tarball:
>
>-rw-rw-r--  1 jnovy jnovy 2831562 Jan 28 09:52 mc-4.6.1-pre3.tar.bz2
>-rw-rw-r--  1 jnovy jnovy 3956127 Feb  2 15:26 mc-4.6.1-pre3.tar.gz
>
>where we can see that the gzipped tarball is more than 1/3 larger than
>the bzipped one. Decompression times are:
>
>gunzip decompression:
>real    0m0.257s
>user    0m0.198s
>sys     0m0.059s
>
>bunzip2 decompression:
>real    0m1.665s
>user    0m1.567s
>sys     0m0.098s
>
>so one could conclude that bunzip2 is about 6-7 times slower than
>gunzip. Unfortunately this is a common myth among developers: bzip2
>uses its best compression by default (-9, i.e. 900k blocks for the BWT),
>while gzip defaults to a compromise setting (-6), and the levels mean
>something different for gzip since it is LZ77-based.
>
>bzip2 is flexible enough to trade compression ratio for speed. With the
>fastest (and worst) -1 compression, bzip2 gives:
>
>-rw-rw-r--  1 jnovy jnovy 3592894 Feb  2 16:08 mc-4.6.1-pre3.tar.bz2
>
>which is still better than the best compression (-9) with gzip, and the
>decompression time is:
>
>real    0m1.076s
>user    0m1.003s
>sys     0m0.073s
>
>so about 4 times slower than gzip.
>
>The question is what the priority is at the moment: the space consumed
>by the file or the decompression time.
>
>There are also some projects such as pbzip2
>(http://compression.ca/pbzip2/) that exploit the fact that bzip2
>compresses large files in separate blocks, so the BWT and Huffman
>encoding phases can be performed on these blocks simultaneously in
>multiple threads, which speeds up compression/decompression
>significantly on SMP machines.
>
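
The block-parallel idea described above can be sketched roughly as
follows. This is only an illustration of the principle, not pbzip2's
actual implementation: the function name parallel_bzip2 and its
parameters are made up for the example, and it relies on the fact that
concatenated bzip2 streams decompress back to the concatenated input.
Whether the threads give a real speedup depends on the interpreter
releasing its lock during compression, which is assumed here.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def parallel_bzip2(data, chunk_size=900_000, workers=4, level=9):
    """Compress fixed-size chunks independently and concatenate the streams.

    Hypothetical helper: each chunk becomes its own complete bzip2 stream.
    """
    # Split the input; keep a single (possibly empty) chunk for empty input
    # so the result is always at least one valid bzip2 stream.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [data]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so streams line up with the chunks.
        streams = list(pool.map(lambda chunk: bz2.compress(chunk, level), chunks))
    return b"".join(streams)


if __name__ == "__main__":
    # Hypothetical stand-in input; substitute the bytes of a real tarball.
    sample = b"some repetitive source text\n" * 200_000
    packed = parallel_bzip2(sample)
    # bz2.decompress() accepts several concatenated streams.
    assert bz2.decompress(packed) == sample
    print(len(sample), "->", len(packed), "bytes")
```

Splitting into independent streams costs a little compression ratio
compared with one stream, since every chunk carries its own header.
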
>Further, consider the scalability of bzip2, which has a compression
>range of:
>
>best (-9): 2831562, worst (-1): 3592894
>and gzip:
>best (-9): 3931362, worst (-1): 4634277
>
>I think bzip2 is the winner, at least looking to the future.
>  
>

Nicely done.
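
For anyone who wants to repeat the comparison quoted above on their own
files, a rough sketch using Python's gzip and bz2 modules might look
like this. The helper names (measure, compare) and the generated sample
input are assumptions for illustration only; the numbers will differ
from the command-line tools and from the mc tarball figures.

```python
import bz2
import gzip
import time


def measure(name, compress, decompress, data, level):
    """Return (name, level, compressed size, decompression seconds)."""
    packed = compress(data, level)
    start = time.perf_counter()
    unpacked = decompress(packed)
    elapsed = time.perf_counter() - start
    assert unpacked == data  # the round trip must be lossless
    return name, level, len(packed), elapsed


def compare(data, levels=(1, 6, 9)):
    """Measure gzip and bzip2 at each level on the same input."""
    rows = []
    for level in levels:
        rows.append(measure("gzip", gzip.compress, gzip.decompress, data, level))
        rows.append(measure("bzip2", bz2.compress, bz2.decompress, data, level))
    return rows


if __name__ == "__main__":
    # Hypothetical stand-in input; substitute the bytes of a real tarball.
    sample = b"".join(b"line %d of some source text\n" % i for i in range(50_000))
    for name, level, size, seconds in compare(sample):
        print(f"{name:5s} -{level}: {size:9d} bytes, {seconds:.4f}s to decompress")
```
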

Now try to get the -9 changed in rpm.

And it also makes little sense to bzip tarballs that end up in gzipped 
payloads imho.
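
A quick way to see the effect is to gzip data that has already been
through bzip2 and compare the sizes. The sketch below is only an
illustration with a made-up helper name (nested_sizes) and in-memory
data; it does not model the actual rpm payload format.

```python
import bz2
import gzip


def nested_sizes(data):
    """Sizes of the raw data, its bzip2 form, gzip over that, and plain gzip."""
    inner = bz2.compress(data, 9)    # stands in for a .tar.bz2
    outer = gzip.compress(inner, 9)  # stands in for gzipping it again
    direct = gzip.compress(data, 9)  # gzip applied to the raw data
    return {
        "raw": len(data),
        "bz2": len(inner),
        "gzip(bz2)": len(outer),
        "gzip": len(direct),
    }


if __name__ == "__main__":
    # Hypothetical stand-in input; substitute the bytes of a real tarball.
    sample = b"".join(b"entry %d in some source file\n" % i for i in range(50_000))
    for label, size in nested_sizes(sample).items():
        print(f"{label:10s} {size:9d} bytes")
```

The second pass typically gains next to nothing, because the bzip2
output has little redundancy left for gzip to find.
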

73 de Jeff




