[Vdo-devel] VDO is not compressing some (easy compressible) files

Michael Sclafani sclafani at redhat.com
Mon May 13 11:01:17 UTC 2019


VDO provides block-level deduplication and compression using a 4K block
size. Compression works as follows: each block that doesn't deduplicate
is compressed on its own. The compressed blocks are packed together with a
small header into a 4K block. Those packed blocks are written and shared in
almost exactly the same way as VDO shares normal 4K data blocks for
deduplication.

The success of this approach relies on variation in the compressed size of
blocks, or on blocks being very compressible. Unfortunately, in your
dataset, the compressed size of the blocks is always between 2K and 3K, so
no two of them fit together in one 4K block, and VDO cannot pack them to
save space.
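
To make the packing argument concrete, here is a rough sketch in Python. It
is not VDO's actual packer: the header size (ASSUMED_HEADER) and the
first-fit strategy are assumptions chosen only for illustration, and the
fragment sizes are made up. It just shows why fragments that all fall
between 2K and 3K never share a 4K block, while a mix of sizes does.

```python
BLOCK_SIZE = 4096
ASSUMED_HEADER = 64  # hypothetical header size, NOT VDO's real value


def physical_blocks_needed(compressed_sizes):
    """First-fit packing of compressed fragments into 4K blocks.

    A fragment that ends up alone in a packed block saves nothing, so it
    costs one physical block, the same as storing it uncompressed.
    """
    capacity = BLOCK_SIZE - ASSUMED_HEADER
    free_space = []  # remaining free bytes in each block used so far
    for size in compressed_sizes:
        if size > capacity:
            free_space.append(0)  # too big to pack: plain 4K block
            continue
        for i, free in enumerate(free_space):
            if size <= free:
                free_space[i] = free - size  # share an existing block
                break
        else:
            free_space.append(capacity - size)  # open a new block
    return len(free_space)


# Made-up sizes: every fragment compresses to somewhere in 2K..3K.
csv_like = [2048 + (i * 37) % 1024 for i in range(1000)]
# Made-up sizes: small and large fragments alternate.
varied = [1000, 3000] * 500

print("2K-3K fragments:", physical_blocks_needed(csv_like), "blocks for 1000")
print("mixed fragments:", physical_blocks_needed(varied), "blocks for 1000")
```

In the first case every fragment needs its own block, so nothing is saved;
in the second, pairs of fragments share a block and the space used halves.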

lz4 block compression is not as efficient as the streaming compression you
obtained with the lz4 tool, which is why the blocks only shrank to 50% of
their original size or more, not the 40% you saw in your test.
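
The difference between compressing 4K blocks independently and compressing
the whole file as a stream can be seen with a small Python experiment.
Note the assumptions: Python's standard library has no lz4 module, so zlib
stands in for it here, and the CSV-like data is synthetic. The actual
ratios will differ from lz4 and from your file; only the direction of the
effect is the point.

```python
import zlib

BLOCK_SIZE = 4096


def stream_size(data):
    """Compressed size when the whole input is compressed as one stream."""
    return len(zlib.compress(data, 1))


def per_block_size(data):
    """Total compressed size when each 4K block is compressed on its own,
    so no block can refer back to data seen in earlier blocks."""
    return sum(
        len(zlib.compress(data[i:i + BLOCK_SIZE], 1))
        for i in range(0, len(data), BLOCK_SIZE)
    )


# Synthetic CSV-like rows (made up for this sketch, not the test file).
rows = [
    f"{i},{(i * 7919) % 100003},sensor-{i % 50},{(i * 31) % 997 / 10:.1f}\n"
    for i in range(50000)
]
data = "".join(rows).encode()

print("original bytes: ", len(data))
print("stream bytes:   ", stream_size(data))
print("per-block bytes:", per_block_size(data))
```

The per-block total comes out larger than the stream total, because each
block starts compression from scratch.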


On Sun, May 12, 2019 at 5:13 PM civic9 <civic9 at gmail.com> wrote:

> Hi, I created issue on github for this
> (https://github.com/dm-vdo/kvdo/issues/20) but this is probably better
> place.
>
> I have a strange problem. VDO don't want to compress some of my CSV files
> (which are easy compressible, I think). Command line lz4 tool doesn't
> have any problem with them.
>
> # du /1/test2.csv
> 102400  /1/test2.csv
>
> # lz4 -1 /1/test2.csv /1/test2.csv.lz4
> Compressed 104857600 bytes into 42259380 bytes ==> 40.30%
>
> # vdo status -n vdobackup|grep -i Compression
>     Compression: enabled
>
> # vdostats --verbose vdobackup|egrep 'used|compress'
>   data blocks used                    : 13056414
>   overhead blocks used                : 1292638
>   logical blocks used                 : 17262451
>   1K-blocks used                      : 57396208
>   used percent                        : 27
>   compressed fragments written        : 773855
>   compressed blocks written           : 305048
>   compressed fragments in packer      : 0
>   KVDO module bytes used              : 1016029328
>   KVDO module peak bytes used         : 1016031648
>   KVDO module bios used               : 74572
>
> # cp /1/test2.csv /mnt/backup/1/; sync
>
> # vdostats --verbose vdobackup|egrep 'used|compress'
>   data blocks used                    : 13082012
>   overhead blocks used                : 1292670
>   logical blocks used                 : 17288052
>   1K-blocks used                      : 57498728
>   used percent                        : 27
>   compressed fragments written        : 773855
>   compressed blocks written           : 305048
>   compressed fragments in packer      : 0
>   KVDO module bytes used              : 1016029328
>   KVDO module peak bytes used         : 1016031648
>   KVDO module bios used               : 74572
>
> As you can see:
> "data blocks used" increased by: 25598
> "compressed fragments/blocks" are the same.
>
> No other activity at the same time. It is not the only one file with
> this problem.
> I have a few GB of files with similar content and none of them is
> compressed on the vdo volume.
> Compression on other type of files works fine.
>
> My VDO is created on top of standard disk partition. No LVM, no encryption.
> kvdo from rhel/centos:
>
> # uname -a
> Linux .... 3.10.0-957.10.1.el7.x86_64 #1 SMP Mon Mar 18 15:06:45 UTC
> 2019 x86_64 x86_64 x86_64 GNU/Linux
> # rpm -qa|grep -i vdo
> vdo-6.1.1.125-3.el7.x86_64
> kmod-kvdo-6.1.1.125-5.el7.x86_64
>
> Test file: https://github.com/dm-vdo/kvdo/files/3170189/test2.csv.gz
>
> Any ideas?