[vdo-devel] [RFA] strange 'data blocks used' behavior in vdostats
Louis Imershein
limershe at redhat.com
Tue Oct 20 17:08:25 UTC 2020
One thing I've seen in the past that can cause strange behavior is that
4K-aligned runs of zeros (zero blocks) get eliminated even with
deduplication disabled.
Could your larger files contain blocks like this? Just a thought.
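If it helps, counting the 4K-aligned all-zero blocks in the source file is a
quick way to check; a minimal sketch (the file path is just a placeholder):

```python
# Count 4 KiB-aligned all-zero blocks in a file -- the blocks VDO can
# elide even with deduplication disabled.
def count_zero_blocks(path, block_size=4096):
    total = zeros = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            total += 1
            if not any(block):  # every byte in this block is 0x00
                zeros += 1
    return zeros, total

if __name__ == "__main__":
    import sys
    z, t = count_zero_blocks(sys.argv[1])
    print(f"{z} of {t} blocks are all zeros")
```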
-louis
On Tue, Oct 20, 2020 at 9:24 AM Sweet Tea Dorminy <sweettea at redhat.com>
wrote:
> Hi Philipp:
>
> Quick question: You mentioned writing the same file to VDO over and
> over; are you using a filesystem atop VDO, or are you using dd or
> equivalent to write the file to the VDO device at different offsets?
>
> Thanks!
>
> On Tue, Oct 20, 2020 at 12:04 PM Philipp Rudo <prudo at linux.ibm.com> wrote:
> >
> > Hi everybody,
> >
> > I'm a kernel developer for s390. Together with Leon, I'm currently
> > trying to evaluate the costs and gains of using deflate instead of
> > lz4 in VDO. The idea is to make use of the in-hardware deflate
> > implementation introduced with our latest machine generation. To that
> > end we are currently running some tests on the compression ratio,
> > and they show a rather peculiar behavior we don't understand, so we
> > are reaching out to you in the hope that you can help us find out
> > what's going on.
> >
> > In our test (details below) we simply copy the same file (~5 MB) to a
> > VDO device until we reach the target logical size (deduplication
> > disabled). Then we wait until the packer is finished and get the
> > vdostats. Since block size << file size << target size, we expected a
> > constant 'saving percent'. What we see instead is that the 'saving
> > percent' starts high, has a minimum at ~10 GB, and then grows again
> > (seemingly logarithmically). While the behavior for small sizes can be
> > explained by a constant overhead, the logarithmic growth for large
> > sizes looks odd to us.
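> > The fill loop is roughly the following sketch (not our exact script;
> > the device path and file name are placeholders, and each copy is
> > padded to the next 4K boundary):

```python
# Sketch of the fill loop: write SRC repeatedly to DEV at increasing
# 4 KiB-aligned offsets until TARGET_BYTES of logical space are covered.
# The device path and file name below are placeholders.
import os

BLOCK = 4096

def fill(dev, src, target_bytes, block=BLOCK):
    with open(src, "rb") as f:
        data = f.read()
    # Round each copy up to the next block boundary so every write
    # starts 4 KiB-aligned.
    step = (len(data) + block - 1) // block * block
    with open(dev, "r+b") as out:
        offset = 0
        while offset < target_bytes:
            out.seek(offset)
            out.write(data)
            offset += step
    return offset  # logical bytes covered

if __name__ == "__main__":
    fill("/dev/mapper/vdo0", "bible.txt", 32000 * 1024 * 1024)
    # then wait for the packer to drain and run:
    #   vdostats --verbose /dev/mapper/vdo0
```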
> >
> > Looking at the raw data, we noticed that the 'logical blocks used'
> > grow linearly with the target size (as expected), while the 'data
> > blocks used' behave rather erratically. What especially surprises us
> > is that the 'data blocks used' reach a peak at ~20 GB and then go
> > down again. So although more of the same data is compressed with the
> > same algorithm, less disk space is used?
> >
> > We can reproduce this behavior with our prototype (both algorithms),
> > an official RHEL 8.3 build, and different files.
> >
> > Do you have an idea what causes this behavior? Are we missing something
> > fundamental?
> >
> > Thanks and sorry for the long mail
> > Philipp
> >
> > ----
> > Test details:
> >
> > OS: RHEL 8.3 Snapshot 3
> > kernel: 4.18.0-235.el8.s390x
> > vdo: 6.2.3.114-14.el8
> > file: bible.txt from http://corpus.canterbury.ac.nz/resources/large.tar.gz
> >
> >  size (MB)  logical blocks used  data blocks used  saving percent
> >        100               296241             22395           92.4%
> >       1000               526688            231178           56.1%
> >       2000               782848            462765           40.8%
> >       4000              1295170            931232           28.1%
> >       6000              1807495           1405123           22.2%
> >       8000              2318824           1865933           19.5%
> >      10000              2831146           2268763           19.8%
> >      12000              3343470           2503638           25.1%
> >      14000              3854801           2534747           34.2%
> >      16000              4367119           2607669           40.2%
> >      18000              4879445           2821335           42.1%
> >      20000              5390780           2824725           47.6%
> >      22000              5903107           2909582           50.7%
> >      24000              6415433           2727539           57.4%
> >      26000              6927756           2681278           61.2%
> >      28000              7439083           2647278           64.4%
> >      30000              7951405           2433786           69.3%
> >      32000              8463724           2428650           71.3%
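> > As a cross-check, the 'saving percent' column is consistent with
> > 1 - 'data blocks used' / 'logical blocks used' (assuming that is how
> > vdostats computes it):

```python
# Verify a few rows of the table against
#   saving = 1 - data_blocks_used / logical_blocks_used
rows = [
    (100,   296241,  22395,   92.4),
    (10000, 2831146, 2268763, 19.8),
    (32000, 8463724, 2428650, 71.3),
]
for size_mb, logical, data, pct in rows:
    saving = (1 - data / logical) * 100
    assert abs(saving - pct) < 0.1, (size_mb, saving)
    print(f"{size_mb:>6} MB: {saving:.1f}%")
```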
> >
> > _______________________________________________
> > vdo-devel mailing list
> > vdo-devel at redhat.com
> > https://www.redhat.com/mailman/listinfo/vdo-devel
> >
>