[vdo-devel] [RFA] strange 'data blocks used' behavior in vdostats
Philipp Rudo
prudo at linux.ibm.com
Fri Oct 30 16:10:34 UTC 2020
Hi,
we made an interesting observation. The behavior only occurs when we start
too many dd processes at once. When we limit the number of processes running
in parallel, the behavior disappears and the saving percent drops to a
reasonable level.

I double-checked the data written to the device for the failing case, but
everything looks fine. So all in all it looks like a race in the statistics
to me.
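
For reference, the parallel variant looks roughly like this (a sketch, not
the actual script; the device path and the counts are placeholders):

    file=bible.txt
    dev=/dev/mapper/vdo0                                 # placeholder device
    nblk=$(( ( $(stat -c %s "$file") + 4095 ) / 4096 ))  # file size in 4K blocks
    for i in $(seq 0 799); do
        # each copy lands at its own 4K-aligned offset, so nothing overlaps
        dd if="$file" of="$dev" bs=4096 seek=$(( i * nblk )) status=none &
        (( (i + 1) % 16 == 0 )) && wait   # throttle to 16 dds in flight;
                                          # dropping this cap triggers the issue
    done
    wait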
Thanks and have a nice weekend
Philipp
On Thu, 29 Oct 2020 13:59:21 +0100
Philipp Rudo <prudo at linux.ibm.com> wrote:
> Hi Sweet,
> Hi Louis,
>
> On Wed, 21 Oct 2020 11:57:07 +0200
> Philipp Rudo <prudo at linux.ibm.com> wrote:
>
> > Hi Sweet,
> > Hi Louis,
> >
> > thanks for the quick reply.
> >
> > On Tue, 20 Oct 2020 10:08:25 -0700
> > Louis Imershein <limershe at redhat.com> wrote:
> >
> > > One thing I've seen in the past which can result in strange behavior is
> > > that 4K-aligned runs of zeros (zero blocks) get eliminated even with
> > > deduplication disabled. Could blocks like this exist for your larger
> > > files? Just a thought.
> >
> > we are using the large and misc corpora from the Canterbury corpus [1]. The
> > files in them are simple text files with different content, so no zero
> > blocks are included.
> >
> > > On Tue, Oct 20, 2020 at 9:24 AM Sweet Tea Dorminy <sweettea at redhat.com>
> > > wrote:
> > >
> > > > Hi Philipp:
> > > >
> > > > Quick question: You mentioned writing the same file to VDO over and
> > > > over; are you using a filesystem atop VDO, or are you using dd or
> > > > equivalent to write the file to the VDO device at different offsets?
> >
> > in our current test we are using an ext4 filesystem and simply cp the files
> > to it. I wanted to keep it simple and expected that any filesystem-related
> > effect would average out, as we write so many copies to the device.
> >
> > But you are right. We should run a test without a filesystem and dd the data
> > directly to the device. Let's see what that shows.
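> >
> > Something like the following, writing the copies back to back at 4K-aligned
> > offsets (a sketch; device path and target size are placeholders):
> >
> >     dev=/dev/mapper/vdo0
> >     nblk=$(( ( $(stat -c %s bible.txt) + 4095 ) / 4096 ))
> >     i=0
> >     while (( i * nblk * 4096 < target_bytes )); do  # target_bytes: placeholder
> >         dd if=bible.txt of="$dev" bs=4096 seek=$(( i * nblk )) status=none
> >         i=$(( i + 1 ))
> >     done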
>
> we re-ran our tests without a filesystem (details below). The numbers are a
> little different but the overall behavior looks similar. What's missing are
> the high savings for small target sizes, which supports our theory that those
> came from the fs overhead. What we still see is that the 'data blocks used'
> decline although the 'logical blocks used' go up.
>
> The bible.txt below is the most extreme example. In that case the physical
> blocks go down by ~1.8 GB although we write ~25 GB of additional logical
> data. The other files we tested (the rest of the large corpus and pi.txt from
> the misc corpus) are less extreme. But none of them shows any correlation
> between logical and data blocks, and all show a decline at some point.
>
> So long story short, it's not the file system. Do you have any other ideas?
>
> Thanks
> Philipp
>
> ----
> Test details:
>
> OS: RHEL 8.3 Snapshot 3
> kernel: 4.18.0-235.el8.s390x
> vdo: 6.2.3.114-14.el8
> fs: none
> file: bible.txt from http://corpus.canterbury.ac.nz/resources/large.tar.gz
>
> size (in MB)   logical blocks used   data blocks used   saving percent
>          100                 24725              22550             8.8%
>         1000                255162             226455            11.3%
>         5000               1279766            1149789            10.2%
>        10000               2559532            1741664            32.0%
>        15000               3839298            2080044            45.8%
>        20000               5119064            1878069            63.3%
>        25000               6399819            1682376            73.7%
>        30000               7679585            1807039            76.5%
>        35000               8959351            1733469            80.7%
>        40000              10239117            1619199            84.2%
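>
> (For reference, the saving percent above works out to
> 1 - 'data blocks used' / 'logical blocks used'; e.g. for the 40000 MB row:
> 1 - 1619199/10239117 ≈ 84.2%.)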
>
> >
> > Thanks
> > Philipp
> >
> > [1] https://corpus.canterbury.ac.nz/descriptions/
> >
> > > > On Tue, Oct 20, 2020 at 12:04 PM Philipp Rudo <prudo at linux.ibm.com> wrote:
> > > > >
> > > > > Hi everybody,
> > > > >
> > > > > I'm a kernel developer for s390. Together with Leon, I'm currently
> > > > > trying to evaluate the costs & gains of using deflate instead of lz4
> > > > > in vdo. The idea is to make use of the in-hardware deflate
> > > > > implementation introduced with our latest machine generation. In this
> > > > > effort we are currently running some tests on the compression ratio
> > > > > which show a rather peculiar behavior we don't understand. So we are
> > > > > reaching out to you in the hope that you can help us find out what's
> > > > > going on.
> > > > >
> > > > > In our test (details below) we simply copy the same file (~5 MB) to a
> > > > > vdo device until we reach the target logical size (deduplication
> > > > > disabled). Then we wait until the packer is finished and get the
> > > > > vdostats. As block size << file size << target size, we expected a
> > > > > constant 'saving percent'. But what we see is that the 'saving percent'
> > > > > starts high, has a minimum at ~10 GB and then grows again (seemingly
> > > > > logarithmically). While the behavior for small sizes can be explained
> > > > > by a constant overhead, the logarithmic growth for large sizes looks
> > > > > odd to us.
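> > > > >
> > > > > Roughly, per target size (a sketch; mount point and copy count are
> > > > > placeholders, and the real run also waits for the packer to finish):
> > > > >
> > > > >     mount /dev/mapper/vdo0 /mnt/vdo
> > > > >     for i in $(seq 1 "$ncopies"); do   # ncopies * ~5 MB = target size
> > > > >         cp bible.txt /mnt/vdo/bible-$i.txt
> > > > >     done
> > > > >     sync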
> > > > >
> > > > > Looking at the raw data we noticed that the 'logical blocks used' grow
> > > > > linearly with the target size (as expected) while the 'data blocks
> > > > > used' behave rather erratically. What especially surprises us is that
> > > > > the 'data blocks used' reach a peak at ~20 GB and then go down again.
> > > > > So although more of the same data is compressed with the same
> > > > > algorithm, less disk space is used?
> > > > >
> > > > > We can reproduce this behavior with our prototype (both algorithms),
> > > > > an official RHEL 8.3 build and different files.
> > > > >
> > > > > Do you have an idea what causes this behavior? Are we missing something
> > > > > fundamental?
> > > > >
> > > > > Thanks and sorry for the long mail
> > > > > Philipp
> > > > >
> > > > > ----
> > > > > Test details:
> > > > >
> > > > > OS: RHEL 8.3 Snapshot 3
> > > > > kernel: 4.18.0-235.el8.s390x
> > > > > vdo: 6.2.3.114-14.el8
> > > > > fs: ext4
> > > > > file: bible.txt from http://corpus.canterbury.ac.nz/resources/large.tar.gz
> > > > >
> > > > > size (in MB)   logical blocks used   data blocks used   saving percent
> > > > >          100                296241              22395            92.4%
> > > > >         1000                526688             231178            56.1%
> > > > >         2000                782848             462765            40.8%
> > > > >         4000               1295170             931232            28.1%
> > > > >         6000               1807495            1405123            22.2%
> > > > >         8000               2318824            1865933            19.5%
> > > > >        10000               2831146            2268763            19.8%
> > > > >        12000               3343470            2503638            25.1%
> > > > >        14000               3854801            2534747            34.2%
> > > > >        16000               4367119            2607669            40.2%
> > > > >        18000               4879445            2821335            42.1%
> > > > >        20000               5390780            2824725            47.6%
> > > > >        22000               5903107            2909582            50.7%
> > > > >        24000               6415433            2727539            57.4%
> > > > >        26000               6927756            2681278            61.2%
> > > > >        28000               7439083            2647278            64.4%
> > > > >        30000               7951405            2433786            69.3%
> > > > >        32000               8463724            2428650            71.3%
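> > > > >
> > > > > (The two counters above are read from the verbose stats; 'vdo0' is a
> > > > > placeholder for the device name:)
> > > > >
> > > > >     vdostats --verbose vdo0 | grep -E 'logical blocks used|data blocks used'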
> > > > >