[vdo-devel] [RFA] strange 'data blocks used' behavior in vdostats
Philipp Rudo
prudo at linux.ibm.com
Fri Oct 30 16:10:34 UTC 2020
Hi,
we made an interesting observation. The behavior only occurs when we start
too many dd processes at once. When we limit the number of processes running
in parallel, the behavior disappears and the saving percent drops to a
reasonable level.

I double-checked the data written to the device for the failing case, but
everything looks fine. So all in all it looks like a race in the statistics
to me.
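
For reference, the parallel variant looks roughly like this (a sketch, not
the actual script; the device path and the counts are placeholders):

    file=bible.txt
    dev=/dev/mapper/vdo0                                 # placeholder device
    nblk=$(( ( $(stat -c %s "$file") + 4095 ) / 4096 ))  # file size in 4K blocks
    for i in $(seq 0 799); do
        # each copy lands at its own 4K-aligned offset, so nothing overlaps
        dd if="$file" of="$dev" bs=4096 seek=$(( i * nblk )) status=none &
        (( (i + 1) % 16 == 0 )) && wait   # throttle to 16 dds in flight;
                                          # dropping this cap triggers the issue
    done
    wait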
Thanks and have a nice weekend
Philipp
On Thu, 29 Oct 2020 13:59:21 +0100
Philipp Rudo <prudo at linux.ibm.com> wrote:
> Hi Sweet,
> Hi Louis,
>
> On Wed, 21 Oct 2020 11:57:07 +0200
> Philipp Rudo <prudo at linux.ibm.com> wrote:
>
> > Hi Sweet,
> > Hi Louis,
> >
> > thanks for the quick reply.
> >
> > On Tue, 20 Oct 2020 10:08:25 -0700
> > Louis Imershein <limershe at redhat.com> wrote:
> >
> > > One thing I've seen in the past which can result in strange behavior is
> > > that 4K-aligned runs of zeros (zero blocks) get eliminated even with
> > > deduplication disabled. Could blocks like this exist for your larger
> > > files? Just a thought.
> >
> > we are using the large and misc corpora from the Canterbury corpus [1]. The
> > files in them are simple text files with different content, so no zero
> > blocks are included.
> >
> > > On Tue, Oct 20, 2020 at 9:24 AM Sweet Tea Dorminy <sweettea at redhat.com>
> > > wrote:
> > >
> > > > Hi Philipp:
> > > >
> > > > Quick question: You mentioned writing the same file to VDO over and
> > > > over; are you using a filesystem atop VDO, or are you using dd or
> > > > equivalent to write the file to the VDO device at different offsets?
> >
> > in our current test we are using an ext4 filesystem and simply cp the files
> > to it. I wanted to keep it simple and expected that any filesystem-related
> > effect would average out, as we write so many copies to the device.
> >
> > But you are right. We should run a test without a filesystem and dd the data
> > directly to the device. Let's see what that shows.
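> >
> > Something like the following, writing the copies back to back at 4K-aligned
> > offsets (a sketch; device path and target size are placeholders):
> >
> >     dev=/dev/mapper/vdo0
> >     nblk=$(( ( $(stat -c %s bible.txt) + 4095 ) / 4096 ))
> >     i=0
> >     while (( i * nblk * 4096 < target_bytes )); do  # target_bytes: placeholder
> >         dd if=bible.txt of="$dev" bs=4096 seek=$(( i * nblk )) status=none
> >         i=$(( i + 1 ))
> >     done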
>
> we re-ran our tests without a filesystem (details below). The numbers are a
> little different but the overall behavior looks similar. What's missing are
> the high savings for small target sizes, which supports our theory that those
> came from the fs overhead. What we still see is that the 'data blocks used'
> decline although the 'logical blocks used' go up.
>
> The bible.txt below is the most extreme example. In that case the physical
> blocks go down by ~1.8 GB although we write ~25 GB of additional logical
> data. The other files we tested (the rest of the large corpus and pi.txt from
> the misc corpus) are less extreme. But none of them shows any correlation
> between logical and data blocks, and all show a decline at some point.
>
> So long story short, it's not the file system. Do you have any other ideas?
>
> Thanks
> Philipp
>
> ----
> Test details:
>
> OS: RHEL 8.3 Snapshot 3
> kernel: 4.18.0-235.el8.s390x
> vdo: 6.2.3.114-14.el8
> fs: none
> file: bible.txt from http://corpus.canterbury.ac.nz/resources/large.tar.gz
>
> size (in MB)   logical blocks used   data blocks used   saving percent
>          100                 24725              22550             8.8%
>         1000                255162             226455            11.3%
>         5000               1279766            1149789            10.2%
>        10000               2559532            1741664            32.0%
>        15000               3839298            2080044            45.8%
>        20000               5119064            1878069            63.3%
>        25000               6399819            1682376            73.7%
>        30000               7679585            1807039            76.5%
>        35000               8959351            1733469            80.7%
>        40000              10239117            1619199            84.2%
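>
> (For reference, the saving percent above works out to
> 1 - 'data blocks used' / 'logical blocks used'; e.g. for the 40000 MB row:
> 1 - 1619199/10239117 ≈ 84.2%.)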
>
> >
> > Thanks
> > Philipp
> >
> > [1] https://corpus.canterbury.ac.nz/descriptions/
> >
> > > > On Tue, Oct 20, 2020 at 12:04 PM Philipp Rudo <prudo at linux.ibm.com> wrote:
> > > > >
> > > > > Hi everybody,
> > > > >
> > > > > I'm a kernel developer for s390. Together with Leon, I'm currently
> > > > > trying to evaluate the costs & gains of using deflate instead of lz4
> > > > > in vdo. The idea is to make use of the in-hardware deflate
> > > > > implementation introduced with our latest machine generation. In this
> > > > > effort we are currently running some tests on the compression ratio
> > > > > which show a rather peculiar behavior we don't understand. So we are
> > > > > reaching out to you in the hope that you can help us find out what's
> > > > > going on.
> > > > >
> > > > > In our test (details below) we simply copy the same file (~5 MB) to a
> > > > > vdo device until we reach the target logical size (deduplication
> > > > > disabled). Then we wait until the packer is finished and get the
> > > > > vdostats. As block size << file size << target size, we expected a
> > > > > constant 'saving percent'. But what we see is that the 'saving percent'
> > > > > starts high, has a minimum at ~10 GB and then grows again (seemingly
> > > > > logarithmically). While the behavior for small sizes can be explained
> > > > > by a constant overhead, the logarithmic growth for large sizes looks
> > > > > odd to us.
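> > > > >
> > > > > Roughly, per target size (a sketch; mount point and copy count are
> > > > > placeholders, and the real run also waits for the packer to finish):
> > > > >
> > > > >     mount /dev/mapper/vdo0 /mnt/vdo
> > > > >     for i in $(seq 1 "$ncopies"); do   # ncopies * ~5 MB = target size
> > > > >         cp bible.txt /mnt/vdo/bible-$i.txt
> > > > >     done
> > > > >     sync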
> > > > >
> > > > > Looking at the raw data we noticed that the 'logical blocks used' grow
> > > > > linearly with the target size (as expected) while the 'data blocks
> > > > > used' behave rather erratically. What especially surprises us is that
> > > > > the 'data blocks used' reach a peak at ~20 GB and then go down again.
> > > > > So although more of the same data is compressed with the same
> > > > > algorithm, less disk space is used?
> > > > >
> > > > > We can reproduce this behavior with our prototype (both algorithms),
> > > > > an official RHEL 8.3 build and different files.
> > > > >
> > > > > Do you have an idea what causes this behavior? Are we missing something
> > > > > fundamental?
> > > > >
> > > > > Thanks and sorry for the long mail
> > > > > Philipp
> > > > >
> > > > > ----
> > > > > Test details:
> > > > >
> > > > > OS: RHEL 8.3 Snapshot 3
> > > > > kernel: 4.18.0-235.el8.s390x
> > > > > vdo: 6.2.3.114-14.el8
> > > > > fs: ext4
> > > > > file: bible.txt from http://corpus.canterbury.ac.nz/resources/large.tar.gz
> > > > >
> > > > > size (in MB)   logical blocks used   data blocks used   saving percent
> > > > >          100                296241              22395            92.4%
> > > > >         1000                526688             231178            56.1%
> > > > >         2000                782848             462765            40.8%
> > > > >         4000               1295170             931232            28.1%
> > > > >         6000               1807495            1405123            22.2%
> > > > >         8000               2318824            1865933            19.5%
> > > > >        10000               2831146            2268763            19.8%
> > > > >        12000               3343470            2503638            25.1%
> > > > >        14000               3854801            2534747            34.2%
> > > > >        16000               4367119            2607669            40.2%
> > > > >        18000               4879445            2821335            42.1%
> > > > >        20000               5390780            2824725            47.6%
> > > > >        22000               5903107            2909582            50.7%
> > > > >        24000               6415433            2727539            57.4%
> > > > >        26000               6927756            2681278            61.2%
> > > > >        28000               7439083            2647278            64.4%
> > > > >        30000               7951405            2433786            69.3%
> > > > >        32000               8463724            2428650            71.3%
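> > > > >
> > > > > (The two counters above are read from the verbose stats; 'vdo0' is a
> > > > > placeholder for the device name:)
> > > > >
> > > > >     vdostats --verbose vdo0 | grep -E 'logical blocks used|data blocks used'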
> > > > >