[dm-devel] I/O Reordering: Cache -> Backing Device

Eric Wheeler bcache at lists.ewheeler.net
Sat Jul 6 01:07:15 UTC 2019


[+cc dm-devel]

> -----Original Message-----
> From: linux-bcache-owner at vger.kernel.org <linux-bcache-owner at vger.kernel.org> On Behalf Of Coly Li
> Sent: Sunday, 30 June, 2019 19:24
> To: Don Doerner <Don.Doerner at Quantum.Com>
> Cc: linux-bcache at vger.kernel.org
> Subject: Re: I/O Reordering: Cache -> Backing Device
> 
> On 2019/6/29 5:56 AM, Don Doerner wrote:
> > Hello, I'm also interested in using bcache to facilitate stripe 
> > re-ass'y for the backing device.  I've done some experiments that 
> > dovetail with some of the traffic on this mailing list.  
> > Specifically, in this message 
> > (https://www.spinics.net/lists/linux-bcache/msg07590.html),
> > Eric suggested "...turning up 
> > /sys/block/bcache0/bcache/writeback_percent..." to increase the 
> > contiguous data in the cache.
> > My RAID-6 has a stripe size of 2.5 MiB, and it's bcache'd with a few 
> > hundred GB of NVMe storage.  Here's my experiment:
> > * I made the cache a write back cache: echo writeback >
> > /sys/block/bcache0/bcache/cache_mode
> > * I plugged the cache: echo 0 >
> > /sys/block/bcache0/bcache/writeback_running
> > * I use a pathological I/O pattern, generated with 'fio': fio 
> >   --name=randwrite --filename=/dev/bcache0 --bs=128K --direct=1 
> >   --rw=randwrite --ioengine=libaio --iodepth=1 --numjobs=1 
> >   --size=40G.  I let it run to completion, at which point I believe 
> >   I should have 40 GiB of sequential dirty data in cache, but not 
> >   put there sequentially.  In essence, I should have ~16K complete 
> >   stripes sitting in the cache, waiting to be written [see the 
> >   sanity check after this list].
> > * I set stuff up to go like a bat: echo 0 >
> > /sys/block/bcache0/bcache/writeback_percent; echo 0 >
> > /sys/block/bcache0/bcache/writeback_delay; echo 2097152 >
> > /sys/block/bcache0/bcache/writeback_rate
> > * And I unplugged the cache: echo 1 >
> > /sys/block/bcache0/bcache/writeback_running
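
(Sanity check on the ~16K figure above: 40 GiB / 2.5 MiB per stripe =
40960 MiB / 2.5 MiB = 16384 full stripes, assuming the 40 GiB region is
covered end-to-end on stripe boundaries.)
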
> > I then watched 'iostat', and saw that there were lots of read 
> > operations (statistically, after merging, about 1 read for every 7 
> > writes) - more than I had expected... that's enough that I concluded 
> > it wasn't building full stripes.  It kinda looks like it's playing 
> > back a journal sorted by time, then by LBA, or something like that...
> > Any suggestions for improving (reducing) the ratio of reads to 
> > writes will be gratefully accepted!
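
(Context for the read traffic: on a RAID-6, any write smaller than a
full stripe forces the MD layer to read existing data and/or parity
blocks before it can recompute parity - a read-modify-write - so reads
appearing on the backing array during pure writeback are exactly the
signature of the partial-stripe writes Don describes.)
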
> 
> Hi Don,
> 
> If the backing device has an expensive stripe cost, the upper layer 
> should issue I/Os with stripe-size alignment; otherwise bcache cannot 
> do much to make the I/O stripe-optimized.
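
A concrete sketch of that alignment arithmetic may help here. This is a
minimal userspace illustration, not anything from the bcache tree: the
2.5 MiB stripe width is taken from Don's array, and the helper names are
invented.

#include <stdint.h>
#include <stdio.h>

#define STRIPE_BYTES (5ULL * 512 * 1024)  /* 2.5 MiB stripe width (Don's RAID-6) */

/* Round a byte offset down/up to the nearest stripe boundary. */
static uint64_t stripe_down(uint64_t off) { return off - (off % STRIPE_BYTES); }
static uint64_t stripe_up(uint64_t off)   { return stripe_down(off + STRIPE_BYTES - 1); }

int main(void)
{
    uint64_t off = 7ULL << 20, len = 6ULL << 20;  /* an arbitrary 6 MiB extent */
    uint64_t first = stripe_up(off);              /* start of first full stripe */
    uint64_t last  = stripe_down(off + len);      /* end of last full stripe */

    if (first < last)
        printf("full-stripe span: [%llu, %llu), %llu stripe(s)\n",
               (unsigned long long)first, (unsigned long long)last,
               (unsigned long long)((last - first) / STRIPE_BYTES));
    else
        printf("no full stripe inside the extent; RMW is unavoidable\n");
    return 0;
}

The point is that the [first, last) span can go down as full-stripe
writes, leaving only the head and tail as partial (read-modify-write)
I/Os.
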
> 
> And you are right that bcache does not write back in strict LBA order; 
> this is because the internal btree tends to be append-only. LBA-ordered 
> writeback happens within a reasonably small range, not across the whole 
> cached data set; see commit 6e6ccc67b9c7 ("bcache: writeback: properly 
> order backing device IO").
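
For readers outside bcache, the effect being described is, very roughly,
the following userspace sketch; the structure and names are invented for
illustration, and this is not the kernel code. A bounded window of dirty
extents is collected in whatever order the btree yields them, sorted by
LBA, and only then issued to the backing device:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct dirty_extent { uint64_t offset, len; };

static int cmp_offset(const void *a, const void *b)
{
    const struct dirty_extent *x = a, *y = b;
    return (x->offset > y->offset) - (x->offset < y->offset);
}

/* One writeback pass: sort a bounded batch of dirty extents by LBA and
 * issue them in ascending order, so the backing device sees mostly
 * sequential I/O within the window (not across the whole cache). */
static void writeback_window(struct dirty_extent *batch, size_t n)
{
    qsort(batch, n, sizeof(*batch), cmp_offset);
    for (size_t i = 0; i < n; i++)
        printf("write offset=%llu len=%llu\n",
               (unsigned long long)batch[i].offset,
               (unsigned long long)batch[i].len);
}

int main(void)
{
    /* Extents in btree/insertion order, deliberately out of LBA order. */
    struct dirty_extent batch[] = {
        { 3ULL << 20, 128 << 10 },
        { 1ULL << 20, 128 << 10 },
        { 2ULL << 20, 128 << 10 },
    };
    writeback_window(batch, sizeof(batch) / sizeof(batch[0]));
    return 0;
}
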
> 
> And I agree with you again that "improving (reducing) the ratio of reads 
> to writes will be gratefully accepted". Indeed, the goal is not only to 
> reduce the reads-to-writes ratio, but also to increase the writeback 
> throughput. This is something I want to improve, once I understand why 
> the problem exists in the bcache writeback code ...


dm-devel list:

Does dm-writecache make any attempt to merge I/Os up to the io_opt size?

If so, bcache's writeback thread might borrow some ideas from that 
codebase.
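
To make the question concrete: below is a minimal userspace sketch of
the kind of merging I mean, assuming a dirty-block bitmap model roughly
like dm-writecache's. BLOCK_SIZE, IO_OPT, and the function names are
illustrative only, not dm-writecache's actual code or API.

#include <stdbool.h>
#include <stdio.h>

#define BLOCK_SIZE 4096u          /* cache block size (illustrative) */
#define IO_OPT     (256u * 1024)  /* queue/optimal_io_size, e.g. stripe width */

/* Walk a dirty-block bitmap and emit writes merged up to IO_OPT: runs of
 * adjacent dirty blocks become one large write instead of many small ones. */
static void flush_dirty(const bool *dirty, unsigned nblocks)
{
    unsigned i = 0;
    while (i < nblocks) {
        if (!dirty[i]) { i++; continue; }
        unsigned start = i, bytes = 0;
        while (i < nblocks && dirty[i] && bytes + BLOCK_SIZE <= IO_OPT) {
            bytes += BLOCK_SIZE;
            i++;
        }
        printf("write offset=%u len=%u\n", start * BLOCK_SIZE, bytes);
    }
}

int main(void)
{
    bool dirty[16] = { [0] = true, [1] = true, [2] = true,
                       [5] = true, [6] = true, [10] = true };
    flush_dirty(dirty, 16);  /* 3 merged writes instead of 6 block-sized ones */
    return 0;
}

On a striped backing device, queue/optimal_io_size typically reports the
stripe width, so merging up to io_opt is what would let writeback form
full-stripe writes.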

--
Eric Wheeler


> 
> Thanks.
> 
> --
> 
> Coly Li
> 

