[Virtio-fs] [PATCH][RFC] Support multiqueue mode by setting cpu affinity

Stefan Hajnoczi stefanha at redhat.com
Tue Aug 27 14:42:58 UTC 2019


On Mon, Aug 26, 2019 at 09:08:20AM +0800, piaojun wrote:
> On 2019/8/21 23:38, Stefan Hajnoczi wrote:
> > On Fri, Aug 09, 2019 at 02:04:54PM +0800, piaojun wrote:
> >> Set CPU affinity for each queue in multiqueue mode to improve IOPS
> >> performance.
> >>
> >> From my test, IOPS improves when adding multiple queues, as shown
> >> below, but it has not yet reached my expectation for some reason. So
> >> I'm considering whether we could drop some locks when operating a vq,
> >> since it is bound to one vCPU. I'm very glad to discuss this with
> >> other developers.
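
For reference, virtio-net spreads its queues across the online CPUs by
calling virtqueue_set_affinity() on each virtqueue at probe time.  I
imagine the virtio-fs side of this looks roughly like the sketch below
(untested, not the actual patch; the struct/field names are just
placeholders):

  /*
   * Sketch only: set each request queue's interrupt affinity hint to
   * its own CPU, round-robin over the online CPUs, the way virtio-net
   * does.
   */
  static void virtio_fs_set_affinity(struct virtio_fs *fs)
  {
          unsigned int cpu = cpumask_first(cpu_online_mask);
          unsigned int i;

          for (i = 0; i < fs->num_request_queues; i++) {
                  virtqueue_set_affinity(fs->vqs[VQ_REQUEST + i].vq,
                                         cpumask_of(cpu));
                  cpu = cpumask_next(cpu, cpu_online_mask);
                  if (cpu >= nr_cpu_ids)
                          cpu = cpumask_first(cpu_online_mask);
          }
  }
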
> >>
> >> Furthermore, I modified virtiofsd to support multiqueue, just for
> >> testing.
> >>
> >> Test Environment:
> >> Guest configuration:
> >> 8 vCPU
> >> 8GB RAM
> >> Linux 5.1 (vivek-aug-06-2019)
> >>
> >> Host configuration:
> >> Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (8 cores x 4 threads)
> >> 32GB RAM
> >> Linux 3.10.0
> >> EXT4 + 4G Ramdisk
> >>
> >> ---
> >> Single-queue:
> >> # fio -direct=1 -time_based -iodepth=128 -rw=randwrite -ioengine=libaio -bs=4k -size=1G -numjob=8 -runtime=30 -group_reporting -name=file -filename=/mnt/virtiofs/file
> >> file: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
> >> ...
> >> fio-2.13
> >> Starting 8 processes
> >> Jobs: 8 (f=8): [w(8)] [100.0% done] [0KB/316.5MB/0KB /s] [0/81.2K/0 iops] [eta 00m:00s]
> >> file: (groupid=0, jobs=8): err= 0: pid=5808: Fri Aug  9 20:35:22 2019
> >>   write: io=9499.9MB, bw=324251KB/s, iops=81062, runt= 30001msec
> >>
> >> Multi-queues:
> >> # fio -direct=1 -time_based -iodepth=128 -rw=randwrite -ioengine=libaio -bs=4k -size=1G -numjob=8 -runtime=30 -group_reporting -name=file -filename=/mnt/virtiofs/file
> >> file: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
> >> ...
> >> fio-2.13
> >> Starting 8 processes
> >> Jobs: 8 (f=8): [w(8)] [100.0% done] [0KB/444.6MB/0KB /s] [0/114K/0 iops] [eta 00m:00s]
> >> file: (groupid=0, jobs=8): err= 0: pid=5704: Fri Aug  9 20:38:47 2019
> >>   write: io=12967MB, bw=442582KB/s, iops=110645, runt= 30001msec
> >> ---
> > 
> > How does the same fio command-line perform on the host when bound to 8
> > CPUs?
> 
> fio has great performance on the host side, so the bottleneck should be
> in virtiofsd.
> 
> ---
> Run status group 0 (all jobs):
>   WRITE: bw=12.7GiB/s (13.6GB/s), 12.7GiB/s-12.7GiB/s (13.6GB/s-13.6GB/s), io=381GiB (409GB), run=30001-30001msec

Using just one file?
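
For an apples-to-apples comparison it would also help to pin the host
run to the same 8 CPUs, e.g.:

  # taskset -c 0-7 fio <same job file>

or with fio's own cpus_allowed=0-7 option.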

> > 
> > What about the virtiofsd changes?  Did you implement host CPU affinity
> > for the virtqueue processing threads and their workqueues?
> > 
> > I wonder if numbers are better if you use 8 files instead of 1 file.
> > 
> I implemented host CPU affinity and redesigned the test case with 8
> files; the result looks better:
> 
> ---
> [global]
> runtime=30
> time_based
> group_reporting
> direct=1
> bs=1M
> size=1G
> ioengine=libaio
> rw=write
> numjobs=8
> iodepth=128
> thread=1
> 
> [file1]
> filename=/mnt/virtiofs/file1
> numjobs=1
> [file2]
> filename=/mnt/virtiofs/file2
> numjobs=1
> [file3]
> filename=/mnt/virtiofs/file3
> numjobs=1
> [file4]
> filename=/mnt/virtiofs/file4
> numjobs=1
> [file5]
> filename=/mnt/virtiofs/file5
> numjobs=1
> [file6]
> filename=/mnt/virtiofs/file6
> numjobs=1
> [file7]
> filename=/mnt/virtiofs/file7
> numjobs=1
> [file8]
> filename=/mnt/virtiofs/file8
> numjobs=1
> 
> Single-Queue:
> Jobs: 8 (f=8): [W(8)] [100.0% done] [0KB/1594MB/0KB /s] [0/1594/0 iops] [eta 00m:00s]
> file1: (groupid=0, jobs=8): err= 0: pid=6379: Mon Aug 26 16:24:10 2019
>   write: io=46676MB, bw=1555.6MB/s, iops=1555, runt= 30007msec

The result improves greatly when using separate files.  I wonder what
the bottleneck is, maybe serialization in the guest kernel?
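
Guest-side lock statistics would show whether that is the case
(assuming CONFIG_LOCK_STAT is enabled in the guest kernel):

  # echo 1 > /proc/sys/kernel/lock_stat
  # fio <job file>
  # less /proc/lock_stat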

> 
> Multi-Queues(8):
> Jobs: 8 (f=8): [W(8)] [100.0% done] [0KB/4064MB/0KB /s] [0/4064/0 iops] [eta 00m:00s]
> file1: (groupid=0, jobs=8): err= 0: pid=5785: Mon Aug 26 16:26:46 2019
>   write: io=115421MB, bw=3847.2MB/s, iops=3847, runt= 30002msec
> 
> I wrote a draft patch for virtiofsd, but the sandbox makes it hard to
> set affinity for each vq, as _SC_NPROCESSORS_ONLN always equals 1. So I
> just deleted the related code for testing. Maybe we could create a
> thread pool before setup_sandbox(), or find some other effective way.
> I'm glad to help find a solution.

Doing the setup before entering the sandbox sounds like a good idea.
That way the sandbox does not need to whitelist the required syscalls.
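
I.e. create the virtqueue worker threads in main() and pin them before
setup_sandbox() is called, roughly like this (untested sketch, not
against any particular tree; error handling omitted):

  #define _GNU_SOURCE
  #include <pthread.h>
  #include <sched.h>

  /*
   * Pin one pre-created vq worker thread per CPU before entering the
   * sandbox, so no affinity syscall (and no _SC_NPROCESSORS_ONLN
   * lookup) is needed afterwards.
   */
  static void pin_vq_thread(pthread_t thread, unsigned int cpu)
  {
          cpu_set_t set;

          CPU_ZERO(&set);
          CPU_SET(cpu, &set);
          pthread_setaffinity_np(thread, sizeof(set), &set);
  }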

Will you add an option similar to:

  --request-queues N
  --request-queue-cpu-affinity N=CPU_A[,CPU_B][-CPU_C]

?

For example, with 2 request queues where queue#1 is bound to CPUs 0-4
and queue#2 is bound to CPUs 5, 6, and 8:

  --request-queues 2
  --request-queue-cpu-affinity 1=0-4
  --request-queue-cpu-affinity 2=5,6,8
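
Parsing the CPU-list part could follow the usual "A,B,C-D" convention,
e.g. (sketch only; the option names above are just a proposal, not an
existing interface):

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /*
   * Parse "0-4" or "5,6,8" style CPU lists into *set.
   * Returns 0 on success, -1 on malformed input.
   */
  static int parse_cpu_list(const char *str, cpu_set_t *set)
  {
          char *copy = strdup(str), *tok, *saveptr = NULL;

          if (!copy)
                  return -1;

          CPU_ZERO(set);
          for (tok = strtok_r(copy, ",", &saveptr); tok;
               tok = strtok_r(NULL, ",", &saveptr)) {
                  unsigned int first, last;

                  if (sscanf(tok, "%u-%u", &first, &last) == 2) {
                          for (; first <= last; first++)
                                  CPU_SET(first, set);
                  } else if (sscanf(tok, "%u", &first) == 1) {
                          CPU_SET(first, set);
                  } else {
                          free(copy);
                          return -1;
                  }
          }
          free(copy);
          return 0;
  }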

Stefan