[lvm-devel] thin vol write performance variance

Lakshmi Narasimhan Sundararajan lsundararajan at purestorage.com
Mon Nov 22 17:23:28 UTC 2021


Hi Team,
I am following up on the poor write/sync performance issue over dm
thin volumes.
I need your inputs to help me understand it better.

The system has physical SSD drives.
In this simple case, an MD raid0 volume is mapped over 3 of the SSDs.
A thin pool is created over the MD volume, and a thin volume is created
over the thin pool.
The test measures write IO performance on this thin volume.
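
For reference, a minimal sketch of how such a stack is put together (the
device names, sizes and options below only approximate my setup; they are
not the exact commands used):

mdadm --create /dev/md127 --level=0 --raid-devices=3 /dev/sdd /dev/sde /dev/sdf
pvcreate /dev/md127
vgcreate pwx0 /dev/md127
lvcreate --type thin-pool -L 60G --poolmetadatasize 2G --chunksize 64K -n pxpool pwx0
lvcreate --thin -V 10G --thinpool pxpool -n 717475775864529330 pwx0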

The summary of my finding is that buffered writes to the dm-thin volume
are very slow; with direct IO on the volume this issue is not seen.
Looking at the inflight requests, there is a huge amount of IO in flight.
Given that every block device has a maximum queue limit ('nr_requests'),
I cannot see how that many requests can be enqueued in flight. See below.

[root at ip-70-0-192-7 ~]# cat /sys/block/dm-5/queue/nr_requests
128
[root at ip-70-0-192-7 ~]# cat /sys/block/dm-5/inflight
       0   296600
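
For context, this is roughly how I collect the queue limit and the
inflight counters across the stack (the device names in the loop are the
ones from my setup; adjust as needed):

for d in sdd sde sdf md127 dm-0 dm-1 dm-2 dm-5; do
    echo "== $d =="
    cat /sys/block/$d/queue/nr_requests
    cat /sys/block/$d/inflight
done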

It is not surprising, then, that a sync call takes forever given
the amount of pending IO.

Can you please help me understand how that many inflight requests are possible?
Why is the device queue limit not honoured for dm-thin devices?
Are there any other areas/pointers I should follow up on?
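
For what it is worth, the slow flush is easy to observe by timing sync
right after the buffered fio run; a minimal sketch of what I do:

# after the buffered fio job completes, time the flush of dirty data
time sync
# in another shell, watch the pending IO drain on the thin device
watch -n 1 cat /sys/block/dm-5/inflight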

Below are more details on the setup and the fio command line used to generate the traffic.

* thin device under test:
/dev/mapper/pwx0-717475775864529330

* fio cmdline; the above device is formatted with ext4 and mounted
at /mnt/1:
sudo fio --blocksize=16k --directory=/mnt/1 --filename=sample.txt
--ioengine=libaio --readwrite=write --size=1G --name=test
--verify_pattern=0xDeadBeef --direct=0 --gtod_reduce=1 --iodepth=32
--randrepeat=1 --disable_lat=0 --gtod_reduce=0
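
For reference, the direct-IO comparison run mentioned above is essentially
the same job with direct IO turned on; a sketch of the form, not the exact
command line captured for that run:

sudo fio --blocksize=16k --directory=/mnt/1 --filename=sample.txt \
  --ioengine=libaio --readwrite=write --size=1G --name=test \
  --verify_pattern=0xDeadBeef --direct=1 --iodepth=32 \
  --randrepeat=1 --disable_lat=0 --gtod_reduce=0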

* devices:
[root at ip-70-0-192-7 ~]# ls -lh /dev/mapper/
total 0
crw------- 1 root root 10, 236 Nov 19 23:17 control
lrwxrwxrwx 1 root root       7 Nov 19 23:17 pwx0-717475775864529330 -> ../dm-5
lrwxrwxrwx 1 root root       7 Nov 19 23:17 pwx0-pxMetaFS -> ../dm-4
lrwxrwxrwx 1 root root       7 Nov 19 23:17 pwx0-pxpool -> ../dm-3
lrwxrwxrwx 1 root root       7 Nov 19 23:17 pwx0-pxpool_tdata -> ../dm-1
lrwxrwxrwx 1 root root       7 Nov 19 23:17 pwx0-pxpool_tmeta -> ../dm-0
lrwxrwxrwx 1 root root       7 Nov 19 23:17 pwx0-pxpool-tpool -> ../dm-2

* iostat:
Device:     rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz  avgqu-sz   await r_await w_await  svctm  %util
sdb           0.00     0.00    0.00    0.00     0.00     0.00     0.00      0.00    0.00    0.00    0.00   0.00   0.00
sda           0.00     0.00    0.00    0.00     0.00     0.00     0.00      0.00    0.00    0.00    0.00   0.00   0.00
sdc           0.00     0.00    0.00    2.00     0.00     0.01     5.50      0.03   14.50    0.00   14.50  14.50   2.90
sde           0.00  1216.00    0.00  832.00     0.00     8.00    19.69      3.35    4.03    0.00    4.03   0.43  35.50
sdf           0.00  1146.00    0.00  902.00     0.00     8.00    18.16      4.84    5.37    0.00    5.37   0.50  45.10
sdd           0.00  1136.00    1.00 1011.00     0.00     8.39    16.98      4.41    4.36    7.00    4.36   0.41  41.70
md127         0.00     0.00    0.00 6243.00     0.00    24.39     8.00      0.00    0.00    0.00    0.00   0.00   0.00
dm-0          0.00     0.00    0.00    0.00     0.00     0.00     0.00      0.00    0.00    0.00    0.00   0.00   0.00
dm-1          0.00     0.00    0.00 6243.00     0.00    24.39     8.00     57.37    9.19    0.00    9.19   0.16 101.30
dm-2          0.00     0.00    0.00 6243.00     0.00    24.39     8.00     57.39    9.19    0.00    9.19   0.16 101.30
dm-4          0.00     0.00    0.00    0.00     0.00     0.00     0.00      0.00    0.00    0.00    0.00   0.00   0.00
dm-5          0.00     0.00    0.00    0.00     0.00     0.00     0.00 259962.13    0.00    0.00    0.00   0.00 101.30

[root at ip-70-0-192-7 ~]# cat /sys/block/dm-5/queue/nr_requests
128
[root at ip-70-0-192-7 ~]# cat /sys/block/dm-5/inflight
       0   296600

[root at ip-70-0-192-7 ~]# dmsetup table
pwx0-pxMetaFS: 0 134217728 thin 253:2 1
pwx0-717475775864529330: 0 20971520 thin 253:2 2
pwx0-pxpool-tpool: 0 125689856 thin-pool 253:0 253:1 128 0 0
pwx0-pxpool_tdata: 0 125689856 linear 9:127 4196352
pwx0-pxpool_tmeta: 0 4194304 linear 9:127 129886208
pwx0-pxpool: 0 125689856 linear 253:2 0
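
For anyone reading along, this is how I read the thin-pool line above
(field layout per the kernel's thin-provisioning documentation):

# <start> <length> thin-pool <metadata dev> <data dev> \
#     <data block size (sectors)> <low water mark> <#feature args> [features]
# i.e. for pwx0-pxpool-tpool: metadata=253:0 (tmeta), data=253:1 (tdata),
# data block (chunk) size=128 sectors (64KiB), low water mark=0, no feature args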


[root at ip-70-0-192-7 ~]# dmsetup ls --tree
pwx0-pxMetaFS (253:4)
 └─pwx0-pxpool-tpool (253:2)
    ├─pwx0-pxpool_tdata (253:1)
    │  └─ (9:127)
    └─pwx0-pxpool_tmeta (253:0)
       └─ (9:127)
pwx0-pxpool (253:3)
 └─pwx0-pxpool-tpool (253:2)
    ├─pwx0-pxpool_tdata (253:1)
    │  └─ (9:127)
    └─pwx0-pxpool_tmeta (253:0)
       └─ (9:127)
pwx0-717475775864529330 (253:5)
 └─pwx0-pxpool-tpool (253:2)
    ├─pwx0-pxpool_tdata (253:1)
    │  └─ (9:127)
    └─pwx0-pxpool_tmeta (253:0)
       └─ (9:127)

[root at ip-70-0-78-192 ~]# lsblk
NAME                             MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sdf                                8:80   0    64G  0 disk
sdd                                8:48   0    64G  0 disk
sdb                                8:16   0    32G  0 disk
sdg                                8:96   0    64G  0 disk
sde                                8:64   0    64G  0 disk
sdc                                8:32   0    64G  0 disk
└─md127                            9:127  0    64G  0 raid0
  ├─pwx0-pxpool_tdata            253:1    0    60G  0 lvm
  │ └─pwx0-pxpool-tpool          253:2    0    60G  0 lvm
  │   ├─pwx0-pxMetaFS            253:4    0    64G  0 lvm
  │   ├─pwx0-717475775864529330  253:5    0    10G  0 lvm
  │   └─pwx0-pxpool              253:3    0    60G  0 lvm
  └─pwx0-pxpool_tmeta            253:0    0     2G  0 lvm
    └─pwx0-pxpool-tpool          253:2    0    60G  0 lvm
      ├─pwx0-pxMetaFS            253:4    0    64G  0 lvm
      ├─pwx0-717475775864529330  253:5    0    10G  0 lvm
      └─pwx0-pxpool              253:3    0    60G  0 lvm
sda                                8:0    0   128G  0 disk
├─sda2                             8:2    0 124.3G  0 part  /
└─sda1                             8:1    0   3.7G  0 part  /boot
[root at ip-70-0-78-192 ~]#

[root at ip-70-0-78-192 ~]# cat /sys/block/sdc/queue/rotational
0
sdc is an SSD drive.
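
The same check across all the sd* disks in one go, for completeness:

grep . /sys/block/sd*/queue/rotational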

==== system config ====
[root at ip-70-0-78-192 ~]# lvm version
  LVM version:     2.02.187(2)-RHEL7 (2020-03-24)
  Library version: 1.02.170-RHEL7 (2020-03-24)
  Driver version:  4.42.0
  Configuration:   ./configure --build=x86_64-redhat-linux-gnu
--host=x86_64-redhat-linux-gnu --program-prefix=
--disable-dependency-tracking --prefix=/usr --exec-prefix=/usr
--bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc
--datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64
--libexecdir=/usr/libexec --localstatedir=/var
--sharedstatedir=/var/lib --mandir=/usr/share/man
--infodir=/usr/share/info --with-default-dm-run-dir=/run
--with-default-run-dir=/run/lvm --with-default-pid-dir=/run
--with-default-locking-dir=/run/lock/lvm --with-usrlibdir=/usr/lib64
--enable-lvm1_fallback --enable-fsadm --with-pool=internal
--enable-write_install --with-user= --with-group= --with-device-uid=0
--with-device-gid=6 --with-device-mode=0660 --enable-pkgconfig
--enable-applib --enable-cmdlib --enable-dmeventd
--enable-blkid_wiping --enable-python2-bindings
--with-cluster=internal --with-clvmd=corosync --enable-cmirrord
--with-udevdir=/usr/lib/udev/rules.d --enable-udev_sync
--with-thin=internal --enable-lvmetad --with-cache=internal
--enable-lvmpolld --enable-lvmlockd-dlm --enable-lvmlockd-sanlock
--enable-dmfilemapd
[root at ip-70-0-78-192 ~]# uname -a
Linux ip-70-0-78-192.brbnca.spcsdns.net 5.7.12-1.el7.elrepo.x86_64 #1
SMP Fri Jul 31 16:18:28 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
[root at ip-70-0-78-192 ~]# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

[root at ip-70-0-78-192 ~]#

This system has 16 GB of RAM and 8 CPU cores.


Thanks

LN

On Tue, Sep 28, 2021 at 3:34 PM Lakshmi Narasimhan Sundararajan
<lsundararajan at purestorage.com> wrote:
>
> On Fri, Sep 17, 2021 at 1:29 AM Zdenek Kabelac <zkabelac at redhat.com> wrote:
> >
> > Dne 15. 09. 21 v 9:02 Lakshmi Narasimhan Sundararajan napsal(a):
> > > Hi Team,
> > > A very good day to you.
> > >
> > > I have a lvm2 thin pool and thin volumes in my environment.
> > > I see a huge variance in write performance over those thin volumes.
> > > As one can observe from the logs below, the same quantum of write
> > > (~1.5G) to the thin volume (/dev/pwx0/608561273872404373) completes
> > > anywhere between 2s and 40s.
> > > The metadata is  defined as 128 sectors (64KB) on the thin pool.
> > > I understand that there is a late mapping of segments to thin volumes
> > > as IO requests come in.
> > > Is there a way to test/quantify that the overhead is because of this
> > > lazy mapping?
> > > Are there any other config/areas that I can tune to control this behavior?
> > > Are there any tunables/ioctl to control mapping regions ahead of time
> > > (ala readahead)?
> > > Any other options available to confirm this behavior is because of the
> > > lazy mapping and ways to improve it?
> > >
> > > My intention is to improve this behavior and control the variance to a
> > > more tight bound.
> > > Looking forward to your inputs in helping me understand this better.
> > >
> >
> > Hi
> >
> > I think we need to 'decipher' first some origins of your problems.
> >
> > So what is your backend 'storage' in use?
> > Do you use fast device like ssd/nvme to store thin-pool metadata?
> >
> > Do you measure your time *after* syncing all of the 'unwritten/buffered' data to disk?
> >
> > What hw is actually in use - RAM, CPU?
> >
> > Which kernel and lvm2 version is being used ?
> >
> > Do you use/need zeroing of provisioned blocks (which may impact performance
> > and can be disabled with lvcreate -Zn) ?
> >
> > Do you measure writes while provisioning thin chunks, or on an already
> > provisioned device?
> >
>
> Hi Zdenek,
> These are traditional HDDs. Both the thin pool data/metadata reside on
> the same set of drive(s).
> I understand where you are going with this; I will look further into
> defining the hardware/disks before I bring this back to your attention.
>
> This run was not on an already provisioned device. I do see improved
> performance on the same volume after the first write.
> I understand this perf gain to come from the overhead that is avoided
> during subsequent runs, where no mappings need to be established.
>
> But you mentioned zeroing of provisioned blocks as an issue.
> 1/ The lvcreate man page says -Z only controls zeroing of the first 4K
> block, and also implies this is a MUST, otherwise the fs may hang. So
> we are using this. Are you saying this controls zeroing of each chunk
> that gets mapped to the thin volume?
>
> 2/ As for zeroing all of the data chunks mapped to the thin volume, the
> only reference I could find is thin_pool_zero in lvm.conf, which is
> enabled by default. So are you suggesting I disable this?
>
> Please confirm the above items. I will come back with more precise
> answers on the details you had requested.
>
> Thanks.
> LN
>
> >
> > Regards
> >
> > Zdenek
> >




