[linux-lvm] Possible bug with concurrent RAID syncs on the same underlying devices
Péter Sárközi
xmisterhu at gmail.com
Sat Apr 10 00:32:52 UTC 2021
Hi,
Up until now I had 8 mdadm RAID6 arrays sharing the same 6
different-sized devices, split into 1TB partitions, like:
md0: sda1 sdb1 sdc1...
md1: sda2 sdb2 sdc2...
.
.
.
md7: sda8 sdb8 sde5 sdd7...
It was set up like this so I could efficiently use the space on the
different-sized disks.
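From memory, each array was created with something like this (just a
sketch; device names and the number of members differ per array):

# mdadm --create /dev/md0 --level=6 --raid-devices=6 \
      /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1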
Since lvmraid now supports dm-integrity on raid LVs, I backed up
everything and am trying to recreate a similar structure with lvmraid
and integrity enabled.
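What I'm doing now is roughly the following per volume group (again
just a sketch, using raid6-0 as the example; the stripe count and size
depend on how many PVs each VG has):

# vgcreate raid6-0 /dev/sda3 /dev/sdb1 /dev/sdd6 /dev/sde6 /dev/sdf1 /dev/sdg4
# lvcreate --type raid6 --stripes 4 --raidintegrity y -L 3.6T -n md0 raid6-0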
In the past, when multiple mdadm arrays needed to resync, they would
wait for each other to finish, because md detected that those arrays
shared the same disks.
Now, while recreating the arrays, I realized that the initial lvmraid
syncs don't wait for each other.
This means I can't recreate the whole structure in one go, as the
concurrent syncs would thrash the IO on these HDDs.
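With the old mdadm setup the deferred arrays showed up as
resync=DELAYED in /proc/mdstat, so only one resync per set of shared
disks actually ran. With lvmraid all the rebuilds run at once, which
can be watched with something like:

# lvs -a -o lv_name,sync_percent,raid_sync_action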
I don't know if this is intentional, because I haven't used lvmraid
before, but I know lvmraid uses md under the hood, and I suspect this
might be a bug: the md code in the kernel probably can't detect the
shared underlying devices through the integrity layer.
But I think it might be worth fixing, because even with just 3 raid6
lvmraid LVs and the sync speed reduced to 10M via the
dev.raid.speed_limit_max sysctl, I get a pretty high load:
[root@hp ~] 2021-04-10 02:07:38
# lvs
  LV   VG      Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root pve     rwi-aor--- 29,25g                                     100,00
  md0  raid6-0 rwi-a-r--- <3,61t                                      40,54
  md1  raid6-1 rwi-a-r--- <2,71t                                       8,54
  md2  raid6-2 rwi-a-r--- <3,61t                                       1,01
[root@hp ~] 2021-04-10 02:30:46
# pvs -S vg_name=raid6-0
  PV        VG      Fmt  Attr PSize   PFree
  /dev/sda3 raid6-0 lvm2 a--  931,50g 4,00m
  /dev/sdb1 raid6-0 lvm2 a--  931,50g 4,00m
  /dev/sdd6 raid6-0 lvm2 a--  931,50g 4,00m
  /dev/sde6 raid6-0 lvm2 a--  931,50g 4,00m
  /dev/sdf1 raid6-0 lvm2 a--  931,50g 4,00m
  /dev/sdg4 raid6-0 lvm2 a--  931,50g 4,00m
[root@hp ~] 2021-04-10 02:35:39
# uptime
02:35:40 up 1 day, 29 min, 4 users, load average: 138,20, 126,23, 135,60
Although this load is mostly just the huge number of integrity
kworker processes, and the system is still pretty usable, I think it
would be much nicer to only have 1 sync running on the same physical
device at a time.
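Until then, I guess the only option is to serialize the creation by
hand, roughly along these lines (untested sketch; LV names, sizes and
stripe counts are just examples, and the sysctl is the ~10M throttle
I mentioned above):

#!/bin/bash
# Untested sketch: keep the global md sync throttle and create the
# raid6+integrity LVs one VG at a time, waiting for each initial sync
# to finish before starting the next one.
set -eu

# The ~10 MB/s throttle mentioned above (value is in KB/s).
sysctl -w dev.raid.speed_limit_max=10240

# VG:size pairs are only examples, adjust to the real layout.
for spec in raid6-0:3.6T raid6-1:2.7T raid6-2:3.6T; do
    vg=${spec%%:*}
    size=${spec##*:}

    # --stripes 4 assumes 6 PVs per VG (4 data + 2 parity), adjust per VG.
    lvcreate --type raid6 --stripes 4 --raidintegrity y \
        -L "$size" -n md0 "$vg"

    # Poll Cpy%Sync until the initial sync is done.
    until lvs --noheadings -o sync_percent "$vg/md0" | grep -q '100[.,]00'; do
        sleep 60
    done
done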