[linux-lvm] Possible bug in thin metadata size with Linux MDRAID
Gionatan Danti
g.danti at assyoma.it
Thu Mar 9 15:33:45 UTC 2017
On 09/03/2017 12:53, Zdenek Kabelac wrote:
>
> Hmm - it would be interesting to see your 'metadata' - it should be still
> quite good fit 128M of metadata for 512G when you are not using snapshots.
>
> What's been your actual test scenario ?? (Lots of LVs??)
>
Nothing unusual - I had a single thinvol with an XFS filesystem used to
store an HDD image gathered using ddrescue.
Anyway, are you sure that a 128 MB metadata volume is "quite good" for a
512 GB volume with 128 KB chunks? My testing suggests otherwise. For
example, take a look at this empty thinpool/thinvol:
[root at gdanti-laptop test]# lvs -a -o +chunk_size
  LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert   Chunk
  [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                               0
  thinpool         vg_kvm    twi-aotz-- 500.00g                   0.00   0.81                           128.00k
  [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                               0
  [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                               0
  thinvol          vg_kvm    Vwi-a-tz-- 500.00g thinpool          0.00                                        0
  root             vg_system -wi-ao----  50.00g                                                               0
  swap             vg_system -wi-ao----   3.75g                                                               0
As you can see, since the volume is empty, metadata usage is at only
0.81%. Let's write 5 GB (1% of the thin data volume):
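For reference, the write can be done with something along these lines (just
a sketch, not necessarily the exact command I used; the target path is only
an example):

# dd if=/dev/zero of=/mnt/thinvol/testfile bs=1M count=5120 oflag=direct

Checking again: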
[root at gdanti-laptop test]# lvs -a -o +chunk_size
  LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert   Chunk
  [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                               0
  thinpool         vg_kvm    twi-aotz-- 500.00g                   1.00   1.80                           128.00k
  [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                               0
  [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                               0
  thinvol          vg_kvm    Vwi-a-tz-- 500.00g thinpool          1.00                                        0
  root             vg_system -wi-ao----  50.00g                                                               0
  swap             vg_system -wi-ao----   3.75g                                                               0
Metadata grew by roughly the same 1% (from 0.81% to 1.80%). Accounting
for the initial 0.81% utilization, this means that a nearly full data
volume (with *no* overprovisioning nor snapshots) will exhaust its
metadata *before* really becoming 100% full.
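A quick back-of-the-envelope check (assuming metadata consumption keeps
growing roughly linearly, as it should for a single fully-mapped volume):

  500 GiB / 128 KiB chunks = 4,096,000 chunks to map
  1% of data (5 GiB)       = 40,960 chunks -> ~0.99% of 128 MiB (~1.3 MiB),
                             i.e. roughly 32 bytes of metadata per chunk
  0.81% + 100 x 0.99%      = ~100% metadata used at ~100% data

In other words, zero headroom.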
While I can absolutely understand that this is expected behavior when
using snapshots and/or overprovisioning, in this extremely simple case
metadata should not be exhausted before data. In other words, the
initial metadata sizing should *at least* consider that a plain volume
can become 100% full, and allocate accordingly.
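The obvious workaround on my side is to size the metadata volume explicitly
at creation time, e.g. something like this (the 256M value is only an
example for my setup, not a recommendation):

# lvcreate --type thin-pool -L 500G --poolmetadatasize 256M -n thinpool vg_kvm
# lvcreate --thin -V 500G -n thinvol vg_kvm/thinpool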
The interesting part is that when not using MD, everything works properly:
metadata is about 2x its minimal value (as reported by thin_metadata_size),
and this provides an ample buffer for snapshotting/overprovisioning. When
using MD, the bad interaction between RAID chunks and thin pool chunks ends
up with a too-small metadata volume.
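For reference, the minimal value can be queried with something like the
following (thin_metadata_size is part of thin-provisioning-tools; here I
assume a single thin device and no snapshots):

# thin_metadata_size -b128k -s500g -m1 -um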
This can become very bad. Take a look at what happens when creating a
thin pool on an MD RAID whose chunks are 64 KB:
[root at gdanti-laptop test]# lvs -a -o +chunk_size
  LV               VG        Attr       LSize   Pool     Origin Data%  Meta%  Move Log Cpy%Sync Convert   Chunk
  [lvol0_pmspare]  vg_kvm    ewi------- 128.00m                                                               0
  thinpool         vg_kvm    twi-a-tz-- 500.00g                   0.00   1.58                            64.00k
  [thinpool_tdata] vg_kvm    Twi-ao---- 500.00g                                                               0
  [thinpool_tmeta] vg_kvm    ewi-ao---- 128.00m                                                               0
  root             vg_system -wi-ao----  50.00g                                                               0
  swap             vg_system -wi-ao----   3.75g                                                               0
Thin pool chunks are now at 64 KB - with the *same* 128 MB metadata
volume size. Now metadata can only address ~50% of the thin volume space.
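Same back-of-the-envelope math as above (again assuming ~32 bytes of
metadata per mapped chunk, as observed in the 128 KB case):

  500 GiB / 64 KiB chunks = 8,192,000 chunks
  8,192,000 x ~32 bytes   = ~250 MiB of metadata needed vs. 128 MiB allocated

which is where the ~50% figure comes from.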
> But as said - there is no guarantee of the size to fit for any possible
> use case - user is supposed to understand what kind of technology he is
> using,
> and when he 'opt-out' from automatic resize - he needs to deploy his own
> monitoring.
True, but this trivial case should really work without tuning/monitoring.
In short, it should at least fail gracefully in such a simple case...
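To be clear, I understand that keeping monitoring/auto-extension enabled
would mask the problem - something along these lines in lvm.conf (the
values are only illustrative):

activation {
    thin_pool_autoextend_threshold = 80
    thin_pool_autoextend_percent = 20
}

but my point is that the default sizing should not need it in this case.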
>
> Otherwise you would have to simply always create 16G metadata LV if you
> do not want to run out of metadata space.
>
>
Absolutely true. I've written this email to report a bug, indeed ;)
Thank you all for this outstanding work.
--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti at assyoma.it - info at assyoma.it
GPG public key ID: FF5F32A8