[linux-lvm] cache IO blocking

Xen list at xenhideout.nl
Tue Jun 14 21:50:54 UTC 2016

I am sorry if this sounds repetitive,

I have an SSD + HDD cache combination.

And I am not sure whether the problem is entirely down to the SSD.

I do test runs of dd if=/dev/zero of=/dev/<vg>/<cached lv>, and the 
system can freeze when I do so.
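
For anyone wanting to repeat the test without filling the whole volume, a bounded variant might look like the sketch below. The block size, the count and conv=fsync are assumptions on my part (the original run used plain dd without them), and write_test is just an illustrative name. Note that it overwrites whatever is on the target.

```shell
#!/bin/sh
# Sketch: bounded sequential write test. WARNING: overwrites the target.
# Usage: write_test TARGET COUNT_MIB
write_test() {
    target=$1
    count_mib=$2
    # conv=fsync makes GNU dd flush the data to the device before it exits,
    # so the elapsed time includes the actual writeback.
    dd if=/dev/zero of="$target" bs=1M count="$count_mib" conv=fsync
}

# Example (hypothetical size; destroys data on the LV):
# write_test /dev/linux/vault 1024
```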

The cache for the specific volume I dd to is very small in relation to 
the volume itself.

However, that "vault cache" is not even used (1 block out of 60800) yet.

So I am writing to the combined volume called /dev/linux/vault.

   vault               linux Cwi-aoC--- 435,27g [vault_cache] [vault_corig] 0,00   9,18            0,00
   [vault_cache]       linux Cwi---C---   3,71g                             0,00   9,18            0,00
   [vault_cache_cdata] linux Cwi-ao----   3,71g
   [vault_cache_cmeta] linux ewi-ao----   8,00m
   [vault_corig]       linux owi-aoC--- 435,27g
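
The "1 block out of 60800" figure can also be read straight from the device-mapper status line. A minimal sketch, assuming the dm-cache status layout described in the kernel documentation (field 7 is used/total cache blocks) and assuming the device is named linux-vault under the usual VG-LV mapping; cache_usage is a made-up helper name:

```shell
#!/bin/sh
# cache_usage: read `dmsetup status <device>` output for a dm-cache target
# on stdin and print "<used>/<total> cache blocks (<pct>%)".
cache_usage() {
    awk '$3 == "cache" {
        split($7, blk, "/")          # field 7: used/total cache blocks
        printf "%d/%d cache blocks (%.2f%%)\n", blk[1], blk[2], 100 * blk[1] / blk[2]
    }'
}

# Usage (needs root):
# dmsetup status linux-vault | cache_usage
```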

When I put a little load on the system (such as a media library rescan),
processes can block for more than 2 minutes.

A TTY will then output messages such as "Process <X> has been
blocking for more than 120 seconds".
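
Those messages come from the kernel's hung-task detector, and they also end up in the kernel log. A small sketch for pulling them out afterwards, assuming the usual wording "blocked for more than N seconds" (hung_tasks is an illustrative name):

```shell
#!/bin/sh
# hung_tasks: filter kernel log text on stdin for hung-task warnings.
hung_tasks() {
    grep -E 'blocked for more than [0-9]+ seconds'
}

# Usage:
# dmesg | hung_tasks
# cat /proc/sys/kernel/hung_task_timeout_secs   # the threshold, 120 by default
```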

It doesn't happen all the time. It did happen during the first 2 test
runs. Without the cache, it hasn't happened yet.

I mean without the cache to "vault". "root" is also cached using the 

   root                linux Cwi-aoC---  20,00g [root_cache]  [root_corig]  64,74  11,95           0,00
   [root_cache]        linux Cwi---C---   7,42g                             64,74  11,95           0,00
   [root_cache_cdata]  linux Cwi-ao----   7,42g
   [root_cache_cmeta]  linux ewi-ao----  12,00m
   [root_corig]        linux owi-aoC---  20,00g

So basically I can get _huge IO blocking_: top shows the CPU waiting for
IO (iowait near 100%) and the entire system freezes for basically all
hard disk IO to the affected drives. This happens with a cache that is
hardly being used (as I said, 1/60800 blocks currently), yet writing to
the cached volume causes the other volume (in this case "root") to block
on IO.
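
To put a number on the iowait rather than eyeballing top, something like the following sketch could be used. It assumes the standard /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal after the "cpu" label); iowait_pct and the 5 second interval are arbitrary choices:

```shell
#!/bin/sh
# iowait_pct: read two aggregate "cpu" lines from /proc/stat (older first)
# on stdin and print the percentage of time spent in iowait between them.
iowait_pct() {
    awk '$1 == "cpu" {
        total = 0
        for (i = 2; i <= NF && i <= 9; i++) total += $i   # user..steal
        if (seen) {
            dt = total - prev_total
            if (dt > 0) printf "%.1f\n", 100 * ($6 - prev_iowait) / dt
        }
        prev_total = total; prev_iowait = $6; seen = 1
    }'
}

# Usage: sample twice, five seconds apart, while dd is running.
# { grep '^cpu ' /proc/stat; sleep 5; grep '^cpu ' /proc/stat; } | iowait_pct
```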

So "vault_cache" and "root_cache" are both on the SSD, and "vault_corig" 
and "root_corig" are both on the HDD. Writing to "vault" using dd can 
cause "root" to stop responding, in the sense of incurring huge IO 

This is irrespective of cache mode (writethrough/writeback) and cache 
policy (smq vs mq). And I wonder if this is just related to the SSD, or 
whether I will keep seeing this behaviour when I replace it.

