[linux-lvm] cache IO blocking
list at xenhideout.nl
Tue Jun 14 21:50:54 UTC 2016
I am sorry if this sounds repetitive.
I have an SSD + HDD cache combination,
and I cannot rule out that the problem is entirely due to the SSD.
I do test runs of dd if=/dev/zero of=/dev/<vg>/<cached lv>, and the
system can freeze when I do so.
The cache for the specific volume I dd to is very small in relation to
the volume itself.
However, that "vault cache" is not even used (1 block out of 60800) yet.
So I am writing to the combined volume called /dev/linux/vault.
  vault               linux Cwi-aoC--- 435,27g [vault_cache] [vault_corig] 0,00 9,18 0,00
  [vault_cache]       linux Cwi---C---   3,71g                             0,00 9,18 0,00
  [vault_cache_cdata] linux Cwi-ao----   3,71g
  [vault_cache_cmeta] linux ewi-ao----   8,00m
  [vault_corig]       linux owi-aoC--- 435,27g
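As a rough cross-check of the numbers above, the 3,71g size of vault_cache_cdata and the 60800 total cache blocks can be combined to estimate the cache chunk size and how little of the cache is in use. This is only a sketch: lvs rounds the size, so the result is approximate, and the variable names are made up for illustration.

```python
# Rough arithmetic only: lvs rounds sizes, so the result is approximate.
GIB = 1024 ** 3
KIB = 1024

cache_data_bytes = 3.71 * GIB   # vault_cache_cdata size as printed by lvs
total_blocks = 60800            # total cache blocks quoted in the post
used_blocks = 1                 # blocks currently in use, as quoted

chunk_kib = cache_data_bytes / total_blocks / KIB
used_fraction = used_blocks / total_blocks

print(f"approx. chunk size: {chunk_kib:.1f} KiB")
print(f"cache blocks in use: {used_fraction:.4%}")
```

The estimate comes out at roughly 64 KiB per cache block, with well under a hundredth of a percent of the blocks occupied.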
When I put a little load on the system (such as a media library rescan),
processes can block for more than 2 minutes, to the point that a TTY
prints messages such as "Process <X> has been blocking for more than
120 seconds".
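To count how often this happens across test runs, the kernel log could be scanned for these warnings. The sketch below assumes the usual hung-task wording, "INFO: task NAME:PID blocked for more than N seconds." (the message quoted above is a paraphrase); the function name and the sample line are made up for illustration.

```python
import re

# Assumed wording of the kernel hung-task warning; adjust if your kernel differs.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

def find_hung_tasks(log_text):
    """Return (task name, pid, seconds) for each hung-task warning in log_text."""
    hits = []
    for line in log_text.splitlines():
        m = HUNG_TASK_RE.search(line)
        if m:
            hits.append((m.group("name"), int(m.group("pid")), int(m.group("secs"))))
    return hits

# Made-up sample line for illustration, not taken from the affected system.
sample = "[ 1234.567890] INFO: task kworker/u8:2:4321 blocked for more than 120 seconds."
print(find_hung_tasks(sample))
```

Feeding it saved dmesg output from a run with the cache and a run without would give a simple count to compare.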
It doesn't happen all the time. It did happen during the first 2 test
runs. Without the cache on "vault", it hasn't happened yet.
"root" is also cached:
  root               linux Cwi-aoC--- 20,00g [root_cache] [root_corig] 64,74 11,95 0,00
  [root_cache]       linux Cwi---C---  7,42g                           64,74 11,95 0,00
  [root_cache_cdata] linux Cwi-ao----  7,42g
  [root_cache_cmeta] linux ewi-ao---- 12,00m
  [root_corig]       linux owi-aoC--- 20,00g
So basically I can get _huge IO blocking_: top shows the CPU waiting for
IO (io wait is near 100%), and the entire system freezes for basically
all harddisk IO to the affected drives. The cache involved is barely
utilized (as I said, 1/60800 currently), yet writing to it causes the
other volume (in this case "root") to block on IO.
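The io wait figure that top reports can also be logged directly from /proc/stat, which makes it easier to line up with the dd runs. The following is a minimal sketch, assuming a Linux /proc/stat whose aggregate "cpu" line has iowait as the fifth counter; the function names and the two sample lines are made up for illustration.

```python
import time

def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into (iowait, total) jiffies."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice
    iowait = values[4]
    total = sum(values[:8])  # guest time is already counted in user/nice
    return iowait, total

def iowait_percent(before, after):
    """Share of elapsed CPU time spent in iowait between two samples."""
    d_iowait = after[0] - before[0]
    d_total = after[1] - before[1]
    return 100.0 * d_iowait / d_total if d_total else 0.0

def sample_iowait(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return iowait in percent."""
    def read():
        with open(path) as f:
            return parse_cpu_line(f.readline())
    before = read()
    time.sleep(interval)
    return iowait_percent(before, read())

# Made-up counters for illustration; on a live system call sample_iowait() instead.
before = parse_cpu_line("cpu 100 0 50 800 50 0 0 0 0 0")
after = parse_cpu_line("cpu 110 0 55 805 1030 0 0 0 0 0")
print(f"iowait: {iowait_percent(before, after):.1f}%")
```

Calling sample_iowait() in a loop while dd writes to "vault" would show whether the stalls coincide with io wait climbing toward 100%.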
So "vault_cache" and "root_cache" are both on the SSD, and "vault_corig"
and "root_corig" are both on the HDD. Writing to "vault" using dd can
cause "root" to stop responding, in the sense of incurring huge IO wait.
This is irrespective of cache mode (writethrough/writeback) and cache
policy (smq vs mq). And I wonder if this is just related to the SSD, or
whether I will keep seeing this behaviour when I replace it.
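To compare behaviour before and after replacing the SSD, one option would be a small probe that times synced writes on "root" while dd runs against "vault". This is only a sketch under assumptions: the function name, sample count, and 4 KiB payload are arbitrary choices, and the demo writes to a temporary file rather than to a file on the affected filesystem.

```python
import os
import tempfile
import time

def probe_write_latency(path, samples=5, interval=0.0, payload=b"x" * 4096):
    """Write and fsync a small block repeatedly; return per-write latency in seconds."""
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for _ in range(samples):
            start = time.monotonic()
            os.pwrite(fd, payload, 0)   # overwrite the same 4 KiB block
            os.fsync(fd)                # force it out to the underlying device
            latencies.append(time.monotonic() - start)
            time.sleep(interval)
    finally:
        os.close(fd)
    return latencies

# Demo against a temporary file; to test the setup described in the post, point
# `path` at a file on the filesystem backed by "root" while dd writes to "vault".
with tempfile.NamedTemporaryFile() as tmp:
    lat = probe_write_latency(tmp.name, samples=3)
    print(f"max latency: {max(lat) * 1000:.2f} ms")
```

A maximum latency in the range of seconds or minutes on "root" during the dd run, but not otherwise, would confirm the cross-volume blocking independently of the hung-task messages.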