[dm-devel] back-ported dm-cache not forwarding read-ahead bios to origin

Thanos Makatos thanos.makatos at onapp.com
Mon Jul 20 16:50:07 UTC 2015


I'm working on a back-ported dm-cache version for kernel 2.6.32-431.29.2
(the CentOS 6 patched one) and I'm trying to solve a corruption bug
apparently introduced during the back-port. I can consistently reproduce it
by simply mounting an ext4 file-system that contains some data and running
stat(1) against a specific directory. stat(1) fails with "Input/output
error" and dmesg says: "EXT4-fs error (device dm-8): ext4_lookup: deleted
inode referenced: 29". The file-system is mounted with options
"data=writeback,ro,nodiscard,inode_readahead_blks=1" in order to minimise
noise.

Since I'm using the pass-through mode, my theory is that dm-cache:
(1) forwards the bio to the wrong device, and/or
(2) forwards the bio to the wrong location on the device (e.g. bio length
and/or offset are wrong), and/or
(3) copies the wrong piece of data from a forwarded bio to the original
bio, assuming it copies data in the first place (I don't know much about
the device mapper at this point).
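Theories (1) and (2) can be partially cross-checked from the btrace output alone: the remap ("A") events recorded on the backing device carry both the target sector and the originating device and sector. A rough Python sketch (an assumption of mine, not anything btrace provides itself; it also assumes the passthrough mapping starts at sector 0 of the origin, so sectors should be preserved verbatim):

```python
import re

# Matches btrace remap ("A") events on the backing device, e.g.:
#   253,5  7  6  27.232331776 28536  A   R 4456448 + 8 <- (253,8) 4456448
REMAP_RE = re.compile(
    r"A\s+[A-Z]+\s+(\d+)\s+\+\s+(\d+)\s+<-\s+\((\d+,\d+)\)\s+(\d+)"
)

def misremapped(origin_trace):
    """Return remap events whose target sector differs from the source
    sector; with an identity passthrough mapping the list should be empty."""
    bad = []
    for m in REMAP_RE.finditer(origin_trace):
        to_sector, nsect, src_dev, from_sector = m.groups()
        if to_sector != from_sector:
            bad.append((src_dev, int(from_sector), int(to_sector), int(nsect)))
    return bad
```

Run over the HDD trace below, this should come back empty, since the one remap event shown (4456448 <- 4456448) preserves the sector.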

I tried to confirm which of the above 3 could be happening by checking
which bio goes where using btrace(8). Specifically, I ran btrace(8) against
the cache target, the HDD, the SSD data and metadata devices (4 traces in
total). I observed that no bios go to the SSD data and metadata devices, so
this rules out (1). I also observed that read-ahead requests issued to the
cache target don't get forwarded to the HDD. I don't know whether or not
this can be a problem in the first place (can read-ahead bios be ignored?),
let alone identifying this being the problem, but I think it's worth it
ensuring that it really doesn't cause any problems.
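This kind of cross-check can be automated with a rough Python sketch (the line format below is my reading of this btrace output, not something blkparse guarantees) that collects the queue ("Q") events on the cache target and on the origin, and reports bios queued on the former but never on the latter:

```python
import re

# Matches btrace event lines such as:
#   253,8    7        9    27.229666137 28536  Q  RA 2832 + 8 [mount]
TRACE_RE = re.compile(
    r"^\s*(\d+,\d+)\s+\d+\s+\d+\s+[\d.]+\s+\d+\s+"
    r"([A-Z]+)\s+([A-Z]+)\s+(\d+)\s+\+\s+(\d+)"
)

def queued_sectors(trace_text):
    """Return {(sector, nsectors): rwbs} for every Q (queue) event."""
    out = {}
    for line in trace_text.splitlines():
        m = TRACE_RE.match(line)
        if m and m.group(2) == "Q":
            _dev, _act, rwbs, sector, nsect = m.groups()
            out[(int(sector), int(nsect))] = rwbs
    return out

def unforwarded(cache_trace, origin_trace):
    """Bios queued on the cache target that never show up on the origin."""
    cache_q = queued_sectors(cache_trace)
    origin_q = queued_sectors(origin_trace)
    return {k: v for k, v in cache_q.items() if k not in origin_q}
```

Fed the two dumps below, this should flag only the "RA 2832 + 8" bio.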

Below are the traces when mounting the file-system:

cache target trace:
253,8    7        3    27.218701098 28536  Q   R 2 + 2 [mount]
253,8    7        4    27.218726465 28536  U   N [mount] 0
253,8    7        5    27.222694538 28536  Q   R 0 + 8 [mount]
253,8    7        6    27.222707270 28536  U   N [mount] 0
253,8    7        7    27.226580397 28536  Q   R 8 + 8 [mount]
253,8    7        8    27.226598088 28536  U   N [mount] 0
253,8    7        9    27.229666137 28536  Q  RA 2832 + 8 [mount]
253,8    7       10    27.229677500 28536  Q  RM 2824 + 8 [mount]
253,8    7       11    27.229679348 28536  U   N [mount] 0
253,8    1        2    27.222630997 28198  C   R 2 + 2 [0]
253,8    1        3    27.226560799 28198  C   R 0 + 8 [0]
253,8    1        4    27.229570827 28198  C   R 8 + 8 [0]
253,8    1        5    27.232313463 28198  C  RM 2824 + 8 [0]
253,8    3        1    27.229683980 28291  C  RA 2832 + 8 [0]
253,8    7       12    27.232360573 28536  Q   R 4456448 + 8 [mount]
253,8    7       13    27.232402040 28536  U   N [mount] 0
253,8    6        1    27.243263044 28204  C   R 4456448 + 8 [0]

HDD trace:
253,5    1        3    27.222584291 28198  C   R 2 + 2 [0]
253,5    7        2    27.218685545 28536  U   N [(null)] 0
253,5    7        3    27.222664774 28536  U   N [(null)] 0
253,5    3        1    27.218694575 28291  Q   R 2 + 2 [dm-cache]
253,5    3        2    27.222670216 28291  Q   R 0 + 8 [dm-cache]
253,5    3        3    27.226566647 28291  Q   R 8 + 8 [dm-cache]
253,5    1        4    27.226516192 28198  C   R 0 + 8 [0]
253,5    1        5    27.229526352 28198  C   R 8 + 8 [0]
253,5    1        6    27.232269516 28198  C  RM 2824 + 8 [0]
253,5    7        4    27.226555641 28536  U   N [(null)] 0
253,5    7        5    27.229636877 28536  U   N [(null)] 0
253,5    7        6    27.232331776 28536  A   R 4456448 + 8 <- (253,8) 4456448
253,5    7        7    27.232332898 28536  Q   R 4456448 + 8 [(null)]
253,5    7        8    27.232359557 28536  U   N [(null)] 0
253,5    3        4    27.229649990 28291  Q  RM 2824 + 8 [dm-cache]
253,5    6        1    27.243215063 28204  C   R 4456448 + 8 [0]

The "RA 2832 + 8" request (7th line in the 1st trace) issued to the cache
target gets completed without ever reaching the HDD. Is this OK? I've
started looking at the code but haven't yet found anything specific to
read-ahead bios.

Regarding my 3rd theory (data getting corrupted by dm-cache after being
read from the HDD), is there some relatively easy way to confirm it? E.g.
could btrace report a checksum of a bio's data when the bio completes?
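As far as I know btrace doesn't checksum bio payloads, but a rough userspace cross-check (assuming the file-system stays mounted read-only and the page cache is dropped first so reads actually hit the device) is to read the same sector range through the cache target and directly from the origin and compare digests; the device paths here are hypothetical:

```python
import hashlib
import os

SECTOR = 512  # btrace sector numbers are in 512-byte units

def region_digest(path, sector, nsectors):
    """MD5 of a sector range read from a block device (or regular file)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, nsectors * SECTOR, sector * SECTOR)
    finally:
        os.close(fd)
    return hashlib.md5(data).hexdigest()

def same_data(cache_dev, origin_dev, sector, nsectors):
    """Compare the same region as seen through the cache target and
    directly on the origin.  Drop the page cache beforehand
    (echo 3 > /proc/sys/vm/drop_caches) so the reads reach the disk."""
    return (region_digest(cache_dev, sector, nsectors) ==
            region_digest(origin_dev, sector, nsectors))
```

E.g. same_data("/dev/mapper/cached", "/dev/sdX", 4456448, 8) would compare the inode-table read from the trace above; both paths are placeholders for the actual devices.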

Is there something else that could be wrong?