<div dir="ltr">Acked-by: Joe Thornber <<a href="mailto:ejt@redhat.com">ejt@redhat.com</a>></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Dec 8, 2022 at 2:07 PM Zhihao Cheng <<a href="mailto:chengzhihao1@huawei.com">chengzhihao1@huawei.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Recently we found a softlock up problem in dm thin pool looking up btree<br>
caused by corrupted metadata:<br>
 Kernel panic - not syncing: softlockup: hung tasks<br>
 CPU: 7 PID: 2669225 Comm: kworker/u16:3<br>
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)<br>
 Workqueue: dm-thin do_worker [dm_thin_pool]<br>
 Call Trace:<br>
   <IRQ><br>
   dump_stack+0x9c/0xd3<br>
   panic+0x35d/0x6b9<br>
   watchdog_timer_fn.cold+0x16/0x25<br>
   __run_hrtimer+0xa2/0x2d0<br>
   </IRQ><br>
   RIP: 0010:__relink_lru+0x102/0x220 [dm_bufio]<br>
   __bufio_new+0x11f/0x4f0 [dm_bufio]<br>
   new_read+0xa3/0x1e0 [dm_bufio]<br>
   dm_bm_read_lock+0x33/0xd0 [dm_persistent_data]<br>
   ro_step+0x63/0x100 [dm_persistent_data]<br>
   btree_lookup_raw.constprop.0+0x44/0x220 [dm_persistent_data]<br>
   dm_btree_lookup+0x16f/0x210 [dm_persistent_data]<br>
   dm_thin_find_block+0x12c/0x210 [dm_thin_pool]<br>
   __process_bio_read_only+0xc5/0x400 [dm_thin_pool]<br>
   process_thin_deferred_bios+0x1a4/0x4a0 [dm_thin_pool]<br>
   process_one_work+0x3c5/0x730<br>
<br>
Following process may generate a broken btree mixed with fresh and stale<br>
nodes, which could let dm thin trapped into an infinite loop while looking<br>
up data block:<br>
 Transaction 1: pmd->root = A, A->B->C   // One path in btree<br>
                pmd->root = X, X->Y->Z   // Copy-up<br>
 Transaction 2: X,Z is updated on disk, Y is written failed.<br>
                // Commit failed, dm thin becomes read-only.<br>
                process_bio_read_only<br>
                 dm_thin_find_block<br>
                  __find_block<br>
                   dm_btree_lookup(pmd->root)<br>
The pmd->root points to a broken btree, Y may contain stale node pointing<br>
to any block, for example X, which lets dm thin trapped into a dead loop<br>
while looking up Z.<br>
<br>
Fix it by setting pmd->root in __open_metadata(), so that dm thin will use<br>
last transaction's pmd->root if commit failed.<br>
<br>
Fetch a reproducer in [Link].<br>
<br>
Linke: <a href="https://bugzilla.kernel.org/show_bug.cgi?id=216790" rel="noreferrer" target="_blank">https://bugzilla.kernel.org/show_bug.cgi?id=216790</a><br>
Fixes: 991d9fa02da0 ("dm: add thin provisioning target")<br>
Signed-off-by: Zhihao Cheng <<a href="mailto:chengzhihao1@huawei.com" target="_blank">chengzhihao1@huawei.com</a>><br>
---<br>
 drivers/md/dm-thin-metadata.c | 9 +++++++++<br>
 1 file changed, 9 insertions(+)<br>
<br>
diff --git a/drivers/md/dm-thin-metadata.c b/drivers/md/dm-thin-metadata.c<br>
index 1a62226ac34e..26c42ee183ed 100644<br>
--- a/drivers/md/dm-thin-metadata.c<br>
+++ b/drivers/md/dm-thin-metadata.c<br>
@@ -724,6 +724,15 @@ static int __open_metadata(struct dm_pool_metadata *pmd)<br>
                goto bad_cleanup_data_sm;<br>
        }<br>
<br>
+       /*<br>
+        * For pool metadata opening process, root setting is redundant<br>
+        * because it will be set again in __begin_transaction(). But dm<br>
+        * pool aborting process really needs to get last transaction's<br>
+        * root in case accessing broken btree.<br>
+        */<br>
+       pmd->root = le64_to_cpu(disk_super->data_mapping_root);<br>
+       pmd->details_root = le64_to_cpu(disk_super->device_details_root);<br>
+<br>
        __setup_btree_details(pmd);<br>
        dm_bm_unlock(sblock);<br>
<br>
-- <br>
2.31.1<br>
<br>
</blockquote></div>