[dm-devel] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178

Eric Wheeler dm-devel at lists.ewheeler.net
Fri Dec 27 01:47:04 UTC 2019


On Fri, 20 Dec 2019, Eric Wheeler wrote:
> On Wed, 25 Sep 2019, Eric Wheeler wrote:
> > We are using the 4.19.75 stable tree with dm-thin and multi-queue scsi.  
> > We have been using the 4.19 branch for months without issue; we just 
> > switched to MQ and we seem to have hit this BUG_ON.  Whether or not MQ is 
> > related to the issue, I don't know, maybe coincidence:
> > 
> > 	static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
> > 	{
> > 	    int r;
> > 	    enum allocation_event ev;
> > 	    struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
> > 
> > 	    /* FIXME: we should loop round a couple of times */
> > 	    r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
> > 	    if (r)
> > 		return r;
> > 
> > 	    smd->begin = *b + 1;
> > 	    r = sm_ll_inc(&smd->ll, *b, &ev);
> > 	    if (!r) {
> > 		BUG_ON(ev != SM_ALLOC); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> > 		smd->nr_allocated_this_transaction++;
> > 	    }
> 
> Hello all,
> 
> We hit this BUG_ON again, this time with 4.19.86 with 
> scsi_mod.use_blk_mq=y, and it is known to be present as of 5.1.2 as 
> additionally reported by Markus Schade:
> 
>   https://www.redhat.com/archives/dm-devel/2019-June/msg00116.html
>      and
>   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777398
> 
> In our case, the most latest trace (below) is from a different system that
> has been stable for years on Linux 4.1 with tmeta direct on the SSD.
> We updated to 4.19.86 a few weeks ago and just hit this, what Mike
> Snitzer explains to be an allocator race:
> 
> On Wed, 25 Sep 2019, Mike Snitzer wrote:
> > > [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > [199391.693852]  ? sort+0x17b/0x270
> > > [199391.694527]  ? u32_swap+0x10/0x10
> > > [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > [199391.695890]  process_one_work+0x171/0x370
> > > [199391.696640]  worker_thread+0x49/0x3f0
> > > [199391.697332]  kthread+0xf8/0x130
> > > [199391.697988]  ? max_active_store+0x80/0x80
> > > [199391.698659]  ? kthread_bind+0x10/0x10
> > > [199391.699281]  ret_from_fork+0x1f/0x40
> > 
> > The stack shows the call to sm_disk_new_block() is due to
> > dm_pool_alloc_data_block().
> > 
> > sm_disk_new_block()'s BUG_ON(ev != SM_ALLOC) indicates that somehow it is
> > getting called without the passed 'ev' being set to SM_ALLOC.  Only
> > drivers/md/persistent-dat/dm-space-map-common.c:sm_ll_mutate() sets
> > SM_ALLOC. sm_disk_new_block() is indirectly calling sm_ll_mutate()
> > 
> > sm_ll_mutate() will only return 0 if ll->save_ie() does, the ll_disk *ll
> > should be ll_disk, and so disk_ll_save_ie()'s call to dm_btree_insert()
> > returns 0 -- which simply means success.  And on success
> > sm_disk_new_block() assumes ev was set to SM_ALLOC (by sm_ll_mutate).
> > 
> > sm_ll_mutate() decided to _not_ set SM_ALLOC because either:
> > 1) ref_count wasn't set
> > or
> > 2) old was identified
> > 
> > So all said: somehow a new data block was found to already be in use.
> > _WHY_ that is the case isn't clear from this stack...
> > 
> > But it does speak to the possibility of data block allocation racing
> > with other operations to the same block.  Which implies missing locking.
> 
> Where would you look to add locking do you think? 
> 
> > But that's all I've got so far... I'll review past dm-thinp changes with
> > all this in mind and see what turns up.  But Joe Thornber (ejt) likely
> > needs to have a look at this too.
> > 
> > But could it be that bcache is the source of the data device race (same
> > block used concurrently)?  And DM thinp is acting as the canary in the
> > coal mine?
> 
> As Marcus has shown, this bug triggers without bcache.
> 
> 
> Other questions:
> 
> 1) Can sm_disk_new_block() fail more gracefully than BUG_ON?  For example:
> 
> +	spin_lock(&lock); /* protect smd->begin */
> 	smd->begin = *b + 1;
> 	r = sm_ll_inc(&smd->ll, *b, &ev);
> 	if (!r) {
> -		BUG_ON(ev != SM_ALLOC); 
> 		smd->nr_allocated_this_transaction++;
> 	}
> +	else {
> +		r = -ENOSPC;
> +		smd->begin = *b - 1;
> +	}
> +	spin_unlock(&lock);

Just hit the bug again without mq-scsi (scsi_mod.use_blk_mq=n), all other 
times MQ has been turned on. 

I'm trying the -ENOSPC hack which will flag the pool as being out of space 
so I can recover more gracefully than a BUG_ON. Here's a first-draft 
patch, maybe the spinlock will even prevent the issue.

Compile tested, I'll try on a real system tomorrow.

Comments welcome:

diff --git a/drivers/md/persistent-data/dm-space-map-disk.c b/drivers/md/persistent-data/dm-space-map-disk.c
index 32adf6b..cb27a20 100644
--- a/drivers/md/persistent-data/dm-space-map-disk.c
+++ b/drivers/md/persistent-data/dm-space-map-disk.c
@@ -161,6 +161,7 @@ static int sm_disk_dec_block(struct dm_space_map *sm, dm_block_t b)
 	return r;
 }
 
+static DEFINE_SPINLOCK(smd_lock);
 static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
 {
 	int r;
@@ -168,17 +169,30 @@ static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
 	struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
 
 	/* FIXME: we should loop round a couple of times */
+	spin_lock(&smd_lock);
 	r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
-	if (r)
+	if (r) {
+		spin_unlock(&smd_lock);
 		return r;
+	}
 
 	smd->begin = *b + 1;
 	r = sm_ll_inc(&smd->ll, *b, &ev);
 	if (!r) {
-		BUG_ON(ev != SM_ALLOC);
-		smd->nr_allocated_this_transaction++;
+		if (ev == SM_ALLOC)
+			smd->nr_allocated_this_transaction++;
+		else {
+			/* Not actually out of space, this is a bug:
+			 * https://lore.kernel.org/linux-block/20190925200138.GA20584@redhat.com/
+			 */
+			WARN(ev != SM_ALLOC, "Pool metadata allocation race, marking pool out-of-space.");
+			r = -ENOSPC;
+			smd->begin = *b - 1;
+		}
 	}
 
+	spin_unlock(&smd_lock);
+
 	return r;
 }
 



--
Eric Wheeler


> 
> The lock might protect smd->begin, but I'm not sure how &smd->ll might 
> have been modified by sm_ll_inc().  However, since ll->save_ie() failed in 
> sm_ll_mutate() then perhaps this is safe.  What do you think?
> 
> Putting the pool into PM_OUT_OF_DATA_SPACE isn't ideal since it isn't out 
> of space, but I would take it over a BUG_ON.
> 
> 2) If example #1 above returned -EAGAIN, how might alloc_data_block be 
>    taught retry?  This bug shows up weeks or months apart, even on heavily 
>    loaded systems with ~100 live thin volumes, so retrying would be fine 
>    IMHO.
> 
> 3) In the thread from June, Marcus says:
> 	"Judging from the call trace, my guess is that there is somewhere 
> 	a race condition, when a new block needs to be allocated which has 
> 	still to be discarded."
> 
> Is this discard sitation possible?  Wouldn't the bio prison prevent this?
> 
> --
> Eric Wheeler
> www.datawall.us
> 
> 
> 
> Here is the new trace, old trace below:
> 
> kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> invalid opcode: 0000 [#1] SMP NOPTI
> CPU: 11 PID: 22939 Comm: kworker/u48:1 Not tainted 4.19.86 #1
> Hardware name: Supermicro SYS-2026T-6RFT+/X8DTU-6+, BIOS 2.1c       11/30/2012
> Workqueue: dm-thin do_worker [dm_thin_pool]
> RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 b0 a5 a9 e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55
> RSP: 0018:ffffc90007237c78 EFLAGS: 00010297
> RAX: 0000000000000000 RBX: ffff88861d7ac000 RCX: 0000000000000000
> RDX: ffff8885f3d13f00 RSI: 0000000000000246 RDI: ffff888620e38c00
> RBP: 0000000000000000 R08: 0000000000000000 R09: ffff888620e38c98
> R10: ffffffff810f9177 R11: 0000000000000000 R12: ffffc90007237d48
> R13: ffffc90007237d48 R14: 00000000ffffffc3 R15: 00000000000d9e3a
> FS:  0000000000000000(0000) GS:ffff8886279c0000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000081ceff0 CR3: 000000000200a004 CR4: 00000000000226e0
> Call Trace:
>  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
>  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
>  process_cell+0x2a3/0x550 [dm_thin_pool]
>  ? mempool_alloc+0x6f/0x180
>  ? sort+0x17b/0x270
>  ? u32_swap+0x10/0x10
>  process_deferred_bios+0x1af/0x870 [dm_thin_pool]
>  do_worker+0x94/0xe0 [dm_thin_pool]
>  process_one_work+0x171/0x370
>  worker_thread+0x49/0x3f0
>  kthread+0xf8/0x130
>  ? max_active_store+0x80/0x80
>  ? kthread_bind+0x10/0x10
>  ret_from_fork+0x1f/0x40
> 
> > 
> > [199391.677689] ------------[ cut here ]------------
> > [199391.678437] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> > [199391.679183] invalid opcode: 0000 [#1] SMP NOPTI
> > [199391.679941] CPU: 4 PID: 31359 Comm: kworker/u16:4 Not tainted 4.19.75 #1
> > [199391.680683] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > [199391.681446] Workqueue: dm-thin do_worker [dm_thin_pool]
> > [199391.682187] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199391.682929] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199391.684432] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199391.685186] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199391.685936] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199391.686659] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199391.687379] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199391.688120] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199391.688843] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > [199391.689571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [199391.690253] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > [199391.690984] Call Trace:
> > [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > [199391.693852]  ? sort+0x17b/0x270
> > [199391.694527]  ? u32_swap+0x10/0x10
> > [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > [199391.695890]  process_one_work+0x171/0x370
> > [199391.696640]  worker_thread+0x49/0x3f0
> > [199391.697332]  kthread+0xf8/0x130
> > [199391.697988]  ? max_active_store+0x80/0x80
> > [199391.698659]  ? kthread_bind+0x10/0x10
> > [199391.699281]  ret_from_fork+0x1f/0x40
> > [199391.699930] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > [199391.705631]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > [199391.708083] ---[ end trace c31536d98046e8ec ]---
> > [199391.866776] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199391.867960] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199391.870379] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199391.871524] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199391.872364] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199391.873173] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199391.873871] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199391.874550] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199391.875231] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > [199391.875941] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [199391.876633] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > [199391.877317] Kernel panic - not syncing: Fatal exception
> > [199391.878006] Kernel Offset: disabled
> > [199392.032304] ---[ end Kernel panic - not syncing: Fatal exception ]---
> > [199392.032962] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81067af4 (native_write_msr+0x4/0x20)
> > [199392.034277] Call Trace:
> > [199392.034929]  <IRQ>
> > [199392.035576]  native_apic_msr_write+0x2e/0x40
> > [199392.036228]  arch_irq_work_raise+0x28/0x40
> > [199392.036877]  irq_work_queue_on+0x83/0xa0
> > [199392.037518]  irq_work_run_list+0x4c/0x70
> > [199392.038149]  irq_work_run+0x14/0x40
> > [199392.038771]  smp_call_function_single_interrupt+0x3a/0xd0
> > [199392.039393]  call_function_single_interrupt+0xf/0x20
> > [199392.040011]  </IRQ>
> > [199392.040624] RIP: 0010:panic+0x209/0x25c
> > [199392.041234] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > [199392.042518] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> > [199392.043174] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > [199392.043833] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > [199392.044493] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > [199392.045155] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > [199392.045820] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > [199392.046486]  oops_end+0xc1/0xd0
> > [199392.047149]  do_trap+0x13d/0x150
> > [199392.047795]  do_error_trap+0xd5/0x130
> > [199392.048427]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.049048]  invalid_op+0x14/0x20
> > [199392.049650] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.050245] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199392.051434] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199392.052010] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199392.052580] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199392.053150] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199392.053715] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199392.054266] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199392.054807]  ? __wake_up_common_lock+0x87/0xc0
> > [199392.055342]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > [199392.055877]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > [199392.056410]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > [199392.056947]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > [199392.057484]  ? sort+0x17b/0x270
> > [199392.058016]  ? u32_swap+0x10/0x10
> > [199392.058538]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > [199392.059060]  process_one_work+0x171/0x370
> > [199392.059576]  worker_thread+0x49/0x3f0
> > [199392.060083]  kthread+0xf8/0x130
> > [199392.060587]  ? max_active_store+0x80/0x80
> > [199392.061086]  ? kthread_bind+0x10/0x10
> > [199392.061569]  ret_from_fork+0x1f/0x40
> > [199392.062038] ------------[ cut here ]------------
> > [199392.062508] sched: Unexpected reschedule of offline CPU#1!
> > [199392.062989] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> > [199392.063485] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > [199392.067463]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > [199392.068729] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D           4.19.75 #1
> > [199392.069356] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > [199392.069982] Workqueue: dm-thin do_worker [dm_thin_pool]
> > [199392.070604] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> > [199392.071229] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> > [199392.072528] RSP: 0018:ffff88880fb03dc0 EFLAGS: 00010082
> > [199392.073186] RAX: 0000000000000000 RBX: ffff88880fa625c0 RCX: 0000000000000006
> > [199392.073847] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff88880fb168b0
> > [199392.074512] RBP: ffff88880fa625c0 R08: 0000000000000001 R09: 00000000000174e5
> > [199392.075182] R10: ffff8881f2512300 R11: 0000000000000000 R12: ffff888804541640
> > [199392.075849] R13: ffff88880fb03e08 R14: 0000000000000000 R15: 0000000000000001
> > [199392.076512] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > [199392.077179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [199392.077833] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > [199392.078481] Call Trace:
> > [199392.079117]  <IRQ>
> > [199392.079745]  check_preempt_curr+0x6b/0x90
> > [199392.080373]  ttwu_do_wakeup+0x19/0x130
> > [199392.080999]  try_to_wake_up+0x1e2/0x460
> > [199392.081623]  __wake_up_common+0x8f/0x160
> > [199392.082246]  ep_poll_callback+0x1af/0x300
> > [199392.082860]  __wake_up_common+0x8f/0x160
> > [199392.083470]  __wake_up_common_lock+0x7a/0xc0
> > [199392.084074]  irq_work_run_list+0x4c/0x70
> > [199392.084675]  smp_call_function_single_interrupt+0x3a/0xd0
> > [199392.085277]  call_function_single_interrupt+0xf/0x20
> > [199392.085879]  </IRQ>
> > [199392.086477] RIP: 0010:panic+0x209/0x25c
> > [199392.087079] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > [199392.088341] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> > [199392.088988] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > [199392.089639] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > [199392.090291] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > [199392.090947] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > [199392.091601] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > [199392.092255]  oops_end+0xc1/0xd0
> > [199392.092894]  do_trap+0x13d/0x150
> > [199392.093516]  do_error_trap+0xd5/0x130
> > [199392.094122]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.094718]  invalid_op+0x14/0x20
> > [199392.095296] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.095866] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199392.097006] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199392.097558] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199392.098103] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199392.098644] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199392.099176] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199392.099703] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199392.100229]  ? __wake_up_common_lock+0x87/0xc0
> > [199392.100744]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > [199392.101251]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > [199392.101752]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > [199392.102252]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > [199392.102750]  ? sort+0x17b/0x270
> > [199392.103242]  ? u32_swap+0x10/0x10
> > [199392.103733]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > [199392.104228]  process_one_work+0x171/0x370
> > [199392.104714]  worker_thread+0x49/0x3f0
> > [199392.105193]  kthread+0xf8/0x130
> > [199392.105665]  ? max_active_store+0x80/0x80
> > [199392.106132]  ? kthread_bind+0x10/0x10
> > [199392.106601]  ret_from_fork+0x1f/0x40
> > [199392.107069] ---[ end trace c31536d98046e8ed ]---
> > [199392.107544] ------------[ cut here ]------------
> > [199392.108017] sched: Unexpected reschedule of offline CPU#7!
> > [199392.108497] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> > [199392.108996] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > [199392.112967]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > [199392.114203] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D W         4.19.75 #1
> > [199392.114819] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > [199392.115439] Workqueue: dm-thin do_worker [dm_thin_pool]
> > [199392.116061] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> > [199392.116683] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> > [199392.117982] RSP: 0018:ffff88880fb03ee0 EFLAGS: 00010082
> > [199392.118632] RAX: 0000000000000000 RBX: ffff8887d093ac80 RCX: 0000000000000006
> > [199392.119295] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88880fb168b0
> > [199392.119961] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000017529
> > [199392.120623] R10: ffff8881438ba800 R11: 0000000000000000 R12: ffffc9000a147988
> > [199392.121283] R13: ffffffff8113f9a0 R14: 0000000000000002 R15: ffff88880fb1cff8
> > [199392.121938] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > [199392.122589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [199392.123229] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > [199392.123867] Call Trace:
> > [199392.124495]  <IRQ>
> > [199392.125116]  update_process_times+0x40/0x50
> > [199392.125742]  tick_sched_handle+0x25/0x60
> > [199392.126367]  tick_sched_timer+0x37/0x70
> > [199392.126987]  __hrtimer_run_queues+0xfb/0x270
> > [199392.127601]  hrtimer_interrupt+0x122/0x270
> > [199392.128211]  smp_apic_timer_interrupt+0x6a/0x140
> > [199392.128819]  apic_timer_interrupt+0xf/0x20
> > [199392.129423]  </IRQ>
> > [199392.130020] RIP: 0010:panic+0x209/0x25c
> > [199392.130619] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > [199392.131883] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> > [199392.132530] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > [199392.133180] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > [199392.133830] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > [199392.134482] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > [199392.135136] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > [199392.135792]  oops_end+0xc1/0xd0
> > [199392.136444]  do_trap+0x13d/0x150
> > [199392.137094]  do_error_trap+0xd5/0x130
> > [199392.137740]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.138383]  invalid_op+0x14/0x20
> > [199392.139010] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > [199392.139630] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > [199392.140870] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > [199392.141472] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > [199392.142066] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > [199392.142647] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > [199392.143211] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > [199392.143759] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > [199392.144301]  ? __wake_up_common_lock+0x87/0xc0
> > [199392.144838]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > [199392.145374]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > [199392.145909]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > [199392.146435]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > [199392.146949]  ? sort+0x17b/0x270
> > [199392.147450]  ? u32_swap+0x10/0x10
> > [199392.147944]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > [199392.148441]  process_one_work+0x171/0x370
> > [199392.148937]  worker_thread+0x49/0x3f0
> > [199392.149430]  kthread+0xf8/0x130
> > [199392.149922]  ? max_active_store+0x80/0x80
> > [199392.150406]  ? kthread_bind+0x10/0x10
> > [199392.150883]  ret_from_fork+0x1f/0x40
> > [199392.151353] ---[ end trace c31536d98046e8ee ]---
> > 
> > 
> > --
> > Eric Wheeler
> > 
> > --
> > dm-devel mailing list
> > dm-devel at redhat.com
> > https://www.redhat.com/mailman/listinfo/dm-devel
> > 
> 
> 
> --
> dm-devel mailing list
> dm-devel at redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
> 
> 





More information about the dm-devel mailing list