[dm-devel] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178

Eric Wheeler dm-devel at lists.ewheeler.net
Sat Dec 28 02:13:07 UTC 2019


On Fri, 27 Dec 2019, Eric Wheeler wrote:
> On Fri, 20 Dec 2019, Eric Wheeler wrote:
> > On Wed, 25 Sep 2019, Eric Wheeler wrote:
> > > We are using the 4.19.75 stable tree with dm-thin and multi-queue scsi.  
> > > We have been using the 4.19 branch for months without issue; we just 
> > > switched to MQ and we seem to have hit this BUG_ON.  Whether or not MQ is 
> > > related to the issue, I don't know, maybe coincidence:
> > > 
> > > 	static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
> > > 	{
> > > 	    int r;
> > > 	    enum allocation_event ev;
> > > 	    struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
> > > 
> > > 	    /* FIXME: we should loop round a couple of times */
> > > 	    r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
> > > 	    if (r)
> > > 		return r;
> > > 
> > > 	    smd->begin = *b + 1;
> > > 	    r = sm_ll_inc(&smd->ll, *b, &ev);
> > > 	    if (!r) {
> > > 		BUG_ON(ev != SM_ALLOC); <<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> > > 		smd->nr_allocated_this_transaction++;
> > > 	    }
> > 
> > Hello all,
> > 
> > We hit this BUG_ON again, this time with 4.19.86 with 
> > scsi_mod.use_blk_mq=y, and it is known to be present as of 5.1.2 as 
> > additionally reported by Markus Schade:
> > 
> >   https://www.redhat.com/archives/dm-devel/2019-June/msg00116.html
> >      and
> >   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1777398
> > 
> > In our case, the most latest trace (below) is from a different system that
> > has been stable for years on Linux 4.1 with tmeta direct on the SSD.
> > We updated to 4.19.86 a few weeks ago and just hit this, what Mike
> > Snitzer explains to be an allocator race:
> > 
> > On Wed, 25 Sep 2019, Mike Snitzer wrote:
> > > > [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > > [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > > [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > > [199391.693852]  ? sort+0x17b/0x270
> > > > [199391.694527]  ? u32_swap+0x10/0x10
> > > > [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > > [199391.695890]  process_one_work+0x171/0x370
> > > > [199391.696640]  worker_thread+0x49/0x3f0
> > > > [199391.697332]  kthread+0xf8/0x130
> > > > [199391.697988]  ? max_active_store+0x80/0x80
> > > > [199391.698659]  ? kthread_bind+0x10/0x10
> > > > [199391.699281]  ret_from_fork+0x1f/0x40
> > > 
> > > The stack shows the call to sm_disk_new_block() is due to
> > > dm_pool_alloc_data_block().
> > > 
> > > sm_disk_new_block()'s BUG_ON(ev != SM_ALLOC) indicates that somehow it is
> > > getting called without the passed 'ev' being set to SM_ALLOC.  Only
> > > drivers/md/persistent-dat/dm-space-map-common.c:sm_ll_mutate() sets
> > > SM_ALLOC. sm_disk_new_block() is indirectly calling sm_ll_mutate()
> > > 
> > > sm_ll_mutate() will only return 0 if ll->save_ie() does, the ll_disk *ll
> > > should be ll_disk, and so disk_ll_save_ie()'s call to dm_btree_insert()
> > > returns 0 -- which simply means success.  And on success
> > > sm_disk_new_block() assumes ev was set to SM_ALLOC (by sm_ll_mutate).
> > > 
> > > sm_ll_mutate() decided to _not_ set SM_ALLOC because either:
> > > 1) ref_count wasn't set
> > > or
> > > 2) old was identified
> > > 
> > > So all said: somehow a new data block was found to already be in use.
> > > _WHY_ that is the case isn't clear from this stack...
> > > 
> > > But it does speak to the possibility of data block allocation racing
> > > with other operations to the same block.  Which implies missing locking.
> > 
> > Where would you look to add locking do you think? 
> > 
> > > But that's all I've got so far... I'll review past dm-thinp changes with
> > > all this in mind and see what turns up.  But Joe Thornber (ejt) likely
> > > needs to have a look at this too.
> > > 
> > > But could it be that bcache is the source of the data device race (same
> > > block used concurrently)?  And DM thinp is acting as the canary in the
> > > coal mine?
> > 
> > As Marcus has shown, this bug triggers without bcache.
> > 
> > 
> > Other questions:
> > 
> > 1) Can sm_disk_new_block() fail more gracefully than BUG_ON?  For example:
> > 
> > +	spin_lock(&lock); /* protect smd->begin */
> > 	smd->begin = *b + 1;
> > 	r = sm_ll_inc(&smd->ll, *b, &ev);
> > 	if (!r) {
> > -		BUG_ON(ev != SM_ALLOC); 
> > 		smd->nr_allocated_this_transaction++;
> > 	}
> > +	else {
> > +		r = -ENOSPC;
> > +		smd->begin = *b - 1;
> > +	}
> > +	spin_unlock(&lock);
> 
> Just hit the bug again without mq-scsi (scsi_mod.use_blk_mq=n), all other 
> times MQ has been turned on. 
> 
> I'm trying the -ENOSPC hack which will flag the pool as being out of space 
> so I can recover more gracefully than a BUG_ON. Here's a first-draft 
> patch, maybe the spinlock will even prevent the issue.
> 
> Compile tested, I'll try on a real system tomorrow.
> 
> Comments welcome:
> 
> diff --git a/drivers/md/persistent-data/dm-space-map-disk.c b/drivers/md/persistent-data/dm-space-map-disk.c
> index 32adf6b..cb27a20 100644
> --- a/drivers/md/persistent-data/dm-space-map-disk.c
> +++ b/drivers/md/persistent-data/dm-space-map-disk.c
> @@ -161,6 +161,7 @@ static int sm_disk_dec_block(struct dm_space_map *sm, dm_block_t b)
>  	return r;
>  }
>  
> +static DEFINE_SPINLOCK(smd_lock);
>  static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
>  {
>  	int r;
> @@ -168,17 +169,30 @@ static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
>  	struct sm_disk *smd = container_of(sm, struct sm_disk, sm);
>  
>  	/* FIXME: we should loop round a couple of times */
> +	spin_lock(&smd_lock);
>  	r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b);
> -	if (r)
> +	if (r) {
> +		spin_unlock(&smd_lock);
>  		return r;
> +	}
>  
>  	smd->begin = *b + 1;
>  	r = sm_ll_inc(&smd->ll, *b, &ev);
>  	if (!r) {
> -		BUG_ON(ev != SM_ALLOC);
> -		smd->nr_allocated_this_transaction++;
> +		if (ev == SM_ALLOC)
> +			smd->nr_allocated_this_transaction++;
> +		else {
> +			/* Not actually out of space, this is a bug:
> +			 * https://lore.kernel.org/linux-block/20190925200138.GA20584@redhat.com/
> +			 */
> +			WARN(ev != SM_ALLOC, "Pool metadata allocation race, marking pool out-of-space.");
> +			r = -ENOSPC;
> +			smd->begin = *b - 1;
> +		}
>  	}
>  
> +	spin_unlock(&smd_lock);
> +
>  	return r;
>  }

So far, so good.  There are 3 systems running with the patch (1 w/ MQ=y, 
 2 w/ MQ=n), time will tell if we hit the WARN or if the spinlock prevents 
the race.

--
Eric Wheeler


> 
> 
> 
> --
> Eric Wheeler
> 
> 
> > 
> > The lock might protect smd->begin, but I'm not sure how &smd->ll might 
> > have been modified by sm_ll_inc().  However, since ll->save_ie() failed in 
> > sm_ll_mutate() then perhaps this is safe.  What do you think?
> > 
> > Putting the pool into PM_OUT_OF_DATA_SPACE isn't ideal since it isn't out 
> > of space, but I would take it over a BUG_ON.
> > 
> > 2) If example #1 above returned -EAGAIN, how might alloc_data_block be 
> >    taught retry?  This bug shows up weeks or months apart, even on heavily 
> >    loaded systems with ~100 live thin volumes, so retrying would be fine 
> >    IMHO.
> > 
> > 3) In the thread from June, Marcus says:
> > 	"Judging from the call trace, my guess is that there is somewhere 
> > 	a race condition, when a new block needs to be allocated which has 
> > 	still to be discarded."
> > 
> > Is this discard sitation possible?  Wouldn't the bio prison prevent this?
> > 
> > --
> > Eric Wheeler
> > www.datawall.us
> > 
> > 
> > 
> > Here is the new trace, old trace below:
> > 
> > kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> > invalid opcode: 0000 [#1] SMP NOPTI
> > CPU: 11 PID: 22939 Comm: kworker/u48:1 Not tainted 4.19.86 #1
> > Hardware name: Supermicro SYS-2026T-6RFT+/X8DTU-6+, BIOS 2.1c       11/30/2012
> > Workqueue: dm-thin do_worker [dm_thin_pool]
> > RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 b0 a5 a9 e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55
> > RSP: 0018:ffffc90007237c78 EFLAGS: 00010297
> > RAX: 0000000000000000 RBX: ffff88861d7ac000 RCX: 0000000000000000
> > RDX: ffff8885f3d13f00 RSI: 0000000000000246 RDI: ffff888620e38c00
> > RBP: 0000000000000000 R08: 0000000000000000 R09: ffff888620e38c98
> > R10: ffffffff810f9177 R11: 0000000000000000 R12: ffffc90007237d48
> > R13: ffffc90007237d48 R14: 00000000ffffffc3 R15: 00000000000d9e3a
> > FS:  0000000000000000(0000) GS:ffff8886279c0000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00000000081ceff0 CR3: 000000000200a004 CR4: 00000000000226e0
> > Call Trace:
> >  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> >  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> >  process_cell+0x2a3/0x550 [dm_thin_pool]
> >  ? mempool_alloc+0x6f/0x180
> >  ? sort+0x17b/0x270
> >  ? u32_swap+0x10/0x10
> >  process_deferred_bios+0x1af/0x870 [dm_thin_pool]
> >  do_worker+0x94/0xe0 [dm_thin_pool]
> >  process_one_work+0x171/0x370
> >  worker_thread+0x49/0x3f0
> >  kthread+0xf8/0x130
> >  ? max_active_store+0x80/0x80
> >  ? kthread_bind+0x10/0x10
> >  ret_from_fork+0x1f/0x40
> > 
> > > 
> > > [199391.677689] ------------[ cut here ]------------
> > > [199391.678437] kernel BUG at drivers/md/persistent-data/dm-space-map-disk.c:178!
> > > [199391.679183] invalid opcode: 0000 [#1] SMP NOPTI
> > > [199391.679941] CPU: 4 PID: 31359 Comm: kworker/u16:4 Not tainted 4.19.75 #1
> > > [199391.680683] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > > [199391.681446] Workqueue: dm-thin do_worker [dm_thin_pool]
> > > [199391.682187] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199391.682929] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > > [199391.684432] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > > [199391.685186] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > > [199391.685936] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > > [199391.686659] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > > [199391.687379] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > > [199391.688120] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > > [199391.688843] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > > [199391.689571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [199391.690253] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > > [199391.690984] Call Trace:
> > > [199391.691714]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > [199391.692411]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > [199391.693142]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > [199391.693852]  ? sort+0x17b/0x270
> > > [199391.694527]  ? u32_swap+0x10/0x10
> > > [199391.695192]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > [199391.695890]  process_one_work+0x171/0x370
> > > [199391.696640]  worker_thread+0x49/0x3f0
> > > [199391.697332]  kthread+0xf8/0x130
> > > [199391.697988]  ? max_active_store+0x80/0x80
> > > [199391.698659]  ? kthread_bind+0x10/0x10
> > > [199391.699281]  ret_from_fork+0x1f/0x40
> > > [199391.699930] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > > [199391.705631]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > > [199391.708083] ---[ end trace c31536d98046e8ec ]---
> > > [199391.866776] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199391.867960] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > > [199391.870379] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > > [199391.871524] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > > [199391.872364] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > > [199391.873173] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > > [199391.873871] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > > [199391.874550] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > > [199391.875231] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > > [199391.875941] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [199391.876633] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > > [199391.877317] Kernel panic - not syncing: Fatal exception
> > > [199391.878006] Kernel Offset: disabled
> > > [199392.032304] ---[ end Kernel panic - not syncing: Fatal exception ]---
> > > [199392.032962] unchecked MSR access error: WRMSR to 0x83f (tried to write 0x00000000000000f6) at rIP: 0xffffffff81067af4 (native_write_msr+0x4/0x20)
> > > [199392.034277] Call Trace:
> > > [199392.034929]  <IRQ>
> > > [199392.035576]  native_apic_msr_write+0x2e/0x40
> > > [199392.036228]  arch_irq_work_raise+0x28/0x40
> > > [199392.036877]  irq_work_queue_on+0x83/0xa0
> > > [199392.037518]  irq_work_run_list+0x4c/0x70
> > > [199392.038149]  irq_work_run+0x14/0x40
> > > [199392.038771]  smp_call_function_single_interrupt+0x3a/0xd0
> > > [199392.039393]  call_function_single_interrupt+0xf/0x20
> > > [199392.040011]  </IRQ>
> > > [199392.040624] RIP: 0010:panic+0x209/0x25c
> > > [199392.041234] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > > [199392.042518] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> > > [199392.043174] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > > [199392.043833] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > > [199392.044493] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > > [199392.045155] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > > [199392.045820] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > > [199392.046486]  oops_end+0xc1/0xd0
> > > [199392.047149]  do_trap+0x13d/0x150
> > > [199392.047795]  do_error_trap+0xd5/0x130
> > > [199392.048427]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.049048]  invalid_op+0x14/0x20
> > > [199392.049650] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.050245] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > > [199392.051434] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > > [199392.052010] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > > [199392.052580] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > > [199392.053150] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > > [199392.053715] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > > [199392.054266] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > > [199392.054807]  ? __wake_up_common_lock+0x87/0xc0
> > > [199392.055342]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > > [199392.055877]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > [199392.056410]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > [199392.056947]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > [199392.057484]  ? sort+0x17b/0x270
> > > [199392.058016]  ? u32_swap+0x10/0x10
> > > [199392.058538]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > [199392.059060]  process_one_work+0x171/0x370
> > > [199392.059576]  worker_thread+0x49/0x3f0
> > > [199392.060083]  kthread+0xf8/0x130
> > > [199392.060587]  ? max_active_store+0x80/0x80
> > > [199392.061086]  ? kthread_bind+0x10/0x10
> > > [199392.061569]  ret_from_fork+0x1f/0x40
> > > [199392.062038] ------------[ cut here ]------------
> > > [199392.062508] sched: Unexpected reschedule of offline CPU#1!
> > > [199392.062989] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> > > [199392.063485] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > > [199392.067463]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > > [199392.068729] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D           4.19.75 #1
> > > [199392.069356] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > > [199392.069982] Workqueue: dm-thin do_worker [dm_thin_pool]
> > > [199392.070604] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> > > [199392.071229] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> > > [199392.072528] RSP: 0018:ffff88880fb03dc0 EFLAGS: 00010082
> > > [199392.073186] RAX: 0000000000000000 RBX: ffff88880fa625c0 RCX: 0000000000000006
> > > [199392.073847] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff88880fb168b0
> > > [199392.074512] RBP: ffff88880fa625c0 R08: 0000000000000001 R09: 00000000000174e5
> > > [199392.075182] R10: ffff8881f2512300 R11: 0000000000000000 R12: ffff888804541640
> > > [199392.075849] R13: ffff88880fb03e08 R14: 0000000000000000 R15: 0000000000000001
> > > [199392.076512] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > > [199392.077179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [199392.077833] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > > [199392.078481] Call Trace:
> > > [199392.079117]  <IRQ>
> > > [199392.079745]  check_preempt_curr+0x6b/0x90
> > > [199392.080373]  ttwu_do_wakeup+0x19/0x130
> > > [199392.080999]  try_to_wake_up+0x1e2/0x460
> > > [199392.081623]  __wake_up_common+0x8f/0x160
> > > [199392.082246]  ep_poll_callback+0x1af/0x300
> > > [199392.082860]  __wake_up_common+0x8f/0x160
> > > [199392.083470]  __wake_up_common_lock+0x7a/0xc0
> > > [199392.084074]  irq_work_run_list+0x4c/0x70
> > > [199392.084675]  smp_call_function_single_interrupt+0x3a/0xd0
> > > [199392.085277]  call_function_single_interrupt+0xf/0x20
> > > [199392.085879]  </IRQ>
> > > [199392.086477] RIP: 0010:panic+0x209/0x25c
> > > [199392.087079] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > > [199392.088341] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
> > > [199392.088988] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > > [199392.089639] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > > [199392.090291] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > > [199392.090947] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > > [199392.091601] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > > [199392.092255]  oops_end+0xc1/0xd0
> > > [199392.092894]  do_trap+0x13d/0x150
> > > [199392.093516]  do_error_trap+0xd5/0x130
> > > [199392.094122]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.094718]  invalid_op+0x14/0x20
> > > [199392.095296] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.095866] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > > [199392.097006] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > > [199392.097558] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > > [199392.098103] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > > [199392.098644] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > > [199392.099176] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > > [199392.099703] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > > [199392.100229]  ? __wake_up_common_lock+0x87/0xc0
> > > [199392.100744]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > > [199392.101251]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > [199392.101752]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > [199392.102252]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > [199392.102750]  ? sort+0x17b/0x270
> > > [199392.103242]  ? u32_swap+0x10/0x10
> > > [199392.103733]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > [199392.104228]  process_one_work+0x171/0x370
> > > [199392.104714]  worker_thread+0x49/0x3f0
> > > [199392.105193]  kthread+0xf8/0x130
> > > [199392.105665]  ? max_active_store+0x80/0x80
> > > [199392.106132]  ? kthread_bind+0x10/0x10
> > > [199392.106601]  ret_from_fork+0x1f/0x40
> > > [199392.107069] ---[ end trace c31536d98046e8ed ]---
> > > [199392.107544] ------------[ cut here ]------------
> > > [199392.108017] sched: Unexpected reschedule of offline CPU#7!
> > > [199392.108497] WARNING: CPU: 4 PID: 31359 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x39/0x40
> > > [199392.108996] Modules linked in: dm_snapshot btrfs xor zstd_decompress zstd_compress xxhash raid6_pq xfs dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio drbd lru_cache xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables binfmt_misc ip6table_filter ip6_tables bcache xt_comment crc64 iptable_filter netconsole bridge 8021q garp stp mrp llc lz4 lz4_compress zram sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_support kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul pcc_cpufreq ghash_clmulni_intel pcspkr sg ipmi_si ipmi_devintf lpc_ich ipmi_msghandler video i2c_i801 mfd_core ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper
> > > [199392.112967]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core ahci libahci ixgbe libata e1000e arcmsr mdio dca dm_mirror dm_region_hash dm_log dm_mod
> > > [199392.114203] CPU: 4 PID: 31359 Comm: kworker/u16:4 Tainted: G      D W         4.19.75 #1
> > > [199392.114819] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015
> > > [199392.115439] Workqueue: dm-thin do_worker [dm_thin_pool]
> > > [199392.116061] RIP: 0010:native_smp_send_reschedule+0x39/0x40
> > > [199392.116683] Code: 0f 92 c0 84 c0 74 15 48 8b 05 93 f1 eb 00 be fd 00 00 00 48 8b 40 30 e9 85 f4 9a 00 89 fe 48 c7 c7 08 23 e5 81 e8 c7 9a 05 00 <0f> 0b c3 0f 1f 40 00 0f 1f 44 00 00 53 be 20 00 48 00 48 89 fb 48
> > > [199392.117982] RSP: 0018:ffff88880fb03ee0 EFLAGS: 00010082
> > > [199392.118632] RAX: 0000000000000000 RBX: ffff8887d093ac80 RCX: 0000000000000006
> > > [199392.119295] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff88880fb168b0
> > > [199392.119961] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000017529
> > > [199392.120623] R10: ffff8881438ba800 R11: 0000000000000000 R12: ffffc9000a147988
> > > [199392.121283] R13: ffffffff8113f9a0 R14: 0000000000000002 R15: ffff88880fb1cff8
> > > [199392.121938] FS:  0000000000000000(0000) GS:ffff88880fb00000(0000) knlGS:0000000000000000
> > > [199392.122589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [199392.123229] CR2: 00007f5ae49a1000 CR3: 000000000200a003 CR4: 00000000001626e0
> > > [199392.123867] Call Trace:
> > > [199392.124495]  <IRQ>
> > > [199392.125116]  update_process_times+0x40/0x50
> > > [199392.125742]  tick_sched_handle+0x25/0x60
> > > [199392.126367]  tick_sched_timer+0x37/0x70
> > > [199392.126987]  __hrtimer_run_queues+0xfb/0x270
> > > [199392.127601]  hrtimer_interrupt+0x122/0x270
> > > [199392.128211]  smp_apic_timer_interrupt+0x6a/0x140
> > > [199392.128819]  apic_timer_interrupt+0xf/0x20
> > > [199392.129423]  </IRQ>
> > > [199392.130020] RIP: 0010:panic+0x209/0x25c
> > > [199392.130619] Code: 83 3d 86 de 75 01 00 74 05 e8 ff 5d 02 00 48 c7 c6 e0 bc 80 82 48 c7 c7 10 cd e5 81 31 c0 e8 fe 2b 06 00 fb 66 0f 1f 44 00 00 <45> 31 e4 e8 eb 2e 0d 00 4d 39 ec 7c 1e 41 83 f6 01 48 8b 05 2b de
> > > [199392.131883] RSP: 0018:ffffc9000a147a30 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> > > [199392.132530] RAX: 0000000000000039 RBX: 0000000000000200 RCX: 0000000000000006
> > > [199392.133180] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88880fb168b0
> > > [199392.133830] RBP: ffffc9000a147aa0 R08: 0000000000000001 R09: 00000000000174b5
> > > [199392.134482] R10: ffff8883d9753f00 R11: 0000000000000001 R12: ffffffff81e4a15c
> > > [199392.135136] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff81e49951
> > > [199392.135792]  oops_end+0xc1/0xd0
> > > [199392.136444]  do_trap+0x13d/0x150
> > > [199392.137094]  do_error_trap+0xd5/0x130
> > > [199392.137740]  ? sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.138383]  invalid_op+0x14/0x20
> > > [199392.139010] RIP: 0010:sm_disk_new_block+0xa0/0xb0 [dm_persistent_data]
> > > [199392.139630] Code: 22 00 00 49 8b 34 24 e8 4e f9 ff ff 85 c0 75 11 83 7c 24 04 01 75 13 48 83 83 28 22 00 00 01 eb af 89 c5 eb ab e8 e0 95 9b e0 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
> > > [199392.140870] RSP: 0018:ffffc9000a147c88 EFLAGS: 00010297
> > > [199392.141472] RAX: 0000000000000000 RBX: ffff8887ceed8000 RCX: 0000000000000000
> > > [199392.142066] RDX: ffff8887d093ac80 RSI: 0000000000000246 RDI: ffff8887faab0a00
> > > [199392.142647] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff8887faab0a98
> > > [199392.143211] R10: ffffffff810f8077 R11: 0000000000000000 R12: ffffc9000a147d58
> > > [199392.143759] R13: ffffc9000a147d58 R14: 00000000ffffffc3 R15: 000000000014bbc0
> > > [199392.144301]  ? __wake_up_common_lock+0x87/0xc0
> > > [199392.144838]  ? sm_disk_new_block+0x82/0xb0 [dm_persistent_data]
> > > [199392.145374]  dm_pool_alloc_data_block+0x3f/0x60 [dm_thin_pool]
> > > [199392.145909]  alloc_data_block.isra.52+0x6d/0x1e0 [dm_thin_pool]
> > > [199392.146435]  process_cell+0x2a3/0x550 [dm_thin_pool]
> > > [199392.146949]  ? sort+0x17b/0x270
> > > [199392.147450]  ? u32_swap+0x10/0x10
> > > [199392.147944]  do_worker+0x268/0x9a0 [dm_thin_pool]
> > > [199392.148441]  process_one_work+0x171/0x370
> > > [199392.148937]  worker_thread+0x49/0x3f0
> > > [199392.149430]  kthread+0xf8/0x130
> > > [199392.149922]  ? max_active_store+0x80/0x80
> > > [199392.150406]  ? kthread_bind+0x10/0x10
> > > [199392.150883]  ret_from_fork+0x1f/0x40
> > > [199392.151353] ---[ end trace c31536d98046e8ee ]---
> > > 
> > > 
> > > --
> > > Eric Wheeler
> > > 
> > > --
> > > dm-devel mailing list
> > > dm-devel at redhat.com
> > > https://www.redhat.com/mailman/listinfo/dm-devel
> > > 
> > 
> > 
> > --
> > dm-devel mailing list
> > dm-devel at redhat.com
> > https://www.redhat.com/mailman/listinfo/dm-devel
> > 
> > 
> 
> 
> --
> dm-devel mailing list
> dm-devel at redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
> 
> 





More information about the dm-devel mailing list