[dm-devel] [v2] block: Fix a WRITE SAME BUG_ON

zhangxiaoxu (A) zhangxiaoxu5 at huawei.com
Mon Jan 28 05:48:15 UTC 2019


Hi, Thanks for your reply.

BUG_ON is because the 'bio_iovec(bio).bv_len' not same with 'sdp->sector_size'.

Just as below reproducer, If the 'golden' is stacked by disks 'sda'(logical_block_size=512)
and 'sdb'(logical_block_size=4096), call 'blkdev_issue_write_same' on it will BUG_ON because
of the
	bio->bi_io_vec->bv_len = bdev_logical_block_size('golden'), and the
	bdev_logical_block_size('golden') = max_logical_block_size(sda, sdb).

There are 2 solutions about the problem:
1. Disable the WRITE SAME for that situation: Just like this patch.
2. Improve the WRITE SAME to adapt that situation:
	Reorganization the 'WRITE SAME' bio for the disks which logical_block_size is smaller than the logical volume.


For now, I think we should close the 'WRITE SAME' for this situation.


Reproducer:
1. Start qemu with parameter:
     -drive file=/home/qemu/disk/512k.img,if=none,format=raw,cache=none,id=data.1,discard=on \
     -device scsi-hd,drive=data.1,id=vdata.1 \
     -drive file=/home/qemu/disk/4096k.img,if=none,format=raw,cache=none,id=data.2,discard=on \
     -device scsi-hd,drive=data.2,id=vdata.2,logical_block_size=4096,physical_block_size=4096 \

2. Create LV:
     vgremove golden -y
     vgcreate golden /dev/sda /dev/sdb
     lvcreate -L1G -n testvol -i 2 golden -y

3. Call 'blkdev_issue_write_same' on the device:
	#include <linux/module.h>
	#include <linux/kernel.h>
	#include <linux/init.h>
	#include <linux/fs.h>
	#include <linux/slab.h>
	#include <linux/blkdev.h>

	static int __init hello_module_init(void)
	{
		struct block_device *dev;
		int err;

		printk(KERN_ERR "/dev/golden/testvol\n");

		dev = lookup_bdev("/dev/golden/testvol");
		if (IS_ERR(dev)) {
			pr_warn("lookup blk error!\n");
			return 0;
		}

		err = blkdev_get(dev, FMODE_WRITE | FMODE_READ | FMODE_EXCL, (void *)&err);

		if (err) {
			pr_warn("get blk error!\n");
			return 0;
		}

		err = blkdev_issue_write_same(dev, 8, 0xf8, GFP_NOIO, ZERO_PAGE(0));

		printk(KERN_ERR "blkdev_issue_write_same return %d\n", err);

		return 0;
	}

	static void __exit hello_module_exit(void)
	{
		pr_warn("\n");
	}

	module_init(hello_module_init);
	module_exit(hello_module_exit);

	MODULE_DESCRIPTION("hello module");
	MODULE_LICENSE("GPL");

On 1/26/2019 7:17 PM, John Dorminy wrote:
> Hi. I have read a bit of DM code and spent an hour reviewing this... I
> didn't get to the point of knowing what the right fix for the problem
> is, and I may have a wrong understanding, but I have two thoughts
> about the patch:
> 
> I don't think this is the right solution for two reasons:
> 
> In the first place, if it's an LVM-only issue, we should fix it only
> for device-mapper devices. If this is the right way to fix it,
> possibly the way to do that would be to change DM calls to
> blk_queue_max_write_same_sectors() to only set the max sectors to more
> than 0 if and only if the logical block sizes match.
> 
> In the second place, I don't think this is the problem. The line of
> code that's calling BUG_ON is, I think,
> BUG_ON(bio_offset(bio) || bio_iovec(bio).bv_len != sdp->sector_size);
> 
> This is because write_same bios are supposed to have a single sector
> of data at the beginning of a page in their bio_iovec(bio) [I think,
> based on a commit message I've read] -- in other words, bio_iovec(bio)
> is supposed to say something like { .page = something, .offset = 0,
> .len = 1 native sector }. But clearly, since the BUG_ON is being
> called, one of these is not true. Have you added a log statement right
> before the BUG_ON() to print out bio_offset(bio) and
> bio_iovec(bio).bv_len to see which value defies expectations?
> 
> I would be happy to help you trace through this or attempt to
> reproduce it myself -- what stack of devices can you reproduce this
> on? Is this a dm-linear device on top of a disk? Does it have a
> filesystem on top, and if so, what filesystem?
> 
> Thank you!
> 
> John Dorminy
> 
> 
> On Fri, Jan 25, 2019 at 9:53 AM Zhang Xiaoxu <zhangxiaoxu5 at huawei.com> wrote:
>>
>> If the lvm is stacked by different logical_block_size disks,
>> when WRITE SAME on it, will bug_on:
>>
>> kernel BUG at drivers/scsi/sd.c:968!
>> invalid opcode: 0000 [#1] SMP PTI
>> CPU: 11 PID: 525 Comm: kworker/11:1H Tainted: G           O      5.0.0-rc3+ #2
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
>> Workqueue: kblockd blk_mq_run_work_fn
>> RIP: 0010:sd_init_command+0x7aa/0xdb0
>> Code: 30 75 00 00 c7 85 44 01 00 00 18 00 00 00 0f 85 fa 04 00 00 48 83 c4 40 48
>>        89 df 5b 5d 41 5c 41 5d 41 5e 41 5f e9 b6 ca fe ff <0f> 0b 41 bc 09
>> RSP: 0018:ffffb55f80ddbca0 EFLAGS: 00010206
>> RAX: 0000000000001000 RBX: ffff9ed23fb927a0 RCX: 000000000000f000
>> RDX: ffff9ed23f0a8400 RSI: ffff9ed27bc79800 RDI: 0000000000000000
>> RBP: ffff9ed23fb92680 R08: ffff9ed27c8c0000 R09: ffff9ed23fb927d8
>> R10: 0000000000000000 R11: fefefefefefefeff R12: ffff9ed27bc79800
>> R13: ffff9ed2787a0000 R14: ffff9ed27bdf3400 R15: ffff9ed23fb927a0
>> FS:  0000000000000000(0000) GS:ffff9ed27c8c0000(0000) knlGS:0000000000000000
>> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00007f6b14cf9341 CR3: 0000000069058000 CR4: 00000000000006e0
>> Call Trace:
>>   ? vp_notify+0x12/0x20
>>   scsi_queue_rq+0x525/0xa30
>>   blk_mq_dispatch_rq_list+0x8d/0x580
>>   ? syscall_return_via_sysret+0x10/0x7f
>>   ? elv_rb_del+0x1f/0x30
>>   ? deadline_remove_request+0x55/0xc0
>>   blk_mq_do_dispatch_sched+0x76/0x110
>>   blk_mq_sched_dispatch_requests+0xf9/0x170
>>   __blk_mq_run_hw_queue+0x51/0xd0
>>   process_one_work+0x195/0x380
>>   worker_thread+0x30/0x390
>>   ? process_one_work+0x380/0x380
>>   kthread+0x113/0x130
>>   ? kthread_park+0x90/0x90
>>   ret_from_fork+0x35/0x40
>> Modules linked in: alloc(O+)
>> ---[ end trace dc92ddeb2e6d1fe5 ]---
>>
>> The logical_block_size of the LVM is the max value of the sub disks,
>> it maybe different with one of the sub disk. when WRITE SAME on the
>> disk, it will BUG_ON when setup WRITE SAME cmd.
>>
>> Close WRITE_SAME feature on the LVM if it was stacked by different
>> logical_block_size disk.
>>
>> Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5 at huawei.com>
>> ---
>>   block/blk-settings.c | 10 ++++++++--
>>   1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>> index 3e7038e475ee..e4664280edb4 100644
>> --- a/block/blk-settings.c
>> +++ b/block/blk-settings.c
>> @@ -497,8 +497,6 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>>          t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
>>          t->max_hw_sectors = min_not_zero(t->max_hw_sectors, b->max_hw_sectors);
>>          t->max_dev_sectors = min_not_zero(t->max_dev_sectors, b->max_dev_sectors);
>> -       t->max_write_same_sectors = min(t->max_write_same_sectors,
>> -                                       b->max_write_same_sectors);
>>          t->max_write_zeroes_sectors = min(t->max_write_zeroes_sectors,
>>                                          b->max_write_zeroes_sectors);
>>          t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn);
>> @@ -537,6 +535,14 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>>                  }
>>          }
>>
>> +       /* If the logical block size is different, forbid write same */
>> +       if (t->logical_block_size != b->logical_block_size &&
>> +           t->max_write_same_sectors != UINT_MAX)
>> +               t->max_write_same_sectors = 0;
>> +       else
>> +               t->max_write_same_sectors = min(t->max_write_same_sectors,
>> +                                               b->max_write_same_sectors);
>> +
>>          t->logical_block_size = max(t->logical_block_size,
>>                                      b->logical_block_size);
>>
>> --
>> 2.14.4
>>
>> --
>> dm-devel mailing list
>> dm-devel at redhat.com
>> https://www.redhat.com/mailman/listinfo/dm-devel
> 
> .
> 




More information about the dm-devel mailing list