[dm-devel] need help with dm & md raid10 issue

Mike Snitzer snitzer at redhat.com
Sat Dec 12 14:28:56 UTC 2020


On Sat, Dec 12 2020 at  3:42am -0500,
Song Liu <songliubraving at fb.com> wrote:

> Hi Mike,
> 
> I am looking at the a new warning while reverting the raid10 changes:
> 
> In file included from ./include/linux/kernel.h:14,
>                 from ./include/asm-generic/bug.h:20,
>                 from ./arch/x86/include/asm/bug.h:93,
>                 from ./include/linux/bug.h:5,
>                 from ./include/linux/mmdebug.h:5,
>                 from ./include/linux/gfp.h:5,
>                 from ./include/linux/slab.h:15,
>                 from drivers/md/dm-raid.c:8:
> drivers/md/dm-raid.c: In function ‘raid_io_hints’:
> ./include/linux/minmax.h:18:28: warning: comparison of distinct pointer types lacks a cast
>   18 |  (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
>      |                            ^~
> ./include/linux/minmax.h:32:4: note: in expansion of macro ‘__typecheck’
>   32 |   (__typecheck(x, y) && __no_side_effects(x, y))
>      |    ^~~~~~~~~~~
> ./include/linux/minmax.h:42:24: note: in expansion of macro ‘__safe_cmp’
>   42 |  __builtin_choose_expr(__safe_cmp(x, y), \
>      |                        ^~~~~~~~~~
> ./include/linux/minmax.h:51:19: note: in expansion of macro ‘__careful_cmp’
>   51 | #define min(x, y) __careful_cmp(x, y, <)
>      |                   ^~~~~~~~~~~~~
> ./include/linux/minmax.h:84:39: note: in expansion of macro ‘min’
>   84 |  __x == 0 ? __y : ((__y == 0) ? __x : min(__x, __y)); })
>      |                                       ^~~
> drivers/md/dm-raid.c:3739:33: note: in expansion of macro ‘min_not_zero’
> 3739 |   limits->max_discard_sectors = min_not_zero(rs->md.chunk_sectors,
>      |                                 ^~~~~~~~~~~~
> 
> This is caused by recent reverts. Here is what happened. 
> 
> We are looking at 7 patches, in the original committed order:
> 
> [md1 - md5]
> md: add md_submit_discard_bio() for submitting discard bio
> md/raid10: extend r10bio devs to raid disks
> md/raid10: pull codes that wait for blocked dev into one function
> md/raid10: improve raid10 discard request
> md/raid10: improve discard request for far layout
> 
> [dm1 - dm2]
> dm raid: fix discard limits for raid1 and raid10
> dm raid: remove unnecessary discard limits for raid10
> 
> dm2 depends on the md1-5 changes, while dm1 doesn't. 
> 
> I reverted md patches and dm2, which caused the new warning above. I 
> didn't pay much attention to it, because I thought I was reverting a 
> patch, so I just brought back an old warning. However, this was wrong. 
> The warning was introduced in dm1, and fixed in dm2. Therefore, there 
> wasn't warning before dm1 or after dm2. It happens with dm1 only. 

OK, I see it when I revert f0e90b6c663a7e3b4736cb318c6c7c589f152c28

But a simple cast silences it:

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index dc8568ab96f2..8e04a4cb16a4 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3736,7 +3736,7 @@ static void raid_io_hints(struct dm_target *ti, struct queue_limits *limits)
        if (rs_is_raid10(rs)) {
                limits->discard_granularity = max(chunk_size_bytes,
                                                  limits->discard_granularity);
-               limits->max_discard_sectors = min_not_zero(rs->md.chunk_sectors,
+               limits->max_discard_sectors = min_not_zero((unsigned)rs->md.chunk_sectors,
                                                           limits->max_discard_sectors);
        }
 }

But I think a proper fix is needed in MD (see below).

> At this point, I think our best option is to revert all of these patches. 
> As dm1 alone hasn't been tested much (and it triggers new warning).
> 
> However, as I plan to test dm raid10, I found there might be some issue 
> with it. I am following https://wiki.gentoo.org/wiki/Device-mapper#RAID10 
> and using commands:
> 
> for S in {0..3} ; do dmsetup create test-raid-metadata$S \
>     --table "0 8192 linear /dev/loop$S 0"; \
>     dmsetup create test-raid-data$S --table "0 1953125 linear /dev/loop$S 8192"; done
> 
> dmsetup create test-raid10 --table '0 1953024 raid raid10 5 512 raid10_format near raid10_copies 2 4 - /dev/mapper/test-raid-data0 - /dev/mapper/test-raid-data1 - /dev/mapper/test-raid-data2 - /dev/mapper/test-raid-data3'
> 
> The second command give some error. After debugging I found raid_ctr() 
> calls md_run() with rs->md.new_layout == 0, which doesn't work. The 
> new_layout was initially set to 258 in parse_raid_params(), but got 
> overwritten to zero in rs_set_cur(). With the following hack, I was 
> finally able to run tests with dm-raid10. 
> 
> =================== 8< =========================
> 
> diff --git i/drivers/md/dm-raid.c w/drivers/md/dm-raid.c
> index 9c1f7c4de65b3..13b624490e24c 100644
> --- i/drivers/md/dm-raid.c
> +++ w/drivers/md/dm-raid.c
> @@ -3178,7 +3178,6 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>                         /* Reshaping ain't recovery, so disable recovery */
>                         rs_setup_recovery(rs, MaxSector);
>                 }
> -               rs_set_cur(rs);
>         } else {
>  size_check:
>                 /* May not set recovery when a device rebuild is requested */
> 
> =================== 8< =========================
> 
> Could you please help me with the following:
> 
> 1. Confirm it is OK to revert both dm patches, which is available at 
> 
>   https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git md-fixes
> 
> This is urgent, as we need it in 5.10 final this weekend. 

I'm not understanding how the compiler warning relates to your gentoo
recipe's failure (seems it doesn't).

But given MD raid1 doesn't require bio splitting, reverting
e0910c8e4f87b is excessive.

Why not do the unsigned cast like I showed above?  But secondarily: why
is chunk_sectorsin 'struct mddev' _not_ 'unsigned int' like it is in
'struct queue_limits'?

Isn't the proper fix just changing MD's chunk_sectors to 'unsigned int'?
E.g. this seems to silence the compiler warning too:

diff --git a/drivers/md/md.h b/drivers/md/md.h
index ccfb69868c2e..b0814d8f3523 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -311,7 +311,7 @@ struct mddev {
        int                             external;       /* metadata is
                                                         * managed externally */
        char                            metadata_type[17]; /* externally set*/
-       int                             chunk_sectors;
+       unsigned int                    chunk_sectors;
        time64_t                        ctime, utime;
        int                             level, layout;
        char                            clevel[16];
@@ -339,7 +339,7 @@ struct mddev {
         */
        sector_t                        reshape_position;
        int                             delta_disks, new_level, new_layout;
-       int                             new_chunk_sectors;
+       unsigned int                    new_chunk_sectors;
        int                             reshape_backwards;

        struct md_thread                *thread;        /* management thread */

> 2. Help me with dm-raid testing. Is it a bug? Or did I use wrong options? 
> 
> This is probably not urgent, as the same command gives same error on 5.9.0. 

I don't know, likely best for Xiao or Heinz to have a look.

These private mails are completely unnecessary.

You really should be communicating with Xiao and Heinz and cc'ing
dm-devel (I've done that now).  I never even saw any exchange where you
even ask Xaio about the corruption, caused by his raid10 discard
changes, that kicked off your cascade of reverts.

That may have happened and I just missed it.  But if it didn't happen:
why not?

Mike




More information about the dm-devel mailing list