[dm-devel] [PATCH] md: submit MMP reads REQ_SYNC to bypass RAID5 cache

NeilBrown neilb at suse.de
Mon Nov 3 22:04:36 UTC 2014


On Mon, 3 Nov 2014 14:01:10 -0700 James Simmons <uja.ornl at gmail.com> wrote:

> Hello.
> 
>    This is a patch against the latest kernel source which is based on
> a patch used by Lustre. The below describes what we are trying to
> achieve. I would like to get feedback on whether this is the right approach.
> 
> ----------------------------------------------------------------------
> 
> The ext4 MMP block reads always need to get fresh data from the
> underlying disk.  Otherwise, if a remote node is updating the MMP
> block and the reads are fetched from the MD RAID5 stripe cache,
> it is possible that the local node will incorrectly decide the
> remote node has died and allow the filesystem to be mounted on
> two nodes at the same time.

It is preferred for patches to be inline, rather than as attachments, as it
makes it easier to comment on them....

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 9c66e59..11b749c 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2678,6 +2678,9 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, in
 		}
 		if (sector >= sh->dev[dd_idx].sector + STRIPE_SECTORS)
 			set_bit(R5_OVERWRITE, &sh->dev[dd_idx].flags);
+	} else if (bi->bi_rw & REQ_NOCACHE) {
+		/* force to read from underlying disk if requested */
+		clear_bit(R5_UPTODATE, &sh->dev[dd_idx].flags);
 	}
 
 	pr_debug("added bi b#%llu to stripe s#%llu, disk %d.\n",


This doesn't provide a useful guarantee.  If the device that stores that
block has failed, md/raid5 will read all the other devices to recover the
block.
If that happened recently and you just clear the UPTODATE bit on the block,
md/raid5 will recover the data from the other blocks in the stripe, which are
still in the stripe cache, without re-reading them from disk.
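
To make the degraded case above concrete, here is a toy userspace model.  It
is only a sketch and not md/raid5 code: the names toy_dev, toy_stripe,
toy_read and toy_demo are invented for illustration, the stripe is a
hypothetical 2 data + 1 parity layout, and "uptodate" merely stands in for
R5_UPTODATE.  It assumes the remote node's update reaches the on-disk parity
while the local node still holds the old peer blocks in its cache.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NDISKS 3   /* toy geometry: 2 data + 1 parity (assumption) */
#define BLKSZ  8

struct toy_dev {
	uint8_t disk[BLKSZ];   /* contents of the underlying device */
	uint8_t cache[BLKSZ];  /* copy held in the (toy) stripe cache */
	bool uptodate;         /* stands in for R5_UPTODATE */
	bool failed;           /* device can no longer be read */
	int disk_reads;        /* how often we really went to disk */
};

struct toy_stripe {
	struct toy_dev dev[NDISKS];
};

/* Read block idx; a failed device is rebuilt by XOR of its peers. */
static int toy_read(struct toy_stripe *sh, int idx, uint8_t *out)
{
	struct toy_dev *d = &sh->dev[idx];

	if (!d->uptodate) {
		if (!d->failed) {
			memcpy(d->cache, d->disk, BLKSZ);
			d->disk_reads++;
		} else {
			memset(d->cache, 0, BLKSZ);
			for (int i = 0; i < NDISKS; i++) {
				struct toy_dev *p = &sh->dev[i];

				if (i == idx)
					continue;
				if (p->failed)
					return -1;  /* two failures: unrecoverable */
				if (!p->uptodate) {
					memcpy(p->cache, p->disk, BLKSZ);
					p->disk_reads++;
					p->uptodate = true;
				}
				/* peers already cached are NOT re-read */
				for (int j = 0; j < BLKSZ; j++)
					d->cache[j] ^= p->cache[j];
			}
		}
		d->uptodate = true;
	}
	memcpy(out, d->cache, BLKSZ);
	return 0;
}

/*
 * Read a block on a failed device, let a "remote node" change it on disk,
 * clear uptodate as the patch does, and read again.  Returns the number of
 * disk reads caused by the second read.
 */
static int toy_demo(uint8_t first[BLKSZ], uint8_t second[BLKSZ])
{
	struct toy_stripe sh;
	int before;

	memset(&sh, 0, sizeof(sh));
	memset(sh.dev[0].disk, 'A', BLKSZ);
	memset(sh.dev[1].disk, 'x', BLKSZ);
	for (int j = 0; j < BLKSZ; j++)
		sh.dev[2].disk[j] = sh.dev[0].disk[j] ^ sh.dev[1].disk[j];
	sh.dev[0].failed = true;

	toy_read(&sh, 0, first);  /* rebuilds from dev 1 and dev 2 */
	before = sh.dev[1].disk_reads + sh.dev[2].disk_reads;

	/* remote update: on a degraded array it only shows up in parity */
	for (int j = 0; j < BLKSZ; j++)
		sh.dev[2].disk[j] = 'B' ^ sh.dev[1].disk[j];

	sh.dev[0].uptodate = false;  /* the patch's clear_bit() */
	toy_read(&sh, 0, second);

	return sh.dev[1].disk_reads + sh.dev[2].disk_reads - before;
}
```

Calling toy_demo() with two BLKSZ-byte buffers fills both with the same old
contents and reports no additional disk reads for the second call: clearing
the flag on the one block only forces a recompute from peers that are still
cached.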

But considering this at a higher level: if two different nodes try to
assemble the same RAID5 array then you already potentially have a problem.
You really want some sensible cluster co-ordinator and let it make these
decisions.   Hoping that a block device can be a reliable semaphore seems ...
misguided.

NeilBrown