[dm-devel] [PATCH] md: submit MMP reads REQ_SYNC to bypass RAID5 cache
NeilBrown
neilb at suse.de
Mon Nov 3 22:04:36 UTC 2014
On Mon, 3 Nov 2014 14:01:10 -0700 James Simmons <uja.ornl at gmail.com> wrote:
> Hello.
>
> This is a patch against the latest kernel source which is based on
> a patch used by Lustre. The text below describes what we are trying to
> achieve. I would like to get feedback on whether this is the right approach.
>
> ----------------------------------------------------------------------
>
> The ext4 MMP block reads always need to get fresh data from the
> underlying disk. Otherwise, if a remote node is updating the MMP
> block and the reads are fetched from the MD RAID5 stripe cache,
> it is possible that the local node will incorrectly decide the
> remote node has died and allow the filesystem to be mounted on
> two nodes at the same time.
It is preferred for patches to be inline, rather than as attachments, as it
makes it easier to comment on them....
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 9c66e59..11b749c 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2678,6 +2678,9 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, in
 		}
 		if (sector >= sh->dev[dd_idx].sector + STRIPE_SECTORS)
 			set_bit(R5_OVERWRITE, &sh->dev[dd_idx].flags);
+	} else if (bi->bi_rw & REQ_NOCACHE) {
+		/* force to read from underlying disk if requested */
+		clear_bit(R5_UPTODATE, &sh->dev[dd_idx].flags);
 	}
 	pr_debug("added bi b#%llu to stripe s#%llu, disk %d.\n",
This doesn't provide a useful guarantee. If the device that stores that
block has failed, then md/raid5 will read all the other devices to recover
the block.
If that happened recently and you just clear the UPTODATE bit on the block,
md/raid5 will recover the data from the cached copies of all the other
blocks, without reading them from disk.
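To make that concrete, here is a rough user-space sketch of the degraded
case. It is not md code: every toy_* name, the 3-device layout and the byte
values are made up for illustration, and the only thing it borrows from the
patch is the idea of clearing an "uptodate" flag on the one block being read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_NDEVS 3   /* dev 0 and 1: data, dev 2: parity (toy layout) */
#define TOY_BLK   8   /* toy block size in bytes */

struct toy_dev {
	uint8_t disk[TOY_BLK];   /* contents of the shared storage */
	uint8_t cache[TOY_BLK];  /* this node's stripe-cache copy */
	bool uptodate;           /* stands in for R5_UPTODATE */
	bool failed;             /* device can no longer be read */
};

struct toy_stripe {
	struct toy_dev dev[TOY_NDEVS];
	int disk_reads;          /* reads that actually reached a device */
};

/* Copy a block from the device into the cache and count the I/O. */
static void toy_fetch(struct toy_stripe *sh, int i)
{
	assert(i >= 0 && i < TOY_NDEVS && !sh->dev[i].failed);
	memcpy(sh->dev[i].cache, sh->dev[i].disk, TOY_BLK);
	sh->dev[i].uptodate = true;
	sh->disk_reads++;
}

/* Read block idx; if its device failed, rebuild it by XOR of the others. */
static const uint8_t *toy_read(struct toy_stripe *sh, int idx)
{
	struct toy_dev *d = &sh->dev[idx];

	if (d->uptodate)
		return d->cache;
	if (!d->failed) {
		toy_fetch(sh, idx);
		return d->cache;
	}
	memset(d->cache, 0, TOY_BLK);
	for (int i = 0; i < TOY_NDEVS; i++) {
		if (i == idx)
			continue;
		if (!sh->dev[i].uptodate)   /* cached copies are trusted as-is */
			toy_fetch(sh, i);
		for (int b = 0; b < TOY_BLK; b++)
			d->cache[b] ^= sh->dev[i].cache[b];
	}
	d->uptodate = true;
	return d->cache;
}

/*
 * Returns the first byte the local node sees on its second read of the
 * block on the failed device, after a remote node has changed it.
 */
static uint8_t toy_demo(int *reads_before, int *reads_after)
{
	struct toy_stripe sh;

	memset(&sh, 0, sizeof(sh));
	sh.dev[0].failed = true;                 /* the MMP block lived here */
	memset(sh.dev[1].disk, 0x10, TOY_BLK);
	memset(sh.dev[2].disk, 0x11, TOY_BLK);   /* parity encodes 0x01 */

	(void)toy_read(&sh, 0);                  /* first read does real I/O */
	*reads_before = sh.disk_reads;

	/* Remote node changes the block to 0x02 by rewriting parity. */
	memset(sh.dev[2].disk, 0x12, TOY_BLK);

	sh.dev[0].uptodate = false;              /* what the patch does */
	const uint8_t *buf = toy_read(&sh, 0);
	*reads_after = sh.disk_reads;
	return buf[0];
}
```

In this toy model, toy_demo() hands back the old value and the read counter
does not move between the two reads, because the rebuild is fed entirely
from the cached copies of the surviving blocks. How closely real md/raid5
follows this depends on what happens to be in the stripe cache at the time.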
But considering this at a higher level: if two different nodes try to
assemble the same RAID5 array then you already potentially have a problem.
You really want some sensible cluster co-ordinator and let it make these
decisions. Hoping that a block device can be a reliable semaphore seems ...
misguided.
NeilBrown