[dm-devel] [PATCH] md: submit MMP reads REQ_SYNC to bypass RAID5 cache

James Simmons uja.ornl at gmail.com
Mon Nov 3 21:01:10 UTC 2014


Hello.

This is a patch against the latest kernel source, based on a patch
used by Lustre. The description below explains what we are trying to
achieve. I would like feedback on whether this is the right approach.

----------------------------------------------------------------------

Reads of the ext4 MMP (multi-mount protection) block must always
fetch fresh data from the underlying disk.  Otherwise, if a remote
node is updating the MMP block while the local node's reads are being
served from the MD RAID5 stripe cache, the local node can incorrectly
decide that the remote node has died and allow the filesystem to be
mounted on two nodes at the same time.
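
For context, here is a hedged sketch (not part of the patch) of how a
filesystem's MMP read path could request an uncached read once the flag
exists.  The function name read_mmp_block_nocache is invented for
illustration, and the 3.x-era submit_bh(rw, bh) interface is assumed:

static int read_mmp_block_nocache(struct buffer_head *bh)
{
	lock_buffer(bh);
	bh->b_end_io = end_buffer_read_sync;
	get_bh(bh);
	/* REQ_NOCACHE asks MD RAID5 to reread the block from the
	 * member disk instead of serving it from the stripe cache */
	submit_bh(READ_SYNC | REQ_NOCACHE, bh);
	wait_on_buffer(bh);
	if (!buffer_uptodate(bh))
		return -EIO;
	return 0;
}

With this in place, each MMP sequence check sees the block as the
remote node last wrote it, rather than a possibly stale copy cached in
a stripe_head.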
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 9c66e59..11b749c 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2678,6 +2678,9 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, in
 		}
 		if (sector >= sh->dev[dd_idx].sector + STRIPE_SECTORS)
 			set_bit(R5_OVERWRITE, &sh->dev[dd_idx].flags);
+	} else if (bi->bi_rw & REQ_NOCACHE) {
+		/* force a read from the underlying disk if requested */
+		clear_bit(R5_UPTODATE, &sh->dev[dd_idx].flags);
 	}
 
 	pr_debug("added bi b#%llu to stripe s#%llu, disk %d.\n",
@@ -4740,6 +4743,9 @@ static void make_request(struct mddev *mddev, struct bio * bi)
 					 bi, 0);
 		bio_endio(bi, 0);
 	}
+
+	if (bi->bi_rw & REQ_NOCACHE)
+		md_wakeup_thread(mddev->thread);
 }
 
 static sector_t raid5_size(struct mddev *mddev, sector_t sectors, int raid_disks);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 445d592..6c329c9 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -165,6 +165,7 @@ enum rq_flag_bits {
 	__REQ_INTEGRITY,	/* I/O includes block integrity payload */
 	__REQ_FUA,		/* forced unit access */
 	__REQ_FLUSH,		/* request for cache flush */
+	__REQ_NOCACHE,		/* request bypasses any cache */
 
 	/* bio only flags */
 	__REQ_RAHEAD,		/* read ahead, can fail anytime */
@@ -239,6 +240,7 @@ enum rq_flag_bits {
 #define REQ_FLUSH_SEQ		(1ULL << __REQ_FLUSH_SEQ)
 #define REQ_IO_STAT		(1ULL << __REQ_IO_STAT)
 #define REQ_MIXED_MERGE		(1ULL << __REQ_MIXED_MERGE)
+#define REQ_NOCACHE		(1ULL << __REQ_NOCACHE)
 #define REQ_SECURE		(1ULL << __REQ_SECURE)
 #define REQ_PM			(1ULL << __REQ_PM)
 #define REQ_HASHED		(1ULL << __REQ_HASHED)
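
For readers following the raid5.c hunks: clearing R5_UPTODATE in
add_stripe_bio() makes the stripe look unsatisfiable from cache, so
when handle_stripe() later evaluates it, the pending read on a
not-uptodate device is scheduled against the member disk; the
md_wakeup_thread() call in make_request() kicks raid5d so a flagged
bio is processed promptly.  A hedged, simplified sketch of the
condition involved (the real logic lives in fetch_block(), and dev/s
come from the surrounding handle_stripe() context):

	if (!test_bit(R5_LOCKED, &dev->flags) &&
	    !test_bit(R5_UPTODATE, &dev->flags) &&
	    dev->toread) {
		/* not cached and a read is queued: go to disk */
		set_bit(R5_LOCKED, &dev->flags);
		set_bit(R5_Wantread, &dev->flags);
		s->locked++;
	}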

