[dm-devel] [PATCH 0/8] dm: request-based dm-multipath
Kiyoshi Ueda
k-ueda at ct.jp.nec.com
Fri Jan 30 08:05:26 UTC 2009
Hi Hannes,
Thank you for the comments and patches.
See my comments below.
On 01/29/2009 07:41 PM +0900, Hannes Reinecke wrote:
> On Thu, Jan 29, 2009 at 04:18:59PM +0900, Kiyoshi Ueda wrote:
>> Hi Alasdair,
>>
>> On 01/29/2009 12:40 AM +0900, Alasdair G Kergon wrote:
>>> On Fri, Oct 03, 2008 at 11:08:25AM -0400, Kiyoshi Ueda wrote:
>>>> This patch-set is the updated version of request-based dm-multipath.
>>>> The changes from the previous version (*) are to follow up the change
>>>> of the interface to export lld's busy state (PATCH 5).
>>> I've fixed them up to compile again, but haven't thoroughly checked for
>>> side-effects.
>> I found some problems below in my patches and now considering
>> how to fix them:
>> o Unnecessary EIO is returned to application if it submits
>> a bio during table swapping.
> Yes, I've noticed that. Problem is this:
>
> --- linux-2.6.27.orig/drivers/md/dm.c
> +++ linux-2.6.27/drivers/md/dm.c
> @@ -1304,7 +1304,11 @@ static int dm_make_request(struct reques
> return 0;
> }
>
> - if (unlikely(!md->map)) {
> + /*
> + * Submitting to a stopped queue with no map is okay;
> + * might happen during reconfiguration.
> + */
> + if (unlikely(!md->map) && !blk_queue_stopped(q)) {
> bio_endio(bio, -EIO);
> return 0;
> }
>
> The make_request callback should never return EIO if there's any
> chance at all to get this request done.
Exactly this part has a race condition with table swapping.
Although you patch fixes the race condition, I think I need
more careful consideration here.
(e.g. Is it really OK to go bios through __make_request() with
old queue restrictions during table swapping?)
I'll work on this problem after my vacation.
>> o kernel panic occurs by frequent table swapping during heavy I/Os.
>>
> That's probably fixed by this patch:
>
> --- linux-2.6.27/drivers/md/dm.c.orig 2009-01-23 15:59:22.741461315 +0100
> +++ linux-2.6.27/drivers/md/dm.c 2009-01-26 09:03:02.787605723 +0100
> @@ -714,13 +714,14 @@ static void free_bio_clone(struct reques
> struct dm_rq_target_io *tio = clone->end_io_data;
> struct mapped_device *md = tio->md;
> struct bio *bio;
> - struct dm_clone_bio_info *info;
>
> while ((bio = clone->bio) != NULL) {
> clone->bio = bio->bi_next;
>
> - info = bio->bi_private;
> - free_bio_info(md, info);
> + if (bio->bi_private) {
> + struct dm_clone_bio_info *info = bio->bi_private;
> + free_bio_info(md, info);
> + }
>
> bio->bi_private = md->bs;
> bio_put(bio);
>
> The info field is not necessarily filled here, so we have to check for it
> explicitly.
>
> With these two patches request-based multipathing have survived all stress-tests
> so far. Except on mainframe (zfcp), but that's more a driver-related thing.
Hmm, it seems I'm hitting a different problem from the one you hit,
because I still see my problem even with your patch.
I need more investigation after my vacation.
Thanks,
Kiyoshi Ueda
More information about the dm-devel
mailing list