[dm-devel] dm mirror: fix crash caused by NULL-pointer dereference
Eric Ren
zren at suse.com
Mon Jun 26 15:27:56 UTC 2017
Hi Mike,
On 06/26/2017 10:37 PM, Mike Snitzer wrote:
> On Mon, Jun 26 2017 at 9:47am -0400,
> Eric Ren <zren at suse.com> wrote:
> [...snip...]
>>> """
>>> Revert "dm mirror: use all available legs on multiple failures"
>>> dm io: fix duplicate bio completion due to missing ref count
>> I have a confusion about this "dm io..." fix. The fix itself is good.
>>
>> Without it, a mkfs.ext4 on a mirrored dev whose primary mirror dev
>> has failed, will crash the kernel with the discard operation from mkfs.ext4.
>> However, mkfs.ext4 can succeed on a healthy mirrored device. This
>> is the thing I don't understand, because no matter the mirrored device is
>> good or not, there's always a duplicate bio completion before having this
>> this fix, thus write_callback() will be called twice, crashing will
>> occur on the
>> second write_callback():
> No, there is only a duplicate bio completion if the error path is taken
> (e.g. underlying device doesn't support discard).
Hmm, when "op == REQ_OP_DISCARD", please see comments in do_region():
"""
static void do_region(int op, int op_flags, unsigned region,
struct dm_io_region *where, struct dpages *dp,
struct io *io)
{
...
if (op == REQ_OP_DISCARD)
special_cmd_max_sectors = q->limits.max_discard_sectors;
...
if ((op == REQ_OP_DISCARD || op == REQ_OP_WRITE_ZEROES ||
op == REQ_OP_WRITE_SAME) && special_cmd_max_sectors == 0) {
atomic_inc(&io->count); ===> [1]
dec_count(io, region, -EOPNOTSUPP); ===> [2]
return;
}
"""
[1] ref count fixed by patch "dm io: ...";
[2] we won't come here if "special_cmd_max_sectors != 0", which is true
when both sides
of the mirror is good.
So only when a mirror device fails, "max_discard_sectors" on its queue
is 0, thus this error path
will be taken, right?
>
>> """
>> static void write_callback(unsigned long error, void *context)
>> {
>> unsigned i;
>> struct bio *bio = (struct bio *) context;
>> struct mirror_set *ms;
>> int should_wake = 0;
>> unsigned long flags;
>>
>> ms = bio_get_m(bio)->ms; ====> NULL pointer at the
>> duplicate completion
>> bio_set_m(bio, NULL);
>> """
>>
>> If no this fix, I expected the DISCARD IO would always crash the
>> kernel, but it's not true when
>> the mirrored device is good. Hope someone happen to know the reason
>> can give some hints ;-P
> If the mirror is healthy then only one completion is returned to
> dm-mirror (via write_callback). The problem was the error patch wasn't
> managing the reference count as needed. Whereas dm-io's normal discard
> IO path does.
Yes, I can understand this.
Thanks a lot,
Eric
>
> Mike
>
> --
> dm-devel mailing list
> dm-devel at redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
>
More information about the dm-devel
mailing list