[dm-devel] Newbie device mapper questions

Tue Jun 16 18:54:33 UTC 2015

On 15.06.2015 21:52, Doug Dumitru wrote:

>> Sounds pretty easy and I also got surprisingly far with my little kernel
>> module. I've so far implemented ctr, dtr, map and status.
> 
> Congratulations, you are actually a long way there.

Thanks but I think I have the mountain still ahead -- still, I would
really like to figure out the nitty-gritty.

> You have to allocate a bio, populate it, allocate pages for buffer,
> populate the bvec, and call make_request (or generic make request).  You
> will get the completion from the bio on the bottom half of the interrupt
> handler, so how much work you can do there is debatable.  You cannot start
> an new IO from there, which you need to.  You will probably want to start a
> helper thread and have the completion routine schedule itself onto your
> thread.  Once you are back on your thread, you can do just about anything.
> 
> Because you need to do IO, you will not be able to do a simple bio "bounce
> redirect".  You will need to do the IO youself (ie, call another make
> request), but you can use the callers bvec for this, so there is no data
> copy required.  Once the request completes, you can then fin the caller.

Oh, wow. This sounds truly terrifying. Let's dive in!

I tried to read your hints one word at a time. So here's the somewhat
pseudocodish solution to my homework:

struct bio *b = bio_alloc(GFP_NOIO, 1);
b->bi_size = 8;
bio_alloc_pages(b, GFP_NOIO);
b->bi_sector = 1234;
b->bi_bdev = lc->metadev->bdev;
b->bi_rw = READ;
b->bi_private = local_ctx;
b->bi_end_io = read_complete_callback;
generic_make_request(bi);

static void read_complete_callback(struct bio *b, int error) {
  // ???
  printk(KERN_INFO "First read byte: %02x\n",
     b->bi_io_vec[0]->bv_page[0]);
}

So I hope this is even remotely close to what I should end up with.

This will alloc a new bio with, as I understand it, one page buffer in
b->bi_io_vec. This buffer is then allocated with bio_alloc_pages to 8
sectors in size (i.e. exactly one page of 4096 bytes). Then the read
address, block device and read mode is set. I pass some kind of local
context so I can do something meaningful in the callback and specify the
callback function. Then I execute the request.

As I understand, this executes asynchronously. So here comes the
threading into play, right? Just pseudocode (because I can't judge how
far I'm off here), but let's say this is map():

void read_complete_callback() {
    semaphore_inc(local_ctx);
}

void map() {
   local_ctx->semaphore->value = 0;

   // Issue read as above
   generic_make_request(bi);

   semaphore_dec(&local_ctx->semaphore);

   // Now the concurrent async IO has finished and we interpret the data
   [...]
}

Oh boy I really don't know if this is even remotely close. Any hints, as
easy as they may seem to you guys, are really greatly appreciated. I've
never worked with this stuff.

> If you cannot continue because devices are not present or the right size,
> yes you should fail the ctr routine.

Alright!

> If you want to setup /proc or other monitoring stuff, you can use the init
> routine, probably plus some statics, to setup "views" into your module.  If
> you want to support multiple instances (and you should), setup a
> /proc/{yourname} directory on the init and then populate it with
> sub-directories every time you create a device.

Okay, I'll try to do this (want to make statistics available via procfs
later on), but one construction site at a time for me.

>> - Can I determine the size the bio in map() will have already in ctr()
>> somehow? Can I assume it will never change if it was once determined?
>> The reason is that for my example I need to make sure the chunk size is
>> a integer multiple of the bio size and I would only like to check this
>> once (in ctr) and not every time (in map).
> 
> Block size will not change.  The size of requests to you is limited by the
> setup of ti->max_io_len.  If you don't set this with recent kernels, you
> will only get 4K, which is not all that efficient.  This is actually part
> of another big topic of "stacked limits", which someone could write a book
> on (and I would read it).

So if I would want to do a large I/O operation (say write one megabyte
of data to a block device somewhere within my driver) I'd have to make
lots of calls to generic_make_request?

Thank you so much for your help,
Best regards,
Johannes