[dm-devel] [PATCH] dm-throttle: new device mapper target to throttle reads and writes
Heinz Mauelshagen
heinzm at redhat.com
Thu Aug 12 09:08:09 UTC 2010
On Tue, 2010-08-10 at 10:44 -0400, Vivek Goyal wrote:
> On Tue, Aug 10, 2010 at 03:42:22PM +0200, Heinz Mauelshagen wrote:
> >
> > This is a new device mapper "throttle" target which allows for
> > throttling reads and writes (ie. enforcing throughput limits) in units
> > of kilobytes per second.
> >
>
> Hi Heinz,
>
> How about extending this stuff to handle cgroups also? So instead of
> having a device-wide throttling policy, we throttle cgroups. That will
> be a much more useful thing and will serve well the use case of throttling
> virtual machines in cgroups.
Hi Vivek,
that needs a serious design discussion, but I think we could leverage it to
allow for throttling of cgroups.
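
To make clear what there is to leverage: each throttle target currently keeps one accounting window per direction, as in throttle() in the patch below. Here is a rough user-space model of that window for discussion. It is only a sketch: struct window, window_throttle() and TICKS_PER_SEC are made-up names standing in for struct ac_rw, throttle() and HZ, and there is no locking or sleeping in it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of one direction's accounting window. */
struct window {
	uint64_t end_ticks;	/* Tick at which the current window expires. */
	uint32_t size;		/* Bytes accounted in the current window. */
	bool throttled;		/* Budget exhausted until the window expires. */
};

enum { TICKS_PER_SEC = 1000 };	/* Stand-in for HZ; the value is arbitrary. */

/*
 * Return true if an I/O of @bytes has to wait, false if it may pass.
 * @bps is the bytes-per-second limit; 0 means "no throttling".
 */
static bool window_throttle(struct window *w, uint64_t now,
			    uint32_t bps, uint32_t bytes)
{
	if (!bps)
		return false;

	if (now > w->end_ticks) {
		/* Window expired: start a fresh one-second window. */
		w->size = 0;
		w->end_ticks = now + TICKS_PER_SEC;
		w->throttled = false;
	} else if (w->throttled) {
		/* Budget already used up in this window. */
		return true;
	}

	if ((uint64_t)w->size + bytes > bps) {
		/* This I/O would exceed the budget: throttle. */
		w->throttled = true;
		return true;
	}

	w->size += bytes;
	return false;
}
```

A caller that gets true back would sleep until end_ticks and retry, which is what throttle_map() does with schedule_timeout_uninterruptible(). One thing the model makes visible: a single I/O larger than the per-second budget never passes, so that case looks worth checking in the real code.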
>
> Yesterday I had raised the issue of cgroup IO bandwidth throttling at
> the Linux Storage and Filesystem session. I thought that a device mapper
> target would be the easiest thing to do because I can make use of lots
> of existing infrastructure.
>
> Christoph did not like it because of configuration concerns. He preferred
> something in block layer/request queue. It was also hinted that there
> were some ideas floating around for better integration of device mapper
> infrastructure with request queue and this thing should go behind that.
Right, if a block layer change of that kind is pending, we should
wait for it to settle.
> But the problem is I am not sure how long it is going to take before
> this new infrastructure becomes a reality and it will not be practical
> to wait for that.
Did any reliable plans come out of the discussion or will there be any
in the near future?
>
> There is a possibility that we can put a hook in __make_request function
> and first take out all the bios and subject them to bandwidth limitation
> and then pass it to lower layers. But that will mean redoing lots of
> common infrastructure which has already been done. For example,
>
> - What happens to queue congestion semantics.
>
> - Request queue already has it based on requests and device mapper
> seems to have its own congestion functions.
Yes, dm does.
>
> - If I go for taking the bios out of the request queue and holding them
> back then I am not sure how to define congestion semantics.
> To keep congestion semantics simple, it would make sense to
> create a new request queue (with the help of dm target), and
> use that.
Yes, that's an obvious approach to stay with the same congestion
semantics.
>
> - I have yet to think through it but I think I will be doing other common
> operations like holding back requests in internal queues, dispatching
> these later with the help of a kernel thread, allowing some to dispatch
> immediately as these come in, putting processes to sleep and waking
> them later if we are already holding too many bios etc.
>
> To me it sounds like doing it is a lot simpler with the help of a device
> mapper target. Though the not so nice part is the need to configure
> another device mapper target on every block device we want to control.
Yes, we'd need identity mappings in the stack to be prepared.
Or we need some __generic_make_request() hack ala bcache to hijack the
request function on the fly.
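
By identity mapping I mean a table like the first example script in throttle.txt below. As a small illustration of what would have to be prepared per device, this user-space helper formats such a table line. The function name and the device path in the note after it are hypothetical; the table layout is the one from the documentation.

```c
#include <stdio.h>

/*
 * Format a one-line dm table for an identity "throttle" mapping covering
 * @sectors sectors of @dev_path, with both read and write limits set.
 * Returns the snprintf() result: the length needed, or negative on error.
 */
static int throttle_identity_table(char *buf, size_t len,
				   unsigned long long sectors,
				   unsigned read_kbs, unsigned write_kbs,
				   const char *dev_path)
{
	/* <start> <len> throttle <#params> <read kbs> <write kbs> <dev> <offset> */
	return snprintf(buf, len, "0 %llu throttle 2 %u %u %s 0",
			sectors, read_kbs, write_kbs, dev_path);
}
```

For a 1 GiB device at a hypothetical /dev/sdb (2097152 sectors) with 1 MB/s limits, the buffer would hold "0 2097152 throttle 2 1024 1024 /dev/sdb 0", ready to be piped into dmsetup create.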
>
> Christoph, would it make sense to go ahead with a device mapper target
> for now and convert it later, whenever the request queue and device mapper
> fusion thing happens? Or do you have other ideas which I have not been
> able to grasp?
Let's see what he wants to fill in.
Cheers,
Heinz
>
> Thanks
> Vivek
>
>
>
>
> > I've been using it for a while in testing configurations and think it's
> > valuable for many people requiring simulation of low bandwidth
> > interconnects or simulating different throughput characteristics on
> > distinct address segments of a device (eg. fast outer disk spindles vs.
> > slower inner ones).
> >
> > Please read Documentation/device-mapper/throttle.txt for how to use it.
> >
> > Note: this target can be combined with the "delay" target, which is
> > already upstream, in order to set io delays in addition to throttling,
> > again valuable for long distance transport simulations.
> >
> >
> > This target should stay separate rather than merged IMO, because it
> > basically serves testing purposes and hence should not complicate any
> > production mapping target. A potential merge with the "delay" target is
> > subject to discussion.
> >
> >
> > Signed-off-by: Heinz Mauelshagen <heinzm at redhat.com>
> >
> > Documentation/device-mapper/throttle.txt | 68 ++++++
> > drivers/md/Kconfig | 8 +
> > drivers/md/Makefile | 1 +
> > drivers/md/dm-throttle.c | 389 ++++++++++++++++++++++++++++++
> > 4 files changed, 466 insertions(+), 0 deletions(-)
> >
> > diff --git a/Documentation/device-mapper/throttle.txt b/Documentation/device-mapper/throttle.txt
> > new file mode 100644
> > index 0000000..9deea6e
> > --- /dev/null
> > +++ b/Documentation/device-mapper/throttle.txt
> > @@ -0,0 +1,68 @@
> > +dm-throttle
> > +===========
> > +
> > +Device-Mapper's "throttle" target maps a linear range of the Device-Mapper
> > +device onto a linear range of another device providing the option to throttle
> > +read and write ios separately.
> > +
> > +This target provides the ability to simulate low bandwidth transports to
> > +devices or different throughput to separate address segments of a device.
> > +
> > +Parameters: <#variable params> <read kbs> <write kbs> <dev path> <offset>
> > + <#variable params> number of variable parameters to set read and
> > + write throttling kilobytes per second limits.
> > + Range: 0 - 2 with
> > + 0 = no throttling,
> > + 1 and <read kbs> = read throttling only and
> > + 2 and <read kbs> <write kbs> = read and write throttling.
> > + <read kbs> read kilobytes per second limit
> > + <write kbs> write kilobytes per second limit
> > + <dev path>: Full pathname to the underlying block-device, or a
> > + "major:minor" device-number.
> > + <offset>: Starting sector within the device.
> > +
> > +Throttling read and write values can be adjusted through the constructor
> > +by reloading a mapping table with the respective parameters or without
> > +reloading through the message interface:
> > +
> > +dmsetup message <mapped device name> <offset> read_kbs <read kbs>
> > +dmsetup message <mapped device name> <offset> write_kbs <write kbs>
> > +
> > +The target provides status information via its status interface:
> > +
> > +dmsetup status <mapped device name>
> > +
> > +Output includes the target version, the actual read and write kilobytes
> > +per second limits used, how many read and write ios have been processed,
> > +deferred and accounted for.
> > +
> > +Status can be reset without reloading the mapping table via the message
> > +interface as well:
> > +
> > +dmsetup message <mapped device name> <offset> stats reset
> > +
> > +
> > +Example scripts
> > +===============
> > +[[
> > +#!/bin/sh
> > +# Create an identity mapping for a device
> > +# setting 1MB/s read and write throttling
> > +echo "0 `blockdev --getsize $1` throttle 2 1024 1024 $1 0" | \
> > +dmsetup create throttle_identity
> > +]]
> > +
> > +[[
> > +#!/bin/sh
> > +# Set different throughput to first and second half of a device
> > +let size=`blockdev --getsize $1`/2
> > +echo "0 $size throttle 2 10480 8192 $1 0
> > +$size $size throttle 2 2048 1024 $1 $size" | \
> > +dmsetup create throttle_segmented
> > +]]
> > +
> > +[[
> > +#!/bin/sh
> > +# Change read throughput on 2nd segment of previous segmented mapping
> > +dmsetup message throttle_segmented $size read_kbs 4096
> > +]]
> > diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
> > index 4a6feac..9c3cbe0 100644
> > --- a/drivers/md/Kconfig
> > +++ b/drivers/md/Kconfig
> > @@ -313,6 +313,14 @@ config DM_DELAY
> >
> > If unsure, say N.
> >
> > +config DM_THROTTLE
> > + tristate "Throttling target (EXPERIMENTAL)"
> > + depends on BLK_DEV_DM && EXPERIMENTAL
> > + ---help---
> > +
> > + A target that supports device throughput throttling
> > + with bandwidth selection for reads and writes.
> > +
> > config DM_UEVENT
> > bool "DM uevents (EXPERIMENTAL)"
> > depends on BLK_DEV_DM && EXPERIMENTAL
> > diff --git a/drivers/md/Makefile b/drivers/md/Makefile
> > index e355e7f..6ea2598 100644
> > --- a/drivers/md/Makefile
> > +++ b/drivers/md/Makefile
> > @@ -37,6 +37,7 @@ obj-$(CONFIG_BLK_DEV_MD) += md-mod.o
> > obj-$(CONFIG_BLK_DEV_DM) += dm-mod.o
> > obj-$(CONFIG_DM_CRYPT) += dm-crypt.o
> > obj-$(CONFIG_DM_DELAY) += dm-delay.o
> > +obj-$(CONFIG_DM_THROTTLE) += dm-throttle.o
> > obj-$(CONFIG_DM_MULTIPATH) += dm-multipath.o dm-round-robin.o
> > obj-$(CONFIG_DM_MULTIPATH_QL) += dm-queue-length.o
> > obj-$(CONFIG_DM_MULTIPATH_ST) += dm-service-time.o
> > diff --git a/drivers/md/dm-throttle.c b/drivers/md/dm-throttle.c
> > new file mode 100644
> > index 0000000..bc000d0
> > --- /dev/null
> > +++ b/drivers/md/dm-throttle.c
> > @@ -0,0 +1,389 @@
> > +/*
> > + * Copyright (C) 2010 Red Hat GmbH
> > + *
> > + * Module Author: Heinz Mauelshagen <heinzm at redhat.com>
> > + *
> > + * This file is released under the GPL.
> > + *
> > + * Test target to stack on top of arbitrary other block
> > + * device to throttle io in units of kilobytes per second.
> > + *
> > + * Throttling is configurable separately for reads and writes
> > + * via the constructor and the message interfaces.
> > + */
> > +
> > +#include "dm.h"
> > +#include <linux/slab.h>
> > +
> > +static const char *version = "1.0";
> > +
> > +#define DM_MSG_PREFIX "dm-throttle"
> > +#define TI_ERR_RET(str, ret) \
> > + do { ti->error = DM_MSG_PREFIX ": " str; return ret; } while (0)
> > +#define TI_ERR(str) TI_ERR_RET(str, -EINVAL)
> > +
> > +/* Statistics for target status output (see throttle_status()). */
> > +struct stats {
> > + atomic_t accounted[2];
> > + atomic_t deferred_io[2];
> > + atomic_t io[2];
> > +};
> > +
> > +/* Reset statistics variables. */
> > +static void stats_reset(struct stats *stats)
> > +{
> > + int i = 2;
> > +
> > + while (i--) {
> > + atomic_set(&stats->accounted[i], 0);
> > + atomic_set(&stats->deferred_io[i], 0);
> > + atomic_set(&stats->io[i], 0);
> > + }
> > +}
> > +
> > +/* Throttle context. */
> > +struct throttle_c {
> > + /* Device to throttle. */
> > + struct {
> > + struct dm_dev *dev;
> > + sector_t start;
> > + } dev;
> > +
> > + /* ctr parameters. */
> > + struct params {
> > + unsigned bs[2]; /* Bytes per second. */
> > + unsigned kbs_ctr[2]; /* To save kb/s constructor args. */
> > + unsigned params; /* # of variable parameters. */
> > + } params;
> > +
> > + struct account {
> > + /* Accounting for reads and writes. */
> > + struct ac_rw {
> > + struct mutex mutex;
> > +
> > + unsigned long end_jiffies;
> > + unsigned size;
> > + } rw[2];
> > +
> > + unsigned long flags;
> > + } account;
> > +
> > + struct stats stats;
> > +};
> > +
> > +/* Return bytes/s value for kilobytes/s. */
> > +static inline unsigned to_bs(unsigned kbs)
> > +{
> > + return kbs << 10;
> > +}
> > +
> > +static inline unsigned to_kbs(unsigned bs)
> > +{
> > + return bs >> 10;
> > +}
> > +
> > +/* Reset account. */
> > +static void account_reset(int rw, struct throttle_c *tc)
> > +{
> > + struct account *ac = &tc->account;
> > + struct ac_rw *ac_rw = ac->rw + rw;
> > +
> > + ac_rw->size = 0;
> > + ac_rw->end_jiffies = jiffies + HZ;
> > + clear_bit(rw, &ac->flags);
> > + smp_wmb();
> > +}
> > +
> > +/* Decide about throttling (ie. deferring bios). */
> > +static int throttle(struct throttle_c *tc, struct bio *bio)
> > +{
> > + int rw = (bio_data_dir(bio) == WRITE);
> > + unsigned bps; /* Bytes per second. */
> > +
> > + smp_rmb();
> > + bps = tc->params.bs[rw];
> > + if (bps) {
> > + unsigned size;
> > + struct account *ac = &tc->account;
> > + struct ac_rw *ac_rw = ac->rw + rw;
> > +
> > + if (time_after(jiffies, ac_rw->end_jiffies))
> > + /* Measure time exceeded. */
> > + account_reset(rw, tc);
> > + else if (test_bit(rw, &ac->flags))
> > + /* In case we're throttled already. */
> > + return 1;
> > +
> > + /* Account I/O size. */
> > + size = ac_rw->size + bio->bi_size;
> > + if (size > bps) {
> > + /* Hit kilobytes per second threshold. */
> > + set_bit(rw, &ac->flags);
> > + return 1;
> > + } else {
> > + ac_rw->size = size;
> > + smp_wmb();
> > + }
> > +
> > + atomic_inc(tc->stats.accounted + rw); /* Statistics. */
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +/*
> > + * Destruct a throttle mapping.
> > + */
> > +static void throttle_dtr(struct dm_target *ti)
> > +{
> > + struct throttle_c *tc = ti->private;
> > +
> > + if (tc->dev.dev)
> > + dm_put_device(ti, tc->dev.dev);
> > +
> > + kfree(tc);
> > +}
> > +
> > +/* Check @arg to be >= @min && <= @max. */
> > +static inline int range_ok(int arg, int min, int max)
> > +{
> > + return !(arg < min || arg > max);
> > +}
> > +
> > +/* Return "write" or "read" string for @write */
> > +static const char *rw_str(int write)
> > +{
> > + return write ? "write" : "read";
> > +}
> > +
> > +/*
> > + * Construct a throttle mapping:
> > + *
> > + * <start> <len> throttle \
> > + * #throttle_params <throttle_params> \
> > + * orig_dev_name orig_dev_start
> > + *
> > + * #throttle_params = 0 - 2
> > + * throttle_params = [read_kbs [write_kbs]]
> > + *
> > + */
> > +static int throttle_ctr(struct dm_target *ti, unsigned argc, char **argv)
> > +{
> > + int i, kbs[] = { 0, 0 }, r, throttle_params;
> > + unsigned long long tmp;
> > + sector_t start;
> > + struct throttle_c *tc;
> > + struct params *params;
> > +
> > + if (!range_ok(argc, 3, 5))
> > + TI_ERR("Invalid argument count");
> > +
> > + /* Get #throttle_params. */
> > + if (sscanf(argv[0], "%d", &throttle_params) != 1 ||
> > + !range_ok(throttle_params, 0, 2))
> > + TI_ERR("Invalid throttle parameter number argument");
> > +
> > + /* Handle any variable throttle parameters. */
> > + for (i = 0; i < throttle_params; i++) {
> > + /* Get throttle read/write kilobytes per second. */
> > + if (sscanf(argv[i + 1], "%d", kbs + i) != 1 || kbs[i] < 0) {
> > + static char msg[60];
> > +
> > + snprintf(msg, sizeof(msg),
> > + "Invalid throttle %s kilobytes per second",
> > + rw_str(i));
> > + ti->error = msg;
> > + return -EINVAL;
> > + }
> > + }
> > +
> > + if (sscanf(argv[2 + throttle_params], "%llu", &tmp) != 1)
> > + TI_ERR("Invalid throttle device offset");
> > +
> > + start = tmp;
> > +
> > + /* Allocate throttle context. */
> > + tc = ti->private = kzalloc(sizeof(*tc), GFP_KERNEL);
> > + if (!tc)
> > + TI_ERR_RET("Cannot allocate throttle context", -ENOMEM);
> > +
> > + /* Acquire throttle device. */
> > + r = dm_get_device(ti, argv[1 + throttle_params],
> > + dm_table_get_mode(ti->table), &tc->dev.dev);
> > + if (r) {
> > + DMERR("Throttle device lookup failed");
> > + goto err;
> > + }
> > +
> > + /* Check throttled device length. */
> > + if (ti->len >
> > + i_size_read(tc->dev.dev->bdev->bd_inode) >> SECTOR_SHIFT) {
> > + DMERR("Throttled device too small for mapping");
> > + goto err;
> > + }
> > +
> > + tc->dev.start = start;
> > + params = &tc->params;
> > + params->params = throttle_params;
> > +
> > + i = ARRAY_SIZE(kbs);
> > + while (i--) {
> > + params->kbs_ctr[i] = kbs[i];
> > + params->bs[i] = to_bs(kbs[i]);
> > + mutex_init(&tc->account.rw[i].mutex);
> > + }
> > +
> > + stats_reset(&tc->stats);
> > + return 0;
> > +err:
> > + throttle_dtr(ti);
> > + return -EINVAL;
> > +}
> > +
> > +/* Map a throttle io. */
> > +static int throttle_map(struct dm_target *ti, struct bio *bio,
> > + union map_info *map_context)
> > +{
> > + int r, rw = (bio_data_dir(bio) == WRITE);
> > + struct throttle_c *tc = ti->private;
> > + struct ac_rw *ac_rw = tc->account.rw + rw;
> > +
> > + mutex_lock(&ac_rw->mutex);
> > + do {
> > + r = throttle(tc, bio);
> > + if (r) {
> > + unsigned long end = ac_rw->end_jiffies, j = jiffies;
> > +
> > + /* Wait till next second when KB/s reached. */
> > + if (time_before(j, end))
> > + schedule_timeout_uninterruptible(end - j);
> > + }
> > + } while (r);
> > +
> > + mutex_unlock(&ac_rw->mutex);
> > +
> > + /* Remap. */
> > + bio->bi_bdev = tc->dev.dev->bdev;
> > + bio->bi_sector = bio->bi_sector - ti->begin + tc->dev.start;
> > +
> > + atomic_inc(&tc->stats.io[rw]); /* Statistics */
> > + return 1; /* Done with the bio; let dm core submit it. */
> > +}
> > +
> > +/* Message method. */
> > +static int throttle_message(struct dm_target *ti, unsigned argc, char **argv)
> > +{
> > + int kbs, rw;
> > + struct throttle_c *tc = ti->private;
> > +
> > + if (argc == 2) {
> > + if (!strcmp(argv[0], "stats") &&
> > + !strcmp(argv[1], "reset")) {
> > + /* Reset statistics. */
> > + stats_reset(&tc->stats);
> > + goto out;
> > + } else if (!strcmp(argv[0], "read_kbs"))
> > + /* Adjust read kilobytes per second. */
> > + rw = 0;
> > + else if (!strcmp(argv[0], "write_kbs"))
> > + /* Adjust write kilobytes per second. */
> > + rw = 1;
> > + else
> > + goto err;
> > +
> > + /* Read r/w kbs parameter. */
> > + if (sscanf(argv[1], "%d", &kbs) != 1 || kbs < 0) {
> > + DMWARN("Unrecognised throttle %s_kbs parameter.",
> > + rw_str(rw));
> > + return -EINVAL;
> > + }
> > +
> > + /* Update settings. */
> > + mutex_lock(&tc->account.rw[rw].mutex);
> > + tc->params.bs[rw] = to_bs(kbs);
> > + account_reset(rw, tc);
> > + mutex_unlock(&tc->account.rw[rw].mutex);
> > +out:
> > + return 0;
> > + }
> > +err:
> > + DMWARN("Unrecognised throttle message received.");
> > + return -EINVAL;
> > +}
> > +
> > +/* Status output method. */
> > +static int throttle_status(struct dm_target *ti, status_type_t type,
> > + char *result, unsigned maxlen)
> > +{
> > + ssize_t sz = 0;
> > + struct throttle_c *tc = ti->private;
> > + struct stats *s = &tc->stats;
> > + struct params *p = &tc->params;
> > +
> > + switch (type) {
> > + case STATUSTYPE_INFO:
> > + DMEMIT("v=%s rkb=%u wkb=%u r=%u w=%u rd=%u wd=%u "
> > + "acr=%u acw=%u",
> > + version,
> > + to_kbs(p->bs[0]), to_kbs(p->bs[1]),
> > + atomic_read(s->io), atomic_read(s->io + 1),
> > + atomic_read(s->deferred_io),
> > + atomic_read(s->deferred_io + 1),
> > + atomic_read(s->accounted),
> > + atomic_read(s->accounted + 1));
> > + break;
> > +
> > + case STATUSTYPE_TABLE:
> > + DMEMIT("%u", p->params);
> > +
> > + if (p->params) {
> > + DMEMIT(" %u", p->kbs_ctr[0]);
> > +
> > + if (p->params > 1)
> > + DMEMIT(" %u", p->kbs_ctr[1]);
> > + }
> > +
> > + DMEMIT(" %s %llu",
> > + tc->dev.dev->name,
> > + (unsigned long long) tc->dev.start);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static struct target_type throttle_target = {
> > + .name = "throttle",
> > + .version = {1, 0, 0},
> > + .module = THIS_MODULE,
> > + .ctr = throttle_ctr,
> > + .dtr = throttle_dtr,
> > + .map = throttle_map,
> > + .message = throttle_message,
> > + .status = throttle_status,
> > +};
> > +
> > +int __init dm_throttle_init(void)
> > +{
> > + int r = dm_register_target(&throttle_target);
> > +
> > + if (r)
> > + DMERR("Failed to register %s [%d]", DM_MSG_PREFIX, r);
> > + else
> > + DMINFO("registered %s %s", DM_MSG_PREFIX, version);
> > +
> > + return r;
> > +}
> > +
> > +void dm_throttle_exit(void)
> > +{
> > + dm_unregister_target(&throttle_target);
> > + DMINFO("unregistered %s %s", DM_MSG_PREFIX, version);
> > +}
> > +
> > +/* Module hooks */
> > +module_init(dm_throttle_init);
> > +module_exit(dm_throttle_exit);
> > +
> > +MODULE_DESCRIPTION(DM_NAME " throttle target");
> > +MODULE_AUTHOR("Heinz Mauelshagen <heinzm at redhat.com>");
> > +MODULE_LICENSE("GPL");
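
For anyone who wants to script against the INFO status line emitted by throttle_status() above, here is a minimal user-space parsing sketch. It assumes the caller has already stripped the "<start> <length> throttle" prefix that dmsetup status prints in front of the target-specific part; struct throttle_status and throttle_status_parse() are illustrative names, not part of the patch.

```c
#include <stdio.h>

/* Fields of the target-specific INFO status string (illustrative struct). */
struct throttle_status {
	char version[16];	/* Target version string, e.g. "1.0". */
	unsigned rkb, wkb;	/* Current read/write limits in KB/s. */
	unsigned r, w;		/* Read/write ios processed. */
	unsigned rd, wd;	/* Read/write ios deferred. */
	unsigned acr, acw;	/* Read/write ios accounted. */
};

/* Parse the target-specific status string; return 0 on success, -1 otherwise. */
static int throttle_status_parse(const char *s, struct throttle_status *st)
{
	/* Mirrors the DMEMIT() format string used for STATUSTYPE_INFO. */
	int n = sscanf(s,
		       "v=%15s rkb=%u wkb=%u r=%u w=%u rd=%u wd=%u acr=%u acw=%u",
		       st->version, &st->rkb, &st->wkb, &st->r, &st->w,
		       &st->rd, &st->wd, &st->acr, &st->acw);

	return n == 9 ? 0 : -1;
}
```

All nine fields have to match for the parse to succeed, so a truncated or reordered status line is rejected rather than half-filled.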