[dm-devel] Re: [PATCH] dm-add-ioband.patch
Christoph Hellwig
hch at infradead.org
Fri Apr 3 14:08:41 UTC 2009
On Fri, Apr 03, 2009 at 06:53:42PM +0900, Ryo Tsuruta wrote:
> Hi Alasdair,
>
> The patches I posted to this ML are still not reflected in the dm
> quilt tree and are rejected when applied to the kernel, so I've attached
> dm-add-ioband.patch. Could you please replace the patch in the
> tree? The patch includes some changes that reflect
> Lindent's output and some points you suggested before.
>
> I would appreciate it if you could take a look, review this patch,
> and advise me about merging dm-ioband upstream.
Do we have any agreement on which I/O bandwidth controller we want to
merge? Personally I don't think a dm target is a good idea; this seems
like something we want to tie into the block layer directly, especially
when using cfq, so that it gets integrated into the scheduling decisions.
>
> Thanks,
> Ryo Tsuruta
>
> Signed-off-by: Ryo Tsuruta <ryov at valinux.co.jp>
> Signed-off-by: Hirokazu Takahashi <taka at valinux.co.jp>
>
> ---
> Documentation/device-mapper/ioband.txt | 976 ++++++++++++++++++++++++
> drivers/md/Kconfig | 13
> drivers/md/Makefile | 2
> drivers/md/dm-ioband-ctl.c | 1312 +++++++++++++++++++++++++++++++++
> drivers/md/dm-ioband-policy.c | 457 +++++++++++
> drivers/md/dm-ioband-type.c | 77 +
> drivers/md/dm-ioband.h | 186 ++++
> 7 files changed, 3023 insertions(+)
>
> Index: linux-2.6.29/Documentation/device-mapper/ioband.txt
> ===================================================================
> --- /dev/null
> +++ linux-2.6.29/Documentation/device-mapper/ioband.txt
> @@ -0,0 +1,976 @@
> + Block I/O bandwidth control: dm-ioband
> +
> + -------------------------------------------------------
> +
> + Table of Contents
> +
> + [1]What's dm-ioband all about?
> +
> + [2]Differences from the CFQ I/O scheduler
> +
> + [3]How dm-ioband works.
> +
> + [4]Setup and Installation
> +
> + [5]Getting started
> +
> + [6]Command Reference
> +
> + [7]Examples
> +
> +What's dm-ioband all about?
> +
> + dm-ioband is an I/O bandwidth controller implemented as a device-mapper
> + driver. Several jobs using the same block device have to share the
> + bandwidth of the device. dm-ioband gives bandwidth to each job in
> + proportion to its weight, which can be set separately for each job.
> +
> + A job is a group of processes with the same pid, pgrp or uid, or a
> + virtual machine such as a KVM or Xen guest. A job can also be a cgroup if
> + the bio-cgroup patch is applied; the patch can be found at
> + [8]http://people.valinux.co.jp/~ryov/bio-cgroup/.
> +
> + +------+ +------+ +------+ +------+ +------+ +------+
> + |cgroup| |cgroup| | the | | pid | | pid | | the | jobs
> + | A | | B | |others| | X | | Y | |others|
> + +--|---+ +--|---+ +--|---+ +--|---+ +--|---+ +--|---+
> + +--V----+---V---+----V---+ +--V----+---V---+----V---+
> + | group | group | default| | group | group | default| ioband groups
> + | | | group | | | | group |
> + +-------+-------+--------+ +-------+-------+--------+
> + | ioband1 | | ioband2 | ioband devices
> + +-----------|------------+ +-----------|------------+
> + +-----------V--------------+-------------V------------+
> + | | |
> + | sdb1 | sdb2 | block devices
> + +--------------------------+--------------------------+
> +
> +
> + --------------------------------------------------------------------------
> +
> +Differences from the CFQ I/O scheduler
> +
> + Dm-ioband allows flexible configuration of the bandwidth settings.
> +
> + Dm-ioband can work with any type of I/O scheduler, such as the NOOP
> + scheduler, which is often chosen for high-end storage, since it is
> + implemented outside the I/O scheduling layer. It allows both
> + partition-based bandwidth control and job-based control, where a job is a
> + group of processes. In addition, each block device can have its own
> + configuration to control its bandwidth.
> +
> + Meanwhile, the current implementation of the CFQ scheduler has 8 I/O
> + priority levels, and all jobs whose processes have the same I/O priority
> + share the bandwidth assigned to that level among them. Also, I/O priority
> + is an attribute of a process, so it affects all block devices
> + equally.
> +
> + --------------------------------------------------------------------------
> +
> +How dm-ioband works.
> +
> + Every ioband device has one ioband group, which by default is called the
> + default group.
> +
> + Ioband devices can also have extra ioband groups in them. Each ioband
> + group has a job to support and a weight. Proportional to the weight,
> + dm-ioband gives tokens to the group.
> +
> + A group passes on I/O requests that its job issues to the underlying
> + layer so long as it has tokens left, while requests are blocked if there
> + aren't any tokens left in the group. Tokens are refilled once all of the
> + groups that have requests on a given underlying block device have used up
> + their tokens.
> +
> + There are two policies for token consumption. One is that a token is
> + consumed for each I/O request. The other is that a token is consumed for
> + each I/O sector; for example, one I/O request that reads 4 Kbytes
> + (512 bytes * 8 sectors) consumes 8 tokens. A user can choose
> + either policy.
> +
> + With this approach, a job running on an ioband group with a large weight
> + is guaranteed a large share of the I/O bandwidth.
> +
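To make the token scheme above concrete, here is a minimal Python sketch of the two consumption policies. It only models the rules as the text describes them and is not taken from the dm-ioband code. The helper names tokens_for_groups and token_cost, the integer rounding, and the fixed 512-byte sector size are assumptions for illustration.

```python
SECTOR_SIZE = 512  # bytes per sector, as in the 4 Kbytes example above


def tokens_for_groups(weights, token_base):
    """Split token_base among groups in proportion to their weights.

    Integer division is an assumption; the real rounding may differ.
    """
    total = sum(weights.values())
    return {gid: token_base * w // total for gid, w in weights.items()}


def token_cost(request_bytes, policy):
    """Tokens consumed by one I/O request under the given policy."""
    if policy == "weight":          # one token per I/O request
        return 1
    if policy == "weight-iosize":   # one token per I/O sector
        return -(-request_bytes // SECTOR_SIZE)  # ceiling division
    raise ValueError("unknown policy: %s" % policy)


print(tokens_for_groups({"default": 40, "1000": 30, "2000": 20}, 90))
print(token_cost(4096, "weight"), token_cost(4096, "weight-iosize"))
```

Under "weight" a 4 Kbytes read costs one token, while under "weight-iosize" it costs eight, which matches the example in the text.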
> + --------------------------------------------------------------------------
> +
> +Setup and Installation
> +
> + Build a kernel with these options enabled:
> +
> + CONFIG_MD
> + CONFIG_BLK_DEV_DM
> + CONFIG_DM_IOBAND
> +
> +
> + If dm-ioband is compiled as a module, use modprobe to load it.
> +
> + # make modules
> + # make modules_install
> + # depmod -a
> + # modprobe dm-ioband
> +
> +
> + The "dmsetup targets" command shows all available device-mapper targets.
> + "ioband" and the version number are displayed when dm-ioband has been
> + loaded.
> +
> + # dmsetup targets | grep ioband
> + ioband v1.10.2
> +
> +
> + --------------------------------------------------------------------------
> +
> +Getting started
> +
> + The following is a brief description of how to control the I/O bandwidth of
> + disks. In this description, we'll take one disk with two partitions as an
> + example target.
> +
> + --------------------------------------------------------------------------
> +
> + Create and map ioband devices
> +
> + Create two ioband devices "ioband1" and "ioband2". "ioband1" is mapped
> + to "/dev/sda1" and has a weight of 40. "ioband2" is mapped to "/dev/sda2"
> + and has a weight of 10. "ioband1" can use 80% --- 40/(40+10)*100 --- of
> + the bandwidth of "/dev/sda" while "ioband2" can use 20%.
> +
> + # echo "0 $(blockdev --getsize /dev/sda1) ioband /dev/sda1 1 0 0 none" \
> + "weight 0 :40" | dmsetup create ioband1
> + # echo "0 $(blockdev --getsize /dev/sda2) ioband /dev/sda2 1 0 0 none" \
> + "weight 0 :10" | dmsetup create ioband2
> +
> +
> + If the commands are successful then the device files
> + "/dev/mapper/ioband1" and "/dev/mapper/ioband2" will have been created.
> +
> + --------------------------------------------------------------------------
> +
> + Additional bandwidth control
> +
> + In this example two extra ioband groups are created on "ioband1."
> +
> + First, set the ioband group type to "user." Next, create two ioband groups
> + with IDs 1000 and 2000. Then, give weights of 30 and 20 to the ioband
> + groups respectively.
> +
> + # dmsetup message ioband1 0 type user
> + # dmsetup message ioband1 0 attach 1000
> + # dmsetup message ioband1 0 attach 2000
> + # dmsetup message ioband1 0 weight 1000:30
> + # dmsetup message ioband1 0 weight 2000:20
> +
> +
> + Now the processes owned by uid 1000 can use 30% --- 30/(30+20+40+10)*100
> + --- of the bandwidth of "/dev/sda" when the processes issue I/O requests
> + through "ioband1." The processes owned by uid 2000 can use 20% of the
> + bandwidth likewise.
> +
> + Table 1. Weight assignments
> +
> + +----------------------------------------------------------------+
> + | ioband device | ioband group | ioband weight |
> + |---------------+--------------------------------+---------------|
> + | ioband1 | user id 1000 | 30 |
> + |---------------+--------------------------------+---------------|
> + | ioband1 | user id 2000 | 20 |
> + |---------------+--------------------------------+---------------|
> + | ioband1 | default group(the other users) | 40 |
> + |---------------+--------------------------------+---------------|
> + | ioband2 | default group | 10 |
> + +----------------------------------------------------------------+
> +
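The percentages in Table 1 follow directly from the weights. A small Python sketch of the arithmetic is below. It assumes that all four groups are busy at the same time and that ioband1 and ioband2 share one ioband device ID, as in the commands above. The function name bandwidth_shares is made up for this illustration.

```python
def bandwidth_shares(weights):
    """Return each group's share, in percent, of the total weight."""
    total = sum(weights.values())
    return {group: 100.0 * w / total for group, w in weights.items()}


# Weights from Table 1.
table1 = {
    "ioband1/uid 1000": 30,
    "ioband1/uid 2000": 20,
    "ioband1/default": 40,
    "ioband2/default": 10,
}

for group, share in bandwidth_shares(table1).items():
    print("%-18s %5.1f%%" % (group, share))
```

The same function applied to the weights 40 and 10 alone reproduces the 80%/20% split from the first step.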
> + --------------------------------------------------------------------------
> +
> + Remove the ioband devices
> +
> + Remove the ioband devices when no longer used.
> +
> + # dmsetup remove ioband1
> + # dmsetup remove ioband2
> +
> +
> + --------------------------------------------------------------------------
> +
> +Command Reference
> +
> + Create an ioband device
> +
> + SYNOPSIS
> +
> + dmsetup create IOBAND_DEVICE
> +
> + DESCRIPTION
> +
> + Create an ioband device with the given name IOBAND_DEVICE.
> + Generally, dmsetup reads a table from standard input. Each line of
> + the table specifies a single target and is of the form:
> +
> + start_sector num_sectors "ioband" device_file ioband_device_id \
> + io_throttle io_limit ioband_group_type policy token_base \
> + :weight [ioband_group_id:weight...]
> +
> +
> + start_sector, num_sectors
> +
> + The sector range of the underlying device that
> + dm-ioband maps.
> +
> + ioband
> +
> + Specify the string "ioband" as a target type.
> +
> + device_file
> +
> + Underlying device name.
> +
> + ioband_device_id
> +
> + The ID number for an ioband device. The same ID
> + must be set on all the ioband devices that share the
> + same bandwidth. This is useful for grouping devices
> + carved out of one disk drive, such as partitions of a
> + RAID drive or LVM striped logical volumes.
> +
> + io_throttle
> +
> + Dm-ioband starts to control the bandwidth when the
> + number of BIOs in progress exceeds this value. If 0
> + is specified, the default value is used. This setting
> + applies to all ioband devices that have the same ioband
> + device ID as the one specified by "ioband_device_id."
> +
> + io_limit
> +
> + Dm-ioband blocks all I/O requests for IOBAND_DEVICE
> + when the number of BIOs in progress exceeds this
> + value. If 0 is specified, the default value is used.
> + This setting applies to all ioband devices that have
> + the same ioband device ID as the one specified by
> + "ioband_device_id."
> +
> + ioband_group_type
> +
> + Specify how to evaluate the ioband group ID. The
> + type must be one of "none", "user", "gid", "pid" or
> + "pgrp." The type "cgroup" is enabled by applying the
> + bio-cgroup patch. Specify "none" if you don't need
> + any ioband groups other than the default ioband
> + group.
> +
> + policy
> +
> + Specify a bandwidth control policy. A user can
> + choose either policy "weight" or "weight-iosize."
> + This setting applies to all ioband devices that have
> + the same ioband device ID as the one specified by
> + "ioband_device_id."
> +
> + weight
> +
> + This policy controls bandwidth
> + in proportion to the
> + weight of each ioband group, based on the
> + number of I/O requests.
> +
> + weight-iosize
> +
> + This policy controls bandwidth
> + in proportion to the
> + weight of each ioband group, based on the
> + number of I/O sectors.
> +
> + token_base
> +
> + The number of tokens specified by token_base
> + is distributed to all ioband groups in
> + proportion to the weight of each ioband group.
> + If 0 is specified, the default value is used. This
> + setting applies to all ioband devices that have the
> + same ioband device ID as the one specified by
> + "ioband_device_id."
> +
> + ioband_group_id:weight
> +
> + Set the weight of the ioband group specified by
> + ioband_group_id. If ioband_group_id is omitted, the
> + weight is assigned to the default ioband group.
> +
> + EXAMPLE
> +
> + Create an ioband device with the following parameters:
> +
> + * Starting sector = "0"
> +
> + * The number of sectors = "$(blockdev --getsize /dev/sda1)"
> +
> + * Target type = "ioband"
> +
> + * Underlying device name = "/dev/sda1"
> +
> + * Ioband device ID = "128"
> +
> + * I/O throttle = "10"
> +
> + * I/O limit = "400"
> +
> + * Ioband group type = "user"
> +
> + * Bandwidth control policy = "weight"
> +
> + * Token base = "2048"
> +
> + * Weight for the default ioband group = "100"
> +
> + * Weight for the ioband group 1000 = "80"
> +
> + * Weight for the ioband group 2000 = "20"
> +
> + * Ioband device name = "ioband1"
> +
> + # echo "0 $(blockdev --getsize /dev/sda1) ioband" \
> + "/dev/sda1 128 10 400 user weight 2048 :100 1000:80 2000:20" \
> + | dmsetup create ioband1
> +
> +
> + Create two device groups (ID=1,2). The bandwidths of these
> + device groups will be individually controlled.
> +
> + # echo "0 $(blockdev --getsize /dev/sda1) ioband /dev/sda1 1" \
> + "0 0 none weight 0 :80" | dmsetup create ioband1
> + # echo "0 $(blockdev --getsize /dev/sda2) ioband /dev/sda2 1" \
> + "0 0 none weight 0 :20" | dmsetup create ioband2
> + # echo "0 $(blockdev --getsize /dev/sdb3) ioband /dev/sdb3 2" \
> + "0 0 none weight 0 :60" | dmsetup create ioband3
> + # echo "0 $(blockdev --getsize /dev/sdb4) ioband /dev/sdb4 2" \
> + "0 0 none weight 0 :40" | dmsetup create ioband4
> +
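Since the table line has many positional fields, a small helper can make the order explicit. The Python sketch below assembles a line in the field order given in the synopsis above. The function ioband_table_line and its keyword names are hypothetical and only mirror the documented parameter names. The sector count 32129937 is just a sample value standing in for the output of "blockdev --getsize".

```python
def ioband_table_line(num_sectors, device_file, device_id, *,
                      start_sector=0, io_throttle=0, io_limit=0,
                      group_type="none", policy="weight", token_base=0,
                      default_weight=100, group_weights=None):
    """Build one dm-ioband table line in the documented field order."""
    fields = [start_sector, num_sectors, "ioband", device_file, device_id,
              io_throttle, io_limit, group_type, policy, token_base,
              ":%d" % default_weight]          # weight of the default group
    for gid, weight in (group_weights or {}).items():
        fields.append("%s:%d" % (gid, weight))  # extra ioband groups
    return " ".join(str(f) for f in fields)


# Same parameters as the first EXAMPLE above.
line = ioband_table_line(32129937, "/dev/sda1", 128,
                         io_throttle=10, io_limit=400, group_type="user",
                         token_base=2048, default_weight=100,
                         group_weights={1000: 80, 2000: 20})
print(line)
```

The resulting string is what would be piped into "dmsetup create ioband1".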
> +
> + --------------------------------------------------------------------------
> +
> + Remove the ioband device
> +
> + SYNOPSIS
> +
> + dmsetup remove IOBAND_DEVICE
> +
> + DESCRIPTION
> +
> + Remove the specified ioband device IOBAND_DEVICE. All the ioband
> + groups attached to the ioband device are also removed
> + automatically.
> +
> + EXAMPLE
> +
> + Remove ioband device "ioband1."
> +
> + # dmsetup remove ioband1
> +
> +
> + --------------------------------------------------------------------------
> +
> + Set an ioband group type
> +
> + SYNOPSIS
> +
> + dmsetup message IOBAND_DEVICE 0 type TYPE
> +
> + DESCRIPTION
> +
> + Set an ioband group type of IOBAND_DEVICE. TYPE must be one of
> + "none", "user", "gid", "pid" or "pgrp." The type "cgroup" is
> + enabled by applying the bio-cgroup patch. Once the type is set,
> + new ioband groups can be created on IOBAND_DEVICE.
> +
> + EXAMPLE
> +
> + Set the ioband group type of ioband device "ioband1" to "user."
> +
> + # dmsetup message ioband1 0 type user
> +
> +
> + --------------------------------------------------------------------------
> +
> + Create an ioband group
> +
> + SYNOPSIS
> +
> + dmsetup message IOBAND_DEVICE 0 attach ID
> +
> + DESCRIPTION
> +
> + Create an ioband group and attach it to IOBAND_DEVICE. ID
> + specifies user-id, group-id, process-id or process-group-id
> + depending on the ioband group type of IOBAND_DEVICE.
> +
> + EXAMPLE
> +
> + Create an ioband group which consists of all processes with
> + user-id 1000 and attach it to ioband device "ioband1."
> +
> + # dmsetup message ioband1 0 type user
> + # dmsetup message ioband1 0 attach 1000
> +
> +
> + --------------------------------------------------------------------------
> +
> + Detach the ioband group
> +
> + SYNOPSIS
> +
> + dmsetup message IOBAND_DEVICE 0 detach ID
> +
> + DESCRIPTION
> +
> + Detach the ioband group specified by ID from ioband device
> + IOBAND_DEVICE.
> +
> + EXAMPLE
> +
> + Detach the ioband group with ID "2000" from ioband device
> + "ioband2."
> +
> + # dmsetup message ioband2 0 detach 2000
> +
> +
> + --------------------------------------------------------------------------
> +
> + Set bandwidth control policy
> +
> + SYNOPSIS
> +
> + dmsetup message IOBAND_DEVICE 0 policy POLICY
> +
> + DESCRIPTION
> +
> + Set a bandwidth control policy. A user can choose either policy
> + "weight" or "weight-iosize." This setting applies to all ioband
> + devices that have the same ioband device ID as IOBAND_DEVICE.
> +
> + weight
> +
> + This policy controls bandwidth in
> + proportion to the weight of each ioband group, based
> + on the number of I/O requests.
> +
> + weight-iosize
> +
> + This policy controls bandwidth in
> + proportion to the weight of each ioband group, based
> + on the number of I/O sectors.
> +
> + EXAMPLE
> +
> + Set bandwidth control policy of ioband devices which have the
> + same ioband device ID as "ioband1" to "weight-iosize."
> +
> + # dmsetup message ioband1 0 policy weight-iosize
> +
> +
> + --------------------------------------------------------------------------
> +
> + Set the weight of an ioband group
> +
> + SYNOPSIS
> +
> + dmsetup message IOBAND_DEVICE 0 weight VAL
> +
> + dmsetup message IOBAND_DEVICE 0 weight ID:VAL
> +
> + DESCRIPTION
> +
> + Set the weight of the ioband group specified by ID. Set the
> + weight of the default ioband group of IOBAND_DEVICE if ID isn't
> + specified.
> +
> + The following example means that "ioband1" can use 80% ---
> + 40/(40+10)*100 --- of the bandwidth of the underlying block device
> + while "ioband2" can use 20%.
> +
> + # dmsetup message ioband1 0 weight 40
> + # dmsetup message ioband2 0 weight 10
> +
> +
> + The following lines have the same effect as the above:
> +
> + # dmsetup message ioband1 0 weight 4
> + # dmsetup message ioband2 0 weight 1
> +
> +
> + VAL must be an integer larger than 0. The default value, which
> + is assigned to newly created ioband groups, is 100.
> +
> + EXAMPLE
> +
> + Set the weight of the default ioband group of "ioband1" to 40.
> +
> + # dmsetup message ioband1 0 weight 40
> +
> +
> + Set the weight of the ioband group of "ioband1" with ID "1000"
> + to 10.
> +
> + # dmsetup message ioband1 0 weight 1000:10
> +
> +
> + --------------------------------------------------------------------------
> +
> + Set the number of tokens
> +
> + SYNOPSIS
> +
> + dmsetup message IOBAND_DEVICE 0 token VAL
> +
> + DESCRIPTION
> +
> + The given number of tokens is distributed to all ioband groups
> + in proportion to the weight of each ioband group.
> + If 0 is specified, the default value is used. This setting applies
> + to all ioband devices that have the same ioband device ID as
> + IOBAND_DEVICE.
> +
> + EXAMPLE
> +
> + Set the number of tokens to 256.
> +
> + # dmsetup message ioband1 0 token 256
> +
> +
> + --------------------------------------------------------------------------
> +
> + Set a limit of how many tokens are carried over
> +
> + SYNOPSIS
> +
> + dmsetup message IOBAND_DEVICE 0 carryover VAL
> +
> + DESCRIPTION
> +
> + When dm-ioband refills an ioband group with tokens after
> + another ioband group has already been refilled several times, it
> + determines the number of tokens to refill by multiplying the
> + number of tokens given in one refill by the smaller of the number
> + of times the other group has been refilled and this limit. If 0 is
> + specified, the default value is used. This setting applies to all
> + ioband devices that have the same ioband device ID as
> + IOBAND_DEVICE.
> +
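The carryover rule is easier to follow as a formula. Here is a minimal Python sketch of the calculation as the description words it. It is an interpretation of the text, not the kernel implementation, and the names refill_amount, tokens_per_refill and other_refills are made up for this illustration.

```python
def refill_amount(tokens_per_refill, other_refills, carryover):
    """Tokens given to a group that is being refilled late.

    tokens_per_refill: tokens the group receives in one normal refill
    other_refills:     how many times the other group was already refilled
    carryover:         the limit set with the "carryover" message
    """
    return tokens_per_refill * min(other_refills, carryover)


# With a limit of 2, a group that missed five refills gets only two
# refills' worth of tokens.
print(refill_amount(100, 5, 2))
# A group that missed a single refill is not affected by the limit.
print(refill_amount(100, 1, 2))
```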
> + EXAMPLE
> +
> + Set a limit for "ioband1" to 2.
> +
> + # dmsetup message ioband1 0 carryover 2
> +
> +
> + --------------------------------------------------------------------------
> +
> + Set I/O throttling
> +
> + SYNOPSIS
> +
> + dmsetup message IOBAND_DEVICE 0 io_throttle VAL
> +
> + DESCRIPTION
> +
> + Dm-ioband starts to control the bandwidth when the number of
> + BIOs in progress exceeds this value. If 0 is specified, the
> + default value is used. This setting applies to all ioband devices
> + that have the same ioband device ID as IOBAND_DEVICE.
> +
> + EXAMPLE
> +
> + Set the I/O throttling value of "ioband1" to 16.
> +
> + # dmsetup message ioband1 0 io_throttle 16
> +
> +
> + --------------------------------------------------------------------------
> +
> + Set I/O limiting
> +
> + SYNOPSIS
> +
> + dmsetup message IOBAND_DEVICE 0 io_limit VAL
> +
> + DESCRIPTION
> +
> + Dm-ioband blocks all I/O requests for IOBAND_DEVICE when the
> + number of BIOs in progress exceeds this value. If 0 is specified,
> + the default value is used. This setting applies to all ioband
> + devices that have the same ioband device ID as IOBAND_DEVICE.
> +
> + EXAMPLE
> +
> + Set the I/O limiting value of "ioband1" to 128.
> +
> + # dmsetup message ioband1 0 io_limit 128
> +
> +
> + --------------------------------------------------------------------------
> +
> + Display settings
> +
> + SYNOPSIS
> +
> + dmsetup table --target ioband
> +
> + DESCRIPTION
> +
> + Display the current table of each ioband device. See the
> + "dmsetup create" command for information on the table format.
> +
> + EXAMPLE
> +
> + The following output shows the current table of "ioband1."
> +
> + # dmsetup table --target ioband
> + ioband1: 0 32129937 ioband 8:29 128 10 400 user weight \
> + 2048 :100 1000:80 2000:20
> +
> +
> + --------------------------------------------------------------------------
> +
> + Display Statistics
> +
> + SYNOPSIS
> +
> + dmsetup status --target ioband
> +
> + DESCRIPTION
> +
> + Display the statistics of all the ioband devices whose target
> + type is "ioband."
> +
> + The output format is as follows. The first five columns show:
> +
> + * ioband device name
> +
> + * logical start sector of the device (must be 0)
> +
> + * device size in sectors
> +
> + * target type (must be "ioband")
> +
> + * device group ID
> +
> + The remaining columns show the statistics of each ioband group
> + on the ioband device. Each group uses seven columns for its
> + statistics.
> +
> + * ioband group ID (-1 means default)
> +
> + * total read requests
> +
> + * delayed read requests
> +
> + * total read sectors
> +
> + * total write requests
> +
> + * delayed write requests
> +
> + * total write sectors
> +
> + EXAMPLE
> +
> + The following output shows the statistics of two ioband devices.
> + Ioband2 only has the default ioband group and ioband1 has three
> + (default, 1001, 1002) ioband groups.
> +
> + # dmsetup status
> + ioband2: 0 44371467 ioband 128 -1 143 90 424 122 78 352
> + ioband1: 0 44371467 ioband 128 -1 223 172 408 211 136 600 1001 \
> + 166 107 472 139 95 352 1002 211 146 520 210 147 504
> +
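The status line is dense, so a short parser may help when reading it. The Python sketch below splits one line of the output shown above into the documented columns. The function name parse_ioband_status and the dictionary keys are assumptions for illustration. It expects the backslash continuation to be joined into one line first.

```python
FIELDS = ("read_reqs", "delayed_read_reqs", "read_sectors",
          "write_reqs", "delayed_write_reqs", "write_sectors")


def parse_ioband_status(line):
    """Parse one "dmsetup status" line of an ioband device."""
    name, rest = line.split(":", 1)
    cols = rest.split()
    start, size, target, device_group = cols[:4]
    if target != "ioband":
        raise ValueError("not an ioband target: %s" % target)
    stats = cols[4:]
    if len(stats) % 7:
        raise ValueError("each group must have seven columns")
    groups = {}
    for i in range(0, len(stats), 7):
        gid = int(stats[i])  # -1 means the default group
        groups[gid] = dict(zip(FIELDS, map(int, stats[i + 1:i + 7])))
    return {"name": name.strip(), "start": int(start), "size": int(size),
            "device_group": int(device_group), "groups": groups}


status = parse_ioband_status(
    "ioband1: 0 44371467 ioband 128 -1 223 172 408 211 136 600 "
    "1001 166 107 472 139 95 352 1002 211 146 520 210 147 504")
for gid, counters in status["groups"].items():
    print(gid, counters["read_reqs"], counters["write_sectors"])
```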
> +
> + --------------------------------------------------------------------------
> +
> + Reset status counter
> +
> + SYNOPSIS
> +
> + dmsetup message IOBAND_DEVICE 0 reset
> +
> + DESCRIPTION
> +
> + Reset the statistics of ioband device IOBAND_DEVICE.
> +
> + EXAMPLE
> +
> + Reset the statistics of "ioband1."
> +
> + # dmsetup message ioband1 0 reset
> +
> +
> + --------------------------------------------------------------------------
> +
> +Examples
> +
> + Example #1: Bandwidth control on Partitions
> +
> + This example describes how to control the bandwidth with disk
> + partitions. The following diagram illustrates the configuration of this
> + example. You may want to run a database on /dev/mapper/ioband1 and web
> + applications on /dev/mapper/ioband2.
> +
> + /mnt1 /mnt2 mount points
> + | |
> + +-------------V------------+ +-------------V------------+
> + | /dev/mapper/ioband1 | | /dev/mapper/ioband2 | ioband devices
> + +--------------------------+ +--------------------------+
> + | default group | | default group | ioband groups
> + | (80) | | (40) | (weight)
> + +-------------|------------+ +-------------|------------+
> + | |
> + +-------------V-------------+--------------V------------+
> + | /dev/sda1 | /dev/sda2 | partitions
> + +---------------------------+---------------------------+
> +
> +
> + To set up the above configuration, follow these steps:
> +
> + 1. Create ioband devices with the same device group ID and assign
> + weights of 80 and 40 to the default ioband groups respectively.
> +
> + # echo "0 $(blockdev --getsize /dev/sda1) ioband /dev/sda1 1 0 0" \
> + "none weight 0 :80" | dmsetup create ioband1
> + # echo "0 $(blockdev --getsize /dev/sda2) ioband /dev/sda2 1 0 0" \
> + "none weight 0 :40" | dmsetup create ioband2
> +
> +
> + 2. Create filesystems on the ioband devices and mount them.
> +
> + # mkfs.ext3 /dev/mapper/ioband1
> + # mount /dev/mapper/ioband1 /mnt1
> +
> + # mkfs.ext3 /dev/mapper/ioband2
> + # mount /dev/mapper/ioband2 /mnt2
> +
> +
> + --------------------------------------------------------------------------
> +
> + Example #2: Bandwidth control on Logical Volumes
> +
> + This example is similar to example #1 but it uses LVM logical
> + volumes instead of disk partitions. This example shows how to configure
> + ioband devices on two striped logical volumes.
> +
> + /mnt1 /mnt2 mount points
> + | |
> + +-------------V------------+ +-------------V------------+
> + | /dev/mapper/ioband1 | | /dev/mapper/ioband2 | ioband devices
> + +--------------------------+ +--------------------------+
> + | default group | | default group | ioband groups
> + | (80) | | (40) | (weight)
> + +-------------|------------+ +-------------|------------+
> + | |
> + +-------------V------------+ +-------------V------------+
> + | /dev/mapper/lv0 | | /dev/mapper/lv1 | striped logical
> + | | | | volumes
> + +-------------------------------------------------------+
> + | vg0 | volume group
> + +-------------|----------------------------|------------+
> + | |
> + +-------------V------------+ +-------------V------------+
> + | /dev/sdb | | /dev/sdc | physical disks
> + +--------------------------+ +--------------------------+
> +
> +
> + To set up the above configuration, follow these steps:
> +
> + 1. Initialize the disks for use by LVM.
> +
> + # pvcreate /dev/sdb
> + # pvcreate /dev/sdc
> +
> +
> + 2. Create a new volume group named "vg0" with /dev/sdb and /dev/sdc.
> +
> + # vgcreate vg0 /dev/sdb /dev/sdc
> +
> +
> + 3. Create two logical volumes in "vg0." The volumes have to be striped.
> +
> + # lvcreate -n lv0 -i 2 -I 64 vg0 -L 1024M
> + # lvcreate -n lv1 -i 2 -I 64 vg0 -L 1024M
> +
> +
> + The rest is the same as in example #1.
> +
> + 4. Create ioband devices corresponding to each logical volume and
> + assign weights of 80 and 40 to the default ioband groups respectively.
> +
> + # echo "0 $(blockdev --getsize /dev/mapper/vg0-lv0)" \
> + "ioband /dev/mapper/vg0-lv0 1 0 0 none weight 0 :80" | \
> + dmsetup create ioband1
> + # echo "0 $(blockdev --getsize /dev/mapper/vg0-lv1)" \
> + "ioband /dev/mapper/vg0-lv1 1 0 0 none weight 0 :40" | \
> + dmsetup create ioband2
> +
> +
> + 5. Create filesystems on the ioband devices and mount them.
> +
> + # mkfs.ext3 /dev/mapper/ioband1
> + # mount /dev/mapper/ioband1 /mnt1
> +
> + # mkfs.ext3 /dev/mapper/ioband2
> + # mount /dev/mapper/ioband2 /mnt2
> +
> +
> + --------------------------------------------------------------------------
> +
> + Example #3: Bandwidth control on processes
> +
> + This example describes how to control the bandwidth with groups of
> + processes. You may also want to run an additional application on the same
> + machine described in example #1. This example shows how to add a new
> + ioband group for this application.
> +
> + /mnt1 /mnt2 mount points
> + | |
> + +-------------V------------+ +-------------V------------+
> + | /dev/mapper/ioband1 | | /dev/mapper/ioband2 | ioband devices
> + +-------------+------------+ +-------------+------------+
> + | default | | user=1000 | default | ioband groups
> + | (80) | | (20) | (40) | (weight)
> + +-------------+------------+ +-------------+------------+
> + | |
> + +-------------V-------------+--------------V------------+
> + | /dev/sda1 | /dev/sda2 | partitions
> + +---------------------------+---------------------------+
> +
> +
> + The following shows how to set up a new ioband group on a machine that is
> + already configured as in example #1. The application will have a weight
> + of 20 and run with user-id 1000 on /dev/mapper/ioband2.
> +
> + 1. Set the type of ioband2 to "user."
> +
> + # dmsetup message ioband2 0 type user
> +
> +
> + 2. Create a new ioband group on ioband2.
> +
> + # dmsetup message ioband2 0 attach 1000
> +
> +
> + 3. Assign a weight of 20 to this newly created ioband group.
> +
> + # dmsetup message ioband2 0 weight 1000:20
> +
> +
> + --------------------------------------------------------------------------
> +
> + Example #4: Bandwidth control for Xen virtual block devices
> +
> + This example describes how to control the bandwidth for Xen virtual
> + block devices. The following diagram illustrates the configuration of this
> + example.
> +
> + Virtual Machine 1 Virtual Machine 2 virtual machines
> + | |
> + +-------------V------------+ +-------------V------------+
> + | /dev/xvda1 | | /dev/xvda1 | virtual block
> + +-------------|------------+ +-------------|------------+ devices
> + | |
> + +-------------V------------+ +-------------V------------+
> + | /dev/mapper/ioband1 | | /dev/mapper/ioband2 | ioband devices
> + +--------------------------+ +--------------------------+
> + | default group | | default group | ioband groups
> + | (80) | | (40) | (weight)
> + +-------------|------------+ +-------------|------------+
> + | |
> + +-------------V-------------+--------------V------------+
> + | /dev/sda1 | /dev/sda2 | partitions
> + +---------------------------+---------------------------+
> +
> +
> + The following shows how to map ioband devices "ioband1" and "ioband2" to
> + virtual block devices "/dev/xvda1 on Virtual Machine 1" and "/dev/xvda1 on
> + Virtual Machine 2" respectively on a machine configured as in example
> + #1. Add the following lines to the configuration files that are referenced
> + when creating "Virtual Machine 1" and "Virtual Machine 2."
> +
> + For "Virtual Machine 1"
> + disk = [ 'phy:/dev/mapper/ioband1,xvda,w' ]
> +
> + For "Virtual Machine 2"
> + disk = [ 'phy:/dev/mapper/ioband2,xvda,w' ]
> +
> +
> + --------------------------------------------------------------------------
> +
> + Example #5: Bandwidth control for Xen blktap devices
> +
> + This example describes how to control the bandwidth for Xen virtual
> + block devices when Xen blktap devices are used. The following diagram
> + illustrates the configuration of this example.
> +
> + Virtual Machine 1 Virtual Machine 2 virtual machines
> + | |
> + +-------------V------------+ +-------------V------------+
> + | /dev/xvda1 | | /dev/xvda1 | virtual block
> + +-------------|------------+ +-------------|------------+ devices
> + | |
> + +-------------V----------------------------V------------+
> + | /dev/mapper/ioband1 | ioband device
> + +---------------------------+---------------------------+
> + | default group | default group | ioband groups
> + | (80) | (40) | (weight)
> + +-------------|-------------+--------------|------------+
> + | |
> + +-------------|----------------------------|------------+
> + | +----------V----------+ +----------V---------+ |
> + | | vm1.img | | vm2.img | | disk image files
> + | +---------------------+ +--------------------+ |
> + | /vmdisk | mount point
> + +---------------------------|---------------------------+
> + |
> + +---------------------------V---------------------------+
> + | /dev/sda1 | partition
> + +-------------------------------------------------------+
> +
> +
> + To set up the above configuration, follow these steps:
> +
> + 1. Create an ioband device.
> +
> + # echo "0 $(blockdev --getsize /dev/sda1) ioband /dev/sda1" \
> + "1 0 0 none weight 0 :100" | dmsetup create ioband1
> +
> +
> + 2. Add the following lines to the configuration files that are
> + referenced when creating "Virtual Machine 1" and "Virtual Machine 2."
> + Disk image files "/vmdisk/vm1.img" and "/vmdisk/vm2.img" will be used.
> +
> + For "Virtual Machine 1"
> + disk = [ 'tap:aio:/vmdisk/vm1.img,xvda,w', ]
> +
> + For "Virtual Machine 2"
> + disk = [ 'tap:aio:/vmdisk/vm2.img,xvda,w', ]
> +
> +
> + 3. Run the virtual machines.
> +
> + # xm create vm1
> + # xm create vm2
> +
> +
> + 4. Find out the process IDs of the daemons which control the blktap
> + devices.
> +
> + # lsof /vmdisk/vm[12].img
> + COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
> + tapdisk 15011 root 11u REG 253,0 2147483648 48961 /vmdisk/vm1.img
> + tapdisk 15276 root 13u REG 253,0 2147483648 48962 /vmdisk/vm2.img
> +
> +
> + 5. Create new ioband groups for PIDs 15011 and 15276, which are the
> + process IDs of the tapdisk daemons, and assign weights of 80 and 40
> + to the groups respectively.
> +
> + # dmsetup message ioband1 0 type pid
> + # dmsetup message ioband1 0 attach 15011
> + # dmsetup message ioband1 0 weight 15011:80
> + # dmsetup message ioband1 0 attach 15276
> + # dmsetup message ioband1 0 weight 15276:40
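To make the effect of the weights in step 5 concrete, here is a small userspace sketch of the arithmetic. It assumes the weight policy divides bandwidth among the busy groups in proportion to their weights and that the default group (weight 100 from step 1) is idle. The helper name sketch_share_percent is hypothetical and is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: percentage share (rounded down) of group `idx`,
 * assuming bandwidth is split among the listed groups in proportion to
 * their weights.
 */
static unsigned int sketch_share_percent(const unsigned int *weights,
					 size_t n, size_t idx)
{
	unsigned long total = 0;
	size_t i;

	assert(weights && idx < n);
	for (i = 0; i < n; i++)
		total += weights[i];
	if (total == 0)
		return 0;
	return (unsigned int)(weights[idx] * 100UL / total);
}
```

Under those assumptions, weights of 80 and 40 work out to roughly a 2:1 split between the two tapdisk groups while both are busy.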
> Index: linux-2.6.29/drivers/md/Kconfig
> ===================================================================
> --- linux-2.6.29.orig/drivers/md/Kconfig
> +++ linux-2.6.29/drivers/md/Kconfig
> @@ -289,4 +289,17 @@ config DM_UEVENT
> ---help---
> Generate udev events for DM events.
>
> +config DM_IOBAND
> + tristate "I/O bandwidth control (EXPERIMENTAL)"
> + depends on BLK_DEV_DM && EXPERIMENTAL
> + ---help---
> + This device-mapper target allows you to define how the
> + available bandwidth of a storage device should be
> + shared between processes, cgroups, partitions or LUNs.
> +
> + Information on how to use dm-ioband is available in:
> + <file:Documentation/device-mapper/ioband.txt>.
> +
> + If unsure, say N.
> +
> endif # MD
> Index: linux-2.6.29/drivers/md/Makefile
> ===================================================================
> --- linux-2.6.29.orig/drivers/md/Makefile
> +++ linux-2.6.29/drivers/md/Makefile
> @@ -8,6 +8,7 @@ dm-multipath-objs := dm-path-selector.o
> dm-snapshot-objs := dm-snap.o dm-exception-store.o dm-snap-transient.o \
> dm-snap-persistent.o
> dm-mirror-objs := dm-raid1.o
> +dm-ioband-objs := dm-ioband-ctl.o dm-ioband-policy.o dm-ioband-type.o
> md-mod-objs := md.o bitmap.o
> raid456-objs := raid5.o raid6algos.o raid6recov.o raid6tables.o \
> raid6int1.o raid6int2.o raid6int4.o \
> @@ -37,6 +38,7 @@ obj-$(CONFIG_DM_MULTIPATH) += dm-multipa
> obj-$(CONFIG_DM_SNAPSHOT) += dm-snapshot.o
> obj-$(CONFIG_DM_MIRROR) += dm-mirror.o dm-log.o dm-region-hash.o
> obj-$(CONFIG_DM_ZERO) += dm-zero.o
> +obj-$(CONFIG_DM_IOBAND) += dm-ioband.o
>
> quiet_cmd_unroll = UNROLL $@
> cmd_unroll = $(PERL) $(srctree)/$(src)/unroll.pl $(UNROLL) \
> Index: linux-2.6.29/drivers/md/dm-ioband-ctl.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6.29/drivers/md/dm-ioband-ctl.c
> @@ -0,0 +1,1312 @@
> +/*
> + * Copyright (C) 2008 VA Linux Systems Japan K.K.
> + * Authors: Hirokazu Takahashi <taka at valinux.co.jp>
> + * Ryo Tsuruta <ryov at valinux.co.jp>
> + *
> + * I/O bandwidth control
> + *
> + * This file is released under the GPL.
> + */
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/bio.h>
> +#include <linux/slab.h>
> +#include <linux/workqueue.h>
> +#include <linux/raid/md.h>
> +#include <linux/rbtree.h>
> +#include "dm.h"
> +#include "dm-bio-list.h"
> +#include "dm-ioband.h"
> +
> +#define POLICY_PARAM_START 6
> +#define POLICY_PARAM_DELIM "=:,"
> +
> +static LIST_HEAD(ioband_device_list);
> +/* to protect ioband_device_list */
> +static DEFINE_SPINLOCK(ioband_devicelist_lock);
> +
> +static void suspend_ioband_device(struct ioband_device *, unsigned long, int);
> +static void resume_ioband_device(struct ioband_device *);
> +static void ioband_conduct(struct work_struct *);
> +static void ioband_hold_bio(struct ioband_group *, struct bio *);
> +static struct bio *ioband_pop_bio(struct ioband_group *);
> +static int ioband_set_param(struct ioband_group *, char *, char *);
> +static int ioband_group_attach(struct ioband_group *, int, char *);
> +static int ioband_group_type_select(struct ioband_group *, char *);
> +
> +static void do_nothing(void) {}
> +
> +static int policy_init(struct ioband_device *dp, char *name,
> + int argc, char **argv)
> +{
> + struct policy_type *p;
> + struct ioband_group *gp;
> + unsigned long flags;
> + int r;
> +
> + for (p = dm_ioband_policy_type; p->p_name; p++) {
> + if (!strcmp(name, p->p_name))
> + break;
> + }
> + if (!p->p_name)
> + return -EINVAL;
> +
> + spin_lock_irqsave(&dp->g_lock, flags);
> + if (dp->g_policy == p) {
> + /* do nothing if the same policy is already set */
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + return 0;
> + }
> +
> + suspend_ioband_device(dp, flags, 1);
> + list_for_each_entry(gp, &dp->g_groups, c_list)
> + dp->g_group_dtr(gp);
> +
> + /* switch to the new policy */
> + dp->g_policy = p;
> + r = p->p_policy_init(dp, argc, argv);
> + if (!r) {
> + if (!dp->g_hold_bio)
> + dp->g_hold_bio = ioband_hold_bio;
> + if (!dp->g_pop_bio)
> + dp->g_pop_bio = ioband_pop_bio;
> +
> + list_for_each_entry(gp, &dp->g_groups, c_list)
> + dp->g_group_ctr(gp, NULL);
> + }
> + resume_ioband_device(dp);
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + return r;
> +}
> +
> +static struct ioband_device *alloc_ioband_device(char *name,
> + int io_throttle, int io_limit)
> +{
> + struct ioband_device *dp, *new_dp;
> + unsigned long flags;
> +
> + new_dp = kzalloc(sizeof(struct ioband_device), GFP_KERNEL);
> + if (!new_dp)
> + return NULL;
> +
> + /*
> + * Prepare its own workqueue as generic_make_request() may
> + * potentially block the workqueue when submitting BIOs.
> + */
> + new_dp->g_ioband_wq = create_workqueue("kioband");
> + if (!new_dp->g_ioband_wq) {
> + kfree(new_dp);
> + return NULL;
> + }
> +
> + spin_lock_irqsave(&ioband_devicelist_lock, flags);
> + list_for_each_entry(dp, &ioband_device_list, g_list) {
> + if (!strcmp(dp->g_name, name)) {
> + dp->g_ref++;
> + spin_unlock_irqrestore(&ioband_devicelist_lock, flags);
> + destroy_workqueue(new_dp->g_ioband_wq);
> + kfree(new_dp);
> + return dp;
> + }
> + }
> +
> + INIT_DELAYED_WORK(&new_dp->g_conductor, ioband_conduct);
> + INIT_LIST_HEAD(&new_dp->g_groups);
> + INIT_LIST_HEAD(&new_dp->g_list);
> + spin_lock_init(&new_dp->g_lock);
> + mutex_init(&new_dp->g_lock_device);
> + bio_list_init(&new_dp->g_urgent_bios);
> + new_dp->g_io_throttle = io_throttle;
> + new_dp->g_io_limit[READ] = io_limit;
> + new_dp->g_io_limit[WRITE] = io_limit;
> + new_dp->g_issued[READ] = 0;
> + new_dp->g_issued[WRITE] = 0;
> + new_dp->g_blocked = 0;
> + new_dp->g_ref = 1;
> + new_dp->g_flags = 0;
> + strlcpy(new_dp->g_name, name, sizeof(new_dp->g_name));
> + new_dp->g_policy = NULL;
> + new_dp->g_hold_bio = NULL;
> + new_dp->g_pop_bio = NULL;
> + init_waitqueue_head(&new_dp->g_waitq);
> + init_waitqueue_head(&new_dp->g_waitq_suspend);
> + init_waitqueue_head(&new_dp->g_waitq_flush);
> + list_add_tail(&new_dp->g_list, &ioband_device_list);
> +
> + spin_unlock_irqrestore(&ioband_devicelist_lock, flags);
> + return new_dp;
> +}
> +
> +static void release_ioband_device(struct ioband_device *dp)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&ioband_devicelist_lock, flags);
> + dp->g_ref--;
> + if (dp->g_ref > 0) {
> + spin_unlock_irqrestore(&ioband_devicelist_lock, flags);
> + return;
> + }
> + list_del(&dp->g_list);
> + spin_unlock_irqrestore(&ioband_devicelist_lock, flags);
> + destroy_workqueue(dp->g_ioband_wq);
> + kfree(dp);
> +}
> +
> +static int is_ioband_device_flushed(struct ioband_device *dp,
> + int wait_completion)
> +{
> + struct ioband_group *gp;
> +
> + if (wait_completion && dp->g_issued[READ] + dp->g_issued[WRITE] > 0)
> + return 0;
> + if (dp->g_blocked || waitqueue_active(&dp->g_waitq))
> + return 0;
> + list_for_each_entry(gp, &dp->g_groups, c_list)
> + if (waitqueue_active(&gp->c_waitq))
> + return 0;
> + return 1;
> +}
> +
> +static void suspend_ioband_device(struct ioband_device *dp,
> + unsigned long flags, int wait_completion)
> +{
> + struct ioband_group *gp;
> +
> + /* block incoming bios */
> + set_device_suspended(dp);
> +
> + /* wake up all blocked processes and take down all ioband groups */
> + wake_up_all(&dp->g_waitq);
> + list_for_each_entry(gp, &dp->g_groups, c_list) {
> + if (!is_group_down(gp)) {
> + set_group_down(gp);
> + set_group_need_up(gp);
> + }
> + wake_up_all(&gp->c_waitq);
> + }
> +
> + /* flush the already mapped bios */
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + queue_delayed_work(dp->g_ioband_wq, &dp->g_conductor, 0);
> + flush_workqueue(dp->g_ioband_wq);
> +
> + /* wait for all processes to wake up and bios to release */
> + spin_lock_irqsave(&dp->g_lock, flags);
> + wait_event_lock_irq(dp->g_waitq_flush,
> + is_ioband_device_flushed(dp, wait_completion),
> + dp->g_lock, do_nothing());
> +}
> +
> +static void resume_ioband_device(struct ioband_device *dp)
> +{
> + struct ioband_group *gp;
> +
> + /* bring the ioband groups back up */
> + list_for_each_entry(gp, &dp->g_groups, c_list) {
> + if (group_need_up(gp)) {
> + clear_group_need_up(gp);
> + clear_group_down(gp);
> + }
> + }
> +
> + /* accept incoming bios */
> + wake_up_all(&dp->g_waitq_suspend);
> + clear_device_suspended(dp);
> +}
> +
> +static struct ioband_group *ioband_group_find(struct ioband_group *head, int id)
> +{
> + struct rb_node *node = head->c_group_root.rb_node;
> +
> + while (node) {
> + struct ioband_group *p =
> + container_of(node, struct ioband_group, c_group_node);
> +
> + if (p->c_id == id || id == IOBAND_ID_ANY)
> + return p;
> + node = (id < p->c_id) ? node->rb_left : node->rb_right;
> + }
> + return NULL;
> +}
> +
> +static void ioband_group_add_node(struct rb_root *root, struct ioband_group *gp)
> +{
> + struct rb_node **node = &root->rb_node, *parent = NULL;
> + struct ioband_group *p;
> +
> + while (*node) {
> + p = container_of(*node, struct ioband_group, c_group_node);
> + parent = *node;
> + node = (gp->c_id < p->c_id) ?
> + &(*node)->rb_left : &(*node)->rb_right;
> + }
> +
> + rb_link_node(&gp->c_group_node, parent, node);
> + rb_insert_color(&gp->c_group_node, root);
> +}
> +
> +static int ioband_group_init(struct ioband_group *gp,
> + struct ioband_group *head,
> + struct ioband_device *dp, int id, char *param)
> +{
> + unsigned long flags;
> + int r;
> +
> + INIT_LIST_HEAD(&gp->c_list);
> + bio_list_init(&gp->c_blocked_bios);
> + bio_list_init(&gp->c_prio_bios);
> + gp->c_id = id; /* should be verified */
> + gp->c_blocked = 0;
> + gp->c_prio_blocked = 0;
> + memset(gp->c_stat, 0, sizeof(gp->c_stat));
> + init_waitqueue_head(&gp->c_waitq);
> + gp->c_flags = 0;
> + gp->c_group_root = RB_ROOT;
> + gp->c_banddev = dp;
> +
> + spin_lock_irqsave(&dp->g_lock, flags);
> + if (head && ioband_group_find(head, id)) {
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + DMWARN("ioband_group: id=%d already exists.", id);
> + return -EEXIST;
> + }
> +
> + list_add_tail(&gp->c_list, &dp->g_groups);
> +
> + r = dp->g_group_ctr(gp, param);
> + if (r) {
> + list_del(&gp->c_list);
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + return r;
> + }
> +
> + if (head) {
> + ioband_group_add_node(&head->c_group_root, gp);
> + gp->c_dev = head->c_dev;
> + gp->c_target = head->c_target;
> + }
> +
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> +
> + return 0;
> +}
> +
> +static void ioband_group_release(struct ioband_group *head,
> + struct ioband_group *gp)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> +
> + list_del(&gp->c_list);
> + if (head)
> + rb_erase(&gp->c_group_node, &head->c_group_root);
> + dp->g_group_dtr(gp);
> + kfree(gp);
> +}
> +
> +static void ioband_group_destroy_all(struct ioband_group *gp)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> + struct ioband_group *p;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&dp->g_lock, flags);
> + while ((p = ioband_group_find(gp, IOBAND_ID_ANY)))
> + ioband_group_release(gp, p);
> + ioband_group_release(NULL, gp);
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> +}
> +
> +static void ioband_group_stop_all(struct ioband_group *head, int suspend)
> +{
> + struct ioband_device *dp = head->c_banddev;
> + struct ioband_group *p;
> + struct rb_node *node;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&dp->g_lock, flags);
> + for (node = rb_first(&head->c_group_root); node; node = rb_next(node)) {
> + p = rb_entry(node, struct ioband_group, c_group_node);
> + set_group_down(p);
> + if (suspend)
> + set_group_suspended(p);
> + }
> + set_group_down(head);
> + if (suspend)
> + set_group_suspended(head);
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + queue_delayed_work(dp->g_ioband_wq, &dp->g_conductor, 0);
> + flush_workqueue(dp->g_ioband_wq);
> +}
> +
> +static void ioband_group_resume_all(struct ioband_group *head)
> +{
> + struct ioband_device *dp = head->c_banddev;
> + struct ioband_group *p;
> + struct rb_node *node;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&dp->g_lock, flags);
> + for (node = rb_first(&head->c_group_root); node; node = rb_next(node)) {
> + p = rb_entry(node, struct ioband_group, c_group_node);
> + clear_group_down(p);
> + clear_group_suspended(p);
> + }
> + clear_group_down(head);
> + clear_group_suspended(head);
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> +}
> +
> +static int split_string(char *s, long *id, char **v)
> +{
> + char *p, *q;
> + int r = 0;
> +
> + *id = IOBAND_ID_ANY;
> + p = strsep(&s, POLICY_PARAM_DELIM);
> + q = strsep(&s, POLICY_PARAM_DELIM);
> + if (!q) {
> + *v = p;
> + } else {
> + r = strict_strtol(p, 0, id);
> + *v = q;
> + }
> + return r;
> +}
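The parsing rule in split_string() can be tried out in userspace with the sketch below. It is only an approximation under stated assumptions: SKETCH_ID_ANY stands in for IOBAND_ID_ANY (whose value is defined in dm-ioband.h and is not shown here), next_token() is a hypothetical portable substitute for strsep(), and strtol() with explicit checks replaces strict_strtol().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_ID_ANY	(-1L)	/* assumed stand-in for IOBAND_ID_ANY */
#define SKETCH_DELIM	"=:,"	/* same set as POLICY_PARAM_DELIM */

/* Cut the next token off *sp in place; NULL once the input is exhausted. */
static char *next_token(char **sp)
{
	char *tok = *sp;
	size_t n;

	if (!tok)
		return NULL;
	n = strcspn(tok, SKETCH_DELIM);
	if (tok[n]) {
		tok[n] = '\0';
		*sp = tok + n + 1;
	} else {
		*sp = NULL;
	}
	return tok;
}

/* "<value>" leaves id as SKETCH_ID_ANY; "<id>:<value>" sets both. */
static int sketch_split(char *s, long *id, char **v)
{
	char *p, *q, *end;

	assert(s && id && v);
	*id = SKETCH_ID_ANY;
	p = next_token(&s);
	q = next_token(&s);
	if (!q) {
		*v = p;
		return 0;
	}
	*v = q;
	errno = 0;
	*id = strtol(p, &end, 0);
	if (errno || end == p || *end)
		return -EINVAL;
	return 0;
}
```

With this sketch, "15011:80" yields id 15011 and value "80", a bare "weight" keeps the wildcard id, and ":100" is rejected because its id field is empty. That last case appears to be why ioband_ctr() strips the leading delimiter from the default group parameter itself rather than passing it through split_string().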
> +
> +/*
> + * Create a new band device:
> + * parameters: <device> <device-group-id> <io_throttle> <io_limit>
> + * <type> <policy> <policy-param...> <group-id:group-param...>
> + */
> +static int ioband_ctr(struct dm_target *ti, unsigned argc, char **argv)
> +{
> + struct ioband_group *gp;
> + struct ioband_device *dp;
> + struct dm_dev *dev;
> + int io_throttle;
> + int io_limit;
> + int i, r, start;
> + long val, id;
> + char *param, *s;
> +
> + if (argc < POLICY_PARAM_START) {
> + ti->error = "Requires " __stringify(POLICY_PARAM_START)
> + " or more arguments";
> + return -EINVAL;
> + }
> +
> + if (strlen(argv[1]) > IOBAND_NAME_MAX) {
> + ti->error = "Ioband device name is too long";
> + return -EINVAL;
> + }
> +
> + r = strict_strtol(argv[2], 0, &val);
> + if (r || val < 0 || val > SHORT_MAX) {
> + ti->error = "Invalid io_throttle";
> + return -EINVAL;
> + }
> + io_throttle = (val == 0) ? DEFAULT_IO_THROTTLE : val;
> +
> + r = strict_strtol(argv[3], 0, &val);
> + if (r || val < 0 || val > SHORT_MAX) {
> + ti->error = "Invalid io_limit";
> + return -EINVAL;
> + }
> + io_limit = val;
> +
> + r = dm_get_device(ti, argv[0], 0, ti->len,
> + dm_table_get_mode(ti->table), &dev);
> + if (r) {
> + ti->error = "Device lookup failed";
> + return r;
> + }
> +
> + if (io_limit == 0) {
> + struct request_queue *q;
> +
> + q = bdev_get_queue(dev->bdev);
> + if (!q) {
> + ti->error = "Can't get queue size";
> + r = -ENXIO;
> + goto release_dm_device;
> + }
> + io_limit = q->nr_requests;
> + }
> +
> + if (io_limit < io_throttle)
> + io_limit = io_throttle;
> +
> + dp = alloc_ioband_device(argv[1], io_throttle, io_limit);
> + if (!dp) {
> + ti->error = "Cannot create ioband device";
> + r = -EINVAL;
> + goto release_dm_device;
> + }
> +
> + mutex_lock(&dp->g_lock_device);
> + r = policy_init(dp, argv[POLICY_PARAM_START - 1],
> + argc - POLICY_PARAM_START, &argv[POLICY_PARAM_START]);
> + if (r) {
> + ti->error = "Invalid policy parameter";
> + goto release_ioband_device;
> + }
> +
> + gp = kzalloc(sizeof(struct ioband_group), GFP_KERNEL);
> + if (!gp) {
> + ti->error = "Cannot allocate memory for ioband group";
> + r = -ENOMEM;
> + goto release_ioband_device;
> + }
> +
> + ti->private = gp;
> + gp->c_target = ti;
> + gp->c_dev = dev;
> +
> + /* Find a default group parameter */
> + for (start = POLICY_PARAM_START; start < argc; start++) {
> + s = strpbrk(argv[start], POLICY_PARAM_DELIM);
> + if (s == argv[start])
> + break;
> + }
> + param = (start < argc) ? &argv[start][1] : NULL;
> +
> + /* Create a default ioband group */
> + r = ioband_group_init(gp, NULL, dp, IOBAND_ID_ANY, param);
> + if (r) {
> + kfree(gp);
> + ti->error = "Cannot create default ioband group";
> + goto release_ioband_device;
> + }
> +
> + r = ioband_group_type_select(gp, argv[4]);
> + if (r) {
> + ti->error = "Cannot set ioband group type";
> + goto release_ioband_group;
> + }
> +
> + /* Create sub ioband groups */
> + for (i = start + 1; i < argc; i++) {
> + r = split_string(argv[i], &id, &param);
> + if (r) {
> + ti->error = "Invalid ioband group parameter";
> + goto release_ioband_group;
> + }
> + r = ioband_group_attach(gp, id, param);
> + if (r) {
> + ti->error = "Cannot create ioband group";
> + goto release_ioband_group;
> + }
> + }
> + mutex_unlock(&dp->g_lock_device);
> + return 0;
> +
> +release_ioband_group:
> + ioband_group_destroy_all(gp);
> +release_ioband_device:
> + mutex_unlock(&dp->g_lock_device);
> + release_ioband_device(dp);
> +release_dm_device:
> + dm_put_device(ti, dev);
> + return r;
> +}
> +
> +static void ioband_dtr(struct dm_target *ti)
> +{
> + struct ioband_group *gp = ti->private;
> + struct ioband_device *dp = gp->c_banddev;
> +
> + mutex_lock(&dp->g_lock_device);
> + ioband_group_stop_all(gp, 0);
> + cancel_delayed_work_sync(&dp->g_conductor);
> + dm_put_device(ti, gp->c_dev);
> + ioband_group_destroy_all(gp);
> + mutex_unlock(&dp->g_lock_device);
> + release_ioband_device(dp);
> +}
> +
> +static void ioband_hold_bio(struct ioband_group *gp, struct bio *bio)
> +{
> + /* Todo: The list should be split into a read list and a write list */
> + bio_list_add(&gp->c_blocked_bios, bio);
> +}
> +
> +static struct bio *ioband_pop_bio(struct ioband_group *gp)
> +{
> + return bio_list_pop(&gp->c_blocked_bios);
> +}
> +
> +static int is_urgent_bio(struct bio *bio)
> +{
> + struct page *page = bio_iovec_idx(bio, 0)->bv_page;
> + /*
> + * ToDo: A new flag should be added to struct bio, which indicates
> + * it contains urgent I/O requests.
> + */
> + if (!PageReclaim(page))
> + return 0;
> + if (PageSwapCache(page))
> + return 2;
> + return 1;
> +}
> +
> +static inline int device_should_block(struct ioband_group *gp)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> +
> + if (is_group_down(gp))
> + return 0;
> + if (is_device_blocked(dp))
> + return 1;
> + if (dp->g_blocked >= dp->g_io_limit[READ] + dp->g_io_limit[WRITE]) {
> + set_device_blocked(dp);
> + return 1;
> + }
> + return 0;
> +}
> +
> +static inline int group_should_block(struct ioband_group *gp)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> +
> + if (is_group_down(gp))
> + return 0;
> + if (is_group_blocked(gp))
> + return 1;
> + if (dp->g_should_block(gp)) {
> + set_group_blocked(gp);
> + return 1;
> + }
> + return 0;
> +}
> +
> +static void prevent_burst_bios(struct ioband_group *gp, struct bio *bio)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> +
> + if (current->flags & PF_KTHREAD || is_urgent_bio(bio)) {
> + /*
> + * Kernel threads shouldn't be blocked easily since each of
> + * them may handle BIOs for several groups on several
> + * partitions.
> + */
> + wait_event_lock_irq(dp->g_waitq, !device_should_block(gp),
> + dp->g_lock, do_nothing());
> + } else {
> + wait_event_lock_irq(gp->c_waitq, !group_should_block(gp),
> + dp->g_lock, do_nothing());
> + }
> +}
> +
> +static inline int should_pushback_bio(struct ioband_group *gp)
> +{
> + return is_group_suspended(gp) && dm_noflush_suspending(gp->c_target);
> +}
> +
> +static inline int prepare_to_issue(struct ioband_group *gp, struct bio *bio)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> +
> + dp->g_issued[bio_data_dir(bio)]++;
> + return dp->g_prepare_bio(gp, bio, 0);
> +}
> +
> +static inline int room_for_bio(struct ioband_device *dp)
> +{
> + return dp->g_issued[READ] < dp->g_io_limit[READ]
> + || dp->g_issued[WRITE] < dp->g_io_limit[WRITE];
> +}
> +
> +static void hold_bio(struct ioband_group *gp, struct bio *bio)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> +
> + dp->g_blocked++;
> + if (is_urgent_bio(bio)) {
> + /*
> + * ToDo:
> + * When barrier mode is supported, write bios sharing the same
> + * file system with the current one would be all moved
> + * to g_urgent_bios list.
> + * You don't have to care about barrier handling if the bio
> + * is for swapping.
> + */
> + dp->g_prepare_bio(gp, bio, IOBAND_URGENT);
> + bio_list_add(&dp->g_urgent_bios, bio);
> + } else {
> + gp->c_blocked++;
> + dp->g_hold_bio(gp, bio);
> + }
> +}
> +
> +static inline int room_for_bio_rw(struct ioband_device *dp, int direct)
> +{
> + return dp->g_issued[direct] < dp->g_io_limit[direct];
> +}
> +
> +static void push_prio_bio(struct ioband_group *gp, struct bio *bio, int direct)
> +{
> + if (bio_list_empty(&gp->c_prio_bios))
> + set_prio_queue(gp, direct);
> + bio_list_add(&gp->c_prio_bios, bio);
> + gp->c_prio_blocked++;
> +}
> +
> +static struct bio *pop_prio_bio(struct ioband_group *gp)
> +{
> + struct bio *bio = bio_list_pop(&gp->c_prio_bios);
> +
> + if (bio_list_empty(&gp->c_prio_bios))
> + clear_prio_queue(gp);
> +
> + if (bio)
> + gp->c_prio_blocked--;
> + return bio;
> +}
> +
> +static int make_issue_list(struct ioband_group *gp, struct bio *bio,
> + struct bio_list *issue_list,
> + struct bio_list *pushback_list)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> +
> + dp->g_blocked--;
> + gp->c_blocked--;
> + if (!gp->c_blocked && is_group_blocked(gp)) {
> + clear_group_blocked(gp);
> + wake_up_all(&gp->c_waitq);
> + }
> + if (should_pushback_bio(gp))
> + bio_list_add(pushback_list, bio);
> + else {
> + int rw = bio_data_dir(bio);
> +
> + gp->c_stat[rw].deferred++;
> + gp->c_stat[rw].sectors += bio_sectors(bio);
> + bio_list_add(issue_list, bio);
> + }
> + return prepare_to_issue(gp, bio);
> +}
> +
> +static void release_urgent_bios(struct ioband_device *dp,
> + struct bio_list *issue_list,
> + struct bio_list *pushback_list)
> +{
> + struct bio *bio;
> +
> + if (bio_list_empty(&dp->g_urgent_bios))
> + return;
> + while (room_for_bio_rw(dp, WRITE)) {
> + bio = bio_list_pop(&dp->g_urgent_bios);
> + if (!bio)
> + return;
> + dp->g_blocked--;
> + dp->g_issued[bio_data_dir(bio)]++;
> + bio_list_add(issue_list, bio);
> + }
> +}
> +
> +static int release_prio_bios(struct ioband_group *gp,
> + struct bio_list *issue_list,
> + struct bio_list *pushback_list)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> + struct bio *bio;
> + int direct;
> + int ret;
> +
> + if (bio_list_empty(&gp->c_prio_bios))
> + return R_OK;
> + direct = prio_queue_direct(gp);
> + while (gp->c_prio_blocked) {
> + if (!dp->g_can_submit(gp))
> + return R_BLOCK;
> + if (!room_for_bio_rw(dp, direct))
> + return R_OK;
> + bio = pop_prio_bio(gp);
> + if (!bio)
> + return R_OK;
> + ret = make_issue_list(gp, bio, issue_list, pushback_list);
> + if (ret)
> + return ret;
> + }
> + return R_OK;
> +}
> +
> +static int release_norm_bios(struct ioband_group *gp,
> + struct bio_list *issue_list,
> + struct bio_list *pushback_list)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> + struct bio *bio;
> + int direct;
> + int ret;
> +
> + while (gp->c_blocked - gp->c_prio_blocked) {
> + if (!dp->g_can_submit(gp))
> + return R_BLOCK;
> + if (!room_for_bio(dp))
> + return R_OK;
> + bio = dp->g_pop_bio(gp);
> + if (!bio)
> + return R_OK;
> +
> + direct = bio_data_dir(bio);
> + if (!room_for_bio_rw(dp, direct)) {
> + push_prio_bio(gp, bio, direct);
> + continue;
> + }
> + ret = make_issue_list(gp, bio, issue_list, pushback_list);
> + if (ret)
> + return ret;
> + }
> + return R_OK;
> +}
> +
> +static inline int release_bios(struct ioband_group *gp,
> + struct bio_list *issue_list,
> + struct bio_list *pushback_list)
> +{
> + int ret = release_prio_bios(gp, issue_list, pushback_list);
> + if (ret)
> + return ret;
> + return release_norm_bios(gp, issue_list, pushback_list);
> +}
> +
> +static struct ioband_group *ioband_group_get(struct ioband_group *head,
> + struct bio *bio)
> +{
> + struct ioband_group *gp;
> +
> + if (!head->c_type->t_getid)
> + return head;
> +
> + gp = ioband_group_find(head, head->c_type->t_getid(bio));
> +
> + if (!gp)
> + gp = head;
> + return gp;
> +}
> +
> +/*
> + * Start to control the bandwidth once the number of uncompleted BIOs
> + * exceeds the value of "io_throttle".
> + */
> +static int ioband_map(struct dm_target *ti, struct bio *bio,
> + union map_info *map_context)
> +{
> + struct ioband_group *gp = ti->private;
> + struct ioband_device *dp = gp->c_banddev;
> + unsigned long flags;
> + int direct;
> +
> + spin_lock_irqsave(&dp->g_lock, flags);
> +
> + /*
> + * The device is suspended while some of the ioband device
> + * configurations are being changed.
> + */
> + if (is_device_suspended(dp))
> + wait_event_lock_irq(dp->g_waitq_suspend,
> + !is_device_suspended(dp), dp->g_lock,
> + do_nothing());
> +
> + gp = ioband_group_get(gp, bio);
> + prevent_burst_bios(gp, bio);
> + if (should_pushback_bio(gp)) {
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + return DM_MAPIO_REQUEUE;
> + }
> +
> + bio->bi_bdev = gp->c_dev->bdev;
> + bio->bi_sector -= ti->begin;
> + direct = bio_data_dir(bio);
> +
> + if (!gp->c_blocked && room_for_bio_rw(dp, direct)) {
> + if (dp->g_can_submit(gp)) {
> + prepare_to_issue(gp, bio);
> + gp->c_stat[direct].immediate++;
> + gp->c_stat[direct].sectors += bio_sectors(bio);
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + return DM_MAPIO_REMAPPED;
> + } else if (!dp->g_blocked &&
> + dp->g_issued[READ] + dp->g_issued[WRITE] == 0) {
> + DMDEBUG("%s: token expired gp:%p", __func__, gp);
> + queue_delayed_work(dp->g_ioband_wq,
> + &dp->g_conductor, 1);
> + }
> + }
> + hold_bio(gp, bio);
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> +
> + return DM_MAPIO_SUBMITTED;
> +}
> +
> +/*
> + * Select the best group to resubmit its BIOs.
> + */
> +static struct ioband_group *choose_best_group(struct ioband_device *dp)
> +{
> + struct ioband_group *gp;
> + struct ioband_group *best = NULL;
> + int highest = 0;
> + int pri;
> +
> + /* Todo: The algorithm should be optimized.
> + * It would be better to use rbtree.
> + */
> + list_for_each_entry(gp, &dp->g_groups, c_list) {
> + if (!gp->c_blocked || !room_for_bio(dp))
> + continue;
> + if (gp->c_blocked == gp->c_prio_blocked &&
> + !room_for_bio_rw(dp, prio_queue_direct(gp))) {
> + continue;
> + }
> + pri = dp->g_can_submit(gp);
> + if (pri > highest) {
> + highest = pri;
> + best = gp;
> + }
> + }
> +
> + return best;
> +}
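The selection rule in choose_best_group() reduces to "among groups holding BIOs, pick the one with the highest positive priority". A simplified userspace sketch of that rule follows. The struct and function names are hypothetical, the priority field stands in for whatever dp->g_can_submit() returns under the active policy, and the room_for_bio() checks are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down view of an ioband group. */
struct sketch_group {
	int blocked;	/* number of BIOs currently held */
	int priority;	/* stand-in for g_can_submit(); 0 = may not submit */
};

/* Return the index of the best group, or -1 if none is eligible. */
static int sketch_choose_best(const struct sketch_group *g, size_t n)
{
	int best = -1;
	int highest = 0;
	size_t i;

	assert(g || n == 0);
	for (i = 0; i < n; i++) {
		if (!g[i].blocked)
			continue;	/* nothing to resubmit */
		if (g[i].priority > highest) {
			highest = g[i].priority;
			best = (int)i;
		}
	}
	return best;
}
```

Because the comparison is strict, ties go to the group that comes first in the list. A group whose priority is 0 is never chosen, which matches the `pri > highest` test with `highest` starting at 0 in the patch.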
> +
> +/*
> + * This function is called as soon as BIOs can be resubmitted.
> + * It selects the best BIOs and passes them to the underlying layer.
> + */
> +static void ioband_conduct(struct work_struct *work)
> +{
> + struct ioband_device *dp =
> + container_of(work, struct ioband_device, g_conductor.work);
> + struct ioband_group *gp = NULL;
> + struct bio *bio;
> + unsigned long flags;
> + struct bio_list issue_list, pushback_list;
> +
> + bio_list_init(&issue_list);
> + bio_list_init(&pushback_list);
> +
> + spin_lock_irqsave(&dp->g_lock, flags);
> + release_urgent_bios(dp, &issue_list, &pushback_list);
> + if (dp->g_blocked) {
> + gp = choose_best_group(dp);
> + if (gp &&
> + release_bios(gp, &issue_list, &pushback_list) == R_YIELD)
> + queue_delayed_work(dp->g_ioband_wq,
> + &dp->g_conductor, 0);
> + }
> +
> + if (is_device_blocked(dp) &&
> + dp->g_blocked < dp->g_io_limit[READ] + dp->g_io_limit[WRITE]) {
> + clear_device_blocked(dp);
> + wake_up_all(&dp->g_waitq);
> + }
> +
> + if (dp->g_blocked &&
> + room_for_bio_rw(dp, READ) && room_for_bio_rw(dp, WRITE) &&
> + bio_list_empty(&issue_list) && bio_list_empty(&pushback_list) &&
> + dp->g_restart_bios(dp)) {
> + DMDEBUG("%s: token expired dp:%p issued(%d,%d) g_blocked(%d)",
> + __func__, dp, dp->g_issued[READ], dp->g_issued[WRITE],
> + dp->g_blocked);
> + queue_delayed_work(dp->g_ioband_wq, &dp->g_conductor, 0);
> + }
> +
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> +
> + while ((bio = bio_list_pop(&issue_list)))
> + generic_make_request(bio);
> + while ((bio = bio_list_pop(&pushback_list)))
> + bio_endio(bio, -EIO);
> +}
> +
> +static int ioband_end_io(struct dm_target *ti, struct bio *bio,
> + int error, union map_info *map_context)
> +{
> + struct ioband_group *gp = ti->private;
> + struct ioband_device *dp = gp->c_banddev;
> + unsigned long flags;
> + int r = error;
> +
> + /*
> + * XXX: A new error code for device mapper devices should be used
> + * rather than EIO.
> + */
> + if (error == -EIO && should_pushback_bio(gp)) {
> + /* This ioband device is suspending */
> + r = DM_ENDIO_REQUEUE;
> + }
> + /*
> + * Todo: The algorithm should be optimized to eliminate the spinlock.
> + */
> + spin_lock_irqsave(&dp->g_lock, flags);
> + dp->g_issued[bio_data_dir(bio)]--;
> +
> + /*
> + * Todo: It would be better to introduce high/low water marks here
> + * not to kick the workqueues so often.
> + */
> + if (dp->g_blocked)
> + queue_delayed_work(dp->g_ioband_wq, &dp->g_conductor, 0);
> + else if (is_device_suspended(dp) &&
> + dp->g_issued[READ] + dp->g_issued[WRITE] == 0)
> + wake_up_all(&dp->g_waitq_flush);
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + return r;
> +}
> +
> +static void ioband_presuspend(struct dm_target *ti)
> +{
> + struct ioband_group *gp = ti->private;
> + struct ioband_device *dp = gp->c_banddev;
> +
> + mutex_lock(&dp->g_lock_device);
> + ioband_group_stop_all(gp, 1);
> + mutex_unlock(&dp->g_lock_device);
> +}
> +
> +static void ioband_resume(struct dm_target *ti)
> +{
> + struct ioband_group *gp = ti->private;
> + struct ioband_device *dp = gp->c_banddev;
> +
> + mutex_lock(&dp->g_lock_device);
> + ioband_group_resume_all(gp);
> + mutex_unlock(&dp->g_lock_device);
> +}
> +
> +static void ioband_group_status(struct ioband_group *gp, int *szp,
> + char *result, unsigned maxlen)
> +{
> + struct ioband_group_stat *stat;
> + int i, sz = *szp; /* used in DMEMIT() */
> +
> + DMEMIT(" %d", gp->c_id);
> + for (i = 0; i < 2; i++) {
> + stat = &gp->c_stat[i];
> + DMEMIT(" %lu %lu %lu",
> + stat->immediate + stat->deferred, stat->deferred,
> + stat->sectors);
> + }
> + *szp = sz;
> +}
> +
> +static int ioband_status(struct dm_target *ti, status_type_t type,
> + char *result, unsigned maxlen)
> +{
> + struct ioband_group *gp = ti->private, *p;
> + struct ioband_device *dp = gp->c_banddev;
> + struct rb_node *node;
> + int sz = 0; /* used in DMEMIT() */
> + unsigned long flags;
> +
> + mutex_lock(&dp->g_lock_device);
> +
> + switch (type) {
> + case STATUSTYPE_INFO:
> + spin_lock_irqsave(&dp->g_lock, flags);
> + DMEMIT("%s", dp->g_name);
> + ioband_group_status(gp, &sz, result, maxlen);
> + for (node = rb_first(&gp->c_group_root); node;
> + node = rb_next(node)) {
> + p = rb_entry(node, struct ioband_group, c_group_node);
> + ioband_group_status(p, &sz, result, maxlen);
> + }
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + break;
> +
> + case STATUSTYPE_TABLE:
> + spin_lock_irqsave(&dp->g_lock, flags);
> + DMEMIT("%s %s %d %d %s %s",
> + gp->c_dev->name, dp->g_name,
> + dp->g_io_throttle, dp->g_io_limit[READ],
> + gp->c_type->t_name, dp->g_policy->p_name);
> + dp->g_show(gp, &sz, result, maxlen);
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + break;
> + }
> +
> + mutex_unlock(&dp->g_lock_device);
> + return 0;
> +}
> +
> +static int ioband_group_type_select(struct ioband_group *gp, char *name)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> + struct group_type *t;
> + unsigned long flags;
> +
> + for (t = dm_ioband_group_type; (t->t_name); t++) {
> + if (!strcmp(name, t->t_name))
> + break;
> + }
> + if (!t->t_name) {
> + DMWARN("ioband type select: %s isn't supported.", name);
> + return -EINVAL;
> + }
> + spin_lock_irqsave(&dp->g_lock, flags);
> + if (!RB_EMPTY_ROOT(&gp->c_group_root)) {
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + return -EBUSY;
> + }
> + gp->c_type = t;
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> +
> + return 0;
> +}
> +
> +static int ioband_set_param(struct ioband_group *gp, char *cmd, char *value)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> + char *val_str;
> + long id;
> + unsigned long flags;
> + int r;
> +
> + r = split_string(value, &id, &val_str);
> + if (r)
> + return r;
> +
> + spin_lock_irqsave(&dp->g_lock, flags);
> + if (id != IOBAND_ID_ANY) {
> + gp = ioband_group_find(gp, id);
> + if (!gp) {
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + DMWARN("ioband_set_param: id=%ld not found.", id);
> + return -EINVAL;
> + }
> + }
> + r = dp->g_set_param(gp, cmd, val_str);
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + return r;
> +}
> +
> +static int ioband_group_attach(struct ioband_group *gp, int id, char *param)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> + struct ioband_group *sub_gp;
> + int r;
> +
> + if (id < 0) {
> + DMWARN("ioband_group_attach: invalid id:%d", id);
> + return -EINVAL;
> + }
> + if (!gp->c_type->t_getid) {
> + DMWARN("ioband_group_attach: "
> + "no ioband group type is specified");
> + return -EINVAL;
> + }
> +
> + sub_gp = kzalloc(sizeof(struct ioband_group), GFP_KERNEL);
> + if (!sub_gp)
> + return -ENOMEM;
> +
> + r = ioband_group_init(sub_gp, gp, dp, id, param);
> + if (r < 0) {
> + kfree(sub_gp);
> + return r;
> + }
> + return 0;
> +}
> +
> +static int ioband_group_detach(struct ioband_group *gp, int id)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> + struct ioband_group *sub_gp;
> + unsigned long flags;
> +
> + if (id < 0) {
> + DMWARN("ioband_group_detach: invalid id:%d", id);
> + return -EINVAL;
> + }
> + spin_lock_irqsave(&dp->g_lock, flags);
> + sub_gp = ioband_group_find(gp, id);
> + if (!sub_gp) {
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + DMWARN("ioband_group_detach: invalid id:%d", id);
> + return -EINVAL;
> + }
> +
> + /*
> + * Todo: Calling suspend_ioband_device() before releasing the
> + * ioband group has a large overhead. Need improvement.
> + */
> + suspend_ioband_device(dp, flags, 0);
> + ioband_group_release(gp, sub_gp);
> + resume_ioband_device(dp);
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + return 0;
> +}
> +
> +/*
> + * Message parameters:
> + * "policy" <name>
> + * ex)
> + * "policy" "weight"
> + * "type" "none"|"pid"|"pgrp"|"node"|"cpuset"|"cgroup"|"user"|"gid"
> + * "io_throttle" <value>
> + * "io_limit" <value>
> + * "attach" <group id>
> + * "detach" <group id>
> + * "any-command" <group id>:<value>
> + * ex)
> + * "weight" 0:<value>
> + * "token" 24:<value>
> + */
> +static int __ioband_message(struct dm_target *ti, unsigned argc, char **argv)
> +{
> + struct ioband_group *gp = ti->private, *p;
> + struct ioband_device *dp = gp->c_banddev;
> + struct rb_node *node;
> + long val;
> + int r = 0;
> + unsigned long flags;
> +
> + if (argc == 1 && !strcmp(argv[0], "reset")) {
> + spin_lock_irqsave(&dp->g_lock, flags);
> + memset(gp->c_stat, 0, sizeof(gp->c_stat));
> + for (node = rb_first(&gp->c_group_root); node;
> + node = rb_next(node)) {
> + p = rb_entry(node, struct ioband_group, c_group_node);
> + memset(p->c_stat, 0, sizeof(p->c_stat));
> + }
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + return 0;
> + }
> +
> + if (argc != 2) {
> + DMWARN("Unrecognised band message received.");
> + return -EINVAL;
> + }
> + if (!strcmp(argv[0], "io_throttle")) {
> + r = strict_strtol(argv[1], 0, &val);
> + if (r || val < 0 || val > SHORT_MAX)
> + return -EINVAL;
> + if (val == 0)
> + val = DEFAULT_IO_THROTTLE;
> + spin_lock_irqsave(&dp->g_lock, flags);
> + if (val > dp->g_io_limit[READ] ||
> + val > dp->g_io_limit[WRITE]) {
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + return -EINVAL;
> + }
> + dp->g_io_throttle = val;
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + ioband_set_param(gp, argv[0], argv[1]);
> + return 0;
> + } else if (!strcmp(argv[0], "io_limit")) {
> + r = strict_strtol(argv[1], 0, &val);
> + if (r || val < 0 || val > SHORT_MAX)
> + return -EINVAL;
> + spin_lock_irqsave(&dp->g_lock, flags);
> + if (val == 0) {
> + struct request_queue *q;
> +
> + q = bdev_get_queue(gp->c_dev->bdev);
> + if (!q) {
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + return -ENXIO;
> + }
> + val = q->nr_requests;
> + }
> + if (val < dp->g_io_throttle) {
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + return -EINVAL;
> + }
> + dp->g_io_limit[READ] = dp->g_io_limit[WRITE] = val;
> + spin_unlock_irqrestore(&dp->g_lock, flags);
> + ioband_set_param(gp, argv[0], argv[1]);
> + return 0;
> + } else if (!strcmp(argv[0], "type")) {
> + return ioband_group_type_select(gp, argv[1]);
> + } else if (!strcmp(argv[0], "attach")) {
> + r = strict_strtol(argv[1], 0, &val);
> + if (r)
> + return r;
> + return ioband_group_attach(gp, val, NULL);
> + } else if (!strcmp(argv[0], "detach")) {
> + r = strict_strtol(argv[1], 0, &val);
> + if (r)
> + return r;
> + return ioband_group_detach(gp, val);
> + } else if (!strcmp(argv[0], "policy")) {
> + r = policy_init(dp, argv[1], 0, &argv[2]);
> + return r;
> + } else {
> + /* message anycommand <group-id>:<value> */
> + r = ioband_set_param(gp, argv[0], argv[1]);
> + if (r < 0)
> + DMWARN("Unrecognised band message received.");
> + return r;
> + }
> + return 0;
> +}
> +
> +static int ioband_message(struct dm_target *ti, unsigned argc, char **argv)
> +{
> + struct ioband_group *gp = ti->private;
> + struct ioband_device *dp = gp->c_banddev;
> + int r;
> +
> + mutex_lock(&dp->g_lock_device);
> + r = __ioband_message(ti, argc, argv);
> + mutex_unlock(&dp->g_lock_device);
> + return r;
> +}
> +
> +static int ioband_merge(struct dm_target *ti, struct bvec_merge_data *bvm,
> + struct bio_vec *biovec, int max_size)
> +{
> + struct ioband_group *gp = ti->private;
> + struct request_queue *q = bdev_get_queue(gp->c_dev->bdev);
> +
> + if (!q->merge_bvec_fn)
> + return max_size;
> +
> + bvm->bi_bdev = gp->c_dev->bdev;
> + bvm->bi_sector -= ti->begin;
> +
> + return min(max_size, q->merge_bvec_fn(q, bvm, biovec));
> +}
> +
> +static struct target_type ioband_target = {
> + .name = "ioband",
> + .module = THIS_MODULE,
> + .version = {1, 10, 2},
> + .ctr = ioband_ctr,
> + .dtr = ioband_dtr,
> + .map = ioband_map,
> + .end_io = ioband_end_io,
> + .presuspend = ioband_presuspend,
> + .resume = ioband_resume,
> + .status = ioband_status,
> + .message = ioband_message,
> + .merge = ioband_merge,
> +};
> +
> +static int __init dm_ioband_init(void)
> +{
> + int r;
> +
> + r = dm_register_target(&ioband_target);
> + if (r < 0) {
> + DMERR("register failed %d", r);
> + return r;
> + }
> + return r;
> +}
> +
> +static void __exit dm_ioband_exit(void)
> +{
> + dm_unregister_target(&ioband_target);
> +}
> +
> +module_init(dm_ioband_init);
> +module_exit(dm_ioband_exit);
> +
> +MODULE_DESCRIPTION(DM_NAME " I/O bandwidth control");
> +MODULE_AUTHOR("Hirokazu Takahashi <taka at valinux.co.jp>, "
> + "Ryo Tsuruta <ryov at valinux.co.jp>");
> +MODULE_LICENSE("GPL");
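The io_throttle and io_limit branches of __ioband_message() enforce one rule: io_throttle may never exceed io_limit, and a value of 0 selects a default. Below is a minimal user-space sketch of only that rule, not the kernel code. It assumes a hypothetical struct band_limits in place of struct ioband_device, SHRT_MAX from <limits.h> in place of the kernel's SHORT_MAX, and a plain queue_depth argument in place of q->nr_requests.

```c
#include <errno.h>
#include <limits.h>

#define DEFAULT_IO_THROTTLE 4

/* Hypothetical stand-in for the two fields the message handler checks. */
struct band_limits {
	int io_throttle;
	int io_limit;	/* the patch always sets the READ and WRITE limits together */
};

/* Mirrors the "io_throttle" branch: 0 selects the default value. */
static int set_io_throttle(struct band_limits *bl, long val)
{
	if (val < 0 || val > SHRT_MAX)
		return -EINVAL;
	if (val == 0)
		val = DEFAULT_IO_THROTTLE;
	if (val > bl->io_limit)
		return -EINVAL;
	bl->io_throttle = (int)val;
	return 0;
}

/* Mirrors the "io_limit" branch: 0 falls back to the queue depth. */
static int set_io_limit(struct band_limits *bl, long val, long queue_depth)
{
	if (val < 0 || val > SHRT_MAX)
		return -EINVAL;
	if (val == 0)
		val = queue_depth;
	if (val < bl->io_throttle)
		return -EINVAL;
	bl->io_limit = (int)val;
	return 0;
}
```

Starting from io_throttle 4 and io_limit 128, a request for io_throttle 200 or io_limit 2 is rejected with -EINVAL, which matches the order of the checks in the patch.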
> Index: linux-2.6.29/drivers/md/dm-ioband-policy.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6.29/drivers/md/dm-ioband-policy.c
> @@ -0,0 +1,457 @@
> +/*
> + * Copyright (C) 2008 VA Linux Systems Japan K.K.
> + *
> + * I/O bandwidth control
> + *
> + * This file is released under the GPL.
> + */
> +#include <linux/bio.h>
> +#include <linux/workqueue.h>
> +#include <linux/rbtree.h>
> +#include "dm.h"
> +#include "dm-bio-list.h"
> +#include "dm-ioband.h"
> +
> +/*
> + * The following functions determine when and which BIOs should
> + * be submitted to control the I/O flow.
> + * A new BIO scheduling policy can be added by providing such functions.
> + */
> +
> +/*
> + * Functions for weight balancing policy based on the number of I/Os.
> + */
> +#define DEFAULT_WEIGHT 100
> +#define DEFAULT_TOKENPOOL 2048
> +#define DEFAULT_BUCKET 2
> +#define IOBAND_IOPRIO_BASE 100
> +#define TOKEN_BATCH_UNIT 20
> +#define PROCEED_THRESHOLD 8
> +#define LOCAL_ACTIVE_RATIO 8
> +#define GLOBAL_ACTIVE_RATIO 16
> +#define OVERCOMMIT_RATE 4
> +
> +/*
> + * Calculate the effective number of tokens this group has.
> + */
> +static int get_token(struct ioband_group *gp)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> + int token = gp->c_token;
> + int allowance = dp->g_epoch - gp->c_my_epoch;
> +
> + if (allowance) {
> + if (allowance > dp->g_carryover)
> + allowance = dp->g_carryover;
> + token += gp->c_token_initial * allowance;
> + }
> + if (is_group_down(gp))
> + token += gp->c_token_initial * dp->g_carryover * 2;
> +
> + return token;
> +}
> +
> +/*
> + * Calculate the priority of a given group.
> + */
> +static int iopriority(struct ioband_group *gp)
> +{
> + return get_token(gp) * IOBAND_IOPRIO_BASE / gp->c_token_initial + 1;
> +}
> +
> +/*
> + * This function is called when all the active groups on the same ioband
> + * device have used up their tokens. It makes a new global epoch so that
> + * all groups on this device will get freshly assigned tokens.
> + */
> +static int make_global_epoch(struct ioband_device *dp)
> +{
> + struct ioband_group *gp = dp->g_dominant;
> +
> + /*
> + * Don't make a new epoch if the dominant group still has a lot of
> + * tokens, except when the I/O load is low.
> + */
> + if (gp) {
> + int iopri = iopriority(gp);
> + if (iopri * PROCEED_THRESHOLD > IOBAND_IOPRIO_BASE &&
> + dp->g_issued[READ] + dp->g_issued[WRITE] >=
> + dp->g_io_throttle)
> + return 0;
> + }
> +
> + dp->g_epoch++;
> + DMDEBUG("make_epoch %d", dp->g_epoch);
> +
> + /* The leftover tokens will be used in the next epoch. */
> + dp->g_token_extra = dp->g_token_left;
> + if (dp->g_token_extra < 0)
> + dp->g_token_extra = 0;
> + dp->g_token_left = dp->g_token_bucket;
> +
> + dp->g_expired = NULL;
> + dp->g_dominant = NULL;
> +
> + return 1;
> +}
> +
> +/*
> + * This function is called when this group has used up its own tokens.
> + * It will check whether it's possible to make a new epoch of this group.
> + */
> +static inline int make_epoch(struct ioband_group *gp)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> + int allowance = dp->g_epoch - gp->c_my_epoch;
> +
> + if (!allowance)
> + return 0;
> + if (allowance > dp->g_carryover)
> + allowance = dp->g_carryover;
> + gp->c_my_epoch = dp->g_epoch;
> + return allowance;
> +}
> +
> +/*
> + * Check whether this group has tokens to issue an I/O. Return 0 if it
> + * doesn't have any, otherwise return the priority of this group.
> + */
> +static int is_token_left(struct ioband_group *gp)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> + int allowance;
> + int delta;
> + int extra;
> +
> + if (gp->c_token > 0)
> + return iopriority(gp);
> +
> + if (is_group_down(gp)) {
> + gp->c_token = gp->c_token_initial;
> + return iopriority(gp);
> + }
> + allowance = make_epoch(gp);
> + if (!allowance)
> + return 0;
> + /*
> + * If this group has the right to get tokens for several epochs,
> + * give all of them to the group here.
> + */
> + delta = gp->c_token_initial * allowance;
> + dp->g_token_left -= delta;
> + /*
> + * Give some extra tokens to this group when unused tokens are left
> + * on this ioband device from the previous epoch.
> + */
> + extra = dp->g_token_extra * gp->c_token_initial /
> + (dp->g_token_bucket - dp->g_token_extra / 2);
> + delta += extra;
> + gp->c_token += delta;
> + gp->c_consumed = 0;
> +
> + if (gp == dp->g_current)
> + dp->g_yield_mark += delta;
> + DMDEBUG("refill token: gp:%p token:%d->%d extra(%d) allowance(%d)",
> + gp, gp->c_token - delta, gp->c_token, extra, allowance);
> + if (gp->c_token > 0)
> + return iopriority(gp);
> + DMDEBUG("refill token: yet empty gp:%p token:%d", gp, gp->c_token);
> + return 0;
> +}
> +
> +/*
> + * Use tokens to issue an I/O. After the operation, the number of tokens left
> + * on this group may become negative, which will be treated as debt.
> + */
> +static int consume_token(struct ioband_group *gp, int count, int flag)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> +
> + if (gp->c_consumed * LOCAL_ACTIVE_RATIO < gp->c_token_initial &&
> + gp->c_consumed * GLOBAL_ACTIVE_RATIO < dp->g_token_bucket) {
> + ; /* Do nothing unless this group is really active. */
> + } else if (!dp->g_dominant ||
> + get_token(gp) > get_token(dp->g_dominant)) {
> + /*
> + * Regard this group as the dominant group on this
> + * ioband device when it has more tokens than the
> + * previous dominant group.
> + */
> + dp->g_dominant = gp;
> + }
> + if (dp->g_epoch == gp->c_my_epoch &&
> + gp->c_token > 0 && gp->c_token - count <= 0) {
> + /* Remember the last group which used up its own tokens. */
> + dp->g_expired = gp;
> + if (dp->g_dominant == gp)
> + dp->g_dominant = NULL;
> + }
> +
> + if (gp != dp->g_current) {
> + /* This group was not the current one yet; make it current. */
> + dp->g_current = gp;
> + dp->g_yield_mark =
> + gp->c_token - (TOKEN_BATCH_UNIT << dp->g_token_unit);
> + }
> + gp->c_token -= count;
> + gp->c_consumed += count;
> + if (gp->c_token <= dp->g_yield_mark && !(flag & IOBAND_URGENT)) {
> + /*
> + * R_YIELD means that this policy requests dm-ioband
> + * to give another group a chance to be selected since
> + * this group has already issued enough I/Os.
> + */
> + dp->g_current = NULL;
> + return R_YIELD;
> + }
> + /*
> + * R_OK means that this policy allows dm-ioband to select
> + * this group to issue I/Os without a break.
> + */
> + return R_OK;
> +}
> +
> +/*
> + * Consume one token on each I/O.
> + */
> +static int prepare_token(struct ioband_group *gp, struct bio *bio, int flag)
> +{
> + return consume_token(gp, 1, flag);
> +}
> +
> +/*
> + * Check if this group is able to receive a new bio.
> + */
> +static int is_queue_full(struct ioband_group *gp)
> +{
> + return gp->c_blocked >= gp->c_limit;
> +}
> +
> +static void set_weight(struct ioband_group *gp, int new)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> + struct ioband_group *p;
> +
> + dp->g_weight_total += (new - gp->c_weight);
> + gp->c_weight = new;
> +
> + if (dp->g_weight_total == 0) {
> + list_for_each_entry(p, &dp->g_groups, c_list)
> + p->c_token = p->c_token_initial = p->c_limit = 1;
> + } else {
> + list_for_each_entry(p, &dp->g_groups, c_list) {
> + p->c_token = p->c_token_initial =
> + dp->g_token_bucket * p->c_weight /
> + dp->g_weight_total + 1;
> + p->c_limit = (dp->g_io_limit[READ] +
> + dp->g_io_limit[WRITE]) * p->c_weight /
> + dp->g_weight_total / OVERCOMMIT_RATE + 1;
> + }
> + }
> +}
> +
> +static void init_token_bucket(struct ioband_device *dp,
> + int token_bucket, int carryover)
> +{
> + if (!token_bucket)
> + dp->g_token_bucket =
> + ((dp->g_io_limit[READ] + dp->g_io_limit[WRITE]) *
> + DEFAULT_BUCKET) << dp->g_token_unit;
> + else
> + dp->g_token_bucket = token_bucket;
> + if (!carryover)
> + dp->g_carryover = (DEFAULT_TOKENPOOL << dp->g_token_unit) /
> + dp->g_token_bucket;
> + else
> + dp->g_carryover = carryover;
> + if (dp->g_carryover < 1)
> + dp->g_carryover = 1;
> + dp->g_token_left = 0;
> +}
> +
> +static int policy_weight_param(struct ioband_group *gp, char *cmd, char *value)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> + long val;
> + int r = 0, err;
> +
> + err = strict_strtol(value, 0, &val);
> + if (!strcmp(cmd, "weight")) {
> + if (!err && 0 < val && val <= SHORT_MAX)
> + set_weight(gp, val);
> + else
> + r = -EINVAL;
> + } else if (!strcmp(cmd, "token")) {
> + if (!err && 0 <= val && val <= INT_MAX) {
> + init_token_bucket(dp, val, 0);
> + set_weight(gp, gp->c_weight);
> + dp->g_token_extra = 0;
> + } else
> + r = -EINVAL;
> + } else if (!strcmp(cmd, "carryover")) {
> + if (!err && 0 <= val && val <= INT_MAX) {
> + init_token_bucket(dp, dp->g_token_bucket, val);
> + set_weight(gp, gp->c_weight);
> + dp->g_token_extra = 0;
> + } else
> + r = -EINVAL;
> + } else if (!strcmp(cmd, "io_limit")) {
> + init_token_bucket(dp, 0, 0);
> + set_weight(gp, gp->c_weight);
> + } else {
> + r = -EINVAL;
> + }
> + return r;
> +}
> +
> +static int policy_weight_ctr(struct ioband_group *gp, char *arg)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> +
> + if (!arg)
> + arg = __stringify(DEFAULT_WEIGHT);
> + gp->c_my_epoch = dp->g_epoch;
> + gp->c_weight = 0;
> + gp->c_consumed = 0;
> + return policy_weight_param(gp, "weight", arg);
> +}
> +
> +static void policy_weight_dtr(struct ioband_group *gp)
> +{
> + struct ioband_device *dp = gp->c_banddev;
> + set_weight(gp, 0);
> + dp->g_dominant = NULL;
> + dp->g_expired = NULL;
> +}
> +
> +static void policy_weight_show(struct ioband_group *gp, int *szp,
> + char *result, unsigned maxlen)
> +{
> + struct ioband_group *p;
> + struct ioband_device *dp = gp->c_banddev;
> + struct rb_node *node;
> + int sz = *szp; /* used in DMEMIT() */
> +
> + DMEMIT(" %d :%d", dp->g_token_bucket, gp->c_weight);
> +
> + for (node = rb_first(&gp->c_group_root); node; node = rb_next(node)) {
> + p = rb_entry(node, struct ioband_group, c_group_node);
> + DMEMIT(" %d:%d", p->c_id, p->c_weight);
> + }
> + *szp = sz;
> +}
> +
> +/*
> + * <Method> <description>
> + * g_can_submit : To determine whether a given group has the right to
> + * submit BIOs. The larger the return value the higher the
> + * priority to submit. Zero means it has no right.
> + * g_prepare_bio : Called right before submitting each BIO.
> + * g_restart_bios : Called if this ioband device has some BIOs blocked but none
> + * of them can be submitted now. This method has to
> + * reinitialize the data to restart to submit BIOs and return
> + * 0 or 1.
> + * The return value 0 means that it has become able to submit
> + * them now so that this ioband device will continue its work.
> + * The return value 1 means that it is still unable to submit
> + * them so that this device will stop its work. And this
> + * policy module has to reactivate the device when it gets
> + * to be able to submit BIOs.
> + * g_hold_bio : To hold a given BIO until it is submitted.
> + * The default function is used when this method is undefined.
> + * g_pop_bio : To select and get the best BIO to submit.
> + * g_group_ctr : To initialize the policy's own members of struct ioband_group.
> + * g_group_dtr : Called when struct ioband_group is removed.
> + * g_set_param : To update the policy's own data.
> + * The parameters can be passed through "dmsetup message"
> + * command.
> + * g_should_block : Called every time this ioband device receives a BIO.
> + * Return 1 if a given group can't receive any more BIOs,
> + * otherwise return 0.
> + * g_show : Show the configuration.
> + */
> +static int policy_weight_init(struct ioband_device *dp, int argc, char **argv)
> +{
> + long val;
> + int r = 0;
> +
> + if (argc < 1)
> + val = 0;
> + else {
> + r = strict_strtol(argv[0], 0, &val);
> + if (r || val < 0 || val > INT_MAX)
> + return -EINVAL;
> + }
> +
> + dp->g_can_submit = is_token_left;
> + dp->g_prepare_bio = prepare_token;
> + dp->g_restart_bios = make_global_epoch;
> + dp->g_group_ctr = policy_weight_ctr;
> + dp->g_group_dtr = policy_weight_dtr;
> + dp->g_set_param = policy_weight_param;
> + dp->g_should_block = is_queue_full;
> + dp->g_show = policy_weight_show;
> +
> + dp->g_epoch = 0;
> + dp->g_weight_total = 0;
> + dp->g_current = NULL;
> + dp->g_dominant = NULL;
> + dp->g_expired = NULL;
> + dp->g_token_extra = 0;
> + dp->g_token_unit = 0;
> + init_token_bucket(dp, val, 0);
> + dp->g_token_left = dp->g_token_bucket;
> +
> + return 0;
> +}
> +
> +/* weight balancing policy based on the number of I/Os. --- End --- */
> +
> +/*
> + * Functions for weight balancing policy based on I/O size.
> + * It just borrows a lot of functions from the regular weight balancing policy.
> + */
> +static int w2_prepare_token(struct ioband_group *gp, struct bio *bio, int flag)
> +{
> + /* Consume tokens depending on the size of a given bio. */
> + return consume_token(gp, bio_sectors(bio), flag);
> +}
> +
> +static int w2_policy_weight_init(struct ioband_device *dp,
> + int argc, char **argv)
> +{
> + long val;
> + int r = 0;
> +
> + if (argc < 1)
> + val = 0;
> + else {
> + r = strict_strtol(argv[0], 0, &val);
> + if (r || val < 0 || val > INT_MAX)
> + return -EINVAL;
> + }
> +
> + r = policy_weight_init(dp, argc, argv);
> + if (r < 0)
> + return r;
> +
> + dp->g_prepare_bio = w2_prepare_token;
> + dp->g_token_unit = PAGE_SHIFT - 9;
> + init_token_bucket(dp, val, 0);
> + dp->g_token_left = dp->g_token_bucket;
> + return 0;
> +}
> +
> +/* weight balancing policy based on I/O size. --- End --- */
> +
> +static int policy_default_init(struct ioband_device *dp, int argc, char **argv)
> +{
> + return policy_weight_init(dp, argc, argv);
> +}
> +
> +struct policy_type dm_ioband_policy_type[] = {
> + {"default", policy_default_init},
> + {"weight", policy_weight_init},
> + {"weight-iosize", w2_policy_weight_init},
> + {NULL, policy_default_init}
> +};
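The arithmetic in init_token_bucket() and set_weight() is easier to follow with numbers plugged in. Below is a user-space sketch of those formulas only. The helper names are made up for illustration; the constants and expressions are copied from the patch. It assumes weight_total is nonzero, since the patch handles the zero case separately.

```c
#include <assert.h>

#define DEFAULT_TOKENPOOL 2048
#define DEFAULT_BUCKET 2
#define OVERCOMMIT_RATE 4

/* Bucket size used when no explicit token count is given. */
static int default_token_bucket(int limit_read, int limit_write, int token_unit)
{
	return ((limit_read + limit_write) * DEFAULT_BUCKET) << token_unit;
}

/* Number of epochs tokens may be carried over, at least 1. */
static int default_carryover(int token_bucket, int token_unit)
{
	int carryover = (DEFAULT_TOKENPOOL << token_unit) / token_bucket;

	return carryover < 1 ? 1 : carryover;
}

/* Tokens a group receives per epoch, proportional to its weight. */
static int group_tokens(int token_bucket, int weight, int weight_total)
{
	assert(weight_total > 0);
	return token_bucket * weight / weight_total + 1;
}

/* Maximum number of blocked BIOs a group may hold. */
static int group_limit(int io_limit_sum, int weight, int weight_total)
{
	assert(weight_total > 0);
	return io_limit_sum * weight / weight_total / OVERCOMMIT_RATE + 1;
}
```

With io_limit 128 for both reads and writes and token_unit 0, the bucket is 512 tokens and the carryover is 4 epochs. Two groups with weights 100 and 300 then get 129 and 385 tokens per epoch, and queue limits of 17 and 49.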
> Index: linux-2.6.29/drivers/md/dm-ioband-type.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6.29/drivers/md/dm-ioband-type.c
> @@ -0,0 +1,77 @@
> +/*
> + * Copyright (C) 2008 VA Linux Systems Japan K.K.
> + *
> + * I/O bandwidth control
> + *
> + * This file is released under the GPL.
> + */
> +#include <linux/bio.h>
> +#include "dm.h"
> +#include "dm-bio-list.h"
> +#include "dm-ioband.h"
> +
> +/*
> + * Any I/O bandwidth can be divided into several bandwidth groups, each of which
> + * has its own unique ID. The following functions are called to determine
> + * which group a given BIO belongs to and return the ID of the group.
> + */
> +
> +/* ToDo: unsigned long value would be better for group ID */
> +
> +static int ioband_process_id(struct bio *bio)
> +{
> + /*
> + * This function will work for KVM and Xen.
> + */
> + return (int)current->tgid;
> +}
> +
> +static int ioband_process_group(struct bio *bio)
> +{
> + return (int)task_pgrp_nr(current);
> +}
> +
> +static int ioband_uid(struct bio *bio)
> +{
> + return (int)current_uid();
> +}
> +
> +static int ioband_gid(struct bio *bio)
> +{
> + return (int)current_gid();
> +}
> +
> +static int ioband_cpuset(struct bio *bio)
> +{
> + return 0; /* not implemented yet */
> +}
> +
> +static int ioband_node(struct bio *bio)
> +{
> + return 0; /* not implemented yet */
> +}
> +
> +static int ioband_cgroup(struct bio *bio)
> +{
> + /*
> + * This function should return the ID of the cgroup which
> + * issued "bio". The ID of the cgroup which the current
> + * process belongs to won't be suitable ID for this purpose,
> + * since some BIOs will be handled by kernel threads like aio
> + * or pdflush on behalf of the process requesting the BIOs.
> + */
> + return 0; /* not implemented yet */
> +}
> +
> +struct group_type dm_ioband_group_type[] = {
> + {"none", NULL},
> + {"pgrp", ioband_process_group},
> + {"pid", ioband_process_id},
> + {"node", ioband_node},
> + {"cpuset", ioband_cpuset},
> + {"cgroup", ioband_cgroup},
> + {"user", ioband_uid},
> + {"uid", ioband_uid},
> + {"gid", ioband_gid},
> + {NULL, NULL}
> +};
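The group type table is terminated by a NULL name, and ioband_group_type_select() walks it until the name matches or the sentinel is reached. Below is a user-space sketch of that lookup. The t_getid signature is simplified to take no arguments, dummy_id() is a placeholder, and only some of the table entries are reproduced.

```c
#include <stddef.h>
#include <string.h>

typedef int (*getid_fn)(void);

struct group_type {
	const char *t_name;
	getid_fn t_getid;
};

/* Placeholder for the real ID functions, which inspect the current task. */
static int dummy_id(void)
{
	return 0;
}

static const struct group_type group_types[] = {
	{"none", NULL},
	{"pgrp", dummy_id},
	{"pid", dummy_id},
	{"user", dummy_id},
	{"uid", dummy_id},
	{"gid", dummy_id},
	{NULL, NULL}	/* sentinel: ends the walk */
};

/* Return the matching entry, or NULL when the name is not supported. */
static const struct group_type *find_group_type(const char *name)
{
	const struct group_type *t;

	for (t = group_types; t->t_name; t++) {
		if (!strcmp(name, t->t_name))
			return t;
	}
	return NULL;
}
```

Note that "none" is a valid name with a NULL t_getid, which is why ioband_group_attach() checks t_getid before creating a subgroup.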
> Index: linux-2.6.29/drivers/md/dm-ioband.h
> ===================================================================
> --- /dev/null
> +++ linux-2.6.29/drivers/md/dm-ioband.h
> @@ -0,0 +1,186 @@
> +/*
> + * Copyright (C) 2008 VA Linux Systems Japan K.K.
> + *
> + * I/O bandwidth control
> + *
> + * This file is released under the GPL.
> + */
> +
> +#include <linux/version.h>
> +#include <linux/wait.h>
> +
> +#define DM_MSG_PREFIX "ioband"
> +
> +#define DEFAULT_IO_THROTTLE 4
> +#define DEFAULT_IO_LIMIT 128
> +#define IOBAND_NAME_MAX 31
> +#define IOBAND_ID_ANY (-1)
> +
> +struct ioband_group;
> +
> +struct ioband_device {
> + struct list_head g_groups;
> + struct delayed_work g_conductor;
> + struct workqueue_struct *g_ioband_wq;
> + struct bio_list g_urgent_bios;
> + int g_io_throttle;
> + int g_io_limit[2];
> + int g_issued[2];
> + int g_blocked;
> + spinlock_t g_lock;
> + struct mutex g_lock_device;
> + wait_queue_head_t g_waitq;
> + wait_queue_head_t g_waitq_suspend;
> + wait_queue_head_t g_waitq_flush;
> +
> + int g_ref;
> + struct list_head g_list;
> + int g_flags;
> + char g_name[IOBAND_NAME_MAX + 1];
> + struct policy_type *g_policy;
> +
> + /* policy dependent */
> + int (*g_can_submit) (struct ioband_group *);
> + int (*g_prepare_bio) (struct ioband_group *, struct bio *, int);
> + int (*g_restart_bios) (struct ioband_device *);
> + void (*g_hold_bio) (struct ioband_group *, struct bio *);
> + struct bio *(*g_pop_bio) (struct ioband_group *);
> + int (*g_group_ctr) (struct ioband_group *, char *);
> + void (*g_group_dtr) (struct ioband_group *);
> + int (*g_set_param) (struct ioband_group *, char *cmd, char *value);
> + int (*g_should_block) (struct ioband_group *);
> + void (*g_show) (struct ioband_group *, int *, char *, unsigned);
> +
> + /* members for weight balancing policy */
> + int g_epoch;
> + int g_weight_total;
> + /* the number of tokens which can be used in every epoch */
> + int g_token_bucket;
> + /* how many epochs tokens can be carried over */
> + int g_carryover;
> + /* how many tokens should be used for one page-sized I/O */
> + int g_token_unit;
> + /* the last group which used a token */
> + struct ioband_group *g_current;
> + /* give another group a chance to be scheduled when the remaining
> + tokens of the current group reach this mark */
> + int g_yield_mark;
> + /* the latest group which used up its tokens */
> + struct ioband_group *g_expired;
> + /* the group which has the largest number of tokens among the
> + active groups */
> + struct ioband_group *g_dominant;
> + /* the number of unused tokens in this epoch */
> + int g_token_left;
> + /* left-over tokens from the previous epoch */
> + int g_token_extra;
> +};
> +
> +struct ioband_group_stat {
> + unsigned long sectors;
> + unsigned long immediate;
> + unsigned long deferred;
> +};
> +
> +struct ioband_group {
> + struct list_head c_list;
> + struct ioband_device *c_banddev;
> + struct dm_dev *c_dev;
> + struct dm_target *c_target;
> + struct bio_list c_blocked_bios;
> + struct bio_list c_prio_bios;
> + struct rb_root c_group_root;
> + struct rb_node c_group_node;
> + int c_id; /* should be unsigned long or unsigned long long */
> + char c_name[IOBAND_NAME_MAX + 1]; /* rfu */
> + int c_blocked;
> + int c_prio_blocked;
> + wait_queue_head_t c_waitq;
> + int c_flags;
> + struct ioband_group_stat c_stat[2]; /* hold rd/wr status */
> + struct group_type *c_type;
> +
> + /* members for weight balancing policy */
> + int c_weight;
> + int c_my_epoch;
> + int c_token;
> + int c_token_initial;
> + int c_limit;
> + int c_consumed;
> +
> + /* rfu */
> + /* struct bio_list c_ordered_tag_bios; */
> +};
> +
> +#define IOBAND_URGENT 1
> +
> +#define DEV_BIO_BLOCKED 1
> +#define DEV_SUSPENDED 2
> +
> +#define set_device_blocked(dp) ((dp)->g_flags |= DEV_BIO_BLOCKED)
> +#define clear_device_blocked(dp) ((dp)->g_flags &= ~DEV_BIO_BLOCKED)
> +#define is_device_blocked(dp) ((dp)->g_flags & DEV_BIO_BLOCKED)
> +
> +#define set_device_suspended(dp) ((dp)->g_flags |= DEV_SUSPENDED)
> +#define clear_device_suspended(dp) ((dp)->g_flags &= ~DEV_SUSPENDED)
> +#define is_device_suspended(dp) ((dp)->g_flags & DEV_SUSPENDED)
> +
> +#define IOG_PRIO_BIO_WRITE 1
> +#define IOG_PRIO_QUEUE 2
> +#define IOG_BIO_BLOCKED 4
> +#define IOG_GOING_DOWN 8
> +#define IOG_SUSPENDED 16
> +#define IOG_NEED_UP 32
> +
> +#define R_OK 0
> +#define R_BLOCK 1
> +#define R_YIELD 2
> +
> +#define set_group_blocked(gp) ((gp)->c_flags |= IOG_BIO_BLOCKED)
> +#define clear_group_blocked(gp) ((gp)->c_flags &= ~IOG_BIO_BLOCKED)
> +#define is_group_blocked(gp) ((gp)->c_flags & IOG_BIO_BLOCKED)
> +
> +#define set_group_down(gp) ((gp)->c_flags |= IOG_GOING_DOWN)
> +#define clear_group_down(gp) ((gp)->c_flags &= ~IOG_GOING_DOWN)
> +#define is_group_down(gp) ((gp)->c_flags & IOG_GOING_DOWN)
> +
> +#define set_group_suspended(gp) ((gp)->c_flags |= IOG_SUSPENDED)
> +#define clear_group_suspended(gp) ((gp)->c_flags &= ~IOG_SUSPENDED)
> +#define is_group_suspended(gp) ((gp)->c_flags & IOG_SUSPENDED)
> +
> +#define set_group_need_up(gp) ((gp)->c_flags |= IOG_NEED_UP)
> +#define clear_group_need_up(gp) ((gp)->c_flags &= ~IOG_NEED_UP)
> +#define group_need_up(gp) ((gp)->c_flags & IOG_NEED_UP)
> +
> +#define set_prio_read(gp) ((gp)->c_flags |= IOG_PRIO_QUEUE)
> +#define clear_prio_read(gp) ((gp)->c_flags &= ~IOG_PRIO_QUEUE)
> +#define is_prio_read(gp) \
> + (((gp)->c_flags & (IOG_PRIO_QUEUE|IOG_PRIO_BIO_WRITE)) == IOG_PRIO_QUEUE)
> +
> +#define set_prio_write(gp) \
> + ((gp)->c_flags |= (IOG_PRIO_QUEUE|IOG_PRIO_BIO_WRITE))
> +#define clear_prio_write(gp) \
> + ((gp)->c_flags &= ~(IOG_PRIO_QUEUE|IOG_PRIO_BIO_WRITE))
> +#define is_prio_write(gp) \
> + (((gp)->c_flags & (IOG_PRIO_QUEUE|IOG_PRIO_BIO_WRITE)) == \
> + (IOG_PRIO_QUEUE|IOG_PRIO_BIO_WRITE))
> +
> +#define set_prio_queue(gp, direct) \
> + ((gp)->c_flags |= (IOG_PRIO_QUEUE|(direct)))
> +#define clear_prio_queue(gp) clear_prio_write(gp)
> +#define is_prio_queue(gp) ((gp)->c_flags & IOG_PRIO_QUEUE)
> +#define prio_queue_direct(gp) ((gp)->c_flags & IOG_PRIO_BIO_WRITE)
> +
> +struct policy_type {
> + const char *p_name;
> + int (*p_policy_init) (struct ioband_device *, int, char **);
> +};
> +
> +extern struct policy_type dm_ioband_policy_type[];
> +
> +struct group_type {
> + const char *t_name;
> + int (*t_getid) (struct bio *);
> +};
> +
> +extern struct group_type dm_ioband_group_type[];
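The is_prio_read() and is_prio_write() tests in dm-ioband.h compare a masked flag word against a value. In C that only does what it says when the masking is parenthesized, because == binds tighter than &. Below is a small user-space sketch of the two parses. The function names are invented for illustration; the flag values are the ones from the header.

```c
#define IOG_PRIO_BIO_WRITE 1
#define IOG_PRIO_QUEUE 2
#define PRIO_MASK (IOG_PRIO_QUEUE | IOG_PRIO_BIO_WRITE)

/*
 * How "flags & PRIO_MASK == IOG_PRIO_QUEUE" is parsed when the inner
 * parentheses are missing: the comparison is evaluated first, so the
 * flags are ANDed with the result of (3 == 2), which is 0.
 */
static int read_test_unparenthesized(int flags)
{
	return flags & (PRIO_MASK == IOG_PRIO_QUEUE);
}

/* The intended test: mask first, then compare. */
static int read_test_parenthesized(int flags)
{
	return (flags & PRIO_MASK) == IOG_PRIO_QUEUE;
}
```

For flags equal to IOG_PRIO_QUEUE the first form yields 0 and the second yields 1, so a read-priority test written without the inner parentheses can never be true.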
>
> --
> dm-devel mailing list
> dm-devel at redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
---end quoted text---