[dm-devel] Preliminary Agenda and Activities for LSF

Shyam_Iyer at Dell.com Shyam_Iyer at Dell.com
Tue Mar 29 23:09:28 UTC 2011



> -----Original Message-----
> From: Mike Snitzer [mailto:snitzer at redhat.com]
> Sent: Tuesday, March 29, 2011 4:24 PM
> To: Iyer, Shyam
> Cc: linux-scsi at vger.kernel.org; lsf at lists.linux-foundation.org; linux-
> fsdevel at vger.kernel.org; rwheeler at redhat.com; vgoyal at redhat.com;
> device-mapper development
> Subject: Re: Preliminary Agenda and Activities for LSF
> 
> On Tue, Mar 29 2011 at  4:12pm -0400,
> Shyam_Iyer at dell.com <Shyam_Iyer at dell.com> wrote:
> 
> >
> >
> > > -----Original Message-----
> > > From: Mike Snitzer [mailto:snitzer at redhat.com]
> > > Sent: Tuesday, March 29, 2011 4:00 PM
> > > To: Iyer, Shyam
> > > Cc: vgoyal at redhat.com; lsf at lists.linux-foundation.org; linux-
> > > scsi at vger.kernel.org; linux-fsdevel at vger.kernel.org;
> > > rwheeler at redhat.com; device-mapper development
> > > Subject: Re: Preliminary Agenda and Activities for LSF
> > >
> > > On Tue, Mar 29 2011 at  3:13pm -0400,
> > > Shyam_Iyer at dell.com <Shyam_Iyer at dell.com> wrote:
> > >
> > > > > > > Above is pretty generic. Do you have specific
> > > needs/ideas/concerns?
> > > > > > >
> > > > > > > Thanks
> > > > > > > Vivek
> > > > > > Yes.. if I limited by Ethernet b/w to 40% I don't need to
> > > > > > limit I/O b/w via cgroups. Such bandwidth manipulations are
> > > > > > network switch driven and cgroups never take care of these
> > > > > > events from the Ethernet driver.
> > > > >
> > > > > So if IO is going over network and actual bandwidth control is
> > > > > taking place by throttling ethernet traffic then one does not
> > > > > have to specify block cgroup throttling policy and hence no
> > > > > need for cgroups to be worried about ethernet driver events?
> > > > >
> > > > > I think I am missing something here.
> > > > >
> > > > > Vivek
> > > > Well.. here is the catch.. example scenario..
> > > >
> > > > - Two iSCSI I/O sessions emanating from Ethernet ports eth0, eth1
> > > > multipathed together. Let us say round-robin policy.
> > > >
> > > > - The cgroup profile is to limit I/O bandwidth to 40% of the
> > > > multipathed I/O bandwidth. But the switch may have limited the I/O
> > > > bandwidth to 40% for the corresponding vlan associated with one of
> > > > the eth interfaces, say eth1.
> > > >
> > > > The computation that the bandwidth configured is 40% of the
> > > > available bandwidth is false in this case.  What we need to do is
> > > > possibly push more I/O through eth0 as it is allowed to run at
> > > > 100% of bandwidth by the switch.
> > > >
> > > > Now this is a dynamic decision and the multipathing layer should
> > > > take care of it.. but it would need a hint..
> > >
> > > No hint should be needed.  Just use one of the newer multipath path
> > > selectors that are dynamic by design: "queue-length" or
> > > "service-time".
> > >
> > > This scenario is exactly what those path selectors are meant to
> > > address.
> > >
> > > Mike
> >
> > Since iSCSI multipaths are essentially sessions one could configure
> > more than one session through the same ethX interface. The sessions
> > need not be going to the same LUN and hence not governed by the same
> > multipath selector but the bandwidth policy group would be for a
> > group of resources.
> 
> Then the sessions don't correspond to the same backend LUN (and by
> definition aren't part of the same mpath device).  You're really all
> over the map with your talking points.
> 
> I'm having a hard time following you.
> 
> Mike

Let me back up here.. this has to be thought of not only in the traditional Ethernet sense but also in a Data Centre Bridging (DCB) environment. I shouldn't have wandered into the multipath constructs..

I think my statement about the sessions not going to the same LUN was a little erroneous. I meant different /dev/sdXs, and hence different block I/O queues.

Each block I/O queue could be thought of as a bandwidth class being serviced through a corresponding network adapter queue (assuming a multiqueue-capable adapter).

Let us say /dev/sda (through eth0) and /dev/sdb (through eth1) form a cgroup bandwidth group with a weight of 20% of the I/O bandwidth. The user has configured this weight expecting it to correspond to, say, 200Mb of bandwidth.
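
For reference, the static side of that would look roughly like the sketch below. This is only an illustration using the cgroup v1 blkio interface; the group name, device nodes and the 20%/200Mb numbers are made up, not an actual recommended setup.

#!/usr/bin/env python
# Minimal sketch: one blkio cgroup covering I/O to /dev/sda and /dev/sdb,
# with a proportional weight plus an absolute per-device ceiling.
# Assumes cgroup v1 blkio mounted at /sys/fs/cgroup/blkio; group name,
# device nodes and numbers are illustrative only.
import os

CGROUP = "/sys/fs/cgroup/blkio/iscsi_sla"   # hypothetical group name
DEVICES = ["/dev/sda", "/dev/sdb"]          # paths reached via eth0 / eth1
WEIGHT = 200                                # ~20% if sibling weights total 1000
CEILING_BPS = 200 * 1000 * 1000 // 8        # ~200 Mb/s in bytes/s, per device

os.makedirs(CGROUP, exist_ok=True)

# Proportional share (used by the CFQ-based blkio controller).
with open(os.path.join(CGROUP, "blkio.weight"), "w") as f:
    f.write(str(WEIGHT))

# Absolute ceiling per device (used by the blkio throttling layer);
# entries are "major:minor bytes_per_second".
for dev in DEVICES:
    st = os.stat(dev)
    entry = "%d:%d %d\n" % (os.major(st.st_rdev), os.minor(st.st_rdev), CEILING_BPS)
    with open(os.path.join(CGROUP, "blkio.throttle.read_bps_device"), "w") as f:
        f.write(entry)

The point is that this configuration is static: nothing in it reacts when the switch later changes what the network side can actually deliver.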

Now let us say the network bandwidth on the corresponding network queues was reduced by the DCB-capable switch.
We still need an SLA of 200Mb of I/O bandwidth, but the underlying dynamics have changed.

In such a scenario the option is to move I/O to a different bandwidth priority queue in the network adapter. This could mean moving I/O to a new network queue on eth0, or to another queue on eth1.

This requires mapping the block queue to the new network queue.
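
To make the idea concrete, here is a purely hypothetical user-space sketch of such a map and a naive rebalance policy. None of these structures exist in the kernel today; the device names, queue indices and bandwidth figures are invented for illustration.

# Hypothetical 1:n block-queue -> adapter-queue map. The idea: when the
# switch squeezes the network queue currently backing a block device,
# remap that device to another adapter queue that still has headroom,
# instead of letting the cgroup SLA silently degrade.

# map: block device -> (netdev, tx queue index) currently carrying its I/O
blk_to_net = {
    "/dev/sda": ("eth0", 0),
    "/dev/sdb": ("eth1", 2),
}

# advertised bandwidth (Mb/s) per (netdev, queue), e.g. learned from DCB
# ETS updates pushed by the switch
queue_bw = {
    ("eth0", 0): 1000,
    ("eth0", 1): 1000,
    ("eth1", 2): 1000,
    ("eth1", 3): 1000,
}

def rebalance(dev, required_mbps):
    """Move dev to another adapter queue if its current one can no longer
    meet the configured SLA (naive: pick the queue with most bandwidth)."""
    cur = blk_to_net[dev]
    if queue_bw[cur] >= required_mbps:
        return cur                      # current queue still fits the SLA
    best = max(queue_bw, key=queue_bw.get)
    blk_to_net[dev] = best
    return best

# Example: switch cuts eth1 queue 2 down to 100 Mb/s; /dev/sdb needs 200.
queue_bw[("eth1", 2)] = 100
print(rebalance("/dev/sdb", 200))       # moves /dev/sdb to an eth0 queue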

One way of solving this is what is going into the open-iscsi world, i.e. creating a session tagged with the relevant DCB priority; the session then gets mapped to the relevant tc queue, which ultimately maps to one of the network adapter's hardware queues.
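
From userspace that looks roughly like the sketch below. This is only an outline: I am assuming the iface.vlan_priority iface parameter and a particular mqprio layout here, and the iface name, target IP and priority value are made up, so check your open-iscsi and tc versions before relying on any of it.

# Rough sketch: create an open-iscsi iface tagged with a DCB/802.1p
# priority so the session's traffic lands on the tc/adapter queue that
# priority maps to.  The iscsiadm parameters (iface.vlan_priority) and
# the mqprio layout are assumptions, not verified against every version.
import subprocess

IFACE, NETDEV, TARGET_IP = "dcb_prio4", "eth0", "192.168.1.50"  # hypothetical

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.check_call(cmd)

# Map 802.1p priorities to hardware TX queues (8 traffic classes, 1 queue each).
run(["tc", "qdisc", "add", "dev", NETDEV, "root", "handle", "1:", "mqprio",
     "num_tc", "8", "map", "0", "1", "2", "3", "4", "5", "6", "7",
     "queues", "1@0", "1@1", "1@2", "1@3", "1@4", "1@5", "1@6", "1@7", "hw", "1"])

# Create an iface record carrying the priority, then log in through it.
run(["iscsiadm", "-m", "iface", "-I", IFACE, "-o", "new"])
run(["iscsiadm", "-m", "iface", "-I", IFACE, "-o", "update",
     "-n", "iface.vlan_priority", "-v", "4"])
run(["iscsiadm", "-m", "node", "-p", TARGET_IP, "-I", IFACE, "--login"])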

But when multipath fails over to a different session/path, the DCB bandwidth priority will not move with it..

Ok, one could argue that it is a user mistake to have configured the bandwidth priorities differently, but it may well happen that the bandwidth priority was dynamically changed by the switch for that particular queue.

Although I gave an example of a DCB environment, we could definitely look at doing a 1:n mapping of block queues to network adapter queues for non-DCB environments too.


-Shyam




