[dm-devel] [Lsf] Preliminary Agenda and Activities for LSF

Shyam_Iyer at Dell.com
Tue Mar 29 19:13:41 UTC 2011



> -----Original Message-----
> From: Vivek Goyal [mailto:vgoyal at redhat.com]
> Sent: Tuesday, March 29, 2011 2:45 PM
> To: Iyer, Shyam
> Cc: rwheeler at redhat.com; James.Bottomley at hansenpartnership.com;
> lsf at lists.linux-foundation.org; linux-fsdevel at vger.kernel.org; dm-
> devel at redhat.com; linux-scsi at vger.kernel.org
> Subject: Re: [Lsf] Preliminary Agenda and Activities for LSF
> 
> On Tue, Mar 29, 2011 at 11:10:18AM -0700, Shyam_Iyer at Dell.com wrote:
> >
> >
> > > -----Original Message-----
> > > From: Vivek Goyal [mailto:vgoyal at redhat.com]
> > > Sent: Tuesday, March 29, 2011 1:34 PM
> > > To: Iyer, Shyam
> > > Cc: rwheeler at redhat.com; James.Bottomley at hansenpartnership.com;
> > > lsf at lists.linux-foundation.org; linux-fsdevel at vger.kernel.org; dm-
> > > devel at redhat.com; linux-scsi at vger.kernel.org
> > > Subject: Re: [Lsf] Preliminary Agenda and Activities for LSF
> > >
> > > On Tue, Mar 29, 2011 at 10:20:57AM -0700, Shyam_Iyer at dell.com
> wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: linux-scsi-owner at vger.kernel.org [mailto:linux-scsi-
> > > > > owner at vger.kernel.org] On Behalf Of Ric Wheeler
> > > > > Sent: Tuesday, March 29, 2011 7:17 AM
> > > > > To: James Bottomley
> > > > > Cc: lsf at lists.linux-foundation.org; linux-fsdevel; linux-
> > > > > scsi at vger.kernel.org; device-mapper development
> > > > > Subject: Re: [Lsf] Preliminary Agenda and Activities for LSF
> > > > >
> > > > > On 03/29/2011 12:36 AM, James Bottomley wrote:
> > > > > > Hi All,
> > > > > >
> > > > > > Since LSF is less than a week away, the programme committee
> > > > > > put together a just in time preliminary agenda for LSF.  As
> > > > > > you can see there is still plenty of empty space, which you
> > > > > > can make suggestions (to this list with appropriate general
> > > > > > list cc's) for filling:
> > > > > >
> > > > > > https://spreadsheets.google.com/pub?hl=en&hl=en&key=0AiQMl7GcVa7OdFdNQzM5UDRXUnVEbHlYVmZUVHQ2amc&output=html
> > > > > >
> > > > > > If you don't make suggestions, the programme committee will
> > > > > > feel empowered to make arbitrary assignments based on your
> > > > > > topic and attendee email requests ...
> > > > > >
> > > > > > We're still not quite sure what rooms we will have at the
> > > > > > Kabuki, but we'll add them to the spreadsheet when we know
> > > > > > (they should be close to each other).
> > > > > >
> > > > > > The spreadsheet above also gives contact information for all
> > > > > > the attendees and the programme committee.
> > > > > >
> > > > > > Yours,
> > > > > >
> > > > > > James Bottomley
> > > > > > on behalf of LSF/MM Programme Committee
> > > > > >
> > > > >
> > > > > Here are a few topic ideas:
> > > > >
> > > > > (1) The first topic that might span IO & FS tracks (or just
> > > > > pull in device mapper people to an FS track) could be adding
> > > > > new commands that would allow users to grow/shrink/etc file
> > > > > systems in a generic way.  The thought I had was that we have
> > > > > a reasonable model that we could reuse for these new commands,
> > > > > like mount and mount.fs or fsck and fsck.fs.  With btrfs coming
> > > > > down the road, it could be nice to identify exactly what common
> > > > > operations users want to do and agree on how to implement them.
> > > > > Alasdair pointed out in the upstream thread that we had a
> > > > > prototype here in fsadm.
> > > > >
> > > > > (2) Very high speed, low latency SSD devices and testing.  Have
> > > > > we settled on the need for these devices to all have block
> > > > > level drivers?  For S-ATA or SAS devices, are there known
> > > > > performance issues that require enhancements somewhere in the
> > > > > stack?
> > > > >
> > > > > (3) The union mount versus overlayfs debate - pros and cons.
> > > > > What each does well, what needs doing.  Do we want/need both
> > > > > upstream?  (Maybe this can get 10 minutes in Al's VFS session?)
> > > > >
> > > > > Thanks!
> > > > >
> > > > > Ric
> > > >
> > > > A few others that I think may span across the I/O, block, and FS
> > > > layers.
> > > >
> > > > 1) Dm-thinp target vs file system thin profile vs block-map-based
> > > > thin/trim profile.
> > > >
> > > > Facilitate I/O throttling for thin/trimmable storage.  Online and
> > > > offline profiles.
> > >
> > > Is the above any different from the block IO throttling we have got
> > > for block devices?
> > >
> > Yes.. so the throttling would be capacity based.. when the storage
> > array wants us to throttle the I/O.  Depending on the event we may
> > keep getting space allocation write protect check conditions for
> > writes until a user intervenes to stop I/O.
> >
> 
> Sounds like some user space daemon listening for these events and then
> modifying cgroup throttling limits dynamically?

But we have dm targets on the horizon, like dm-thinp, that set soft limits on capacity.. we could extend the concept to H/W-imposed soft/hard limits.

User space could throttle the I/O, but it would have to go about finding all the processes running I/O on the LUN.. In some cases it could be an I/O process running within a VM..

That would require a passthrough interface to inform it.. I doubt we would be able to accomplish that any time soon with the multiple operating systems involved. Or we could require each application to register with the userland process. Doable, but cumbersome and buggy..

The dm-thinp target can help in this scenario by setting a blanket storage limit. We could then extend the limit dynamically based on hints/commands from a userland daemon listening for such events.

This approach will probably not take care of scenarios where the VM storage is over, say, NFS or a clustered filesystem..
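
Just to make that concrete, here is a rough, untested sketch (in Python) of what such a userland daemon might look like.  The wait_for_capacity_event() function is a stand-in for the passthrough/event interface discussed above, which does not exist today; the "vm_guests" cgroup and the "8:16" device numbers are made-up examples, and the throttle file is the existing blk-throttle interface:

#!/usr/bin/env python
# Rough sketch only.  wait_for_capacity_event() stands in for the missing
# passthrough/event interface; "vm_guests" and "8:16" are made-up examples.
import os

BLKIO_ROOT = "/sys/fs/cgroup/blkio"   # wherever the blkio controller is mounted

def set_write_limit(cgroup, dev, bps):
    # blk-throttle format: "<major>:<minor> <bytes_per_sec>"; 0 removes the limit
    path = os.path.join(BLKIO_ROOT, cgroup, "blkio.throttle.write_bps_device")
    with open(path, "w") as f:
        f.write("%s %d\n" % (dev, bps))

def wait_for_capacity_event():
    # Placeholder: block until the array reports a thin-provisioning
    # soft-threshold / space-allocation condition for some LUN, then
    # return (major:minor, nearly_full) for the affected device.
    raise NotImplementedError

def main():
    while True:
        dev, nearly_full = wait_for_capacity_event()        # e.g. ("8:16", True)
        if nearly_full:
            set_write_limit("vm_guests", dev, 1024 * 1024)   # clamp writes to 1 MB/s
        else:
            set_write_limit("vm_guests", dev, 0)             # back to unlimited

if __name__ == "__main__":
    main()

The same loop could just as well send a hint to a dm-thinp-style target to raise or lower its capacity limit instead of (or in addition to) poking the cgroup files.
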
> 
> >
> > > > 2) Interfaces for SCSI, Ethernet/*transport configuration
> > > > parameters floating around in sysfs, procfs.  Architecting
> > > > guidelines for accepting patches for hybrid devices.
> > > > 3) DM snapshot vs FS snapshots vs H/W snapshots.  There is room
> > > > for all and they have to help each other.
> >
> > For instance, if you took a DM snapshot and the storage sent a check
> > condition to the original dm device, I am not sure if the DM snapshot
> > would get one too..
> >
> > If you had a scenario of taking a H/W snapshot of an entire pool and
> > decided to delete the individual DM snapshots, the H/W snapshot would
> > be inconsistent.
> >
> > The blocks being managed by a DM device would have moved (SCSI
> > referrals).  I believe Hannes is working on the referrals piece..
> >
> > > > 4) B/W control - VM->DM->Block->Ethernet->Switch->Storage.  Pick
> > > > your subsystem and there are many non-cooperating B/W control
> > > > constructs in each subsystem.
> > >
> > > Above is pretty generic. Do you have specific needs/ideas/concerns?
> > >
> > > Thanks
> > > Vivek
> > Yes.. if I limited my Ethernet b/w to 40% I don't need to limit I/O
> > b/w via cgroups.  Such bandwidth manipulations are network-switch
> > driven, and cgroups never take care of these events from the Ethernet
> > driver.
> 
> So if IO is going over the network and the actual bandwidth control is
> taking place by throttling ethernet traffic, then one does not have to
> specify a block cgroup throttling policy, and hence there is no need
> for cgroups to be worried about ethernet driver events?
> 
> I think I am missing something here.
> 
> Vivek
Well.. here is the catch.. example scenario..

- Two iSCSI I/O sessions emanating from Ethernet ports eth0 and eth1, multipathed together.  Let us say a round-robin policy.

- The cgroup profile is to limit I/O bandwidth to 40% of the multipathed I/O bandwidth.  But the switch may have limited the I/O bandwidth to 40% for the VLAN associated with one of the eth interfaces, say eth1.

The assumption that the configured bandwidth is 40% of the available bandwidth is therefore false in this case.  What we need to do is possibly push more I/O through eth0, as the switch allows it to run at 100% of its bandwidth.

Now, this is a dynamic decision and the multipathing layer should take care of it.. but it would need a hint..
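
To put some made-up numbers on it (10 Gb/s ports, the cgroup limit sized as 40% of the nominal 20 Gb/s aggregate, the switch capping the eth1 VLAN at 40% of line rate), a quick back-of-the-envelope calculation shows why the hint matters:

# Back-of-the-envelope numbers for the scenario above; all values hypothetical.
link = 10.0                          # Gb/s per port (eth0, eth1)
nominal_total = 2 * link             # what the cgroup policy was sized against
cgroup_limit = 0.4 * nominal_total   # "40% of the multipathed b/w" = 8 Gb/s

eth0_avail = 1.0 * link              # switch lets eth0 run at full rate
eth1_avail = 0.4 * link              # switch caps the eth1 VLAN at 40%

# Round-robin sends half the requests down each path, so the capped path
# becomes the bottleneck for the aggregate.
rr_ceiling = 2 * min(eth0_avail, eth1_avail)     # ~8 Gb/s

# A selector that weights paths by what each can actually carry.
weighted_ceiling = eth0_avail + eth1_avail       # ~14 Gb/s

print("cgroup limit            : %.1f Gb/s" % cgroup_limit)
print("actually available b/w  : %.1f Gb/s" % weighted_ceiling)
print("40%% of actual available : %.1f Gb/s" % (0.4 * weighted_ceiling))
print("round-robin ceiling     : %.1f Gb/s" % rr_ceiling)

So the 8 Gb/s the cgroup enforces is really closer to 57% of what the fabric can deliver, and round-robin cannot use the headroom on eth0 anyway unless the path selector is told about the asymmetry.
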

Policies are usually decided at different levels; SLAs and sometimes logistics determine these decisions.  Sometimes the bandwidth lowering by the switch is traffic-dependent, but the user-level policies remain intact.  A typical case of the network administrator not talking to the system administrator.

-Shyam





