[dm-devel] Shell Scripts or Arbitrary Priority Callouts?

Sun Mar 29 18:09:51 UTC 2009

On Fri, Mar 27, 2009 at 02:28:45PM -0400, John A. Sullivan III wrote:
> On Fri, 2009-03-27 at 03:03 -0400, John A. Sullivan III wrote:
> > On Wed, 2009-03-25 at 12:21 -0400, John A. Sullivan III wrote:
> > > On Wed, 2009-03-25 at 17:52 +0200, Pasi Kärkkäinen wrote:
> > > > On Tue, Mar 24, 2009 at 11:41:00PM -0400, John A. Sullivan III wrote:
> > > > > > > Latency seems to be our key.  If I can add only 20 microseconds of
> > > > > > > latency from initiator and target each, that would be roughly 200
> > > > > > > microseconds.  That would almost triple the throughput from what we are
> > > > > > > currently seeing.
> > > > > > > 
> > > > > > 
> > > > > > Indeed :) 
> > > > > > 
> > > > > > > Unfortunately, I'm a bit ignorant of tweaking networks on opensolaris.
> > > > > > > I can certainly learn but am I headed in the right direction or is this
> > > > > > > direction of investigation misguided? Thanks - John
> > > > > > > 
> > > > > > 
> > > > > > Low latency is the key for good (iSCSI) SAN performance, as it directly
> > > > > > gives you more (possible) IOPS. 
> > > > > > 
> > > > > > The other option is to configure the software/settings so that there are
> > > > > > multiple outstanding I/Os in flight; then you're not limited by the latency (so much).
> > > > > > 
> > > > > > -- Pasi
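
The latency argument above can be put into numbers. The following is only a rough sketch, assuming every I/O costs one full round trip and that the target can service outstanding I/Os in parallel. The function name and the latency values are illustrative and are not measurements from this thread.

```python
def max_iops(round_trip_latency_s, outstanding_ios=1):
    """Upper bound on IOPS when each I/O takes one full round trip.

    This is Little's law rearranged: throughput = concurrency / latency.
    It ignores bandwidth limits and any serialization at the target.
    """
    if round_trip_latency_s <= 0:
        raise ValueError("latency must be positive")
    if outstanding_ios < 1:
        raise ValueError("need at least one outstanding I/O")
    return outstanding_ios / round_trip_latency_s


if __name__ == "__main__":
    # Hypothetical per-I/O latencies in microseconds (assumed values).
    for latency_us in (600, 200):
        for depth in (1, 4):
            iops = max_iops(latency_us * 1e-6, depth)
            print(f"{latency_us} us, {depth} outstanding: ~{iops:.0f} IOPS")
```

With a single outstanding I/O, the bound is simply the inverse of the latency, which is why cutting latency raises single-threaded throughput directly. With several I/Os in flight, the bound scales with the queue depth instead.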
> > > > > <snip>
> > > > > Ross has been of enormous help offline.  Indeed, disabling jumbo packets
> > > > > produced an almost 50% increase in single threaded throughput.  We are
> > > > > pretty well set although still a bit disappointed in the latency we are
> > > > > seeing in opensolaris and have escalated to the vendor about addressing
> > > > > it.
> > > > > 
> > > > 
> > > > Ok. That's a pretty big increase. Did you figure out why that happens? 
> > > Greater latency with jumbo packets.
> > > > 
> > > > > The one piece which is still a mystery is why using four targets on
> > > > > four separate interfaces striped with mdadm RAID0 does not produce an
> > > > > aggregate of slightly less than four times the IOPS of a single target
> > > > > on a single interface. This would not seem to be the out of order SCSI
> > > > > command problem of multipath.  One of life's great mysteries yet to be
> > > > > revealed.  Thanks again, all - John
> > > > 
> > > > Hmm.. maybe the out-of-order problem happens at the target? It gets IO
> > > > requests to nearby offsets from 4 different sessions and there's some kind
> > > > of locking or so going on? 
> > > Ross pointed out a flaw in my test methodology.  By running one I/O at a
> > > time, it was literally doing that - not one full RAID0 I/O but one disk
> > > I/O apparently.  He said to truly test it, I would need to run as many
> > > concurrent I/Os as there were disks in the array.  Thanks - John
> > > ><snip>
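
Ross's point about test methodology can be illustrated with the usual RAID0 address mapping. This is a minimal sketch, assuming equal-sized member disks and a 64 KiB chunk size. The chunk size and the helper names are assumptions for illustration, not values taken from the setup discussed here.

```python
def stripe_member(offset_bytes, chunk_bytes, n_disks):
    """Index of the RAID0 member disk that holds a given array offset.

    Assumes a simple striped layout over equal-sized members:
    consecutive chunks rotate across the disks in order.
    """
    return (offset_bytes // chunk_bytes) % n_disks


def busy_disks(in_flight_offsets, chunk_bytes, n_disks):
    """Set of member disks touched by the I/Os that are in flight together."""
    return {stripe_member(o, chunk_bytes, n_disks) for o in in_flight_offsets}


if __name__ == "__main__":
    CHUNK = 64 * 1024   # assumed chunk size
    DISKS = 4

    # One 4K I/O outstanding at a time: only one member is ever busy.
    print(busy_disks([0], CHUNK, DISKS))

    # Four concurrent I/Os, one per chunk: all four members are busy.
    print(busy_disks([i * CHUNK for i in range(DISKS)], CHUNK, DISKS))
```

Under these assumptions, a test that issues a single small I/O at a time exercises one member disk per request, so the array cannot show more IOPS than a single target. Only concurrent I/Os that land in different chunks spread across the members.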
> > Argh!!! This turned out to be alarmingly untrue.  This time, we were
> > doing some light testing on a different server with two bonded
> > interfaces in a single bridge (KVM environment) going to the same SAN we
> > used in our four port test.
> > 
> > For kicks and to prove to ourselves that RAID0 scaled with multiple I/O
> > as opposed to limiting the test to only single I/O, we tried some actual
> > file transfers to the SAN mounted in sync mode.  We found concurrently
> > transferring two identical files to the RAID0 array composed of two
> > iSCSI attached drives was 57% slower than concurrently transferring the
> > files to the drives separately. In other words, copying file1 and file2
> > concurrently to RAID0 took 57% longer than concurrently copying file1 to
> > disk1 and file2 to disk2.
> > 
> > We then took a little different approach and used disktest.  We ran two
> > concurrent sessions with -K1.  In one case, we ran both sessions to the
> > 2 disk RAID0 array.  The performance was again significantly lower than
> > running the two concurrent tests against two separate iSCSI disks.  Just
> > to be clear, these were the same disks that composed the array, just not
> > grouped in the array.
> > 
> > Even more alarmingly, we did the same test using multipath multibus,
> > i.e., two concurrent disktest with -K1 (both reads and writes, all
> > sequential with 4K block sizes).  The first session completely starved
> > the second.  The first one continued at only slightly reduced speed
> > while the second one (kicked off just as fast as we could hit the enter
> > key) received only roughly 50 IOPS.  Yes, that's fifty.
> > 
> > Frightening but I thought I had better pass along such extreme results
> > to the multipath team.  Thanks - John
> HOLD THE PRESSES - This turned out to be a DIFFERENT problem.  Argh!
> That's what I get for being a management type out of my depth doing
> engineering until we hire our engineering staff!
> 
> As mentioned, these tests were run on a different, lighter duty system.
> When we ran the same tests on the larger, four dedicated SAN port
> server, RAID0 scaled nicely showing little degradation between one
> thread and four concurrent threads, i.e., our test file transfers took
> almost the same when a single user did them as opposed to when four
> users did them concurrently.
> 
> The problem with our other system was that the RAID (and probably
> multi-path) was backfiring: the iSCSI connection was buckling under any
> appreciable load because the Ethernet interfaces use bridging.
> 
> These are much lighter duty systems and we bought them from the same
> vendor as the SAN but with only the two onboard Ethernet ports.  Being
> ignorant, we looked to them for design guidance (and they were excellent
> in all other regards) and were not cautioned about sharing these
> interfaces.  Because these are light duty, we intentionally broke the
> cardinal rule of using a dedicated SAN network for them.  That's not
> so much the problem. However, because they are running KVM, the
> interfaces are bridged (actually bonded and bridged using tlb as alb
> breaks with bridging in its current implementation - but bonding is not
> the issue).  Under any appreciable load, the iSCSI connections time out.
> We've tried varying the noop time out values but with no success.  We do
> not have the time to test rigorously but assume this is why throughput
> did not scale at all.  disktest with -K10 achieved the same throughput
> as disktest with -K1.  Oh well, the price of tuition.

Uhm, so there was virtualization in the mix.. I didn't realize that earlier..

Did you benchmark from the host or from the guest? 

So yeah.. the RAID-setup is working now, if I understood you correctly.. 
but the multipath setup is still problematic? 

-- Pasi