[dm-devel] Shell Scripts or Arbitrary Priority Callouts?

Pasi Kärkkäinen pasik at iki.fi
Tue Mar 24 11:57:08 UTC 2009


On Tue, Mar 24, 2009 at 07:02:41AM -0400, John A. Sullivan III wrote:
> > > <snip>
> > > I'm trying to spend a little time on this today and am really feeling my
> > > ignorance on the way iSCSI works :(  It looks like linux-iscsi supports
> > > MC/S but has not been in active development and will not even compile on
> > > my 2.6.27 kernel.
> > > 
> > > To simplify matters, I did put each SAN interface on a separate network.
> > > Thus, all the different sessions.  If I place them all on the same
> > > network and use the iface parameters of open-iscsi, does that eliminate
> > > the out-of-order problem and allow me to achieve the performance
> > > scalability I'm seeking from dm-multipath in multibus mode? Thanks -
> > > John
> > 
> > If you use the ifaces feature of open-iscsi, you still get separate sessions.
> > 
> > open-iscsi just does not support MC/s :(
> > 
> > I think core-iscsi does support MC/s.. 
> > 
> > Then again, you should play with the different multipath settings and
> > tweak how often IOs are split to different paths etc.. maybe that helps.
> > 
> > -- Pasi
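
(For reference, the iface binding being discussed looks roughly like this;
the interface names and the portal address are only examples, adjust them
to your setup:

  # one iface record per NIC, bound by network interface name
  iscsiadm -m iface -I iface0 --op=new
  iscsiadm -m iface -I iface0 --op=update -n iface.net_ifacename -v eth2
  iscsiadm -m iface -I iface1 --op=new
  iscsiadm -m iface -I iface1 --op=update -n iface.net_ifacename -v eth3

  # discovery and login through both ifaces, one session per iface
  iscsiadm -m discovery -t st -p 192.168.10.1 -I iface0 -I iface1
  iscsiadm -m node -L all

Each of those logins is still a separate session, so dm-multipath just sees
more paths; that's the limitation compared to real MC/s.)
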
> <snip>
> I think we're pretty much at the end of our options here, but I'll document
> what I've found thus far for closure.
> 
> Indeed, there seems to be no way around the session problem.  Core-iscsi
> does seem to support MC/s but has not been updated in years.  It did not
> compile with my 2.6.27 kernel and, given that others seem to have had
> the same problem, I did not spend a lot of time troubleshooting it.
> 

The core-iscsi developer seems to be actively developing at least the
new iSCSI target (LIO target).. I think he has been testing it with
core-iscsi, so maybe there's a newer version somewhere?

> We did play with the multipath rr_min_io settings, and smaller always
> seemed to be better until we got into very large numbers of sessions.  We
> were testing on a dual quad-core AMD Shanghai 2378 system with 32 GB
> RAM, a quad-port Intel e1000 card and two on-board nvidia forcedeth
> ports, running disktest with 4K blocks to mimic the file system, using
> sequential reads (and some sequential writes).
> 

Nice hardware. Btw, are you using jumbo frames or flow control for the
iSCSI traffic?
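
If not, it might be worth checking; something along these lines, with the
device name just as an example (the switch and the target side have to
allow jumbo frames too):

  # jumbo frames on the iSCSI NIC
  ip link set dev eth2 mtu 9000

  # check and, if needed, enable ethernet flow control
  ethtool -a eth2
  ethtool -A eth2 rx on tx on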

> With a single thread, there was no difference at all - only about 12.79
> MB/s no matter what we did.  With 10 threads and only two interfaces,
> there was only a slight difference between rr=1 (81.2 MB/s), rr=10
> (78.87) and rr=100 (80).
> 
> However, when we opened it up to three and four interfaces, there was a
> huge jump for rr=1 (100.4, 105.95) versus rr=10 (80.5, 80.75) and rr=100
> (74.3, 77.6).
> 
> At 100 threads on three or four ports, the best performance shifted to
> rr=10 (327 MB/s, 335) rather than rr=1 (291.7, 290.1) or rr=100 (216.3).
> At 400 threads, rr=100 started to overtake rr=10 slightly.
> 
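Just to make sure we're talking about the same knob: rr=N here is the
rr_min_io setting in multipath.conf, i.e. how many IOs go down one path
before round-robin moves on to the next one.  A minimal multibus setup
would look something like this (blacklist/device sections left out):

  defaults {
          path_grouping_policy    multibus
          rr_min_io               10
  }
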
> This was using all e1000 interfaces.  Our first four-port test included
> one of the on-board ports, and performance was dramatically worse than
> with three e1000 ports.  Subsequent testing tweaking forcedeth parameters
> from the defaults yielded no improvement.
> 
> After solving the I/O scheduler problem, dm RAID0 behaved better.  It
> still did not give us anywhere near a fourfold increase (four disks on
> four separate ports) but only a marginal improvement (14.3 MB/s) using
> c=8 (chosen to fit into a jumbo packet, match the zvol block size on
> the back end, and be two block sizes).  It did, however, give the best
> balance of performance, being just slightly slower than rr=1 at 10
> threads and slightly slower than rr=10 at 100 threads, though not
> scaling as well to 400 threads.
> 

When you used dm RAID0, you didn't have any multipath configuration, right?

What kind of stripe size and other settings did you have for RAID0?

What kind of performance do you get using just a single iscsi session (and
thus just a single path), no multipathing, no DM RAID0? Just a filesystem
directly on top of the iscsi /dev/sd? device.
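
Something like this (device names are just examples) would show the actual
stripe parameters and give a raw single-path number to compare against:

  # chunk size etc. of the striped dm device, reported in sectors
  dmsetup table

  # sequential read from a single iscsi disk, bypassing the page cache
  dd if=/dev/sdc of=/dev/null bs=4k count=500000 iflag=direct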

> Thus, collective throughput is acceptable but individual throughput is
> still awful.
> 

Sounds like there's some other problem if individual throughput is bad? Or
did you mean that performance with a single disktest IO thread is bad, but
with multiple disktest threads it's good.. that would make more sense :)
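
One quick way to confirm that would be to run the same sequential read with
one thread and then with many; with fio as a stand-in for disktest, and
/dev/sdc again just an example device, roughly:

  # single stream
  fio --name=one --filename=/dev/sdc --rw=read --bs=4k --direct=1 \
      --size=1G

  # ten streams, aggregate throughput
  fio --name=ten --filename=/dev/sdc --rw=read --bs=4k --direct=1 \
      --size=1G --numjobs=10 --group_reporting

If the ten-stream number is fine and only the single stream is slow, that
points at per-session limits rather than a multipath problem.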

-- Pasi



