[dm-devel] Shell Scripts or Arbitrary Priority Callouts?

Tue Mar 24 15:43:20 UTC 2009

I greatly appreciate the help.  I'll answer in the thread below and also
consolidate my answers to the questions posed in your other email.

On Tue, 2009-03-24 at 17:01 +0200, Pasi Kärkkäinen wrote:
> On Tue, Mar 24, 2009 at 08:21:45AM -0400, John A. Sullivan III wrote:
> > > 
> > > Core-iscsi developer seems to be active developing at least the 
> > > new iSCSI target (LIO target).. I think he has been testing it with
> > > core-iscsi, so maybe there's newer version somewhere? 
> > > 
> > > > We did play with the multipath rr_min_io settings and smaller always
> > > > seemed to be better until we got into very large numbers of sessions.  We
> > > > were testing on a dual quad core AMD Shanghai 2378 system with 32 GB
> > > > RAM, a quad port Intel e1000 card and two on-board nvidia forcedeth
> > > > ports with disktest using 4K blocks to mimic the file system using
> > > > sequential reads (and some sequential writes).
> > > > 
> > > 
> > > Nice hardware. Btw are you using jumbo frames or flow control for iSCSI
> > > traffic? 
> > > 
> 
> Dunno if you noticed this.. :) 
We are actually quite enthusiastic about the environment and the
project.  We hope to have many of these hosting about 400 VServer guests
running virtual desktops from the X2Go project.  It's not my project but
I don't mind plugging them as I think it is a great technology.

We are using jumbo frames.  The ProCurve 2810 switches explicitly state
to NOT use flow control and jumbo frames simultaneously.  We tried it
anyway but with poor results.
> 
> 
> > > > 
> > > 
> > > When you used dm RAID0 you didn't have any multipath configuration, right? 
> > Correct although we also did test successfully with multipath in
> > failover mode and RAID0.
> > > 
> 
> OK.
> 
> > > What kind of stripe size and other settings you had for RAID0?
> > Chunk size was 8KB with four disks.  
> > > 
> 
> Did you try with much bigger sizes.. 128 kB ?
We tried slightly larger sizes (16KB and 32KB, I believe) and observed
performance degradation.  In fact, in some scenarios 4KB chunk sizes
gave us better performance than 8KB.
> 
> > > What kind of performance do you get using just a single iscsi session (and
> > > thus just a single path), no multipathing, no DM RAID0 ? Just a filesystem
> > > directly on top of the iscsi /dev/sd? device.
> > Miserable - same roughly 12 MB/s.
> 
> OK, Here's your problem. Was this btw reads or writes? Did you tune
> readahead-settings? 
The 12MBps figure is for sequential reads, but sequential writes are not
much different.  We did tweak readahead to 1024.  We did not want to go
much larger in order to maintain balance with the various data patterns -
some of which are random and some of which may not read linearly.
> 
> Can you paste your iSCSI session settings negotiated with the target? 
Pardon my ignorance :( but, other than packet traces, how do I show the
final negotiated settings?
> 
> > > 
> > > Sounds like there's some other problem if individual throughput is bad? Or did
> > > you mean performance with a single disktest IO thread is bad, but using multiple
> > > disktest threads it's good.. that would make more sense :) 
> > Yes, the latter.  Single thread (I assume mimicking a single disk
> > operation, e.g., copying a large file) is miserable - much slower than
> > local disk despite the availability of huge bandwidth.  We start
> > utilizing the bandwidth when multiplying concurrent disk activity into
> > the hundreds.
> > 
> > I am guessing the single thread performance problem is an open-iscsi
> > issue but I was hoping multipath would help us work around it by
> > utilizing multiple sessions per disk operation.  I suppose that is where
> > we run into the command ordering problem unless there is something else
> > afoot.  Thanks - John
> 
> You should be able to get many times the throughput you get now.. just with
> a single path/session.
> 
> What kind of latency do you have from the initiator to the target/storage? 
> 
> Try with for example 4 kB ping:
> ping -s 4096 <ip_of_the_iscsi_target>
We have about 400 microseconds - that seems a bit high :(
rtt min/avg/max/mdev = 0.275/0.337/0.398/0.047 ms

> 
> 1000ms divided by the roundtrip you get from ping should give you maximum
> possible IOPS using a single path.. 
> 
1000 ms / 0.4 ms = 2500 IOPS
> 4 kB * IOPS == max bandwidth you can achieve.
2500 IOPS * 4KB = 10,000 KB/s, or about 10 MBps
Hmm . . . seems like what we are getting.  Is that an abnormally high
latency? We have tried playing with interrupt coalescing on the
initiator side but without significant effect.  Thanks for putting
together the formula for me.  Not only does it help me understand but it
means I can work on addressing the latency issue without setting up and
running disk tests.
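
For anyone who wants to play with the numbers, the formula can be
sketched in a few lines of Python.  This is only an illustration under
stated assumptions: one outstanding request per round trip (queue depth
of one), no service time on the target beyond what ping measures, and
function names that are made up for this example.

```python
# Rough upper bound on single-path, single-thread iSCSI throughput,
# assuming exactly one request is in flight per network round trip.
# Function names are hypothetical, chosen only for this sketch.

def max_iops(rtt_ms: float) -> float:
    """Maximum requests per second if each request costs one round trip."""
    if rtt_ms <= 0:
        raise ValueError("round-trip time must be positive")
    return 1000.0 / rtt_ms


def max_throughput_mb_per_s(rtt_ms: float, block_bytes: int = 4096) -> float:
    """Upper bound on throughput in MB/s (decimal megabytes)."""
    return max_iops(rtt_ms) * block_bytes / 1_000_000


if __name__ == "__main__":
    # Round-trip times from the ping run: worst case and average.
    for rtt in (0.4, 0.337):
        print(f"rtt={rtt} ms  iops={max_iops(rtt):.0f}  "
              f"throughput={max_throughput_mb_per_s(rtt):.2f} MB/s")
```

With the 0.4 ms round trip and 4KB blocks this comes out at 2500 IOPS
and roughly 10 MB/s, which is in the same ballpark as the 12 MB/s we
measured.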

I would love to use larger block sizes as you suggest in your other
email but, on AMD64, I believe we are stuck with 4KB.  I've not seen any
way to change it and would gladly do so if someone knows how.

CFQ was indeed a problem.  It would not scale with increasing the number
of threads.  noop, deadline, and anticipatory all fared much better.  We
are currently using noop for the iSCSI targets.  Thanks again - John
> 
> -- Pasi
-- 
John A. Sullivan III
Open Source Development Corporation
+1 207-985-7880
jsullivan at opensourcedevel.com

http://www.spiritualoutreach.com
Making Christianity intelligible to secular society