[dm-devel] multipath: Path checks on open-iscsi software initiators

Mike Snitzer snitzer at gmail.com
Tue Feb 9 04:45:35 UTC 2010


On Mon, Feb 8, 2010 at 8:18 PM, Daniel Stodden
<daniel.stodden at citrix.com> wrote:
>
> Hi.
>
> I've recently been spending some time tracing path checks on iSCSI
> targets.
>
> Samples described here were taken with the directio checker on a netapp
> lun, but I believe the target kind doesn't matter here, since most of
> what I find is rather driven by the initiator side.
>
> So what I see is:
>
> 1. The directio checker issues its aio read on sector0.
>
> 2. The request will block until iscsi gives up on it.
>  This typically does not happen before the target pings (noop-out ops)
>  issued internally by the initiator time out. It looks like:
>
>  iscsid: Nop-out timedout after 15 seconds on connection 1:0
>  state (3). Dropping session.
>
>  (period and timeouts depend on the configuration at hand).
>
> 3. Session failure still won't unblock the read. This is because the
>  iscsi session will enter recovery mode, to avoid failing the
>  data path right away. The device will enter blocked state during
>  that period.
>
>  Since I'm provoking a complete failure, this will time out as well,
>  but only later:
>
>  iscsi: session recovery timed out after 15 secs
>
>  (again, timeouts are iscsid.conf-dependent)
>
> 4. This will finally unblock the directio check with EIO,
>   triggering the path failure.
>
>
> My main issue is that a device sitting on a software iscsi initiator
>
>  a) performs its own path failure detection and
>  b) defers data path operations to mask failures,
>    which obviously counteracts a checker based on
>    data path operations.
>
> Kernels somewhere in the 2.6.2x series apparently started to move
> part of the session checks into the kernel (apparently including the
> noop-out itself, though I'm not sure). One side effect of that is that
> session state can be queried via sysfs.
>
> So right now I'm mainly wondering whether multipath failure detection
> driven by polling session state, rather than by a data read, wouldn't
> be more effective?
>
> I've only been browsing part of the iscsi code by now, but I don't see
> how data path failures wouldn't relate to session state.
>
> There's some code attached below to demonstrate that. It presently jumps
> through some extra hoops to reverse-map the fd back to the block device
> node, but the basic thing was relatively straightforward to implement.
>
> Thanks in advance for any input on that matter.
>
> Cheers,
> Daniel
>

You might look at the multipath-tools patch included in a fairly
recent dm-devel mail titled "[PATCH] Update path_offline() to return
device status".
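
As a rough illustration of that kind of check (a sketch only, not the
code from the patch): the SCSI layer exposes a per-device "state"
attribute in sysfs, e.g. /sys/block/sdX/device/state, and a checker can
classify it before issuing any I/O. The enum and function names below
are hypothetical, and treating "blocked" as a pending status is an
assumption about how a device in iSCSI recovery might be handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes for this sketch; not the multipath-tools enums. */
enum sketch_path_state {
    SKETCH_PATH_UP,
    SKETCH_PATH_DOWN,
    SKETCH_PATH_PENDING,
    SKETCH_PATH_UNKNOWN
};

/* Classify the text read from a SCSI device "state" attribute. */
enum sketch_path_state classify_device_state(const char *state)
{
    assert(state != NULL);
    if (strncmp(state, "running", 7) == 0)
        return SKETCH_PATH_UP;
    if (strncmp(state, "blocked", 7) == 0)
        return SKETCH_PATH_PENDING;   /* assumed: recovery still in progress */
    if (strncmp(state, "offline", 7) == 0)
        return SKETCH_PATH_DOWN;
    return SKETCH_PATH_UNKNOWN;       /* any state this sketch does not handle */
}

/* Read the attribute file (e.g. /sys/block/sdX/device/state) and classify it. */
enum sketch_path_state device_state_from_file(const char *attr_path)
{
    char buf[32] = "";
    FILE *f = fopen(attr_path, "r");

    if (f == NULL)
        return SKETCH_PATH_UNKNOWN;
    if (fgets(buf, sizeof(buf), f) == NULL)
        buf[0] = '\0';
    fclose(f);
    return classify_device_state(buf);
}
```

A checker could call device_state_from_file() first and only fall back
to the directio read when the result is SKETCH_PATH_UP, so that a
blocked device never ties up a read for the whole recovery timeout.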

The committed patch is available here:
http://git.kernel.org/gitweb.cgi?p=linux/storage/multipath-tools/.git;a=commit;h=88c75172cf56e
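
For the session-state polling idea in the quoted mail, a minimal sketch
could look like the following. It assumes the iSCSI transport class
exposes the session state as a string in
/sys/class/iscsi_session/session<N>/state and that a healthy session
reads "LOGGED_IN". The function names and the sysfs_root parameter are
made up for illustration, and mapping a block device back to its
session id is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if the state string says the session is logged in, else 0. */
int session_state_is_up(const char *state)
{
    size_t len;

    assert(state != NULL);
    len = strcspn(state, "\r\n");     /* ignore the trailing newline from sysfs */
    return len == strlen("LOGGED_IN") &&
           strncmp(state, "LOGGED_IN", len) == 0;
}

/*
 * Poll one session. sysfs_root is normally "/sys"; it is a parameter
 * here only so the sketch can be exercised against a fake tree.
 * Returns 1 (up), 0 (not logged in) or -1 (state could not be read).
 */
int iscsi_session_check(const char *sysfs_root, unsigned int sid)
{
    char path[256];
    char buf[32];
    FILE *f;
    int n;

    n = snprintf(path, sizeof(path),
                 "%s/class/iscsi_session/session%u/state", sysfs_root, sid);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return session_state_is_up(buf);
}
```

Reading a sysfs attribute returns immediately, so a check along these
lines would not wait for the noop-out and recovery timeouts the way the
blocked read does.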



