[dm-devel] multipath: Path checks on open-iscsi software initiators
Mike Snitzer
snitzer at gmail.com
Tue Feb 9 04:45:35 UTC 2010
On Mon, Feb 8, 2010 at 8:18 PM, Daniel Stodden
<daniel.stodden at citrix.com> wrote:
>
> Hi.
>
> I've recently been spending some time tracing path checks on iSCSI
> targets.
>
> Samples described here were taken with the directio checker on a netapp
> lun, but I believe the target kind doesn't matter here, since most of
> what I find is rather driven by the initiator side.
>
> So what I see is:
>
> 1. The directio checker issues its aio read on sector 0.
>
> 2. The request will obviously block until iscsi gives up on it.
> This typically does not happen before the target pings (noop-out ops)
> issued internally by the initiator time out. This looks like:
>
> iscsid: Nop-out timedout after 15 seconds on connection 1:0
> state (3). Dropping session.
>
> (period and timeouts depend on the configuration at hand).
>
> 3. Session failure still won't unblock the read. This is because the
> iscsi session will enter recovery mode, to avoid failing the
> data path right away. The device will enter blocked state during
> that period.
>
> Since I'm provoking a complete failure, this will time out as well,
> but only later:
>
> iscsi: session recovery timed out after 15 secs
>
> (again, timeouts are iscsid.conf-dependent)
>
> 4. This will finally unblock the directio check with EIO,
> triggering the path failure.
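
The four steps above imply a rough upper bound on how long the checker
read stays blocked. A minimal sketch of that arithmetic follows. The two
15 second values come from the log lines quoted above; the 10 second
noop-out interval and the function name are assumptions made only for
illustration, and multipathd's own polling interval is ignored.

```python
def worst_case_detection_delay(noop_interval, noop_timeout, recovery_timeout):
    """Rough upper bound, in seconds, before the blocked read fails with EIO."""
    # The link may die right after a successful ping, so up to one full
    # interval can pass before the next Nop-out is sent at all.
    time_to_session_drop = noop_interval + noop_timeout
    # After the session is dropped, the device stays blocked until
    # session recovery gives up.
    return time_to_session_drop + recovery_timeout


if __name__ == "__main__":
    # noop_interval=10 is an assumed value, not taken from the trace.
    delay = worst_case_detection_delay(
        noop_interval=10, noop_timeout=15, recovery_timeout=15
    )
    print("checker read may block for up to", delay, "seconds")
```

The point is only that the data-path check cannot report a failure
sooner than the initiator's own timeouts allow.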
>
>
> My main issue is that a device sitting on a software iscsi initiator
>
> a) performs its own path failure detection and
> b) defers data path operations to mask failures,
> which obviously counteracts a checker based on
> data path operations.
>
> Kernels somewhere during the 2.6.2x series apparently started to move
> part of the session checks into the kernel (apparently including the
> noop-out itself, but I don't know for sure). One side effect of that is
> that session state can be queried via sysfs.
>
> So right now I'm mainly wondering whether a multipath failure check
> driven by polling session state, rather than by a data read, wouldn't
> be more effective?
>
> I've only been browsing part of the iscsi code by now, but I don't see
> how data path failures wouldn't relate to session state.
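
A session-state poll of the kind described here might look roughly like
the sketch below. It assumes that the kernel exposes each session as
/sys/class/iscsi_session/sessionN with a "state" attribute, and that
"LOGGED_IN" is the value reported for a healthy session; both should be
checked against the running kernel. The function names are hypothetical
and this is not the code attached to the original mail.

```python
import os

SYSFS_ISCSI_SESSIONS = "/sys/class/iscsi_session"  # assumed sysfs location


def read_session_state(session, sysfs_root=SYSFS_ISCSI_SESSIONS):
    """Return the session's state string, or None if it cannot be read."""
    path = os.path.join(sysfs_root, session, "state")
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        # Session directory missing or attribute unreadable.
        return None


def session_is_up(session, sysfs_root=SYSFS_ISCSI_SESSIONS):
    """Treat only LOGGED_IN as up; any other or missing state counts as down."""
    return read_session_state(session, sysfs_root) == "LOGGED_IN"


if __name__ == "__main__":
    if os.path.isdir(SYSFS_ISCSI_SESSIONS):
        for name in sorted(os.listdir(SYSFS_ISCSI_SESSIONS)):
            print(name, read_session_state(name))
```

Unlike the directio read, this returns immediately, so the result
reflects what the initiator currently believes about the session rather
than waiting for the recovery timeout to expire.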
>
> There's some code attached below to demonstrate that. It presently jumps
> through some extra hoops to reverse-map the fd back to the block device
> node, but the basic thing was relatively straightforward to implement.
>
> Thanks in advance for any input on that matter.
>
> Cheers,
> Daniel
>
You might look at the multipath-tools patch included in a fairly
recent dm-devel mail titled "[PATCH] Update path_offline() to return
device status".
The committed patch is available here:
http://git.kernel.org/gitweb.cgi?p=linux/storage/multipath-tools/.git;a=commit;h=88c75172cf56e
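
For illustration only, checking device status ahead of any data read
could be sketched as below. This is not the code from the patch. It
assumes the SCSI device behind /sys/block/<dev>/device has a "state"
attribute reporting values such as "running", "blocked" and "offline";
the status labels, the function names and the choice to treat unknown
states as down are all hypothetical.

```python
import os

# Hypothetical status labels for this sketch.
PATH_UP, PATH_PENDING, PATH_DOWN = "up", "pending", "down"


def classify_device_state(state):
    """Map a SCSI device state string to a coarse path status."""
    if state == "running":
        return PATH_UP
    if state == "blocked":
        # Transport is in recovery; a read issued now would just hang.
        return PATH_PENDING
    # "offline", unreadable, or anything unexpected: assumed down here.
    return PATH_DOWN


def device_status(devname, sysfs_root="/sys/block"):
    """Read <sysfs_root>/<devname>/device/state and classify it."""
    path = os.path.join(sysfs_root, devname, "device", "state")
    try:
        with open(path) as f:
            state = f.read().strip()
    except OSError:
        state = None
    return classify_device_state(state)
```

A caller would run the regular checker only when the result is "up",
which avoids issuing a read against a device that is blocked during
session recovery.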