[dm-devel] ALUA state unavailable and device discovery

Brian Bunker brian at purestorage.com
Wed Mar 17 19:39:16 UTC 2021


Hello All,

There seems to be an incompatibility in the Linux SCSI code between SCSI disk
discovery and the ALUA unavailable state. According to the SPC specification, a
target port in the ALUA unavailable state must also set the peripheral qualifier
(PQ) to 001b in its INQUIRY data for that path:

While in the unavailable primary target port asymmetric access state, the device
server shall support those of the following commands that it supports while in the 
active/optimized state:
a) INQUIRY (the peripheral qualifier (see 6.6.2) shall be set to 001b)
...

The problem is that this limits when the host can discover disks or reboot.
For an sd device to be created, the PQ must be 000b. This seems to come from the
scsi_bus_match function in scsi_sysfs.c:

return (sdp->inq_periph_qual == SCSI_INQ_PQ_CON)? 1: 0;

So it returns 1 only if the PQ is 000b.
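For reference, here is a minimal sketch of where that PQ value comes from: it is
the top three bits of byte 0 of the standard INQUIRY data, with the peripheral
device type in the low five bits. The constant names and values follow the
kernel's include/scsi/scsi.h; the two helper functions are hypothetical names
used only for illustration, not kernel API.

```c
#include <stdint.h>

/* Peripheral qualifier values, named as in the kernel's include/scsi/scsi.h. */
#define SCSI_INQ_PQ_CON     0x00  /* 000b: device connected */
#define SCSI_INQ_PQ_NOT_CON 0x01  /* 001b: capable, but not connected */
#define SCSI_INQ_PQ_NOT_CAP 0x03  /* 011b: not capable of supporting a device */

/* Hypothetical helper: peripheral qualifier = bits 7..5 of INQUIRY byte 0. */
static inline uint8_t inq_periph_qual(const uint8_t *inq)
{
    return (inq[0] >> 5) & 0x07;
}

/* Hypothetical helper: peripheral device type = bits 4..0 of INQUIRY byte 0. */
static inline uint8_t inq_periph_type(const uint8_t *inq)
{
    return inq[0] & 0x1f;
}
```

With these, a disk (device type 00h) on a path in the unavailable state would
report byte 0 as 0x20, which decodes to SCSI_INQ_PQ_NOT_CON and therefore fails
the check above.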

As a result, if a SCSI device is discovered while the ALUA state for that path is
unavailable, an sd device will not be created. An sg device will be created, but
not an sd one. Consequently, multipath will not create a dm device or, if a dm
device already exists, will not add a path for the newly discovered device. This
means that when the path moves out of unavailable to an active ALUA state, or even
to standby, there is no sd device, and so no path whose state can be changed in
multipath's dm device.

For this reason the ALUA standby state looks attractive, since it doesn't have
the PQ requirement. But among the commands that must be supported in the standby
state, some are difficult to support while disconnected from the peer, most
notably persistent reservations, where not having access to the peer can make it
impossible to keep a consistent state when and if the path becomes available
again. The unavailable ALUA state has the right command list to support while
disconnected from the source of truth, but the PQ requirement is the
trade-off.

Is the PQ check here because of INQUIRY requests sent to non-existent LUNs leading
to sd devices being created?

In response to an INQUIRY command received by an incorrect logical unit, the SCSI
target device shall return the INQUIRY data with the peripheral qualifier set to the
value defined in 6.6.2.

As a test, I changed this line as follows, so that sd creates a device whenever
the peripheral qualifier is anything other than 011b, rather than requiring 000b:

return (sdp->inq_periph_qual != SCSI_INQ_PQ_NOT_CAP)? 1: 0;

This does allow an sd device to be created and multipath to create a path for it
in a dm device.

3624a93706a10c27f300a496100011010 dm-2 PURE    ,FlashArray      
size=2.0T features='0' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=0 status=enabled
 `- 7:0:0:1 sdb 8:16 failed undef running

The path is in the failed state, but when it comes back to an online ALUA state, it
will return to active. The current behavior is inconsistent: if the device is in any
state other than unavailable when it is discovered and later transitions to
unavailable, the sd device already exists, so multipath can track the transition and
all is good.

Is there a way to handle both the unintended consequence (sd devices being created
for non-existent LUNs) and the ALUA unavailable state?

Thanks,
Brian

Brian Bunker
SW Eng
brian at purestorage.com