[dm-devel] FW: How to avoid lots of read ios to passive paths of active-passive storage devices?

Steve Lord lord at xfs.org
Fri Sep 2 15:34:35 UTC 2005


goggin, edward wrote:
> Reposting since it didn't get much response initially and the issue came up
> again in yesterday's multipath conference call.
> 

>>
>>While this isn't an issue now, it could become one later
>>when/if linux hosts are configured with hundreds/thousands
>>of passive paths.
>>
> 

There are already issues out there. These were not directly with
multipath, but they should serve as examples.

One was an SGI Altix box with 8 HBA ports connected via a fabric to 4 Engenio
dual controller raids (2 ports on each controller). From what I recall there
were 4 LUNs assigned to each controller on each raid. I think there were 1024
paths in the complete configuration. This was actually a very small version of
the planned production system which has multiple hosts and several thousand
LUNs.
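
As a sanity check on that path count, here is a back-of-the-envelope
calculation. It is only a sketch: it assumes every HBA port is zoned to
every raid port and that each LUN is visible through both controllers of
its raid, neither of which is stated above. The variable names are purely
illustrative.

```python
# Rough path count for the configuration described above.
# ASSUMPTIONS (not from the original report): full zoning between all
# HBA ports and all raid ports, and every LUN visible via both controllers.
hba_ports = 8
raids = 4
controllers_per_raid = 2
ports_per_controller = 2
luns_per_controller = 4

luns = raids * controllers_per_raid * luns_per_controller

# Each LUN is reachable from every HBA port through every port of its raid.
raid_ports_per_lun = controllers_per_raid * ports_per_controller
paths_per_lun = hba_ports * raid_ports_per_lun
total_paths = luns * paths_per_lun

# Only the paths through the owning controller are active; the rest are passive.
active_paths = luns * hba_ports * ports_per_controller
passive_paths = total_paths - active_paths

print(luns, paths_per_lun, total_paths, passive_paths)  # 32 32 1024 512
```

Under those assumptions the numbers agree with the 1024 paths recalled
above, and half of them are passive paths that any blind scan will touch.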

Path ping pong during partition table scanning took several hours
to resolve itself (we gave up waiting and went home for the day).

The issue was made worse by attempts at parallelism and retries in
the logic. Multiple device reads were issued in parallel
via udev to all the different paths to devices, and these reads
were retried on failure. Since a trespass (or automatic volume
transfer, depending on your terminology) causes a failure on
the active path on this raid, the end result was that it took a lot of
I/O failures before one actually worked.
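
To illustrate why the ordering of those reads matters so much, here is a toy
model. It is not the raid firmware's actual algorithm: it simply assumes
that any read arriving at the controller that does not own the LUN moves
ownership to that controller, and it counts the moves. The function name,
the controller labels and the figure of 16 paths per controller are all
made up for the example.

```python
def count_transfers(probe_order, owner="A"):
    """Count ownership transfers for one LUN.

    probe_order lists the controller ("A" or "B") each read arrives at.
    Toy assumption: a read to the non-owning controller always triggers
    a transfer of the LUN to that controller.
    """
    transfers = 0
    for controller in probe_order:
        if controller != owner:
            owner = controller
            transfers += 1
    return transfers


paths_per_controller = 16  # illustrative figure only

# Parallel scanning: reads land on the two controllers in alternation.
interleaved = ["A", "B"] * paths_per_controller
# Serialised scanning: all paths to one controller first, then the other.
grouped = ["A"] * paths_per_controller + ["B"] * paths_per_controller
# Scanning only the paths to the owning controller.
active_only = ["A"] * paths_per_controller

print(count_transfers(interleaved))  # 31
print(count_transfers(grouped))      # 1
print(count_transfers(active_only))  # 0
```

In this model nearly every interleaved read moves the LUN, and each move
is what fails the I/O on the previously active path. Retries on those
failures only add more entries to the interleaved sequence.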

Once all this completed, various volume manager components then
came along and tried to look for their metadata at the other
end of the LUNs. The same chaos ensued.

Engenio has actually added code to their raid firmware which
lets you turn off automatic transfers within the first few
blocks of the disk. This deals with partition scanning for the
most part. There is no code to deal with metadata scanning
at the end of LUNs; just don't do it.

There are Linux SANs in production where the reboot of a
single node in a fabric causes all the active nodes to suffer
major performance problems as paths get moved out from under
them.

In the RDAC mode of operation you avoid the path ping pong
issue, but you still end up with slow I/O failures on the
standby paths. That is nowhere near as bad, but still painful
once you scale things up.

Steve

p.s. is anyone working on multipath modules for Engenio devices?



