[libvirt] [PATCH 0/1] Multipath pool support

Dave Allan dallan at redhat.com
Wed Aug 19 19:58:35 UTC 2009


Dave Allan wrote:
> Daniel P. Berrange wrote:
>> On Thu, Jul 23, 2009 at 02:53:48PM -0400, Dave Allan wrote:
>>> Daniel P. Berrange wrote:
>>>>> It doesn't currently allow configuration of multipathing, so for
>>>>> now setting the multipath configuration will have to continue to be
>>>>> done as part of the host system build.
>>>>>
>>>>> Example XML to create the pool is:
>>>>>
>>>>> <pool type="mpath">
>>>>>   <name>mpath</name>
>>>>>   <target>
>>>>>     <path>/dev/mapper</path>
>>>>>   </target>
>>>>> </pool>
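>>>>> Once the pool is refreshed, each multipath device shows up as a
>>>>> volume whose path lives under /dev/mapper. Purely for illustration
>>>>> (the device name and sizes below are made up), such a volume might
>>>>> be reported roughly as:
>>>>>
>>>>> <volume>
>>>>>   <name>dm-4</name>
>>>>>   <key>/dev/mapper/dm-4</key>
>>>>>   <capacity>107374182400</capacity>
>>>>>   <allocation>107374182400</allocation>
>>>>>   <target>
>>>>>     <path>/dev/mapper/dm-4</path>
>>>>>   </target>
>>>>> </volume>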
>>>> So this is in essence a 'singleton' pool, since there's only really
>>>> one of them per host. There is also no quantity of storage associated
>>>> with an mpath pool - it is simply dealing with volumes from other
>>>> pools. This falls into the same conceptual bucket as things like
>>>> DM-RAID, MD-RAID and even loopback device management.
>>> It is a singleton pool, in that there is only one dm instance per host.
>>>  With regard to capacity, the dm devices have capacity, and their
>>> constituent devices could be members of other pools.  Can you elaborate
>>> on what you see as the implications of those points?
>>
>> The storage pool vs storage volume concept was modelled around the idea
>> that you have some storage source, and it is sub-divided into a number
>> of volumes.
>>
>> With a multipath pool you have no storage source - the source is the
>> SCSI/iSCSI pool which actually provides the underlying block devices
>> which are the LUN paths. So by having an explicit storage pool for
>> multipath, there's an implicit dependency between 2 pools. If you
>> refresh a SCSI pool, you must then refresh the multipath pool too.
>> Or if you add a SCSI/iSCSI pool you must also refresh the multipath
>> pool. There's also the issue of tracking the association between
>> multipath volumes and the pools to ensure you don't remove a pool
>> that's providing a multipath volume that's still in use.
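>> To make that implicit dependency concrete, consider (purely as an
>> illustration - the adapter and pool names are made up) a SCSI pool
>> defined alongside the mpath pool; the mpath pool's volumes are built
>> from the very LUN paths this pool reports:
>>
>> <pool type="scsi">
>>   <name>hba0</name>
>>   <source>
>>     <adapter name="host2"/>
>>   </source>
>>   <target>
>>     <path>/dev/disk/by-path</path>
>>   </target>
>> </pool>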
> 
> The problem of hierarchical relationships among pools can exist with the 
> other pools as well, since one could create a logical pool on top of a 
> block device that's part of an iSCSI or other pool.  It's also possible 
> that a hierarchical pool relationship might not exist with the multipath 
> pool if a user didn't create pools for HBAs.
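> As a purely illustrative sketch of that kind of layering (the names are
> made up, and /dev/sdc stands in for a LUN that also shows up in an iSCSI
> pool), such a logical pool might be defined as:
>
> <pool type="logical">
>   <name>guest_vg</name>
>   <source>
>     <device path="/dev/sdc"/>
>     <name>guest_vg</name>
>   </source>
>   <target>
>     <path>/dev/guest_vg</path>
>   </target>
> </pool>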
> 
>>>> The question I've never been able to satisfactorily answer myself is
>>>> whether these things (mpath, raid, loopback) should be living in the
>>>> storage pool APIs, or in the host device APIs.
>>>>
>>>> I also wonder how people determine the association between the volumes
>>>> in the mpath pool, and the volumes for each corresponding path. E.g.,
>>>> how do they determine that the /dev/mapper/dm-4 multipath device is
>>>> associated with devices from the SCSI storage pool 'xyz'. The storage
>>>> volume APIs & XML format don't really have a way to express this
>>>> relationship.
>>> It's not difficult to query to find out what devices are parents of a
>>> given device, but what is the use case for finding out the pools of the
>>> parent devices?
>>
>> Say you have 3 SCSI NPIV pools configured, and a multipath pool.
>> You want to remove one of the SCSI pools, and know that the
>> multipath devices X, Y & Z are in use. You need to determine which
>> of the SCSI pools contains the underlying block devices for these
>> multipath devices before you can safely remove that SCSI pool.
> 
> Ok, that makes sense, but this problem exists with any hierarchical pool 
> so users are already dealing with it.
> 
>>>> The host device APIs have a much more limited set of operations
>>>> (list, create, delete) but this may well be all that's needed for
>>>> things like raid/mpath/loopback devices, and with its XML format
>>>> being capability based we could add a multipath capability under
>>>> which we list the constituent paths of each device.
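>>>> Purely as a sketch of that idea (the 'mpath' capability type and the
>>>> element names below do not exist in the current node device schema),
>>>> such a device might be described along these lines:
>>>>
>>>> <device>
>>>>   <name>dm-4</name>
>>>>   <capability type='mpath'>
>>>>     <path>sdb</path>
>>>>     <path>sdf</path>
>>>>   </capability>
>>>> </device>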
>>> If we decide to implement creation and destruction of multipath devices,
>>> I would think the node device APIs would be the place to do it.
>>
>> If we intend to do creation/deletion of multipath devices in the
>> node device APIs, then we essentially get listing of multipath
>> devices in the node device APIs for free. So do we need a dedicated
>> storage pool for multipath too?
> 
> Isn't the general idea that storage pools are how people should be 
> managing storage?  We shouldn't make people use a separate API to 
> enumerate one type of storage.
> 
>> I have a feeling that the DeviceKit impl of the node device APIs (which
>> is currently disabled by default) may already be reporting on all
>> device mapper block devices - the HAL impl does not.
> 
> That may be--there's a fairly wide gap between the two sets of 
> functionality.
> 
>>>> Now, if my understanding is correct, then if multipath is active it
>>>> should automatically create multipath devices for each unique LUN on
>>>> a storage array. DM does SCSI queries to determine which block
>>>> devices are paths to the same underlying LUN.
>>> That's basically correct, and the administrator can configure which
>>> devices have multipath devices created.
>>>
>>>> Taking a simple iSCSI storage pool
>>>>
>>>> <pool type='iscsi'>
>>>>   <name>virtimages</name>
>>>>   <source>
>>>>     <host name="iscsi.example.com"/>
>>>>     <device path="demo-target"/>
>>>>   </source>
>>>>   <target>
>>>>     <path>/dev/disk/by-path</path>
>>>>   </target>
>>>> </pool>
>>>>
>>>> this example would show you each individual block device, generating
>>>> paths under /dev/disk/by-path.
>>>>
>>>> Now, we decide we want to make use of multipath for this particular
>>>> pool. We should be able to just change the target path, to point to
>>>> /dev/mpath,
>>>>
>>>> <pool type='iscsi'>
>>>>   <name>virtimages</name>
>>>>   <source>
>>>>     <host name="iscsi.example.com"/>
>>>>     <device path="demo-target"/>
>>>>   </source>
>>>>   <target>
>>>>     <path>/dev/mpath</path>
>>>>   </target>
>>>> </pool>
>>>>
>>>> and have it give us back the unique multipath enabled LUNs, instead
>>>> of each individual block device.
>>> The problem with this approach is that dm devices are not SCSI devices,
>>> so putting them in a SCSI pool seems wrong.  iSCSI pools have always
>>> contained volumes which are iSCSI block devices, directory pools have
>>> always had volumes which are files.  We shouldn't break that assumption
>>> unless we have a good reason.  It's not impossible to do what you
>>> describe, but I don't understand why it's a benefit.
>>
>> What is a SCSI device though? Under Linux these days everything appears
>> to be a SCSI device whether it is SCSI or not, e.g. PATA, SATA, USB. So
>> there can be no assumption that a SCSI HBA pool gives you SCSI devices.
>> If an application using a pool expects volumes to have particular
>> SCSI capabilities (persistent reservations for example), then the only
>> way is for it to query the device, or try the capability it wants and
>> handle failure. The best libvirt can guarantee is that SCSI, disk,
>> iSCSI & logical pools will give back block devices, while fs / netfs
>> pools will give back plain files.
>> The one downside I realize with my suggestion here is that a single
>> multipath device may have many paths, and each path may go via a
>> separate HBA, which would mean separate SCSI pools. So in fact I think
>> we shouldn't expose multipath in normal SCSI pools after all :-)
> 
> Agreed, let's keep the existing pools the way they are.
> 
>> I'm still inclined to think we can do the 'list' operation in the node
>> device APIs though.
> 
> Again, I think using the node device APIs as the only support for 
> multipath devices is contrary to how we're leading people to believe 
> storage should be managed with libvirt.
> 
>>>>> The target element is ignored, as it is by the disk pool, but the
>>>>> config code rejects the XML if it does not exist.  That behavior
>>>>> should obviously be cleaned up, but I think that should be done in
>>>>> a separate patch, as it's really a bug in the config code, not
>>>>> related to the addition of the new pool type.
>>>> The target element is not ignored by the disk pool. This is used to
>>>> form the stable device paths via virStorageBackendStablePath() for
>>>> all block device based pools.
>>> Hmm--on my system the path I specify shows up in the pool XML, but is
>>> unused as far as I can tell.  I can hand it something totally bogus and
>>> it doesn't complain.  I think your next point is very good, though, so
>>> I'll make the target element meaningful in the multipath case and we can
>>> investigate the disk behavior separately.
>>
>> Normally a disk pool will give you back volumes whose path name
>> is /dev/sdXX. If you give the pool a target path of /dev/disk/by-uuid
>> then the volumes will get paths like
>> /dev/disk/by-uuid/b0509f5a-2824-4090-9da2-d0f0ff4ace0e
>> Since it is possible that some volumes may not have stable paths
>> though, we fall back to /dev/sdXX if one can't be formed.
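>> For example, a disk pool defined with a stable target path (the device
>> and pool names here are just an illustration):
>>
>> <pool type="disk">
>>   <name>sda-pool</name>
>>   <source>
>>     <device path="/dev/sda"/>
>>   </source>
>>   <target>
>>     <path>/dev/disk/by-uuid</path>
>>   </target>
>> </pool>
>>
>> would report its volumes with by-uuid paths where such a symlink
>> exists, and plain /dev/sdXX paths where it does not.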
>>
>> We should probably explicitly reject bogus target paths which don't
>> even exist on disk though, and only allow targets under /dev where the
>> given target exists.
> 
> That sounds good.
> 
> Dave

Dan,

Ping, what are your thoughts on this stuff?

Dave



