[libvirt] [PATCH REPOST v2 0/3] Add callback mech for qemu and nodedev

John Ferlan jferlan at redhat.com
Wed Feb 22 03:45:05 UTC 2017

On 02/21/2017 12:03 PM, Daniel P. Berrange wrote:
> On Tue, Feb 21, 2017 at 11:33:25AM -0500, John Ferlan wrote:
>> Repost: http://www.redhat.com/archives/libvir-list/2017-February/msg00501.html
to update to the top of the branch as of commit id '5ad03b9db2'
> BTW, could you include the full cover letter in each new version rather
> than making people follow links all the way back to v1 to find info
> about the patch series goals.

OK - I'll try to remember.

> IIUC, the intention here is that we automatically create NPIV devices
> when starting guests and delete them when stopping guests. I can see
> some appeal in this, but at the same time I'm not convinced we should
> add such a feature.

A bit more than that - create the vHBA and assign the LUNs to the guest
as they are discovered, and remove them from the guest as they go away
(based on udev events). This is a mechanism/idea from Paolo. The RHV team
would be the primary consumer and IIRC they don't use storage pools.
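For illustration only, here is a minimal Python sketch of that event-driven flow. It is not the code in this series. LunEvent, DomainLuns and handle_lun_event are hypothetical names, and the event fields only loosely mirror a udev add/remove uevent. The hot-plug and hot-unplug steps are left as comments. Each LUN event is routed to the domain whose vHBA is the LUN's parent SCSI host.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class LunEvent:
    """Hypothetical LUN event, loosely modelled on a udev add/remove uevent."""
    action: str  # "add" or "remove"
    host: str    # parent SCSI host of the LUN, e.g. "scsi_host5"
    lun: str     # SCSI address of the LUN, e.g. "5:0:0:1"


@dataclass
class DomainLuns:
    """Hypothetical per-domain state: its vHBA and the LUNs assigned to it."""
    vhba_host: str
    assigned: Set[str] = field(default_factory=set)


def handle_lun_event(domains: Dict[str, DomainLuns], ev: LunEvent) -> str:
    """Apply one LUN event to the domain whose vHBA is the LUN's parent.

    Returns the affected domain's name, or "" if the event was ignored
    (unknown host, duplicate add, or remove of an unassigned LUN).
    """
    for name, dom in domains.items():
        if dom.vhba_host != ev.host:
            continue
        if ev.action == "add" and ev.lun not in dom.assigned:
            dom.assigned.add(ev.lun)      # real code: hot-plug into guest
            return name
        if ev.action == "remove" and ev.lun in dom.assigned:
            dom.assigned.discard(ev.lun)  # real code: hot-unplug from guest
            return name
        return ""
    return ""


domains = {"guest1": DomainLuns(vhba_host="scsi_host5")}
for ev in [LunEvent("add", "scsi_host5", "5:0:0:0"),
           LunEvent("add", "scsi_host5", "5:0:0:1"),
           LunEvent("add", "scsi_host3", "3:0:0:0"),  # not guest1's vHBA
           LunEvent("remove", "scsi_host5", "5:0:0:0")]:
    handle_lun_event(domains, ev)
print(sorted(domains["guest1"].assigned))  # ['5:0:0:1']
```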

> AFAICT, the node device APIs already allow a management application to
> achieve the same end goal without needing this integration. Yes, it
> would simplify usage of NPIV on the surface, but the cost of doing this
> is that it encodes a specific usage policy for NPIV in the libvirt code and
> makes error handling harder. In particular it is possible to get into a
> situation where a VM fails to start and we're also unable to clear up
> the NPIV device we just auto-created. Now this could be said to apply
> to pretty much everything we do during guest startup, but in most cases
> the failure is harmless or gets auto-cleaned up by the kernel (i.e. the
> tap devices get auto-deleted when the FD is closed, or SELinux labels
> get reset next time a VM wants that file, locks are released when we
> close the virtlockd file handle, etc.). NPIV is significantly more
> complicated and more likely to hit failure scenarios due to the fact that
> it involves interactions with off-node hardware resources.

I agree with your points. The "purpose" of libvirt taking care of it
would be to let libvirt handle all those nasty and odd failure or
integration issues - including migration.  Of course from a libvirt
perspective, I'd rather take the 'scsi_hostX' vHBA and just pass that
through to QEMU directly to allow it (or the guest) to find the LUNs,
but that just pushes the problem the other way.

I said early on that this is something that could be done by the upper
layers, which would be able to receive the add/remove LUN events whether
they created a storage pool just for that purpose or created the vHBA
themselves. It's probably even in the BZs on this.
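The following is a minimal sketch of what that upper-layer alternative could look like. It assumes the management application creates the vHBA itself at domain start and tracks it until it is really gone. VhbaTracker and the fake create/destroy functions are hypothetical. In a libvirt-python application they would presumably wrap virConnect.nodeDeviceCreateXML() and virNodeDevice.destroy(), but that wiring is not shown here. The interesting part is the deferred-delete list. If deleting a vHBA fails at guest shutdown, the app keeps tracking it and retries later instead of leaking it.

```python
import itertools
from typing import Callable, Dict, List


class VhbaTracker:
    """Hypothetical mgmt-app registry of the vHBAs it created itself."""

    def __init__(self, create_fn: Callable[[str], str],
                 destroy_fn: Callable[[str], bool]) -> None:
        self._create = create_fn    # stands in for the real create call
        self._destroy = destroy_fn  # returns False if the delete failed
        self.active: Dict[str, str] = {}     # domain name -> vHBA name
        self.pending_delete: List[str] = []  # vHBAs still to be deleted

    def on_domain_start(self, domain: str) -> str:
        dev = self._create(domain)
        self.active[domain] = dev
        return dev

    def on_domain_stop(self, domain: str) -> None:
        dev = self.active.pop(domain, None)
        if dev is not None and not self._destroy(dev):
            # Delete failed (e.g. an off-node fabric problem): keep
            # tracking the vHBA so it can be cleaned up later.
            self.pending_delete.append(dev)

    def retry_pending(self) -> int:
        """Retry failed deletes; return how many are still outstanding."""
        self.pending_delete = [d for d in self.pending_delete
                               if not self._destroy(d)]
        return len(self.pending_delete)


# Usage with fake create/destroy functions (hypothetical device names).
fabric_up = False
_host_ids = itertools.count(10)


def fake_create(domain: str) -> str:
    return f"scsi_host{next(_host_ids)}"


def fake_destroy(dev: str) -> bool:
    return fabric_up  # simulate a transient failure until the fabric is back


tracker = VhbaTracker(fake_create, fake_destroy)
tracker.on_domain_start("guest1")
tracker.on_domain_stop("guest1")   # delete fails, vHBA is queued
print(tracker.pending_delete)      # ['scsi_host10']
fabric_up = True
print(tracker.retry_pending())     # 0
```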

> Is there some aspect of NPIV mgmt that can only be achieved if libvirt
> is explicitly managing the device lifecycle during VM start/stop, as
> opposed to having the mgmt app manage it ?

Nothing beyond the upper layers only needing to create the vHBA for the
domain and letting libvirt handle the rest.

> If OpenStack were to provide NPIV support I think it'd probably end
> up dealing with device setup explicitly via the node device APIs
> rather than relying on libvirt to create/delete them. That way it
> can track the lifecycle of NPIV devices explicitly, and if it is not
> possible to delete them at time of QEMU shutdown for some reason, it
> can easily arrange to delete them later.
>
> Overall I think one of the more successful aspects of libvirt's design
> has been the way we minimise the addition of usage policy decisions, in
> favour of providing mechanisms that applications can use to implement
> policies. This has had a cost in that applications need to do more work
> themselves, but on balance I still think it is a win to avoid adding
> policy-driven features to libvirt.
>
> A key question is just where "autocreation/delete of NPIV devices" falls
> in the line between mechanism & policy, since the line is not entirely
> black & white. I tend towards it being policy though, since it is just
> providing a less general purpose way to do something that can be achieved
> already via the node device APIs.
>
> Regards,
> Daniel

I understand - to a degree I guess I had assumed these types of
discussions had already taken place among those who wanted the feature added.

One other good thing that's come out of these changes is a bit more
testing for vHBA creation via nodedev/storage pool and, once/if most of
the patches I posted earlier in the week are accepted, quite a bit of
code cleanup.


(FWIW: I'll have limited access to email over the next couple of days...)
