[PATCH] libvirt: add memory failure event

Peter Krempa pkrempa at redhat.com
Mon Oct 12 08:00:25 UTC 2020


On Sat, Oct 10, 2020 at 14:56:43 +0800, zhenwei pi wrote:
> Since QEMU 5.2 (commit-77b285f7f6), QEMU supports 'memory failure'
> event, posts event to monitor if hitting a hardware memory error.

I've noticed that you've introduced this to qemu. Is there a possibility
that the event could return more data? Current design of the libvirt
event you are proposing is not extensible and thus if you expect to add
anythingin the future the design will need to change to e.g. use
virTypedParameter or something like that.

Additionally could you please elaborate how this event is supposed to be
used? I didn't really get it from the commit message of the qemu commit.

> Several changes in this patch:
>   Add a new event 'memory failure' for libvirt domain.
>   Implement memory failure event handling for QEMU from QMP.
>   Also implement virsh command callback functions.

See below. We don't like to see "all-in-one" patches.

> 
> Test case:
> ~# virsh event stretch --event memory-failure
> event 'memory-failure' for domain stretch:
> recipient: guest
> action: inject
> flags:
>         action required: 0
>         recursive: 0
> events received: 1

This doesn't say how you trigger the error for testing.


> 
> Signed-off-by: zhenwei pi <pizhenwei at bytedance.com>
> ---

We require that changes are split into sensible smaller patches:

>  examples/c/misc/event-test.c        | 17 ++++++++
>  tools/virsh-domain.c                | 37 ++++++++++++++++

Virsh and client changes must be separata

>  include/libvirt/libvirt-domain.h    | 84 +++++++++++++++++++++++++++++++++++++
>  src/conf/domain_event.c             | 82 ++++++++++++++++++++++++++++++++++++
>  src/conf/domain_event.h             | 12 ++++++
>  src/remote/remote_daemon_dispatch.c | 33 +++++++++++++++
>  src/remote/remote_driver.c          | 35 ++++++++++++++++
>  src/remote/remote_protocol.x        | 21 +++++++++-
>  src/remote_protocol-structs         | 12 ++++++

Public API must be separate

>  src/libvirt_private.syms            |  2 +
>  src/qemu/qemu_domain.c              |  1 +
>  src/qemu/qemu_domain.h              |  1 +
>  src/qemu/qemu_driver.c              | 57 +++++++++++++++++++++++++
>  src/qemu/qemu_process.c             | 28 +++++++++++++

qemu impl then goes in separately.

>  src/qemu/qemu_monitor.c             | 21 +++++++++-
>  src/qemu/qemu_monitor.h             | 39 +++++++++++++++++
>  src/qemu/qemu_monitor_json.c        | 50 ++++++++++++++++++++++

Monitor usually is also separated, although not required.

>  17 files changed, 530 insertions(+), 2 deletions(-)

Please note that the tree _must_ compile after every single patch so
make sure that they are in sane order and contain appropriate changes.


A brief review follows, the patch is rather massive so I might overlook
some things:

> diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
> index 77f9116675..a9170d9a7e 100644
> --- a/include/libvirt/libvirt-domain.h
> +++ b/include/libvirt/libvirt-domain.h
> @@ -3196,6 +3196,66 @@ typedef enum {

[...]

> +typedef struct _virDomainMemoryFailureFlags virDomainMemoryFailureFlags;
> +typedef virDomainMemoryFailureFlags *virDomainMemoryFailureFlagsPtr;

Usually types ending in Flags are enums in our code base. Additionally
this type is not used externally. I'd go with ...FailureProps, or drop
it completely and pass the "flags" in as arguments of the callback as
you won't be able to extend it this way.

> +struct _virDomainMemoryFailureFlags {
> +    /* whether a memory failure event is action-required or action-optional
> +     * (e.g. a failure during memory scrub). */
> +    int action_required;
> +
> +    /* whether the failure occurred while the previous failure was still in
> +     * progress. */
> +    int recursive;

Note that public structs are not considered extensible in our API as it
would break remote protocol and the ABI of the library, so this can't be
used as means to extend the event in the future.


[...]

> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index 8ef812cd94..aecd947836 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -4292,6 +4292,59 @@ processGuestCrashloadedEvent(virQEMUDriverPtr driver,
>  }
>  
>  
> +static void
> +processMemoryFailureEvent(virQEMUDriverPtr driver,
> +                          virDomainObjPtr vm,
> +                          qemuMonitorEventMemoryFailurePtr mfp)
> +{
> +    virObjectEventPtr event = NULL;
> +    virDomainMemoryFailureRecipientType recipient;
> +    virDomainMemoryFailureActionType action;
> +    virDomainMemoryFailureFlags flags;
> +
> +    switch (mfp->recipient) {
> +    case QEMU_MONITOR_MEMORY_FAILURE_RECIPIENT_HYPERVISOR:
> +        recipient = VIR_DOMAIN_EVENT_MEMORY_FAILURE_RECIPIENT_HYPERVISOR;
> +        break;
> +    case QEMU_MONITOR_MEMORY_FAILURE_RECIPIENT_GUEST:
> +        recipient = VIR_DOMAIN_EVENT_MEMORY_FAILURE_RECIPIENT_GUEST;
> +        break;
> +    case QEMU_MONITOR_MEMORY_FAILURE_RECIPIENT_LAST:
> +    default:
> +        virReportError(VIR_ERR_INVALID_ARG, "%s",
> +                       _("requested unknown memory failure recipient"));
> +        return;
> +    }
> +
> +    switch (mfp->action) {
> +    case QEMU_MONITOR_MEMORY_FAILURE_ACTION_IGNORE:
> +        action = VIR_DOMAIN_EVENT_MEMORY_FAILURE_ACTION_IGNORE;
> +        break;
> +    case QEMU_MONITOR_MEMORY_FAILURE_ACTION_INJECT:
> +        action = VIR_DOMAIN_EVENT_MEMORY_FAILURE_ACTION_INJECT;
> +        break;
> +    case QEMU_MONITOR_MEMORY_FAILURE_ACTION_FATAL:
> +        action = VIR_DOMAIN_EVENT_MEMORY_FAILURE_ACTION_FATAL;
> +        break;
> +    case QEMU_MONITOR_MEMORY_FAILURE_ACTION_RESET:
> +        action = VIR_DOMAIN_EVENT_MEMORY_FAILURE_ACTION_RESET;
> +        break;
> +    case QEMU_MONITOR_MEMORY_FAILURE_ACTION_LAST:
> +    default:
> +        virReportError(VIR_ERR_INVALID_ARG, "%s",
> +                       _("requested unknown memory failure action"));
> +        return;
> +    }
> +
> +    flags.action_required = mfp->action_required;
> +    flags.recursive = mfp->recursive;
> +    event = virDomainEventMemoryFailureNewFromObj(vm, recipient, action,
> +                                                  &flags);
> +
> +    virObjectEventStateQueue(driver->domainEventState, event);

So all this function does is translation from the qemu monitor flags to
the public API flags. See below ...


> diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
> index 8c991fefbb..189b789bb8 100644
> --- a/src/qemu/qemu_monitor.c
> +++ b/src/qemu/qemu_monitor.c
> @@ -159,7 +159,6 @@ static int qemuMonitorOnceInit(void)
>  
>  VIR_ONCE_GLOBAL_INIT(qemuMonitor);
>  
> -

Irrelevalnt whitespace change.

>  VIR_ENUM_IMPL(qemuMonitorMigrationStatus,
>                QEMU_MONITOR_MIGRATION_STATUS_LAST,
>                "inactive", "setup",

[...]


> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index 6b5de29fdb..abcbab0f06 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -1878,6 +1878,33 @@ qemuProcessHandleGuestCrashloaded(qemuMonitorPtr mon G_GNUC_UNUSED,
>  }
>  
>  
> +static int
> +qemuProcessHandleMemoryFailure(qemuMonitorPtr mon G_GNUC_UNUSED,
> +                               virDomainObjPtr vm,
> +                               qemuMonitorEventMemoryFailurePtr mfp,
> +                               void *opaque)
> +{
> +    virQEMUDriverPtr driver = opaque;
> +    struct qemuProcessEvent *processEvent;
> +
> +    virObjectLock(vm);
> +    processEvent = g_new0(struct qemuProcessEvent, 1);
> +
> +    processEvent->eventType = QEMU_PROCESS_EVENT_MEMORY_FAILURE;
> +    processEvent->data = mfp;
> +    processEvent->vm = virObjectRef(vm);
> +
> +    if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
> +        virObjectUnref(vm);
> +        qemuProcessEventFree(processEvent);

Looking at the code for the function handling QEMU_PROCESS_EVENT_MEMORY_FAILURE
in another thread I didn't see anything that would require a domain job,
this means that handling this event via the processing thread actually
isn't needed and could be done directly here.


[...]

> diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x
> index f4d6147676..a3fda24807 100644
> --- a/src/remote/remote_protocol.x
> +++ b/src/remote/remote_protocol.x
> @@ -3469,6 +3469,19 @@ struct remote_domain_event_callback_metadata_change_msg {
>      remote_string nsuri;
>  };
>  
> +struct remote_domain_event_memory_failure_flags {
> +    int action_required;
> +    int recursive;
> +};
> +
> +struct remote_domain_event_memory_failure_msg {
> +    int callbackID;
> +    remote_nonnull_domain dom;
> +    int recipient;
> +    int action;
> +    remote_domain_event_memory_failure_flags flags;

As noted above, none of this can be changed in the future.

> +};
> +
>  struct remote_connect_secret_event_register_any_args {
>      int eventID;
>      remote_secret secret;
> @@ -6668,5 +6681,11 @@ enum remote_procedure {
>       * @priority: high
>       * @acl: domain:read
>       */
> -    REMOTE_PROC_DOMAIN_BACKUP_GET_XML_DESC = 422
> +    REMOTE_PROC_DOMAIN_BACKUP_GET_XML_DESC = 422,
> +
> +    /**
> +     * @generate: both
> +     * @acl: none
> +     */
> +    REMOTE_PROC_DOMAIN_EVENT_MEMORY_FAILURE = 423
>  };

[...]

> diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
> index 8f11393197..7c6b19a54b 100644
> --- a/tools/virsh-domain.c
> +++ b/tools/virsh-domain.c
> @@ -13590,6 +13590,41 @@ virshEventBlockThresholdPrint(virConnectPtr conn G_GNUC_UNUSED,
>  }
>  
>  
> +VIR_ENUM_DECL(virshEventMemoryFailureRecipientType);
> +VIR_ENUM_IMPL(virshEventMemoryFailureRecipientType,
> +              VIR_DOMAIN_EVENT_MEMORY_FAILURE_RECIPIENT_LAST,
> +              N_("hypervisor"),
> +              N_("guest"));
> +
> +VIR_ENUM_DECL(virshEventMemoryFailureActionType);
> +VIR_ENUM_IMPL(virshEventMemoryFailureActionType,
> +              VIR_DOMAIN_EVENT_MEMORY_FAILURE_ACTION_LAST,
> +              N_("ignore"),
> +              N_("inject"),
> +              N_("fatal"),
> +              N_("reset"));
> +
> +static void
> +virshEventMemoryFailurePrint(virConnectPtr conn G_GNUC_UNUSED,
> +                             virDomainPtr dom,
> +                             virDomainMemoryFailureRecipientType recipient,
> +                             virDomainMemoryFailureActionType action,
> +                             virDomainMemoryFailureFlagsPtr flags,
> +                             void *opaque)
> +{
> +    g_auto(virBuffer) buf = VIR_BUFFER_INITIALIZER;
> +
> +    virBufferAsprintf(&buf, _("event 'memory-failure' for domain %s:\n"
> +                              "recipient: %s\naction: %s\nflags:\n"
> +                              "\taction required: %d\n\trecursive: %d\n"),
> +                      virDomainGetName(dom),
> +                      UNKNOWNSTR(virshEventMemoryFailureRecipientTypeTypeToString(recipient)),
> +                      UNKNOWNSTR(virshEventMemoryFailureActionTypeTypeToString(action)),
> +                      !!(flags->action_required), !!(flags->recursive));

Ideally split this into multiple virBufferAsprintf calls.





More information about the libvir-list mailing list