[libvirt] Re: [PATCH/RFC] kvm/qemu vs libvirtd restarts

Daniel P. Berrange berrange at redhat.com
Tue Oct 7 16:37:52 UTC 2008


Moving thread to libvir-list....

On Tue, Oct 07, 2008 at 06:12:33PM +0200, Guido G?nther wrote:
> On Fri, Sep 26, 2008 at 04:47:05PM +0100, Daniel P. Berrange wrote:
> > On Fri, Sep 26, 2008 at 05:41:02PM +0200, Guido G?nther wrote:
> > 
> > We've got this problem sorted in the 'lxc' driver and need to apply
> > the same approach to the QEMU driver. The problem was always finding
> > the Pseduo-TTY/PID for the monitor after restart, and associating the
> > live config with it. We've "solved" that in LXC driver by savingt the
> > live XML & PID to /var/run/libvirt/lxc/. It basically needs the same
> > approach to be applied to the QEMU daemons. In addition the DNSmasq
> > deamon needs adapting for simiarl reasons.
>
> Attached is a very early patch. It keeps the qemu instances running and
> writes out their states upon shutdown. Reconnecting (and therefore sending
> monitor commands, etc.) works but since stdout/stderr don't get picked
> up again yet we don't handle virEventAddhandle. I'll grab them from
> /proc/<pid>/fd/{1,2} later. Network state is missing too, but since this
> is moving into a separate file it's becoming a different matter anyway.

First off, thanks for working on this! A few general comments, and then
I'll put rest inline..

For stdout/err, it need to be setup so that the logfile is dup()'d onto
the QEMU process' file descriptors directly, and get libvirt outof the
business of I/O forwarding. This avoids any complications with re-attaching
to stdout/err, and loosing any log data while libvirt isn't running.

This also needs to be implemented with the assumption that the daemon
can & will crash any time. So putting the state-saving hooks into the
qemudShutdown() method won't be sufficient. We need to write out the
state we need when initially starting the VM, and then update it any
time we hot-plug/unplug devices.

Ignoring network state is fine - we can address that separately..

> Subject: [PATCH] let qemu/kvm survive libvirtd restarts
> 
> ---
>  src/domain_conf.c |    1 +
>  src/domain_conf.h |    1 +
>  src/qemu_conf.h   |    1 +
>  src/qemu_driver.c |  264 +++++++++++++++++++++++++++++++++++++++++++++++++----
>  src/util.c        |    4 +-
>  src/util.h        |    1 +
>  6 files changed, 254 insertions(+), 18 deletions(-)
> 
> diff --git a/src/domain_conf.c b/src/domain_conf.c
> index 6a35064..aa102f6 100644
> --- a/src/domain_conf.c
> +++ b/src/domain_conf.c
> @@ -420,6 +420,7 @@ void virDomainObjFree(virDomainObjPtr dom)
>      virDomainDefFree(dom->def);
>      virDomainDefFree(dom->newDef);
>  
> +    VIR_FREE(dom->monitorpath);
>      VIR_FREE(dom->vcpupids);
>  
>      VIR_FREE(dom);
> diff --git a/src/domain_conf.h b/src/domain_conf.h
> index 632e5ad..1ac1561 100644
> --- a/src/domain_conf.h
> +++ b/src/domain_conf.h
> @@ -448,6 +448,7 @@ struct _virDomainObj {
>      int stdout_fd;
>      int stderr_fd;
>      int monitor;
> +    char *monitorpath;

I think we can avoid needing this. If we write out the staste the
moment the VM is started, we don't need to carry this around in
memory. Alternatively, since we're writing all stdout/err data to
the logfile, we could simply re-parse the log to extract the 
monitor path upon restart.

>      int logfile;
>      int pid;
>      int state;
> diff --git a/src/qemu_conf.h b/src/qemu_conf.h
> index 88dfade..f47582e 100644
> --- a/src/qemu_conf.h
> +++ b/src/qemu_conf.h
> @@ -62,6 +62,7 @@ struct qemud_driver {
>      char *networkConfigDir;
>      char *networkAutostartDir;
>      char *logDir;
> +    char *stateDir;
>      unsigned int vncTLS : 1;
>      unsigned int vncTLSx509verify : 1;
>      char *vncTLSx509certdir;
> diff --git a/src/qemu_driver.c b/src/qemu_driver.c
> index 806608d..87789a9 100644
> --- a/src/qemu_driver.c
> +++ b/src/qemu_driver.c
> @@ -172,6 +172,181 @@ void qemudAutostartConfigs(struct qemud_driver *driver) {
>  }
>  
>  #ifdef WITH_LIBVIRTD
> +
> +static int
> +qemudFileWriteMonitor(const char *dir,
> +                      const char *name,
> +                      const char *path)
> +{
> +    int rc;
> +    int fd;
> +    FILE *file = NULL;
> +    char *monitorfile = NULL;
> +
> +    if ((rc = virFileMakePath(dir)))
> +        goto cleanup;
> +
> +    if (asprintf(&monitorfile, "%s/%s.monitor", dir, name) < 0) {
> +        rc = ENOMEM;
> +        goto cleanup;
> +    }
> +
> +    if ((fd = open(monitorfile,
> +                   O_WRONLY | O_CREAT | O_TRUNC,
> +                   S_IRUSR | S_IWUSR)) < 0) {
> +        rc = errno;
> +        goto cleanup;
> +    }
> +
> +    if (!(file = fdopen(fd, "w"))) {
> +        rc = errno;
> +        close(fd);
> +        goto cleanup;
> +    }
> +
> +    if (fprintf(file, "%s", path) < 0) {
> +        rc = errno;
> +        goto cleanup;
> +    }
> +
> +    rc = 0;
> +
> +cleanup:
> +    if (file &&
> +        fclose(file) < 0) {
> +        rc = errno;
> +    }
> +
> +    VIR_FREE(monitorfile);
> +    return rc;
> +}
> +
> +static int 
> +qemudFileReadMonitor(const char *dir,
> +                     const char *name,
> +                     char **path)
> +{
> +    int rc;
> +    FILE *file;
> +    char *monitorfile = NULL;
> +
> +    if (asprintf(&monitorfile, "%s/%s.monitor", dir, name) < 0) {
> +        rc = ENOMEM;
> +        goto cleanup;
> +    }
> +
> +    if (!(file = fopen(monitorfile, "r"))) {
> +        rc = errno;
> +        goto cleanup;
> +    }
> +
> +    if (fscanf(file, "%as", path) != 1) {
> +        rc = EINVAL;
> +        goto cleanup;
> +    }
> +
> +    if (fclose(file) < 0) {
> +        rc = errno;
> +        goto cleanup;
> +    }
> +
> +    rc = 0;
> +
> + cleanup:
> +    VIR_FREE(monitorfile);
> +    return rc;
> +}

If we re-parse the logfile to extract the monitiro PTY path, this is 
unneccessary. If not, this can be simplied by just calling the 
convenient  virFileReadAll() method.

> +
> +static int 
> +qemudFileDeleteMonitor(const char *dir,
> +                       const char *name)
> +{
> +    int rc = 0;
> +    char *pidfile = NULL;
> +
> +    if (asprintf(&pidfile, "%s/%s.monitor", dir, name) < 0) {
> +        rc = errno;
> +        goto cleanup;
> +    }
> +
> +    if (unlink(pidfile) < 0 && errno != ENOENT)
> +        rc = errno;
> +
> +cleanup:
> +    VIR_FREE(pidfile);
> +    return rc;
> +}
> +
> +
> +static int qemudOpenMonitor(virConnectPtr conn,
> +                            struct qemud_driver *driver,
> +                            virDomainObjPtr vm,
> +                            const char *monitor, 
> +                            int reconnect);
> +/** 
> + * qemudReconnectVMs
> + *
> + * Reconnect running vms to the daemon process
> + */
> +static int
> +qemudReconnectVMs(struct qemud_driver *driver)
> +{
> +    virDomainObjPtr vm = driver->domains;
> +
> +    while (vm) {
> +        virDomainDefPtr tmp;
> +        char *config = NULL;
> +        char *monitor = NULL;
> +        int rc;
> +
> +        /* Read pid */
> +        if ((rc = virFileReadPid(driver->stateDir, vm->def->name, &vm->pid)) == 0) {
> +            virFileDeletePid(driver->stateDir, vm->def->name);
> +            DEBUG("Found pid %d for '%s'", vm->pid, vm->def->name);
> +        } else
> +            goto next;
> +
> +        if ((config = virDomainConfigFile(NULL,
> +                                          driver->stateDir,
> +                                          vm->def->name)) == NULL) {
> +            qemudLog(QEMUD_ERR, _("Failed to read back domain config for %s\n"),
> +                     vm->def->name);
> +            goto next;
> +        }
> +
> +        /* Try and load the config */
> +        tmp = virDomainDefParseFile(NULL, driver->caps, config);
> +        unlink(config);
> +        if (tmp) {
> +            vm->newDef = vm->def;
> +            vm->def = tmp;
> +        }
> +
> +        /* read back monitor path */
> +        if ((rc = qemudFileReadMonitor(driver->stateDir, vm->def->name, &monitor)) != 0) {
> +            qemudLog(QEMUD_ERR, _("Failed to read back monitor path for %s: %s\n"),
> +                     vm->def->name, strerror(rc));
> +            goto next;
> +        }
> +        qemudFileDeleteMonitor(driver->stateDir, vm->def->name);
> +
> +        /* FIXME: reconnect stdout/stderr via /proc/pid/fd/ */
> +
> +        if ((rc = qemudOpenMonitor(NULL, driver, vm, monitor, 1)) != 0) {
> +            qemudLog(QEMUD_ERR, _("Failed to reconnect to monitor for %s: %d\n"),
> +                     vm->def->name, rc);
> +            goto next;
> +        }
> +        vm->def->id = driver->nextvmid++;

The ID of a vm must never change for until it is rebooted. Simply
don't overwrite the existing vm->def->id that we jjust loaded off
disk. And instead adjust the nextvmid field, eg

    if (vm->def->id >= driver->nextvmid)
      driver->nextvmid = vm->def->id + 1;


> +        vm->state = VIR_DOMAIN_RUNNING;
> +next:
> +        VIR_FREE(config);
> +        VIR_FREE(monitor);
> +        vm = vm->next;
> +    }
> +    return 0;
> +}
> +
>  /**
>   * qemudStartup:
>   *
> @@ -197,6 +372,10 @@ qemudStartup(void) {
>  
>          if ((base = strdup (SYSCONF_DIR "/libvirt")) == NULL)
>              goto out_of_memory;
> +
> +        if (asprintf (&qemu_driver->stateDir, 
> +                      "%s/run/libvirt/qemu/", LOCAL_STATE_DIR) == -1)
> +            goto out_of_memory;
>      } else {
>          if (!(pw = getpwuid(uid))) {
>              qemudLog(QEMUD_ERR, _("Failed to find user record for uid '%d': %s\n"),
> @@ -210,6 +389,10 @@ qemudStartup(void) {
>  
>          if (asprintf (&base, "%s/.libvirt", pw->pw_dir) == -1)
>              goto out_of_memory;
> +
> +        if (asprintf (&qemu_driver->stateDir,
> +                      "%s/run/libvirt/qemu/", base) == -1)
> +            goto out_of_memory;
>      }
>  
>      /* Configuration paths are either ~/.libvirt/qemu/... (session) or
> @@ -250,6 +433,8 @@ qemudStartup(void) {
>          qemudShutdown();
>          return -1;
>      }
> +    qemudReconnectVMs(qemu_driver);
> +
>      if (virNetworkLoadAllConfigs(NULL,
>                                   &qemu_driver->networks,
>                                   qemu_driver->networkConfigDir,
> @@ -329,6 +514,34 @@ qemudActive(void) {
>      return 0;
>  }
>  
> +/** 
> + * qemudSaveDomainState
> + *
> + * Save the full state of a running domain to statedir
> + *
> + * Returns 0 on success
> + */
> +static int
> +qemudSaveDomainState(struct qemud_driver * driver, virDomainObjPtr vm) {
> +    int ret;
> +
> +    if ((ret = virFileWritePid(driver->stateDir, vm->def->name, vm->pid)) != 0) {
> +        qemudLog(QEMUD_ERR, _("Unable to safe pidfile for %s: %s\n"),
> +                            vm->def->name, strerror(ret));
> +         return ret;
> +    }
> +    if ((ret = virDomainSaveConfig(NULL, driver->stateDir, vm->def))) {
> +        qemudLog(QEMUD_ERR, _("Unable to save domain %s\n"), vm->def->name);
> +        return ret;
> +    }
> +    if ((ret = qemudFileWriteMonitor(driver->stateDir, vm->def->name, vm->monitorpath)) != 0) {
> +        qemudLog(QEMUD_ERR, _("Unable to monitor file for %s: %s\n"),
> +                            vm->def->name, strerror(ret));
> +        return ret;
> +    }
> +    return 0;
> +}
> +

This will need to be called at time of VM creation, and the
XSL config will need updating whether live config changes.

>  /**
>   * qemudShutdown:
>   *
> @@ -349,10 +562,14 @@ qemudShutdown(void) {
>      while (vm) {
>          virDomainObjPtr next = vm->next;
>          if (virDomainIsActive(vm))
> +#if 1
> +            qemudSaveDomainState(qemu_driver, vm);
> +#else
>              qemudShutdownVMDaemon(NULL, qemu_driver, vm);
>          if (!vm->persistent)
>              virDomainRemoveInactive(&qemu_driver->domains,
>                                      vm);
> +#endif
>          vm = next;
>      }
>  
> @@ -389,6 +606,7 @@ qemudShutdown(void) {
>      VIR_FREE(qemu_driver->networkConfigDir);
>      VIR_FREE(qemu_driver->networkAutostartDir);
>      VIR_FREE(qemu_driver->vncTLSx509certdir);
> +    VIR_FREE(qemu_driver->stateDir);
>  
>      if (qemu_driver->brctl)
>          brShutdown(qemu_driver->brctl);
> @@ -499,7 +717,7 @@ qemudCheckMonitorPrompt(virConnectPtr conn ATTRIBUTE_UNUSED,
>  static int qemudOpenMonitor(virConnectPtr conn,
>                              struct qemud_driver *driver,
>                              virDomainObjPtr vm,
> -                            const char *monitor) {
> +                            const char *monitor, int reconnect) {
>      int monfd;
>      char buf[1024];
>      int ret = -1;
> @@ -520,11 +738,26 @@ static int qemudOpenMonitor(virConnectPtr conn,
>          goto error;
>      }
>  
> -    ret = qemudReadMonitorOutput(conn,
> -                                 driver, vm, monfd,
> -                                 buf, sizeof(buf),
> -                                 qemudCheckMonitorPrompt,
> -                                 "monitor");
> +    if (!reconnect) {
> +        ret = qemudReadMonitorOutput(conn,
> +                                     driver, vm, monfd,
> +                                     buf, sizeof(buf),
> +                                     qemudCheckMonitorPrompt,
> +                                     "monitor");
> +
> +    } else {
> +        vm->monitor = monfd;
> +        ret = 0;
> +    }
> +
> +    if (ret != 0)
> +         goto error;
> +
> +    if (!(vm->monitorpath = strdup(monitor))) {
> +        qemudReportError(conn, NULL, NULL, VIR_ERR_NO_MEMORY,
> +                         "%s", _("failed to allocate space for monitor path"));
> +        goto error;
> +    }
>  
>      /* Keep monitor open upon success */
>      if (ret == 0)
> @@ -623,7 +856,7 @@ qemudFindCharDevicePTYs(virConnectPtr conn,
>      }
>  
>      /* Got them all, so now open the monitor console */
> -    ret = qemudOpenMonitor(conn, driver, vm, monitor);
> +    ret = qemudOpenMonitor(conn, driver, vm, monitor, 0);
>  
>  cleanup:
>      VIR_FREE(monitor);
> @@ -966,12 +1199,11 @@ static int qemudStartVMDaemon(virConnectPtr conn,
>  
>      ret = virExec(conn, argv, NULL, &keepfd, &vm->pid,
>                    vm->stdin_fd, &vm->stdout_fd, &vm->stderr_fd,
> -                  VIR_EXEC_NONBLOCK);
> +                  VIR_EXEC_NONBLOCK | VIR_EXEC_SETSID);
>      if (ret == 0) {
>          vm->def->id = driver->nextvmid++;
>          vm->state = migrateFrom ? VIR_DOMAIN_PAUSED : VIR_DOMAIN_RUNNING;
>      }
> -
>      for (i = 0 ; argv[i] ; i++)
>          VIR_FREE(argv[i]);
>      VIR_FREE(argv);
> @@ -1037,7 +1269,9 @@ static void qemudShutdownVMDaemon(virConnectPtr conn ATTRIBUTE_UNUSED,
>  
>      qemudLog(QEMUD_INFO, _("Shutting down VM '%s'\n"), vm->def->name);
>  
> -    kill(vm->pid, SIGTERM);
> +    if (kill(vm->pid, SIGTERM) < 0)
> +        qemudLog(QEMUD_ERROR, _("Failed to send SIGTERM to %s (%d): %s\n"),
> +                 vm->def->name, vm->pid, strerror(errno));
>  
>      qemudVMData(driver, vm, vm->stdout_fd);
>      qemudVMData(driver, vm, vm->stderr_fd);
> @@ -1057,13 +1291,9 @@ static void qemudShutdownVMDaemon(virConnectPtr conn ATTRIBUTE_UNUSED,
>      vm->stderr_fd = -1;
>      vm->monitor = -1;
>  
> -    if (waitpid(vm->pid, NULL, WNOHANG) != vm->pid) {
> -        kill(vm->pid, SIGKILL);
> -        if (waitpid(vm->pid, NULL, 0) != vm->pid) {
> -            qemudLog(QEMUD_WARN,
> -                     "%s", _("Got unexpected pid, damn\n"));
> -        }
> -    }
> +    /* shut if off for sure */
> +    kill(vm->pid, SIGKILL);
> +    virFileDeletePid(driver->stateDir, vm->def->name);
>  
>      vm->pid = -1;
>      vm->def->id = -1;
> diff --git a/src/util.c b/src/util.c
> index ca14be1..442c810 100644
> --- a/src/util.c
> +++ b/src/util.c
> @@ -299,14 +299,16 @@ virExec(virConnectPtr conn,
>               !FD_ISSET(i, keepfd)))
>              close(i);
>  
> -    if (flags & VIR_EXEC_DAEMON) {
> +    if (flags & VIR_EXEC_DAEMON || flags & VIR_EXEC_SETSID) {
>          if (setsid() < 0) {
>              ReportError(conn, VIR_ERR_INTERNAL_ERROR,
>                          _("cannot become session leader: %s"),
>                          strerror(errno));
>              _exit(1);
>          }
> +    }
>  
> +    if (flags & VIR_EXEC_DAEMON) {
>          if (chdir("/") < 0) {
>              ReportError(conn, VIR_ERR_INTERNAL_ERROR,
>                          _("cannot change to root directory: %s"),
> diff --git a/src/util.h b/src/util.h
> index 093ef46..8fbe2cd 100644
> --- a/src/util.h
> +++ b/src/util.h
> @@ -32,6 +32,7 @@ enum {
>      VIR_EXEC_NONE   = 0,
>      VIR_EXEC_NONBLOCK = (1 << 0),
>      VIR_EXEC_DAEMON = (1 << 1),
> +    VIR_EXEC_SETSID = (1 << 2),
>  };

Shouldn't we simply be starting all the QEMU VMs with VIR_EXEC_DAEMON
so they totally disassociate themselves from libvirtd right away. They
will be disassociated anyway if the libvirtd is ever restarted, so best
to daemonize from time they are started, to avoid any surprising changes
in behaviour


Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|




More information about the libvir-list mailing list