[Libguestfs] [libguestfs PATCH 7/7] lib/launch-direct: support networking with passt

Richard W.M. Jones rjones at redhat.com
Fri Jul 14 09:53:45 UTC 2023


On Thu, Jul 13, 2023 at 07:10:52PM +0200, Laszlo Ersek wrote:
> On QEMU 7.2.0+, if "passt" is available, ask QEMU for passt ("stream")
> rather than SLIRP ("user") networking.
> 
> For this, we need to run passt ourselves. Given that passt daemonizes by
> default, start it with our traditional function guestfs_int_cmd_run(). Ask
> passt to save its PID file, because in case something goes wrong before
> we're completely sure the appliance (i.e. QEMU) is up and running, we'll
> need to kill passt, the *grandchild*, ourselves.
> 
> Pass "--one-off" to passt (same as libvirt). This way, once we have proof
> that QEMU has connected to passt (because the appliance shows signs of
> life), we need not clean up passt ourselves -- once QEMU exits, passt will
> see an EOF on the unix domain socket, and exit as well.
> 
> Passt is way more flexible than SLIRP, and passt normally intends to
> imitate the host environment in the guest as much as possible. This means
> that, when switching from SLIRP to passt, the guest would see changes to
> the following:
> 
> - guest IP address,
> 
> - guest subnet mask,
> 
> - host (= gateway) IP address,
> 
> - host (= gateway) MAC address.
> 
> Extract the SLIRP defaults into the new macros NETWORK_GW_IP and
> NETWORK_GW_MAC, and pass them explicitly to passt. In particular,
> "tests/rsync/test-rsync.sh" fails without setting the host address
> (NETWORK_GW_IP) properly.

What really matters is that programs like 'dnf' and 'apt' can be run.
Nothing that uses libguestfs networking ought to depend on the exact
IP address or other details of the current SLIRP implementation.  So
these changes are fine.  (rsync is a weird feature that we should have
pushed back on and not added ...)

> (These artifacts can be verified in the appliance with "virt-rescue
> --network", by running "ip addr", "ip route", and "ip neighbor" at the
> virt-rescue prompt. There are four scenarios: two libguest backends, times
> passt being installed or not installed.)
> 
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2184967
> Signed-off-by: Laszlo Ersek <lersek at redhat.com>
> ---
>  lib/guestfs-internal.h |  26 ++++
>  lib/launch-direct.c    | 147 +++++++++++++++++++-
>  2 files changed, 168 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/guestfs-internal.h b/lib/guestfs-internal.h
> index 9ba4d4ad46cf..9f4f800e6d6e 100644
> --- a/lib/guestfs-internal.h
> +++ b/lib/guestfs-internal.h
> @@ -158,6 +158,32 @@ cleanup_mutex_unlock (pthread_mutex_t **ptr)
>  #define NETWORK_ADDRESS "169.254.2.15"
>  #define NETWORK_PREFIX  "16"
>  
> +/* The IP address and the MAC address of the host, as seen from the appliance.
> + *
> + * NETWORK_GW_IP should be the same as NETWORK_ADDRESS, only replacing ".15" in
> + * the rightmost octet with ".2".  NETWORK_GW_MAC cannot be changed.  These
> + * restrictions are a consequence of the following landscape:
> + *
> + * libguestfs backend  userspace network stack  restrictions
> + * ------------------  -----------------------  --------------------------------
> + * direct              passt                    None; both NETWORK_GW_IP and
> + *                                              NETWORK_GW_MAC can be set on the
> + *                                              passt command line.
> + *
> + * direct              SLIRP                    SLIRP hard-codes NETWORK_GW_MAC.
> + *
> + * libvirt             passt                    The domain XML does not expose
> + *                                              either knob (RHBZ#2222766), even
> + *                                              though passt could accept both.
> + *
> + * libvirt             SLIRP                    The domain XML does not expose
> + *                                              either knob (RHBZ#2222766), and
> + *                                              SLIRP hard-codes NETWORK_GW_MAC
> + *                                              anyway.
> + */
> +#define NETWORK_GW_IP   "169.254.2.2"
> +#define NETWORK_GW_MAC  "52:56:00:00:00:02"
> +
>  /* Guestfs handle and associated structures. */
>  
>  /* State. */
> diff --git a/lib/launch-direct.c b/lib/launch-direct.c
> index 3f46f0509736..8d6ad025a4e1 100644
> --- a/lib/launch-direct.c
> +++ b/lib/launch-direct.c
> @@ -32,6 +32,7 @@
>  #include <errno.h>
>  #include <fcntl.h>
>  #include <sys/types.h>
> +#include <sys/wait.h>
>  #include <sys/time.h>
>  #include <sys/resource.h>
>  #include <sys/stat.h>
> @@ -48,6 +49,7 @@
>  #include "guestfs-internal.h"
>  #include "guestfs_protocol.h"
>  #include "qemuopts.h"
> +#include "ignore-value.h"
>  
>  /* Per-handle data. */
>  struct backend_direct_data {
> @@ -333,6 +335,110 @@ add_drives (guestfs_h *g, struct backend_direct_data *data,
>    return 0;
>  }
>  
> +/**
> + * Launch passt such that it daemonizes.
> + *
> + * On error, -1 is returned; C<passt_pid> and C<sockpath> are not modified.
> + *
> + * On success, 0 is returned.  C<passt_pid> contains the PID of the passt
> + * background process.  C<sockpath> contains the pathname of the unix domain
> + * socket where passt will accept a single connection.
> + */
> +static int
> +launch_passt (guestfs_h *g, long *passt_pid, char (*sockpath)[UNIX_PATH_MAX])
> +{
> +  int rc;
> +  char sockpath_local[sizeof *sockpath];
> +  char *pid_path;
> +  struct command *cmd;
> +  int passt_status;
> +  int passt_exit;
> +  char *pid_str;
> +  long passt_pid_local;
> +  char *endptr;
> +
> +  rc = -1;
> +  if (guestfs_int_create_socketname (g, "passt.sock", &sockpath_local) == -1)
> +    return rc;
> +
> +  pid_path = guestfs_int_make_pid_path (g, "passt");
> +  if (pid_path == NULL)
> +    return rc;
> +
> +  cmd = guestfs_int_new_command (g);
> +  if (cmd == NULL)
> +    goto free_pid_path;
> +
> +  guestfs_int_cmd_add_arg (cmd, "passt");
> +  guestfs_int_cmd_add_arg (cmd, "--one-off");
> +  guestfs_int_cmd_add_arg (cmd, "--socket");
> +  guestfs_int_cmd_add_arg (cmd, sockpath_local);
> +  guestfs_int_cmd_add_arg (cmd, "--pid");
> +  guestfs_int_cmd_add_arg (cmd, pid_path);
> +  guestfs_int_cmd_add_arg (cmd, "--address");
> +  guestfs_int_cmd_add_arg (cmd, NETWORK_ADDRESS);
> +  guestfs_int_cmd_add_arg (cmd, "--netmask");
> +  guestfs_int_cmd_add_arg (cmd, NETWORK_PREFIX);
> +  guestfs_int_cmd_add_arg (cmd, "--mac-addr");
> +  guestfs_int_cmd_add_arg (cmd, NETWORK_GW_MAC);
> +  guestfs_int_cmd_add_arg (cmd, "--gateway");
> +  guestfs_int_cmd_add_arg (cmd, NETWORK_GW_IP);
> +
> +  passt_status = guestfs_int_cmd_run (cmd);
> +  if (passt_status == -1)
> +    /* guestfs_int_cmd_run() reports errors internally, so just bail here */
> +    goto close_cmd;
> +
> +  if (WIFSIGNALED (passt_status)) {
> +    error (g, _("passt was killed with signal %d"), WTERMSIG (passt_status));
> +    goto close_cmd;
> +  }
> +
> +  assert (WIFEXITED (passt_status));

This has a bit of a smell to it.  There are definitely other
non-signalled ways that the wait could have failed.  I think
we should turn this into an internal error, something like:

if (!WIFEXITED (passt_status)) {
  error (g, "internal error: unexpected exit status from passt (%d)",
         passt_status);
  goto close_cmd;
}

> +  passt_exit = WEXITSTATUS (passt_status);
> +  if (passt_exit != 0) {
> +    error (g, _("passt exited with status %d"), passt_exit);
> +    goto close_cmd;
> +  }
> +
> +  /* At this point passt has forked into the background, dropped privileges, and
> +   * written a PID file.  Due to "--one-off", passt will exit once our QEMU
> +   * appliance disappears (forcibly or cleanly); however, we still need the
> +   * passt PID *temporarily*, so we can kill passt in case we encounter an error
> +   * *before* starting the appliance.
> +   */
> +  if (guestfs_int_read_whole_file (g, pid_path, &pid_str, NULL) == -1)
> +    /* Any error has been reported internally, so just bail.  We can't kill
> +     * passt here because we've failed to get its PID in the first place...
> +     */
> +    goto close_cmd;

Does passt definitely write the PID before exiting in the parent?  I
suppose it must, but if not then we would have a race condition here,
and would need to instead loop for a bit waiting for the PID to
appear.

> +  errno = 0;
> +  passt_pid_local = strtol (pid_str, &endptr, 10);
> +  if (endptr == pid_str || (*endptr != '\0' && *endptr != '\n') || errno != 0 ||
> +      passt_pid_local <= 1) {
> +    /* Same thing, we can't kill passt just yet. */
> +    error (g, _("failed to parse passt PID from '%s'"), pid_path);
> +    goto free_pid_str;
> +  }
> +
> +  /* We're done. */
> +  *passt_pid = passt_pid_local;
> +  ignore_value (strcpy (*sockpath, sockpath_local));
> +  rc = 0;
> +
> +free_pid_str:
> +  free (pid_str);
> +
> +close_cmd:
> +  guestfs_int_cmd_close (cmd);
> +
> +free_pid_path:
> +  free (pid_path);
> +
> +  return rc;
> +}
> +
>  static int
>  launch_direct (guestfs_h *g, void *datav, const char *arg)
>  {
> @@ -340,6 +446,7 @@ launch_direct (guestfs_h *g, void *datav, const char *arg)
>    struct qemuopts *qopts = NULL;
>    int daemon_accept_sock = -1, console_sock = -1;
>    int r;
> +  long passt_pid = -1;
>    int flags;
>    int sv[2];
>    struct sockaddr_un addr;
> @@ -658,11 +765,30 @@ launch_direct (guestfs_h *g, void *datav, const char *arg)
>  
>    /* Enable user networking. */
>    if (g->enable_network) {
> -    start_list ("-netdev") {
> -      append_list ("user");
> -      append_list ("id=usernet");
> -      append_list ("net=" NETWORK_ADDRESS "/" NETWORK_PREFIX);
> -    } end_list ();
> +    /* If qemu is 7.2.0+ and "passt" is available, ask for passt rather
> +     * than SLIRP.  RHBZ#2184967.
> +     */
> +    if (guestfs_int_version_ge (&data->qemu_version, 7, 2, 0) &&
> +        guestfs_int_passt_runnable (g)) {
> +      char passt_sock[UNIX_PATH_MAX];
> +
> +      if (launch_passt (g, &passt_pid, &passt_sock) == -1)
> +        goto cleanup0;
> +
> +      start_list ("-netdev") {
> +        append_list ("stream");
> +        append_list ("id=usernet");
> +        append_list ("addr.type=unix");
> +        append_list_format ("addr.path=%s", passt_sock);
> +      } end_list ();
> +    }
> +    else {
> +      start_list ("-netdev") {
> +        append_list ("user");
> +        append_list ("id=usernet");
> +        append_list ("net=" NETWORK_ADDRESS "/" NETWORK_PREFIX);
> +      } end_list ();
> +    }
>      start_list ("-device") {
>        append_list (VIRTIO_DEVICE_NAME ("virtio-net"));
>        append_list ("netdev=usernet");
> @@ -893,6 +1019,15 @@ launch_direct (guestfs_h *g, void *datav, const char *arg)
>  
>    debug (g, "appliance is up");
>  
> +  /* From this point onward, even if we fail, QEMU terminating (forcefully or
> +   * gracefully) will cause passt to go away as well.  Note that we can't
> +   * precisely tell whether QEMU managed to open the passt socket before QEMU
> +   * failed.  Therefore, err on the side of killing passt needlessly, rather
> +   * than not killing it when needed -- that's why we re-set "passt_pid" to (-1)
> +   * only this late during QEMU startup verification.
> +   */
> +  passt_pid = -1;

It's a reasonable trade-off, but shame we cannot open the Unix fd and
pass it to qemu.

>    /* This is possible in some really strange situations, such as
>     * guestfsd starts up OK but then qemu immediately exits.  Check for
>     * it because the caller is probably expecting to be able to send
> @@ -924,6 +1059,8 @@ launch_direct (guestfs_h *g, void *datav, const char *arg)
>    data->qemu_data = NULL;
>  
>   cleanup0:
> +  if (passt_pid != -1)
> +    kill (passt_pid, SIGTERM);
>    if (qopts != NULL)
>      qemuopts_free (qopts);
>    if (daemon_accept_sock >= 0)

Looks generally good to me.  If there are no issues with comments I
made above then you can consider it reviewed.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v


More information about the Libguestfs mailing list