[libvirt] PATCH: 3/7:

Daniel P. Berrange berrange at redhat.com
Tue Aug 5 16:14:33 UTC 2008


Every LXC container has 2 processes, the leader of the actual container
and a helper process to do the I/O forwarding. The LXC driver is written
to allow libvirtd to be restarted without needing to shut down the active
containers. For this it uses a PID file in /var/run/libvirt/lxc/NAME.pid.
The driver also uses SIGCHLD to detect when a container has terminated.

Herein lies the problem - if you restart libvirtd, the container process
gets re-parented to init, and so libvirtd will never get any further
SIGCHLD signals for it.

This patch attempts to address this problem by changing the relationship
between libvirtd and the container processes. Instead of there being two
processes which are siblings, the I/O process becomes the parent of the
actual container.

So the general idea is:

  - libvirtd LXC driver spawns a 'controller' process - this immediately
    double-forks itself into the background, making itself session leader,
    changing its working directory to /, and redirecting its stdin to /dev/null
    and its stdout/err to /var/log/libvirt/lxc/NAME.log.

  - The 'controller' process inherits a UNIX domain socket from libvirtd
    which has had bind() called against /var/run/libvirt/lxc/NAME.sock

  - Once it has backgrounded itself, the controller calls accept() on
    the socket, blocking until a client connects. If this fails for any
    reason the controller exits.

  - The 'controller' also writes a file to /var/run/libvirt/lxc/NAME.pid
    containing its PID.

  - Immediately after forking the 'controller' process, the libvirtd LXC
    driver calls connect() on /var/run/libvirt/lxc/NAME.sock and does
    a blocking read of sizeof(pid_t) bytes.

  - They now do a handshake, consisting of simply sending & receiving
    a single byte. This is basically to ensure the libvirt driver blocks
    until the controller has finished writing its PID file.
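
The double-fork step in the list above can be sketched as follows. This is a
simplified illustration, not the code from the patch: spawn_detached() and
report_ids() are hypothetical names, and the log file redirection and the
closing of inherited file descriptors are left out.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Example payload: reports the detached process's PID and session ID
 * through the pipe write end passed in `arg`. */
int report_ids(void *arg)
{
    int fd = *(int *)arg;
    pid_t ids[2] = { getpid(), getsid(0) };

    return write(fd, ids, sizeof(ids)) == (ssize_t)sizeof(ids) ? 0 : 1;
}

/*
 * Hypothetical helper: runs `run(arg)` in a grandchild that has its own
 * session, cwd "/" and stdin on /dev/null. Returns 0 in the original
 * caller once the intermediate child has exited cleanly, -1 on error.
 */
int spawn_detached(int (*run)(void *), void *arg)
{
    pid_t pid;
    int status, null;

    assert(run != NULL);

    if ((pid = fork()) < 0)
        return -1;

    if (pid > 0) {
        /* Original caller: reap the short-lived first child */
        while (waitpid(pid, &status, 0) < 0)
            if (errno != EINTR)
                return -1;
        return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
    }

    /* First child: detach from the caller's session and directory */
    if (chdir("/") < 0 || setsid() < 0)
        _exit(1);
    if ((null = open("/dev/null", O_RDONLY)) < 0 ||
        dup2(null, STDIN_FILENO) < 0)
        _exit(1);
    if (null != STDIN_FILENO)
        close(null);

    if ((pid = fork()) < 0)
        _exit(1);
    if (pid > 0)
        _exit(0);       /* first child exits, orphaning the grandchild */

    /* Grandchild: this is where the real controller would run */
    _exit(run(arg));
}
```

Because the intermediate child exits straight away, the caller's waitpid()
returns quickly and the long-lived process is never a direct child of
libvirtd, so nothing depends on libvirtd staying around to reap it.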

At this point in time, the libvirtd LXC driver knows what the controller
process' PID is. This becomes the 'ID' of the virDomain object associated
with this. If anything goes wrong from here-on, the libvirtd LXC driver
also now knows what PID it has to kill off. The UNIX socket to the controller
process is kept open, and registered with the libvirtd event loop for POLLHUP
events. This means the LXC driver can get notification when the controller
terminates, without needing to rely on SIGCHLD events.
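
The hangup notification can be illustrated with plain poll(). This is only a
sketch: the patch registers the socket with the libvirtd event loop rather
than polling it directly, and peer_hung_up() is a hypothetical name.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: waits up to `timeout_ms` for the peer of a
 * connected socket to go away. Returns 1 on hangup, 0 if the peer is
 * still there, -1 on error (including EINTR).
 *
 * No events are requested: POLLHUP is always reported in revents.
 */
int peer_hung_up(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = 0, .revents = 0 };
    int n;

    assert(fd >= 0);

    n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;
    return (n > 0 && (pfd.revents & POLLHUP)) ? 1 : 0;
}
```

The same check works in both directions, which is what lets the controller
notice a libvirtd restart as well.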

  - Now the 'controller' has told libvirtd what its PID is, it goes ahead
    and starts the real 'container' process.

  - When the 'container' is up and running, the controller goes into the
    event loop where it handles I/O from the PTYs. It also keeps an eye
    out for a hangup (EPOLLHUP) on the client socket to the libvirtd daemon.

  - If the client goes away, it'll know it needs to accept() a new client
    (i.e. the libvirtd daemon starting up again). Upon a new client connecting
    it'll do the PID handshake again, so that libvirtd knows what the container
    ID is again.

  - When libvirtd starts up, it reads the container configs from /etc/libvirt/lxc
    and for each one, tries to connect to /var/run/libvirt/lxc/NAME.sock, and
    read the PID file. This lets it figure out which containers are running.
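
The startup probe in the last step can be sketched as below. The helper names
connect_unix() and read_pid_file() are hypothetical; the patch itself adds PID
file helpers to util.c, whose exact form is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: connects to the UNIX stream socket at `path`.
 * Returns the connected fd, or -1 if nothing is listening there. */
int connect_unix(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    assert(path != NULL);
    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);    /* length checked above */

    if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: reads a decimal PID from `path`.
 * Returns the PID, or -1 if the file is missing or malformed. */
pid_t read_pid_file(const char *path)
{
    FILE *fp;
    int pid = -1;

    assert(path != NULL);
    if (!(fp = fopen(path, "r")))
        return -1;
    if (fscanf(fp, "%d", &pid) != 1 || pid <= 0)
        pid = -1;
    fclose(fp);
    return (pid_t)pid;
}
```

A container would be treated as running only if both calls succeed for its
NAME.sock and NAME.pid; a failed connect() means no controller is listening,
whatever the PID file says.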

Notice that throughout this, libvirtd's LXC driver doesn't need to know the
PID of the actual container process - only that of the controller process.
Think of the controller as serving the equivalent role of QEMU in the context
of KVM. QEMU provides the backend device model for KVM. When QEMU dies, the
guest domain goes away. Likewise, the controller provides the 'backend' device
model for the container, though in this case it is really only the text console
backend. When the controller dies, the container goes away.

So architecturally the LXC driver is now very closely aligned with the QEMU
driver. The main differences are that LXC can handle libvirtd restarts, and
that the LXC driver simply fork()s the controller. One could imagine these
distinctions going away - the QEMU driver can get a restart capability in
much the same way as the LXC restart works. We may want to make the controller
process a proper standalone binary which can be directly exec'd, rather than
just forked off from libvirtd.

 lxc_conf.c       |  195 ----------------
 lxc_conf.h       |   12 
 lxc_container.c  |   39 +--
 lxc_container.h  |    8 
 lxc_controller.c |  349 +++++++++++++++++++++++++++-
 lxc_controller.h |   12 
 lxc_driver.c     |  662 ++++++++++++++++++++++++++-----------------------------
 util.c           |  158 +++++++++++++
 util.h           |   13 +
 9 files changed, 870 insertions(+), 578 deletions(-)


Daniel

diff -r 8093fb566748 src/lxc_conf.c
--- a/src/lxc_conf.c	Fri Aug 01 14:47:33 2008 +0100
+++ b/src/lxc_conf.c	Tue Aug 05 12:13:24 2008 +0100
@@ -833,25 +833,24 @@
     strncpy(vm->configFileBase, file, PATH_MAX);
     vm->configFile[PATH_MAX-1] = '\0';
 
-    if (lxcLoadTtyPid(driver, vm) < 0) {
-        DEBUG0("failed to load tty pid");
-    }
-
     return vm;
 }
 
 int lxcLoadDriverConfig(lxc_driver_t *driver)
 {
     /* Set the container configuration directory */
-    driver->configDir = strdup(SYSCONF_DIR "/libvirt/lxc");
-    if (NULL == driver->configDir) {
-        lxcError(NULL, NULL, VIR_ERR_NO_MEMORY, "configDir");
-        return -1;
-    }
-
-    driver->stateDir = strdup(LOCAL_STATE_DIR "/run/libvirt/lxc");
+    if ((driver->configDir = strdup(SYSCONF_DIR "/libvirt/lxc")) == NULL)
+        goto no_memory;
+    if ((driver->stateDir = strdup(LOCAL_STATE_DIR "/run/libvirt/lxc")) == NULL)
+        goto no_memory;
+    if ((driver->logDir = strdup(LOCAL_STATE_DIR "/log/libvirt/lxc")) == NULL)
+        goto no_memory;
 
     return 0;
+
+no_memory:
+    lxcError(NULL, NULL, VIR_ERR_NO_MEMORY, "configDir");
+    return -1;
 }
 
 int lxcLoadContainerConfigFile(lxc_driver_t *driver,
@@ -1012,9 +1011,7 @@
     curNet = vmdef->nets;
     while (curNet) {
         nextNet = curNet->next;
-        printf("Freeing %s:%s\n", curNet->parentVeth, curNet->containerVeth);
         VIR_FREE(curNet->parentVeth);
-        VIR_FREE(curNet->containerVeth);
         VIR_FREE(curNet->txName);
         VIR_FREE(curNet);
         curNet = nextNet;
@@ -1106,176 +1103,4 @@
     return 0;
 }
 
-/**
- * lxcStoreTtyPid:
- * @driver: pointer to driver
- * @vm: Ptr to VM
- *
- * Stores the pid of the tty forward process contained in vm->pid
- * LOCAL_STATE_DIR/run/libvirt/lxc/{container_name}.pid
- *
- * Returns 0 on success or -1 in case of error
- */
-int lxcStoreTtyPid(const lxc_driver_t *driver, lxc_vm_t *vm)
-{
-    int rc = -1;
-    int fd;
-    FILE *file = NULL;
-
-    if (vm->ttyPidFile[0] == 0x00) {
-        if ((rc = virFileMakePath(driver->stateDir))) {
-            lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
-                     _("cannot create lxc state directory %s: %s"),
-                     driver->stateDir, strerror(rc));
-            goto error_out;
-        }
-
-        if (virFileBuildPath(driver->stateDir, vm->def->name, ".pid",
-                             vm->ttyPidFile, PATH_MAX) < 0) {
-            lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
-                     _("cannot construct tty pid file path"));
-            goto error_out;
-        }
-    }
-
-    if ((fd = open(vm->ttyPidFile,
-                   O_WRONLY | O_CREAT | O_TRUNC,
-                   S_IRUSR | S_IWUSR)) < 0) {
-        lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
-                 _("cannot create tty pid file %s: %s"),
-                 vm->ttyPidFile, strerror(errno));
-        goto error_out;
-    }
-
-    if (!(file = fdopen(fd, "w"))) {
-        lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
-                 _("cannot fdopen tty pid file %s: %s"),
-                 vm->ttyPidFile, strerror(errno));
-
-        if (close(fd) < 0) {
-            lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
-                     _("failed to close tty pid file %s: %s"),
-                     vm->ttyPidFile, strerror(errno));
-        }
-
-        goto error_out;
-    }
-
-    if (fprintf(file, "%d", vm->pid) < 0) {
-        lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
-                 _("cannot write tty pid file %s: %s"),
-                 vm->ttyPidFile, strerror(errno));
-
-        goto fclose_error_out;
-    }
-
-    rc = 0;
-
-fclose_error_out:
-    if (fclose(file) < 0) {
-        lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
-                 _("failed to close tty pid file %s: %s"),
-                 vm->ttyPidFile, strerror(errno));
-    }
-
-error_out:
-    return rc;
-}
-
-/**
- * lxcLoadTtyPid:
- * @driver: pointer to driver
- * @vm: Ptr to VM
- *
- * Loads the pid of the tty forward process from the pid file.
- * LOCAL_STATE_DIR/run/libvirt/lxc/{container_name}.pid
- *
- * Returns
- * > 0 - pid of tty process
- *   0 - no tty pid file
- *  -1 - error
- */
-int lxcLoadTtyPid(const lxc_driver_t *driver, lxc_vm_t *vm)
-{
-    int rc = -1;
-    FILE *file;
-
-    if (vm->ttyPidFile[0] == 0x00) {
-        if ((rc = virFileMakePath(driver->stateDir))) {
-            lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
-                     _("cannot create lxc state directory %s: %s"),
-                     driver->stateDir, strerror(rc));
-            goto cleanup;
-        }
-
-        if (virFileBuildPath(driver->stateDir, vm->def->name, ".pid",
-                             vm->ttyPidFile, PATH_MAX) < 0) {
-            lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
-                     _("cannot construct tty pid file path"));
-            goto cleanup;
-        }
-    }
-
-    if (!(file = fopen(vm->ttyPidFile, "r"))) {
-        if (ENOENT == errno) {
-            rc = 0;
-            goto cleanup;
-        }
-
-        lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
-                 _("cannot open tty pid file %s: %s"),
-                 vm->ttyPidFile, strerror(errno));
-        goto cleanup;
-    }
-
-    if (fscanf(file, "%d", &(vm->pid)) < 0) {
-        lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
-                 _("cannot read tty pid file %s: %s"),
-                 vm->ttyPidFile, strerror(errno));
-        goto cleanup;
-    }
-
-    if (fclose(file) < 0) {
-        lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
-                 _("failed to close tty pid file %s: %s"),
-                 vm->ttyPidFile, strerror(errno));
-        goto cleanup;
-    }
-
-    rc = vm->pid;
-
- cleanup:
-    return rc;
-}
-
-/**
- * lxcDeleteTtyPid:
- * @vm: Ptr to VM
- *
- * Unlinks the tty pid file for the vm
- * LOCAL_STATE_DIR/run/libvirt/lxc/{container_name}.pid
- *
- * Returns on 0 success or -1 in case of error
- */
-int lxcDeleteTtyPidFile(const lxc_vm_t *vm)
-{
-    if (vm->ttyPidFile[0] == 0x00) {
-        goto no_file;
-    }
-
-    if (unlink(vm->ttyPidFile) < 0) {
-        if (errno == ENOENT) {
-            goto no_file;
-        }
-
-        lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
-                 _("cannot remove ttyPidFile %s: %s"), vm->ttyPidFile,
-                 strerror(errno));
-        return -1;
-    }
-
-no_file:
-    return 0;
-}
-
 #endif /* WITH_LXC */
diff -r 8093fb566748 src/lxc_conf.h
--- a/src/lxc_conf.h	Fri Aug 01 14:47:33 2008 +0100
+++ b/src/lxc_conf.h	Tue Aug 05 12:13:24 2008 +0100
@@ -46,7 +46,6 @@
 struct __lxc_net_def {
     int type;
     char *parentVeth;       /* veth device in parent namespace */
-    char *containerVeth;    /* veth device in container namespace */
     char *txName;           /* bridge or network name */
 
     lxc_net_def_t *next;
@@ -87,11 +86,10 @@
 struct __lxc_vm {
     int pid;
     int state;
+    int monitor;
 
     char configFile[PATH_MAX];
     char configFileBase[PATH_MAX];
-
-    char ttyPidFile[PATH_MAX];
 
     lxc_vm_def_t *def;
 
@@ -103,8 +101,9 @@
     lxc_vm_t *vms;
     int nactivevms;
     int ninactivevms;
-    char* configDir;
-    char* stateDir;
+    char *configDir;
+    char *stateDir;
+    char *logDir;
     int have_netns;
 };
 
@@ -154,9 +153,6 @@
                     lxc_driver_t *driver,
                     const char *configFile,
                     const char *name);
-int lxcStoreTtyPid(const lxc_driver_t *driver, lxc_vm_t *vm);
-int lxcLoadTtyPid(const lxc_driver_t *driver, lxc_vm_t *vm);
-int lxcDeleteTtyPidFile(const lxc_vm_t *vm);
 
 void lxcError(virConnectPtr conn,
               virDomainPtr dom,
diff -r 8093fb566748 src/lxc_container.c
--- a/src/lxc_container.c	Fri Aug 01 14:47:33 2008 +0100
+++ b/src/lxc_container.c	Tue Aug 05 12:13:24 2008 +0100
@@ -69,6 +69,8 @@
 typedef struct __lxc_child_argv lxc_child_argv_t;
 struct __lxc_child_argv {
     lxc_vm_def_t *config;
+    int nveths;
+    char **veths;
     int monitor;
     char *ttyPath;
 };
@@ -171,8 +173,7 @@
  *
  * Returns 0 on success or -1 in case of error
  */
-int lxcContainerSendContinue(virConnectPtr conn,
-                             int control)
+int lxcContainerSendContinue(int control)
 {
     int rc = -1;
     lxc_message_t msg = LXC_CONTINUE_MSG;
@@ -180,7 +181,7 @@
 
     writeCount = safewrite(control, &msg, sizeof(msg));
     if (writeCount != sizeof(msg)) {
-        lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
+        lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
                  _("unable to send container continue message: %s"),
                  strerror(errno));
         goto error_out;
@@ -230,21 +231,21 @@
  *
  * Returns 0 on success or nonzero in case of error
  */
-static int lxcContainerEnableInterfaces(const lxc_vm_def_t *def)
+static int lxcContainerEnableInterfaces(int nveths,
+                                        char **veths)
 {
-    int rc = 0;
-    const lxc_net_def_t *net;
+    int rc = 0, i;
 
-    for (net = def->nets; net; net = net->next) {
-        DEBUG("Enabling %s", net->containerVeth);
-        rc =  vethInterfaceUpOrDown(net->containerVeth, 1);
+    for (i = 0 ; i < nveths ; i++) {
+        DEBUG("Enabling %s", veths[i]);
+        rc =  vethInterfaceUpOrDown(veths[i], 1);
         if (0 != rc) {
             goto error_out;
         }
     }
 
     /* enable lo device only if there were other net devices */
-    if (def->nets)
+    if (veths)
         rc = vethInterfaceUpOrDown("lo", 1);
 
 error_out:
@@ -311,7 +312,7 @@
         return -1;
 
     /* enable interfaces */
-    if (lxcContainerEnableInterfaces(vmDef) < 0)
+    if (lxcContainerEnableInterfaces(argv->nveths, argv->veths) < 0)
         return -1;
 
     /* this function will only return if an error occured */
@@ -320,7 +321,6 @@
 
 /**
  * lxcContainerStart:
- * @conn: pointer to connection
  * @driver: pointer to driver structure
  * @vm: pointer to virtual machine structure
  *
@@ -328,8 +328,9 @@
  *
  * Returns PID of container on success or -1 in case of error
  */
-int lxcContainerStart(virConnectPtr conn,
-                      lxc_vm_def_t *def,
+int lxcContainerStart(lxc_vm_def_t *def,
+                      int nveths,
+                      char **veths,
                       int control,
                       char *ttyPath)
 {
@@ -337,12 +338,11 @@
     int flags;
     int stacksize = getpagesize() * 4;
     char *stack, *stacktop;
-    lxc_child_argv_t args = { def, control, ttyPath };
+    lxc_child_argv_t args = { def, nveths, veths, control, ttyPath };
 
     /* allocate a stack for the container */
     if (VIR_ALLOC_N(stack, stacksize) < 0) {
-        lxcError(conn, NULL, VIR_ERR_NO_MEMORY,
-                 _("unable to allocate container stack"));
+        lxcError(NULL, NULL, VIR_ERR_NO_MEMORY, NULL);
         return -1;
     }
     stacktop = stack + stacksize;
@@ -357,7 +357,7 @@
     DEBUG("clone() returned, %d", pid);
 
     if (pid < 0) {
-        lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
+        lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
                  _("clone() failed, %s"), strerror(errno));
         return -1;
     }
@@ -379,8 +379,9 @@
     char *stack;
     int childStatus;
 
-    if (features & LXC_CONTAINER_FEATURE_NET)
+    if (features & LXC_CONTAINER_FEATURE_NET) {
         flags |= CLONE_NEWNET;
+    }
 
     if (VIR_ALLOC_N(stack, getpagesize() * 4) < 0) {
         DEBUG0("Unable to allocate stack");
diff -r 8093fb566748 src/lxc_container.h
--- a/src/lxc_container.h	Fri Aug 01 14:47:33 2008 +0100
+++ b/src/lxc_container.h	Tue Aug 05 12:13:24 2008 +0100
@@ -32,11 +32,11 @@
     LXC_CONTAINER_FEATURE_NET = (1 << 0),
 };
 
-int lxcContainerSendContinue(virConnectPtr conn,
-                             int control);
+int lxcContainerSendContinue(int control);
 
-int lxcContainerStart(virConnectPtr conn,
-                      lxc_vm_def_t *def,
+int lxcContainerStart(lxc_vm_def_t *def,
+                      int nveths,
+                      char **veths,
                       int control,
                       char *ttyPath);
 
diff -r 8093fb566748 src/lxc_controller.c
--- a/src/lxc_controller.c	Fri Aug 01 14:47:33 2008 +0100
+++ b/src/lxc_controller.c	Tue Aug 05 12:13:24 2008 +0100
@@ -26,16 +26,59 @@
 #ifdef WITH_LXC
 
 #include <sys/epoll.h>
+#include <sys/wait.h>
+#include <sys/socket.h>
 #include <unistd.h>
+#include <paths.h>
+#include <fcntl.h>
 
 #include "internal.h"
 #include "util.h"
 
 #include "lxc_conf.h"
+#include "lxc_container.h"
 #include "lxc_controller.h"
+#include "veth.h"
+#include "memory.h"
+#include "util.h"
 
 
 #define DEBUG(fmt,...) VIR_DEBUG(__FILE__, fmt, __VA_ARGS__)
+#define DEBUG0(msg) VIR_DEBUG(__FILE__, "%s", msg)
+
+
+#define LXC_HANDSHAKE_REQUEST 'm'
+#define LXC_HANDSHAKE_REPLY 'r'
+
+int lxcControllerClientHandshake(int monitor)
+{
+    char c;
+    if (saferead(monitor, &c, sizeof(c)) != sizeof(c))
+        return -1;
+    if (c != LXC_HANDSHAKE_REQUEST) {
+        errno = EINVAL;
+        return -1;
+    }
+    c = LXC_HANDSHAKE_REPLY;
+    if (safewrite(monitor, &c, sizeof(c)) != sizeof(c))
+        return -1;
+    return 0;
+}
+
+static int lxcControllerServerHandshake(int monitor)
+{
+    char c = LXC_HANDSHAKE_REQUEST;
+    if (safewrite(monitor, &c, sizeof(c)) != sizeof(c))
+        return -1;
+    if (saferead(monitor, &c, sizeof(c)) != sizeof(c))
+        return -1;
+    if (c != LXC_HANDSHAKE_REPLY) {
+        errno = EINVAL;
+        return -1;
+    }
+    return 0;
+}
+
 
 /**
  * lxcFdForward:
@@ -91,7 +134,10 @@
  *
  * Returns 0 on success or -1 in case of error
  */
-int lxcControllerMain(int appPty, int contPty)
+static int lxcControllerMain(int monitor,
+                             int client,
+                             int appPty,
+                             int contPty)
 {
     int rc = -1;
     int epollFd;
@@ -120,15 +166,29 @@
     memset(&epollEvent, 0x00, sizeof(epollEvent));
     epollEvent.events = EPOLLIN|EPOLLET;    /* edge triggered */
     epollEvent.data.fd = appPty;
-    epollEvent.data.u32 = 0;                /* fdArray position */
     if (0 > epoll_ctl(epollFd, EPOLL_CTL_ADD, appPty, &epollEvent)) {
         lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
                  _("epoll_ctl(appPty) failed: %s"), strerror(errno));
         goto cleanup;
     }
     epollEvent.data.fd = contPty;
-    epollEvent.data.u32 = 1;                /* fdArray position */
     if (0 > epoll_ctl(epollFd, EPOLL_CTL_ADD, contPty, &epollEvent)) {
+        lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
+                 _("epoll_ctl(contPty) failed: %s"), strerror(errno));
+        goto cleanup;
+    }
+
+    epollEvent.events = EPOLLIN;
+    epollEvent.data.fd = monitor;
+    if (0 > epoll_ctl(epollFd, EPOLL_CTL_ADD, monitor, &epollEvent)) {
+        lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
+                 _("epoll_ctl(monitor) failed: %s"), strerror(errno));
+        goto cleanup;
+    }
+
+    epollEvent.events = EPOLLHUP;
+    epollEvent.data.fd = client;
+    if (0 > epoll_ctl(epollFd, EPOLL_CTL_ADD, client, &epollEvent)) {
         lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
                  _("epoll_ctl(contPty) failed: %s"), strerror(errno));
         goto cleanup;
@@ -138,23 +198,46 @@
         /* if active fd's, return if no events, else wait forever */
         timeout = (numActive > 0) ? 0 : -1;
         numEvents = epoll_wait(epollFd, &epollEvent, 1, timeout);
-        if (0 < numEvents) {
-            if (epollEvent.events & EPOLLIN) {
-                curFdOff = epollEvent.data.u32;
-                if (!fdArray[curFdOff].active) {
-                    fdArray[curFdOff].active = 1;
-                    ++numActive;
+        if (numEvents > 0) {
+            if (epollEvent.data.fd == monitor) {
+                int fd = accept(monitor, NULL, 0);
+                if (client != -1 || /* Already connected, so kick new one out */
+                    lxcControllerServerHandshake(fd) < 0) {
+                    close(fd);
+                    continue;
                 }
-
-            } else if (epollEvent.events & EPOLLHUP) {
-                DEBUG("EPOLLHUP from fd %d", epollEvent.data.fd);
-                continue;
+                client = fd;
+                epollEvent.events = EPOLLHUP;
+                epollEvent.data.fd = client;
+                if (0 > epoll_ctl(epollFd, EPOLL_CTL_ADD, client, &epollEvent)) {
+                    lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
+                             _("epoll_ctl(client) failed: %s"), strerror(errno));
+                    goto cleanup;
+                }
+            } else if (client != -1 && epollEvent.data.fd == client) {
+                if (0 > epoll_ctl(epollFd, EPOLL_CTL_DEL, client, &epollEvent)) {
+                    lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
+                             _("epoll_ctl(client) failed: %s"), strerror(errno));
+                    goto cleanup;
+                }
+                close(client);
+                client = -1;
             } else {
-                lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
-                         _("error event %d"), epollEvent.events);
-                goto cleanup;
+                if (epollEvent.events & EPOLLIN) {
+                    curFdOff = epollEvent.data.fd == appPty ? 0 : 1;
+                    if (!fdArray[curFdOff].active) {
+                        fdArray[curFdOff].active = 1;
+                        ++numActive;
+                    }
+                } else if (epollEvent.events & EPOLLHUP) {
+                    DEBUG("EPOLLHUP from fd %d", epollEvent.data.fd);
+                    continue;
+                } else {
+                    lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
+                             _("error event %d"), epollEvent.events);
+                    goto cleanup;
+                }
             }
-
         } else if (0 == numEvents) {
             if (2 == numActive) {
                 /* both fds active, toggle between the two */
@@ -202,4 +285,236 @@
     return rc;
 }
 
+
+
+/**
+ * lxcControllerMoveInterfaces
+ * @nveths: number of interfaces
+ * @veths: interface names
+ * @container: pid of container
+ *
+ * Moves network interfaces into a container's namespace
+ *
+ * Returns 0 on success or -1 in case of error
+ */
+static int lxcControllerMoveInterfaces(int nveths,
+                                       char **veths,
+                                       pid_t container)
+{
+    int i;
+    for (i = 0 ; i < nveths ; i++)
+        if (moveInterfaceToNetNs(veths[i], container) < 0) {
+            lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
+                     _("failed to move interface %s to ns %d"),
+                     veths[i], container);
+            return -1;
+        }
+
+    return 0;
+}
+
+
+/**
+ * lxcControllerCleanupInterfaces:
+ * @nveths: number of interfaces
+ * @veths: interface names
+ *
+ * Cleans up the container interfaces by deleting the veth device pairs.
+ *
+ * Returns 0 on success or -1 in case of error
+ */
+static int lxcControllerCleanupInterfaces(int nveths,
+                                          char **veths)
+{
+    int i;
+    for (i = 0 ; i < nveths ; i++)
+        if (vethDelete(veths[i]) < 0)
+            lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
+                     _("failed to delete veth: %s"), veths[i]);
+            /* will continue to try to cleanup any other interfaces */
+
+    return 0;
+}
+
+
+static int
+lxcControllerRun(const char *stateDir,
+                 lxc_vm_def_t *def,
+                 int nveths,
+                 char **veths,
+                 int monitor,
+                 int client,
+                 int appPty)
+{
+    int rc = -1;
+    int control[2] = { -1, -1};
+    int containerPty = -1;
+    char *containerPtyPath = NULL;
+    pid_t container = -1;
+
+    if (socketpair(PF_UNIX, SOCK_STREAM, 0, control) < 0) {
+        lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
+                 _("sockpair failed: %s"), strerror(errno));
+        goto cleanup;
+    }
+
+    if (virFileOpenTty(&containerPty,
+                       &containerPtyPath,
+                       0) < 0) {
+        lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
+                 _("failed to allocate tty: %s"), strerror(errno));
+        goto cleanup;
+    }
+
+    if ((container = lxcContainerStart(def,
+                                       nveths,
+                                       veths,
+                                       control[1],
+                                       containerPtyPath)) < 0)
+        goto cleanup;
+    close(control[1]);
+    control[1] = -1;
+
+    if (lxcControllerMoveInterfaces(nveths, veths, container) < 0)
+        goto cleanup;
+
+    if (lxcContainerSendContinue(control[0]) < 0)
+        goto cleanup;
+
+    rc = lxcControllerMain(monitor, client, appPty, containerPty);
+
+cleanup:
+    if (control[0] != -1)
+        close(control[0]);
+    if (control[1] != -1)
+        close(control[1]);
+    VIR_FREE(containerPtyPath);
+    if (containerPty != -1)
+        close(containerPty);
+
+    if (container > 0 && kill(container, SIGTERM) == 0)
+        waitpid(container, NULL, 0);
+    lxcControllerCleanupInterfaces(nveths, veths);
+    virFileDeletePid(stateDir, def->name);
+    return rc;
+}
+
+
+int lxcControllerStart(const char *stateDir,
+                       lxc_vm_def_t *def,
+                       int nveths,
+                       char **veths,
+                       int monitor,
+                       int appPty,
+                       int logfd)
+{
+    pid_t pid;
+    int rc;
+    int status, null;
+    int open_max, i;
+    int client;
+
+    if ((pid = fork()) < 0)
+        return -1;
+
+    if (pid > 0) {
+        /* Original caller waits for first child to exit */
+        while (1) {
+            rc = waitpid(pid, &status, 0);
+            if (rc < 0) {
+                if (errno == EINTR)
+                    continue;
+                return -1;
+            }
+            if (rc != pid) {
+                fprintf(stderr,
+                        _("Unexpected pid %d != %d from waitpid\n"),
+                        rc, pid);
+                return -1;
+            }
+            if (WIFEXITED(status) &&
+                WEXITSTATUS(status) == 0)
+                return 0;
+            else {
+                fprintf(stderr,
+                        _("Unexpected status %d from pid %d\n"),
+                        status, pid);
+                return -1;
+            }
+        }
+    }
+
+    /* First child is running here */
+
+    if (chdir("/") < 0) {
+        fprintf(stderr, _("Unable to change to root dir: %s\n"),
+                strerror(errno));
+        _exit(-1);
+    }
+
+    if (setsid() < 0) {
+        fprintf(stderr, _("Unable to become session leader: %s\n"),
+                strerror(errno));
+        _exit(-1);
+    }
+
+    if ((null = open(_PATH_DEVNULL, O_RDONLY)) < 0) {
+        fprintf(stderr, _("Unable to open %s: %s\n"),
+                _PATH_DEVNULL, strerror(errno));
+        _exit(-1);
+    }
+
+    open_max = sysconf (_SC_OPEN_MAX);
+    for (i = 0; i < open_max; i++)
+        if (i != appPty &&
+            i != monitor &&
+            i != logfd &&
+            i != null)
+            close(i);
+
+    if (dup2(null, STDIN_FILENO) < 0 ||
+        dup2(logfd, STDOUT_FILENO) < 0 ||
+        dup2(logfd, STDERR_FILENO) < 0) {
+        fprintf(stderr, _("Unable to redirect stdio: %s\n"),
+                strerror(errno));
+        _exit(-1);
+    }
+
+    close(null);
+    close(logfd);
+
+    /* Now fork the real controller process */
+    if ((pid = fork()) < 0) {
+        fprintf(stderr, _("Unable to fork controller: %s\n"),
+                strerror(errno));
+        _exit(-1);
+    }
+
+    if (pid > 0) {
+        /* First child now exits */
+        _exit(0);
+    }
+
+    /* This is real controller running... */
+    if ((rc = virFileWritePid(stateDir, def->name, getpid())) != 0) {
+        fprintf(stderr, _("Unable to write pid file: %s\n"),
+                strerror(rc));
+        _exit(-1);
+    }
+
+    /* Accept initial client which is the libvirtd daemon */
+    if ((client = accept(monitor, NULL, 0)) < 0 ||
+        lxcControllerServerHandshake(client) < 0) {
+        fprintf(stderr, _("Failed connection from LXC driver: %s\n"),
+                strerror(errno));
+        _exit(-1);
+    }
+
+    /* Controlling libvirtd LXC driver now knows
+       what our PID is, and is able to cleanup after
+       us from now on */
+    _exit(lxcControllerRun(stateDir, def, nveths, veths, monitor, client, appPty));
+}
+
+
 #endif
diff -r 8093fb566748 src/lxc_controller.h
--- a/src/lxc_controller.h	Fri Aug 01 14:47:33 2008 +0100
+++ b/src/lxc_controller.h	Tue Aug 05 12:13:24 2008 +0100
@@ -26,7 +26,17 @@
 
 #ifdef WITH_LXC
 
-int lxcControllerMain(int appPty, int contPty);
+#include "lxc_conf.h"
+
+int lxcControllerClientHandshake(int monitor);
+
+int lxcControllerStart(const char *stateDir,
+                       lxc_vm_def_t *def,
+                       int nveths,
+                       char **veths,
+                       int monitor,
+                       int appPty,
+                       int logfd);
 
 #endif /* WITH_LXC */
 
diff -r 8093fb566748 src/lxc_driver.c
--- a/src/lxc_driver.c	Fri Aug 01 14:47:33 2008 +0100
+++ b/src/lxc_driver.c	Tue Aug 05 12:13:24 2008 +0100
@@ -31,22 +31,23 @@
 #include <stdbool.h>
 #include <string.h>
 #include <sys/types.h>
-#include <termios.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/poll.h>
 #include <unistd.h>
 #include <wait.h>
 
+#include "internal.h"
 #include "lxc_conf.h"
 #include "lxc_container.h"
 #include "lxc_driver.h"
 #include "lxc_controller.h"
-#include "driver.h"
-#include "internal.h"
 #include "memory.h"
 #include "util.h"
-#include "memory.h"
 #include "bridge.h"
-#include "qemu_conf.h"
 #include "veth.h"
+#include "event.h"
+
 
 /* debug macros */
 #define DEBUG(fmt,...) VIR_DEBUG(__FILE__, fmt, __VA_ARGS__)
@@ -284,8 +285,6 @@
 
     vm->configFile[0] = '\0';
 
-    lxcDeleteTtyPidFile(vm);
-
     lxcRemoveInactiveVM(driver, vm);
 
     return 0;
@@ -339,10 +338,59 @@
     return lxcGenerateXML(dom->conn, driver, vm, vm->def);
 }
 
+
+/**
+ * lxcVMCleanup:
+ * @vm: Ptr to VM to clean up
+ *
+ * waitpid() on the container process.  kill and wait the tty process
+ * This is called by both lxcDomainDestroy and lxcSigHandler when a
+ * container exits.
+ *
+ * Returns 0 on success or -1 in case of error
+ */
+static int lxcVMCleanup(virConnectPtr conn,
+                        lxc_driver_t *driver,
+                        lxc_vm_t * vm)
+{
+    int rc = -1;
+    int waitRc;
+    int childStatus = -1;
+
+    while (((waitRc = waitpid(vm->pid, &childStatus, 0)) == -1) &&
+           errno == EINTR);
+
+    if ((waitRc != vm->pid) && (errno != ECHILD)) {
+        lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
+                 _("waitpid failed to wait for container %d: %d %s"),
+                 vm->pid, waitRc, strerror(errno));
+    }
+
+    rc = 0;
+
+    if (WIFEXITED(childStatus)) {
+        rc = WEXITSTATUS(childStatus);
+        DEBUG("container exited with rc: %d", rc);
+    }
+
+    virEventRemoveHandle(vm->monitor);
+    close(vm->monitor);
+
+    virFileDeletePid(driver->stateDir, vm->def->name);
+
+    vm->state = VIR_DOMAIN_SHUTOFF;
+    vm->pid = -1;
+    vm->def->id = -1;
+    vm->monitor = -1;
+    driver->nactivevms--;
+    driver->ninactivevms++;
+
+    return rc;
+}
+
 /**
  * lxcSetupInterfaces:
- * @conn: pointer to connection
- * @vm: pointer to virtual machine structure
+ * @def: pointer to virtual machine structure
  *
  * Sets up the container interfaces by creating the veth device pairs and
  * attaching the parent end to the appropriate bridge.  The container end
@@ -351,24 +399,21 @@
  * Returns 0 on success or -1 in case of error
  */
 static int lxcSetupInterfaces(virConnectPtr conn,
-                              lxc_vm_t *vm)
+                              lxc_vm_def_t *def,
+                              int *nveths,
+                              char ***veths)
 {
     int rc = -1;
-    lxc_driver_t *driver = conn->privateData;
-    struct qemud_driver *networkDriver =
-        (struct qemud_driver *)(conn->networkPrivateData);
-    lxc_net_def_t *net = vm->def->nets;
-    char* bridge;
+    lxc_net_def_t *net;
+    char *bridge = NULL;
     char parentVeth[PATH_MAX] = "";
     char containerVeth[PATH_MAX] = "";
+    brControl *brctl = NULL;
 
-    if ((vm->def->nets != NULL) && (driver->have_netns == 0)) {
-        lxcError(conn, NULL, VIR_ERR_NO_SUPPORT,
-                 _("System lacks NETNS support"));
+    if (brInit(&brctl) != 0)
         return -1;
-    }
 
-    for (net = vm->def->nets; net; net = net->next) {
+    for (net = def->nets; net; net = net->next) {
         if (LXC_NET_NETWORK == net->type) {
             virNetworkPtr network = virNetworkLookupByName(conn, net->txName);
             if (!network) {
@@ -378,7 +423,6 @@
             bridge = virNetworkGetBridgeName(network);
 
             virNetworkFree(network);
-
         } else {
             bridge = net->txName;
         }
@@ -394,9 +438,6 @@
         if (NULL != net->parentVeth) {
             strcpy(parentVeth, net->parentVeth);
         }
-        if (NULL != net->containerVeth) {
-            strcpy(containerVeth, net->containerVeth);
-        }
         DEBUG("parentVeth: %s, containerVeth: %s", parentVeth, containerVeth);
         if (0 != (rc = vethCreate(parentVeth, PATH_MAX, containerVeth, PATH_MAX))) {
             lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
@@ -406,24 +447,18 @@
         if (NULL == net->parentVeth) {
             net->parentVeth = strdup(parentVeth);
         }
-        if (NULL == net->containerVeth) {
-            net->containerVeth = strdup(containerVeth);
-        }
+        if (VIR_REALLOC_N(*veths, (*nveths)+1) < 0)
+            goto error_exit;
+        if (((*veths)[(*nveths)++] = strdup(containerVeth)) == NULL)
+            goto error_exit;
 
-        if ((NULL == net->parentVeth) || (NULL == net->containerVeth)) {
-            lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
+        if (NULL == net->parentVeth) {
+            lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
                      _("failed to allocate veth names"));
             goto error_exit;
         }
 
-        if (!(networkDriver->brctl) && (rc = brInit(&(networkDriver->brctl)))) {
-            lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
-                     _("cannot initialize bridge support: %s"),
-                     strerror(rc));
-            goto error_exit;
-        }
-
-        if (0 != (rc = brAddInterface(networkDriver->brctl, bridge, parentVeth))) {
+        if (0 != (rc = brAddInterface(brctl, bridge, parentVeth))) {
             lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
                      _("failed to add %s device to %s: %s"),
                      parentVeth,
@@ -433,7 +468,7 @@
         }
 
         if (0 != (rc = vethInterfaceUpOrDown(parentVeth, 1))) {
-            lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
+            lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
                      _("failed to enable parent ns veth device: %d"), rc);
             goto error_exit;
         }
@@ -443,136 +478,151 @@
     rc = 0;
 
 error_exit:
+    brShutdown(brctl);
     return rc;
 }
 
-/**
- * lxcMoveInterfacesToNetNs:
- * @conn: pointer to connection
- * @vm: pointer to virtual machine structure
- *
- * Starts a container process by calling clone() with the namespace flags
- *
- * Returns 0 on success or -1 in case of error
- */
-static int lxcMoveInterfacesToNetNs(virConnectPtr conn,
-                                    const lxc_vm_t *vm)
+static int lxcMonitorServer(virConnectPtr conn,
+                            lxc_driver_t * driver,
+                            lxc_vm_t *vm)
 {
-    int rc = -1;
-    lxc_net_def_t *net;
+    char *sockpath = NULL;
+    int fd;
+    struct sockaddr_un addr;
 
-    for (net = vm->def->nets; net; net = net->next) {
-        if (0 != moveInterfaceToNetNs(net->containerVeth, vm->def->id)) {
+    if (asprintf(&sockpath, "%s/%s.sock",
+                 driver->stateDir, vm->def->name) < 0) {
+        lxcError(conn, NULL, VIR_ERR_NO_MEMORY, NULL);
+        return -1;
+    }
+
+    if ((fd = socket(PF_UNIX, SOCK_STREAM, 0)) < 0) {
+        lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
+                 _("failed to create server socket: %s"),
+                 strerror(errno));
+        goto error;
+    }
+
+    unlink(sockpath);
+    memset(&addr, 0, sizeof(addr));
+    addr.sun_family = AF_UNIX;
+    strncpy(addr.sun_path, sockpath, sizeof(addr.sun_path) - 1);
+
+    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+        lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
+                 _("failed to bind server socket: %s"),
+                 strerror(errno));
+        goto error;
+    }
+    if (listen(fd, 30 /* backlog */ ) < 0) {
+        lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
+                 _("failed to listen on server socket: %s"),
+                 strerror(errno));
+        goto error;
+        return (-1);
+    }
+
+    VIR_FREE(sockpath);
+    return fd;
+
+error:
+    VIR_FREE(sockpath);
+    if (fd != -1)
+        close(fd);
+    return -1;
+}
+
+static int lxcMonitorClient(virConnectPtr conn,
+                            lxc_driver_t * driver,
+                            lxc_vm_t *vm)
+{
+    char *sockpath = NULL;
+    int fd;
+    struct sockaddr_un addr;
+
+    if (asprintf(&sockpath, "%s/%s.sock",
+                 driver->stateDir, vm->def->name) < 0) {
+        lxcError(conn, NULL, VIR_ERR_NO_MEMORY, NULL);
+        return -1;
+    }
+
+    if ((fd = socket(PF_UNIX, SOCK_STREAM, 0)) < 0) {
+        lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
+                 _("failed to create client socket: %s"),
+                 strerror(errno));
+        goto error;
+    }
+
+    memset(&addr, 0, sizeof(addr));
+    addr.sun_family = AF_UNIX;
+    strncpy(addr.sun_path, sockpath, sizeof(addr.sun_path) - 1);
+
+    if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
+        lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
+                 _("failed to connect to monitor socket: %s"),
+                 strerror(errno));
+        goto error;
+    }
+
+    if (lxcControllerClientHandshake(fd) < 0) {
+        lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
+                 _("failed to handshake with controller: %s"),
+                 strerror(errno));
+        goto error;
+    }
+
+    VIR_FREE(sockpath);
+    return fd;
+
+error:
+    VIR_FREE(sockpath);
+    if (fd != -1)
+        close(fd);
+    return -1;
+}
+
+
+static int lxcVmTerminate(virConnectPtr conn,
+                          lxc_driver_t *driver,
+                          lxc_vm_t *vm,
+                          int signum)
+{
+    if (signum == 0)
+        signum = SIGINT;
+
+    if (kill(vm->pid, signum) < 0) {
+        if (errno != ESRCH) {
             lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
-                     _("failed to move interface %s to ns %d"),
-                     net->containerVeth, vm->def->id);
-            goto error_exit;
+                     _("failed to kill pid %d: %s"),
+                     vm->pid, strerror(errno));
+            return -1;
         }
     }
 
-    rc = 0;
+    vm->state = VIR_DOMAIN_SHUTDOWN;
 
-error_exit:
-    return rc;
+    return lxcVMCleanup(conn, driver, vm);
 }
 
-/**
- * lxcCleanupInterfaces:
- * @conn: pointer to connection
- * @vm: pointer to virtual machine structure
- *
- * Cleans up the container interfaces by deleting the veth device pairs.
- *
- * Returns 0 on success or -1 in case of error
- */
-static int lxcCleanupInterfaces(const lxc_vm_t *vm)
+static void lxcMonitorEvent(int fd,
+                            int events ATTRIBUTE_UNUSED,
+                            void *data)
 {
-    int rc = -1;
-    lxc_net_def_t *net;
+    lxc_driver_t *driver = data;
+    lxc_vm_t *vm = driver->vms;
 
-    for (net = vm->def->nets; net; net = net->next) {
-        if (0 != (rc = vethDelete(net->parentVeth))) {
-            lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
-                     _("failed to delete veth: %s"), net->parentVeth);
-            /* will continue to try to cleanup any other interfaces */
-        }
+    while (vm) {
+        if (vm->monitor == fd)
+            break;
+        vm = vm->next;
+    }
+    if (!vm) {
+        virEventRemoveHandle(fd);
+        return;
     }
 
-    return 0;
-}
-
-
-/**
- * lxcOpenTty:
- * @conn: pointer to connection
- * @ttymaster: pointer to int.  On success, set to fd for master end
- * @ttyName: On success, will point to string slave end of tty.  Caller
- * must free when done (such as in lxcFreeVM).
- *
- * Opens and configures container tty.
- *
- * Returns 0 on success or -1 in case of error
- */
-static int lxcOpenTty(virConnectPtr conn,
-                      int *ttymaster,
-                      char **ttyName,
-                      int rawmode)
-{
-    int rc = -1;
-
-    *ttymaster = posix_openpt(O_RDWR|O_NOCTTY|O_NONBLOCK);
-    if (*ttymaster < 0) {
-        lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
-                 _("posix_openpt failed: %s"), strerror(errno));
-        goto cleanup;
-    }
-
-    if (unlockpt(*ttymaster) < 0) {
-        lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
-                 _("unlockpt failed: %s"), strerror(errno));
-        goto cleanup;
-    }
-
-    if (rawmode) {
-        struct termios ttyAttr;
-        if (tcgetattr(*ttymaster, &ttyAttr) < 0) {
-            lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
-                     "tcgetattr() failed: %s", strerror(errno));
-            goto cleanup;
-        }
-
-        cfmakeraw(&ttyAttr);
-
-        if (tcsetattr(*ttymaster, TCSADRAIN, &ttyAttr) < 0) {
-            lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
-                     "tcsetattr failed: %s", strerror(errno));
-            goto cleanup;
-        }
-    }
-
-    if (ttyName) {
-        char tempTtyName[PATH_MAX];
-        if (0 != ptsname_r(*ttymaster, tempTtyName, sizeof(tempTtyName))) {
-            lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
-                     _("ptsname_r failed: %s"), strerror(errno));
-            goto cleanup;
-        }
-
-        if ((*ttyName = strdup(tempTtyName)) == NULL) {
-            lxcError(conn, NULL, VIR_ERR_NO_MEMORY, NULL);
-            goto cleanup;
-        }
-    }
-
-    rc = 0;
-
-cleanup:
-    if (rc != 0 &&
-        *ttymaster != -1) {
-        close(*ttymaster);
-    }
-
-    return rc;
+    if (lxcVmTerminate(NULL, driver, vm, SIGINT) < 0)
+        virEventRemoveHandle(fd);
 }
 
 
@@ -590,81 +640,106 @@
                       lxc_driver_t * driver,
                       lxc_vm_t * vm)
 {
-    int rc = -1;
-    int sockpair[2];
-    int containerTty, parentTty;
-    char *containerTtyPath = NULL;
+    int rc = -1, i;
+    int monitor;
+    int parentTty = -1;
+    char *logfile = NULL;
+    int logfd = -1;
+    int nveths = 0;
+    char **veths = NULL;
+
+    if (virFileMakePath(driver->logDir) < 0) {
+        lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
+                 _("cannot create log directory %s: %s"),
+                 driver->logDir, strerror(rc));
+        return -1;
+    }
+
+    if (asprintf(&logfile, "%s/%s.log",
+                 driver->logDir, vm->def->name) < 0) {
+        lxcError(conn, NULL, VIR_ERR_NO_MEMORY, NULL);
+        return -1;
+    }
+
+    if ((monitor = lxcMonitorServer(conn, driver, vm)) < 0)
+        goto cleanup;
 
     /* open parent tty */
     VIR_FREE(vm->def->tty);
-    if (lxcOpenTty(conn, &parentTty, &vm->def->tty, 1) < 0) {
-        goto cleanup;
-    }
-
-    /* open container tty */
-    if (lxcOpenTty(conn, &containerTty, &containerTtyPath, 0) < 0) {
-        goto cleanup;
-    }
-
-    /* fork process to handle the tty io forwarding */
-    if ((vm->pid = fork()) < 0) {
+    if (virFileOpenTty(&parentTty, &vm->def->tty, 1) < 0) {
         lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
-                 _("unable to fork tty forwarding process: %s"),
+                 _("failed to allocate tty: %s"),
                  strerror(errno));
         goto cleanup;
     }
 
-    if (vm->pid  == 0) {
-        /* child process calls forward routine */
-        lxcControllerMain(parentTty, containerTty);
-    }
+    if (lxcSetupInterfaces(conn, vm->def, &nveths, &veths) != 0)
+        goto cleanup;
 
-    if (lxcStoreTtyPid(driver, vm)) {
-        DEBUG0("unable to store tty pid");
-    }
-
-    close(parentTty);
-    close(containerTty);
-
-    if (0 != (rc = lxcSetupInterfaces(conn, vm))) {
+    if ((logfd = open(logfile, O_WRONLY | O_TRUNC | O_CREAT,
+             S_IRUSR|S_IWUSR)) < 0) {
+        lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
+                 _("failed to open %s: %s"), logfile,
+                 strerror(errno));
         goto cleanup;
     }
 
-    /* create a socket pair to send continue message to the container once */
-    /* we've completed the post clone configuration */
-    if (0 != socketpair(PF_UNIX, SOCK_STREAM, 0, sockpair)) {
+    if (lxcControllerStart(driver->stateDir,
+                           vm->def, nveths, veths,
+                           monitor, parentTty, logfd) < 0)
+        goto cleanup;
+    /* Close the server side of the monitor, now owned
+     * by the controller process */
+    close(monitor);
+    monitor = -1;
+
+    /* Connect to the controller as a client *first* because
+     * this will block until the child has written their
+     * pid file out to disk */
+    if ((vm->monitor = lxcMonitorClient(conn, driver, vm)) < 0)
+        goto cleanup;
+
+    /* And get its pid */
+    if ((rc = virFileReadPid(driver->stateDir, vm->def->name, &vm->pid)) != 0) {
         lxcError(conn, NULL, VIR_ERR_INTERNAL_ERROR,
-                 _("sockpair failed: %s"), strerror(errno));
+                 _("Failed to read pid file %s/%s.pid: %s"),
+                 driver->stateDir, vm->def->name, strerror(rc));
+        rc = -1;
         goto cleanup;
     }
 
-    /* check this rc */
-
-    vm->def->id = lxcContainerStart(conn,
-                                    vm->def,
-                                    sockpair[1],
-                                    containerTtyPath);
-    if (vm->def->id == -1)
-        goto cleanup;
-    lxcSaveConfig(conn, driver, vm, vm->def);
-
-    rc = lxcMoveInterfacesToNetNs(conn, vm);
-    if (rc != 0)
-        goto cleanup;
-
-    rc = lxcContainerSendContinue(conn, sockpair[0]);
-    if (rc != 0)
-        goto cleanup;
-
+    vm->def->id = vm->pid;
     vm->state = VIR_DOMAIN_RUNNING;
     driver->ninactivevms--;
     driver->nactivevms++;
 
+    if (virEventAddHandle(vm->monitor,
+                          POLLERR | POLLHUP,
+                          lxcMonitorEvent,
+                          driver) < 0) {
+        lxcVmTerminate(conn, driver, vm, 0);
+        goto cleanup;
+    }
+
+    rc = 0;
+
 cleanup:
-    close(sockpair[0]);
-    close(sockpair[1]);
-    VIR_FREE(containerTtyPath);
-
+    for (i = 0 ; i < nveths ; i++) {
+        if (rc != 0)
+            vethDelete(veths[i]);
+        VIR_FREE(veths[i]);
+    }
+    if (monitor != -1)
+        close(monitor);
+    if (rc != 0 && vm->monitor != -1) {
+        close(vm->monitor);
+        vm->monitor = -1;
+    }
+    if (parentTty != -1)
+        close(parentTty);
+    if (logfd != -1)
+        close(logfd);
+    VIR_FREE(logfile);
     return rc;
 }
 
@@ -752,105 +827,18 @@
  */
 static int lxcDomainShutdown(virDomainPtr dom)
 {
-    int rc = -1;
     lxc_driver_t *driver = (lxc_driver_t*)dom->conn->privateData;
     lxc_vm_t *vm = lxcFindVMByID(driver, dom->id);
 
     if (!vm) {
         lxcError(dom->conn, dom, VIR_ERR_INVALID_DOMAIN,
                  _("no domain with id %d"), dom->id);
-        goto error_out;
+        return -1;
     }
 
-    if (0 > (kill(vm->def->id, SIGINT))) {
-        if (ESRCH != errno) {
-            lxcError(dom->conn, dom, VIR_ERR_INTERNAL_ERROR,
-                     _("sending SIGTERM failed: %s"), strerror(errno));
-
-            goto error_out;
-        }
-    }
-
-    vm->state = VIR_DOMAIN_SHUTDOWN;
-
-    rc = 0;
-
-error_out:
-    return rc;
+    return lxcVmTerminate(dom->conn, driver, vm, 0);
 }
 
-/**
- * lxcVmCleanup:
- * @vm: Ptr to VM to clean up
- *
- * waitpid() on the container process.  kill and wait the tty process
- * This is called by boh lxcDomainDestroy and lxcSigHandler when a
- * container exits.
- *
- * Returns 0 on success or -1 in case of error
- */
-static int lxcVMCleanup(lxc_driver_t *driver, lxc_vm_t * vm)
-{
-    int rc = -1;
-    int waitRc;
-    int childStatus = -1;
-
-    /* if this fails, we'll continue.  it will report any errors */
-    lxcCleanupInterfaces(vm);
-
-    while (((waitRc = waitpid(vm->def->id, &childStatus, 0)) == -1) &&
-           errno == EINTR);
-
-    if ((waitRc != vm->def->id) && (errno != ECHILD)) {
-        lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
-                 _("waitpid failed to wait for container %d: %d %s"),
-                 vm->def->id, waitRc, strerror(errno));
-        goto kill_tty;
-    }
-
-    rc = 0;
-
-    if (WIFEXITED(childStatus)) {
-        rc = WEXITSTATUS(childStatus);
-        DEBUG("container exited with rc: %d", rc);
-    }
-
-kill_tty:
-    if (2 > vm->pid) {
-        DEBUG("not killing tty process with pid %d", vm->pid);
-        goto tty_error_out;
-    }
-
-    if (0 > (kill(vm->pid, SIGKILL))) {
-        if (ESRCH != errno) {
-            lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
-                     _("sending SIGKILL to tty process failed: %s"),
-                     strerror(errno));
-
-            goto tty_error_out;
-        }
-    }
-
-    while (((waitRc = waitpid(vm->pid, &childStatus, 0)) == -1) &&
-           errno == EINTR);
-
-    if ((waitRc != vm->pid) && (errno != ECHILD)) {
-        lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR,
-                 _("waitpid failed to wait for tty %d: %d %s"),
-                 vm->pid, waitRc, strerror(errno));
-    }
-
-tty_error_out:
-    vm->state = VIR_DOMAIN_SHUTOFF;
-    vm->pid = -1;
-    lxcDeleteTtyPidFile(vm);
-    vm->def->id = -1;
-    driver->nactivevms--;
-    driver->ninactivevms++;
-    lxcSaveConfig(NULL, driver, vm, vm->def);
-
-    return rc;
- }
 
 /**
  * lxcDomainDestroy:
@@ -862,31 +850,16 @@
  */
 static int lxcDomainDestroy(virDomainPtr dom)
 {
-    int rc = -1;
     lxc_driver_t *driver = (lxc_driver_t*)dom->conn->privateData;
     lxc_vm_t *vm = lxcFindVMByID(driver, dom->id);
 
     if (!vm) {
         lxcError(dom->conn, dom, VIR_ERR_INVALID_DOMAIN,
                  _("no domain with id %d"), dom->id);
-        goto error_out;
+        return -1;
     }
 
-    if (0 > (kill(vm->def->id, SIGKILL))) {
-        if (ESRCH != errno) {
-            lxcError(dom->conn, dom, VIR_ERR_INTERNAL_ERROR,
-                     _("sending SIGKILL failed: %s"), strerror(errno));
-
-            goto error_out;
-        }
-    }
-
-    vm->state = VIR_DOMAIN_SHUTDOWN;
-
-    rc = lxcVMCleanup(driver, vm);
-
-error_out:
-    return rc;
+    return lxcVmTerminate(dom->conn, driver, vm, SIGKILL);
 }
 
 static int lxcCheckNetNsSupport(void)
@@ -907,6 +880,7 @@
 static int lxcStartup(void)
 {
     uid_t uid = getuid();
+    lxc_vm_t *vm;
 
     /* Check that the user is root */
     if (0 != uid) {
@@ -935,6 +909,36 @@
         return -1;
     }
 
+    vm = lxc_driver->vms;
+    while (vm) {
+        int rc;
+        if ((vm->monitor = lxcMonitorClient(NULL, lxc_driver, vm)) < 0) {
+            vm = vm->next;
+            continue;
+        }
+
+        /* Read pid from controller */
+        if ((rc = virFileReadPid(lxc_driver->stateDir, vm->def->name, &vm->pid)) != 0) {
+            close(vm->monitor);
+            vm->monitor = -1;
+            vm = vm->next;
+            continue;
+        }
+
+        if (vm->pid != 0) {
+            vm->def->id = vm->pid;
+            vm->state = VIR_DOMAIN_RUNNING;
+            lxc_driver->ninactivevms--;
+            lxc_driver->nactivevms++;
+        } else {
+            vm->def->id = -1;
+            close(vm->monitor);
+            vm->monitor = -1;
+        }
+
+        vm = vm->next;
+    }
+
     return 0;
 }
 
@@ -942,6 +946,7 @@
 {
     VIR_FREE(driver->configDir);
     VIR_FREE(driver->stateDir);
+    VIR_FREE(driver->logDir);
     VIR_FREE(driver);
 }
 
@@ -976,37 +981,6 @@
 
     /* Otherwise we're happy to deal with a shutdown */
     return 0;
-}
-
-/**
- * lxcSigHandler:
- * @siginfo: Pointer to siginfo_t structure
- *
- * Handles signals received by libvirtd.  Currently this is used to
- * catch SIGCHLD from an exiting container.
- *
- * Returns 0 on success or -1 in case of error
- */
-static int lxcSigHandler(siginfo_t *siginfo)
-{
-    int rc = -1;
-    lxc_vm_t *vm;
-
-    if (siginfo->si_signo == SIGCHLD) {
-        vm = lxcFindVMByID(lxc_driver, siginfo->si_pid);
-
-        if (NULL == vm) {
-            DEBUG("Ignoring SIGCHLD from non-container process %d\n",
-                  siginfo->si_pid);
-            goto cleanup;
-        }
-
-        rc = lxcVMCleanup(lxc_driver, vm);
-
-    }
-
-cleanup:
-    return rc;
 }
 
 
@@ -1079,7 +1053,7 @@
     lxcShutdown,
     NULL, /* reload */
     lxcActive,
-    lxcSigHandler
+    NULL,
 };
 
 int lxcRegister(void)
diff -r 8093fb566748 src/util.c
--- a/src/util.c	Fri Aug 01 14:47:33 2008 +0100
+++ b/src/util.c	Tue Aug 05 12:13:24 2008 +0100
@@ -37,6 +37,7 @@
 #include <sys/wait.h>
 #endif
 #include <string.h>
+#include <termios.h>
 #include "c-ctype.h"
 
 #ifdef HAVE_PATHS_H
@@ -556,6 +557,163 @@
     return 0;
 }
 
+
+int virFileOpenTty(int *ttymaster,
+                   char **ttyName,
+                   int rawmode)
+{
+    int rc = -1;
+
+    if ((*ttymaster = posix_openpt(O_RDWR|O_NOCTTY|O_NONBLOCK)) < 0)
+        goto cleanup;
+
+    if (unlockpt(*ttymaster) < 0)
+        goto cleanup;
+
+    if (grantpt(*ttymaster) < 0)
+        goto cleanup;
+
+    if (rawmode) {
+        struct termios ttyAttr;
+        if (tcgetattr(*ttymaster, &ttyAttr) < 0)
+            goto cleanup;
+
+        cfmakeraw(&ttyAttr);
+
+        if (tcsetattr(*ttymaster, TCSADRAIN, &ttyAttr) < 0)
+            goto cleanup;
+    }
+
+    if (ttyName) {
+        char tempTtyName[PATH_MAX];
+        if (ptsname_r(*ttymaster, tempTtyName, sizeof(tempTtyName)) != 0)
+            goto cleanup;
+
+        if ((*ttyName = strdup(tempTtyName)) == NULL) {
+            errno = ENOMEM;
+            goto cleanup;
+        }
+    }
+
+    rc = 0;
+
+cleanup:
+    if (rc != 0 &&
+        *ttymaster != -1) {
+        close(*ttymaster);
+    }
+
+    return rc;
+
+}
+
+
+int virFileWritePid(const char *dir,
+                    const char *name,
+                    pid_t pid)
+{
+    int rc;
+    int fd;
+    FILE *file = NULL;
+    char *pidfile = NULL;
+
+    if ((rc = virFileMakePath(dir))) {
+        goto cleanup;
+    }
+
+    if (asprintf(&pidfile, "%s/%s.pid", dir, name) < 0) {
+        rc = ENOMEM;
+        goto cleanup;
+    }
+
+    if ((fd = open(pidfile,
+                   O_WRONLY | O_CREAT | O_TRUNC,
+                   S_IRUSR | S_IWUSR)) < 0) {
+        rc = errno;
+        goto cleanup;
+    }
+
+    if (!(file = fdopen(fd, "w"))) {
+        rc = errno;
+        close(fd);
+        goto cleanup;
+    }
+
+    if (fprintf(file, "%d", pid) < 0) {
+        rc = errno;
+        goto cleanup;
+    }
+
+    rc = 0;
+
+cleanup:
+    if (file &&
+        fclose(file) < 0) {
+        rc = errno;
+    }
+
+    VIR_FREE(pidfile);
+    return rc;
+}
+
+int virFileReadPid(const char *dir,
+                   const char *name,
+                   pid_t *pid)
+{
+    int rc;
+    FILE *file;
+    char *pidfile = NULL;
+    *pid = 0;
+    if (asprintf(&pidfile, "%s/%s.pid", dir, name) < 0) {
+        rc = ENOMEM;
+        goto cleanup;
+    }
+
+    if (!(file = fopen(pidfile, "r"))) {
+        rc = errno;
+        goto cleanup;
+    }
+
+    if (fscanf(file, "%d", pid) != 1) {
+        rc = EINVAL;
+        fclose(file);
+        goto cleanup;
+    }
+    if (fclose(file) < 0) {
+        rc = errno;
+        goto cleanup;
+    }
+
+    rc = 0;
+
+ cleanup:
+    VIR_FREE(pidfile);
+    return rc;
+}
+
+int virFileDeletePid(const char *dir,
+                     const char *name)
+{
+    int rc = 0;
+    char *pidfile = NULL;
+
+    if (asprintf(&pidfile, "%s/%s.pid", dir, name) < 0) {
+        rc = ENOMEM;
+        goto cleanup;
+    }
+
+    if (unlink(pidfile) < 0 &&
+        errno != ENOENT) {
+        rc = errno;
+    }
+
+cleanup:
+    VIR_FREE(pidfile);
+    return rc;
+}
+
+
+
 /* Like strtol, but produce an "int" result, and check more carefully.
    Return 0 upon success;  return -1 to indicate failure.
    When END_PTR is NULL, the byte after the final valid digit must be NUL.
diff -r 8093fb566748 src/util.h
--- a/src/util.h	Fri Aug 01 14:47:33 2008 +0100
+++ b/src/util.h	Tue Aug 05 12:13:24 2008 +0100
@@ -58,6 +58,19 @@
                      char *buf,
                      unsigned int buflen);
 
+int virFileOpenTty(int *ttymaster,
+                   char **ttyName,
+                   int rawmode);
+
+
+int virFileWritePid(const char *dir,
+                    const char *name,
+                    pid_t pid);
+int virFileReadPid(const char *dir,
+                   const char *name,
+                   pid_t *pid);
+int virFileDeletePid(const char *dir,
+                     const char *name);
 
 int __virStrToLong_i(char const *s,
                      char **end_ptr,


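The driver no longer relies on SIGCHLD; it registers the monitor socket for POLLERR | POLLHUP and treats either as "controller gone". A rough standalone sketch of that detection follows. It assumes a socketpair() stands in for the named /var/run/libvirt/lxc/NAME.sock connection, and sketch_monitor_pair and sketch_peer_gone are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for the driver <-> controller monitor connection. */
static int sketch_monitor_pair(int sv[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
}

/* Returns 1 if the peer has closed or errored, 0 if the connection is
 * still up, -1 if poll() itself failed. Never blocks. */
static int sketch_peer_gone(int fd)
{
    struct pollfd pfd;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = 0;      /* POLLERR and POLLHUP are reported regardless */
    pfd.revents = 0;

    if (poll(&pfd, 1, 0) < 0)
        return -1;
    return (pfd.revents & (POLLERR | POLLHUP)) ? 1 : 0;
}
```

Because the kernel closes the controller's end of the socket when that process exits, this works even after libvirtd has been restarted and the controller re-parented to init, which is the case SIGCHLD could not cover.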
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
