[libvirt] [PATCH v3 8/9] rpc: pass listen FD to the daemon being started
Daniel P. Berrange
berrange at redhat.com
Thu Aug 14 11:09:34 UTC 2014
On Thu, Aug 14, 2014 at 01:02:28PM +0200, Martin Kletzander wrote:
> On Thu, Aug 14, 2014 at 10:26:41AM +0100, Daniel P. Berrange wrote:
> >On Thu, Aug 14, 2014 at 11:23:27AM +0200, Martin Kletzander wrote:
> >>On Wed, Aug 13, 2014 at 04:09:09PM +0100, Daniel P. Berrange wrote:
> >>>On Wed, Jul 23, 2014 at 04:27:12PM +0200, Martin Kletzander wrote:
> >>>>This eliminates the need for active waiting.
> >>>>
> >>>>Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=927369
> >>>>
> >>>>Signed-off-by: Martin Kletzander <mkletzan at redhat.com>
> >>>>---
> >>>> src/rpc/virnetsocket.c | 102 ++++++++++++++++++++++++++++++++++++++++---------
> >>>> 1 file changed, 83 insertions(+), 19 deletions(-)
> >>>>
> >>>>diff --git a/src/rpc/virnetsocket.c b/src/rpc/virnetsocket.c
> [...]
> >>>>@@ -569,28 +574,82 @@ int virNetSocketNewConnectUNIX(const char *path,
> >>>> if (remoteAddr.data.un.sun_path[0] == '@')
> >>>> remoteAddr.data.un.sun_path[0] = '\0';
> >>>>
> >>>>- retry:
> >>>>- if (connect(fd, &remoteAddr.data.sa, remoteAddr.len) < 0) {
> >>>>- if ((errno == ECONNREFUSED ||
> >>>>- errno == ENOENT) &&
> >>>>- spawnDaemon && retries < 20) {
> >>>>- VIR_DEBUG("Connection refused for %s, trying to spawn %s",
> >>>>- path, binary);
> >>>>- if (retries == 0 &&
> >>>>- virNetSocketForkDaemon(binary) < 0)
> >>>>- goto error;
> >>>>+ if (spawnDaemon) {
> >>>>+ int err = 0;
> >>>>+ int rv = -1;
> >>>>+ int status = 0;
> >>>>+ pid_t pid = 0;
> >>>>
> >>>>- retries++;
> >>>>- usleep(1000 * 100 * retries);
> >>>>- goto retry;
> >>>>+ if ((passfd = socket(PF_UNIX, SOCK_STREAM, 0)) < 0) {
> >>>>+ virReportSystemError(errno, "%s", _("Failed to create socket"));
> >>>>+ goto error;
> >>>> }
> >>>>
> >>>>- virReportSystemError(errno,
> >>>>- _("Failed to connect socket to '%s'"),
> >>>>+ if (pipe2(errfd, O_CLOEXEC) < 0) {
> >>>>+ virReportSystemError(errno, "%s",
> >>>>+ _("Cannot create pipe for child"));
> >>>>+ goto error;
> >>>>+ }
> >>>>+
> >>>>+ /*
> >>>>+ * We have to fork() here, because umask() is set
> >>>>+ * per-process, chmod() is racy and fchmod() has undefined
> >>>>+ * behaviour on sockets according to POSIX, so it doesn't
> >>>>+ * work outside Linux.
> >>>>+ */
> >>>>+
> >>>>+ if ((pid = virFork()) < 0)
> >>>>+ goto error;
> >>>>+
> >>>>+ if (pid == 0) {
> >>>>+ VIR_FORCE_CLOSE(errfd[0]);
> >>>>+
> >>>>+ umask(0077);
> >>>>+ rv = bind(passfd, &remoteAddr.data.sa, remoteAddr.len);
> >>>>+ if (rv < 0) {
> >>>>+ ignore_value(safewrite(errfd[1], &errno, sizeof(int)));
> >>>>+ }
> >>>>+ VIR_FORCE_CLOSE(errfd[1]);
> >>>>+ _exit(rv < 0 ? EXIT_FAILURE : EXIT_SUCCESS);
> >>>>+ }
> >>>>+
> >>>>+ VIR_FORCE_CLOSE(errfd[1]);
> >>>>+ rv = virProcessWait(pid, &status, false);
> >>>>+ ignore_value(saferead(errfd[0], &err, sizeof(int)));
> >>>>+ VIR_FORCE_CLOSE(errfd[0]);
> >>>>+
> >>>>+ if (rv < 0 || status != EXIT_SUCCESS) {
> >>>>+ if (err) {
> >>>>+ virReportSystemError(err,
> >>>>+ _("Failed to bind socket to %s "
> >>>>+ "in child process"),
> >>>>+ path);
> >>>>+ } else {
> >>>>+ virReportError(VIR_ERR_INTERNAL_ERROR,
> >>>>+ _("Failed to bind socket to %s "
> >>>>+ "in child process"),
> >>>>+ path);
> >>>>+ }
> >>>>+ goto error;
> >>>>+ }
> >>>>+
> >>>>+ if (listen(passfd, 0) < 0) {
> >>>>+ virReportSystemError(errno, "%s",
> >>>>+ _("Failed to listen on socket that's about "
> >>>>+ "to be passed to the daemon"));
> >>>>+ goto error;
> >>>>+ }
> >>>
> >>>Perhaps I'm miss-reading the code, but isn't this block gonig to
> >>>result in failure if libvirtd is already running (& listening on
> >>>the socket we try to bind to) and we requested auto-spawn ?
> >>>
> >>>>+ }
> >>>>+
> >>>>+ if (connect(fd, &remoteAddr.data.sa, remoteAddr.len) < 0) {
> >>>>+ virReportSystemError(errno, _("Failed to connect socket to '%s'"),
> >>>> path);
> >>>> goto error;
> >>>> }
> >>>
> >>>Also I think there's still a race here because the already running
> >>>libvirtd daemon could be unfortunately shutting down right at the
> >>>same time we try to connect. So I think we still need to have a
> >>>re-try loop around the connect attempt to address that race. Obviously
> >>>it should be pretty damn rare now so won't have the problems that the
> >>>current loop has.
> >>>
> >>
> >>Wouldn't connect(), bind(), connect() be enough (for both issues)? Or
> >>do we need to try the dance few times again?
> >
> >We need to retry the whole block including daemon spawn to be able to
> >handle it if the existing daemon is in the process of shutting down
> >I believe.
> >
>
> I probably expressed myself badly with that connect, bind, connect.
> This was the difference I was talking about:
>
> diff --git i/src/rpc/virnetsocket.c w/src/rpc/virnetsocket.c
> index 46be541..a4d5dd5 100644
> --- i/src/rpc/virnetsocket.c
> +++ w/src/rpc/virnetsocket.c
> @@ -574,7 +574,12 @@ int virNetSocketNewConnectUNIX(const char *path,
> if (remoteAddr.data.un.sun_path[0] == '@')
> remoteAddr.data.un.sun_path[0] = '\0';
>
> - if (spawnDaemon) {
> + if (connect(fd, &remoteAddr.data.sa, remoteAddr.len) < 0 &&
> + !spawnDaemon) {
> + virReportSystemError(errno, _("Failed to connect socket to '%s'"),
> + path);
> + goto error;
> + } else if (spawnDaemon) {
> int err = 0;
> int rv = -1;
> int status = 0;
> @@ -639,16 +644,16 @@ int virNetSocketNewConnectUNIX(const char *path,
> "to be passed to the daemon"));
> goto error;
> }
> - }
>
> - if (connect(fd, &remoteAddr.data.sa, remoteAddr.len) < 0) {
> - virReportSystemError(errno, _("Failed to connect socket to '%s'"),
> - path);
> - goto error;
> - }
> + if (connect(fd, &remoteAddr.data.sa, remoteAddr.len) < 0) {
> + virReportSystemError(errno, _("Failed to connect socket to '%s'"),
> + path);
> + goto error;
> + }
>
> - if (spawnDaemon && virNetSocketForkDaemon(binary, passfd) < 0)
> - goto error;
> + if (virNetSocketForkDaemon(binary, passfd) < 0)
> + goto error;
> + }
>
> localAddr.len = sizeof(localAddr.data);
> if (getsockname(fd, &localAddr.data.sa, &localAddr.len) < 0) {
It is a bit hard to follow the diff-upon-diff :-) IIUC the logic is
1. Try to connect()
...fails because no daemon is running...
2. fd = socket()
3. bind(fd)
4. connect(fd)
5. spawn daemon with fd
What I'm unclear on is what happens if 2 separate processes are runing
steps 2/3/4 in parallel. IIUC with the changes here one process is
going to succeeed with bind() and the other process will get a failure.
That other process getting a failure should go back to step 1 and try
to connect to the socket that the first process successsfully bound
to. I don't think your code is handling that scenario, which I think
requires at least a single retry loop.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
More information about the libvir-list
mailing list