[libvirt] [PATCH 05/14] Rewrite pidfile handling to be crash safe

Daniel P. Berrange berrange at redhat.com
Mon Jul 11 09:30:21 UTC 2011


On Fri, Jul 08, 2011 at 04:33:28PM -0600, Eric Blake wrote:
> On 07/07/2011 08:17 AM, Daniel P. Berrange wrote:
> > From: "Daniel P. Berrange" <berrange at redhat.com>
> > 
> > If libvirtd crashes then the pidfile may not be cleaned up,
> > making a later restart fail, even though the original process
> > no longer exists.
> > 
> > Instead of simply using file creation as the test for successful
> > pidfile acquisition, actually take out a lock on the pidfile.
> > 
> > To avoid races, after locking the pidfile, verify that the
> > inode hasn't changed.
> > 
> > Also make sure the unprivileged libvirtd now acquires the
> > pidfile, instead of relying on the UNIX socket to ensure
> > mutual exclusion
> 
> Cool idea.
> 
> > -static int daemonWritePidFile(const char *pidFile, const char *argv0)
> > -{
> > +static int
> > +daemonAcquirePidFile(const char *argv0, const char *pidFile) {
> 
> Why'd you hoist the { to the previous line?  Our convention has been
> (slowly) leaning towards a function body starting on column 1.
> 
> >      int fd;
> > -    FILE *fh;
> >      char ebuf[1024];
> > +    unsigned long long pid = (unsigned long long)getpid();
> 
> See my comments in an earlier thread about our existing assumptions that
> pid_t fits in int.  In fact, I would be okay with:
> 
> verify(sizeof(pid_t) <= sizeof(int));
> 
> int pid = getpid();
> char pidstr[INT_BUFSIZE_BOUND(pid)];
> 
> ...
> snprintf(pidstr, sizeof(pidstr), "%u", pid);
> 
> 
> > +    char pidstr[INT_BUFSIZE_BOUND(pid)];
> >  
> >      if (pidFile[0] == '\0')
> >          return 0;
> >  
> > -    if ((fd = open(pidFile, O_WRONLY|O_CREAT|O_EXCL, 0644)) < 0) {
> > -        VIR_ERROR(_("Failed to open pid file '%s' : %s"),
> > -                  pidFile, virStrerror(errno, ebuf, sizeof ebuf));
> > -        return -1;
> > -    }
> > +    while (1) {
> > +        struct stat a, b;
> > +        if ((fd = open(pidFile, O_WRONLY|O_CREAT, 0644)) < 0) {
> > +            VIR_ERROR(_("%s: Failed to open pid file '%s' : %s"),
> > +                      argv0, pidFile, virStrerror(errno, ebuf, sizeof ebuf));
> > +            return -1;
> > +        }
> > +
> > +        if (fstat(fd, &b) < 0) {
> > +            VIR_ERROR(_("%s: Pid file '%s' disappeared: %s"),
> 
> Misleading error message.  fstat can indeed fail (although such failures
> are rare), but not because a file disappeared - after all, you have an
> fd open to the file.
> 
> > -    if (fprintf(fh, "%lu\n", (unsigned long)getpid()) < 0) {
> > -        VIR_ERROR(_("%s: Failed to write to pid file '%s' : %s"),
> > -                  argv0, pidFile, virStrerror(errno, ebuf, sizeof ebuf));
> > -        VIR_FORCE_FCLOSE(fh);
> > -        return -1;
> > -    }
> > +    snprintf(pidstr, sizeof(pidstr), "%llu", pid);
> >  
> > -    if (VIR_FCLOSE(fh) == EOF) {
> > -        VIR_ERROR(_("%s: Failed to close pid file '%s' : %s"),
> > -                  argv0, pidFile, virStrerror(errno, ebuf, sizeof ebuf));
> > -        return -1;
> > +    if (safewrite(fd, pidstr, strlen(pidstr)) < 0) {
> 
> Nice conversion from FILE* to write().
> 
> > @@ -1436,6 +1453,12 @@ int main(int argc, char **argv) {
> >          umask(old_umask);
> >      }
> >  
> > +    /* Try to claim the pidfile, existing if we can't */
> 
> s/existing/exiting/
> 
> > +    if ((pid_file_fd = daemonAcquirePidFile(argv[0], pid_file)) < 0) {
> > +        ret = VIR_DAEMON_ERR_PIDFILE;
> > +        goto cleanup;
> > +    }
> > +
> >      if (!(srv = virNetServerNew(config->min_workers,
> >                                  config->max_workers,
> >                                  config->max_clients,
> > @@ -1570,8 +1593,10 @@ cleanup:
> >          }
> >          VIR_FORCE_CLOSE(statuswrite);
> >      }
> > -    if (pid_file)
> > -        unlink (pid_file);
> > +    if (pid_file_fd != -1) {
> > +        unlink(pid_file);
> > +        VIR_FORCE_CLOSE(pid_file_fd);
> 
> Swap these two lines.  Not that flock (or even libvirtd) works on mingw,
> but mingw doesn't like unlink() on an open fd.

If you swap those two lines you open up a (small) race condition
in the locking scheme where you could unlink a pidfile that has now
been claimed by someone else. By unlinking while we still have it
open, we know we still hold the fcntl() lock.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|




More information about the libvir-list mailing list