[libvirt] [PATCH 05/14] Rewrite pidfile handling to be crash safe
Eric Blake
eblake at redhat.com
Fri Jul 8 22:33:28 UTC 2011
On 07/07/2011 08:17 AM, Daniel P. Berrange wrote:
> From: "Daniel P. Berrange" <berrange at redhat.com>
>
> If libvirtd crashes then the pidfile may not be cleaned up,
> making a later restart fail, even though the original process
> no longer exists.
>
> Instead of simply using file creation as the test for successful
> pidfile acquisition, actually take out a lock on the pidfile.
>
> To avoid races, after locking the pidfile, verify that the
> inode hasn't changed.
>
> Also make sure the unprivileged libvirtd now acquires the
> pidfile, instead of relying on the UNIX socket to ensure
> mutual exclusion
Cool idea.
> -static int daemonWritePidFile(const char *pidFile, const char *argv0)
> -{
> +static int
> +daemonAcquirePidFile(const char *argv0, const char *pidFile) {
Why'd you hoist the { to the previous line? Our convention has been
(slowly) leaning towards a function body starting on column 1.
> int fd;
> - FILE *fh;
> char ebuf[1024];
> + unsigned long long pid = (unsigned long long)getpid();
See my comments in an earlier thread about our existing assumptions that
pid_t fits in int. In fact, I would be okay with:
verify(sizeof(pid_t) <= sizeof(int));
int pid = getpid();
char pidstr[INT_BUFSIZE_BOUND(pid)];
...
snprintf(pidstr, sizeof(pidstr), "%u", pid);
> + char pidstr[INT_BUFSIZE_BOUND(pid)];
>
> if (pidFile[0] == '\0')
> return 0;
>
> - if ((fd = open(pidFile, O_WRONLY|O_CREAT|O_EXCL, 0644)) < 0) {
> - VIR_ERROR(_("Failed to open pid file '%s' : %s"),
> - pidFile, virStrerror(errno, ebuf, sizeof ebuf));
> - return -1;
> - }
> + while (1) {
> + struct stat a, b;
> + if ((fd = open(pidFile, O_WRONLY|O_CREAT, 0644)) < 0) {
> + VIR_ERROR(_("%s: Failed to open pid file '%s' : %s"),
> + argv0, pidFile, virStrerror(errno, ebuf, sizeof ebuf));
> + return -1;
> + }
> +
> + if (fstat(fd, &b) < 0) {
> + VIR_ERROR(_("%s: Pid file '%s' disappeared: %s"),
Misleading error message. fstat can indeed fail (although such failures
are rare), but not because a file disappeared - after all, you have an
fd open to the file.
> - if (fprintf(fh, "%lu\n", (unsigned long)getpid()) < 0) {
> - VIR_ERROR(_("%s: Failed to write to pid file '%s' : %s"),
> - argv0, pidFile, virStrerror(errno, ebuf, sizeof ebuf));
> - VIR_FORCE_FCLOSE(fh);
> - return -1;
> - }
> + snprintf(pidstr, sizeof(pidstr), "%llu", pid);
>
> - if (VIR_FCLOSE(fh) == EOF) {
> - VIR_ERROR(_("%s: Failed to close pid file '%s' : %s"),
> - argv0, pidFile, virStrerror(errno, ebuf, sizeof ebuf));
> - return -1;
> + if (safewrite(fd, pidstr, strlen(pidstr)) < 0) {
Nice conversion from FILE* to write().
> @@ -1436,6 +1453,12 @@ int main(int argc, char **argv) {
> umask(old_umask);
> }
>
> + /* Try to claim the pidfile, existing if we can't */
s/existing/exiting/
> + if ((pid_file_fd = daemonAcquirePidFile(argv[0], pid_file)) < 0) {
> + ret = VIR_DAEMON_ERR_PIDFILE;
> + goto cleanup;
> + }
> +
> if (!(srv = virNetServerNew(config->min_workers,
> config->max_workers,
> config->max_clients,
> @@ -1570,8 +1593,10 @@ cleanup:
> }
> VIR_FORCE_CLOSE(statuswrite);
> }
> - if (pid_file)
> - unlink (pid_file);
> + if (pid_file_fd != -1) {
> + unlink(pid_file);
> + VIR_FORCE_CLOSE(pid_file_fd);
Swap these two lines. Not that flock (or even libvirtd) works on mingw,
but mingw doesn't like unlink() on an open fd.
--
Eric Blake eblake at redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 619 bytes
Desc: OpenPGP digital signature
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20110708/c984f0f9/attachment-0001.sig>
More information about the libvir-list
mailing list