[PATCH] libvirt-guests.sh: fix log output with new systemd

Christian Ehrhardt christian.ehrhardt at canonical.com
Thu Apr 30 19:07:13 UTC 2020


On Thu, Apr 30, 2020 at 7:59 PM Daniel P. Berrangé <berrange at redhat.com> wrote:
>
> On Thu, Apr 30, 2020 at 07:36:54PM +0200, Christian Ehrhardt wrote:
> > On Thu, Apr 30, 2020 at 7:23 PM Daniel P. Berrangé <berrange at redhat.com> wrote:
> > >
> > > On Thu, Apr 30, 2020 at 06:28:08PM +0200, Christian Ehrhardt wrote:
> > > > On Thu, Apr 30, 2020 at 5:10 PM Daniel P. Berrangé <berrange at redhat.com> wrote:
> > > > >
> > > > > On Thu, Apr 30, 2020 at 04:58:25PM +0200, Christian Ehrhardt wrote:
> > > > > > On Thu, Apr 30, 2020 at 2:51 PM Daniel P. Berrangé <berrange at redhat.com> wrote:
> > > > > > Well it seems I have a reproducible symptom and a fix, but not the
> > > > > > explanation why the latter fixes the former.
> > > > > > I'll need to dive into some debug & analysis myself to explain it better.
> > > > > > I'll be back here once I got time to do that in depth check.
> > > > > >
> > > > > > Until then whoever is affected (should be everyone) can give it a
> > > > > > thought as well.
> > > > > > Repro is as easy as
> > > > > > One console:
> > > > > > $ journalctl -f -u libvirt-guests
> > > > > > Other console:
> > > > > > $ systemctl stop libvirt-guests
> > > > > > $ systemctl start libvirt-guests
> > > > > >
> > > > > > I see it with 245.4-4ubuntu3 (18.04) I'm not seeing it on
> > > > > > 237-3ubuntu10.39 (20.04).
> > > > > > Maybe it is a systemd bug after all?
> > > > > > I'd be interested to hear if that is/isn't clobbering output for
> > > > > > anyone else and what your systemd versions are?
> > > > >
> > > > > FWIW, it works correctly on Fedora 31 with systemd 243.
> > > >
> > > > Eoan with 242-7ubuntu3.7 is good as well.
> > > > I might need to try to get some interim versions from somewhere.
> > >
> > > I've reproduced on Fedora 33 rawhide with systemd 245 - the first
> > > place where it lists running guests is screwed up slightly:
> > >
> > > Apr 30 17:16:13 libvirt-fedora-rawhide systemd[1]: Stopping Suspend/Resume Running libvirt Guests...
> > > Apr 30 17:16:13 libvirt-fedora-rawhide libvirt-guests.sh[69903]: Running guests on default URI:
> > > Apr 30 17:16:13 libvirt-fedora-rawhide libvirt-guests.sh[69892]: Runningcore2
> > > Apr 30 17:16:13 libvirt-fedora-rawhide libvirt-guests.sh[69934]: Suspending guests on default URI...
> > > Apr 30 17:16:13 libvirt-fedora-rawhide libvirt-guests.sh[69892]: SSuspending core1: ...
> > > Apr 30 17:16:14 libvirt-fedora-rawhide libvirt-guests.sh[69892]: Suspending core1: done
> > > Apr 30 17:16:14 libvirt-fedora-rawhide libvirt-guests.sh[69892]: Suspending core2: ...
> > > Apr 30 17:16:15 libvirt-fedora-rawhide libvirt-guests.sh[69892]: Suspending core2: done
> > > Apr 30 17:16:15 libvirt-fedora-rawhide systemd[1]: libvirt-guests.service: Succeeded.
> > > Apr 30 17:16:15 libvirt-fedora-rawhide systemd[1]: Stopped Suspend/Resume Running libvirt Guests.
> > >
> > > On resume it is even worse
> > >
> > > Apr 30 17:19:40 libvirt-fedora-rawhide systemd[1]: Starting Suspend/Resume Running libvirt Guests...
> > > Apr 30 17:19:40 libvirt-fedora-rawhide libvirt-guests.sh[70041]: Resuming guests on default URI...
> > > Apr 30 17:19:40 libvirt-fedora-rawhide libvirt-guests.sh[70030]: R
> > > Apr 30 17:19:40 libvirt-fedora-rawhide libvirt-guests.sh[70048]: R
> > > Apr 30 17:19:41 libvirt-fedora-rawhide libvirt-guests.sh[70048]: esuming guest core1:
> > > Apr 30 17:19:41 libvirt-fedora-rawhide libvirt-guests.sh[70079]: esum
> > > Apr 30 17:19:41 libvirt-fedora-rawhide libvirt-guests.sh[70030]: e
> > > Apr 30 17:19:41 libvirt-fedora-rawhide libvirt-guests.sh[70086]: e
> > > Apr 30 17:19:42 libvirt-fedora-rawhide libvirt-guests.sh[70086]: esuming guest core2:
> > > Apr 30 17:19:42 libvirt-fedora-rawhide libvirt-guests.sh[70119]: esum
> > > Apr 30 17:19:42 libvirt-fedora-rawhide libvirt-guests.sh[70030]: e
> > > Apr 30 17:19:42 libvirt-fedora-rawhide systemd[1]: Finished Suspend/Resume Running libvirt Guests.
> > >
> > >
> > > It is possible it isn't systemd related - could be other packages that
> > > are co-incidentally affecting it.
> >
> > Since launchpad holds all former builds still ready for download I
> > could easily test a few versions.
> >
> > I up/downgraded just the following packages on an otherwise unmodified system:
> > - libnss-systemd
> > - libpam-systemd
> > - libsystemd0
> > - systemd
> > - systemd-container
> > - systemd-sysv
> >
> > Those versions I did check:
> >
> > 245.4-4ubuntu3 bad
> > 245.2-1ubuntu1 bad
> > 244.3-1ubuntu1 bad
> > 244.1-0ubuntu3 bad (bad on retry)
> > 243-3ubuntu1 good (good (on retry)
> > 242-7ubuntu3 good
> >
> > Retry means that I went and installed from good -> bad -> good and the
> > behavior was the same.
> > So none of the installs fixed it to then be good forever. It was
> > consistently good <244 and bad >=244.
>
> Based on that, a possibly interesting change between v243..v244 was
> this:
>
> commit 09d0b46ab61bebafe5bdc1be95ee153dfb13d6bc
> Author: Lorenz Bauer <lmb at cloudflare.com>
> Date:   Mon Nov 4 16:35:46 2019 +0000
>
>     journal: refresh cached credentials of stdout streams

FYI - reported to systemd https://github.com/systemd/systemd/issues/15654

>     journald assumes that getsockopt(SO_PEERCRED) correctly identifies the
>     process on the remote end of the socket. However, this is incorrect
>     according to man 7 socket:
>
>         The returned  credentials  are  those that were in effect at the
>         time of the call to connect(2) or socketpair(2).
>
>     This becomes a problem when a new process inherits the stdout stream
>     from a parent. First, log messages from the child process will
>     be attributed to the parent. Second, the struct ucred used by journald
>     becomes invalid as soon as the parent exits. Further sendmsg calls then
>     fail with ENOENT. Logs for the child process then vanish from the journal.
>
>     Fix this by using recvmsg on the stdout stream, and refreshing the cached
>     struct ucred if SCM_CREDENTIALS indicate a new process.
>
>
> but I can't see an obvious mistake the code which would cause the
> behaviour we see.
>
> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>


-- 
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd





More information about the libvir-list mailing list