[libvirt-users] systemctl libvirt-guests.service fails to start during host boot

Andrea Bolognani abologna at redhat.com
Mon Sep 26 15:43:18 UTC 2016


On Thu, 2016-09-15 at 18:10 +0200, Benoit wrote:
> Hi Andrea,

Hi,

sorry for taking so long to reply.

I'm still unable to reproduce your failure, which of course
makes it very difficult to debug :(

But I might have a couple leads, so let's try and move
forward with that.

> > Since you're running an Arch variant, I'm going to assume
> > all your software is basically at the latest version all
> > around, isn't it?
> 
> Yep exactly, all software / kernel are in latest version

>From the log below it looks like you're running 2.1.0, which
might be the latest version your distribution provides, but
the same is not true when it comes to upstream development.

> > Can you try adding
>> >    [Unit]
> >    Requires=libvirtd.service
>> > to your libvirt-guests.service via 'systemctl edit' and
> > see if it helps? I don't think it should be needed for your
> > use case, but it's probably a good idea to have it there
> > anyways.
> 
> I have tried and as you expected it didn't changed anything, same issue.

Can you please provide the output of

  # systemctl cat libvirt-guests.service

and

  # systemctl show libvirt-guests.service

as well? Those could contain helpful information.

> I made more test and it is very reproductible, let me give you my steps 
> to reproduce it :
> 
> -I shutdown -r now my host.
> 
> -when it restarted, first thing to check is : systemctl status 
> libvirt-guests.service, I got :
> 
>   libvirt-guests.service - Suspend Active Libvirt Guests
>     Loaded: loaded (/usr/lib/systemd/system/libvirt-guests.service; 
> enabled; vendor preset: disabled)

>From the output of systemctl it looks like you didn't add
the Requires=libvirtd.service relationship. Maybe you just
removed it after testing?

>     Active: failed (Result: exit-code) since Thu 2016-09-15 17:40:07 
> CEST; 1min 47s ago
>       Docs: man:libvirtd(8)
>             http://libvirt.org
>   Main PID: 329 (code=exited, status=1/FAILURE)
> 
> Sep 15 17:40:07 libvirt-guests.sh[329]: Resuming guest : error: failed 
> to connect to the hypervisor
> Sep 15 17:40:07 libvirt-guests.sh[329]: error: no valid connection
> Sep 15 17:40:07 libvirt-guests.sh[329]: error: Failed to connect socket 
> to '/var/run/libvirt/libvirt-sock': No such file or directory
> Sep 15 17:40:07 libvirt-guests.sh[329]: Resuming guest : error: failed 
> to connect to the hypervisor
> Sep 15 17:40:07 libvirt-guests.sh[329]: error: no valid connection
> Sep 15 17:40:07 libvirt-guests.sh[329]: error: Failed to connect socket 
> to '/var/run/libvirt/libvirt-sock': No such file or directory
> Sep 15 17:40:07 systemd[1]: libvirt-guests.service: Main process exited, 
> code=exited, status=1/FAILURE
> Sep 15 17:40:07 systemd[1]: Failed to start Suspend Active Libvirt Guests.
> Sep 15 17:40:07 systemd[1]: libvirt-guests.service: Unit entered failed 
> state.
> Sep 15 17:40:07 systemd[1]: libvirt-guests.service: Failed with result 
> 'exit-code'.
> 
> Even if the service is in FAILED mode, my guests are started correctly.

Is it possible that you have configured some (all?) of your
guests to be autostarted? That would explain why the guests
are running after reboot even though libvirt-guests failed
to connect to libvirtd.

Even then, libvirt-guests is written in such a way that the
fact a guest has already been started should not result in
failure.

I've checked, and it looks to me like the socket is created
very early in the libvirtd startup process, way before
attempting to start autostarted guests, so I don't see how
that could cause issues.

Two things stand out from the log:

  1) Resuming guest : error: failed to connect to the hypervisor

     That progress message is supposed to contain the guest
     name right after "guest", but you have an empty string
     instead. That's consistent with the script being unable
     to connect to libvirtd

  2) the script can't connect to libvirtd

     which is extremely weird because, right before looping
     throught the guests and starting them, a connectivity
     check is performed to make sure libvirtd can be reached

Can you please provide a full, verbose libvirtd log[1]?

Can you also add[2]

  set -x

to

  /usr/libexec/libvirt-guests.sh

and then provide the output of

  # systemctl -b 0 -u libvirt-guests.service

after rebooting the host?

> If I do a systemctl restart libvirt-guests.service, everything is good :
> libvirt-guests.service - Suspend Active Libvirt Guests
>     Loaded: loaded (/usr/lib/systemd/system/libvirt-guests.service; 
> enabled; vendor preset: disabled)
>     Active: active (exited) since Thu 2016-09-15 17:42:01 CEST; 9min ago
>       Docs: man:libvirtd(8)
>             http://libvirt.org
>   Main PID: 605 (code=exited, status=0/SUCCESS)
> 
> Sep 15 17:42:01 systemd[1]: Starting Suspend Active Libvirt Guests...
> Sep 15 17:42:01 systemd[1]: Started Suspend Active Libvirt Guests.
> 
> Now if I do a shutdown -r now again, I can see both in my libvirt logs + 
> in my guests log that an ACPI commands has been sent from libvirt and my 
> guests shutdown gently, so everything is ok.
> 
> When I start again, and try to do a systemctl status 
> libvirt-guests.service, I got the same FAILED status as before....
> 
> I made a research on my logs :
> 
> Sep 15 17:39:29 libvirt-guests.sh[1164]: Shutting down guests on default 
> URI...
> Sep 15 17:39:29 libvirt-guests.sh[1164]: Starting shutdown on guest: ape
> Sep 15 17:39:30 libvirt-guests.sh[1164]: Waiting for guest ape to shut 
> down, 30 seconds left

Looks like you changed at least a couple of settings here,
eg. the timeout. Can you please provide the output of

  $ grep -Ev '^(#|$)' /etc/sysconfig/libvirt-guests

so we can be sure we're on the same page?

> Sep 15 17:39:35 libvirt-guests.sh[1164]: Waiting for guest ape to shut 
> down, 25 seconds left
> Sep 15 17:39:36 libvirtd[326]: internal error: End of file from monitor
> Sep 15 17:39:36 libvirtd[326]: Cannot write data: Broken pipe

[...]

> We see the same error as systemctl status plus some error like " 
> internal error: End of file from monitor" and others type.
> Don't know if I have anything to take a look there ?

That might be a red herring, I think it's just the QEMU
process being killed during shutdown and thus ending
communication with libvirtd abruptly.

> Finally, I also notice some very strange thing:
> If I stop the service with systemctl stop libvirt-guests.service it 
> tooks a while to execute the stop command and fails.
> 
> Sep 15 18:00:44 libvirt-guests.sh[978]: Starting shutdown on guest: ape
> Sep 15 18:00:45 libvirt-guests.sh[978]: Waiting for guest ape to shut 
> down, 30 seconds left
> Sep 15 18:00:50 libvirt-guests.sh[978]: Waiting for guest ape to shut 
> down, 25 seconds left
> Sep 15 18:00:55 libvirt-guests.sh[978]: Waiting for guest ape to shut 
> down, 20 seconds left
> Sep 15 18:01:00 libvirt-guests.sh[978]: Waiting for guest ape to shut 
> down, 15 seconds left
> Sep 15 18:01:06 libvirt-guests.sh[978]: Waiting for guest ape to shut 
> down, 10 seconds left
> Sep 15 18:01:11 libvirt-guests.sh[978]: Waiting for guest ape to shut 
> down, 5 seconds left
> Sep 15 18:01:15 libvirt-guests.sh[978]: Shutdown of guest ape failed to 
> complete in time.
> 
> So my guests are not able to shutdown properly but when I shutdown -r 
> now my host I can see that my guests have been shutdown gently, I guess 
> there is something related on my issue....

It looks like at least one of the guests is taking awfully
long to shut down. What if you bump up the timeout? Does it
manage to shut down before reaching it?


[1] Follow instructions at http://wiki.libvirt.org/page/DebugLogs
[2] When editing the file, add the new instruction *right
    after* the first line. Of course you should make a backup
    copy of the file before tweaking it
-- 
Andrea Bolognani / Red Hat / Virtualization




More information about the libvirt-users mailing list