
Re: [libvirt-users] blockcommit of domain not successfull



On Tue, Jun 04, 2019 at 14:44:29 +0200, Lentes, Bernd wrote:
> Hi,

Hi, 

> 
> i have several domains running on a 2-node HA-cluster.
> Each night i create snapshots of the domains, after copying the consistent raw file to a CIFS server i blockcommit the changes into the raw files.
> That's running quite well.
> But recent the blockcommit didn't work for one domain:
> I create a logfile from the whole procedure:
> ===============================================================
>  ...
> Sat Jun  1 03:05:24 CEST 2019
> Target     Source
> ------------------------------------------------
> vdb        /mnt/snap/severin.sn
> hdc        -
> 
> /usr/bin/virsh blockcommit severin /mnt/snap/severin.sn --verbose --active --pivot
> Block commit: [  0 %]Block commit: [ 15 %]Block commit: [ 28 %]Block commit: [ 35 %]Block commit: [ 43 %]Block commit: [ 53 %]Block commit: [ 63 %]Block commit: [ 73 %]Block commit: [ 82 %]Block commit: [ 89 %]Block commit: [ 98 %]Block commit: [100 %]Target     Source
> ------------------------------------------------
> vdb        /mnt/snap/severin.sn
>  ...
> ==============================================================
> 
> The libvirtd-log says (it's UTC IIRC):
> =============================================================
>  ...
> 2019-05-31 20:31:34.481+0000: 4170: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor
> 2019-06-01 01:05:32.233+0000: 4170: error : qemuMonitorIO:719 : internal error: End of file from qemu monitor

libvirt logs this message when the monitor socket is closed
unexpectedly, which usually means the qemu process crashed or exited.
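
To line these entries up with the output of the backup script: the
+0000 suffix on the libvirtd timestamps means UTC, while the script log
is in CEST. A minimal Python sketch of the conversion, assuming CEST is
a fixed UTC+2 offset (the helper name is made up for illustration):

```python
from datetime import datetime, timedelta, timezone

# Assumption: CEST is modelled as a fixed UTC+2 offset; no DST rules are applied.
CEST = timezone(timedelta(hours=2), "CEST")

def libvirtd_ts_to_cest(stamp: str) -> datetime:
    """Parse a libvirtd log timestamp such as '2019-06-01 01:05:32.233+0000'."""
    utc = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f%z")
    return utc.astimezone(CEST)

# Second "End of file from qemu monitor" entry from the quoted log.
local = libvirtd_ts_to_cest("2019-06-01 01:05:32.233+0000")
print(local.strftime("%H:%M:%S %Z"))
```

That places the second monitor EOF at 03:05:32 CEST, eight seconds after
the 03:05:24 CEST line in the script log.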

> 2019-06-01 01:05:43.804+0000: 22605: warning : qemuGetProcessInfo:1461 : cannot parse process status data
> 2019-06-01 01:05:43.848+0000: 22596: warning : qemuGetProcessInfo:1461 : cannot parse process status data
> 2019-06-01 01:06:11.438+0000: 26112: warning : qemuDomainObjBeginJobInternal:4865 : Cannot start job (destroy, none) for domain severin; current job is (modify, none) owned by (5372 remoteDispatchDomainBlockJobAbort, 0 <null>) for (39s, 0s)
> 2019-06-01 01:06:11.438+0000: 26112: error : qemuDomainObjBeginJobInternal:4877 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchDomainBlockJobAbort)

So this means that the virDomainBlockJobAbort API, which is also used
for --pivot, got stuck for some time.

This is rather strange if the VM crashed. There might also be a bug in
the synchronous block job handling, but it's hard to tell from this log.
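
If you want to watch for this in the log, the fields of the "Cannot
start job" warning can be pulled apart mechanically. A rough Python
sketch, assuming the message keeps the shape quoted above and is written
on a single line; the regular expression and the function name are mine
and not anything provided by libvirt:

```python
import re

# Pattern modelled on the one warning quoted in this thread;
# other libvirt versions may word the message differently.
JOB_BUSY = re.compile(
    r"Cannot start job \((?P<wanted>\w+), \w+\) for domain (?P<domain>\S+); "
    r"current job is \((?P<current>\w+), \w+\) "
    r"owned by \((?P<thread>\d+) (?P<api>\w+), .*?\) "
    r"for \((?P<held>\d+)s, \d+s\)"
)

def parse_job_busy(message: str) -> dict:
    """Return the named fields of a 'Cannot start job' warning."""
    m = JOB_BUSY.search(message)
    if m is None:
        raise ValueError("not a 'Cannot start job' warning")
    info = m.groupdict()
    info["held"] = int(info["held"])  # seconds the current job has been running
    return info

line = ("Cannot start job (destroy, none) for domain severin; "
        "current job is (modify, none) owned by "
        "(5372 remoteDispatchDomainBlockJobAbort, 0 <null>) for (39s, 0s)")
info = parse_job_busy(line)
print(info["api"], "has held the", info["current"], "job for", info["held"], "s")
```

For the line above this reports that remoteDispatchDomainBlockJobAbort
had been holding the modify job for 39 seconds when the destroy request
timed out.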

> 2019-06-01 01:06:13.976+0000: 5369: warning : qemuGetProcessInfo:1461 : cannot parse process status data
> 2019-06-01 01:06:14.028+0000: 22596: warning : qemuGetProcessInfo:1461 : cannot parse process status data
> 2019-06-01 01:06:44.165+0000: 5371: warning : qemuGetProcessInfo:1461 : cannot parse process status data
> 2019-06-01 01:06:44.218+0000: 22605: warning : qemuGetProcessInfo:1461 : cannot parse process status data
> 2019-06-01 01:07:14.343+0000: 5369: warning : qemuGetProcessInfo:1461 : cannot parse process status data
> 2019-06-01 01:07:14.387+0000: 22598: warning : qemuGetProcessInfo:1461 : cannot parse process status data
> 2019-06-01 01:07:44.495+0000: 22605: warning : qemuGetProcessInfo:1461 : cannot parse process status data
>  ...
> ===========================================================
> and "cannot parse process status data" continuously until the end of the logfile.
> 
> The syslog from the domain itself didn't reveal anything, it just continues to run.
> The libvirt log from the domains just says:
> qemu-system-x86_64: block/mirror.c:864: mirror_run: Assertion `((&bs->tracked_requests)->lh_first == ((void *)0))' failed.

So that's interesting. An assertion failure in qemu usually leads to a
call to abort(), so the VM should have crashed. Didn't your HA solution
restart it?

At any rate it would be really beneficial if you could collect debug
logs for libvirtd which also contain the monitor interactions with qemu:

https://wiki.libvirt.org/page/DebugLogs

The qemu assertion failure above should ideally be reported to qemu, but
if you are able to reproduce the problem with libvirtd debug logs
enabled, I can extract more useful info from them, which the qemu
project would ask you for anyway.
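
For the report itself it helps to quote the exact source location of the
assertion. A small Python sketch that splits the message from the domain
log into its parts, assuming the usual assert() layout of
"program: file:line: function: Assertion `expr' failed." (the pattern is
my own and only checked against the one line quoted above):

```python
import re

# Assumed layout: "program: file:line: function: Assertion `expr' failed."
ASSERT_RE = re.compile(
    r"^(?P<program>[^:]+): (?P<file>[^:]+):(?P<line>\d+): "
    r"(?P<function>\w+): Assertion `(?P<expr>.*)' failed\.$"
)

msg = ("qemu-system-x86_64: block/mirror.c:864: mirror_run: "
       "Assertion `((&bs->tracked_requests)->lh_first == ((void *)0))' failed.")

m = ASSERT_RE.match(msg)
assert m is not None, "line does not look like an assertion failure"

# Compact location string suitable for a bug report subject line.
loc = f"{m['file']}:{m['line']} in {m['function']}()"
print(loc)
```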
