'migrate' says it worked but in reality it did not - CentOS 9

lejeczek peljasz at yahoo.co.uk
Tue Jan 11 18:42:17 UTC 2022



On 11/01/2022 17:33, Daniel P. Berrangé wrote:
> On Tue, Jan 11, 2022 at 05:14:53PM +0000, lejeczek wrote:
>>
>> On 11/01/2022 16:36, Daniel P. Berrangé wrote:
>>> On Tue, Jan 11, 2022 at 04:30:11PM +0000, lejeczek wrote:
>>>> Hi guys.
>>>>
>>>> I have a peculiar situation where between boxes:
>>>> C->A
>>>> -> $ virsh migrate --unsafe --live c8kubermaster1
>>>> qemu+ssh://10.1.1.99/system
>>>> -> $ echo $?
>>>> 0
>>>> but the above does _not_ happen. Instead, the VM was stopped and started, but _not_
>>>> migrated LIVE
>>>>
>>>> A->C
>>>> -> $ virsh migrate --unsafe --live c8kubermaster1
>>>> qemu+ssh://10.1.1.100/system
>>>> -> $ echo $?
>>>> 0
>>>> indeed VM migrates live.
>>>>
>>>> box A & C have virtually identical OS stack,
>>>> HW difference is:
>>>> C = Ryzen 5 5600G
>>>> A = Ryzen 5 3600
>>>>
>>>> domain XML snippet where I think it matters:
>>>> ...
>>>>     </metadata>
>>>>     <memory unit='GiB'>4</memory>
>>>>     <currentMemory unit='GiB'>4</currentMemory>
>>>>     <vcpu placement='static'>2</vcpu>
>>>>     <resource>
>>>>       <partition>/machine</partition>
>>>>     </resource>
>>>>     <os>
>>>>       <type arch='x86_64' machine='pc-i440fx-rhel7.6.0'>hvm</type>
>>>>       <boot dev='hd'/>
>>>>     </os>
>>>>     <features>
>>>>       <acpi/>
>>>>       <apic/>
>>>>     </features>
>>>>     <cpu mode='custom' match='exact' check='full'>
>>>>       <model fallback='forbid'>EPYC-IBPB</model>
>>>>       <feature policy='require' name='ibpb'/>
>>>>       <feature policy='require' name='ssbd'/>
>>>>       <feature policy='require' name='virt-ssbd'/>
>>>>       <feature policy='disable' name='monitor'/>
>>>>       <feature policy='require' name='x2apic'/>
>>>>       <feature policy='require' name='hypervisor'/>
>>>>       <feature policy='disable' name='svm'/>
>>>>       <feature policy='require' name='topoext'/>
>>>>     </cpu>
>>>>     <clock offset='utc'>
>>>>       <timer name='rtc' tickpolicy='catchup'/>
>>>>       <timer name='pit' tickpolicy='delay'/>
>>>>       <timer name='hpet' present='no'/>
>>>>     </clock>
>>>>     <on_poweroff>destroy</on_poweroff>
>>>>     <on_reboot>restart</on_reboot>
>>>>     <on_crash>destroy</on_crash>
>>>>     <pm>
>>>>       <suspend-to-mem enabled='no'/>
>>>>       <suspend-to-disk enabled='no'/>
>>>>     </pm>
>>>>     <devices>
>>>>       <emulator>/usr/libexec/qemu-kvm</emulator>
>>>>       <disk type='file' device='disk'>
>>>> ...
>>>>
>>>> Initially I submitted a BZ against 'PCS' but continued to fiddle with it and
>>>> I find 'libvirt' might be the culprit (also?) here.
>>>> There is not much in logs, certainly nothing (with default verbosity) in
>>>> virtqemud.service
>>>> Is it that the VM gets migrated but is then restarted on the 'migrate_to' host? If
>>>> so, then why?
>>>> How to start troubleshooting such 'monstrosity'? - all suggestions
>>>> appreciated.
>>> /var/log/libvirt/qemu/$GUEST.log  on both hosts should have more info
>>>
>> What if there is not much there either?
>> migrate_to (host A) seems to show only the config for qemu, no errors, no
>> warnings.
>> migrate_from(host C) shows only:
>> ...
>> 2022-01-11 17:00:40.687+0000: initiating migration
>> 2022-01-11 17:00:43.413+0000: shutting down, reason=migrated
>> 2022-01-11T17:00:43.414063Z qemu-kvm: terminating on signal 15 from pid
>> 24022 (<unknown process>)
>>
>> no errors/warning but that 2nd line - ??
>>
>> Again, migrating back between the same two hosts - where LIVE succeeds
>> migrate_from(host A) also shows:
>> ...
>> 2022-01-11 17:10:27.921+0000: initiating migration
>> 2022-01-11 17:10:30.459+0000: shutting down, reason=migrated
>> 2022-01-11T17:10:30.460528Z qemu-kvm: terminating on signal 15 from pid
>> 73193 (<unknown process>)
> Both those logs only show the state on the src QEMU during a
> migration op. There should be corresponding log for the dst
> QEMU at the same point in time.
Both these logs are from two different hosts - did I fail to 
make that clear? - taken when each host was the 'migrate_from' host.
The migrate_to logs in both cases, when LIVE fails and when it succeeds, 
seem to show only the qemu config, no errors, no warnings.
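
A rough sketch of how the "restarted on the migrate_to host?" question could be checked from the per-guest logs. The helper name `migration_events` is made up for illustration; it only filters lines out of /var/log/libvirt/qemu/$GUEST.log. The 'initiating migration' and 'shutting down, reason=' patterns come from the excerpts above, while 'starting up' is my assumption about how libvirt marks a fresh QEMU launch in that log.

```shell
# Hypothetical helper: print the lifecycle lines from a per-guest QEMU log.
# Patterns 1 and 2 are taken from the log excerpts quoted in this thread;
# 'starting up' is an assumed marker for a new QEMU process being launched.
migration_events() {
    grep -E 'initiating migration|shutting down, reason=|starting up' "$1"
}

# Possible usage, run on both the source and the destination host:
# migration_events /var/log/libvirt/qemu/c8kubermaster1.log
```

If the destination log showed two 'starting up' entries around the migration time, that would suggest the guest was launched again there rather than only received.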
> All these messages show that migration was successful from libvirt and
> QEMU's POV on the src.
>
> So I expect what's happening is that QEMU is crashing on the target
> host after migration has finished.
>
> Regards,
> Daniel
By qemu crash - do you mean this:
'qemu-kvm: terminating on signal'
? But as above, this shows up for both 'success' & 'failure'.
Like I said, virtqemud.service shows nothing, no segfaults, 
nothing obvious.
Full picture, live migrations between hosts:
A<->B works
A,B->C works
C->A,B fails
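
Since C (Ryzen 5 5600G) is the newer CPU and only migrations away from C fail, one thing that might be worth ruling out is a host CPU flag present on C but missing on A. A minimal sketch, assuming the flag lists are read from /proc/cpuinfo; the helper name `flags_only_in_first` is hypothetical, and 10.1.1.99 is host A's address from the migrate command above. With the custom EPYC-IBPB model in the domain XML this may well turn up nothing relevant, so treat it only as a way to narrow the search.

```shell
# Hypothetical helper: print the flags that appear in the first
# space-separated list but not in the second, one per line.
flags_only_in_first() {
    _a=$(mktemp)
    _b=$(mktemp)
    # Unquoted $1 / $2 on purpose: word splitting puts one flag per line.
    printf '%s\n' $1 | sort -u > "$_a"
    printf '%s\n' $2 | sort -u > "$_b"
    comm -23 "$_a" "$_b"
    rm -f "$_a" "$_b"
}

# Possible usage, run on host C:
# local_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
# remote_flags=$(ssh 10.1.1.99 "grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2")
# flags_only_in_first "$local_flags" "$remote_flags"
```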








More information about the libvirt-users mailing list