[libvirt-users] 100% CPU when using nested virtualization

Kashyap Chamarthy kchamart at redhat.com
Fri Mar 11 09:11:26 UTC 2016


On Thu, Mar 10, 2016 at 10:29:08PM -0500, Digimer wrote:
> Hi all,
> 
>   I got a new laptop recently and what worked before no longer works
> (Fedora 23 on the laptops in both cases)...
>
>   I'm trying to get nested virtualization to work because I use the VMs
> on the laptop to simulate an HA cluster that itself hosts VMs. I don't
> care much at all about the performance of the nested VM, it's just there
> so that I can work on the cluster's code.
>
> 
>   When I try to provision a VM inside a VM (the host VM is CentOS/RHEL
> 6.7), the CPU load spikes to such a high degree that my ssh session
> times out after a while. The VM appears in libvirtd (as viewed by
> virt-manager on another machine), but the VM itself never starts.

Just to clearly restate your environment (on the problematic newer
laptop):

    - Physical host (L0) == Fedora 23
    - Guest hypervisor (L1) == CentOS 6.7
    - Nested guest (L2) == ?  (I'd assume CentOS 6.7; correct me if I'm
      wrong)
    
Hmm, then you're running relatively old libvirt/QEMU/kernel versions on
the host VM (L1).  That's one of the downsides of nested virt: the test
matrix explodes with all the different kernel combinations involved.
FWIW, I've had the most productive experience running the newest stable
kernel on the physical host, and an even better one when all levels run
it too.

>   In one case, the VM host remained somewhat functional and killing
> kvm/qemu/libvirtd didn't reduce the CPU load.
> 
>   The main difference between the setups is that the older laptop had a
> Sandy bridge(? Thinkpad W530) and the new laptop is a Broadwell
> (Thinkpad P70).

[A side note on Broadwell CPUs, as you might've noticed by now: Intel
released a microcode update that disables TSX; if your CPU is running
without TSX, your guest hypervisor (L1) should use the CPU model
Haswell-noTSX.]

>   I've tried loading vhost_net without much luck.

What is preventing you from loading `vhost_net`?  Assuming VHOST_NET is
enabled in your kernel config, does `sudo modprobe vhost_net` fail for
you?  If so, with what error?

On Fedora 23, it is built as a loadable module by default:

    $ grep CONFIG_VHOST_NET /boot/config-4.3.5-300.fc23.x86_64 
    CONFIG_VHOST_NET=m
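
If you want to do the same check from a script (say, before deciding
whether to run `modprobe`), here is a minimal sketch.  `kconfig_state`
is a hypothetical helper name, and it assumes the usual
`/boot/config-<version>` layout Fedora ships:

```shell
# Report how a kernel config option is set in a given config file:
# "builtin" (=y), "module" (=m), or "absent" (not set, or file missing).
kconfig_state() {
    config_file=$1   # e.g. /boot/config-$(uname -r)
    option=$2        # e.g. CONFIG_VHOST_NET
    # Anchor on "OPTION=" so CONFIG_VHOST does not match CONFIG_VHOST_NET.
    value=$(grep "^${option}=" "$config_file" 2>/dev/null | cut -d= -f2)
    case $value in
        y) echo builtin ;;
        m) echo module ;;
        *) echo absent ;;
    esac
}
```

E.g. `kconfig_state /boot/config-$(uname -r) CONFIG_VHOST_NET`: "module"
means `sudo modprobe vhost_net` should load it (`lsmod | grep vhost`
confirms), "builtin" means there is nothing to load.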

> I have, of course, enabled nesting on the actual hardware:
>
> cat /sys/module/kvm_intel/parameters/nested
> Y
> 
>   Any tips on how to debug?
> 
>   I'm in quite a pickle with this, so any and all help is much
>   appreciated.
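
That `nested` check is easy to script as well.  A rough sketch follows;
`nested_state` is a hypothetical name.  It assumes kvm_intel reports
Y/N, as in your output, and treats 1/0 the same way, since (as far as I
know) kvm_amd reports the parameter like that on some kernels:

```shell
# Print "enabled", "disabled" or "unknown" for a KVM 'nested' parameter file.
nested_state() {
    param_file=$1   # e.g. /sys/module/kvm_intel/parameters/nested
    # Module not loaded (or no such parameter): nothing to read.
    [ -r "$param_file" ] || { echo unknown; return 1; }
    case $(cat "$param_file") in
        Y|y|1) echo enabled ;;
        N|n|0) echo disabled ;;
        *) echo unknown; return 1 ;;
    esac
}
```

E.g. `nested_state /sys/module/kvm_intel/parameters/nested` on the
physical host (L0).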

A few things you can try to narrow down where the problem is:

  - `mpstat` shows per-processor CPU utilization; run it first to get a
    general view.  Among the values it reports are %guest (time spent
    running guest code) and %usr (user-space time, which includes QEMU
    device emulation).  Refer to its man page for the rest.

  - `kvm_stat` (provided by the 'qemu-kvm-tools' package on Fedora-based
    systems) is a `top`-like tool that shows runtime statistics of KVM
    events.  An example of its output is at [1].

  - Or use `perf` to record KVM events related to nested
    virtualization:

        $ perf record -a -e kvm:kvm_exit -e kvm:kvm_entry \
            -e kvm:kvm_nested_vmexit -e kvm:kvm_nested_vmrun
    
    Or use the `perf kvm` utility; see [2] for more details.
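
Once `perf record` has written perf.data, `perf script` prints one line
per recorded event.  A quick way to see which events dominate is to
tally them by name.  This sketch assumes each line contains the event
name as `kvm:<event>:` (the default `perf script` layout), and
`tally_kvm_events` is a hypothetical helper:

```shell
# Read `perf script` output on stdin and print "<event> <count>",
# most frequent event first. Relies on grep -o (GNU/BSD grep).
tally_kvm_events() {
    grep -o 'kvm:[a-z_][a-z_]*' |
        sort | uniq -c |        # count identical event names
        sort -rn |              # highest count first
        awk '{ print $2, $1 }'  # reorder to "name count"
}
```

E.g. `perf script | tally_kvm_events`.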

[1] https://kashyapc.fedorapeople.org/virt/kvm_stat-VMCS-Shadowing/kvm_stat-L0-VMCS-Shadowing-enabled.txt
[2] http://www.linux-kvm.org/page/Perf_events

        

-- 
/kashyap
