[vfio-users] Calling ALL VM experts - Seeking assistance with AMD FX CPUs

Thomas Lindroth thomas.lindroth at gmail.com
Wed Feb 1 11:16:50 UTC 2017


On 02/01/2017 04:28 AM, Alyx wrote:
> If I was to boot up the VM into Linux are there any tests I can do in a
> Linux VM environment to help figure out what the issue is? Since no options
> seem to reveal the problem and I presume Linux has more tools to deduce the
> specifics of this issue.

I have no experience with AMD hardware, but some ideas come to mind. It looks like
you have already tried all the usual techniques for improving performance and latency.

There might be some source of latency in your actual hardware. You can test this
by booting a 4.9 kernel and using the new hwlat tracer. I suppose a Linux live CD
is good enough. Some of the Exton live CDs ship a 4.9 kernel, e.g.
http://linux.exton.net/?p=820

Boot a system with 4.9, make sure debugfs is mounted and run
"echo hwlat > /sys/kernel/debug/tracing/current_tracer"
You'll get lines like this in /sys/kernel/debug/tracing/trace

<...>-3728  [000] dn..   668.336945: #1     inner/outer(us):   31/9     ts:1484487305.141866647

The interesting part is 31/9. The two numbers tell you the max hardware latency
in us, so in this example it's 31 us. If you get high numbers like 1000 us, the
latency might impact your system. The system will behave unusually while the hwlat
tracer is running; that's normal. Try to use various hardware features while the
tracer is running. If you have an integrated GPU, try to use that. I only get
hwlat spikes when my Intel iGPU uses hardware video decoding for movies. I assume
it steals memory bandwidth or something. Actually installing 4.9 on your real
system would be preferable, because then you can test the latency under your
normal workload.
https://lwn.net/Articles/703129/
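
A minimal end-to-end sketch of the procedure (standard debugfs paths; the sleep
is arbitrary, just give the tracer time to sample while you exercise the hardware):

    # mount debugfs if the live CD hasn't already done it
    mount -t debugfs none /sys/kernel/debug 2>/dev/null
    cd /sys/kernel/debug/tracing
    echo hwlat > current_tracer    # start the hardware latency detector
    sleep 60                       # sample for a while; use the hardware meanwhile
    cat trace                      # check the inner/outer(us) numbers
    echo nop > current_tracer      # stop tracing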

Hardware latency on the host probably isn't the answer, because you would have
noticed it while using the host. If I understand your problem correctly, you only
get stalls in the VM. Something you could test is running hwlat in the guest. Set
up the guest with pinning and reservation of cores for good latency and boot that
live CD. Run the test the same way and see what latencies you get. If you get
high latency in the guest but not on the host, then the problem is unique to the
VM environment. When you boot that live CD you might want to append
"tsc=reliable" to the kernel command line in grub before booting. That will
force the use of the TSC for timing in the guest, which might reveal problems
with that timer.
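
For example, at the grub menu you can usually press 'e' and append the option to
the line that loads the kernel. The file names below are only illustrative and
will differ per live CD:

    linux /boot/vmlinuz-4.9 boot=live quiet tsc=reliable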

If you do see high latency in the guest, a possible explanation is that some
kernel thread hogs the CPU on the host. You can test this by running
"perf record -e "sched:sched_switch" -C 1,2,3" on the host while the VM is
running. The numbers are the CPU cores where you pinned the VM. This will record
every time a process is scheduled to run on those cores. Use the guest until you
observe the stall, stop perf and run
"perf report --fields=sample,overhead,cpu,comm". This will show all processes
that ran on those cores. You would expect to see things like "CPU #/KVM",
"kvm-pit", "swapper" and perhaps qemu, but not much else. Swapper is the kernel's
idle task, so seeing it just means the core had nothing else to do. If you see
stuff like kworker or ksoftirqd, that means kthreads ran on the same cores as
your VM, which could result in the problems you describe.
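
Put together, the session looks like this (the -C list is just an example; use
whichever host cores your VM is pinned to):

    # record context switches on the VM's cores; stop with Ctrl-C after a stall
    perf record -e "sched:sched_switch" -C 1,2,3
    # then summarise which processes ran on those cores
    perf report --fields=sample,overhead,cpu,comm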

If there are kworker threads meddling with the VM, finding out what they do can
be tricky. Try running
"echo "workqueue:workqueue_queue_work" > /sys/kernel/debug/tracing/set_event"
on the host while the VM is running. This will trace every time work is queued
for a kworker. Then look in the file "/sys/kernel/debug/tracing/per_cpu/cpu1/trace"
(assuming your VM runs on cpu 1). You will get lines like this:
kworker/u16:6-5206  [001] d..2  5351.981584: workqueue_queue_work: work struct=ffff88036a44b0d0 function=do_worker workqueue=ffff88041ba25c00 req_cpu=8 cpu=4294967295
The interesting part is function=do_worker. This tells us that the kernel
function "do_worker" has been queued to run in a kworker on that CPU. "do_worker"
is not the most descriptive name, so to find out what it is you have to grep the
kernel source. If you do, you'll see that do_worker is in the file
drivers/md/dm-thin.c and is part of the device mapper thin target code used by
LVM thin partitions. What work runs in kworker threads depends on your setup and
hardware, so you'll have to draw your own conclusions from what you find. I had
problems with dm-thin, but for you it could be anything.
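
As a sketch, the whole procedure is (cpu1 assumed to be a VM core, and ~/linux
assumed to be a checkout of the kernel source):

    cd /sys/kernel/debug/tracing
    echo "workqueue:workqueue_queue_work" > set_event   # enable the tracepoint
    # use the guest until you see a stall, then inspect the VM core's buffer
    cat per_cpu/cpu1/trace
    echo > set_event                                    # disable tracing again
    # look up an unfamiliar function name in the kernel source
    grep -rn "do_worker" ~/linux/drivers/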

Since you are using win10, you probably use the hypervclock in Windows, and there
shouldn't be any problems with that. But since your guest stalls and then catches
up, I thought the problem might be related to the timer used in the guest. I know
the TSC can sometimes run backwards in the guest, which can confuse guest
software. I don't know how to reliably set the timer source in Windows, but on
Linux you can check
/sys/devices/system/clocksource/clocksource0/available_clocksource for what
timers exist (you have to boot the guest with tsc=reliable for tsc to show up).
You can set a timer with
"echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource"
and then check if you get any stalls in the guest. Perhaps a video like this
could help you spot the stalls: https://www.youtube.com/watch?v=cuXsupMuik4
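
In a Linux guest booted with tsc=reliable, checking and switching the
clocksource looks like this:

    # list the available timers and the one currently in use
    cat /sys/devices/system/clocksource/clocksource0/available_clocksource
    cat /sys/devices/system/clocksource/clocksource0/current_clocksource
    # switch to another timer, e.g. acpi_pm, then watch for stalls
    echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource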

The only other thing I can think of is that there are some differences in how
hardware virtualisation works between Intel and AMD. I think I read somewhere
that disabling the hardware accelerated nested page tables on AMD gives a
performance boost, but that doesn't make sense to me. The kvm_intel module has
various parameters for controlling things like this; I can list them from
/sys/module/kvm_intel/parameters/*. Check what parameters the kvm_amd module
has. Perhaps you'll find something interesting there.
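
A sketch of how you might inspect and toggle them (npt should be the nested page
table switch; reloading the module requires all VMs to be shut down first):

    # list kvm_amd parameters and their current values
    for p in /sys/module/kvm_amd/parameters/*; do
        echo "$(basename $p) = $(cat $p)"
    done
    # reload the module with nested page tables disabled, as an experiment
    modprobe -r kvm_amd
    modprobe kvm_amd npt=0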



