Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!

Mon Oct 12 16:54:28 UTC 2020

On Sun, Oct 11, 2020 at 10:11:39AM -0400, harry harry wrote:
> Hi Maxim,
> 
> Thanks much for your reply.
> 
> On Sun, Oct 11, 2020 at 3:29 AM Maxim Levitsky <mlevitsk at redhat.com> wrote:
> >
> > On Sun, 2020-10-11 at 01:26 -0400, harry harry wrote:
> > > Hi QEMU/KVM developers,
> > >
> > > I am sorry if my email disturbs you. I did an experiment and found the
> > > guest physical addresses (GPAs) are not the same as the corresponding
> > > host virtual addresses (HVAs). I am curious about why; I think they
> > > should be the same. I would appreciate it if you could give some
> > > comments and suggestions about 1) why GPAs and HVAs are not the same
> > > in the following experiment; 2) are there any better experiments to
> > > look into the reasons? Any other comments/suggestions are also very
> > > welcome. Thanks!
> > >
> > > The experiment is like this: in a single vCPU VM, I ran a program
> > > allocating and referencing lots of pages (e.g., 100*1024) and didn't
> > > let the program terminate. Then, I checked the program's guest virtual
> > > addresses (GVAs) and GPAs through parsing its pagemap and maps files
> > > located at /proc/pid/pagemap and /proc/pid/maps, respectively. At
> > > last, in the host OS, I checked the vCPU's pagemap and maps files to
> > > find the program's HVAs and host physical addresses (HPAs); I actually
> > > checked the new allocated physical pages in the host OS after the
> > > program was executed in the guest OS.
> > >
> > > With the above experiment, I found GPAs of the program are different
> > > from its corresponding HVAs. BTW, Intel EPT and other related Intel
> > > virtualization techniques were enabled.
> > >
> > > Thanks,
> > > Harry
> > >
> > The fundamental reason is that some HVAs (e.g. QEMU's virtual memory addresses) are already allocated
> > for QEMU's own use (e.g. QEMU code/heap/etc.) prior to the guest starting up.
> >
> > KVM does, though, use quite an efficient way of mapping HVAs to GPAs. It uses an array of arbitrarily sized HVA areas
> > (which we call memslots) and for each such area/memslot you specify the GPA to map to. In theory QEMU
> > could allocate the whole guest's memory in one contiguous area and map it as a single memslot to the guest.
> > In practice there are MMIO holes, and various other reasons why there will be more than 1 memslot.
> 
> It is still not clear to me why GPAs are not the same as the
> corresponding HVAs in my experiment. Since two-dimensional paging
> (Intel EPT) is used, GPAs should be the same as their corresponding
> HVAs. Otherwise, I think EPT may not work correctly. What do you
> think?

No, the guest physical address space is not intrinsically tied to the host
virtual address space.  The fact that GPAs and HVAs are related in KVM is a
property of KVM's architecture.  EPT/NPT has absolutely nothing to do with HVAs.

As Maxim pointed out, KVM links a guest's physical address space, i.e. GPAs, to
the host's virtual address space, i.e. HVAs, via memslots.  For all intents and
purposes, this is an extra layer of address translation that is purely software
defined.  The memslots allow KVM to retrieve the HPA for a given GPA when
servicing a shadow page fault (a.k.a. EPT violation).
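
To make the memslot layer concrete, here is a toy model in Python.  It is only
a sketch, not KVM's code: the Memslot class, the gpa_to_hva helper and the
example addresses are all made up for illustration.  The three fields mirror
what userspace passes in KVM_SET_USER_MEMORY_REGION (guest_phys_addr,
memory_size, userspace_addr).  The point is that GPA -> HVA is plain
base-plus-offset arithmetic inside whichever slot contains the GPA, so there is
no reason for the two numbers to be equal.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Memslot:
    """Toy memslot (hypothetical type, not a KVM structure)."""
    base_gpa: int  # first guest physical address covered by the slot
    size: int      # size of the slot in bytes
    base_hva: int  # HVA in the QEMU process that backs base_gpa


def gpa_to_hva(slots: List[Memslot], gpa: int) -> Optional[int]:
    """Return the HVA backing gpa, or None if no slot covers it."""
    for slot in slots:
        if slot.base_gpa <= gpa < slot.base_gpa + slot.size:
            return slot.base_hva + (gpa - slot.base_gpa)
    return None  # e.g. an MMIO hole: no memory behind this GPA


# Made-up layout: 3 GiB below an MMIO hole, 1 GiB above 4 GiB.
slots = [
    Memslot(base_gpa=0x0, size=0xC0000000, base_hva=0x7F0000000000),
    Memslot(base_gpa=0x100000000, size=0x40000000, base_hva=0x7F00C0000000),
]

print(hex(gpa_to_hva(slots, 0x1000)))  # GPA 0x1000 lands at a very different HVA
```

Note how the second slot is contiguous with the first in HVA space but not in
GPA space; the hole between them simply has no slot.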

When EPT is enabled, a shadow page fault due to an unmapped GPA will look like:

 GVA -> [guest page tables] -> GPA -> EPT Violation VM-Exit

The above walk of the guest page tables is done in hardware.  KVM then does the
following walks in software to retrieve the desired HPA:

 GPA -> [memslots] -> HVA -> [host page tables] -> HPA
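
The software walk above can be sketched as follows.  Again this is a toy model
under stated assumptions, not what KVM literally does: MEMSLOTS, HOST_PT and
gpa_to_hpa are hypothetical names, the host page tables are flattened into a
single dict keyed by page number, and all addresses are invented.

```python
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1

# Toy memslots: (base_gpa, size, base_hva).  Values are made up.
MEMSLOTS = [(0x0, 0x10000, 0x7F0000000000)]

# Toy host page tables of the QEMU process: HVA page number -> HPA page number.
HOST_PT = {0x7F0000002000 >> PAGE_SHIFT: 0x1234}


def gpa_to_hpa(gpa):
    """GPA -> [memslots] -> HVA -> [host page tables] -> HPA, or None."""
    # Step 1: memslot lookup, pure base + offset arithmetic.
    for base_gpa, size, base_hva in MEMSLOTS:
        if base_gpa <= gpa < base_gpa + size:
            hva = base_hva + (gpa - base_gpa)
            break
    else:
        return None  # no memslot covers this GPA

    # Step 2: host page table lookup for the resulting HVA.
    hfn = HOST_PT.get(hva >> PAGE_SHIFT)
    if hfn is None:
        return None  # host has not backed this HVA with a page yet
    return (hfn << PAGE_SHIFT) | (hva & PAGE_MASK)


print(hex(gpa_to_hpa(0x2ABC)))
```

The HVA only shows up as an intermediate value in software; it never appears in
anything the hardware walks.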

KVM then takes the resulting HPA and shoves it into KVM's shadow page tables,
or when TDP (two-dimensional paging, i.e. EPT/NPT) is enabled, the EPT/NPT
page tables.  When the guest is run with TDP enabled, GVA->HPA translations
look like the following, with all walks done in hardware:

 GVA -> [guest page tables] -> GPA -> [extended/nested page tables] -> HPA
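
As a final illustration, the fault-then-retry flow can be modelled like this.
It is a deliberately simplified sketch: EptViolation, translate, guest_pt and
ept are hypothetical names, both sets of page tables are single-level dicts,
and the HPA installed on the fault is a made-up stand-in for the result of the
software walk.  (Real hardware also translates the addresses of the guest page
tables themselves through EPT, which is ignored here.)

```python
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class EptViolation(Exception):
    """Toy stand-in for an EPT Violation VM-Exit; carries the faulting GPA."""

    def __init__(self, gpa):
        super().__init__(hex(gpa))
        self.gpa = gpa


def translate(gva, guest_pt, ept):
    """Model of the hardware walk: GVA -> GPA -> HPA."""
    gfn = guest_pt[gva >> PAGE_SHIFT]            # guest page tables
    gpa = (gfn << PAGE_SHIFT) | (gva & PAGE_MASK)
    if gfn not in ept:                           # GPA not mapped in EPT
        raise EptViolation(gpa)
    return (ept[gfn] << PAGE_SHIFT) | (gpa & PAGE_MASK)


guest_pt = {0x400: 0x10}  # guest maps GVA page 0x400 to GPA page 0x10
ept = {}                  # EPT starts out empty

try:
    translate(0x400123, guest_pt, ept)
except EptViolation as fault:
    # KVM's software walk (memslots + host page tables) would produce the HPA
    # here; 0x9ABC is an invented host page number.
    ept[fault.gpa >> PAGE_SHIFT] = 0x9ABC

hpa = translate(0x400123, guest_pt, ept)  # retry now completes "in hardware"
print(hex(hpa))
```

Neither dict in this model is keyed or valued by an HVA, which is the sense in
which EPT has nothing to do with HVAs.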