[edk2-devel] RFC: Fast Migration for SEV and SEV-ES - blueprint and proof of concept

Ashish Kalra ashish.kalra at amd.com
Thu Oct 29 17:06:38 UTC 2020


Hello Tobin,

On Wed, Oct 28, 2020 at 03:31:44PM -0400, Tobin Feldman-Fitzthum wrote:
> Hello,
> 
> Dov Murik, James Bottomley, Hubertus Franke, and I have been working on a
> plan for fast live migration of SEV and SEV-ES (and SEV-SNP when it's out
> and even hopefully Intel TDX) VMs. We have developed an approach that we
> believe is feasible and a demonstration that shows our solution to the most
> difficult part of the problem. In short, we have implemented a UEFI
> Application that can resume from a VM snapshot. We think this is the crux of
> SEV-ES live migration. After describing the context of our demo and how it
> works, we explain how it can be extended to a full SEV-ES migration. Our
> goal is to show that fast SEV and SEV-ES live migration can be implemented
> in OVMF with minimal kernel changes. We provide a blueprint for doing so.
> 
> Typically the hypervisor facilitates live migration. AMD SEV excludes the
> hypervisor from the trust domain of the guest. When a hypervisor (HV)
> examines the memory of an SEV guest, it will find only a ciphertext. If the
> HV moves the memory of an SEV guest, the ciphertext will be invalidated.
> Furthermore, with SEV-ES the hypervisor is largely unable to access guest
> CPU state. Thus, fast migration of SEV VMs requires support from inside the
> trust domain, i.e. the guest.
> 
> One approach is to add support for SEV Migration to the Linux kernel. This
> would allow the guest to encrypt/decrypt its own memory with a transport
> key. This approach has met some resistance. We propose a similar approach
> implemented not in Linux, but in firmware, specifically OVMF. Since OVMF
> runs inside the guest, it has access to the guest memory and CPU state. OVMF
> should be able to perform the manipulations required for live migration of
> SEV and SEV-ES guests.
> 
> The biggest challenge of this approach involves migrating the CPU state of
> an SEV-ES guest. In a normal (non-SEV) migration, the HV sets the CPU state
> of the target before the target begins executing. In our approach, the HV
> starts the target and OVMF must resume to whatever state the source was in.
> We believe this to be the crux (or at least the most difficult part) of live
> migration for SEV and we hope that by demonstrating resume from EFI, we can
> show that our approach is generally feasible.
> 
> Our demo can be found at <https://github.com/secure-migration>.
> The tooling repository is the best starting point. It contains documentation
> about the project and the scripts needed to run the demo. There are two more
> repos associated with the project. One is a modified edk2 tree that contains
> our modified OVMF. The other is a modified qemu, that has a couple of
> temporary changes needed for the demo. Our demonstration is aimed only at
> resuming from a VM snapshot in OVMF. We provide the source CPU state and
> source memory to the destination using temporary plumbing that violates the
> SEV trust model. We explain the setup in more depth in README.md. We are
> showing only that OVMF can resume from a VM snapshot. At the end we will
> describe our plan for transferring CPU state and memory from source to
> target. To be clear, the temporary tooling used for this demo isn't built for
> encrypted VMs, but below we explain how this demo applies to and can be
> extended to encrypted VMs.
> 
> We implemented our resume code in a very similar fashion to the recommended
> S3 resume code. When the HV sets the CPU state of a guest, it can do so when
> the guest is not executing. Setting the state from inside the guest is a
> delicate operation. There is no way to atomically set all of the CPU state
> from inside the guest. Instead, we must set most registers individually and
> account for changes in control flow that doing so might cause. We do this
> with a three-phase trampoline. OVMF calls phase 1, which runs on the OVMF
> map. Phase 1 sets up phase 2 and jumps to it. Phase 2 switches to an
> intermediate map that reconciles the OVMF map and the source map. Phase 3
> switches to the source map, restores the registers, and returns into
> execution of the source. We will go backwards through these phases in more
> depth.
> 
> The last thing that resume to EFI does is return. Specifically, we use
> IRETQ, which reads the values of RIP, CS, RFLAGS, RSP, and SS from a
> temporary stack and restores them atomically, thus returning to source
> execution. Prior to returning, we must manually restore most other registers
> to the values they had on the source. One particularly significant register
> is CR3. When we return to Linux, CR3 must be set to the source CR3 or the
> first instruction executed in Linux will cause a page fault. The code that
> we use to restore the registers and return must be mapped in the source page
> table or we would get a page fault executing the instructions prior to
> returning into Linux. The value of CR3 is so significant that it defines
> the three phases of the trampoline. Phase 3 begins when CR3 is set to the
> source CR3. After setting CR3, we set all the other registers and return.
> 
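The IRETQ frame described above can be pictured with a minimal sketch. In 64-bit mode IRETQ pops five 8-byte values in the order RIP, CS, RFLAGS, RSP, SS, lowest address first. The helper names and the register values below are hypothetical, chosen only for illustration; the sketch models the frame layout in Python rather than the demo's actual assembly.

```python
import struct

# Order in which IRETQ pops values in 64-bit mode, lowest address first.
IRETQ_FIELDS = ("rip", "cs", "rflags", "rsp", "ss")


def build_iretq_frame(state):
    """Pack the five values IRETQ consumes into a 40-byte little-endian frame."""
    return struct.pack("<5Q", *(state[name] for name in IRETQ_FIELDS))


def parse_iretq_frame(frame):
    """Decode a 40-byte frame back into named register values."""
    return dict(zip(IRETQ_FIELDS, struct.unpack("<5Q", frame)))


# Hypothetical source state; these values are assumptions, not taken from the demo.
source_state = {
    "rip": 0xFFFFFFFF81000000,
    "cs": 0x10,
    "rflags": 0x246,
    "rsp": 0xFFFFC90000003F00,
    "ss": 0x18,
}

frame = build_iretq_frame(source_state)
```

Everything not in this frame (general-purpose registers, CR3, and so on) has to be restored one register at a time before the final IRETQ, which is why the code doing so must already be mapped in the source page table.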
> Phase 2 mainly exists to set up phase 3. OVMF uses a 1-1 mapping, meaning
> that virtual addresses are the same as physical addresses. The kernel page
> table uses an offset mapping, meaning that virtual addresses differ from
> physical addresses by a constant (for the most part). Crucially, this means
> that the virtual address of the page that is executed by phase 3 differs
> between the OVMF map and the source map. If we are executing code mapped in
> OVMF and we change CR3 to point to the source map, although the page may be
> mapped in the source map, the virtual address will be different, and we will
> face undefined behavior. To fix this, we construct intermediate page tables
> that map the pages for phase 2 and 3 to the virtual address expected in OVMF
> and to the virtual address expected in the source map. Thus, we can switch
> CR3 from OVMF's map to the intermediate map and then from the intermediate
> map to the source map. Phase 2 is much shorter than phase 3. Phase 2 is
> mainly responsible for switching to the intermediate map, flushing the TLB,
> and jumping to phase 3.
> 
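The idea of the intermediate map can be sketched as follows: each trampoline page is mapped twice, once at its identity (OVMF) virtual address and once at its offset (source) virtual address, so the instruction stream stays valid across both CR3 switches. This is a toy model under stated assumptions. The dictionary stands in for real page tables, the function names are hypothetical, and the offset base is an assumed value rather than one read from the source kernel.

```python
def identity_va(pa):
    """OVMF-style 1-1 mapping: virtual address equals physical address."""
    return pa


def offset_va(pa, base):
    """Kernel-style offset mapping: virtual address is base + physical address."""
    return base + pa


def build_intermediate_map(trampoline_pages, base):
    """Map each trampoline page at both its OVMF VA and its source VA."""
    mapping = {}
    for pa in trampoline_pages:
        for va in (identity_va(pa), offset_va(pa, base)):
            # A VA may only ever resolve to one physical page.
            if mapping.get(va, pa) != pa:
                raise ValueError(f"conflicting mapping at VA {va:#x}")
            mapping[va] = pa
    return mapping


# Assumed values for illustration only.
HYPOTHETICAL_BASE = 0xFFFF888000000000
pages = [0x7F000000, 0x7F001000]  # hypothetical phase 2 and phase 3 pages

intermediate = build_intermediate_map(pages, HYPOTHETICAL_BASE)
```

With both aliases present, phase 2 can switch from the OVMF map to the intermediate map while running at the identity address, jump to the offset alias, and only then let phase 3 load the source CR3.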
> Fortunately, phase 1 is even simpler than phase 2. Phase 1 has two duties.
> First, since phase 2 and 3 operate without a stack and can't access values
> defined in OVMF (such as the addresses of the pages containing phase 2 and
> 3), phase 1 must pass these values to phase 2 by putting them in registers.
> Second, phase 1 must start phase 2 by jumping to it.
> 
> Given that we can resume to a snapshot in OVMF, we should be able to migrate
> an SEV guest as long as we can securely communicate the VM snapshot from
> source to destination. For our demo, we do this with a handful of QMP
> commands. More sophisticated methods are required for a production
> implementation.
> 
> When we refer to a snapshot, what we really mean is the device state,
> memory, and CPU state of a guest. In live migration this is transmitted
> dynamically as opposed to being saved and restored. Device state is not
> protected by SEV and can be handled entirely by the HV. Memory, on the other
> hand, cannot be handled only by the HV. As mentioned previously, memory
> needs to be encrypted with a transport key. A Migration Handler on the
> source will coordinate with the HV to encrypt pages and transmit them to the
> destination. The destination HV will receive the pages over the network and
> pass them to the Migration Handler in the target VM so they can be
> decrypted. This transmission will occur continuously until the memory of the
> source and target converges.
> 
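The convergence loop described above can be illustrated with a small simulation, assuming a simple pre-copy scheme: every page is sent once, then only pages the still-running source dirtied are re-sent, until a round ends with nothing dirty. All names here are hypothetical, and the XOR routine is only a placeholder for transport-key encryption; it is not secure and is not what a real Migration Handler would use.

```python
def xor_with_key(data, key):
    """Placeholder for transport encryption. XOR is NOT secure."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def migrate(source, guest_writes, key):
    """Copy pages from source to target until no page is dirty.

    source: dict of page number -> bytes (mutated as the guest keeps running)
    guest_writes: list of dicts, the pages the guest dirties during each round
    """
    target = {}
    dirty = set(source)  # the first round sends every page
    writes = iter(guest_writes)
    rounds = 0
    while dirty:
        rounds += 1
        for page in sorted(dirty):
            wire = xor_with_key(source[page], key)   # source handler encrypts
            target[page] = xor_with_key(wire, key)   # target handler decrypts
        # The guest keeps running on the source and dirties pages meanwhile.
        dirty = set()
        for page, data in next(writes, {}).items():
            source[page] = data
            dirty.add(page)
    return target, rounds


source = {0: b"aaaa", 1: b"bbbb", 2: b"cccc"}
guest_writes = [{1: b"BBBB"}, {}]  # page 1 is dirtied during the first round
target, rounds = migrate(source, guest_writes, key=b"\x5a\xa5")
```

In this run the first round sends all three pages and the second re-sends only page 1, after which source and target memory match. The hypervisor only ever handles the `wire` bytes in this model.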
> Plain SEV does not protect the CPU state of the guest and therefore does not
> require any special mechanism for transmission of the CPU state. We plan to
> implement an end-to-end migration with plain SEV first. In SEV-ES, the PSP
> (platform security processor) encrypts CPU state on each VMExit. The
> encrypted state is stored in memory. Normally this memory (known as the
> VMSA) is not mapped into the guest, but we can add an entry to the nested
> page tables that will expose the VMSA to the guest.

I have a question here: is there any kind of integrity protection on the
CPU state when the target VM is resumed after migration? For example, if
a malicious hypervisor maps a page with subverted CPU state into the
nested page tables, what prevents the target VM from resuming execution
with a subverted or compromised CPU state?

Thanks,
Ashish

> This means that when the
> guest VMExits, the CPU state will be saved to guest memory. With the CPU
> state in guest memory, it can be transmitted to the target using the method
> described above.
> 
> In addition to the changes needed in OVMF to resume the VM, the transmission
> of the VM from source to target will require a new code path in the
> hypervisor. There will also need to be a few minor changes to Linux (adding
> a mapping for our Phase 3 pages). Despite all the moving pieces, we believe
> that this is a feasible approach for supporting live migration for SEV and
> SEV-ES.
> 
> For the sake of brevity, we have left out a few issues, including SMP
> support, generation of the intermediate mappings, and more. We have included
> some notes about these issues in the COMPLICATIONS.md file. We also have an
> outline of an end-to-end implementation of live migration for SEV-ES in
> END-TO-END.md. See README.md for info on how to run the demo. While this is
> not a full migration, we hope to show that fast live migration with SEV and
> SEV-ES is possible without major kernel changes.
> 
> -Tobin
> 


View/Reply Online (#66766): https://edk2.groups.io/g/devel/message/66766