[libvirt] [RFC][PATCH]: Secure migration

Fri Mar 20 13:43:40 UTC 2009

Daniel Veillard wrote:
> On Mon, Mar 16, 2009 at 04:26:58PM +0100, Chris Lalancette wrote:
>> All,
>>      Attached is a *very* rough first draft of the secure migration code I'm
>> working on.  This is in no way ready for merge.  That being said, this
>> demonstrates the basic idea that I'm pursuing, and I've actually been able to
>> perform a KVM secure live migrate using this.  Before I go and finish polishing
>> it up, though, I wanted to make sure there wasn't anything fundamentally wrong
>> with the approach.  So, in that vein, comments are appreciated.
> 
>> diff --git a/qemud/remote_protocol.h b/qemud/remote_protocol.h
>> index 75def5e..d97a18b 100644
>> --- a/qemud/remote_protocol.h
>> +++ b/qemud/remote_protocol.h
>> @@ -41,6 +41,8 @@ typedef remote_nonnull_string *remote_string;
>>  #define REMOTE_SECURITY_MODEL_MAX VIR_SECURITY_MODEL_BUFLEN
>>  #define REMOTE_SECURITY_LABEL_MAX VIR_SECURITY_LABEL_BUFLEN
>>  #define REMOTE_SECURITY_DOI_MAX VIR_SECURITY_DOI_BUFLEN
>> +#define REMOTE_CONNECT_SECURE_MIGRATION_DATA_MAX 65536
>> +#define REMOTE_CONNECT_SECURE_MIGRATION_COOKIE_MAX 65536
>>  
>>  typedef char remote_uuid[VIR_UUID_BUFLEN];
>>  
> 
>   Okay, I have tried to think about this again. From the earlier code
> fragment and the discussions on IRC, performance is tolerable, but there
> are a lot of costs related to the 64KB chunking imposed by the XML-RPC.

Just so others are clear on what this means:

After doing a little bugfixing on the version of the code I posted, I did some
performance measurements with a guest using 3.4GB of memory.  A standard
migration, direct qemu->qemu, took somewhere between 30 seconds and a minute to
complete this.  The encrypted version took anywhere between 1 minute and 3
minutes to complete the migration, a slowdown of between 1.5 and 3 times.  I'll
have to do some more testing to get more solid numbers.
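
To put those timings in perspective, here is a rough conversion to effective throughput. It is only a sketch: it assumes 3.4GB means 3.4 * 10^9 bytes and that the guest memory crosses the wire exactly once, which a live migration that re-sends dirtied pages will not strictly do.

```python
# Back-of-the-envelope throughput for the timings quoted above.
# Assumption: exactly 3.4e9 bytes are transferred, once.
GUEST_BYTES = 3.4e9


def throughput_mb_per_s(seconds: float) -> float:
    """Effective throughput in MB/s (1 MB = 10**6 bytes)."""
    return GUEST_BYTES / seconds / 1e6


# (fastest, slowest) observed durations in seconds, from the text above.
timings = {
    "direct qemu->qemu": (30, 60),
    "encrypted": (60, 180),
}

for name, (fastest, slowest) in timings.items():
    low = throughput_mb_per_s(slowest)
    high = throughput_mb_per_s(fastest)
    print(f"{name}: {low:.0f} to {high:.0f} MB/s")
```

The ranges overlap at the one-minute mark, which is another reason more solid numbers are needed.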

>   It is probably acceptable for a class of users who really want
> encryption of their data but I would like to make sure we don't close
> the door for a possibly more performant implementation.

Yes, that's a good point.

>   To reopen the discussion we had before about opening a separate
> encrypted connection: this would have a number of potential
> improvements over the XML-RPC:
>    - no chunking, far less context-switching (it would be good to know
>      how much of the excess time spent in the secure migration is data
>      encoding, and how much is overall system burden)

Well, as DanB points out later, there is still chunking, but we get rid of the
RPC overhead of a reply to every packet.
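
A rough model of why the per-packet reply matters, as a sketch only. The 64KB chunk size comes from REMOTE_CONNECT_SECURE_MIGRATION_DATA_MAX in the patch; the guest size reuses the 3.4GB figure, and the round-trip time passed in is a made-up illustrative value, not a measurement.

```python
CHUNK_BYTES = 65536       # REMOTE_CONNECT_SECURE_MIGRATION_DATA_MAX in the patch
GUEST_BYTES = 3_400_000_000  # assumption: 3.4e9 bytes, sent exactly once


def chunk_count(total_bytes: int, chunk_bytes: int = CHUNK_BYTES) -> int:
    """Number of chunks needed, rounding the last partial chunk up."""
    return -(-total_bytes // chunk_bytes)  # integer ceiling division


def reply_wait_seconds(total_bytes: int, round_trip_s: float) -> float:
    """Time spent idle if the sender waits for a reply after every chunk."""
    return chunk_count(total_bytes) * round_trip_s


if __name__ == "__main__":
    rtt = 0.0005  # hypothetical 0.5 ms round trip
    print(chunk_count(GUEST_BYTES), "chunks")
    print(f"{reply_wait_seconds(GUEST_BYTES, rtt):.1f} s waiting on replies")
```

A streaming call that sends chunks without waiting for individual replies would still pay the chunking cost but not the reply wait modelled here.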

>   My main concern is keeping a port open in the firewall for the
> incoming connection carrying the encrypted data, and I wonder if it's
> really necessary. Since the receiver and the sender can already
> communicate via the XML-RPC, maybe something like STUN (for UDP),
> where both ends simultaneously open a new connection to the other side,
> might work, and that could be coordinated via the XML-RPC (passing the
> newly opened port, etc.). The point is that firewalls usually block only
> incoming connections to non-registered ports, while outgoing connections
> are allowed through. I have no idea if this can be made to work, though.
>   In general I would like to make sure we have room in the initial phase
> to add such a negotiation, where an optimal solution may be attempted,
> possibly falling back to a normal XML-RPC solution like this one.
> Basically, make sure we can try to be as efficient as possible and
> allow the protocol to evolve, but fall back to XML-RPC encapsulation
> if that initial round fails.

I'm not so sure I agree with this.  If we have a reasonably generic solution, I
think the admin would prefer to know up-front that the most performant solution
is misconfigured somehow, rather than libvirt silently falling back to a less
optimal solution.  I think we should work on finding a solution that provides
the encryption and has reasonably good performance, and just stick with one
solution.
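
The two policies being debated can be sketched as a single selection routine with a switch. Everything here is hypothetical (the transport names, the TransportError class, the stub connect functions); it is not libvirt code, only an illustration of silent fallback versus failing up-front.

```python
from typing import Callable, List, Sequence, Tuple


class TransportError(Exception):
    """Raised when a (hypothetical) migration transport cannot be set up."""


Transport = Tuple[str, Callable[[], None]]


def choose_transport(candidates: Sequence[Transport], allow_fallback: bool) -> str:
    """Return the name of the first transport that connects.

    With allow_fallback=False only the preferred (first) candidate is
    tried, so a misconfiguration is reported instead of being hidden.
    """
    errors: List[str] = []
    for name, connect in candidates:
        try:
            connect()
            return name
        except TransportError as exc:
            errors.append(f"{name}: {exc}")
            if not allow_fallback:
                break
    raise TransportError("; ".join(errors))


def direct_tls() -> None:
    # Stub: pretend the firewall blocks the extra port.
    raise TransportError("connection refused (port blocked?)")


def rpc_tunnel() -> None:
    # Stub: the existing libvirtd connection always works.
    return None


CANDIDATES = [("direct-tls", direct_tls), ("rpc-tunnel", rpc_tunnel)]
```

With allow_fallback=True the call quietly returns "rpc-tunnel"; with allow_fallback=False it raises and names the failed transport, which is the behaviour argued for above.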

Early next week I'm going to do some additional work to try to get DanB's
suggestion of the STREAM_DATA RPC working.  Then I'll try benchmarking
(both for duration and CPU usage):

1)  Original, unencrypted migration as a baseline
2)  Direct TLS->TLS stream migration; this requires a new port in the firewall,
but will be another good data point, and I already have the code for this
sitting around
3)  The solution I posted here
4)  The STREAM_DATA solution

Assuming I can get some stable numbers out of those tests, it should at least
give us the information we need to make a good decision.
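
A minimal sketch of a harness for that kind of comparison, recording wall-clock duration and CPU time per run. The run callables are placeholders; a real test would trigger each migration variant. Note also that time.process_time() only counts CPU used by the measuring process itself, so CPU consumed by libvirtd and qemu on either host would have to be collected separately.

```python
import time
from typing import Callable, Dict


def benchmark(label: str, run: Callable[[], None], repeats: int = 3) -> Dict[str, object]:
    """Run `run` several times; report best wall time and mean CPU time."""
    wall_times = []
    cpu_times = []
    for _ in range(repeats):
        wall_start = time.perf_counter()
        cpu_start = time.process_time()
        run()
        wall_times.append(time.perf_counter() - wall_start)
        cpu_times.append(time.process_time() - cpu_start)
    return {
        "label": label,
        "runs": repeats,
        "wall_s": min(wall_times),
        "cpu_s": sum(cpu_times) / repeats,
    }


if __name__ == "__main__":
    # Placeholder workload standing in for one migration variant.
    result = benchmark("placeholder", lambda: sum(range(100_000)))
    print(result)
```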

-- 
Chris Lalancette