[Libguestfs] Extracting files from OVA is bad

Fri Sep 9 12:08:31 UTC 2016

On Fri, Sep 09, 2016 at 01:03:49PM +0200, Tomáš Golembiovský wrote:
> Hi,
> 
> recently we (oVirt) have started discussing whether the way virt-v2v
> handles import from OVA files is good, and I would be interested in
> ideas on how it can be improved. It is likely somebody has already
> given some thought to this problem.
> 
> TL;DR: Extracting the OVA before import is a problem for large VMs
> (TBs in size). Can we change something to avoid the extraction and
> work directly on the OVA?
> 
> 
> What we consider a huge shortcoming is the fact that the whole OVA is
> extracted into a temporary directory prior to the import and processed
> afterwards. Under normal circumstances the user can end up with up to
> three copies of the VM on the drive at the end of the import:
> 
>   * original OVA,
>   * temporary extracted files (deleted when virt-v2v terminates),
>   * converted VM.
> 
> 
> This is not a good idea for large VMs that are hundreds of GBs or even
> TBs in size. The storage requirements can be lessened with proper
> partitioning, i.e. the source OVA and the converted VM don't end up on
> the same drive and TMPDIR is set to put the temporary files somewhere
> else. But this is not a general solution, and sometimes the necessary
> space may not be available at all.
> 
> 
> The question is how to change the import path so that virt-v2v doesn't
> have to extract the OVA. I can see the following solutions:
> 
>  1) Solve it in virt-v2v: create a layer for directly accessing the
>     files in the archive.
> 
>  2) Solve it in QEMU: create a backing method that would allow
>     creating a QEMU disk backed by the archive.
> 
>  3) Solve it on the oVirt side: use some FUSE-based tool to provide
>     access to the archive and pass the OVA to virt-v2v not as a file
>     but as a directory.
> 
> 
> Does anyone have any other ideas or suggestions?

Consider using virt-v2v --in-place, and doing your own import in
whatever way is most appropriate for your scenarios.

This is what we do in Virtuozzo to import VMs created in the previous
version (based on a proprietary hypervisor) into Virtuozzo 7
(QEMU/KVM-based).

More specifically,

1) the target VM configuration is created from the source VM
   configuration by Virtuozzo code

2) we have two scenarios for the disk images:

   a) with data copying from a host running Virtuozzo 6 to a host
      running Virtuozzo 7.

      In this case new qcow2 images are created and the data is
      migrated by Virtuozzo code; the images are then attached directly
      to the new VM, which is safe because the original images remain
      on the source host.  The new images are modified during
      v2v --in-place and then used by the new VM.

   b) without data copying (e.g. if shared storage is used).

      In this case qcow2 overlays are created with the original images
      as backing files, and attached to the new VM.  Once v2v --in-place
      is complete and the new VM starts its new life, the original
      images can be lazily merged into the new qcow2 files or just left
      as is.
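
To make scenario (b) concrete, here is a minimal sketch of how such an
overlay could be created with qemu-img.  The helper name
build_overlay_command, the example paths and the default backing format
are assumptions for illustration only; this is not the actual Virtuozzo
code, and the right backing format depends on the source images.

```python
def build_overlay_command(backing_path, overlay_path, backing_format="raw"):
    """Build a qemu-img command that creates a qcow2 overlay.

    The overlay records only the blocks that get written, so the
    original image (the backing file) is never modified.
    backing_format is an assumption; it must match the real format
    of the original image.
    """
    return [
        "qemu-img", "create",
        "-f", "qcow2",          # format of the new overlay
        "-b", backing_path,     # original image used as backing file
        "-F", backing_format,   # format of the backing file
        overlay_path,
    ]


# Hypothetical paths, for illustration only.
cmd = build_overlay_command("/shared/old-vm/disk0.img",
                            "/shared/new-vm/disk0.qcow2")
print(" ".join(cmd))
```

The command list can be handed to subprocess.run(cmd, check=True) on a
host that has qemu-img installed.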

I think something like this can be done for your use case, e.g. you can
create qcow2 images with images from the archive as backing files.  One
way to do so is to set up a loopback block device on top of the archive
with the appropriate offset and length (OVAs are just tarballs, aren't
they?  So every archive member is just a contiguous chunk of data
within the file.)  Once the conversion is complete and the imported VM
has started, you can merge all the data into the new images and get rid
of the source tarball.
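
As a rough sketch of the offset/length idea, assuming the OVA is an
uncompressed tar without sparse members, Python's tarfile module can
report where each member's data starts and how long it is.  The helper
names member_extents and losetup_command are made up for this example;
the losetup options used are the util-linux ones.

```python
import tarfile


def member_extents(ova_path):
    """Return {member name: (data offset, size)} for regular files.

    Assumes an uncompressed tar ("r:" refuses compressed input), so
    each member's data is one contiguous byte range in the archive.
    """
    extents = {}
    with tarfile.open(ova_path, "r:") as tar:
        for info in tar:
            if info.isfile():
                extents[info.name] = (info.offset_data, info.size)
    return extents


def losetup_command(ova_path, offset, size):
    """Build a losetup command exposing one member as a read-only
    loop device; losetup prints the device name it picked."""
    return [
        "losetup", "--find", "--show", "--read-only",
        "--offset", str(offset),
        "--sizelimit", str(size),
        ova_path,
    ]
```

The resulting loop device could then serve as the backing file of a new
qcow2 image.  Whether that works directly depends on the format of the
disk images inside the archive, which this sketch does not check.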

Roman.