[Libguestfs] [PATCH v2 4/5] v2v: ova: don't extract files from OVA if it's not needed

Thu Nov 24 15:47:49 UTC 2016

On Wednesday, 23 November 2016 16:16:44 CET Tomáš Golembiovský wrote:
> On Mon, 21 Nov 2016 16:21:02 +0100
> Pino Toscano <ptoscano at redhat.com> wrote:
> 
> > On Saturday, 12 November 2016 16:37:52 CET Tomáš Golembiovský wrote:
> > > We don't have to always extract all files from the OVA archive. The OVA,
> > > as defined in the standard, is plain tar. We can work directly over the
> > > tar archive if we use correct 'offset' and 'size' options when defining
> > > the backing file for QEMU.
> > > 
> > > This leads to improvements in speed and puts much lower requirement on
> > > available disk space.
> > > 
> > > Signed-off-by: Tomáš Golembiovský <tgolembi at redhat.com>
> > > ---
> > >  v2v/input_ova.ml | 177 +++++++++++++++++++++++++++++++++++++++++++++++--------
> > >  1 file changed, 154 insertions(+), 23 deletions(-)
> > > 
> > > diff --git a/v2v/input_ova.ml b/v2v/input_ova.ml
> > > index f76fe82..7a3e27b 100644
> > > --- a/v2v/input_ova.ml
> > > +++ b/v2v/input_ova.ml
> > > @@ -39,17 +39,23 @@ object
> > >  
> > >    method source () =
> > >  
> > > -    let untar ?(format = "") file outdir =
> > > -      let cmd = [ "tar"; sprintf "-x%sf" format; file; "-C"; outdir ] in
> > > +    (* Untar part or all files from tar archive. If [path] is specified it is
> > > +     * a path in the tar archive.
> > > +     *)
> > > +    let untar ?(format = "") ?(path) file outdir =  
> > 
> > No need for parenthesis around "path".
> > Also, tar supports extracting more files at the same time, so I'd
> > change "path" into "paths", and just call this once later on ...
> 
> You mean pass an array/list instead of single string? Yeah, why not.

Right.

> > > +      let rec loop lines =
> > > +        match lines with
> > > +        | [] -> raise Not_found
> > > +        | line :: lines -> (
> > > +          (* Lines have the form:
> > > +           * block <offset>: <perms> <owner>/<group> <size> <mdate> <mtime> <file>
> > > +           *)
> > > +          let rex = Str.regexp "^block \\([0-9]+\\): \\([^ \\t]+\\) \\([^ \\t]+\\) +\\([^ \\t]+\\) \\([^ \\t]+\\) \\([^ \\t]+\\) \\(.+\\)$" in  
> > 
> > Whoa how unreadable :/ The alternative would be using String.nsplit,
> > but it would play badly with filenames with spaces inside the archive.
> > Maybe a faster alternative could be:
> > a) find the 6th empty space, and pick the substring of all the rest
> >    of the string after it
> > b) filter the lines by picking only the wanted file, checking using (a)
> > c) once a single line is found, use String.nsplit and parse only parts
> >    1 (removing the ':' suffix) and 4
> > 
> > It feels slightly more complex, but this way we can avoid running the
> > regex until we find the right filename.
> 
> The problem is there are multiple spaces used to align the column with
> file sizes. Hence the complicated regexp. :(

I see... in any case, I'd prefer to avoid running the regex on every
line if possible. An idea could be improve our String.nsplit to ignore
empty fields, and thus
  String.nsplit " " "foo  bar" -> ["foo"; "bar"]

-- 
Pino Toscano
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part.
URL: <http://listman.redhat.com/archives/libguestfs/attachments/20161124/81ec5cc2/attachment.sig>