On Wednesday, 23 November 2016 16:16:44 CET Tomáš Golembiovský wrote: > On Mon, 21 Nov 2016 16:21:02 +0100 > Pino Toscano <ptoscano redhat com> wrote: > > > On Saturday, 12 November 2016 16:37:52 CET Tomáš Golembiovský wrote: > > > We don't have to always extract all files from the OVA archive. The OVA, > > > as defined in the standard, is plain tar. We can work directly over the > > > tar archive if we use correct 'offset' and 'size' options when defining > > > the backing file for QEMU. > > > > > > This leads to improvements in speed and puts much lower requirement on > > > available disk space. > > > > > > Signed-off-by: Tomáš Golembiovský <tgolembi redhat com> > > > --- > > > v2v/input_ova.ml | 177 +++++++++++++++++++++++++++++++++++++++++++++++-------- > > > 1 file changed, 154 insertions(+), 23 deletions(-) > > > > > > diff --git a/v2v/input_ova.ml b/v2v/input_ova.ml > > > index f76fe82..7a3e27b 100644 > > > --- a/v2v/input_ova.ml > > > +++ b/v2v/input_ova.ml > > > @@ -39,17 +39,23 @@ object > > > > > > method source () = > > > > > > - let untar ?(format = "") file outdir = > > > - let cmd = [ "tar"; sprintf "-x%sf" format; file; "-C"; outdir ] in > > > + (* Untar part or all files from tar archive. If [path] is specified it is > > > + * a path in the tar archive. > > > + *) > > > + let untar ?(format = "") ?(path) file outdir = > > > > No need for parenthesis around "path". > > Also, tar supports extracting more files at the same time, so I'd > > change "path" into "paths", and just call this once later on ... > > You mean pass an array/list instead of single string? Yeah, why not. Right. > > > + let rec loop lines = > > > + match lines with > > > + |  -> raise Not_found > > > + | line :: lines -> ( > > > + (* Lines have the form: > > > + * block <offset>: <perms> <owner>/<group> <size> <mdate> <mtime> <file> > > > + *) > > > + let rex = Str.regexp "^block \\([0-9]+\\): \\([^ \\t]+\\) \\([^ \\t]+\\) +\\([^ \\t]+\\) \\([^ \\t]+\\) \\([^ \\t]+\\) \\(.+\\)$" in > > > > Whoa how unreadable :/ The alternative would be using String.nsplit, > > but it would play badly with filenames with spaces inside the archive. > > Maybe a faster alternative could be: > > a) find the 6th empty space, and pick the substring of all the rest > > of the string after it > > b) filter the lines by picking only the wanted file, checking using (a) > > c) once a single line is found, use String.nsplit and parse only parts > > 1 (removing the ':' suffix) and 4 > > > > It feels slightly more complex, but this way we can avoid running the > > regex until we find the right filename. > > The problem is there are multiple spaces used to align the column with > file sizes. Hence the complicated regexp. :( I see... in any case, I'd prefer to avoid running the regex on every line if possible. An idea could be improve our String.nsplit to ignore empty fields, and thus String.nsplit " " "foo bar" -> ["foo"; "bar"] -- Pino Toscano
Description: This is a digitally signed message part.