[Libguestfs] [PATCH] RFC: OCaml tools: add and use a Getopt module

Pino Toscano ptoscano at redhat.com
Mon Jul 11 15:57:43 UTC 2016


On Monday, 27 June 2016 15:42:46 CEST Richard W.M. Jones wrote:
> On Fri, Jun 24, 2016 at 05:42:37PM +0200, Pino Toscano wrote:
> > Add a new Getopt module to mllib, to parse command line arguments with
> > handlers close to the ones used with Arg, but using getopt(3) (actually
> > getopt_long_only) to do the real parsing.  This allow us to provide
> > options for OCaml tools with a syntax similar to the C tools, and use
> > the additional features getopt offers and Arg does not.
> > 
> > Do a single-step conversion of Common_utils and all the OCaml tools to
> > the syntax of Getopt.
> > 
> > As side-change due to the conversion, extra arguments for sysprep
> > operation can have more keys for the same argument.
> 
> In general terms, it's a very good change which really improves the
> tools.
> 
> I have a few fairly minor issues below.  ACK if you can clean all of
> those up.

[...]

> > diff --git a/mllib/getopt-c.c b/mllib/getopt-c.c
> > new file mode 100644
> > index 0000000..d44448f
> > --- /dev/null
> > +++ b/mllib/getopt-c.c
> > @@ -0,0 +1,398 @@
> 
> > +value
> > +guestfs_int_mllib_getopt_parse (value argsv, value specsv, value anon_funv, value usage_msgv)
> 
> I'm not convinced that this function is safe against OCaml GC compaction.
> 
> In particular there are problems such as:
> 
> ...
> > +    for (j = 0; j < len; ++j) {
> > +      const char *key = String_val (Field (keysv, j));
> 
> key now points to a string on the OCaml heap, and then ...
> 
> > +        if (newopts == NULL)
> > +          caml_raise_out_of_memory ();
> > +        longopts = newopts;
> > +        longopts[longopts_len].name = key;
> 
> the same pointer is copied to longopts, but ...
> 
> > +    case 0:  /* Unit of (unit -> unit) */
> > +      do_call1 (Field (actionv, 0), Val_unit);
> 
> At this point you're calling an OCaml function which is likely to
> allocate, and could therefore call the GC, and could therefore compact
> the heap, which would move that string around, and make your pointer
> invalid.  (You could try adding `Gc.compact ()' to one of these
> callback functions -- I'm fairly sure at least some of the time you
> could get a segfault, and if not, valgrind wouldn't be happy).

I see, it fails that way indeed.  Fixed it by:
- copying long option strings on the heap (with the ugly cast in
  cleanup_option_list, but almost unavoidable)
- making sure that operations that trigger allocations are executed
  one-by-one (i.e. caml_copy_string out of direct function parameters,
  etc)

> > +external getopt_parse : string array -> (c_keys * spec * doc) array -> ?anon_fun:anon_fun -> usage_msg -> unit = "guestfs_int_mllib_getopt_parse"
> > +
> > +let parse_argv argv specs ?anon_fun usage_msg =
> 
> I feel this function could do some sanity checking on the inputs,
> especially the keys (but see my comment below).  If the sanity check
> fails, it should either assert or failwith, but definitely not
> continue.

Moved the sanity checks in the OCaml part.

> > +  let specs = specs @ [
> > +    (* Handled internally by getopt_parse. *)
> > +    [ "-h"; "-help"; "--help" ], Unit (fun () -> ()), s_"Display brief help";
> > +  ] in
> > +  let specs = List.map (
> > +    fun (keys, spec, doc) ->
> > +      (Array.of_list keys), spec, doc
> 
> You don't need parens here, since function application always binds tightest.

Removed, thanks (leftover of other code previously there).

> > +type spec =
> > +  | Unit of (unit -> unit)
> > +    (* Simple option with no argument; call the function. *)
> > +  | Set of bool ref
> > +    (* Simple option with no argument; set the reference to true. *)
> > +  | Clear of bool ref
> > +    (* Simple option with no argument; set the reference to false. *)
> 
> Does getopt_long create the '--no-X' options automatically?

No, it doesn't.

> > +  | String of string * (string -> unit)
> > +    (* Option requiring an argument; the first element in the tuple
> > +       is the documentation string of the argument, and the second
> > +       is the function to call. *)
> > +  | Set_string of string * string ref
> > +    (* Option requiring an argument; the first element in the tuple
> > +       is the documentation string of the argument, and the second
> > +       is the reference to be set. *)
> > +  | Int of string * (int -> unit)
> > +    (* Option requiring an integer value as argument; the first
> > +       element in the tuple is the documentation string of the
> > +       argument, and the second is the function to call. *)
> > +  | Set_int of string * int ref
> > +    (* Option requiring an integer value as argument; the first
> > +       element in the tuple is the documentation string of the
> > +       argument, and the second is the reference to be set. *)
> > +
> > +type keys = string list
> 
> I had a vague idea that you might make this more type safe by
> changing this type to:
> 
>   type optstring = S of char (** short option *) | L of string (** --long *)
>   and keys = optstring list
> 
> It requires a bunch of extra changes through the code, but also avoids
> needing to write horrible unsafe code like:
> 
>   if String.is_prefix arg "-" && not (String.is_prefix arg "--") then
> 
> You could also put optstring into a submodule so that just the L and S
> definitions can be imported into client modules without needing to
> import the whole of Getopt, so that makes the syntax quite brief:
> 
>   open Getopt.Optstring
>     ...
>   [ S'l'; L"list" ],  Getopt.Unit list_mode,  s_"List available templates";
> 
> or if you prefer:
> 
>   module O = Getopt.Optstring
>     ...
>   [ O.S'l'; O.L"list" ],  Getopt.Unit list_mode,  s_"List available templates";

I thought about this, and at the moment it feels to me a bit too
complicated.  I might revise that in the future though.

> > --- a/v2v/cmdline.ml
> > +++ b/v2v/cmdline.ml
> > @@ -164,55 +164,47 @@ let parse_cmdline () =
> >    and o_options =
> >      String.concat "|" (Modules_list.output_modules ()) in
> >  
> > -  let ditto = " -\"-" in
> >    let argspec = [
> > -    "-b",        Arg.String add_bridge,     "in:out " ^ s_"Map bridge 'in' to 'out'";
> > -    "--bridge",  Arg.String add_bridge,     "in:out " ^ ditto;
> > -    "--compressed", Arg.Set compressed,     " " ^ s_"Compress output file";
> > -    "--dcpath",  Arg.String (set_string_option_once "--dcpath" dcpath),
> > -                                            "path " ^ s_"Override dcPath (for vCenter)";
> > -    "--dcPath",  Arg.String (set_string_option_once "--dcPath" dcpath),
> > -                                            "path " ^ ditto;
> > -    "--debug-overlay",Arg.Set debug_overlays,
> > -    " " ^ s_"Save overlay files";
> > -    "--debug-overlays",Arg.Set debug_overlays,
> > -    ditto;
> > -    "-i",        Arg.String set_input_mode, i_options ^ " " ^ s_"Set input mode (default: libvirt)";
> > -    "-ic",       Arg.String (set_string_option_once "-ic" input_conn),
> > -                                            "uri " ^ s_"Libvirt URI";
> > -    "-if",       Arg.String (set_string_option_once "-if" input_format),
> > -                                            "format " ^ s_"Input format (for -i disk)";
> > -    "--in-place", Arg.Set in_place,         " " ^ s_"Only tune the guest in the input VM";
> > -    "--machine-readable", Arg.Set machine_readable, " " ^ s_"Make output machine readable";
> > -    "-n",        Arg.String add_network,    "in:out " ^ s_"Map network 'in' to 'out'";
> > -    "--network", Arg.String add_network,    "in:out " ^ ditto;
> > -    "--no-copy", Arg.Clear do_copy,         " " ^ s_"Just write the metadata";
> > -    "--no-trim", Arg.String no_trim_warning,
> > -                                            "-" ^ " " ^ s_"Ignored for backwards compatibility";
> > -    "-o",        Arg.String set_output_mode, o_options ^ " " ^ s_"Set output mode (default: libvirt)";
> > -    "-oa",       Arg.String set_output_alloc,
> > -                                            "sparse|preallocated " ^ s_"Set output allocation mode";
> > -    "-oc",       Arg.String (set_string_option_once "-oc" output_conn),
> > -                                            "uri " ^ s_"Libvirt URI";
> > -    "-of",       Arg.String (set_string_option_once "-of" output_format),
> > -                                            "raw|qcow2 " ^ s_"Set output format";
> > -    "-on",       Arg.String (set_string_option_once "-on" output_name),
> > -                                            "name " ^ s_"Rename guest when converting";
> > -    "-os",       Arg.String (set_string_option_once "-os" output_storage),
> > -                                            "storage " ^ s_"Set output storage location";
> > -    "--password-file", Arg.String (set_string_option_once "--password-file" password_file),
> > -                                            "file " ^ s_"Use password from file";
> > -    "--print-source", Arg.Set print_source, " " ^ s_"Print source and stop";
> > -    "--qemu-boot", Arg.Set qemu_boot,       " " ^ s_"Boot in qemu (-o qemu only)";
> > -    "--root",    Arg.String set_root_choice,"ask|... " ^ s_"How to choose root filesystem";
> > -    "--vdsm-image-uuid", Arg.String add_vdsm_image_uuid, "uuid " ^ s_"Output image UUID(s)";
> > -    "--vdsm-vol-uuid", Arg.String add_vdsm_vol_uuid, "uuid " ^ s_"Output vol UUID(s)";
> > -    "--vdsm-vm-uuid", Arg.String (set_string_option_once "--vdsm-vm-uuid" vdsm_vm_uuid),
> > -                                            "uuid " ^ s_"Output VM UUID";
> > -    "--vdsm-ovf-output", Arg.String (set_string_option_once "--vdsm-ovf-output" vdsm_ovf_output),
> > -                                            " " ^ s_"Output OVF file";
> > -    "--vmtype",  Arg.String vmtype_warning,
> > -                                            "- " ^ s_"Ignored for backwards compatibility";
> > +    [ "-b"; "--bridge" ],        Getopt.String ("in:out", add_bridge),     s_"Map bridge 'in' to 'out'";
> > +    [ "--compressed" ], Getopt.Set compressed,     s_"Compress output file";
> > +    [ "--dcpath"; "--dcPath" ],  Getopt.String ("path", set_string_option_once "--dcpath" dcpath),
> > +                                            s_"Override dcPath (for vCenter)";
> > +    [ "--debug-overlay"; "--debug-overlays" ], Getopt.Set debug_overlays, s_"Save overlay files";
> > +    [ "-i" ],        Getopt.String (i_options, set_input_mode), s_"Set input mode (default: libvirt)";
> > +    [ "-ic" ],       Getopt.String ("uri", set_string_option_once "-ic" input_conn),
> > +                                            s_"Libvirt URI";
> > +    [ "-if" ],       Getopt.String ("format", set_string_option_once "-if" input_format),
> > +                                            s_"Input format (for -i disk)";
> 
> I'm interested to know if these awkward "single dash long options"
> actually work now?  And the -o* ones below.

Thanks to getopt_long_only, they work (and thus the v2v test suite
passes).

Thanks for the thorough review, v2 coming in a moment.

-- 
Pino Toscano
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part.
URL: <http://listman.redhat.com/archives/libguestfs/attachments/20160711/07d5b81f/attachment.sig>


More information about the Libguestfs mailing list