[Libguestfs] [PATCH 0/3] [FOR COMMENTS ONLY] Rework inspection.

Wed Dec 2 22:05:27 UTC 2015

This is something I've been working on: Reworking inspection so it's
not a big mess of ad hoc C code, but instead uses a well-defined
domain-specific language to describe how we inspect guests.

The best introduction to this is the manual page, which I include
below (it's also included in patch 2/3).

Rich.

----------------------------------------------------------------------
NAME

    guestfs-inspection - guestfs inspection program

SYNOPSIS

     guestfs-inspection

NOTE

    This man page documents the guestfs inspection program. If you want to
    read about guestfs inspection then this is the wrong place. See
    "INSPECTION" in guestfs(3) instead.

DESCRIPTION

    guestfs-inspection is a standalone program that performs inspection on
    the local disks, to find out what operating system(s) are installed. It
    normally runs inside the libguestfs appliance, started by guestfsd(8),
    when the caller uses the guestfs_inspect_os API (see
    guestfs-internals(1) and "guestfs_inspect_os" in guestfs(3)). You
    should never need to run this program by hand.

    The program looks at all disks attached to the appliance, looking for
    filesystems that might belong to operating systems. It may mount these
    temporarily to examine them for Linux configuration files, Windows
    Registries, and so on. It then tries to determine what operating
    system(s) are installed on the disks. It is able to detect many
    different Linux distributions, Windows, BSD, and others. The currently
    mounted root filesystem is ignored, since when running under
    libguestfs, that filesystem is part of the libguestfs appliance (this
    is the main difference compared to programs like facter).

    guestfs-inspection is written in C, but most of the C is generated by a
    rules compiler from a set of inspection rules written in a more
    compact, declarative, Prolog-inspired language. If you want to write or
    modify the rules, see "WRITING RULES" below.

OPTIONS

    -?

    --help

      Display brief help.

    -v

    --verbose

      Enable verbose messages for debugging.

WRITING RULES

    Inspection is performed according to a set of rules written in a
    compact, declarative, Prolog-inspired language. This section explains
    how this language works, so you can write your own rules to detect
    other operating systems.

    The rules can be found starting in inspection/inspection.rules (in the
    libguestfs sources). The rules are compiled down to C and linked into
    the guestfs-inspection program, together with a bit of extra C code to
    provide runtime support.

 Facts

    Facts are what we try to determine about the operating system(s) we are
    inspecting. They look like this:

     Filesystem("/dev/sda1")

    which means "/dev/sda1 is a filesystem".

     File("/dev/sda1", "/etc/fstab")

    which means "there exists a file called /etc/fstab on the /dev/sda1
    filesystem".

    Facts come in three flavours: true facts, false facts, and unknown
    facts. False facts are written like this:

     ! File("/dev/sda1", "/etc/fstab")

    which means "either /dev/sda1 is not a filesystem or there does not
    exist a file called /etc/fstab on this filesystem".

    Unknown facts are facts that have not yet been determined to be either
    true or false.

 Rules

    Rules are used to generate more facts. A simple rule for generating
    File facts might look like this:

     File(fs, filename) :-
         Filesystem(fs),
         {{
           // some C code to mount 'fs' and check for 'filename'
         }}.

    You can read this as: "For all fs & filename, if fs is a filesystem,
    and running the C code with parameters fs and filename returns true,
    then File(fs, filename) is a true fact".

    In the Prolog-inspired language, a comma (,) is the AND operator. A
    semicolon (;) is the OR operator. :- is a backwards if-statement (the
    condition is on the right, the conclusion is on the left). Also notice
    the dot (.) which must follow each rule.

    Uppercase identifiers are facts. Lowercase identifiers are variables.
    All identifiers are case-sensitive.

    Everything in {{ ... }} is embedded C code. In this case the C code
    returns a true/false/error indication, but embedded C code can also do
    more complicated things and return strings and lists as we'll see
    later.

    You can use parentheses (...) for grouping expressions on the right
    hand side of the :- operator.

 Program evaluation

    Let's take a simple set of rules which you might use to detect a Fedora
    root filesystem:

     File(fs, filename) :-
         Filesystem(fs),
         {{
           // some C code to mount 'fs' and check for 'filename'
         }}.

     Fedora(rootfs) :-
         Filesystem(rootfs),
         File(rootfs, "/etc/fedora-release").

    When evaluating this program, there are two sets of facts, the true
    facts and the false facts. Let's start with the false facts set being
    empty, and let's seed the true facts set with some Filesystem facts:

     true_facts = { Filesystem("/dev/sda1"), Filesystem("/dev/sda3") }
     false_facts = {  } // empty set

    Unknown facts are facts which don't appear in either set.

    Evaluating the program works like this: We consider each rule in turn,
    and see if we can find new true or false facts from it. These new facts
    are added to the true or false facts sets. After looking at each rule
    in the program, as long as at least one new fact was added to the true
    facts set, we go back to the start of the rules and repeat. We do
    this until we can no longer add any new true facts, and then we're
    done.

    In the case of this program, we start with the File rule, and we
    substitute (theoretically) every possible string for fs and filename.

    For example, this substitution:

     File("/dev/sda1", "/etc/fedora-release") :-
         Filesystem("/dev/sda1"),
         {{ // checks for file and returns false }}.

    turns out to be false (because the C code doesn't find
    /etc/fedora-release in /dev/sda1), so that yields a new false fact:

     ! File("/dev/sda1", "/etc/fedora-release")

    But this substitution turns out to be true:

     File("/dev/sda3", "/etc/fedora-release") :-
         Filesystem("/dev/sda3"),
         {{ // checks for file and returns true }}.

    so that yields a new true fact:

     File("/dev/sda3", "/etc/fedora-release")

    In theory every possible string is tried, e.g. File("aardvark",
    "foo123654"). That would take literally forever to run, but luckily the
    rules compiler is smarter.

    Looking now at the second rule, we try this substitution:

     Fedora("/dev/sda3") :-
         Filesystem("/dev/sda3"),
         File("/dev/sda3", "/etc/fedora-release").

    which yields another new true fact:

     Fedora("/dev/sda3")

    Because we added several new true facts to the set, we go back and
    repeat the whole process. But after trying all the rules for a second
    time, no more true facts can be added, so now we're done.

    At the end, the set of true facts is:

     true_facts = { Filesystem("/dev/sda1"), Filesystem("/dev/sda3"),
                    File("/dev/sda3", "/etc/fedora-release"),
                    Fedora("/dev/sda3") }

    We don't care about the false facts -- they are discarded at the end of
    the program.

    The summary of inspection is that /dev/sda3 contains a Fedora root
    filesystem.

    Of course real inspection is much more complicated than this, but the
    same program evaluation order is followed.
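
    The evaluation loop described in this section can be sketched in plain
    C. This is only an illustration, not the code that the rules compiler
    generates: the string encoding of facts, the fixed-size fact_set, the
    hand-written rule_file and rule_fedora functions, and the hard-coded
    file_exists_on stand-in for the embedded C code are all assumptions
    made so that the sketch runs without real disks.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_FACTS 64
#define RELEASE_FILE "/etc/fedora-release"

/* Facts are stored in printed form, e.g. Fedora("/dev/sda3").  This
 * string encoding is an assumption made to keep the sketch short. */
struct fact_set {
  char facts[MAX_FACTS][128];
  size_t len;
};

static bool
set_has (const struct fact_set *s, const char *fact)
{
  for (size_t i = 0; i < s->len; ++i)
    if (strcmp (s->facts[i], fact) == 0)
      return true;
  return false;
}

/* Add a fact.  Returns true only if the fact was not already present. */
static bool
set_add (struct fact_set *s, const char *fact)
{
  if (set_has (s, fact) || s->len == MAX_FACTS)
    return false;
  snprintf (s->facts[s->len++], sizeof s->facts[0], "%s", fact);
  return true;
}

/* Extract X from Filesystem("X").  Returns true if 'fact' has that form. */
static bool
parse_filesystem (const char *fact, char fs[64])
{
  return sscanf (fact, "Filesystem(\"%63[^\"]\")", fs) == 1;
}

/* Hypothetical stand-in for the embedded C code that mounts 'fs' and
 * checks for 'filename'.  Hard-coded to match the worked example. */
static bool
file_exists_on (const char *fs, const char *filename)
{
  return strcmp (fs, "/dev/sda3") == 0 && strcmp (filename, RELEASE_FILE) == 0;
}

/* File(fs, filename) :- Filesystem(fs), {{ ... }}. */
static bool
rule_file (struct fact_set *t, struct fact_set *f)
{
  bool added = false;
  size_t n = t->len;            /* snapshot: the set grows as we add */

  for (size_t i = 0; i < n; ++i) {
    char fs[64], fact[128];
    if (!parse_filesystem (t->facts[i], fs))
      continue;
    snprintf (fact, sizeof fact, "File(\"%s\", \"%s\")", fs, RELEASE_FILE);
    if (set_has (t, fact) || set_has (f, fact))
      continue;                 /* already known: do not re-run the C code */
    if (file_exists_on (fs, RELEASE_FILE))
      added |= set_add (t, fact);
    else
      set_add (f, fact);
  }
  return added;
}

/* Fedora(rootfs) :- Filesystem(rootfs), File(rootfs, RELEASE_FILE). */
static bool
rule_fedora (struct fact_set *t, struct fact_set *f)
{
  bool added = false;
  size_t n = t->len;

  (void) f;                     /* this rule never yields false facts here */
  for (size_t i = 0; i < n; ++i) {
    char fs[64], fact[128];
    if (!parse_filesystem (t->facts[i], fs))
      continue;
    snprintf (fact, sizeof fact, "File(\"%s\", \"%s\")", fs, RELEASE_FILE);
    if (!set_has (t, fact))
      continue;
    snprintf (fact, sizeof fact, "Fedora(\"%s\")", fs);
    added |= set_add (t, fact);
  }
  return added;
}

typedef bool (*rule_fn) (struct fact_set *, struct fact_set *);
static const rule_fn rules[] = { rule_file, rule_fedora };

/* Consider every rule in turn and repeat for as long as a pass adds at
 * least one new true fact.  Returns the number of passes made. */
int
evaluate (struct fact_set *t, struct fact_set *f)
{
  int passes = 0;
  bool added;

  do {
    added = false;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; ++i)
      if (rules[i] (t, f))
        added = true;
    ++passes;
  } while (added);
  return passes;
}
```

    Seeding the true set with Filesystem("/dev/sda1") and
    Filesystem("/dev/sda3") and calling evaluate reproduces the walk-through
    above: the first pass adds the File and Fedora facts for /dev/sda3, and
    the second pass adds nothing, so the loop stops.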

 Some caveats with the language

    It's easy to look at an expression like:

     Fedora(rootfs) :-
         Filesystem(rootfs),
         File(rootfs, "/etc/fedora-release"). /* line 3 */

    and think that line 3 is "calling" the "File function". This is not
    what is happening! Rules are not functions. Rules are considered in
    isolation. Rules don't "call" other rules. Instead when trying to find
    possible values that can be substituted into a rule, we only look at
    the rule and the current sets of true and false facts.

    When searching for values to substitute, in theory the compiler would
    have to look at every possible string. In practice of course it can't
    and doesn't do that. Instead it looks at the current sets of true and
    false facts to find strings to substitute. In the following rule:

     File(fs, filename) :-
         Filesystem(fs),
         {{ // C code }}.

    suitable choices for fs are found by looking at any Filesystem facts in
    either the true or false sets.

    In some cases, this doesn't work, as in the example above where we have
    no clues for the filename variable. In that case the compiler tries
    every string literal from every rule in the program. This can be
    inefficient, but by modifying the rule slightly you can avoid this. In
    the following program, only the strings /etc/fstab and
    /etc/fedora-release would be tried:

     Filename("/etc/fstab").
     Filename("/etc/fedora-release").
     File(fs, filename) :-
         Filesystem(fs),
         Filename(filename),
         {{ // C code }}.
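
    The search for candidate values described above can be pictured with a
    small C sketch. It is not taken from the rules compiler: struct fact
    and add_candidates are hypothetical names, and only the first argument
    of each fact is modelled. The function gathers the distinct values that
    already appear in facts of a given name, and can be called once for the
    true set and once for the false set.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fact representation: a name and its first argument. */
struct fact {
  const char *name;
  const char *arg0;
};

/* Append to 'out' the first argument of every fact called 'name',
 * skipping values that are already present.  'nr_out' is the number of
 * entries already in 'out'; the new count is returned.  At most
 * 'max_out' entries are ever stored. */
size_t
add_candidates (const struct fact *facts, size_t nr_facts,
                const char *name,
                const char **out, size_t nr_out, size_t max_out)
{
  for (size_t i = 0; i < nr_facts; ++i) {
    size_t j;

    if (strcmp (facts[i].name, name) != 0)
      continue;
    /* Linear duplicate check: fine for the handful of facts we expect. */
    for (j = 0; j < nr_out; ++j)
      if (strcmp (out[j], facts[i].arg0) == 0)
        break;
    if (j == nr_out && nr_out < max_out)
      out[nr_out++] = facts[i].arg0;
  }
  return nr_out;
}
```

    With the Filename facts from the program above, asking for candidates
    named "Filename" yields only /etc/fstab and /etc/fedora-release, which
    is why adding those facts narrows the search.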

 C expressions returning boolean

    Simple C code enclosed in {{ ... }} as shown above should return a
    true, false or error status only. It returns true by returning any
    integer ≥ 1. It should return 0 to indicate false, and it should return
    -1 to indicate an error (which stops the program and causes inspection
    to fail with a user-visible error).

    Here is an example of a simple C expression returning a boolean:

     File(fs, filename) :-
         Filesystem(fs),
         {{
           int r, err;
           char *relative_filename;
           r = get_mount (fs, filename, &relative_filename);
           if (r != 1) return r;
           r = access (relative_filename, F_OK);
           err = errno; /* save errno, since free may change it */
           free (relative_filename);
           if (r == -1) {
             if (err == ENOENT || err == ENOTDIR)
               return 0;
             errno = err;
             perror ("access");
             return -1;
           }
           return 1;
         }}.

    Notice that fs and filename are passed into the C code as local
    variables.

    You can see that dealing with errors is a bit involved, because we want
    to fail hard on unexpected errors such as EIO.
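
    The return convention can be tried outside the rules compiler with a
    standalone sketch. The names path_exists, classify and enum outcome are
    invented for this illustration, and get_mount is left out because it is
    only available inside the inspection program. Treating every negative
    value as an error is also an assumption, since the text above only
    defines -1.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

enum outcome { OUTCOME_FALSE, OUTCOME_TRUE, OUTCOME_ERROR };

/* Returns 1 if 'path' exists, 0 if it does not, -1 on any other error. */
int
path_exists (const char *path)
{
  if (access (path, F_OK) == 0)
    return 1;
  if (errno == ENOENT || errno == ENOTDIR)
    return 0;                   /* a definite "no", not a failure */
  perror (path);                /* anything else, e.g. EIO: fail hard */
  return -1;
}

/* Interpret a return value from a boolean C expression: any integer
 * >= 1 is true and 0 is false.  Mapping all negative values to an
 * error is an assumption of this sketch. */
enum outcome
classify (int r)
{
  if (r >= 1)
    return OUTCOME_TRUE;
  if (r == 0)
    return OUTCOME_FALSE;
  return OUTCOME_ERROR;
}
```

    A missing file therefore produces a false fact, while an I/O error
    stops inspection instead of being silently recorded as "file not
    found".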

 C expressions returning strings

    C expressions can also return strings or tuples of strings. This is
    useful where you want to parse the content of external files.

    The syntax for this sort of C expression is:

     (var1, var2, ...)={{ ... }}

    where var1, var2, etc. are outputs from the C code.

    In the following example, a lot of error checking has been omitted for
    clarity:

     ProductName(fs, product_name) :-
         Unix_root(fs),
         Distro(fs, "RHEL"),
         (product_name)={{
           int r;
           char *line = NULL;
           size_t n;
           char *relative_filename;
           r = get_mount (fs, "/etc/redhat-release", &relative_filename);
           FILE *fp = fopen (relative_filename, "r");
           free (relative_filename);
           getline (&line, &n, fp);
           fclose (fp);
           set_product_name (line);
           free (line);
           return 0;
         }}.

    The C code calls a function set_product_name (that the compiler
    generates).

    The return value from the C code should be 0 if everything was OK, or
    -1 if there is an error (which stops the whole program).
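
    For comparison, here is a sketch of the same fopen/getline sequence
    with the error checking left in. It is written as an ordinary function
    so that it can be compiled on its own: read_first_line is a
    hypothetical helper, and the line_ret output parameter stands in for
    the compiler-generated set_product_name call. Stripping the trailing
    newline is an extra step that the example above does not perform.

```c
#define _POSIX_C_SOURCE 200809L /* for getline */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Read the first line of 'path' into a newly allocated string, with
 * any trailing newline removed.  Returns 0 on success or -1 on error.
 * On success the caller must free *line_ret. */
int
read_first_line (const char *path, char **line_ret)
{
  FILE *fp;
  char *line = NULL;
  size_t allocsize = 0;
  ssize_t len;

  fp = fopen (path, "r");
  if (fp == NULL) {
    perror (path);
    return -1;
  }

  len = getline (&line, &allocsize, fp);
  fclose (fp);
  if (len == -1) {
    /* Either a read error or an empty file: this sketch treats both
     * as failures.  getline may have allocated a buffer regardless. */
    free (line);
    return -1;
  }

  if (len > 0 && line[len - 1] == '\n')
    line[len - 1] = '\0';

  *line_ret = line;
  return 0;
}
```

    Inside a rule, the successful branch would pass the string to the
    generated setter and free it, and each failure branch would return -1.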

 C expressions returning multiple results

    Finally it is possible for C code to return multiple results.

    The syntax is:

     [var1, var2, ...]={{ ... }}

    where var1, var2, etc. are outputs. Unlike the previous rules, these
    rules may generate multiple facts from a single string substitution.

    This is how we populate the initial list of true facts about
    filesystems:

     Filesystem(fs) :-
         [fs]={{
           int i;
           for (i = 0; i < nr_filesystems; ++i) {
             set_fs (fs[i]);
           }
           return 0;
         }}.

    In this case, the C code repeatedly calls a function set_fs (that the
    compiler generates) for each new filesystem discovered. Multiple
    Filesystem facts can be generated as a result of one application of
    this rule.

    The return value from the C code should be 0 if everything was OK, or
    -1 if there is an error (which stops the whole program).
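
    One way to picture a setter that is called several times is as an
    append to a growing list, with one fact produced per entry afterwards.
    The sketch below is only a guess at the shape of such code, not what
    the compiler emits: struct string_list, string_list_add,
    string_list_free and emit_filesystems are all hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* A growable list of strings, one per result. */
struct string_list {
  char **items;
  size_t len;
};

/* Append a copy of 's'.  Returns 0 on success, -1 if allocation fails. */
int
string_list_add (struct string_list *list, const char *s)
{
  size_t n = strlen (s) + 1;
  char *copy = malloc (n);
  char **items;

  if (copy == NULL)
    return -1;
  memcpy (copy, s, n);

  items = realloc (list->items, (list->len + 1) * sizeof *items);
  if (items == NULL) {
    free (copy);
    return -1;
  }
  items[list->len++] = copy;
  list->items = items;
  return 0;
}

void
string_list_free (struct string_list *list)
{
  for (size_t i = 0; i < list->len; ++i)
    free (list->items[i]);
  free (list->items);
  list->items = NULL;
  list->len = 0;
}

/* Stand-in for the embedded C code: record one result per device and
 * follow the 0 / -1 return convention described above. */
int
emit_filesystems (const char *const *devices, size_t nr_devices,
                  struct string_list *results)
{
  for (size_t i = 0; i < nr_devices; ++i)
    if (string_list_add (results, devices[i]) == -1)
      return -1;
  return 0;
}
```

    After a successful call, each entry in the list would correspond to one
    new Filesystem fact.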

 Type checking

    The current language treats every value as a string. Every expression
    is a boolean. One possible future enhancement is to handle other types.
    There is still some minimal type checking applied:

      * A fact name which appears on a right hand side of any rule must
      also appear on the left hand side of a rule. This is mainly for
      catching typos.

      * A fact must have the same number of arguments ("arity") each time
      it appears in the source.
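
    Both checks are simple to state in code. The following sketch assumes
    that the parser has already flattened the rules into a list of fact
    uses. struct fact_use and the two functions are hypothetical and are
    not taken from the rules compiler.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One appearance of a fact name in the rules source. */
struct fact_use {
  const char *name;
  size_t arity;                 /* number of arguments at this use */
  bool on_lhs;                  /* true if this use is the head of a rule */
};

/* Returns the index of the first right hand side use whose name never
 * appears on any left hand side, or -1 if every name is defined. */
int
find_undefined (const struct fact_use *uses, size_t n)
{
  for (size_t i = 0; i < n; ++i) {
    bool defined = false;

    if (uses[i].on_lhs)
      continue;
    for (size_t j = 0; j < n; ++j)
      if (uses[j].on_lhs && strcmp (uses[j].name, uses[i].name) == 0)
        defined = true;
    if (!defined)
      return (int) i;
  }
  return -1;
}

/* Returns the index of the first use whose arity differs from an
 * earlier use of the same name, or -1 if all arities agree. */
int
find_arity_mismatch (const struct fact_use *uses, size_t n)
{
  for (size_t i = 1; i < n; ++i)
    for (size_t j = 0; j < i; ++j)
      if (strcmp (uses[j].name, uses[i].name) == 0 &&
          uses[j].arity != uses[i].arity)
        return (int) i;
  return -1;
}
```

    A misspelled fact name on a right hand side shows up as an undefined
    use, which is the typo-catching behaviour described in the first
    bullet.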

 Debugging

    You can debug the evaluation of inspection programs by calling
    guestfs_set_verbose (or setting $LIBGUESTFS_DEBUG=1) before launching
    the handle.

    This causes guestfsd(8) to pass the --verbose parameter to this
    inspection program, which in turn causes the inspection program to
    print information about what rules it is trying and what true/false
    facts it has found. These are passed back to libguestfs and printed on
    stderr (or sent to the event system if you are using that).

    You can also print debug messages from C code embedded in {{...}}
    expressions. These are similarly sent upwards through to libguestfs and
    will appear on stderr.

EXIT STATUS

    This program returns 0 if successful, or non-zero if there was an
    error.

SEE ALSO

    guestfsd(8), guestfs-hacking(1), guestfs-internals(1), "INSPECTION" in
    guestfs(3), http://libguestfs.org/.

AUTHOR

    Richard W.M. Jones http://people.redhat.com/~rjones/

COPYRIGHT

    Copyright (C) 2009-2015 Red Hat Inc.

LICENSE

    This program is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the
    Free Software Foundation; either version 2 of the License, or (at your
    option) any later version.

    This program is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
    General Public License for more details.

    You should have received a copy of the GNU General Public License along
    with this program; if not, write to the Free Software Foundation, Inc.,
    51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
----------------------------------------------------------------------