[libvirt] [PATCH v2 6/6] tools: make virt-host-validate check CPU vulnerabilities

Mon Sep 30 15:10:10 UTC 2019

On Mon, 2019-09-30 at 15:30 +0100, Daniel P. Berrangé wrote:
> On Mon, Sep 30, 2019 at 10:55:00AM +0200, Martin Kletzander wrote:
> > Given the fact that most of these could just be virFileReadValueUint() it does
> > not even make it easier to read or write the code.
> 
> Errr, virFileReadValueUint reads a single integer from a file.
> 
> This is extracting a single *word* from a specific field in a
> file.
> 
> Doing this in code would involve doing a regex match the same as
> here.
> 
> > Every time someone will want to add a new check or a fact they will need to find
> > a similar one, copy-paste it, change it and hope for the best.  This introduces
> > yet another "language" on top of the two you are adding already.  I really do
> > not see any benefit in this.
> 
> Taking this particular example.
> 
>   - name: cpu.vulnerability.l1tf_smt
>     filter:
>       fact:
>         name: os.kernel
>         value: Linux
>     value:
>       file:
>         path: /sys/devices/system/cpu/vulnerabilities/l1tf
>         ignoreMissing: true
>         parse:
>           scalar:
>             regex: SMT (\w+)
>             match: 1
> 
> It could be approximately equiv to code
> 
>         if runtime.GOOS == "Linux" {
> 	    return "", nil
> 	}
> 
>         data, err = ioutils.ReadFile("/sys/devices/system/cpu/vulnerabilities/l1tf")
> 	if err != nil {
> 	    if  os.IsNotExist(err) {
> 	        return "", nil
> 	    }
> 	    return "", err
> 	}
> 
> 	match, err := regexp.Find(`SMT (\w+)`, data)
> 	if err != nil {
>  	    return "", err
> 	}
> 	err = SetFact("cpu.vulnerability.l1tf_smt", match)
> 	if err != nil {
> 	    return "", err
> 	}
> 
> The code does look more familiar, but it is certainly not simpler.
> Most notabley there are alot of error handling cases here that need
> to be dealt with, even when written in Go. Doing this in C with its
> string handling support involves even more error handling. This
> is a significant maintainence burden that is not well handled in
> a consistent manner across the code.
> 
> This is a fairly simple check example, case where we want to extract
> data from a more more complex file formats result in much more code
> complexity.
> 
> In the pure code approach, we have mixed up code related to reading
> files, running commands, parsing data formats, setting facts, printing
> messages, skipping execution when certain other conditions applied.
> Hidden in this is the information about what feature we are actually
> checking for.
> 
> A data driven approach to this problem means that we get clear separation
> between distinct parts/jobs.
> 
> The engine written in code concerns itself with generic facilities for
> reading files, running commands, parsing data, and all the error handling
> and data processing.
> 
> Thereafter the interesting checks is simply a declarative expression of
> what data we wish to extract for each fature, and declarative expression
> of dependancies between features.
> 
> This engine code has greater complexity than what exists today for sure,
> but the key thing is that this is mostly a one time upfront code, and is
> the least interesting part of the tool. What is most interest is the set
> of features being checked for and the relationships between the features.
> 
> Having the declarative part separated from the functional part is where
> the value arises. 
> 
>  - Changes to the functional code apply consistently across all
>    the checks made. For example, an improvement to error handling
>    to regex processing was done once and applied to all the checks
>    that used regex. Or consistently handling when reading files
>    that are potentially missing on systems.
> 
>  - New checks can be introduced in a more reliable and consistent
>    manner. These CPU vulnerability checks are an example of this
>    in practice. We merely have to express what files we're interested
>    in and how to match the relevant data in them. No need to worry
>    about reading files, compiling & running regexs, making sure the
>    error handling was done in the same way as other existing checks.

All of the above mostly sounds like "use functions" to me. Something
like

  value, err := GetIntFactFromFile(
      "/sys/devices/system/cpu/vulnerabilities/l1tf",
      "SMT (\w+)",
      OsLinux,
      IgnoreMissing,
  )
  if err != nil {
      return err
  }
  SetIntFact("cpu.vulnerability.l1tf_smt", value)

requires basically as much upfront knowledge to understand as the
YAML version, but doesn't introduce a new DSL. And hey, it's shorter
than either version! :)

>  - The declarative data can be processed algorithmically. For example
>    given the depedancies expressed between the facts, I saw that it
>    was not neccessary to define all the facts in the same data file.
>    It was possible split them across many files, load those files in
>    any order, and algorithmically determine exactly which subset of
>    checks need to be executed & in what order.
> 
>  - Having separated data from the code it is obviously possible to
>    extend this without introducing any code changes. This is possible
>    to use from outside libvirt. For example there are usage scenarios
>    which may not be supported by certain versions of QEMU. The QEMU
>    package can drop in some facts which will be picked up by the
>    virt-host-validate tool. Alternatively going up the stack, an
>    app like OpenStack which uses libvirt may wish to restrict what
>    they use, or wish to check extra virt host features interesting
>    to their app's usage of libvirt. Again their package can now
>    drop in extra facts.

This last point is the only advantage I can see very clearly. I'm
not sure it's worth introducing a specific format for, though.

>  - New features can be built ontop of the declarative data. At first
>    I just set out to replicate the existing tool's output as is. Once
>    that was done, it was obvious that we could trivially add a new
>    feature allowing us to print the raw informationm about each
>    fact that was checked, as an alternative to the human targetted
>    message strings. IOW we got a machine readable output format
>    essentially for free by virtue of using declarative data.

This is a benefit of storing the information as facts, which as I
said before I think is a good idea; it doesn't mean we have to
obtain said facts by processing YAML files.

>  - The implementation can be rewritten again at will. If the original
>    C code had this split of data vs processing logic, this conversion
>    of langauge would have been even more compelling, as none of othe
>    declarative data would have needed changing. I'm not expecting
>    another rewrite, but this capability was already valuable, between
>    the v1 and v2 posting of this patch I changed from XML to YAML for
>    the data format. This conversion was entirely automated, because
>    it could be algorithmically processed.

If you had used code instead of a pure data format as the source for
information, then you wouldn't have needed to perform that conversion
in the first place O:-)

> These are all typical & expected benefits that arise from separating
> functional logic from declarative data. In the long term this is a
> clear win, & as I mentioned I would have done this sooner had it not
> been for our use of C.

I expect that if you would have posted patches introducing this
machinery to the C version of virt-host-validate you would have seen
more or less the same reactions.

> > If I was to pick a new feature we could benefit from, I would much rather prefer
> > having an opt-in for report-home of HW features and usage for some very rough
> > anonymous statistics.
> 
> I don't think we want to get involved in reporting user data back to any
> server, as it is a data privacy minefield. This tool can be used by other
> existing reporting tools though - eg Red Hat distros often include the
> 'sosreport' tool that bundles stuff up for reporting customer tickets.

Wholeheartedly agreed.

-- 
Andrea Bolognani / Red Hat / Virtualization