[libvirt] [PATCH v2 6/6] tools: make virt-host-validate check CPU vulnerabilities

Mon Sep 30 15:30:18 UTC 2019

On Mon, Sep 30, 2019 at 05:10:10PM +0200, Andrea Bolognani wrote:
> On Mon, 2019-09-30 at 15:30 +0100, Daniel P. Berrangé wrote:
> > On Mon, Sep 30, 2019 at 10:55:00AM +0200, Martin Kletzander wrote:
> > > Given the fact that most of these could just be virFileReadValueUint() it does
> > > not even make it easier to read or write the code.
> > 
> > Errr, virFileReadValueUint reads a single integer from a file.
> > 
> > This is extracting a single *word* from a specific field in a
> > file.
> > 
> > Doing this in code would involve doing a regex match the same as
> > here.
> > 
> > > Every time someone will want to add a new check or a fact they will need to find
> > > a similar one, copy-paste it, change it and hope for the best.  This introduces
> > > yet another "language" on top of the two you are adding already.  I really do
> > > not see any benefit in this.
> > 
> > Taking this particular example.
> > 
> >   - name: cpu.vulnerability.l1tf_smt
> >     filter:
> >       fact:
> >         name: os.kernel
> >         value: Linux
> >     value:
> >       file:
> >         path: /sys/devices/system/cpu/vulnerabilities/l1tf
> >         ignoreMissing: true
> >         parse:
> >           scalar:
> >             regex: SMT (\w+)
> >             match: 1
> > 
> > It could be approximately equiv to code
> > 
> >         if runtime.GOOS == "Linux" {
> > 	    return "", nil
> > 	}
> > 
> >         data, err = ioutils.ReadFile("/sys/devices/system/cpu/vulnerabilities/l1tf")
> > 	if err != nil {
> > 	    if  os.IsNotExist(err) {
> > 	        return "", nil
> > 	    }
> > 	    return "", err
> > 	}
> > 
> > 	match, err := regexp.Find(`SMT (\w+)`, data)
> > 	if err != nil {
> >  	    return "", err
> > 	}
> > 	err = SetFact("cpu.vulnerability.l1tf_smt", match)
> > 	if err != nil {
> > 	    return "", err
> > 	}
> > 
> > The code does look more familiar, but it is certainly not simpler.
> > Most notabley there are alot of error handling cases here that need
> > to be dealt with, even when written in Go. Doing this in C with its
> > string handling support involves even more error handling. This
> > is a significant maintainence burden that is not well handled in
> > a consistent manner across the code.
> > 
> > This is a fairly simple check example, case where we want to extract
> > data from a more more complex file formats result in much more code
> > complexity.
> > 
> > In the pure code approach, we have mixed up code related to reading
> > files, running commands, parsing data formats, setting facts, printing
> > messages, skipping execution when certain other conditions applied.
> > Hidden in this is the information about what feature we are actually
> > checking for.
> > 
> > A data driven approach to this problem means that we get clear separation
> > between distinct parts/jobs.
> > 
> > The engine written in code concerns itself with generic facilities for
> > reading files, running commands, parsing data, and all the error handling
> > and data processing.
> > 
> > Thereafter the interesting checks is simply a declarative expression of
> > what data we wish to extract for each fature, and declarative expression
> > of dependancies between features.
> > 
> > This engine code has greater complexity than what exists today for sure,
> > but the key thing is that this is mostly a one time upfront code, and is
> > the least interesting part of the tool. What is most interest is the set
> > of features being checked for and the relationships between the features.
> > 
> > Having the declarative part separated from the functional part is where
> > the value arises. 
> > 
> >  - Changes to the functional code apply consistently across all
> >    the checks made. For example, an improvement to error handling
> >    to regex processing was done once and applied to all the checks
> >    that used regex. Or consistently handling when reading files
> >    that are potentially missing on systems.
> > 
> >  - New checks can be introduced in a more reliable and consistent
> >    manner. These CPU vulnerability checks are an example of this
> >    in practice. We merely have to express what files we're interested
> >    in and how to match the relevant data in them. No need to worry
> >    about reading files, compiling & running regexs, making sure the
> >    error handling was done in the same way as other existing checks.
> 
> All of the above mostly sounds like "use functions" to me. Something
> like
> 
>   value, err := GetIntFactFromFile(
>       "/sys/devices/system/cpu/vulnerabilities/l1tf",
>       "SMT (\w+)",
>       OsLinux,
>       IgnoreMissing,
>   )
>   if err != nil {
>       return err
>   }
>   SetIntFact("cpu.vulnerability.l1tf_smt", value)
> 
> requires basically as much upfront knowledge to understand as the
> YAML version, but doesn't introduce a new DSL. And hey, it's shorter
> than either version! :)

"use functions" is another way of saying separate the declarative
data from the functional logic. This series has indeed taken this
approach and has such functions in the data processing engine.

They look more complex than what you've illustrated here, because
they cope with many more scenario. Those set of args you've shown
and just one combination of many possibilities. 

So instead of passing the arguments inline like this, you'll
want to define a struct to hold the arguments & just pass a
struct to the function. This is such a good idea this impl
already defines such structs to hold the data.

Once you've got this nice separate of data and code, loading
the data from a file is just one tiny step more to make a more
flexible tool allowing drop in extensions. 

> >  - The declarative data can be processed algorithmically. For example
> >    given the depedancies expressed between the facts, I saw that it
> >    was not neccessary to define all the facts in the same data file.
> >    It was possible split them across many files, load those files in
> >    any order, and algorithmically determine exactly which subset of
> >    checks need to be executed & in what order.
> > 
> >  - Having separated data from the code it is obviously possible to
> >    extend this without introducing any code changes. This is possible
> >    to use from outside libvirt. For example there are usage scenarios
> >    which may not be supported by certain versions of QEMU. The QEMU
> >    package can drop in some facts which will be picked up by the
> >    virt-host-validate tool. Alternatively going up the stack, an
> >    app like OpenStack which uses libvirt may wish to restrict what
> >    they use, or wish to check extra virt host features interesting
> >    to their app's usage of libvirt. Again their package can now
> >    drop in extra facts.
> 
> This last point is the only advantage I can see very clearly. I'm
> not sure it's worth introducing a specific format for, though.
> 
> >  - New features can be built ontop of the declarative data. At first
> >    I just set out to replicate the existing tool's output as is. Once
> >    that was done, it was obvious that we could trivially add a new
> >    feature allowing us to print the raw informationm about each
> >    fact that was checked, as an alternative to the human targetted
> >    message strings. IOW we got a machine readable output format
> >    essentially for free by virtue of using declarative data.
> 
> This is a benefit of storing the information as facts, which as I
> said before I think is a good idea; it doesn't mean we have to
> obtain said facts by processing YAML files.

Of course you could define everything via a set of structs in the
code, but its crazy to do that as you've now hardcoded everything
at build time, pointlessly throwing away the key extensibility
benefit of having it defined via a data file.

> >  - The implementation can be rewritten again at will. If the original
> >    C code had this split of data vs processing logic, this conversion
> >    of langauge would have been even more compelling, as none of othe
> >    declarative data would have needed changing. I'm not expecting
> >    another rewrite, but this capability was already valuable, between
> >    the v1 and v2 posting of this patch I changed from XML to YAML for
> >    the data format. This conversion was entirely automated, because
> >    it could be algorithmically processed.
> 
> If you had used code instead of a pure data format as the source for
> information, then you wouldn't have needed to perform that conversion
> in the first place O:-)

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|