[libvirt] [PATCH v2 6/6] tools: make virt-host-validate check CPU vulnerabilities
Andrea Bolognani
abologna at redhat.com
Mon Sep 30 15:10:10 UTC 2019
On Mon, 2019-09-30 at 15:30 +0100, Daniel P. Berrangé wrote:
> On Mon, Sep 30, 2019 at 10:55:00AM +0200, Martin Kletzander wrote:
> > Given the fact that most of these could just be virFileReadValueUint() it does
> > not even make it easier to read or write the code.
>
> Errr, virFileReadValueUint reads a single integer from a file.
>
> This is extracting a single *word* from a specific field in a
> file.
>
> Doing this in code would involve doing a regex match the same as
> here.
>
> > Every time someone will want to add a new check or a fact they will need to find
> > a similar one, copy-paste it, change it and hope for the best. This introduces
> > yet another "language" on top of the two you are adding already. I really do
> > not see any benefit in this.
>
> Taking this particular example.
>
> - name: cpu.vulnerability.l1tf_smt
> filter:
> fact:
> name: os.kernel
> value: Linux
> value:
> file:
> path: /sys/devices/system/cpu/vulnerabilities/l1tf
> ignoreMissing: true
> parse:
> scalar:
> regex: SMT (\w+)
> match: 1
>
> It could be approximately equiv to code
>
> if runtime.GOOS == "Linux" {
> return "", nil
> }
>
> data, err = ioutils.ReadFile("/sys/devices/system/cpu/vulnerabilities/l1tf")
> if err != nil {
> if os.IsNotExist(err) {
> return "", nil
> }
> return "", err
> }
>
> match, err := regexp.Find(`SMT (\w+)`, data)
> if err != nil {
> return "", err
> }
> err = SetFact("cpu.vulnerability.l1tf_smt", match)
> if err != nil {
> return "", err
> }
>
> The code does look more familiar, but it is certainly not simpler.
> Most notabley there are alot of error handling cases here that need
> to be dealt with, even when written in Go. Doing this in C with its
> string handling support involves even more error handling. This
> is a significant maintainence burden that is not well handled in
> a consistent manner across the code.
>
> This is a fairly simple check example, case where we want to extract
> data from a more more complex file formats result in much more code
> complexity.
>
> In the pure code approach, we have mixed up code related to reading
> files, running commands, parsing data formats, setting facts, printing
> messages, skipping execution when certain other conditions applied.
> Hidden in this is the information about what feature we are actually
> checking for.
>
> A data driven approach to this problem means that we get clear separation
> between distinct parts/jobs.
>
> The engine written in code concerns itself with generic facilities for
> reading files, running commands, parsing data, and all the error handling
> and data processing.
>
> Thereafter the interesting checks is simply a declarative expression of
> what data we wish to extract for each fature, and declarative expression
> of dependancies between features.
>
> This engine code has greater complexity than what exists today for sure,
> but the key thing is that this is mostly a one time upfront code, and is
> the least interesting part of the tool. What is most interest is the set
> of features being checked for and the relationships between the features.
>
> Having the declarative part separated from the functional part is where
> the value arises.
>
> - Changes to the functional code apply consistently across all
> the checks made. For example, an improvement to error handling
> to regex processing was done once and applied to all the checks
> that used regex. Or consistently handling when reading files
> that are potentially missing on systems.
>
> - New checks can be introduced in a more reliable and consistent
> manner. These CPU vulnerability checks are an example of this
> in practice. We merely have to express what files we're interested
> in and how to match the relevant data in them. No need to worry
> about reading files, compiling & running regexs, making sure the
> error handling was done in the same way as other existing checks.
All of the above mostly sounds like "use functions" to me. Something
like
value, err := GetIntFactFromFile(
"/sys/devices/system/cpu/vulnerabilities/l1tf",
"SMT (\w+)",
OsLinux,
IgnoreMissing,
)
if err != nil {
return err
}
SetIntFact("cpu.vulnerability.l1tf_smt", value)
requires basically as much upfront knowledge to understand as the
YAML version, but doesn't introduce a new DSL. And hey, it's shorter
than either version! :)
> - The declarative data can be processed algorithmically. For example
> given the depedancies expressed between the facts, I saw that it
> was not neccessary to define all the facts in the same data file.
> It was possible split them across many files, load those files in
> any order, and algorithmically determine exactly which subset of
> checks need to be executed & in what order.
>
> - Having separated data from the code it is obviously possible to
> extend this without introducing any code changes. This is possible
> to use from outside libvirt. For example there are usage scenarios
> which may not be supported by certain versions of QEMU. The QEMU
> package can drop in some facts which will be picked up by the
> virt-host-validate tool. Alternatively going up the stack, an
> app like OpenStack which uses libvirt may wish to restrict what
> they use, or wish to check extra virt host features interesting
> to their app's usage of libvirt. Again their package can now
> drop in extra facts.
This last point is the only advantage I can see very clearly. I'm
not sure it's worth introducing a specific format for, though.
> - New features can be built ontop of the declarative data. At first
> I just set out to replicate the existing tool's output as is. Once
> that was done, it was obvious that we could trivially add a new
> feature allowing us to print the raw informationm about each
> fact that was checked, as an alternative to the human targetted
> message strings. IOW we got a machine readable output format
> essentially for free by virtue of using declarative data.
This is a benefit of storing the information as facts, which as I
said before I think is a good idea; it doesn't mean we have to
obtain said facts by processing YAML files.
> - The implementation can be rewritten again at will. If the original
> C code had this split of data vs processing logic, this conversion
> of langauge would have been even more compelling, as none of othe
> declarative data would have needed changing. I'm not expecting
> another rewrite, but this capability was already valuable, between
> the v1 and v2 posting of this patch I changed from XML to YAML for
> the data format. This conversion was entirely automated, because
> it could be algorithmically processed.
If you had used code instead of a pure data format as the source for
information, then you wouldn't have needed to perform that conversion
in the first place O:-)
> These are all typical & expected benefits that arise from separating
> functional logic from declarative data. In the long term this is a
> clear win, & as I mentioned I would have done this sooner had it not
> been for our use of C.
I expect that if you would have posted patches introducing this
machinery to the C version of virt-host-validate you would have seen
more or less the same reactions.
> > If I was to pick a new feature we could benefit from, I would much rather prefer
> > having an opt-in for report-home of HW features and usage for some very rough
> > anonymous statistics.
>
> I don't think we want to get involved in reporting user data back to any
> server, as it is a data privacy minefield. This tool can be used by other
> existing reporting tools though - eg Red Hat distros often include the
> 'sosreport' tool that bundles stuff up for reporting customer tickets.
Wholeheartedly agreed.
--
Andrea Bolognani / Red Hat / Virtualization
More information about the libvir-list
mailing list