[libvirt] [GSOC] project libvirt fuzzing

D L srwx4096 at gmail.com
Tue Mar 7 20:22:28 UTC 2017


On Tue, Mar 7, 2017 at 4:08 AM, Michal Privoznik <mprivozn at redhat.com>
wrote:

> On 03/07/2017 06:27 AM, D L wrote:
> > On Sun, Mar 5, 2017 at 2:47 AM, Michal Privoznik <mprivozn at redhat.com>
> > wrote:
> >
> >> On 04.03.2017 07:23, Da L wrote:
> >>> Dear all,
> >>>
> >>
> >> Hey,
> >>
> >>> This is my first post in the list.
> >>
> >> Very well. Welcome. It is always nice to see people interested in libvirt.
> >>
> > Hi Michal,
> >
> > Thank you very much for the explanation and encouragement.
> > I am so glad to join the community.
> >
> >
> >>>
> >>> I am currently a graduate student studying computer science,
> >>> particularly interested in visualization technologies, and I have been
> >>> using QEMU for a variety of projects for a while. Two of the courses I
> >>> am taking this semester, Advanced Operating Systems and Secure Software
> >>> Development, really attracted me to the libvirt community. I have been
> >>> learning kernel fuzzing as well as other general fuzzing tools.
> >>>
> >>> Then I found that the topic "QEMU command line generator XML fuzzing" is
> >>> pretty interesting and totally in line with my interests and background.
> >>> Though I have read through the documentation on the website, just to
> >>> make sure I am doing it correctly, could anyone confirm that this project
> >>> is still available? And what do I need to do next in order to participate
> >>> in the project this summer? Do I need to find a mentor by myself?
> >>> Potentially, I could ask my OS or Security professor to be my mentor,
> >>> but I am not sure yet which would be the best way.
> >>
> >> Yes, the project is still on. It does not have a mentor assigned yet,
> >> but don't worry about that now - there are a lot of mentors around. For
> >> now, I can be your point of contact.
> >>
> >> So, just to explain some details of the project: libvirt's format
> >> for storing domain configuration is XML. However, none of the
> >> hypervisors out there uses XML to describe domain configuration. For
> >> instance, in qemu it's all about the command line. You want this disk
> >> for your domain? You have to put it onto the command line. And so on.
> >> Therefore, in a very simplistic way, for qemu libvirt translates the XML
> >> into qemu command line language. Now, this process is very complex and
> >> sort of tricky. That's why we would like to generate "all" possible
> >> combinations of XML, let the command line generator crunch them and
> >> produce qemu command line. Well, that's not entirely true, because
> >> command line generator works over some internal representation of domain
> >> (not XML) that is produced by our XML parser:
> >>
> > Please correct me if I am wrong about my following understanding:
> > 1. Regarding XML config file, one typical usage with libvirt could be:
> >     $ virsh define <domain_config_file.xml <http://your_xml_config_file.ml>>
>
> The file has to be stored locally. Libvirt doesn't have an
> 'url-grabber'. In fact, our APIs expect XML document passed as string
> (not a filename where it is stored). It's just virsh that allows users
> to point it to a file which is read and passed to the define API.
>
> > 2. I noticed that the libvirt source code contains several files closely
> > related to XML, including src/util/virxml.{c,h}, which might be the target
> > of this project?
>
> Sort of. virxml.c file contains XML parsing helpers (mostly higher-level
> APIs over libxml2). The XML parsing is done in src/conf/domain_conf.c
> (or network_conf.c for libvirt networks, etc.). The entry point for
> exploring domain XML parsing can be virDomainDefParseString() function.
> BTW: while exploring libvirt sources I strongly advise using so-called
> tagged sources ("make tags" or "ctags -R ." or some equivalent), because
> libvirt sources consist of lots of short functions calling other
> functions. Tagged sources then allow developers to jump to the symbol
> under cursor (in vim it is "CTRL + ]" or "g + ]" if the symbol is
> defined at multiple locations).
>
Hi Michal,

Thank you so much for the detailed description. I will get back to you for
each point in detail next week.

By the way, so nice to see the power of vi in a real project.

Best,

Dan

> Now that we have parsed the domain XML into internal representation
> (virDomainDef), we can look into qemu command line generation. I think
> the whole process is best visible in qemuDomainCreateXML() (e.g. "vim -t
> qemuDomainCreateXML" ;-)). This is qemu driver implementation of public
> API virDomainCreateXML(). It allows users to create so called transient
> domains. Long story short: "here, I have domain XML, start it up for me,
> will you?". Therefore at the beginning the domain XML is parsed (using
> the function described above), several not-important-right-now functions
> are called and then qemuProcessStart() is called which calls
> qemuProcessLaunch() which calls qemuBuildCommandLine(). Finally, this is
> the function that takes the virDomainDef (among other arguments) and
> produces yet another internal representation of qemu command line
> (virCommandPtr). This command line is then executed later in the process.
>
> > 3. And libvirt also is compiled with libxml2.
>
> Yes. This has strong historical background (hint: look who started
> libvirt and who wrote libxml2 ;-)). Frankly, I don't think we've ever
> considered a different xml parsing library.
>
> > 4. Then there is virt-xml-validate, a bash script (in the build/bin
> >   directory after make install) that calls xmllint.
>
> Yeah. Writing our XMLs by hand can be overwhelming. Moreover, libvirt
> has this philosophy of ignoring unknown elements/attributes. So it might
> happen that for instance you have a typo in an element name and you're
> still wondering why libvirt ignores that particular setting (e.g. path
> to disk of domain). Therefore we have grammar rules (RNG) that could
> help you here - virt-xml-validate would error out in this example. Well,
> even virsh errors out now because it instructs libvirt to do the XML
> validation before parsing. But that hasn't always been the case.
>
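The typo scenario described above can be sketched in a few lines. This is
only an illustration using Python's standard library, not libvirt's parser
and not the RNG validation itself; KNOWN_DISK_CHILDREN, parse_disk() and
unknown_children() are hypothetical names, and the set of known children is
a made-up subset of the real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset of the children a <disk> element may carry.
KNOWN_DISK_CHILDREN = {"source", "target", "driver"}


def parse_disk(xml_text):
    """Lenient parse: keep known children, silently skip everything else."""
    disk = ET.fromstring(xml_text)
    return {
        child.tag: dict(child.attrib)
        for child in disk
        if child.tag in KNOWN_DISK_CHILDREN
    }


def unknown_children(xml_text):
    """Strict check: list the children the lenient parser would skip."""
    disk = ET.fromstring(xml_text)
    return [child.tag for child in disk if child.tag not in KNOWN_DISK_CHILDREN]


GOOD = "<disk type='file'><source file='/var/lib/images/a.img'/><target dev='vda'/></disk>"
# Same document with <source> misspelled as <sorce>.
TYPO = "<disk type='file'><sorce file='/var/lib/images/a.img'/><target dev='vda'/></disk>"

if __name__ == "__main__":
    print(parse_disk(GOOD))        # the disk path is present
    print(parse_disk(TYPO))        # the disk path is silently gone
    print(unknown_children(TYPO))  # a validation pass would flag this
```

With the lenient parse the misspelled element simply disappears from the
result, which is the "why is my setting ignored" situation; the strict check
plays the role that schema validation plays in virt-xml-validate.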
> >
> > I have not yet managed to figure out how the above pieces relate to
> > each other.
> > I spent some time trying to instrument and compile the executables with
> > AFL, but so far with no luck. (The idea is as simple as changing gcc in
> > Makefile/configure to afl-gcc.)
> > The attached figure is just a demo of using AFL to fuzz virt-admin, which
> > is not instrumented (so kind of boring and not very useful). But I think
> > AFL could be one of the candidate fuzzers for this project due to its
> > prevalence and proven effectiveness.
>
> We don't have to limit ourselves just to domain XML -> qemu cmd line
> fuzzing. We can look into other areas too (there's a lot of inputs for
> libvirt), e.g. RPC protocol (we have our own protocol for communication
> with distant server/client over network), fuzz XML parsers themselves
> (domain is not the only object that libvirt manages, we have networks,
> interfaces, storage pools/volumes, etc.). It's just that qemu cmd line
> fuzzing seemed complicated enough so that the chances of running a
> fuzzer successfully are high.
>
> >
> > Regarding fuzzing, I think we can try running several fuzzing tools in
> > parallel, as different fuzzers tend to find different kinds of bugs.
>
> True. I had this on my mind as well.
>
> > Thus, AFL (American Fuzzy Lop) [1], which is a coverage-guided,
> > mutation-based fuzzer using a genetic algorithm, can take hand-crafted
> > XML seeds to fuzz our libvirt target. Alternatively, we could develop a
> > generation-based grammar module in AFL (which is definitely non-trivial);
>
> Yeah, I thought about this when watching a talk on AFL. We might explore
> other possibilities - they already might have something we want.
>
> > so far I have not seen active development in the AFL community on XML
> > grammar generation. Another option could be clang-libfuzzer [2].
> >
> > Several related articles show examples of fuzzing: using AFL to generate
> > SQL [3], llvm-afl [4], and hexml fuzzing with AFL [5]. In combination with
> > lcov, we could compare different fuzzers and guide our fuzzing tuning.
>
> Yes, good idea.
>
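The generation-based alternative mentioned above (building XML from a small
grammar instead of mutating hand-crafted seeds) could look roughly like the
sketch below. It is not an AFL module and not tied to any fuzzer; GRAMMAR
and generate_domain() are hypothetical names, and the value lists are
illustrative picks rather than the full set libvirt accepts. The element
names (domain, name, memory, vcpu, devices, disk, source, target) are taken
from the libvirt domain XML format.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical mini-grammar: a few choices per attribute or element.
GRAMMAR = {
    "domain_type": ["qemu", "kvm"],
    "memory_kib": [65536, 1048576],
    "vcpu": [1, 2, 8],
    "disk_bus": ["virtio", "ide", "scsi"],
}


def generate_domain(rng):
    """Build one well-formed domain XML document from random grammar picks."""
    dom = ET.Element("domain", type=rng.choice(GRAMMAR["domain_type"]))
    ET.SubElement(dom, "name").text = "fuzz-%d" % rng.randrange(1000)
    memory = ET.SubElement(dom, "memory", unit="KiB")
    memory.text = str(rng.choice(GRAMMAR["memory_kib"]))
    ET.SubElement(dom, "vcpu").text = str(rng.choice(GRAMMAR["vcpu"]))
    devices = ET.SubElement(dom, "devices")
    # Zero to three disks, named vda, vdb, vdc.
    for i in range(rng.randint(0, 3)):
        disk = ET.SubElement(devices, "disk", type="file", device="disk")
        ET.SubElement(disk, "source", file="/tmp/fuzz-%d.img" % i)
        ET.SubElement(disk, "target", dev="vd" + "abc"[i],
                      bus=rng.choice(GRAMMAR["disk_bus"]))
    return ET.tostring(dom, encoding="unicode")


if __name__ == "__main__":
    # A seeded generator makes every produced case reproducible.
    print(generate_domain(random.Random(0)))
```

Each generated document could then be written to a file and used either as
a seed for a mutation-based fuzzer or fed directly to the code under test;
seeding the random generator keeps a failing case reproducible.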
> >
> > NOTE: the [5] example is quite interesting; it is fuzzing an XML parser
> > written in Haskell.
>
> Indeed.
>
> >
> > I will probably not update more until next week; I am having three
> > mid-terms this week.
>
> Good luck.
>
> >
> > [1] http://lcamtuf.coredump.cx/afl/
> > [2] http://llvm.org/docs/LibFuzzer.html
> > [3] https://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html
> > [4] http://lists.llvm.org/pipermail/llvm-dev/2014-December/079390.html
> > [5] https://github.com/ndmitchell/hexml/issues/6
> >
> > Again, thanks a lot. Any guidance, comments, or suggestions would be more
> > than welcome and highly appreciated.
> >
> > Best,
> >
> >
> > Dan
>
> Michal
>