[libvirt] Redesigning Libvirt: Adopting use of a safe language

Fri Nov 17 10:33:55 UTC 2017

On Thu, Nov 16, 2017 at 04:55:55PM -0500, John Ferlan wrote:
> On 11/14/2017 12:27 PM, Daniel P. Berrange wrote:
> > When libvirt was created, C was the only viable choice for anything aiming to be
> > a core system library component. At that time 2005, aside from C there were
> > common choices of Java, Python, Perl. Java was way too heavy for a low level
> > system component, Python was becoming popular but not widely used for low level
> > system services and Perl was on a downward trend. None of them are accessible to
> > arbitrary languages as libraries, without providing a RPC based API service. As
> > it turns out libvirt did end up having RPC based approach for many virt drivers,
> > but the original approach was to be a pure library component.
> > 
> > IOW it is understandable why C was chosen back in 2005, but 12 years on the world
> > around us has changed significantly. It has long been accepted that C is a very
> > challenging language to write "safe" applications. By "safe" I mean avoiding the
> > many problems that lead to critical security bugs. In particular the lack of a
> > safe memory management framework leads to memory leaks, double free's, stack or
> > heap corruption and more. The lack of strict type safety just compounds these
> > problems. We've got many tools to help us in this area, and at times have tried
> > to design our APIs to avoid problems, but there's no getting away from fact that
> > even the best programmers will continually screw up memory management leading to
> > crashes & security flaws. It is just a fact of life when using C, particularly if
> > you want to be fast at accepting new feature proposals.
> > 
> > It is no surprise that there have been no new mainstream programming languages in
> > years (decades) which provide an inherently unsafe memory management framework.
> > Even back in 2005 security was a serious challenge, but in the last 10+ years
> > the situation has only got worse with countless high profile security bugs a
> > direct result of the choice to use C. Given the threats faced today, one has to
> > seriously consider the wisdom of writing any new system software in C. In another
> > 10 years time, it would not surprise me if any system software still using C is
> > considered an obsolete relic, and ripe for a rewrite in a memory safe language.
> > 
> 
> Programming languages come and (well) go - it just so happens that C has
> been a survivor. There's always been challengers promising something,
> but eventually falling by the wayside when either they fail to deliver
> their promise or the next sexy language comes along. In my 30 years
> (eeks) even with all warts C has been there. It was certainly better
> than writing in assembly/macro or BLISS. I recall converting a lot of
> BLISS to C when I first started.

I had to go and look up what BLISS was - that's a new (well old) one for
me :-)

Just thinking about the languages that have risen up while I've been
using C, I'm not surprised that C has been a survivor in the area where
libvirt lives. There's been a plethora of dynamic and/or scripting
languages that have become popular (Perl, Python, Ruby, JavaScript,
to name but 4). None of these have really been a credible choice for
use in libvirt. Use of interpreters by default has limited their perf,
and forces a multi-process model to get any true parallelism. The
latter has been a huge bottleneck for OpenStack Nova's concurrency.
There's then the elephant in the room, Java, and with its JVM model
the memory footprint of running it is just crazy. In all of them,
interfacing to C is possible, but horribly unpleasant work to a large
degree. And of course there's always C++, but that takes C's already
complex world and makes it even more complex & leaves all the dangerous
aspects of C still there, so when you shoot yourself in the foot, it
blows away your entire leg. There's quite a few other interesting
languages around, but none of them have received such mainstream
usage, so are hard to justify if you want a broad contributor set.

Rust & Go by comparison, have offered something pretty unique in
languages that are still compiled to native and fairly low level
and easy to interface with C. Between them they have strong potential
to eliminate the need to use C for the majority of remaining usage
scenarios, which can't be said of other languages I've described
above. 

> There is something to be said about the "devil you know" vs. the one you
> don't! Just as much as there is a need to keep yourself "current" with
> technology trends. The latter becomes harder to do the longer I do this.

I don't disagree, but I think you would be pleasantly surprised at how
easy it is to learn Go if coming from a C background (or indeed a Python
or Java background too).

> 
> > There are long term implications for the potential pool of contributors in the
> > future. There has always been a limited pool of programmers able to do a good job
> > in C, compared to those who know higher level languages like Python/Java. A
> > programmer can write bad code in any language, but in C/C++ that bad code quickly
> > turns into a serious problem. Libvirt has done ok despite this, but I feel our
> > level of contribution, particularly "drive by" patch submissions, is held back
> > by use of C. Move forward another 10 years, and while C will certainly exist, I
> > struggle to imagine the talent pool being larger. On the contrary I would expect
> > it to shrink, certainly in relative terms, and possibly in absolute terms, as
> > other new languages take C's place for low level systems programming. 10 years
> > ago, Docker would have been written in C, but they took the sensible decision to
> > pick Go instead. This is happening everywhere I look, and if not Go, then Rust.
> 
> I'm not convinced that "drive by" patch submissions are those we seek.
> As stated, libvirt is a fairly complex project. I would think drive by
> submissions lead to more problems regardless of the language chosen
> because a reviewer spends so much of his/her valuable time trying to
> assist the new contributor only to eventually learn that it is a drive
> by. Then those that are committed to the project are left to decide to
> drop the drive by submission or support it for years to come. Invariably
> there's some integration interaction that was missed.
> 
> I would hope our long term goal would be build up not only contributors,
> but more importantly reviewers. Again, doesn't matter what language is
> chosen, since libvirt has review requirements then it needs reviewers.
> If GO is a language from which to draw new contributors and more
> importantly reviewers, then great.

Yep, I don't disagree about our need for reviewers, as much as we need
contributors, if not more. I think pretty much every non-trivial project
I have seen suffers from a need for more reviewers. Every time you make
reviewers more efficient, you enable a greater flow of patches, and so
you enable more contributions. It is just like building roads to solve
traffic jams, which never actually solves traffic jams. So I won't claim
that choice of language is the driving factor in availability of reviewers.
What I will say is that a more productive language would let us focus
attention on the problems that are more interesting to libvirt, instead of
working on general infrastructure, portability and all the things that the
language doesn't provide us. IOW reviewers would still be horribly
overloaded, but the stuff they would be reviewing would be more useful.

> Maybe it's a bit of 'bias' and terminology, but I've always thought
> there is a difference between programmer and software engineer. My FUD
> is that we attract too many of the former and not enough of the latter
> that are necessary to solve that complex issue.

I think that is basically true of every single open source software
project that's out there.

> > The other big trend of the past 10 years has been the increase in CPU core
> > counts. My first libvirt dev machine had 1 physical CPU with no cores or threads
> > or NUMA. My current libvirt dev machine has 2 CPUs, each with 6 cores, for 12
> > logical CPUs. Common server machines have 32/64 logical CPUs, and high end has
> > 100's of CPUs. In 10 years, we'll see high end machines with 1000's of CPUs and
> > entry level with mere 100's. IOW good concurrency is going to be key for any
> > scalable application. Libvirt is actually doing reasonably well in this respect
> > via our heavily threaded libvirtd daemon. It is not without cost though with
> > ever more complex threading & locking models, which still have scalability
> > problems. Part of the problem is that, despite Linux having very low overhead
> > thread spawning, threads still consume non-trivial resources, so we try to
> > constrain how many we use, which forces an M:N relationship between jobs we need
> > to process and threads we have available.
> > 
> 
> So GO's process/thread model is then lightweight?  What did they learn
> that the rest of us ought to know! Or is this just a continuation of the
> libvirtd discussion?

So in C you officially have the option of pthreads, which we use heavily.
As you know a pthread maps 1:1 to an operating system thread (at least in
the Linux impl, IIUC, that's not technically required by POSIX specs).
Each thread has a fixed stack size associated with it (defaults to 8MB
in size on Fedora at least). Of course a thread only uses 8MB of physical
memory if it actually touches all those pages. If you exceed the stack
bad things happen, so it is hard to pick a safe smaller size for pthread
stacks. The OS schedules the threads and deals with context switching as
for any normal process. Linux pthreads is nicely efficient compared to
the original LinuxThreads impl and other OS like Solaris, but it is still
a fairly heavy context switch.

Alternatively you have the option of inventing a Coroutine concept like
QEMU has used in its block layer. That lets a single OS thread run multiple
userspace threads; typically the application will switch between coroutines
manually at key points (like I/O operations). Coroutine context switching
is lighter than thread switching so it can be beneficial. You would typically
use a smaller stack size, but it is still a fixed stack, so bad stuff happens
if you pick too small a size. You have to decide which OS level thread runs
which coroutines manually.

Goroutines are basically a union of the thread + coroutine concepts. The
Go runtime will create N OS level threads, where the default N currently
matches the number of logical CPU cores your host has (but is tunable to
other values). The application code just always creates Goroutines which
are userspace threads just like coroutines. The Go runtime will dynamically
switch goroutines at key points, and automatically pick suitable OS level
threads to run them on to maximize concurrency. Most cleverly goroutines
have a 2 KB default stack size, and the runtime will dynamically grow the
stack if that limit is reached. So this avoids the problem of picking a
suitable stack size, and avoids any danger of overruns. As a result it
is possible for a process to create *1000's* of goroutines and have less
overhead than if you tried to do the same thing in C with threads+coroutines
manually.

The fact that goroutines are so cheap means you can use simpler threading
designs for applications. eg instead of the approach where libvirt tries
to multiplex all I/O into a single event thread, and then have a pool of
threads for RPC calls, but then add another pool for threads for RPC calls
that must always run quickly, we could dramatically simplify things. Normal
practice in Go would just have a single Goroutine for each client socket
connection, and this would spawn a single Goroutine for each RPC call that
needs to be run. This essentially eliminates all the throttling & queuing
of calls that we do, which removes the bottlenecks it inherently creates.
cf the problems that Prerna is trying to solve where our main loop is
getting blocked by QEMU event handling and the increasingly complex solutions
we're trying to invent to deal with it.

> Still it seems the pendulum has swung back to hardware and software
> needs to catch up. It used to be quantum leaps in processor speed as it
> related to chip size/density - now it's just leaps in the ability to
> partition/thread at the chip level. I'd hate to tell you about the boat
> anchor I had on my desktop when I first started!

IBM solved everything in the 60's & 70's on the mainframe, and we're still
trying to reinvent all the solutions they had :-P

> > Two fairly recent languages, Go & Rust, have introduced new credible options for
> > writing systems applications without sacrificing the performance of C, while
> > achieving the kind of ease of use / speed of development seen with languages
> > like Python. It goes without saying that both of them are memory safe languages,
> > immediately solving the biggest risk of using C / C++.
> > 
> 
> If memory mgmt and security flaws are the driving force to convert to
> GO, then can it be claimed unequivocally that GO will be the panacea to
> solve all those problems? Even the best intentions don't always work out
> the best. If as pointed out in someone else's response there have been
> CVE's from/for GO centric apps - how many of those are GO related and
> how many are App related? Not that it matters, but the point is we're
> shifting some amount of risk for timely fixes elsewhere and shifting the
> backwards compatible story elsewhere which could be the most
> problematic. Not everyone has the same end goal for ABI/API
> compatibility. Add to that the complexity of ensuring that a specific
> version of some package you've based your product/reputation on.

NB, I'm certainly not claiming it will solve all security flaws. Far
from it, there's been plenty of screwups we've done that are not at all
related to choice of language. I am claiming that we'll eliminate all
those flaws related to use of unsafe memory management, ie buffer overflows,
double frees, use of free'd memory, and all the other fun ways in which
we screw up and crash libvirtd, often enabling security attacks (even
if we don't file CVEs for most of them).

> Curious, is the performance rated vs. libc memory alloc/free or
> something else? I don't recall ever being on a project that didn't have
> some sort of way to "rewrite" the memory mgmt code. Whether it was shims
> to handle project specific needs or usage of caches to avoid the awful
> *alloc/free performance. Doing the GC is great, but what is the cost.
> Perhaps something we don't know until we got further down that path.

I don't see a compelling case where libvirt has performance critical
memory management requirements. Indeed use of manual malloc/free is
far from offering the best performance from a memory mgmt POV in
general. Our biggest performance problems come when we inherently
self-limit our performance by using fewer threads than are really
needed to deal with the number of VMs we're managing. Assuming the GC
is not totally useless, I don't see a reason why it is an issue for
libvirt. In terms of scope Docker is very similar to what libvirt
tries to do, and probably has greater performance requirements than
libvirt because container density on a machine is usually much
higher than VM density. Beyond that you have apps like Etcd and
Kubernetes which have an order of magnitude greater performance
needs than libvirt, as they're managing across entire clusters of
100's of machines or more.  IOW, use of GC is not a concern for
me, and it clearly beats our current manual approach both in terms
of code simplicity and reliability.

> > The particularly interesting & relevant innovation of Go is the concept of
> > Goroutines for concurrent programming, which provide a hybrid kernel/userspace
> > threading model. This lowers the overhead of concurrency to the point where you
> > can consider spawning a new goroutine for each logical job. For example, instead
> > of having a single thread or limited pool of threads servicing all QEMU monitor
> > sockets & API clients, you can afford to have a new goroutine dedicated to each
> > monitor socket and API client. That has the potential to dramatically simplify
> > use of concurrency while at the same time allowing the code to make even better
> > use of CPUs with massive core counts.
> 
> Sounds promising and complicated, but is the risk of libvirt discovering
> some flaw or limitation in goroutine's worth it?  IOW: Would libvirt be
> blazing a new trail or are other consumers that have "helped" work
> through the initial issues.

We would not be anywhere near pushing the boundaries of Go. Apps like
Etcd / Kubernetes stretch it far more than we would.

> Oh, and license wise it would seem we'd have to be careful, true? At
> least w/r/t attempting to utilize packages written or listed on the wiki
> page link. From just a quick scan there, it seems to be numerous
> "packages" available and some list different licenses.

Yes, licensing is always a concern. Most commonly I see Go code under
more permissive licenses (BSD, Apache) than libvirt has traditionally
used. (L)GPLv2+ is compatible with both BSD & Apache, though for Apache
it relies on v2+ becoming v3+. (L)GPLv2-only code is a problem with
Apache compatibility. We do unfortunately suffer from GPLv2-only in the
VirtualBox driver, inherited from the VirtualBox XPCOM API which is
GPLv2-only and hence why we had to move it out of libvirt.so into
libvirtd despite it being a stateless driver. The only real good option
there, aside from deleting it, is to isolate the VirtualBox driver still
further in a standalone process such that its problems are confined.
The refactoring of libvirtd I suggested would probably help with the latter.

> Also, once chosen what happens if/when issues or incompatibilities are
> discovered in some package? Do we follow the same principle of GNULIB
> and try to fix it ourselves or somehow work around it? As I've learned
> through time - "how" someone else fixes a problem may not work out best
> and the degree of importance of the problem can result in delays in
> getting a resolution. Having some amount of control is nice and we just
> have to weigh the risk(s) of giving some of that away.

I don't have a clear answer for this, since it would probably depend on
the kind of problems we hit.  This kind of unknown is why a cautious
approach would be best, starting at the edge of libvirtd where scope
for interactions and/or impact on other work is limited. eg virtlogd
and virtlockd are both very self-contained services, so could be good
guinea pigs for initial proving work without disrupting libvirt in
general.

> > I don't believe that the unique features of Rust, over Go, are important to the
> > needs of libvirt. eg while for QEMU it would be critical to not have a GC
> > doing asynchronous memory deallocation, this is not at all important to libvirt.
> > In fact precisely the opposite, libvirt would benefit much more from having GC
> > take care of deallocation, letting developers focus attention on other areas. In
> > general, aside from having a memory safe language, what libvirt would most benefit
> > from is productivity gains & ease of contribution. This is the core competency
> > of Go, and why it is the right choice for usage in libvirt.
> 
> Depends on the GC, right? Is GC context/scope based? or overall APP
> based? There are certainly some particularly hairy uses of memory and
> arguments in libvirt code.

GC approach in Go is mark+sweep, but as mentioned above, when compared to
the scale of other apps using Go, I'm not concerned from libvirt POV.

> > The best way to start, however, is probably to focus on a simple self-contained
> > area of libvirt code. Specifically attack the virtlockd, and/or virtlogd daemons,
> > converting them to use Go. This still need not be done in a "big bang". A first
> > phase would be to develop the server side framework for handling our RPC protocol
> > deserialization. This could then just dispatch RPC calls to the existing C impls.
> > As a second phase, the RPC method impls would be converted to Go. Both of these
> > daemons are small enough that the conversion would be possible across the time
> > of a couple of releases. The hardest bit is likely ensuring compatibility for
> > the re-exec() upgrade model they support, but this is none the less doable.
> > The lessons learned in this would go a long way towards informing the best way
> > to tackle the bigger task of the monolithic libvirtd (or equivalently the swarm
> > of daemons the previous proposal suggests)
> > 
> 
> It will take though "someone" who knows GO and libvirt well enough
> to start. At this time, I submit that pool of talent is quite limited. Not
> necessarily GO contributors, but those that understand the libvirt build
> system, how to mash things together, how to write good GO code, and what
> types of considerations one has to make when developing at the OS,
> daemon, and library level.

You can probably guess that the "someone" would be me ;-) You certainly
have a valid point here, which is again why a cautious approach is needed
rather than attempting too much of a "big bang". I certainly would not
start anywhere near the QEMU driver, since chance of disrupting other
devs productivity is far too high.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|