Moving project infrastructure / services to GitLab

Daniel P. Berrangé berrange at redhat.com
Mon Mar 2 11:01:55 UTC 2020


We've discussed the idea of moving to a GitForge before, with the debate
circling around the pros/cons of mailing list vs merge request workflows.

This mail is NOT focused on that particular debate; rather it is tackling the
much broader issue of libvirt project infrastructure consolidation, such that
we can eliminate time spent as OS sysadmins, while having long term sustainable
infra which facilitates collaboration for project participants & integration
between services.

Having said that, a move to a merge request workflow IS facilitated by this
and IS one of the items proposed. A full consideration of this item would make
this mail way too large. Thus I won't discuss it in any significant detail here,
merely mention it as a bullet point. I'll send a separate mail for this topic
later, as it isn't a prerequisite for any of the other changes described in
this mail.


The short summary
=================

The mail is quite a long read, so let's get the key points out of the way in a
few lines.

 - Libvirt uses a wide range of services, spread across many different
   providers (redhat.com, quay.io, gitlab.com, travis-ci.org, openshift.com
   and libvirt.org)

 - The interaction/integration between the services we use is minimal or
   non-existent.

 - There is no consistency in who has access / admin rights, and multiple
   logins are needed by the people involved.

 - Several services rely on individual people to do key manual tasks

 - Several services are only manageable by Red Hat employees (BZ, mailman,
   openshift)

 - The libvirt.org server is an outdated single point of failure that is
   causing frequent problems for the website build.

 - Maintaining infrastructure is not a good use of libvirt contributors'
   limited time.

 - GitLab can consolidate all the services that our *current* dev workflow
   requires, except for the mailing list. This is beneficial even without
   merge requests as a workflow.

 - The libvirt project will be following commonly accepted best practices for
   open source project development, lowering the barrier to entry / project
   specific knowledge gap for contributors.



The key proposed changes
========================

The text below will outline each of the infrastructure changes to be performed
with key points for their rationale. Although the points below are numbered,
they should NOT be considered to be a strictly ordered sequence. Most of the
points are independent with few or no dependencies on other points.


 1. Use gitlab.com as the location of the "master" git repositories

    Current committers to libvirt.org would have to create accounts on
    gitlab.com and upload their SSH keys.

    No change in development workflow, merely update the "origin" URI.

    Any current committer who has committed in the last 12 months would get
    access to the new git repos. This would clean up any inactive people who
    no longer contribute to libvirt.

    Gives us the ability to have per-git-repo commit privileges.

    Partially eliminates DanB / DanV as points of failure on libvirt.org, as
    the gitlab.com project admin privileges can be more flexibly granted, as
    compared to multiple people having root on DanV's personal server.

    Eliminates the libvirt.org physical server as a single point of failure
    for SCM, which has no disaster recovery plan in place.

    Improved reliability as libvirt.org anon-git breaks periodically

    libvirt.org SCM would remain as a read-only mirror, to serve as a disaster
    recovery option should we need it.



 2. Use gitlab.com CI job as a way to generate the static website

    Replaces the frequently breaking cronjob on libvirt.org which runs
    configure+make.

    Is more reliable and secure since it runs in a confined container
    with a known good distro package environment matching libvirt's minimum
    needs.

    A new cron job on libvirt.org would download the published artifact
    from the CI job, and deploy it to the libvirt.org Apache root; a rough
    sketch of such a job is shown at the end of this item.

    Partially eliminates DanB / DanV as points of failure, as the most
    likely part to break is now in the CI job & fixable by any libvirt
    contributor.

    Still reliant on libvirt.org server for web presence in near term.
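
    To illustrate, here is a minimal sketch of what such a deployment cron
    job could look like. It assumes a hypothetical "website" CI job on a
    libvirt/libvirt project on gitlab.com and uses the GitLab job artifacts
    download API; the project path, job name and Apache root path are all
    illustrative, not decided.

#!/usr/bin/env python3
# Hypothetical sketch: fetch the latest website artifact produced by a
# GitLab CI job and unpack it into the Apache document root.
# Project path, job name and docroot are assumptions for illustration.

import io
import shutil
import zipfile
import urllib.request

PROJECT = "libvirt%2Flibvirt"      # URL-encoded project path (assumption)
JOB = "website"                    # CI job publishing the site (assumption)
BRANCH = "master"
DOCROOT = "/var/www/libvirt.org"   # Apache root on libvirt.org (assumption)

url = ("https://gitlab.com/api/v4/projects/%s"
       "/jobs/artifacts/%s/download?job=%s" % (PROJECT, BRANCH, JOB))

with urllib.request.urlopen(url) as resp:
    archive = zipfile.ZipFile(io.BytesIO(resp.read()))

# Unpack to a staging dir, then swap it into place. A real job would want
# something more atomic and would preserve content not built by the CI job.
staging = DOCROOT + ".new"
archive.extractall(staging)
shutil.rmtree(DOCROOT + ".old", ignore_errors=True)
shutil.move(DOCROOT, DOCROOT + ".old")
shutil.move(staging, DOCROOT)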



 3. Use gitlab.com file hosting for release artifacts

    Replaces the current use of libvirt.org/sources/ download archive.

    gitlab is said to have a default size limit of 100 MB, but raised
    to 1GB on gitlab.com. It is believed this is a per-file limit, but
    it is unclear if there is also a cumulative limit across all files.
    Limits must be confirmed before attempting this change.

    libvirt.org has 4.5 GB of tar.xz files for libvirt, 7 GB of rpm files,
    and 0.5 GB of other pieces (i.e. bindings).
    
    The RPMs have been produced against a wide variety of distros over
    time, from Fedora 12 to Fedora 30. The RPMs don't provide much obvious
    value since downstream users run a variety of OSes. The Fedora Virt
    Preview repo will provide the very same content after each release in
    an easy to consume YUM repo.

    All historical tar.gz/xz files would be uploaded to gitlab.

    No historical RPMs would be uploaded to gitlab.

    A cronjob would sync newly uploaded files from gitlab back to libvirt.org
    to provide a disaster recovery option should we need it.
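
    As a rough sketch only, such a sync-back cron job could use the GitLab
    releases API along the following lines, assuming release tarballs are
    attached as assets to GitLab releases; the project path and mirror
    directory are illustrative.

#!/usr/bin/env python3
# Hypothetical sketch: mirror release assets published on gitlab.com
# back to libvirt.org as a disaster recovery copy.
# Project path and mirror directory are assumptions for illustration.

import json
import os
import urllib.request

PROJECT = "libvirt%2Flibvirt"              # URL-encoded project path (assumption)
MIRROR = "/var/www/libvirt.org/sources"    # existing download dir (assumption)

api = "https://gitlab.com/api/v4/projects/%s/releases" % PROJECT
with urllib.request.urlopen(api) as resp:
    releases = json.load(resp)

for release in releases:
    for link in release.get("assets", {}).get("links", []):
        name = os.path.basename(link["url"])
        dest = os.path.join(MIRROR, name)
        if os.path.exists(dest):
            continue                       # already mirrored
        print("fetching", name)
        urllib.request.urlretrieve(link["url"], dest)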



 4. Use gitlab.com as the primary upstream issue tracker

    This is to replace the current usage of the bugzilla "Virtualization
    Tools" product.

    For the work we do with the upstream bugzilla product, the GitLab issue
    tracker is a good match, avoiding the complexity Bugzilla has grown in
    order to deal with RHEL process bureaucracy.

    It is good for users, as they no longer need to register for an account
    on BZ.

    It is easier & more inclusive for maintainers, as changes to the issue
    tracker config are entirely self-service, instead of going via a private
    Red Hat issue tracker only available to Red Hat employees, or via knowing
    the right Red Hat admins personally.

    Repo forks for major pieces of work can have their own issue tracker
    for free, providing a collaborative way to track the problems before
    the code lands in upstream.



 5. Use gitlab.com CI and container registry to build and host the CI images

    This replaces our use of Quay.io

    No longer any need for manually triggering container builds on Quay.io;
    any libvirt maintainer can make changes to the CI/container setup and it
    gets automatically processed when the changes hit git master.
    
    The libvirt-dockerfiles project would no longer be required, as the
    dockerfiles needed by each git repo would be added to that git repo,
    e.g. libvirt.git would contain its own dockerfiles (still generated
    from lcitool).

    Eliminates the complexity or breakage when needing to deploy changes to
    the container images in lockstep with changes to the CI control rules in
    the project.

    Eliminates the need to create yet another account login for Quay



 6. Use gitlab.com CI as primary post-merge build and test platform

    Red Hat has recently provided libvirt with significant resources on both
    OpenStack and OpenShift, to serve as CI runners for libvirt, used as we
    see fit.

    We can initially use the shared runners for all Linux testing and provide
    our own docker containers as the environment.

    If our CI throughput requires it, we can provide further private runners
    for Linux via the Red Hat OpenShift resources we have access to.

    For FreeBSD we would need to make lcitool install the gitlab-runner agent
    in the VM images. We can optionally do this for Linux images too if desired.

    A choice of either carrying on using the physical hosts from CentOS CI as
    runners, just connected to GitLab instead of Jenkins, or going straight
    to new VMs deployed on the Red Hat OpenStack resources we have access to.

    Consolidates all our CI logic (except macOS) into one place in the GitLab
    CI yaml config. All the jenkins job builder logic in libvirt-jenkins-ci.git
    is obsoleted; lcitool remains, though potentially simplified.

    Consistent with use of CI for generating website static content and
    building container images.

    Eliminates the need to use & manage CentOS CI, and eliminates the need for
    Travis CI except for macOS, so fewer accounts for contributors to create.

    Any forks of the gitlab repo will automatically have a full set of CI,
    which is good for developers.



 7. Use gitlab.com as the project wiki

    Replaces the current mediawiki install that is deployed via a Red Hat
    hosted OpenShift instance

    Would require a way to do an automated migration of the content from
    the current mediawiki deployment that I manage for wiki.libvirt.org; a
    rough sketch of one possible approach is shown at the end of this item.

    Would require a way to set up HTTP redirects from the old URLs.

    DanB is eliminated as a single point of failure for the wiki and no longer
    has to waste time playing sysadmin for mediawiki / mysql.
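
    To give a rough idea of what such a migration might involve, the sketch
    below uses the standard mediawiki HTTP API to export every page as raw
    wikitext, which could then be converted (e.g. with pandoc) and committed
    into the GitLab wiki, which is itself just a git repo. The API URL and
    output layout are assumptions, and pagination of the page list is
    ignored for brevity.

#!/usr/bin/env python3
# Hypothetical sketch: dump all pages from the existing mediawiki as raw
# wikitext files, as a starting point for conversion & import into the
# GitLab wiki. API URL and output paths are assumptions for illustration.

import json
import pathlib
import urllib.parse
import urllib.request

API = "https://wiki.libvirt.org/api.php"   # mediawiki API endpoint (assumption)
OUT = pathlib.Path("wiki-export")
OUT.mkdir(exist_ok=True)

def query(**params):
    params.update(action="query", format="json")
    url = API + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# Enumerate page titles (only the first batch; a real job would follow the
# API continuation parameters to cover every page)
pages = query(list="allpages", aplimit="500")["query"]["allpages"]

for page in pages:
    title = page["title"]
    # Fetch the raw wikitext of the latest revision of this page
    data = query(titles=title, prop="revisions", rvprop="content")
    rev = next(iter(data["query"]["pages"].values()))["revisions"][0]
    path = OUT / (title.replace("/", "_") + ".wiki")
    path.write_text(rev["*"])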



 8. Use gitlab.com pages for hosting virttools.org blog planet

    Replaces the current hosting of planet tools in a Red Hat hosted
    OpenShift instance

    Set up the planet software as a periodic gitlab CI job, publishing
    artifacts to be served on gitlab pages; a sketch of such a job is shown
    at the end of this item.

    DanB is eliminated as a single point of failure for the planet and no
    longer has to waste time playing sysadmin
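
    For illustration, a minimal sketch of the kind of aggregation script
    such a scheduled CI job could run, writing static HTML into public/ for
    GitLab Pages to serve. It assumes the third party "feedparser" module
    and a hand maintained feed list; the feed URLs are placeholders.

#!/usr/bin/env python3
# Hypothetical sketch: aggregate blog feeds into a single static page,
# run as a scheduled GitLab CI job publishing to GitLab Pages.
# Feed URLs are placeholders, not a real planet configuration.

import pathlib
import feedparser   # third party: pip install feedparser

FEEDS = [
    "https://blog.example.org/feed/",        # placeholder feed URLs
    "https://another.example.org/atom.xml",
]

entries = []
for url in FEEDS:
    feed = feedparser.parse(url)
    for entry in feed.entries:
        # crude: the sort key is the raw published date string
        entries.append((entry.get("published", ""), entry.title, entry.link))

entries.sort(reverse=True)

out = pathlib.Path("public")
out.mkdir(exist_ok=True)
html = ["<html><body><h1>virt tools planet (sketch)</h1>"]
for published, title, link in entries:
    html.append('<p><a href="%s">%s</a> %s</p>' % (link, title, published))
html.append("</body></html>")
(out / "index.html").write_text("\n".join(html))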



 9. Use gitlab.com merge requests for non-core projects

    This means every git repo that is not the core libvirt.git: all the
    language bindings, the other object mappings (libvirt-dbus/snmp/etc)
    and supporting tools (libvirt-jenkins-ci, etc).

    All these projects would now benefit from pre-merge CI testing which
    will catch build problems before they hit master. There is less
    disruption to downstream consumers & no need for the "build breaker fix"
    rule allowing changes to be pushed without review.

    The patch traffic for these repos is fairly low compared to libvirt.git
    and the patches themselves tend to be simpler and thus easier to review.

    Moving the non-core projects thus makes a smooth on-ramp for libvirt
    contributors to become more familiar with the merge request workflow,
    and to identify any likely places we might need to invest in tooling.

    Refer to separate mail to follow later for full consideration.


 10. Use gitlab.com merge requests for core project

    This is the final piece, removing the mailing list workflow for the main
    libvirt.git repo.

    Same points as for merge requests with non-core projects.

    Refer to separate mail for full consideration.



Background information
======================

As I was coming up with this email, I spent a bunch of time thinking about the
history of how libvirt project infra has grown, and what has happened to the
open source world in that time. What follows is stuff I wrote to help my own
understanding of why the GitForge model is so appealing & has grown so fast.
This was originally going to be the main part of the email, but I changed it
to put all the concrete actions first. This is just background that motivated
the changes.


Project history
---------------

Since the very beginning of the project, libvirt has followed an email based
development workflow, using the libvir-list mailing list. At that time, email
based development was the standard model followed by all significant projects.
Prominent hosting options like SourceForge & its clones offered a service
covering approximately email, SCM and bug tracking.

Over time we've made some changes to the process for libvirt, but nothing
major. The most notable changes were switching from CVS to Git, and the use
of CentOS CI for post-merge testing & formalization of our platform support.
Less notable were mandating Signed-off-by, and partial usage of Reviewed-by
tags.

In the time since we switched to Git, the open source world has changed
massively with the rapid adoption of Git Forge services. An email based
workflow is no longer the norm; it is the rare outlier. This has gone hand in
hand with the increased recognition of the importance of integrating CI
automation into the development workflow, and more recently of the importance
of containers as a deployment & distribution mechanism.


Libvirt.org management
----------------------

The libvirt.org server & domain registration is owned & managed by DanV. I
have sudo access to do administrative tasks too. It also hosts xmlsoft.org for
libxml / libxslt. This server is running RHEL-6 which has increasingly caused
us problems, since libvirt itself stopped supporting RHEL-6. This impacted
our ability to create nightly tarballs, and update the static website in
particular. It is also a major single point of failure both in terms of
hardware and administration access.

The key reason why the libvirt project exists on the libvirt.org server is
because in the early days of the project we considered it important that the
infrastructure used by the project was NOT under the control of Red Hat
corporate and IT. This was an attempt to promote the project as independently
run, as well as provide resilience for the long term, should Red Hat lose
interest in its development.

Since we manage libvirt.org we can in theory grant access to anyone involved
in the project who needs it. Management of the server is very old school,
however, with no automation. We've been lucky to not have many serious outages
over the years. Access control is also crude, as there is no two factor auth
and no fine grained repository commit permissions.

This leads to three key questions:

 - Is the use of the libvirt.org physical server necessary for the
   long term viability of the libvirt project?

 - Are we doing a good job at maintaining its services and ensuring we
   are resilient to problems that occur?

 - Is working on sysadmin tasks a sensible and productive use of project
   maintainer time?

I'm pretty clear that the answer to all these questions is NO.

I still believe in the key point of avoiding a dependency on Red Hat corporate
and IT, but my priorities have changed. The original reasons are still valid,
but much more important than that, modern self-service infrastructure is a more
flexible approach than infrastructure that depends on individual admins (like
myself and DanV for libvirt.org), or on opening tickets to get changes made
that take many days or weeks to be looked at (mailman, BZ).


Understanding the appeal of the Git Forge
-----------------------------------------

In the early days of open source, projects would start off in a single
maintainer mode, with code shared in an ad hoc manner. Even if the maintainer
used SCM, typically no one else would see it, so everyone else was a second
class citizen in terms of participation.

If a project took off in popularity with multiple contributors actively
collaborating, it would have to organize its own infrastructure for something
like a CVS server, mailing list and possibly a website. This was a burden
for whoever maintained the infra, but at least all active contributors were
now first class citizens. With an SCM like CVS though, infrequent contributors
were still second class citizens.


SourceForge was the first big service to offer commodity project hosting
infrastructure for free to open source projects. This reduced the burden on
project maintainers, no longer requiring them to spend time on sysadmin tasks.
SourceForge was briefly released as open source software, but went closed
source again, resulting in various forks, which you still see evidence of in
services like GNU Savannah. Use of this hosting followed a classic model
of needing to apply for a new project, get approval, and wait for it to be
created on the backend.


In the classic world, forking a project had a very high cost because the fork
was cut off from any interaction with the origin project's infrastructure, even
if using a platform like sourceforge. A fork was thus a long term burden and
only desirable if there was a compelling benefit to be had. If the fork withered,
it was often lost in the ether as its infrastructure went away.


The arrival of Git started something of a revolution for open source projects.
It democratized usage of the SCM, such that part time contributors are no
longer second class citizens vs the active maintainers. Everyone has access to
the same set of SCM tooling functionality.

The SCM tools are only a small part of the infra around a project; there is
still the web hosting, public SCM hosting, mailing list, bug trackers, CI
infrastructure, etc, which are all an increasingly important part of a project
with multiple active contributors.


The compelling value of GitHub and GitLab is that they provide the way to
democratize all aspects of the project infra hosting to an extent that was not
achieved with the SourceForge like platforms. The key differences are that
the new Git Forge platforms are completely self-service / instantaneous, while
also providing and encouraging a collaborative model between the project forks
and their services (linking issue trackers / merge requests across projects).

The projects' active maintainers are no longer in a tier above part time
contributors when it comes to project infrastructure. With one click to fork
the project, the user has access to the full range of infrastructure that the
original project had: SCM, bug tracking, web hosting, CI. Even more importantly,
the fork is not cut off in a silo from the origin. The merge request model
makes it trivial to feed changes from a fork into the origin, or to reference
bugs between projects. The notion of which project is "the origin" is now
fluid.


Forks no longer need to be thought of as a costly thing to avoid as much as
possible; rather they become a convenient tool for innovation, to develop code
for ideas which can't immediately be done as part of the origin project.
Successful ideas can then feed back into the origin.

It is no longer which project owns the infrastructure that matters most, but
rather which one has the biggest gravity amongst contributors, drawing in
their pull requests and bug reports. This may change over time, which is
especially beneficial for the lone-maintainer projects where the original
author loses interest, but a new person steps up to drive it forward. It is
also useful for multi-maintainer projects to have infra on a neutral third
party, so if the company sponsoring project developers changes focus and stops
funding infra, there is no longer a need to redeploy elsewhere. In this way the
forges help to keep compelling code alive over the long term and facilitate a
collaboration model that is stronger than that exhibited with the pre-GitForge
approach to project infrastructure.


Tangent: The fall from grace of SourceForge is a cautionary tale of the risk of
relying on closed source hosted software for infrastructure. This lesson can be
extended to cover any hosted service in general, even if running open source
software, and taken to its logical conclusion, a project would end up hosting
everything itself. The latter has a huge burden in both cost and time, as well
as ease of collaboration. Thus there needs to be a risk/cost/reward tradeoff
made to decide where to draw the line.  Relying on hosted services, but only
those based on open source software, is one pragmatic choice in where to place
the line that we've considered appropriate for libvirt, hence our use of Quay
in preference to DockerHub, and CentOS CI. This is the driver for picking
GitLab over GitHub, maximising the feature set available, but still using a
software platform that is open source with proven 3rd party deployments such as
those used by the GNOME and FreeDesktop projects. IOW we can use public hosted
gitlab.com to eliminate our sysadmin burden, while having confidence in our
ability to switch to self-hosted if circumstances change.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



