[EnMasse] Proposal: fewer repositories

Thu Jun 29 07:26:18 UTC 2017

Hi all,

Today, the components that we release in EnMasse are spread across 12 
GitHub repositories. There are a couple of advantages to this approach:

    * We could release them independently, although this is not 
something we do today
    * Each component can be built independently by Travis
    * It is easy to do incremental builds and only build the 
component that was changed
    * Developers don't have to be 'disturbed' by other components and 
merge conflicts

There are, however, a few disadvantages:

    * Duplication of build configuration across components. Right now we 
have a ~50 line .travis.yml in each repository that essentially does the 
same thing. Whenever we need to change something (e.g. push artifacts to 
Bintray, change credentials, or change configuration), we have to 
update all 12 repositories.

    * Doing integration testing between components. This is 'solved', 
but only by maintaining a set of fragile scripts to get it working.

    * Different build systems for different components. This is partly 
a consequence of Travis assuming that each repo contains code in only 1 
programming language, and it has resulted in 3 different ways to build 
components (make, gradle, maven).

    * Keeping track of all repositories 'in use' in release scripts. To 
release, we need to tag all 12 git repositories. The list of 
repositories needs to be maintained somewhere and is currently hardcoded 
in the scripts.

    * It is confusing for new developers where to find the source code, 
which repo to look at, where to file issues etc.

    * Even for those of us already working on EnMasse, it is sometimes 
confusing to file issues against the correct component, which in turn 
makes it harder to keep track of what development work is being done.
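
To illustrate the release-script point above, here is a minimal sketch of 
what tagging looks like with a hardcoded repository list. It is not our 
actual release script: the function name, the version string and the 
assumption that every repository is checked out in a directory named 
after it are all made up for the example. It only builds the git 
commands and does not run them.

```python
from typing import List

# Hardcoded list, as the release scripts need today. The names are the
# repositories listed in this post; having one local checkout per name
# is an assumption made for this sketch.
REPOSITORIES = [
    "enmasse", "admin", "ragent", "subserv", "artemis-image",
    "router-image", "router-metrics", "routilities", "topic-forwarder",
    "mqtt-gateway", "mqtt-lwt", "amqp-kafka-bridge",
]


def tag_commands(repos: List[str], version: str) -> List[List[str]]:
    """Return the git invocations needed to tag and push each repository."""
    commands = []
    for repo in repos:
        # Create an annotated tag inside the checkout of this repository.
        commands.append(
            ["git", "-C", repo, "tag", "-a", version, "-m", f"Release {version}"]
        )
        # Publish the tag to the remote called 'origin' (assumed remote name).
        commands.append(["git", "-C", repo, "push", "origin", version])
    return commands


if __name__ == "__main__":
    # "0.12.0" is a hypothetical version number.
    for cmd in tag_commands(REPOSITORIES, "0.12.0"):
        print(" ".join(cmd))
```

With 12 repositories this produces two commands per repository, and the 
list has to be kept in sync by hand. With a single repository the same 
function would be called with a one-element list.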

The current repositories we have (that are released together) are:

    * enmasse            - openshift/k8s templates + documentation
    * admin              - address-controller + configserv + 
queue-scheduler + common lib
    * ragent             - router agent
    * subserv            - subscription service
    * artemis-image      - artemis docker file + plugins + shutdown hooks
    * router-image       - router docker file + configuration
    * router-metrics     - router metrics collector + docker file
    * routilities        - console
    * topic-forwarder    - forwarding of messages between brokers in a 
cluster
    * mqtt-gateway       - MQTT gateway
    * mqtt-lwt           - MQTT last will and testament service
    * (amqp-kafka-bridge - AMQP-Kafka bridge) - ignored for the rest of 
this post

From reading the above you can probably feel my desire to merge these 
repositories into fewer. I'm proposing a few alternatives where the 
repositories are named without regard to the CI system or 
programming language used.

# By deployment

    * enmasse - templates and documentation
    * admin   - admin, routilities, router agent
    * router  - router-metrics, router-image
    * broker  - artemis-image, topic-forwarder, subserv
    * mqtt    - mqtt-gateway, mqtt-lwt

The repositories in this list each contain components that are deployed 
together. Building and testing changes to the code in each of these 
repositories makes sense I think, and changes to each of them _should_ 
have minimal impact on the other components. On the other hand, the way 
we deploy components can change over time, so we might have to move 
components around in the future, which I don't like.

# By address space types

    * enmasse            - templates, documentation
    * enmasse-common     - address-controller, configserv, (console?)
    * enmasse-standard   - router-image, queue-scheduler, artemis-image, 
routilities, ragent, router-metrics, topic-forwarder, subserv, 
mqtt-gateway, mqtt-lwt

This structure groups the repositories into pieces that match the 
address space types. The argument is that we could release the 
components of each address space type individually. I would, however, 
argue that the real requirement is not to release them independently, 
but for the enmasse-common components to work with multiple versions of 
enmasse-standard. I don't think we need to release them independently to 
guarantee that.

# Single repo

     * enmasse

This is my personal favorite. Today's repos would be modules within the 
repo. If we find that some of these modules need to be released 
independently, we can move them into separate repos at that point. Some 
work is required to get Travis working, but it should be doable. I also 
think that using a common build system that supports incremental builds 
would be valuable here.

There are 2 common issues with this approach: build times and merge 
conflicts. Build times can be addressed by incremental builds. Merge 
conflicts can be avoided by having a proper structure of the components 
within the repo. It also warrants strong guidelines on having clear 
boundaries between components and a review process that ensures that the 
'right thing' is done.
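
As a rough sketch of how incremental builds could work in a single repo, 
the snippet below maps a list of changed files to the modules that need 
rebuilding. It assumes a hypothetical layout where each of today's repos 
becomes a top-level directory with the same name; the function name and 
the example paths are made up, and the list of changed files would come 
from something like `git diff --name-only`.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Set

# Hypothetical layout: one top-level directory per module, named after
# today's repositories.
MODULES = {
    "admin", "ragent", "subserv", "artemis-image", "router-image",
    "router-metrics", "routilities", "topic-forwarder",
    "mqtt-gateway", "mqtt-lwt",
}


def changed_modules(changed_paths: Iterable[str], modules: Set[str]) -> List[str]:
    """Return the sorted list of modules touched by the changed files."""
    touched = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        if parts and parts[0] in modules:
            touched.add(parts[0])
        else:
            # A change outside any module (for example shared build
            # configuration) may affect everything, so rebuild all modules.
            return sorted(modules)
    return sorted(touched)


if __name__ == "__main__":
    # Example paths are hypothetical.
    print(changed_modules(["ragent/lib/router.js", "mqtt-lwt/pom.xml"], MODULES))
```

This deliberately ignores dependencies between modules (for example a 
shared common lib), which a build system with real incremental build 
support would track for us.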

All in all I think having a simple way to build and test the whole 
project is a key feature here.

-- 
Ulf