
2023-05-24

Manipleservices

Maniple: a subdivision of a Roman legion. From manipulus: bundle, handful.

Intro

Discussions of system design often focus on one of two architectures: monoliths or microservices.

This neat lumping together of ideas (think left-wing/right-wing) isn't particularly helpful when it comes to solving our problems effectively - we're free to pick and choose ideas from each approach.

With that in mind, this document aims to decompose these approaches into (somewhat) orthogonal aims and methods.

It then proposes a third approach - Manipleservices - with the target of maximising our aims, in commonly encountered software environments, given our available methods. There is an accompanying skeleton Python + CircleCI repo - variations on the ideas in this repo can be applied to a wide variety of codebases. The main takeaway from this document is just a restatement of YAGNI:

Start your architecture as simply as possible. As your system grows, work out what aims you have, and pick methods that best reach those aims. Don't choose methods based on ideology: they solve problems you may never have, often at significant cost.

To be more generic, service-like units of the system will be referred to as "components" instead of "services".

Aims

Regardless of the methods we choose, we tend to hold the same aims. Some of them end up being conflicting in reality, and their costs/benefits need to be weighed up against one another.


Methods

Splitting Into Components

Splitting into more small components (as opposed to fewer large components) simply magnifies the effects of the other methods below.

Dependency Specification

Monoliths tend to very loosely specify "this code depends on this other code", often via language-level imports. In the wild, microservices also tend to loosely specify dependencies by enumerating service URLs in some config file.

An explicit dependency graph has many advantages. It can be achieved with:

An explicit dependency graph enables:

Note: the aims achieved by using this method are mostly independent from those achieved using the other methods.
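
As a rough illustration of what an explicit graph buys you, here is a minimal sketch using Python's standard `graphlib`. The graph itself is made up: `account` and `shop` echo the example packages later in this document, while `reports` and `admin` are hypothetical additions. It works out a safe build order and which components need retesting after a change.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: each component maps to the components it depends on.
DEPENDS_ON = {
    "account": set(),
    "shop": {"account"},
    "reports": {"shop"},
    "admin": {"account"},
}


def build_order(depends_on):
    """Return components so that every dependency precedes its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


def affected_by(changed, depends_on):
    """Return the changed component plus everything that transitively depends on it."""
    affected = {changed}
    grew = True
    while grew:
        grew = False
        for component, deps in depends_on.items():
            if component not in affected and deps & affected:
                affected.add(component)
                grew = True
    return affected


print(build_order(DEPENDS_ON))  # "account" always comes before "shop"
print(sorted(affected_by("shop", DEPENDS_ON)))  # ['reports', 'shop']
```

With a loosely specified graph (imports, or URLs in a config file), the same questions usually end up being answered by grepping and guesswork.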

Communication

The most common ways components communicate are:

Using function calls makes it difficult to achieve deployment independence (see below). However, picking any of the other options has numerous downsides for developer ergonomics:

There are also various performance costs to constantly serialising and flinging data over the wire.

Using message buses introduces further deployment independence - if one service is down, the caller can still give it work to do later. However, there are often large debugging costs - where did that message originally come from?

Deployment Independence

We refer to the ability to deploy different components to different types of machines, and to scale parts of the system independently, as deployment independence. Deployment independence often comes together with version independence (see below), but it's possible to have one without the other.

Having things deployed separately can help organisationally: for example, different teams can be billed according to their resource usage. Occasionally, different types of workers might be more suitable for different types of workloads: for example, there might be some performance reason to serve user requests from some magic cloud worker near to the user.

The case that deployment independence in and of itself enables scaling is somewhat overstated. Consider an inbound request that hits component A, which in turn talks to component B. If B takes 98% of the CPU, there's a temptation to make B a separately deployed component. After all, this means you can scale up the number of machines accordingly, right? This only achieves anything meaningful where the work done by B is parallelisable for each original call to A. If this is not the case, you may as well deploy A+B on every machine and scale them together.
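
To put rough numbers on the A/B argument, here is a deliberately simplified throughput model. Everything in it is an assumption made for illustration: work is purely CPU-bound, each machine supplies one CPU-second per second, there is no network or serialisation overhead, and the per-request costs (0.02s in A, 0.98s in B) and the machine count are invented.

```python
def colocated_throughput(machines, cpu_a, cpu_b):
    """Requests/second when every machine runs both A and B."""
    return machines / (cpu_a + cpu_b)


def split_throughput(machines_a, machines_b, cpu_a, cpu_b):
    """Requests/second with separate pools; the slower pool is the bottleneck."""
    return min(machines_a / cpu_a, machines_b / cpu_b)


# Assumed figures: each request costs 0.02 CPU-seconds in A and 0.98 in B.
CPU_A, CPU_B = 0.02, 0.98
MACHINES = 10

together = colocated_throughput(MACHINES, CPU_A, CPU_B)

# Try every way of dividing the same machines between the two pools.
best_split = max(
    split_throughput(n_a, MACHINES - n_a, CPU_A, CPU_B)
    for n_a in range(1, MACHINES)
)

print(f"colocated: {together:.1f} req/s, best split: {best_split:.1f} req/s")
# colocated: 10.0 req/s, best split: 9.2 req/s
```

Under these assumptions, splitting never beats colocating for the same number of machines. Here it does slightly worse, because one whole machine is reserved for A and sits mostly idle.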

Version Independence

This is the ability to have different versions of components running in prod together.

An important use case is the ability to, for example, deploy v1.3.7 of your payments service alongside v1.3.6, directing 10% of traffic to the former and 90% to the latter. This should lead to increased reliability, as you can test new software on a subset of users and automatically roll back if some threshold of errors is crossed.
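
The traffic split and automatic rollback can be sketched in a few lines. This is only an illustration, not how any particular load balancer does it: the class name, the 5% error threshold and the 20-request minimum are all assumptions, and real setups usually delegate this to infrastructure rather than application code.

```python
import hashlib


def pick_version(user_id, canary_percent, stable="v1.3.6", canary="v1.3.7"):
    """Deterministically assign a user to the canary or the stable version."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return canary if bucket < canary_percent else stable


class CanaryRollout:
    """Tracks canary errors and disables the canary past a threshold."""

    def __init__(self, canary_percent=10, max_error_rate=0.05, min_requests=20):
        self.canary_percent = canary_percent
        self.max_error_rate = max_error_rate  # assumed threshold
        self.min_requests = min_requests  # avoid reacting to tiny samples
        self.requests = 0
        self.errors = 0

    def route(self, user_id):
        return pick_version(user_id, self.canary_percent)

    def record_canary_result(self, ok):
        self.requests += 1
        self.errors += 0 if ok else 1
        enough_data = self.requests >= self.min_requests
        if enough_data and self.errors / self.requests > self.max_error_rate:
            self.canary_percent = 0  # roll back: all traffic goes to stable


rollout = CanaryRollout()
for i in range(20):
    rollout.record_canary_result(ok=(i % 4 != 0))  # simulate 25% failures
print(rollout.route("user-42"))  # v1.3.6, since the canary was rolled back
```

Hashing the user ID (rather than picking randomly per request) keeps each user on one version, which makes bug reports easier to trace.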

If you wish to test pre-prod, there are significant testing tooling problems to overcome - are you going to test against the cross-product of possible interacting services? Some small developer ergonomics friction is introduced - if I CMD-click, do I hit the version of the function that is deployed and causing bugs? Similar pain is introduced whilst debugging. Also, there is often significant tooling overhead introduced implementing versioning and deploying versions of packages (see the example below).

Separate Repos

Choosing many-repos vs monorepo mostly impacts developer ergonomics and CI.

One downside of many-repos is duplication of component-level boilerplate (think CircleCI config etc). The flipside with monorepos is that tooling like Github/CircleCI can get a bit overwhelming, though it should be possible to overcome this with more tooling.

The bigger downside with many-repos is that it induces code rot. In a monorepo, if there is a small cleanup to do with a bit of library code, you:

  1. Make the fix.
  2. Run linters/tests.
  3. Fix everything downstream.
  4. Repeat steps 2-3.

With many-repos, you:

  1. Make the fix.
  2. Upversion the library's package.
  3. Deploy the package to some repository.
  4. Work out which downstream things depend on the library.
  5. For each downstream thing:
  6. Realise you made a small error in the original fix.
  7. Repeat steps 1-6.

What tends to happen is that people either make the fix and ignore steps 2 onwards, thereby making people downstream wary of updating, or they just don't make the fix. At Google, people making wide-reaching changes are responsible for cleaning up at least 80% of the mess they make - this is the way it should be.

In theory, the decision to have one or many repos should be independent of version independence. In practice, one tends to follow the other.

Aside: It should at least be feasible to not have one follow the other. Consider these scenarios:

It's also possible to imagine some funky setup with one master repo (that's maybe in charge of deployment and integration testing) that contains one Git submodule for each component.

Language Independence

Choosing to have a mix of languages impacts the feasibility of various other methods:

In practice, there will often be eg. Python and TypeScript living alongside each other. It might be advantageous to mix and match different kinds of components and methods in this case.

Database Independence

Sometimes a component has many databases (maybe a separate time series db for a particular domain). The consensus is that the opposite - having many components sharing one database - is a no-go: in this case the coupling is such that you only truly have one component.

More databases means:

Proposal

Example Python + CircleCI repo.

Manipleservices is a set of methods picked to maximise our aims within commonly seen, medium-sized codebases (that is to say, almost all codebases). The following table summarises the choices made, comparing them to the two dominant approaches:

| Method | Trad Monolith | Modern Classic Microservices | Manipleservices |
|---|---|---|---|
| Splitting Into Components | One component | Many components | Handful of components |
| Dependency Specification | Loose via imports | Not really | Explicit at the language's package level (eg. in pyproject.tomls) |
| Communication | Function calls | REST | Mostly function calls |
| Deployment Independence | None | Loads | Nope |
| Version Independence | Nope | Yup | Mostly not |
| Separate Repos | Nope | Yup | Nope |
| Language Independence | Not really | Yup | Mostly not |
| Database Independence | One DB | One DB per component | One or many DBs per component |

Manipleservices adopt further deployment independence/communication methods only when there are concrete reasons to do so.

Why did we pick these methods?

Manipleservices in Python

File layout

├── README.md
└── packages
    ├── account
    │   ├── pyproject.toml
    │   ├── src
    │   │   └── account
    │   │       ├── __init__.py
    │   │       ├── api
    │   │       │   ├── get.py
    │   │       │   └── post.py
    │   │       ├── db.py
    │   │       ├── py.typed
    │   │       └── users.py
    │   └── tests
    │       └── test_user.py
    └── shop
        ├── pyproject.toml
        ├── src
        │   └── shop
        │       ├── __init__.py
        │       ├── api
        │       │   ├── get.py
        │       │   └── post.py
        │       ├── basket.py
        │       ├── client
        │       │   └── account.py
        │       └── py.typed
        └── tests
            └── test_basket.py
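
The layout doesn't show what lives in `shop/client/account.py`. One plausible reading, sketched below, is a thin client through which `shop` talks to `account`: a plain function call today, with room to swap in another transport later. All names here (`User`, `AccountClient`, `account_get_user` and so on) are hypothetical, and the stand-in function replaces a real import from the `account` package so that the snippet runs on its own.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class User:
    user_id: int
    name: str


# Stand-in for a function that would live in account/users.py (hypothetical).
def account_get_user(user_id: int) -> User:
    return User(user_id=user_id, name=f"user-{user_id}")


class AccountClient(Protocol):
    """What shop needs from account, independent of transport."""

    def get_user(self, user_id: int) -> User: ...


class InProcessAccountClient:
    """Default client: a plain function call into the account package."""

    def get_user(self, user_id: int) -> User:
        return account_get_user(user_id)


def basket_owner_name(client: AccountClient, user_id: int) -> str:
    """Example shop code that depends only on the client interface."""
    return client.get_user(user_id).name


print(basket_owner_name(InProcessAccountClient(), 7))  # user-7
```

If there is ever a concrete reason to deploy `account` separately, only the client implementation needs to change. The rest of `shop` keeps calling `get_user`.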