2023-05-24
Maniple: a subdivision of a Roman legion. From manipulus: bundle, handful.
Discussions of system design often focus on one of two architectures:
Trad Monolith - one service, handling entire requests vertically via function calls.
Modern Classic Microservices - many services, running on independent machines (or VMs or Pods), talking over the network (usually REST or some message bus).
This neat lumping together of ideas (think left-wing/right-wing) isn't particularly helpful when it comes to solving our problems effectively - we're free to pick and choose ideas from each approach.
With that in mind, this document aims to decompose these approaches into two (somewhat) orthogonal dimensions:
Aims - what are the actual development/production aims we're trying to achieve?
Methods - How to structure the code? How should components of the system communicate? etc.
It then proposes a third approach - Manipleservices - with the target of maximising our aims, in commonly encountered software environments, given our available methods. There is an accompanying skeleton Python + CircleCI repo - variations on the ideas in this repo can be applied to a wide variety of codebases. The main takeaway from this document is just a restatement of YAGNI:
Start your architecture as simply as possible. As your system grows, work out what aims you have, and pick methods that best reach those aims. Don't choose methods based on ideologies, that solve problems you may never have, often at significant cost.
To be more generic, service-like units of the system will be referred to as "components" instead of "services".
Regardless of the methods we choose, we tend to hold the same aims. Some of them conflict in practice, and their costs/benefits need to be weighed against one another.
Reliability - The system should keep working for the end user.
Database
Isolation - Frequent, difficult-to-avoid incidents are often caused by schema migrations, query planner changes etc. We would like these to be isolated.
Performance (assuming a conventional db) - After we've reached the limits of vertical scaling, we would like to be able to split the data to achieve some level of horizontal scaling. For most companies, application-code performance is irrelevant as apps scale easily and they're cheap to run.
Consistency - the ACID-ier the better.
Performance - The system should run quickly for the end user, without being too pricey for the company running it.
CI - The CI should run quickly, both running tests and deploying components.
Developer Ergonomics - Developers would like to be able to CMD-click things, they'd like their tests to run fast, they'd like to have confidence their one-line change doesn't break something miles away, they'd like well-typed data to work with.
Testing - Testing any slice of the system - from unit tests through to cross component tests - should be easy.
Organisational - Management want to have specific teams responsible for specific bits of the code, both during development and running in prod.
Debugging - It should be easy to isolate the cause of production problems and recreate them locally. If something starts breaking, it should be easy to say - "change X caused this to start breaking". It would be nice to be able to run two versions of a component in production to see which one doesn't work.
Splitting into more small components (as opposed to fewer large components) simply magnifies the effects of the other methods below.
Monoliths tend to very loosely specify "this code depends on this other code", often via language-level imports. In the wild, microservices also tend to loosely specify dependencies by enumerating service URLs in some config file.
An explicit dependency graph has many advantages; it can be achieved with:
Language-level packages (eg. Installable Python packages + lists of requirements).
File-level build system (eg. Bazel).
An explicit dependency graph enables:
Quicker CI - only build/test/deploy the things that changed, or whose dependencies changed.
More comprehensive testing - we know if we change component A, we should run the integration tests covering {A, B, C}.
Better developer ergonomics - tests run quicker, the section of system to bear in mind while working on a task is reduced.
Note: the aims achieved by using this method are mostly independent from those achieved using the other methods.
The most common ways components communicate are:
Function calls within a single process.
Synchronous network calls (REST, gRPC and friends).
Message buses.
Using function calls means it's difficult to achieve deployment independence (see below). However, there are numerous downsides to picking any of the other options with regard to developer ergonomics:
Typing tends to be rubbish without significant investment in wacky (often codegen-based) tooling - see gRPC and friends.
CMD-clickability goes out the window, back to grep.
There are also various performance costs to constantly serialising and flinging data over the wire.
Using message buses introduces further deployment independence - if one service is down, the caller can still give it work to do later. However, there are often large debugging costs - where did that message originally come from?
We refer to the ability to deploy different components to different types of machines, and to scale parts of the system independently, as deployment independence. Deployment independence often comes together with version independence (see below), but it's possible to have one without the other.
Having things deployed separately can help organisationally, for example, different teams can be billed according to their resource usage. Occasionally, different types of workers might be more suitable for different types of workloads, for example, there might be some performance reason to serve user requests by some magic cloud worker near to the user.
The case that deployment independence in and of itself enables scaling is somewhat overstated.
Consider an inbound request that hits component A, which in turn talks to component B. If B takes 98% of the CPU, there's a temptation to make B a separately deployed component - then you can scale up the number of machines accordingly, right? This only achieves anything meaningful where the work done by B is parallelisable for each original call to A; if that's not the case, you may as well deploy A + B on every machine and scale them together.
This is the ability to have different versions of components running in prod together.
An important use case is the ability to, eg, deploy v1.3.7 of your payments service alongside v1.3.6, directing 10% of traffic to the former and 90% to the latter. This should lead to increased reliability, as you can test new software on a subset of users and automatically roll back if some threshold of errors is crossed.
If you wish to test pre-prod, there are significant testing tooling problems to overcome - are you going to test against the cross-product of possible interacting services? Some small developer ergonomics friction is also introduced - if I CMD-click, do I hit the version of the function that is deployed and causing bugs? Similar pain is introduced whilst debugging. Finally, there is often significant tooling overhead in implementing versioning and deploying versions of packages (see the example below).
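As a rough illustration of the traffic splitting this enables, here is a minimal sketch in Python. The handler names and the in-process routing table are hypothetical - in practice the split would usually live in a load balancer or service mesh rather than application code.

```python
import random
from typing import Callable

# Hypothetical handlers standing in for two deployed versions of the
# payments component.
def handle_payment_v1_3_6(request: dict) -> dict:
    return {"handled_by": "v1.3.6"}

def handle_payment_v1_3_7(request: dict) -> dict:
    return {"handled_by": "v1.3.7"}

# 90% of traffic to the known-good version, 10% to the candidate.
# A real rollout would also watch error rates and shift the weights
# back automatically if a threshold is crossed.
WEIGHTED_ROUTES: list[tuple[float, Callable[[dict], dict]]] = [
    (0.9, handle_payment_v1_3_6),
    (0.1, handle_payment_v1_3_7),
]

def route(request: dict) -> dict:
    roll = random.random()
    cumulative = 0.0
    for weight, handler in WEIGHTED_ROUTES:
        cumulative += weight
        if roll < cumulative:
            return handler(request)
    return WEIGHTED_ROUTES[-1][1](request)
```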
Choosing many-repos vs monorepo mostly impacts developer ergonomics and CI.
One downside of many-repos is duplication of component-level boilerplate (think CircleCI config etc). The flipside with monorepos is that tooling like GitHub/CircleCI can get a bit overwhelming; it should be possible to overcome this with further tooling, though.
The bigger downside with many-repos is that they induce code rot. In a monorepo, if there is a small cleanup to do with a bit of library code, you just do it - one change can touch every usage. With many-repos, the same cleanup has to be rolled out repo by repo, so it often never happens.
In theory, the decision to have one/many-repos should be independent of version independence, in practice, one tends to follow the other.
Aside: it should at least be feasible to not have one follow the other; consider these scenarios:
Many-repos where one repo always depends on the HEAD of another.
A single repo, where one Python package depends on another package at a different commit within the same repo.
It's also possible to imagine some funky setup with one master repo (perhaps in charge of deployment and integration testing) that contains one Git submodule for each component.
Choosing to have a mix of languages impacts the feasibility of various other methods:
Communication - no more language-level function calls.
(Aside: how difficult would it be to implement, eg, a Python call to a Node process to server-side render some React?)
Dependency Specification - no more utilising language-level package tooling.
In practice, there will often be eg. Python and TypeScript living alongside each other. It might be advantageous to mix and match different kinds of components and methods in this case.
Sometimes a component has many databases (maybe a separate time series db for a particular domain). The consensus is that the opposite - having many components share one database - is a no-go: in that case the coupling is such that you only truly have one component.
More databases means:
More isolation.
The potential for better performance.
The potential for worse performance by being forced into doing cross-database JOINs at the application level (see the sketch after this list).
Worse consistency - cross database transactions are hard to do.
Worse debugging - the number of things you can JOIN across while debugging is reduced. This is somewhat surmountable with data-lake-y things.
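To make the application-level JOIN point concrete, here is a minimal sketch, assuming two separate connections (eg. psycopg-style, with dict-row cursors) to two separate databases; the table and column names are made up.

```python
from typing import Any

# Hypothetical: `accounts_db` and `shop_db` are two separate connections to
# two separate databases, so no SQL-level JOIN between users and orders is
# possible.

def orders_by_email(accounts_db, shop_db) -> dict[str, list[dict[str, Any]]]:
    users = accounts_db.execute("SELECT id, email FROM users").fetchall()
    user_ids = [u["id"] for u in users]

    # The JOIN we'd normally write in SQL now happens in application code.
    orders = shop_db.execute(
        "SELECT user_id, total FROM orders WHERE user_id = ANY(%s)",
        (user_ids,),
    ).fetchall()

    email_by_id = {u["id"]: u["email"] for u in users}
    grouped: dict[str, list[dict[str, Any]]] = {}
    for order in orders:
        grouped.setdefault(email_by_id[order["user_id"]], []).append(order)
    return grouped
```

This is also where cross-component N+1 patterns tend to creep in, if the second query ends up inside a loop rather than batched as above.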
Example Python + CircleCI repo.
Manipleservices is a set of methods picked to maximise our aims within commonly seen, medium-sized codebases (that is to say, almost all codebases). The following table summarises the choices made, comparing them to the two dominating approaches:
Method | Trad Monolith | Modern Classic Microservices | Manipleservices |
---|---|---|---|
Splitting Into Components | One component | Many components | Handful of components |
Dependency Specification | Loose via imports | Not really | Explicit at the language's package level (eg. in pyproject.toml files) |
Communication | Function calls | REST | Mostly function calls |
Deployment Independence | None | Loads | Nope |
Version Independence | Nope | Yup | Mostly not |
Separate Repos | Nope | Yup | Nope |
Language Independence | Not really | Yup | Mostly not |
Database Independence | One DB | One DB per component | One or many DBs per component |
Manipleservices adopts further deployment independence/communication methods only when there are concrete reasons to do so.
Why did we pick these methods?
Splitting Into Components - We pick a handful - it's all in the name, a maniple.
Dependency Specification - We're explicit so we can always test/deploy everything that we need to, but nothing more.
Communication - Until there's some RPC library with excellent typing and good CMD-clickability, function calls are a vastly superior dev experience. The benefits of sending things over the wire aren't big enough for most medium codebases.
Deployment Independence - The benefits of doing this are only really present with very large codebases/organisations. It's worth restating that this is largely orthogonal to version independence.
Version Independence - This might be the first thing you'd change when looking from a reliability perspective - being able to A/B test core features is good. However, until you actually see incidents where the root fix would have been to do this, the complexity introduced in testing and developer tooling probably isn't worth it.
Separate Repos - The GitHub (or equivalent) tooling needs to improve for monorepos, and getting the CI right is tricky, but code enters long-term decline in many-repos.
Language Independence - There don't seem to be many projects that try to get cross-language typing/in-process communication working well. For this reason, combined with the choice of function-calls-by-default, we try to keep the number of languages in use low.
Database Independence - A handful of DBs makes it largely possible to avoid cross-component N+1 patterns, while still letting us tune specific databases for specific performance characteristics.
Components are installable packages. (See also You Want Modules, Not Microservices).
A dependency tree of the entire system is encoded in the pyproject.toml files via -e installs.
CI builds continuation jobs based on which component(s) changed, plus anything that depends on them.
We only call functions from <other-package>.api. This is by convention, but should probably be linted (and linted such that only "dumb data" can cross the interface).
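Such a lint doesn't exist in the skeleton repo; a rough sketch of what it could look like, using only the standard library and the two package names from the example layout below, might be:

```python
# Sketch: flag any cross-package import that doesn't go through `<package>.api`.
# The package set is hard-coded from the example repo; a real version would
# discover it from packages/ and also allow `from account import api`.
import ast
import pathlib
import sys

PACKAGES = {"account", "shop"}

def violations(path: pathlib.Path, own_package: str) -> list[str]:
    bad = []
    for node in ast.walk(ast.parse(path.read_text())):
        if isinstance(node, ast.ImportFrom) and node.module:
            modules = [node.module]
        elif isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        else:
            continue
        for module in modules:
            parts = module.split(".")
            if parts[0] in PACKAGES and parts[0] != own_package:
                if len(parts) < 2 or parts[1] != "api":
                    bad.append(f"{path}:{node.lineno}: import of {module}")
    return bad

if __name__ == "__main__":
    errors = []
    for package in PACKAGES:
        for file in pathlib.Path(f"packages/{package}/src").rglob("*.py"):
            errors.extend(violations(file, package))
    print("\n".join(errors))
    sys.exit(1 if errors else 0)
```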
Note that we don't import any types from <other-package> - we make our own types:
When we call <other-package>.api.f(a), a only has to conform to a Protocol, not a nominal type.
From the return value of <other-package>.api.f(...), we can just choose the bits we need - we don't have to conform to the whole return type.
mypy will check that everything lines up.
The aim of the above is that if we, for some reason, wanted to switch inter-package communication from function calls to eg. REST, it should be trivial.
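A self-contained sketch of that typing convention follows. The account function and types here are hypothetical - in the real layout they would live under packages/account/src/account/api, and the calling side under packages/shop/src/shop.

```python
from dataclasses import dataclass
from typing import Protocol, TypedDict

# --- Roughly what account.api.get might expose (hypothetical signature) ---

class HasUserId(Protocol):
    """Callers only need to supply something with a user_id attribute."""
    user_id: int

class UserRecord(TypedDict):
    id: int
    email: str
    created_at: str

def user(ref: HasUserId) -> UserRecord:
    # In the real package this would query account's own database.
    return {"id": ref.user_id, "email": "someone@example.com", "created_at": "2023-05-24"}

# --- The calling side, as shop might write it ---
# shop never imports HasUserId or UserRecord: its own BasketOwner satisfies
# the Protocol structurally, and it only reads the one field it needs.

@dataclass
class BasketOwner:
    user_id: int
    basket_id: int

def owner_email(owner: BasketOwner) -> str:
    # In the repo this would be account.api.get.user(owner).
    return user(owner)["email"]

print(owner_email(BasketOwner(user_id=1, basket_id=7)))
```

Because shop only depends on the shape of the data on both sides of the call, swapping the function call for, eg, a REST call later should mostly affect shop's client module.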
In .circleci/continue.py, changed_packages(...) could be any function that returns a list of packages that you think have changed. Ditto with add_dependant_packages(...) - this would normally add any packages that depend on the ones that have changed.
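A minimal sketch of what those two functions might look like - the git-diff heuristic and the hard-coded reverse-dependency map are assumptions, and the real repo may derive them differently:

```python
# .circleci/continue.py -- sketch of the two hooks described above.
import subprocess

# Reverse dependency map: package -> packages that depend on it.
DEPENDANTS = {
    "account": {"shop", "shop_integration_tests"},
    "shop": {"shop_integration_tests"},
}

def changed_packages(base: str = "origin/main") -> set[str]:
    """Packages whose files differ from the base branch."""
    diff = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return {
        path.split("/")[1]
        for path in diff
        if path.startswith("packages/")
    }

def add_dependant_packages(packages: set[str]) -> set[str]:
    """Add every package that (transitively) depends on a changed one."""
    result = set(packages)
    frontier = set(packages)
    while frontier:
        frontier = {
            dep
            for pkg in frontier
            for dep in DEPENDANTS.get(pkg, set())
        } - result
        result |= frontier
    return result

if __name__ == "__main__":
    print(sorted(add_dependant_packages(changed_packages())))
```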
As well as packages that are "service-like", you'd have eg. packages/shop_integration_tests that might depend on account and shop and run tests that depend on both of them.
To complement the above, we might want a convention that, eg, shop monkeypatches any methods of account.api that it uses during testing.
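A sketch of what that could look like in packages/shop/tests/test_basket.py, assuming the hypothetical account.api.get.user signature and shop.basket module from the sketches above:

```python
import pytest

from account.api import get as account_get
from shop import basket  # hypothetical: exposes BasketOwner and receipt()


@pytest.fixture
def fake_account_api(monkeypatch):
    # Replace the account boundary rather than standing up account's database.
    def fake_user(ref):
        return {"id": ref.user_id, "email": "test@example.com", "created_at": "2023-01-01"}

    monkeypatch.setattr(account_get, "user", fake_user)


def test_receipt_includes_owner_email(fake_account_api):
    owner = basket.BasketOwner(user_id=1, basket_id=7)
    assert "test@example.com" in basket.receipt(owner)
```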
```
├── README.md
└── packages
    ├── account
    │   ├── pyproject.toml
    │   ├── src
    │   │   └── account
    │   │       ├── __init__.py
    │   │       ├── api
    │   │       │   ├── get.py
    │   │       │   └── post.py
    │   │       ├── db.py
    │   │       ├── py.typed
    │   │       └── users.py
    │   └── tests
    │       └── test_user.py
    └── shop
        ├── pyproject.toml
        ├── src
        │   └── shop
        │       ├── __init__.py
        │       ├── api
        │       │   ├── get.py
        │       │   └── post.py
        │       ├── basket.py
        │       ├── client
        │       │   └── account.py
        │       └── py.typed
        └── tests
            └── test_basket.py
```