Microservices at Scale: The Rise and Importance of Contract Testing

    Microservices and Risk

    Microservices are trendy. Everyone’s using them, but are they just a passing fad, or are they here to stay?

    I believe they’re here to stay. Why? Well, one of the most compelling reasons is that they change the nature of risk in software releases. Plus, they do so in a way that is compatible with the very high speed of development and release that is required for any business that revolves around software—which is pretty much any business that plans on surviving the age of digital transformation.

    Even at relatively small scale (think tens or hundreds of services), microservices architectures and agile development practices confine the risk in any given release. Microservices help ensure that components are small, simple, well understood, and, through their encapsulation behind interfaces or contracts, isolated from the greater ecosystem.

    In theory, the practice of working with small units of change made to small units of well understood and strongly encapsulated code means two things:

    • The change itself can be easily and quickly acceptance tested.
    • System testing is unnecessary for this (or indeed any) release, because the risk of software change is fully contained in the component.

    The first of these statements is usually pretty much true. The second statement bears further analysis though. What do we mean by system testing here? Well, there are these options:

    • E2E functional smoke test—happy paths, most important functions are not broken.
    • Full functional regression tests—happy and negative paths, nothing is broken.
    • Component interoperability robustness tests—components are well behaved citizens of the service ecosystem—they don’t break their consumers when they are upgraded, they handle properly all the ways in which the components they depend on may respond to their requests, and they are resilient to those dependencies changing in unpredictable and badly behaved ways, failing gracefully if and when they need to fail.

    The first piece can and should be done in the CD pipeline before, during, or after deployment. It can also be done continuously in the form of API monitoring or shift-right testing.

    The second piece is an effort that strong functional encapsulation makes unnecessary at the system level. It is necessary to run full regression, including negative tests, at the component level, and this should be done in a fully automated way in CI on every PR. But it is not necessary to do this level of testing at the service mesh level, downstream of component change acceptance. The responsibility for this can and should be fully left-shifted.

    The third piece is the interesting one. I call it component interoperability robustness testing, because that is what it is. Contract testing is starting to arise for this purpose.

    What is Contract Testing?

    The intent of contract testing is to test that a release of a component does not break the contract between the component and its consumers, that is, that it is safe to release into the mesh. I go a little further and add that the component should also be resilient to its downstream dependencies changing the contract on it without notice. In other words, it should not assume that everyone else is as well behaved as it is. (If you learn nothing else from driving a car in California, it should be this.)

    Also, you aren’t really testing the contract. What you are testing is that the component does not break the contract and that the component is resilient if it encounters fellow citizens on the mesh that are less law-abiding with regards to the contracts they have with you, much in the same way that civil society as a whole is resilient to occasional acts of criminality in its midst. Of course, components can change the contract that they implement, but well-behaved components do it in a way that ensures backward compatibility for all consumers that may not yet have been upgraded to understand the new terms of the updated contract.
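
    To make "backward compatible" a little more concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that a contract's response can be described as a flat mapping of field name to type name; the function name breaking_changes and the sample fields are hypothetical and not taken from any particular tool. Under that assumption, adding a field is safe for existing consumers, while removing a field or changing its type is not.

```python
def breaking_changes(old_fields: dict[str, str], new_fields: dict[str, str]) -> list[str]:
    """Return the reasons the new response shape would break consumers of the old one."""
    problems = []
    for name, old_type in old_fields.items():
        if name not in new_fields:
            problems.append(f"field '{name}' was removed")
        elif new_fields[name] != old_type:
            problems.append(
                f"field '{name}' changed type from {old_type} to {new_fields[name]}"
            )
    # Fields that exist only in new_fields are additive and are not reported.
    return problems


# Hypothetical versions of one response shape.
v1 = {"id": "string", "status": "string"}
v2_additive = {"id": "string", "status": "string", "eta": "string"}
v2_breaking = {"id": "int", "eta": "string"}

print(breaking_changes(v1, v2_additive))
print(breaking_changes(v1, v2_breaking))
```

    Real contracts also cover requests, error responses, and nested structures, so a check like this is only a starting point.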

    What Does Contract Testing Test For?

    These tests verify that a component:

    • Does not break contract with its consumers. If the contract evolved, that’s fine, but it must not break consumers who are unaware of the change.
    • Fully and completely handles all possible contractually defined responses that may come back from services that it sends requests to.
    • Does not break if its downstream dependencies change their contract, regardless of whether they do it in a well-behaved way or not. It is either resilient to unexpected responses or fails gracefully. It does not roll over and stick its feet in the air.

    The question then is how and when to do this testing. You have three choices for this:

    1. Left-shifted component interoperability testing. The responsibility for ensuring that components released into the service mesh are well behaved with regards to their interoperability with other components rests with the agile team producing the component.
    2. Traditional system and integration testing. The responsibility for ensuring the application or service mesh as a whole is working lives with a team downstream of the component-producing agile team.
    3. Make like an ostrich. Pretend there is no problem and don’t make anyone own this.

    Option 3 is particularly popular (but ill-advised). Option 2 is possible at low scale, as it was with monolith-based apps. But at high scale, it is simply impossible. At high scale, you potentially have thousands or tens of thousands of services in the mesh constantly changing. You cannot test the interoperability of that many things that won’t stand still long enough to be tested.

    Option 1 is the only rational approach that scales.

    A Closer Look at Contract Testing

    Let’s say I have a simple mesh of services that interoperate with each other in the way shown below. We’ll concentrate on A, C, and D for the discussion, but know that A is not the only consumer of C’s contract, and D is not C’s only downstream dependency.

    Now, some of the discussion around contract testing is line-centric: it tries to test the lines on this picture. In my view, this isn’t the right approach because you don’t actually release lines. You release boxes. So you really need to test boxes and the fact that they are well behaved with regard to the lines that are relevant to them. Hence I take a box-centric approach to this.


    Now, A, C, and D are owned and maintained by three separate teams, helpfully named team A, team C, and team D. They are good Agile teams that release new versions of their little components at a frequency that would have been unheard of in the bygone age of the monolith. However, some teams release more frequently than others. They don’t release on the same dates or frequency. Team A releases every two weeks religiously. Team C releases every week equally religiously and is threatening a religious war with team A. Team D releases whenever it feels it has something ready to be released, which can be several times a day at times, and then nothing for a month. You never know with them. 


    Their release trains look something like the diagram above. The ovals represent releases. The upward arrows represent testing of their upstream interoperation with their consumers (verifying that they are not breaking their contract). The downwards arrows represent the testing of their downstream interoperation with their dependencies (validating that they handle all the conditions that may arise under the relevant contracts fully, and that they are resilient to those dependencies breaking contract).

    The upward arrows are API tests. Team C needs API tests that fully exercise the contract that component C implements and check that it is implemented properly. This is not just happy path testing; it includes negative testing, of course. Part of the contract is how C handles badly formed requests of various natures. Further, if team C knows how its consumers (A and B) actually call it, that is, which bits of the interface it exposes are actually used by its consumers, then it can clearly prioritize those use cases above others that are less or never used. If team C doesn’t know this, it has to fully regression test the interface with API tests.
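
    As a rough illustration of what team C's upward-arrow tests might look like, here is a small sketch using Python's unittest. Everything specific in it is an assumption made for the example: the handle_get_order function stands in for component C's real endpoint, and the status codes and field names are a hypothetical contract. In practice these tests would call the deployed API rather than an in-process function.

```python
import unittest

ORDERS = {"42": "shipped"}  # hypothetical data held by component C


def handle_get_order(request: dict) -> tuple[int, dict]:
    """Hypothetical stand-in for component C's 'get order' endpoint."""
    order_id = request.get("order_id")
    if not isinstance(order_id, str) or not order_id.isdigit():
        return 400, {"error": "order_id must be a string of digits"}
    if order_id not in ORDERS:
        return 404, {"error": "order not found"}
    return 200, {"id": order_id, "status": ORDERS[order_id]}


class ContractTestsForC(unittest.TestCase):
    def test_happy_path(self):
        self.assertEqual(
            handle_get_order({"order_id": "42"}),
            (200, {"id": "42", "status": "shipped"}),
        )

    def test_unknown_order_is_reported_as_not_found(self):
        code, body = handle_get_order({"order_id": "7"})
        self.assertEqual(code, 404)
        self.assertIn("error", body)

    def test_badly_formed_requests_are_rejected(self):
        # Negative tests: how C handles bad input is part of the contract too.
        for bad in ({}, {"order_id": 42}, {"order_id": "abc"}, {"order_id": ""}):
            with self.subTest(request=bad):
                code, body = handle_get_order(bad)
                self.assertEqual(code, 400)
                self.assertIn("error", body)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ContractTestsForC)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```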

    The downward arrows are unit tests wrapped around virtual services. They have to be virtual services, not real services. Why? Because real services are hard enough to manipulate into providing all of their possible responses based on the contract, and they definitely don’t misbehave on demand in ways that violate contract. To really test downstream interoperability robustness you must be able to simulate your downstream dependencies acting in all the ways defined in the contract, and some that are outside of it. So team A needs a virtual service that can pretend to be component C and be manipulated to respond in whatever way is needed for the interoperability tests they need to run.
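
    A minimal sketch of the idea, in Python, might look like the following. The VirtualC class, the order_summary function (a made-up piece of component A), and the response shapes are all hypothetical; the point is only that a virtual service can be scripted to return in-contract responses, out-of-contract responses, and outright failures on demand, which a real instance of C cannot easily be made to do.

```python
from collections import deque


class VirtualC:
    """Hypothetical virtual service that plays back scripted responses in place of C."""

    def __init__(self, scripted):
        self._scripted = deque(scripted)
        self.requests = []  # lets a test inspect what A actually sent

    def get_order(self, order_id):
        self.requests.append(order_id)
        response = self._scripted.popleft()
        if isinstance(response, Exception):
            raise response  # simulate a timeout or dropped connection
        return response


def order_summary(c_client, order_id: str) -> str:
    """Hypothetical piece of component A that depends on C."""
    try:
        code, body = c_client.get_order(order_id)
    except (TimeoutError, ConnectionError):
        return "status temporarily unavailable"
    if code == 200 and isinstance(body, dict) and isinstance(body.get("status"), str):
        return f"order {order_id}: {body['status']}"
    if code == 404:
        return f"order {order_id}: not found"
    # Anything else is outside the assumed contract: degrade, don't crash.
    return "status temporarily unavailable"


SCENARIOS = [
    ((200, {"status": "shipped"}), "order 42: shipped"),              # in contract
    ((404, {"error": "order not found"}), "order 42: not found"),     # in contract
    ((200, {"state": "shipped"}), "status temporarily unavailable"),  # field renamed
    ((200, "<html>oops</html>"), "status temporarily unavailable"),   # wrong body type
    ((500, {}), "status temporarily unavailable"),                    # undocumented status
    (TimeoutError("C did not answer"), "status temporarily unavailable"),
]

for scripted, expected in SCENARIOS:
    actual = order_summary(VirtualC([scripted]), "42")
    assert actual == expected, (scripted, actual)
```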

    Getting Smart About Contract Testing

    The observant will note that for any consumer-producer pair in the mesh (let’s stick with the A-C pair for now), the transactions that flow between them define the “de facto contract.” If you know those, you can use them to generate both:

    • The API tests for C to use to test its upstream interoperation whenever the team releases a new version of C.
    • The virtual services for A to use to test its handling of the contract with C whenever a new version of A is released.


    In addition, as long as all these teams do this, the relevant contracts will be continuously tested from both ends, across time.

    With a shared transaction repository, populated by recording the interactions in test or production or derived from API specs, the tests for the up arrows and the virtual services for the down arrows can be generated automatically.
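
    To show the shape of that idea, here is a small Python sketch that derives both artifacts from one list of recorded transactions. The recording format, the function names, and the 501 fallback for unrecorded requests are all assumptions made for this example; real tooling would need to deal with things this sketch ignores, such as timestamps, generated IDs, and request values that cannot be hashed.

```python
# Hypothetical recorded A-to-C transactions: (request, (status_code, body)).
RECORDED = [
    ({"order_id": "42"}, (200, {"id": "42", "status": "shipped"})),
    ({"order_id": "7"}, (404, {"error": "order not found"})),
]


def _key(request: dict):
    """Order-independent lookup key; assumes request values are hashable."""
    return tuple(sorted(request.items()))


def make_api_checks(transactions):
    """Producer side (up arrows): one replay check per recorded transaction."""

    def make_check(request, expected):
        def check(handler) -> bool:
            return handler(request) == expected

        return check

    return [make_check(request, response) for request, response in transactions]


def make_virtual_service(transactions):
    """Consumer side (down arrows): answers recorded requests with recorded responses."""
    table = {_key(request): response for request, response in transactions}

    def virtual_service(request: dict):
        # The 501 fallback is an arbitrary choice for requests never recorded.
        return table.get(_key(request), (501, {"error": "no recording for this request"}))

    return virtual_service


checks = make_api_checks(RECORDED)
virtual_c = make_virtual_service(RECORDED)

# Team C would run the checks against a new build of C; here they are run
# against the virtual service only to show that both sides share one source.
print(all(check(virtual_c) for check in checks))
```

    Comparing whole responses for equality, as this does, is brittle; a more realistic generator would compare only the fields that consumers are known to rely on.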

    Scale is the Kicker

    For any given service pair, this is all pretty cool. But if you take it to the real world, where the service dependency graph is orders of magnitude more complex than this example and ever changing, it’s not a case of being cool—it’s basic hygiene. It is a critical competency to have if you want to effectively manage risk in a scaled microservices-based ecosystem.

    And if you want to do it efficiently, consistently, and collaboratively across many engineering teams in the organization, you will absolutely need tooling to manage the transactions and contracts, and automatically generate component interoperability tests and virtual services to support this type of testing.