Microservices Testing: Keys to Meeting New Test Data Requirements

    Introduction

    As organizations seek to develop modular, maintainable, and deployable software, microservices architectures have emerged as an increasingly popular approach. At the same time, testing practices have evolved to align with these architectures. However, I find that one key area in testing, test data management (TDM), has not been adequately addressed in the context of microservices testing. Microservices introduce their own specific nuances to TDM.

    We know that TDM is a significant challenge in application testing more generally. Various surveys have indicated that TDM issues are some of the leading causes of delays in testing and application delivery. Some of these challenges are more easily addressed with microservices architectures, while others are exacerbated.

    In this blog post, we will look at the specific requirements for TDM in microservices testing, and how we can address them.

    TDM for Microservices is Different than for Monolithic Applications

    In order to understand how TDM is different for microservices applications, we need to first consider how data is managed differently in these environments. In monolithic applications, data typically resides in central databases, and is shared between different application components.

    On the other hand, in microservices architectures, data is typically decentralized. Each service is autonomous and has its own data store, which is relevant to its functionality. See figure below.

    [Figure 1]

    In addition to being distributed, microservices typically employ a wide variety of data store types, each chosen for the specific needs of the microservice. These data stores can be document-based stores, SQL-based RDBMSs, NoSQL graph databases, columnar databases, and so on.

    The figure below depicts an example of how various types of databases may be employed by different microservices, depending on the context of the specific application. (At first glance, you can see why these approaches are sometimes referred to as a “spaghetti architecture.”)

    [Figure 2]

    Given that data management in microservices environments is very different from the approaches used for monolithic applications, it follows that TDM approaches need to be different too.

    The figure below represents a typical TDM setup for monolithic applications. “Gold copies” of test data are created by selecting a subset of data from production, masking it, and provisioning it to testing teams. Test data creation and maintenance can be manual, labor-intensive processes, efforts that can consume hundreds of staff hours. Often, developers manually create fake test data for their unit tests.

    [Figure 3]

    This is typically not how TDM for microservices works. The following are some of the key ways that these efforts differ in microservices environments:

    Greater Emphasis on Synthetic Test Data

    Applications built using microservices-based architectures emphasize shift-left testing, an approach in which services are tested more rigorously early in the lifecycle. For example, a team may use unit, component, and integration tests, following the test pyramid principles. Typically built by developers, these tests are lightweight, need to run fast, and need easy access to just-in-time test data. For these scenarios, creating test data synthetically is a better fit than the more traditional TDM approaches outlined above.

    Synthetic test data also allows us to define test data “as-code” so that it can easily be integrated into the continuous integration lifecycle. When developers define new services, there is no production data to leverage, so synthetic TDM capabilities are essential.
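
    As a rough illustration of test data defined “as-code,” the sketch below generates synthetic customer records from a fixed seed, so that every CI run sees the same data. The Customer fields, the name lists, and the make_customers function are assumptions made up for this example; they do not come from any particular TDM tool.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    # Hypothetical record shape, chosen only for illustration.
    customer_id: str
    name: str
    email: str
    credit_limit: int


FIRST_NAMES = ["Ana", "Ben", "Chen", "Dara", "Eli"]
LAST_NAMES = ["Ito", "Khan", "Lopez", "Meyer", "Novak"]


def make_customers(count: int, seed: int = 42) -> list[Customer]:
    """Generate `count` synthetic customers; the same seed yields the same data."""
    rng = random.Random(seed)  # local generator, so tests do not share state
    customers = []
    for i in range(count):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        customers.append(
            Customer(
                customer_id=f"cust-{i:04d}",
                name=f"{first} {last}",
                email=f"{first}.{last}.{i}@example.test".lower(),
                credit_limit=rng.randrange(500, 5001, 500),
            )
        )
    return customers


if __name__ == "__main__":
    for customer in make_customers(3):
        print(customer)
```

    Because the generator lives in the repository next to the tests, changes to the test data are reviewed and versioned like any other code change.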

    No Direct Access to Data Stores for Test Data Setup

    In monolithic application environments, we typically define test data specifications for subset creation and masking based on our knowledge of the data store's schema. In microservices environments, however, each data store's schema is private, and in production the store is accessible only to the service that owns it. Consequently, all data access requests, including those for test data setup, must go through the service's API.

    Compared to directly querying the relational data stores typically used in monolithic applications, using APIs makes it easier for us to access data across a much broader variety of test data repositories. Using APIs to access data also helps to address the test data consistency challenges that come with highly distributed test data stores (see the next section).
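
    The sketch below shows what API-only test data setup might look like. The InventoryApi contract and the in-memory service are hypothetical stand-ins; against a real service, the same three operations would be HTTP calls, and the test would still never touch the underlying store.

```python
from typing import Protocol


class InventoryApi(Protocol):
    """Hypothetical public contract of an inventory service."""

    def create_item(self, sku: str, quantity: int) -> dict: ...
    def get_item(self, sku: str) -> dict: ...
    def delete_item(self, sku: str) -> None: ...


class InMemoryInventoryService:
    """Stand-in for a real service; in practice these would be HTTP calls."""

    def __init__(self) -> None:
        self._store: dict[str, int] = {}  # private: tests never touch this

    def create_item(self, sku: str, quantity: int) -> dict:
        if quantity < 0:
            raise ValueError("quantity must be non-negative")
        self._store[sku] = quantity
        return {"sku": sku, "quantity": quantity}

    def get_item(self, sku: str) -> dict:
        return {"sku": sku, "quantity": self._store[sku]}

    def delete_item(self, sku: str) -> None:
        self._store.pop(sku, None)


def seed_inventory(api: InventoryApi, items: dict[str, int]) -> list[str]:
    """Load test data through the API and return the SKUs for later cleanup."""
    for sku, quantity in items.items():
        api.create_item(sku, quantity)
    return list(items)


def cleanup_inventory(api: InventoryApi, skus: list[str]) -> None:
    """Remove the seeded items, again using only the public API."""
    for sku in skus:
        api.delete_item(sku)
```

    A test would call seed_inventory before exercising the service and cleanup_inventory afterwards, so the same setup code works regardless of which kind of data store sits behind the API.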

    Test Data is Highly Distributed

    In monolithic application environments, gold copies of test data are typically large and generalized, rather than being specific to application components. With microservices testing, we have highly distributed test data stores that are smaller and associated with specific services. These data stores are created only for the specific services being tested. In addition, this data is matched specifically to the tests being executed against the service.

    The challenges of synchronizing such test data across multiple services, for example in an end-to-end integration test, are addressed in the same way that we address data consistency across services generally.
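
    One possible way to keep distributed test data consistent is to generate the per-service data sets together, so that they agree on shared identifiers from the start. The sketch below assumes three hypothetical services (customer, catalog, and order) and made-up field names; it illustrates the idea and is not a prescribed method.

```python
import random


def build_order_scenario(seed: int = 7) -> dict[str, list[dict]]:
    """Build small, per-service data sets that agree on shared identifiers."""
    rng = random.Random(seed)
    customers = [
        {"customer_id": f"cust-{i}", "tier": rng.choice(["basic", "gold"])}
        for i in range(3)
    ]
    products = [
        {"sku": f"sku-{i}", "price_cents": rng.randrange(100, 10_000)}
        for i in range(4)
    ]
    # Orders only reference identifiers that exist in the other two data sets.
    orders = [
        {
            "order_id": f"ord-{i}",
            "customer_id": rng.choice(customers)["customer_id"],
            "sku": rng.choice(products)["sku"],
            "quantity": rng.randint(1, 5),
        }
        for i in range(5)
    ]
    return {
        "customer-service": customers,
        "catalog-service": products,
        "order-service": orders,
    }


def find_dangling_references(scenario: dict[str, list[dict]]) -> list[str]:
    """Return the IDs of orders that point at an unknown customer or SKU."""
    customer_ids = {c["customer_id"] for c in scenario["customer-service"]}
    skus = {p["sku"] for p in scenario["catalog-service"]}
    return [
        order["order_id"]
        for order in scenario["order-service"]
        if order["customer_id"] not in customer_ids or order["sku"] not in skus
    ]
```

    Each data set would then be loaded into its own service, and a check like find_dangling_references can run before an end-to-end test to catch data that has drifted out of sync.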

    Test Data for Microservices is Closely Tied to Service Virtualization

    Service virtualization is a key capability that is required for testing microservices-based applications. Service virtualization allows us to simulate other services or endpoints that the service under test is dependent upon. Through this approach, microservices testing is not impeded by the lack of availability of the endpoint service. For microservices-based architectures, where we have services that are dependent on each other, virtual services help enable “parallel” development approaches. In this way, multiple interdependent services can be developed in parallel, without creating a deadlock in testing. The richness of a virtual service is dependent upon the richness of the data it can support. For example, a service may be limited by the range of request-response pairs it can support.

    When a service is not yet implemented, a “synthetic” virtual service may be created to stand in its place simply from its API specification (for example, a Swagger definition) and a set of sample request-response pairs. When request-response pairs are created manually, the effort is labor-intensive, and the resulting pairs rarely capture the realistic behavior of a service. As a result, generating request-response pairs using a synthetic test data generation tool provides a more elegant and scalable solution for creating synthetic virtual services.
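
    To make the idea concrete, the sketch below generates request-response pairs from a description of one operation and serves them from a lookup table. GET_ACCOUNT_SPEC is a simplified, hypothetical stand-in for an API specification (a real Swagger definition is far richer), and generate_pairs and VirtualService are illustrative names, not the API of any service virtualization product.

```python
import random

# Simplified, hypothetical description of a single operation.
# A real Swagger/OpenAPI document contains much more than this.
GET_ACCOUNT_SPEC = {
    "method": "GET",
    "path": "/accounts/{account_id}",
    "response_fields": {
        "balance_cents": ("int", 0, 500_000),
        "status": ("enum", ["active", "frozen", "closed"]),
    },
}


def generate_pairs(spec: dict, count: int, seed: int = 1) -> list[dict]:
    """Generate `count` request-response pairs that follow the field rules."""
    rng = random.Random(seed)
    pairs = []
    for i in range(count):
        account_id = f"acct-{i:03d}"
        body = {"account_id": account_id}
        for field, rule in spec["response_fields"].items():
            if rule[0] == "int":
                body[field] = rng.randint(rule[1], rule[2])
            elif rule[0] == "enum":
                body[field] = rng.choice(rule[1])
        pairs.append(
            {
                "request": {
                    "method": spec["method"],
                    "path": spec["path"].format(account_id=account_id),
                },
                "response": {"status": 200, "body": body},
            }
        )
    return pairs


class VirtualService:
    """Answers requests from a fixed table of request-response pairs."""

    def __init__(self, pairs: list[dict]) -> None:
        self._table = {
            (p["request"]["method"], p["request"]["path"]): p["response"]
            for p in pairs
        }

    def handle(self, method: str, path: str) -> dict:
        # Requests outside the generated range fall back to a 404 response.
        return self._table.get((method, path), {"status": 404, "body": {}})
```

    Raising count widens the range of requests the virtual service can answer without anyone writing pairs by hand, which is the scalability argument made above.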

    Similarly, a “real” virtual service that is created by recording service behavior will benefit from either synthetic test data or test data obtained from its production instance. More importantly, in a test environment with multiple virtual services (representing different microservice endpoints), it is important that the test data is synchronized across all the services, so that an end-to-end test can be run. Tying virtual services to a TDM system that automates data generation using specific rules is the best way to ensure such synchronization.

    Summary and Key Takeaways

    In this blog post, we have looked at the various ways TDM for microservices differs from TDM for classic monolithic applications, and at the new approaches required to address these needs. As applications become more distributed with microservices, sophisticated TDM and service virtualization approaches are needed to address the complexities that arise with such architectures.

    As your enterprise continues to use microservices architectures to modernize its legacy applications or build new applications, I would love to hear about your microservices testing experiences and best practices.