As the novel coronavirus (COVID-19) pandemic spread across the globe in the past couple of months, we’ve been learning a lot about the disease and attempts to control its impact.
One of the subjects that almost always comes up in any discussion on COVID-19, especially in the USA, where I live, is “testing”. This had me intrigued as a DevOps (and software testing) professional. While I am not an expert in healthcare, and while COVID-19 testing and its implications are quite different from those of software testing, I quickly began to realize there are some similarities between the two.
This blog attempts to capture some of the parallels I have noticed between the two testing domains, and what lessons we can learn from each that may be applicable to the other.
The landscape around COVID-19 (and its testing) is changing rapidly on a day-to-day basis. This article attempts to capture the facts that reflect the current reality. Given the pace of change, I recognize that some of the COVID-related facts in this article may become irrelevant in the near future. However, my hope is that some of the principles and lessons learned may still be applicable.
I will organize my observations and thoughts into the following subsections.
In almost all discussions on COVID-19, we heard that testing is of paramount importance to understand where we stand in terms of quality of public health. The testing data (and the number of positive cases identified) are also vital inputs into planning, containment, and remediation processes. I heard experts say that they need to do more and more COVID-19 testing to improve their understanding of the situation and plan accordingly. The following screenshot from the telecast of New York Governor Andrew Cuomo giving an update crisply summarizes how important testing is.
The same is true for software of course. Testing is the leading process for identifying defects and flaws in the software. In an age of agile delivery, there is a need to balance speed with quality—especially given the significant impact quality can have on business outcomes.
While it is clear why testing is important for both domains, it is important for us to distinguish the differences in the implications of testing. In both cases, the importance of testing is to foster quality as the outcome, either in terms of the quality of public health or the quality of software.
In the case of COVID-19 testing, that outcome (quality of public health) is the minimization of the number of infections and most importantly the number of fatalities. The way to do that is to contain the spread of the disease. Since the infection spreads so fast, it is imperative to identify infected subjects as early as possible so that containment measures can be put in place.
In the case of software testing, similarly, the outcome (quality of software) is the minimization of the number of serious defects leaked to production and the improvement of the reliability and UX/CX of the software/service. And we all know that in software testing early detection and containment of defects is very important, since the cost of remediation keeps increasing the later the defect is detected.
In both cases, then, we see that early detection and containment are important. However, the role of quality assurance, of which testing is an integral part, is not merely to detect defects but to assure overall quality. We need to evaluate our testing efforts in the context of this bigger picture.
We all heard, especially in the USA, that there were a lot of challenges with COVID testing. Not enough tests were available early on (see Figure below), the process was cumbersome (and uncomfortable for the subjects being tested), and results took a long time (initially up to a week or more). We also heard that many tests were unreliable and returned false positives.
All of these factors had significant implications on the quality of public health given the fact that the virus propagated rapidly, and every impediment in testing posed downstream challenges in the containment processes. To quote an expert from this article: “The testing fiasco was the original sin of America’s pandemic failure, the single flaw that undermined every other countermeasure.”
Despite the fact that software testing is an old discipline, surprisingly, we find the same challenges. Until the recent evolution of continuous testing, software testing has tended to lag behind development. Further, tests take a lot of time to develop and maintain, a significant amount of the testing is still manual (and takes a long time to execute, which has an impact on the feedback time to development), and test results are sometimes flaky (that is, they do not produce consistent results).
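To make the idea of a flaky test concrete, here is a minimal sketch in Python. It assumes a test is any callable that raises AssertionError on failure; the names classify and sometimes_fails are hypothetical and purely illustrative. The sketch simply reruns the test against unchanged code and flags it as flaky if the outcomes disagree.

```python
import itertools


def classify(test_fn, runs=5):
    """Rerun a test several times and report whether its outcome is consistent."""
    outcomes = set()
    for _ in range(runs):
        try:
            test_fn()
            outcomes.add("pass")
        except AssertionError:
            outcomes.add("fail")
    if outcomes == {"pass"}:
        return "stable-pass"
    if outcomes == {"fail"}:
        return "stable-fail"
    return "flaky"  # both passes and failures were observed


_calls = itertools.count()


def sometimes_fails():
    # Simulates a hidden dependency on order or timing: fails on every third call.
    assert next(_calls) % 3 != 0


def always_passes():
    assert 1 + 1 == 2


flaky_verdict = classify(sometimes_fails)
stable_verdict = classify(always_passes)
```

Real test runners offer rerun options that work along similar lines; the point here is only that inconsistent results on identical code are what distinguish a flaky test from a genuinely failing one.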
With continuous testing came techniques like shift-left testing and test driven development (TDD), which enabled tests to become available before (or as soon as) software was developed, so that there was no lag in the availability of tests.
When COVID-19 tests were finally available, the initial tests took up to a week to deliver results. Very soon, pharmaceutical companies started to create better, automated tests that could deliver results faster. As of the time of this writing, new tests produced test results in about 5 to 15 minutes. This agility in testing made it easier to plan for counter measures to tackle the aggressive spread of the virus.
Not surprisingly, automation in software testing has been key in enabling agile delivery. It enables fast feedback time to development teams (for example after a commit or build) and enables them to take action quickly to address the problem. The longer the feedback time, the greater the loss of developer productivity and the higher the cost of fixing the problem.
Even with faster, automated COVID-19 tests, we saw that there were other impediments. Testing sites were few, tests needed to be sent to a remote lab, there were long lines (which negated the purpose of “social distancing”) and limited supplies (resulting in people leaving untested). We also heard about challenges related to test results reporting as more and more private facilities started to offer the tests.
We see the same pattern with software testing. Test automation simply reduces the time it takes to execute the tests. However, to do efficient testing, we also need to create proper test environments correctly and quickly, populate the environments with appropriate test data, and remove other impediments, such as dependencies on other applications that were not readily available for testing (using techniques like service virtualization). Similarly, it is just as important to automate the process to provision the tests and capture the results.
As a result of the impediments noted above, officials started to set up more convenient and accessible locations for COVID-19 testing, such as drive-thru testing sites, and even mobile testing facilities. Some testing kits are self-service, meaning health-care workers don’t need to intervene.
We see the same scenarios apply to software testing as well. In traditional testing models, testing COEs existed to develop and execute tests. This acted as a bottleneck from a resource and agility perspective. Testing jobs were queued (just like the queue for COVID-19 tests), and feedback time was slow. In modern continuous testing, testing is democratized. Developers and testers work collaboratively on testing and test assets. Regardless of who builds them, tests are available to be run at any time as part of the CI/CD process. For example, automated tests developed by testers can be executed by developers or by the CI engine as part of the build verification process.
Initially, COVID-19 testing involved analyzing throat swabs or sputum samples. This works only for patients who are currently infected (i.e. “active infection”) with the virus. Later testing evolved to testing for specific antibodies in the blood, which not only identified subjects that are currently infected, but also identified subjects that have already had (and maybe subsequently recovered from) the virus. The latter techniques obviously provide a better picture of the total population infected by the virus. These approaches also provide additional benefits; for example, they’re now being extended to investigate the development of COVID-19 vaccines. This is helping to step up efforts in disease prevention and in bringing the pandemic under control.
Similarly, software testing has also evolved from traditional approaches of running tests on software to using novel AI and machine learning techniques to identify whether defects are hiding in the software. The latter techniques pick up possible defects by analyzing other types of data (e.g. application logs, incident logs), without running any actual tests. Similarly, AIOps techniques use AI and machine learning not only to reduce the mean time to detect, but also to provide extensive root cause analysis information that helps us prevent the problem from recurring.
Since many people infected with the virus are asymptomatic, it is considered ideal to be able to test everybody. However, this is impractical due to all the testing challenges (as well as cost considerations) mentioned above. Hence, techniques are evolving to better monitor both the spread of the disease overall (for example by using web-connected thermometers) and high-risk subjects. We also see the use of data from cellphones to track the spread of the disease. This allows health officials to take both proactive and responsive actions without having to test the population extensively.
We see a very similar approach in the “shift-right” of software testing as well. Monitoring of applications in production provides insights into application health in a manner that may not be possible (or practical) in test environments. In addition, synthetic monitors may be created by developers and testers and deployed into production to provide fast feedback on the health of the application so that remediation actions may be taken.
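As a rough illustration of a synthetic monitor, here is a minimal Python sketch. It assumes the check is expressed as a probe callable that returns a truthy value when the application behaves as expected; the function name run_synthetic_check, the latency threshold, and the probes are all hypothetical. In a real deployment the probe would exercise a production endpoint or user journey on a schedule.

```python
import time


def run_synthetic_check(probe, max_latency_s=0.5):
    """Run one scripted probe and report health based on success and latency."""
    start = time.perf_counter()
    try:
        ok = bool(probe())
    except Exception:
        # Any error raised by the probe counts as an unhealthy result.
        ok = False
    latency_s = time.perf_counter() - start
    return {
        "ok": ok,
        "latency_s": latency_s,
        "healthy": ok and latency_s <= max_latency_s,
    }


def login_probe():
    # Stand-in for a scripted user journey (hypothetical); always succeeds here.
    return True


def broken_probe():
    # Stand-in for a journey that hits an outage (hypothetical).
    raise RuntimeError("service unavailable")


good = run_synthetic_check(login_probe)
bad = run_synthetic_check(broken_probe)
```

The value of such a check comes from running it continuously and alerting on the "healthy" flag, which gives feedback without waiting for real users to hit the problem.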
Some countries in Europe have implemented very low levels of testing per capita. The approach they seem to be taking is the development of “herd immunity”, which happens when a significant percentage of the population becomes immune to a disease (after recovery), slowing or stopping its spread.
While this approach is questionable for COVID-19, without a vaccine in place at the present time, building resilience is in fact a popular technique in software engineering. Building fault-tolerant, highly resilient software may in fact reduce the need to do extensive testing up-front. As a matter of fact, I noticed a recent Gartner report on approaches for developing AI-enabled, resilient and bug-resistant applications.
We heard from health officials that persons most at risk from COVID-19 complications (and even death) are seniors (above the age of 60) or those with underlying health conditions (see figure below). We sadly observed this in reality as COVID-19 ravaged a senior nursing home in Seattle, WA, causing more than 19 deaths in a short span of time.
In software testing, we can immediately relate this to the well-defined concept of risk-based testing. Since testing and test resources are scarce, it is also important to optimize the testing we do using techniques such as model-based testing.
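To make risk-based prioritization concrete, here is a minimal Python sketch. It assumes each area of the application is scored by an estimated likelihood of failure and a business impact, and that testing effort is limited to a fixed budget; the class FeatureArea, the function prioritize, and all scores are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureArea:
    name: str
    failure_likelihood: float  # estimated, 0.0 to 1.0 (hypothetical values)
    impact: int                # 1 (minor) to 5 (critical)
    cost_hours: float          # effort needed to test this area

    @property
    def risk(self) -> float:
        # Classic risk score: likelihood multiplied by impact.
        return self.failure_likelihood * self.impact


def prioritize(areas, budget_hours):
    """Greedily select the highest-risk areas that fit in the testing budget."""
    selected = []
    remaining = budget_hours
    for area in sorted(areas, key=lambda a: a.risk, reverse=True):
        if area.cost_hours <= remaining:
            selected.append(area)
            remaining -= area.cost_hours
    return selected


areas = [
    FeatureArea("payments", 0.4, 5, 8),
    FeatureArea("search", 0.5, 2, 4),
    FeatureArea("profile page", 0.2, 1, 3),
    FeatureArea("login", 0.3, 5, 5),
]
plan = [area.name for area in prioritize(areas, budget_hours=14)]
```

With a 14-hour budget, this sketch selects payments and login and leaves the lower-risk areas untested, which is exactly the trade-off risk-based testing makes explicit.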
One of the key initiatives being advocated in the prevention or containment of COVID-19 is social distancing, sheltering in place, and quarantining.
Building loosely coupled systems (such as microservices) is a remarkably similar concept, and offers a way to improve failure (or fault) isolation in software systems. Software sandboxes are also used to promote better fault isolation.
In addition, techniques like mocking and service virtualization allow better, unconstrained testing of software systems in isolation.
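As a small illustration of mocking, here is a sketch that uses Python's standard unittest.mock module. The function order_total and the tax service with its get_rate method are hypothetical; the point is that the code under test can be exercised in isolation, even when the real dependency is unavailable.

```python
from unittest.mock import Mock


def order_total(cart, tax_service, region):
    """Compute an order total, delegating the tax lookup to an external service."""
    subtotal = sum(price * quantity for price, quantity in cart)
    rate = tax_service.get_rate(region)
    return round(subtotal * (1 + rate), 2)


# Stand-in for the real (possibly unavailable) tax service.
fake_tax_service = Mock()
fake_tax_service.get_rate.return_value = 0.10

total = order_total([(20.0, 2), (5.0, 1)], fake_tax_service, "CA")

# Verify the dependency was called exactly as expected.
fake_tax_service.get_rate.assert_called_once_with("CA")
```

Service virtualization applies the same idea at the level of whole services and protocols rather than individual objects.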
Other approaches advocated for COVID-19 prevention include washing of hands and use of disinfectants.
In the software domain, we can immediately relate this to using software security scanning and testing tools, anti-virus tools, and enterprise security systems to prevent hacker attacks.
One of the key lessons we learned from COVID-19 containment and prevention efforts is that it is a community effort (figure below). Just because lower age and healthier segments of the population are less at risk from complications does not mean that individuals in these categories shouldn’t cooperate with the broader community isolation efforts.
Similarly, we see in software testing that the best defect prevention happens when different teams work collaboratively—such as developers, testers, product owners, SREs, and so on. This is especially key in agile software development environments, where the pace of change is rapid.
One of the key factors driving early COVID-19 testing and containment is better capacity planning. We were told that we needed to “flatten the curve” so that hospitals and care facilities would not be overwhelmed. We have seen the development of peak healthcare stress models and “surge planning” to address rapidly growing cases.
This is analogous to software performance and stress testing and scalability analysis, which allow us to better prepare for high usage scenarios (for example, helping retail sites prepare for Black Friday). Early reliability testing of software allows us to identify and fix scalability issues, or to plan for accommodating the required scalability (for example by provisioning additional computing or storage resources).
In the context of COVID-19, we saw a great deal of emphasis on general hygiene (such as periodic washing of hands and wearing of masks).
This is analogous as well to engineering best practices for maintaining software hygiene and health. We have already discussed that it is not practical to test everything and software quality is everybody’s job, not just testers’.
All of the COVID-19 testing challenges notwithstanding, we mostly saw that the personally identifiable information (PII) of specific test subjects was protected, unless they chose to reveal the test results on their own. This is not just for regulatory reasons, but also to prevent social ostracization in a sensitive circumstance such as this.
Similarly, in software testing it is of paramount importance to protect access to PII data during testing, especially when test data is derived from production. Test data management solutions enable teams to both identify sensitive PII data and mask that data in test and development environments.
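Here is a minimal sketch of data masking in Python, assuming PII appears as email addresses and US-style Social Security numbers inside free text. The patterns and the function names mask_email and mask_record are hypothetical and far simpler than what a real test data management solution would use.

```python
import hashlib
import re

# Simplified patterns for illustration; real PII discovery is much broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_email(match):
    # A deterministic pseudonym keeps the same person consistent across records.
    digest = hashlib.sha256(match.group(0).lower().encode("utf-8")).hexdigest()[:10]
    return f"user_{digest}@example.com"


def mask_record(text):
    """Replace emails with pseudonyms and redact SSN-like numbers."""
    text = EMAIL_RE.sub(mask_email, text)
    return SSN_RE.sub("XXX-XX-XXXX", text)


# Note: the person's name is not masked here; that would need a separate rule.
masked = mask_record("Jane Doe, jane.doe@corp.test, SSN 123-45-6789")
```

Hashing the email rather than replacing it with a constant is a deliberate choice in this sketch: joins between tables in the test environment still work, while the original address is no longer visible.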
Sophisticated analytics models, such as this one, have been developed for forecasting the spread of the virus, the number of fatalities, the peak occurrence dates, and so on. The availability of so much data (which changes on a daily basis) allows modelers to develop correlations and machine learning algorithms to improve such predictions. These models are absolutely critical for health officials to plan for proactive actions.
We see intelligent software testing also take advantage of such analytics techniques for things like defect/failure prediction. This allows us to plan our testing efforts better and more effectively contain defects.
This blog captures the remarkable similarities between COVID-19 testing and software testing, though they address very different domains. The key similarities for me lie in the management of scarce test resources, test automation for rapid feedback, and the use of test data for response planning.
The situation on COVID-19 testing and containment is very fluid right now and is changing rapidly. Specifically in the USA, we learned that the latest projection for COVID-19 fatalities is more than 200,000, a significantly higher number than previously anticipated. Early in March, US healthcare officials predicted a much lower impact.
The most striking takeaway for me is the mis-assessment of the risk and the apparent lack of risk-based testing and containment.
Risk-based testing is a well-established and proven discipline in software engineering that would appear to be very applicable to the COVID-19 situation.
Since, as discussed above, it is not practical to test everyone, it would have been preferable to focus on testing the highest-risk subjects (and folks close to them) first, and isolate them accordingly, before testing other sections of the population. These are folks typically in nursing homes and senior care centers. We also learned that people in congested low-income neighborhoods in inner cities are more vulnerable. Rigorously testing and sanitizing the “zone of influence” of the most vulnerable populations may have helped to protect those who are most likely to experience fatalities. This concept is not new. For example, airports use this technique to rigorously test and sanitize the zone of influence in and near their vicinity. Sweden is one country that has adopted this approach successfully, without using extensive social distancing.
As we build out more distributed and autonomous systems that cross the digital/cyber-physical boundary (for example smart robots or embedded IoT systems like smart pacemakers), we’re likely to see greater intertwining of inter-domain testing disciplines, for example see here. In such systems, the testing approach is a remarkable fusion of techniques in both disciplines. For example: how do you detect and contain the spread of a “software” virus in pacemakers that are embedded in human subjects that then endanger the health of the wearers?
As I monitor the COVID-19 situation, especially in the USA, I see other striking similarities to other aspects of DevOps and continuous. Maybe that will be the subject of a separate blog in the days to come. Until then, stay safe and healthy, and get yourself tested (and contained) if you feel unwell.
Shamim is a thought leader in DevOps, Continuous Delivery, Continuous Testing and Application Life-cycle Management (ALM). He has more than 15 years of experience in large-scale application design and development, software product development and R&D, application quality assurance and testing, organizational quality management, IT consulting, and practice management. Shamim is currently the CTO for the DevOps business unit at Broadcom, where he is responsible for innovating DevOps solutions using Broadcom's industry-leading technologies.