Site Reliability Engineering (SRE) has been gaining popularity recently as a way to help improve system reliability and establish a prescriptive approach to implementing DevOps.
SRE teams use techniques such as service level objectives (SLOs) and error budgets to quantify the risk tolerance for systems and services, and to balance velocity against system stability and reliability.
Similarly, testers, and specifically software development engineers in test (SDETs or SETs), play a key role in balancing velocity with overall system quality. We believe these approaches can be combined through closer collaboration among developers, testers, SDETs, and SREs, with each group adopting the others' practices.
In this blog post, we will explore the synergies between SREs and SDETs and how these staff members can work with development teams to balance velocity and quality.
Error budgets allow SREs to balance between the needs of velocity and stability. As long as there is sufficient room in the error budget, teams prioritize new feature development and frequent deployments. However, as the error budget is exhausted, teams slow down (or stop) new feature development and deployment, and focus more on system hardening and testing.
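To make the error budget mechanism concrete, here is a minimal Python sketch. It assumes a request-based SLO; the `ErrorBudget` class, the `release_policy` function, and the 25% slow-down threshold are illustrative assumptions, not part of any SRE standard or tool.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical request-based error budget for one SLO window."""
    slo_target: float      # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int    # requests observed in the SLO window
    failed_requests: int   # requests that violated the SLO

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 = budget untouched, 0.0 = fully spent, negative = overspent.
        if self.allowed_failures == 0:
            return 0.0
        return 1.0 - self.failed_requests / self.allowed_failures


def release_policy(budget: ErrorBudget, slow_below: float = 0.25) -> str:
    """Map remaining budget to a release posture (thresholds are assumed)."""
    remaining = budget.remaining_fraction
    if remaining <= 0:
        return "freeze"   # budget exhausted: focus on hardening and testing
    if remaining < slow_below:
        return "slow"     # budget nearly spent: reduce deployment frequency
    return "ship"         # enough headroom: prioritize new features
```

For example, with a 99.9% SLO over one million requests, roughly 1,000 failures are allowed. If 200 failures have been observed, about 80% of the budget remains and this policy would keep shipping; at 1,200 failures the budget is overspent and the policy would freeze releases.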
The error budget approach used by SREs is analogous to how testers use overall application quality and release risk to modulate velocity. Because reliability is only one of many quality metrics, the error budget is in effect a specialized case of modulating velocity based on quality. It can therefore (and should) be combined with the broader, quality-based modulation that QA already performs.
Before we discuss the unified approach, let’s first discuss the synergies between the roles of SREs and testers—specifically SDETs. Both have their roots in software development, and therefore share much in common.
Some of the key points of commonality include the following:
Testers and QA professionals use a variety of techniques and measures to assess release quality and risk. These include things like code quality, batch size, functional and non-functional requirements coverage (through tests), defect detection and removal, user and customer experience, compliance, supportability, technical debt, and so on.
Various approaches exist to quantify release risk based on the measures from the above techniques. Organizations make business decisions to proceed with releases despite risk. However, deficiencies in each of these measures add up to the quality debt of an application. As quality debt increases, the risk of releasing software progressively increases. At some point, the risk threshold is crossed, and releases are slowed or halted to allow time for remediation or hardening.
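One simple way to picture quality debt is as a weighted average of per-measure deficiencies. The sketch below is only an illustration of that idea: the measure names, weights, scores, and the 0.4 risk threshold are all hypothetical, and real organizations would choose their own measures and calibration.

```python
def quality_debt(deficiencies: dict, weights: dict) -> float:
    """Weighted average of per-measure deficiencies, each scored in [0, 1].

    0.0 means no known deficiency in that measure; 1.0 means the worst case.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    for name in weights:
        score = deficiencies[name]  # a missing measure raises KeyError
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"deficiency for {name!r} must be in [0, 1]")
    weighted = sum(weights[name] * deficiencies[name] for name in weights)
    return weighted / total_weight


def release_allowed(debt: float, risk_threshold: float = 0.4) -> bool:
    """Releases proceed while debt stays under the (assumed) risk threshold."""
    return debt < risk_threshold


# Hypothetical weights and scores for a single release candidate.
EXAMPLE_WEIGHTS = {
    "code_quality": 2,
    "requirements_coverage": 3,
    "open_defects": 3,
    "technical_debt": 2,
}
EXAMPLE_DEFICIENCIES = {
    "code_quality": 0.1,
    "requirements_coverage": 0.2,
    "open_defects": 0.5,
    "technical_debt": 0.3,
}
```

With these example numbers, `quality_debt(EXAMPLE_DEFICIENCIES, EXAMPLE_WEIGHTS)` comes to about 0.29, which is under the assumed threshold, so the release would proceed. As more deficiencies accumulate, the score rises toward the threshold and releases would be slowed or halted for remediation.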
Clearly, this is analogous to how SREs use error budgets. Therefore, it makes sense to use them in a combined, synergistic manner.
In this unified approach, the velocity is modulated by a combination of error budget and release risk. This provides a more holistic view of balancing velocity and quality and essentially subsumes reliability as a measure of overall quality.
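The article does not prescribe how the two signals should be combined. As one possible sketch, the Python below lets the most constrained budget win: velocity is scaled by whichever of the error budget or the release risk budget has less headroom. The function names, the "minimum wins" rule, and the 25% slow-down threshold are assumptions made for illustration.

```python
def velocity_factor(error_budget_remaining: float,
                    risk_budget_remaining: float) -> float:
    """Return a 0.0-1.0 multiplier for release velocity.

    Both inputs are fractions of budget left (1.0 = untouched, 0.0 = spent,
    negative = overspent). The tighter of the two budgets sets the pace.
    """
    headroom = min(error_budget_remaining, risk_budget_remaining)
    return max(0.0, min(1.0, headroom))


def release_decision(factor: float, slow_below: float = 0.25) -> str:
    """Translate the velocity multiplier into a release posture."""
    if factor <= 0.0:
        return "halt releases and harden"
    if factor < slow_below:
        return "slow releases"
    return "release normally"
```

For instance, a service with 80% of its error budget left but only 10% of its release risk budget left would get a factor of 0.1 and be told to slow releases, even though reliability alone looks healthy. A weighted blend of the two budgets would be an equally plausible design; taking the minimum is simply the more conservative choice.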
Hopefully, this article has provided readers with some insight into the synergies between SREs and testers and how an integrated approach can be used for modulating velocity. We don’t yet have well-defined models for “release risk budgets”; however, we can define them along the lines of SRE error budgets.
In addition, if you’re interested in learning more on the topic of SREs and development teams, see my post on mapping SRE functions to the Scaled Agile Framework.
Shamim is a thought leader in DevOps, Continuous Delivery, Continuous Testing, and Application Life-cycle Management (ALM). He has more than 15 years of experience in large-scale application design and development, software product development and R&D, application quality assurance and testing, organizational quality management, IT consulting, and practice management. Shamim is currently the CTO for the DevOps business unit at Broadcom, where he is responsible for innovating DevOps solutions using Broadcom's industry-leading technologies.
bizops.com is sponsored by Broadcom, a leading provider of solutions that empower teams to maximize the value of BizOps approaches.