Gain tighter alignment with business objectives. Crystallize your SRE team’s focus on the efforts that matter most to the organization.
“SRE adoption is most likely to succeed when I&O leaders take three steps. First, they must build reliability into critical products. Then, they must reduce the toil in operating and monitoring those products. Finally, they must iteratively adopt DevOps tools for continuous feedback and improvement.”
SRE models were originally developed and used by so-called digital natives such as Google and Facebook. As a result, these models tend to assume that large teams of technology experts are available to apply engineering approaches to IT operations and application development. However, that’s not the reality for most of the mainstream enterprises that are now seeking to institute SRE models.
As teams pursue SRE initiatives, the tools in place can either offer significant support or become a serious obstacle. The reality is that many organizations looking to adopt SRE models rely on either loosely connected toolchains or open-source tools developed in house. Some of the largest technology companies have the internal resources to make do-it-yourself or point-tool approaches work. That doesn’t mean these approaches are optimal for most enterprises. On the contrary, they can demand significant effort that derails SRE adoption, leaving teams spending too much time and realizing too little value in return.
To fully capitalize on SRE models, teams need platforms that provide scalable, flexible, and easy-to-use automation. They need these capabilities to be aligned with complex, dynamic enterprise IT environments and rapidly evolving business requirements.
It is vital that the golden signals and SLIs being monitored ultimately track what matters most: the user experience. Having this visibility is essential if SRE teams are to manage their error budgets intelligently. However, in today’s environments, identifying and tracking the right metrics is easier said than done.
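To make the link between an SLI and an error budget concrete, here is a minimal sketch of the arithmetic, assuming a simple request-based availability SLI (successful requests divided by total requests). The function name `error_budget_report`, its parameters, and the sample numbers are illustrative assumptions, not part of any particular monitoring product.

```python
def error_budget_report(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error-budget usage for a request-based availability SLI.

    Hypothetical helper for illustration only: slo_target is a fraction
    such as 0.999, and the request counts cover one SLO window.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # SLI: the fraction of requests that succeeded in the window.
    sli = 1.0 - failed_requests / total_requests
    # Error budget: the number of failures the SLO tolerates in the window.
    allowed_failures = (1.0 - slo_target) * total_requests
    # Fraction of the budget already spent (values above 1.0 mean overspent).
    budget_consumed = failed_requests / allowed_failures

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_consumed": budget_consumed,
        "slo_met": sli >= slo_target,
    }


if __name__ == "__main__":
    # Assumed sample window: a 99.9% target, 1,000,000 requests, 250 failures.
    report = error_budget_report(0.999, 1_000_000, 250)
    for key, value in report.items():
        print(f"{key}: {value}")
```

With these assumed numbers, the SLO tolerates roughly 1,000 failed requests in the window, so 250 failures spend about a quarter of the budget. A team could use a figure like this to decide whether to keep shipping features or to shift effort toward reliability work.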
To effectively adopt SRE models, teams need to establish comprehensive coverage that delivers unified visibility of the entire enterprise ecosystem. That’s true whether development teams are running legacy on-premises technologies, modern services and systems, or a mix of both. The software engineer needs visibility that spans from mobile applications to networks and mainframes.
Today, it’s no longer enough to monitor a monolithic computing stack or a discrete infrastructure element. It’s now about making complex, modern ecosystems observable.
SRE and DevOps share many fundamental themes. In addition, both SRE and DevOps present a similar challenge: How do previously isolated teams begin to work together seamlessly?
This requires a shift in workflows and cultures, and it demands that tools deliver entirely new features. If a development team simply seeks to connect disparate tools, silos will remain.
Ultimately, for SRE models to succeed, teams must have a holistic view of the stack: the frontend, backend, libraries, storage, kernels, and physical machines. The solution is to expand upon the concept of a data lake and build a “digital river.”
A digital river enables all teams across the software development lifecycle (SDLC) to gain role-specific views into a unified data model. As a result, they can maximize the utility of data in solving problems, gaining insights, and delivering new features.
Using automation to reduce toil is a core tenet of SRE best practices. Through automation, for example, SRE teams can take a software engineering approach to prevent an incident, rather than reacting to an issue after the fact. However, that doesn’t mean teams should work on automation efforts in an ad hoc, one-off fashion.
In many cases, teams have employed limited automation based on custom-developed scripts or on APIs connected to domain-specific tools. These approaches create islands of automation, which present a couple of challenges: