Being a site reliability engineer (SRE) isn’t easy. Andrew Widdowson explained that “it’s like being a part of the world’s most intense pit crew. We change the tires of a race car as it’s going 100 mph.”
Known as the “automators,” SREs are often asked to observe application environments and manage incidents… at all hours of the day. After all, when your app is down, so is your business.
The SRE’s job is to secure a flawless user experience. To deliver site reliability, SREs bridge dev and ops, ensuring new releases improve the product, rather than breaking it.
The Challenge: Alarm Storms Conspiring Against SRE Skills
The trouble with monitoring application environments is that there are hundreds of thousands of monitoring data points. How do you prioritize which data points are useful, and which can be ignored? Alarm storms aren’t helpful. They prompt panic, instead of resolution.
When a crucial incident does occur, how do you quickly mitigate it? The common SRE approach is to spend a ton of time and energy manually sifting through data—often at the expense of other initiatives, or worse, personal time (for example, responding to the dinner-time incident alert).
What if you could get to that Aha! moment faster? What if, instead of the typical hair-on-fire response, you had a trusted guide that could quickly lead you to the source of the incident?
AIOps Solutions: The SRE’s Trusted Guide
What if you could empower SREs with the insights needed to drive improvements? What if, instead of the typical war rooms and on-call burnout, SREs had a trusted guide to help them fix problems quickly?
Today, AIOps solutions augment SRE skills by automating incident response. They combine AI, automation, and domain expertise to help your SRE teams avoid alert fatigue, continuously triaging alerts through a mix of notification rules, process changes, dashboards, and machine learning. They can also proactively monitor the four golden signals of SRE (latency, traffic, errors, and saturation) and measure what really matters for customer experience.
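To make the four golden signals concrete, here is a minimal Python sketch of how they might be computed over a window of requests and compared against thresholds. It does not represent how any particular AIOps product works. The names `Request`, `golden_signals`, and `breached`, the use of a 95th-percentile latency, and the threshold values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    """One observed request (hypothetical record shape)."""
    latency_ms: float
    is_error: bool


def golden_signals(requests: List[Request], window_s: float,
                   saturation: float) -> Dict[str, float]:
    """Summarize latency, traffic, errors, and saturation for one window.

    `saturation` is passed in as a 0..1 utilization figure (for example CPU),
    because it comes from the host rather than from the request log.
    """
    if not requests:
        return {"latency_p95_ms": 0.0, "traffic_rps": 0.0,
                "error_rate": 0.0, "saturation": saturation}

    latencies = sorted(r.latency_ms for r in requests)
    # Nearest-rank 95th percentile, using integer math to avoid float rounding.
    rank = max(1, (95 * len(latencies) + 99) // 100)
    errors = sum(1 for r in requests if r.is_error)

    return {
        "latency_p95_ms": latencies[rank - 1],
        "traffic_rps": len(requests) / window_s,
        "error_rate": errors / len(requests),
        "saturation": saturation,
    }


def breached(signals: Dict[str, float],
             thresholds: Dict[str, float]) -> List[str]:
    """Return the names of signals that exceed their threshold, sorted."""
    return sorted(name for name, limit in thresholds.items()
                  if signals[name] > limit)
```

Calling `golden_signals` once per window and alerting only on what `breached` returns is one simple way to turn hundreds of thousands of raw data points into a handful of signals tied to customer experience. Real systems would typically layer deduplication and correlation on top of this.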
Kieran Taylor has 20 years of high-tech product marketing experience with a focus on application performance management, AIOps, and DevOps. He is the author of DevOps for Digital Leaders and is Head of Marketing for Broadcom’s Enterprise Software Division, leading go-to-market activities across that portfolio. Previously, he led product marketing teams at Adobe, Akamai, DataPower/IBM, and Nortel Networks. His career began as an editor of high-tech publications at McGraw-Hill.
More About AIOps
To learn more about how Broadcom is leveraging AI and machine learning in its AIOps solutions, visit Broadcom.