Every year brings change, adversity, and lessons to learn, but 2020 took that principle to an extreme. Read on for insights that can help make 2021 a year of improved IT resilience for your organization.
Adversity: The Ultimate Teacher
“There is no education like adversity.”
I have no doubt that in the months and years ahead, there will be massive volumes of books, articles, blog posts, and other write-ups dissecting the year that was 2020. If adversity is indeed the educator Disraeli made it out to be, it's a safe bet that many of us will come into 2021 much, much smarter than when we entered 2020.
For those of us working in enterprise IT, the past months have taught some clear lessons. Here are two that jump immediately to mind:
First, we have to expect the unexpected. Digital agility was important before; now it has emerged as a key to business survival.
Second, delivering optimized service levels to customers and all other users of digital services is becoming both more difficult and more vital.
Heading into 2021, many enterprises face several pressing challenges as they contend with these two imperatives:
Disconnects still plaguing organizations. While DevOps teams are embracing practices like objectives and key results (OKRs) and value stream management, IT operations (ITOps) teams remain focused on business services, configuration items, and the like. Disaster recovery (DR) programs run on their own established systems, entirely independent of the site reliability engineering (SRE) teams focused on service level objectives (SLOs) and error budgets. As a recent Gartner report explained, “Blind spots often run rampant between SRE and DevOps teams and traditional IT DR.”1
Point tools. Siloed tools, a logical outcome of persistently siloed organizations, remain an issue as well. The same Gartner report explained that “Organizations find it difficult to scale their DevOps initiatives because DevOps teams use multiple point solutions. These disparate tools increase complexity due to orchestration, integration and management issues.”2
Siloed visibility and management. Because of the persistent gaps between groups, senior executives can't gain an end-to-end view of services and IT resilience. That makes it difficult to manage service levels and to prioritize the areas that need greater investment. The Gartner report noted that “Only 13% of respondents have an end-to-end business-services view of most or all technical underpinnings.”3
As they battle the obstacles outlined above, teams in today’s enterprises are encountering a number of limitations:
Slow recovery. Gaps between silos and a lack of transparency make recovering from disasters difficult, time-consuming, and costly.
Bottlenecks. Manual data aggregation, service monitoring, troubleshooting, and remediation create persistent bottlenecks that significantly inhibit speed and agility at both the team and organization levels.
Reactive operations. Existing limitations leave teams reacting to incidents and scrambling to restore services, rather than establishing more proactive approaches.
Steps for Success in 2021
As you enter 2021, the following key strategies can help propel your ITOps teams and your business:
Institute Chaos Engineering
Chaos engineering embodies a mindset that treats failures as inevitable rather than assuming none will occur. Teams adopting this approach deliberately introduce failures into their production environments, which drives developers to build resilience into their systems.
Gartner analysts estimate that “By 2025, 60% of I&O leaders will implement chaos engineering to add resilience and velocity improvements to value stream flow, increasing system availability by 10%.”4
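To make the chaos engineering idea concrete, here is a minimal, in-process sketch in Python. It assumes a toy dependency call rather than real production infrastructure, and the names `ChaosInjector`, `call_with_fallback`, and `run_experiment`, along with the failure rate, retry count, cached fallback, and 5% degradation limit, are hypothetical. The wrapper deliberately raises faults at a configured rate, and the experiment checks a steady-state hypothesis: retries and graceful degradation should keep the share of degraded responses below an agreed limit.

```python
import random
from dataclasses import dataclass, field
from typing import Callable


class InjectedFault(Exception):
    """Raised by the chaos wrapper to simulate a dependency outage."""


@dataclass
class ChaosInjector:
    # Hypothetical knob: probability that any wrapped call fails on purpose.
    failure_rate: float
    # Seeded RNG so experiments are reproducible.
    rng: random.Random = field(default_factory=lambda: random.Random(42))
    injected: int = 0

    def wrap(self, fn: Callable[[], str]) -> Callable[[], str]:
        """Return a version of fn that sometimes raises InjectedFault instead of running."""
        def chaotic() -> str:
            if self.rng.random() < self.failure_rate:
                self.injected += 1
                raise InjectedFault("chaos: simulated outage")
            return fn()
        return chaotic


def call_with_fallback(call: Callable[[], str], retries: int, fallback: str) -> str:
    """The resilience logic under test: retry, then degrade gracefully."""
    for _ in range(retries + 1):
        try:
            return call()
        except InjectedFault:
            continue
    return fallback


def run_experiment(failure_rate: float, requests: int, retries: int,
                   max_degraded_ratio: float = 0.05) -> dict:
    """Inject faults into a toy dependency and test the steady-state hypothesis."""
    chaos = ChaosInjector(failure_rate=failure_rate)
    inventory = chaos.wrap(lambda: "live-inventory")  # hypothetical dependency
    results = [call_with_fallback(inventory, retries, "cached-inventory")
               for _ in range(requests)]
    degraded = results.count("cached-inventory")
    degraded_ratio = degraded / requests if requests else 0.0
    return {
        "injected_faults": chaos.injected,
        "degraded": degraded,
        # Hypothesis: despite injected faults, few requests fall back to stale data.
        "hypothesis_held": degraded_ratio <= max_degraded_ratio,
    }
```

For example, `run_experiment(failure_rate=0.3, requests=1000, retries=2)` reports how many faults were injected and how many requests fell back to cached data. Production chaos tooling typically injects faults at the infrastructure level (terminating instances, adding network latency), but the principle is the same: break things on purpose, then verify that the system degrades gracefully.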
Institute IT Resilience Roles
Going forward, it will be vital to establish IT resilience roles aligned with SRE initiatives. These roles can help strengthen collaboration and alignment between DR and product teams.
Gartner analysts anticipate that “Through 2025, 20% of enterprises will go beyond SRE by adding IT resilience roles to improve resiliency posture between product teams and traditional DR.”5
Staff in IT resilience roles focus on these critical efforts:
Helping recover from unknown hazards.
Assessing and reporting on the organization’s IT resilience posture.
Reporting on key performance indicators and key risk indicators.
Employing automation, monitoring, and alerting to support these efforts.
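The reporting and alerting duties above can be sketched as a small script. This is a minimal illustration, assuming per-service request counts for a time window and an availability SLO; the names `ServiceWindow`, `error_budget_consumed`, and `resilience_report`, and the 80% alert threshold, are hypothetical. Availability serves as the KPI and error-budget consumption as the KRI, where the error budget is the number of failures the SLO allows, (1 − target) × total requests.

```python
from dataclasses import dataclass


@dataclass
class ServiceWindow:
    name: str
    total_requests: int
    failed_requests: int
    slo_target: float  # e.g. 0.999 means 99.9% of requests should succeed


def availability(w: ServiceWindow) -> float:
    """KPI: fraction of requests served successfully in the window."""
    if w.total_requests == 0:
        return 1.0
    return 1 - w.failed_requests / w.total_requests


def error_budget_consumed(w: ServiceWindow) -> float:
    """KRI: share of the window's error budget already spent (1.0 means exhausted)."""
    allowed_failures = (1 - w.slo_target) * w.total_requests
    if allowed_failures == 0:
        return 0.0 if w.failed_requests == 0 else float("inf")
    return w.failed_requests / allowed_failures


def resilience_report(windows: list[ServiceWindow],
                      alert_threshold: float = 0.8) -> list[dict]:
    """Summarize posture per service and flag any whose KRI crosses the threshold."""
    report = []
    for w in windows:
        kri = error_budget_consumed(w)
        report.append({
            "service": w.name,
            "availability": availability(w),
            "budget_consumed": kri,
            "alert": kri >= alert_threshold,  # hook for paging or ticketing
        })
    return report
```

With a 99.9% target over 100,000 requests, the budget is 100 failures, so a hypothetical "checkout" service with 90 failures has consumed about 90% of its budget and is flagged, while a "search" service with 20 failures (about 20%) is not.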
Focus on Customer-Centric Metrics
Across teams and organizations, staff need to maintain a laser focus on customers. That makes it increasingly vital to track KPIs aligned with customer value, not technical performance alone. In their report, Gartner analysts warn that “Agile and DevOps teams who focus only on technical performance metrics (at the exclusion of customer-value-centric metrics) will fail to align their priorities with the organization.”6
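A short sketch illustrates why technical metrics alone can mislead. It assumes hypothetical health-check samples and customer journey events; the `JourneyEvent` type, the step names, and the sample data are illustrative only. In this example the host reports 100% uptime while only half of the customers complete a purchase.

```python
from dataclasses import dataclass


@dataclass
class JourneyEvent:
    customer_id: str
    step: str        # hypothetical step names, e.g. "payment", "confirmation"
    succeeded: bool


def host_uptime(health_checks: list[bool]) -> float:
    """Technical metric: share of health checks the host answered."""
    return sum(health_checks) / len(health_checks) if health_checks else 0.0


def journey_completion_rate(events: list[JourneyEvent], final_step: str) -> float:
    """Customer-centric metric: share of customers who successfully reached final_step."""
    customers = {e.customer_id for e in events}
    completed = {e.customer_id for e in events if e.step == final_step and e.succeeded}
    return len(completed) / len(customers) if customers else 0.0


if __name__ == "__main__":
    checks = [True] * 60  # the host answered every probe this hour
    events = [
        JourneyEvent("c1", "payment", True), JourneyEvent("c1", "confirmation", True),
        JourneyEvent("c2", "payment", True), JourneyEvent("c2", "confirmation", True),
        JourneyEvent("c3", "payment", False),
        JourneyEvent("c4", "payment", False),
    ]
    print(host_uptime(checks))                              # 1.0
    print(journey_completion_rate(events, "confirmation"))  # 0.5
```

A dashboard built only on the first number would show a healthy service; the second number exposes the customer impact.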
How Automation and AI Can Help
By leveraging AI and automation, teams can make better-informed decisions and collaborate more effectively, both within and across teams. These capabilities can also give teams a better understanding of the quality attributes of an incoming release, which they can share with operations, DR, and SRE teams.
AI and automation can also foster more effective, data-driven collaboration. With these capabilities, teams can better segment issues and establish their causes across the continuous integration/continuous delivery (CI/CD) pipeline. For example, analysts may discover that 50 unplanned outages were associated with a specific application component, and that whenever changes are introduced in that area, incidents are more likely to follow. Insights like these can be indispensable as teams work to boost IT resilience.
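The kind of change-to-incident analysis described above can be approximated with a simple heuristic. This Python sketch assumes hypothetical change and incident records tagged by component, with timestamps in hours, and attributes an incident to a change when it opens on the same component within a 24-hour window. It then flags components whose change failure rate is at least twice the average. The record types, window, and factor are all illustrative assumptions, and a time-window match surfaces correlation rather than proof of causality; a production analysis would need to weigh far more signals.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Change:
    component: str
    deployed_at: float  # hours since an arbitrary epoch (assumed unit)


@dataclass
class Incident:
    component: str
    opened_at: float


def change_failure_rates(changes: list[Change], incidents: list[Incident],
                         window_hours: float = 24.0) -> dict[str, float]:
    """Per component: share of changes followed by an incident on that component within the window."""
    incident_times: dict[str, list[float]] = defaultdict(list)
    for inc in incidents:
        incident_times[inc.component].append(inc.opened_at)

    totals: dict[str, int] = defaultdict(int)
    failures: dict[str, int] = defaultdict(int)
    for ch in changes:
        totals[ch.component] += 1
        # Count the change as "failed" if any incident opened within the window after it.
        if any(0 <= t - ch.deployed_at <= window_hours for t in incident_times[ch.component]):
            failures[ch.component] += 1
    return {c: failures[c] / totals[c] for c in totals}


def risky_components(rates: dict[str, float], factor: float = 2.0) -> list[str]:
    """Flag components whose change failure rate is at least `factor` times the mean."""
    if not rates:
        return []
    mean = sum(rates.values()) / len(rates)
    if mean == 0:
        return []
    return sorted(c for c, r in rates.items() if r >= factor * mean)
```

With sample data in which three of four "payments" changes are followed by a same-component incident and the "search" and "profile" changes are not, the rates come out as 0.75, 0.0, and 0.0, and `risky_components` returns `["payments"]`: the area where introducing changes most often precedes incidents.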
Serge Lucio is Vice President and General Manager of the Enterprise Software Division (ESD) at Broadcom. In this role, he is responsible for the company's BizOps solutions that help organizations scale their digital transformation by fusing business and IT. Mr. Lucio joined the company through the CA Technologies acquisition, where he was most recently SVP Product and Strategy for the mainframe business unit. Prior to this role, he held various leadership roles across Product Management, Strategy, and M&A at IBM. Mr. Lucio earned an M.S. in Computer Science from Telecom Nancy, France, and holds several patents in the Software Test Automation space.
1. Gartner, "Predicts 2021: Value Streams Will Define the Future of DevOps," October 5, 2020, ID: G00734377, Analyst(s): Daniel Betts, Chris Saunderson, Ron Blair, Manjunath Bhat, Jim Scheibmeir, Hassan Ennaciri