The very qualities that make artificial intelligence (AI)-driven applications so powerful also make them temperamental from a performance perspective. This blog post offers key principles to consider as you tackle the challenge of optimizing application resilience in today’s AI-driven environments.
Application Resilience: Tough, Getting Tougher
We’ve always had issues with the behavior of applications in production. Response times lag under peak workloads. Complex dependencies in multi-tier apps cause functions to freeze or fail.
Much of the original driving force behind DevOps, in fact, was to reduce these problems by having our development and operations teams work together to understand how code actually behaves in our real-world production environments. Armed with this insight, we can now address problematic application behaviors—not just by throwing infrastructure at bottlenecks, but by writing better-performing code.
Today, however, we have a new challenge: ensuring the performance of AI applications in the real world. And DevOps can’t really help us there—algorithmic applications are profoundly temperamental, and their behavior is fundamentally beyond our direct control.
The Upside’s Downside
While the complexity of our conventional applications may cause them to behave in ways we didn’t expect, that behavior is ultimately deterministic. We know when an application calls for data from a database, runs a piece of business logic, or executes a transaction. Its behaviors are, therefore, built directly into its code.
Our algorithmic systems, on the other hand, are non-deterministic. And we want them to be: once we launch them, we want them to learn to make smarter data correlations over time. That’s what makes AI, well, AI.
The upside of this indeterminacy is that we can capture and automatically act on data-driven insights that were never available to us before. The downside is that the indeterminacy of their inner workings can make them very temperamental in production. As they mix and match ever-expanding datasets in new and better ways, they can consume more processor cycles, more memory, more input/output, and more network bandwidth.
In other words, if we want to reap the tremendous business benefits of AI systems that can keep getting smarter, we have to come to terms with the fact that they will also keep behaving differently in production under their ever-evolving workloads.
AI Application Resilience Matters
Despite the new challenges associated with managing the behaviors of non-deterministic AI applications in production, we have to get performance right. This is certainly true in the case of real-time implementations, such as autonomous vehicles, where split-second results are critical to safety. It’s also true when we’re using algorithms to deliver superior experiences to customers on their mobile devices, since tolerances for application latency continue to approach zero.
Unfortunately, in our haste to get up to speed on the underlying data science itself, most of us have focused on algorithmic artists and their artistry at the expense of the practicalities of putting that artistry into production. The outcomes that AI, machine learning, natural language processing (NLP), and the like promise are so compelling that it’s easy to forget we eventually have to put these apps into production.
But performance in production counts—so it’s time for us to focus on artificial intelligence operations (AIOps) just as we have DevOps.
While AIOps is still an emerging discipline—and certainly requires more than the tail end of an article to address in detail—here are a few high-level principles to consider as you tackle the challenge of optimizing the performance of non-deterministic algorithmic applications in production:
Optimize end-to-end pipeline architectures. AIOps isn’t only about the execution of algorithms, because the performance of these applications is affected by the entire pipeline of data intake, data prep, algorithmic execution, and delivery of outputs to points of consumption—whether that’s an autonomous device, a mobile app, or an executive dashboard.
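To make this concrete, here is a minimal sketch (not from the original post; the stage names and functions are hypothetical) of instrumenting an end-to-end pipeline so that per-stage latency is visible, which is what lets you find the real bottleneck rather than assuming it’s the algorithm:

```python
import time
from typing import Any, Callable, Dict, Tuple

def run_pipeline(data: Any, stages: Dict[str, Callable]) -> Tuple[Any, Dict[str, float]]:
    """Run each pipeline stage in order, recording per-stage wall-clock time."""
    timings = {}
    for name, stage in stages.items():
        start = time.perf_counter()
        data = stage(data)
        timings[name] = time.perf_counter() - start
    return data, timings

# Hypothetical stages mirroring the pipeline described above:
# intake -> prep -> algorithmic execution -> delivery.
stages = {
    "intake": lambda d: d,
    "prep": lambda d: [x * 2 for x in d],
    "inference": lambda d: sum(d),
    "delivery": lambda d: {"result": d},
}
result, timings = run_pipeline([1, 2, 3], stages)
```

In practice you would ship `timings` to your monitoring system; the point of the sketch is that the measurement spans the whole pipeline, not just the model call.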
Embrace elastic infrastructure. Despite our efforts to optimally engineer our applications, there will always be situations where we need to throw infrastructure at demand—especially when the business case warrants it. So, make sure your infrastructure provides on-demand access to elastic capacity.
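As one concrete shape for "on-demand elastic capacity," here is a sketch of the proportional scaling rule many autoscalers use (the Kubernetes Horizontal Pod Autoscaler applies essentially this formula); the parameter values are illustrative assumptions, not recommendations:

```python
import math

def desired_replicas(current: int, observed_util: float,
                     target_util: float = 0.6,
                     min_replicas: int = 1, max_replicas: int = 20) -> int:
    """Scale replica count in proportion to observed vs. target utilization,
    clamped to configured bounds."""
    desired = math.ceil(current * observed_util / target_util)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas at 90% utilization against a 60% target scales out to 6; the clamp keeps a runaway workload from requesting unbounded capacity.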
Pre-test performance in production with “inert” features. It’s possible to release algorithmic functions into the production environment prior to making them functionally active. This technique offers a good way of getting visibility into the function’s performance under load without impacting the performance of the current, live version of the application.
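The "inert feature" technique is often called shadow (or dark) launching. A minimal sketch, with hypothetical model and logging objects: the shadow function runs on real production inputs, its latency and errors are recorded, and its output is discarded so users only ever see the live version’s result. (In a real system the shadow call would run asynchronously, off the request path.)

```python
import time

def handle_request(payload, live_model, shadow_model, log):
    """Serve the live model's answer; exercise the inert shadow model on the
    same input, record its latency or failure, and discard its output."""
    result = live_model(payload)
    start = time.perf_counter()
    try:
        shadow_model(payload)  # output ignored -- the feature is "inert"
        log.append(("shadow_ok", time.perf_counter() - start))
    except Exception as exc:
        log.append(("shadow_error", str(exc)))
    return result

log = []
answer = handle_request({"x": 1},
                        live_model=lambda p: "live-result",
                        shadow_model=lambda p: "shadow-result",
                        log=log)
```

This gives you the load profile of the new function under real traffic before any user depends on it.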
Use AI to understand AI. As noted above, the impact of various factors on the behavior of non-deterministic apps is not as easy to understand as it has been with traditional apps. We all need to think about using machine learning to understand the behavior of machine learning. This may wind up being one of the best ways to bring predictability to AI performance.
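Before reaching for learned detectors, even a simple statistical baseline on performance telemetry illustrates the idea of letting the data flag unusual behavior. A minimal sketch (a z-score outlier test, not a full ML detector) over a stream of latency samples:

```python
import statistics

def latency_anomalies(samples, threshold=3.0):
    """Flag samples more than `threshold` standard deviations from the mean --
    a minimal statistical baseline for spotting unusual latencies."""
    mean = statistics.fmean(samples)
    stdev = statistics.pstdev(samples)
    if stdev == 0:
        return []
    return [x for x in samples if abs(x - mean) / stdev > threshold]

# Fifty normal ~100ms responses and one 500ms spike:
spikes = latency_anomalies([100] * 50 + [500])
```

A production AIOps system would replace this with a model that learns seasonal and workload-dependent baselines, but the feedback loop is the same: the system’s own metrics train the detector that watches the system.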
AI is making our businesses smarter than ever. But smart and slow—or smart and temperamentally erratic—is not a winning combination. We all need to start putting AIOps into practice.