Today’s business, IT, and development teams face unprecedented demands. To respond, these teams need new levels of speed, intelligence, and sophistication, which increasingly means harnessing an advanced artificial intelligence (AI) platform and machine learning algorithms. In this post, we examine why it’s becoming so vital to leverage advanced AI platforms, and we offer a detailed look at the architectural requirements these platforms need to address.
Evolving IT Landscape and Operations
In recent years, the IT landscape has seen massive transformation, and the rate of change only continues to accelerate. In the not-too-distant past, monolithic applications and static infrastructures were the norm. Today, applications, and the way they’re developed, delivered, monitored, and supported, look very different. Following are just a few aspects of this change:
Proliferating technologies. Environments are now composed of a diverse mix of virtualization technologies, clouds, containers, microservices, orchestration systems, and more. Environments continue to grow increasingly dynamic and ephemeral. Increasing layers and types of security technologies continue to be implemented as well.
Expanding toolsets. It isn’t just that the volume and complexity of technologies have been increasing. These evolving environments accelerate the velocity of change and require continuous analysis. As a result, the number of tools adopted to manage these complex environments has also continued to expand. Now, it’s not uncommon for IT organizations to employ hundreds of tools for managing IT operations, security, and development.
Changing approaches. Approaches like DevOps, BizOps, site reliability engineering (SRE), and continuous integration/continuous delivery (CI/CD) are emerging as norms, which all serve to accelerate the rate of change. These emerging paradigms place fundamentally different demands on team members, requiring staff to move from being specialists with a focus on specific tools and technologies to generalists who are dedicated to managing services.
Escalating demands. For today’s enterprises, the quality and performance of digital services are increasingly intertwined with business performance. It is paramount to bring innovative, compelling digital services to market faster. Teams need to maximize service levels and security at all times. At the same time, wringing maximum productivity out of staff and technology investments is increasingly critical.
The Need for Advanced AI Platforms
All the above factors are necessitating a fundamental change in the way business, IT, and development organizations work. These organizations simply can’t continue to rely on legacy models and approaches to meet their charters.
To keep pace with their organization’s rapidly evolving technological environments, business requirements, and security threats, teams need new solutions that combine advanced AI and machine learning algorithms to augment and automate decision making. These advanced AI platforms should integrate business, development, and operations data to generate actionable insights, and help teams continuously improve the business outcomes of digital initiatives.
As a starting point, look for AI platforms that establish an intelligence layer that can sit on top of a number of specific solutions, rather than employing a platform that requires the procurement or adaptation of multiple tools. At a high level, it’s important to enable teams to access data from across a number of silos, while preserving the context of the source content and enabling this context to be shared efficiently. This contextual, comprehensive visibility is vital in delivering the actionable insights today’s decision makers, developers, and operations teams need. In the following sections, we offer a detailed look at the architectural requirements these AI platforms need to address.
Employ a Knowledge Graph to Optimize Algorithm Usage
Often, when it comes to deriving value from machine learning, the algorithms themselves are not what matters most. Typically, what matters is how machine learning algorithms are orchestrated and scoped. To use algorithms effectively, look for a platform that employs a multi-domain knowledge graph that describes the enterprise in great detail. With this detailed information available, the platform should decide which machine learning technique to apply, and to which specific data set. In addition, the platform needs to evolve dynamically as the organization and environment change.
In effect, we want to match the situational awareness of a real human analyst so the platform can pick and apply the right analysis technique to the right data set, dynamically, based on what it discovers about the environment. This approach means that any specific analysis we apply can change and adapt to the enterprise as it evolves. This is very different from the rule-based expert systems of the past, which were simply too brittle for today’s dynamic enterprises.
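To make the idea of "right technique, right data set" concrete, here is a minimal sketch, assuming a toy graph in which each entity carries a kind and some metric series. The `TECHNIQUES` lookup table, the `Entity` class, and the two detectors (a z-score outlier test and a static threshold) are hypothetical choices for illustration, not the platform's actual methods. In the platform described above, this mapping would be derived and updated from the knowledge graph rather than hard-coded.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class Entity:
    """A node in a toy knowledge graph (hypothetical schema)."""
    name: str
    kind: str                                    # e.g. "router", "service"
    metrics: dict = field(default_factory=dict)  # metric name -> list of samples


def zscore_outliers(samples, threshold=3.0):
    """Flag samples more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(samples), pstdev(samples)
    if sigma == 0:
        return []
    return [i for i, x in enumerate(samples) if abs(x - mu) / sigma > threshold]


def static_threshold(samples, limit=0.9):
    """Flag samples above a fixed limit, e.g. utilization ratios."""
    return [i for i, x in enumerate(samples) if x > limit]


# Hypothetical mapping from (entity kind, metric) to the technique an analyst
# might choose. A real platform would derive this from its knowledge graph.
TECHNIQUES = {
    ("router", "utilization"): static_threshold,
    ("service", "latency_ms"): zscore_outliers,
}


def analyze(entities):
    """Apply only the technique scoped to each entity/metric combination."""
    findings = {}
    for e in entities:
        for metric, samples in e.metrics.items():
            technique = TECHNIQUES.get((e.kind, metric))
            if technique is None:
                continue  # no known-good technique for this context; skip it
            hits = technique(samples)
            if hits:
                findings[(e.name, metric)] = (technique.__name__, hits)
    return findings
```

The design point is that the scoping decision lives in data (the lookup), so the same detectors can be reused across many entities while unsuitable combinations are never run.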
Harness Domain Expertise
Domain experts can leverage data effectively because they know what questions to ask. To harness this knowledge, it is vital to understand how domain experts interact with their tools and carry out their tasks. It is important to examine which signals matter to them while they test hypotheses about a situation, including what patterns they scan for, what correlations they seek, and so on. Platforms should then capture these heuristics with machine learning robots so the system can apply this expertise correctly to the right scenarios, technologies, and problems.
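One simple way to picture "capturing heuristics" is to learn, from an expert's past triage decisions, which signal they actually rely on and at what level they act. The sketch below assumes a hypothetical history of expert labels and fits a one-feature decision stump per signal; the signal names and data are invented, and a production platform would use far richer models than this.

```python
def fit_stump(values, labels):
    """Find the threshold on one signal that best reproduces expert labels.

    values: signal readings; labels: True where the expert flagged a problem.
    Returns (threshold, accuracy) for the rule "flag if value >= threshold".
    """
    best = (None, -1.0)
    for t in sorted(set(values)):
        predictions = [v >= t for v in values]
        acc = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
        if acc > best[1]:
            best = (t, acc)
    return best


def which_signal_matters(history, labels):
    """history: signal name -> readings aligned with the expert's labels.

    Returns the signal whose best single threshold explains the expert's
    decisions most accurately, together with that (threshold, accuracy).
    """
    scores = {name: fit_stump(vals, labels) for name, vals in history.items()}
    name = max(scores, key=lambda n: scores[n][1])
    return name, scores[name]
```

For example, if an expert flagged incidents exactly when host CPU rose by 40% or more, the stump on the CPU signal would recover that threshold with perfect accuracy, while an unrelated disk I/O signal would score lower.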
Constrain Problem Scope and Employ Small, Reusable Components
Powerful machine learning techniques are often costly to run, and their efficacy typically increases with input curation. Consequently, to be most effective, machine learning has to be employed within the right guardrails. It is important to constrain the scope of the problem you’re trying to solve.
When automobile manufacturers started to employ machine learning to deliver in-car assistants based on natural language processing, they struggled for some time. Today, these systems are very effective, primarily because the vocabulary they must recognize is restricted to words related to giving commands to a car. Similarly, platforms should narrowly define tasks and use small, reusable components to build robots. Through this approach, these platforms can make robots fast and efficient to run, and extend their utility.
Each of these robots should perform discrete tasks. For example, one robot can have the sole responsibility of detecting incidents, while another will handle incident response. Each of these robots should be managed independently, so they don’t need to be run on the same computer or built by the same team. These robots can therefore be enhanced and optimized on independent schedules, according to evolving priorities.
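The separation of duties described above can be sketched as two small classes that share only a data contract. This is a minimal illustration, assuming a hypothetical `Robot` interface with a single `run` method and an `Incident` record; neither is taken from an actual product API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class Incident:
    """The only thing the two robots share: a small, stable data contract."""
    entity: str
    reason: str


class Robot(Protocol):
    """Minimal contract every robot implements (hypothetical interface)."""
    def run(self, inputs: Iterable) -> List: ...


class IncidentDetector:
    """Sole responsibility: turn raw (entity, value) readings into incidents."""
    def __init__(self, limit: float):
        self.limit = limit

    def run(self, readings):
        return [Incident(entity, f"value {value} > {self.limit}")
                for entity, value in readings if value > self.limit]


class IncidentResponder:
    """Sole responsibility: map incidents to response actions."""
    def __init__(self, playbook):
        self.playbook = playbook  # entity -> action name

    def run(self, incidents):
        return [(i.entity, self.playbook.get(i.entity, "page on-call"))
                for i in incidents]


def run_pipeline(robots: List[Robot], inputs):
    """Feed each robot's output to the next; robots know nothing else of each other."""
    data = inputs
    for robot in robots:
        data = robot.run(data)
    return data
```

Because each robot depends only on the `Incident` shape, either one could be rewritten, scaled out, or deployed on a different machine without touching the other.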
Employ Ontological Abstractions
In developing the machine learning architecture, it is important to establish capabilities around ontologies, rather than being tied to specific products. For example, in the area of application performance management (APM), any specific product can have distinct collection methods, terminology, and so on. Instead, it is optimal to focus on the common, industry-accepted ontology that all APM solutions share. Consequently, an architecture can work for all APM solutions, rather than being tied to a solution from a specific vendor.
At the same time, it’s important to recognize the fact that ontologies vary across domains. For example, while an infrastructure monitoring ontology will be concerned with elements like routers and switches, a DevOps ontology will be focused on testing and production rules. That’s why it is important to build architecture to accommodate different ontologies, including those for APM, infrastructure, networks, DevOps, security, and more. Most importantly, the platform should incorporate and integrate the intelligence from all these different domains.
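A minimal sketch of the ontological abstraction, assuming two hypothetical APM vendors that report the same concepts under different field names and units. The vendor names, source fields, and the three-field APM ontology are invented for illustration; real products and real ontologies are considerably richer.

```python
from typing import Any, Callable, Dict, List, Tuple

Converter = Callable[[Any], Any]

# Per hypothetical vendor: which source field supplies each ontology concept,
# and how to convert it into the ontology's canonical form (e.g. milliseconds).
VENDOR_MAPPINGS: Dict[str, Dict[str, Tuple[str, Converter]]] = {
    "vendor_a": {
        "service": ("app", str),
        "operation": ("txn", str),
        "response_time_ms": ("respTimeSec", lambda s: float(s) * 1000.0),
    },
    "vendor_b": {
        "service": ("svc_name", str),
        "operation": ("endpoint", str),
        "response_time_ms": ("avg_rt_ms", float),
    },
}


def to_apm_ontology(vendor: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Translate one vendor-specific record into the shared APM ontology."""
    mapping = VENDOR_MAPPINGS[vendor]
    return {concept: convert(record[src])
            for concept, (src, convert) in mapping.items()}


def slow_operations(records: List[Dict[str, Any]], limit_ms: float = 200.0):
    """Vendor-agnostic analytics: only ontology fields are referenced."""
    return [(r["service"], r["operation"])
            for r in records if r["response_time_ms"] > limit_ms]
```

Supporting a new APM product then means adding one mapping entry, while analytics such as `slow_operations` stay unchanged. Other domains (infrastructure, networks, security) would get their own ontologies in the same style.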
Leverage an Open, Flexible Architecture
In the market today, many topology approaches are closed in nature, bound by specific technological approaches and linear models. By contrast, advanced AI platforms should employ an open, source-agnostic approach. The platform’s architecture should deliver flexibility in several key ways:
Data source extensibility. The platform’s architecture should not be bound by any specific product; instead, it should feature an open data lake, algorithms, and more. Customers should be able to easily accommodate new data sources, including those from a wide range of third-party vendors.
Architecture extensibility. To offer maximum long-term flexibility, a platform should enable partners and customers to introduce entirely new ontologies, without having to make any architectural changes.
Ontology extensibility. Teams should be able to add properties to existing ontologies, so they can easily accommodate organization-specific information, including tribal knowledge, naming or classification approaches, and so on.
Robot extensibility. The ideal architecture should efficiently accommodate new robots as needed, while at the same time, enabling each robot to be employed against a unified, consistent data set.
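The four extension points above can be illustrated with a toy registry. This is a sketch under stated assumptions: the `Platform` class and its method names are hypothetical, ontologies are modeled simply as sets of allowed property names, and every robot reads the same in-memory store that stands in for the unified data set.

```python
from typing import Callable, Dict, List, Set


class Platform:
    """Toy registry illustrating the four extension points (hypothetical design)."""

    def __init__(self):
        self.ontologies: Dict[str, Set[str]] = {}  # ontology -> allowed properties
        self.sources: Dict[str, Callable[[], List[dict]]] = {}
        self.robots: Dict[str, Callable[[List[dict]], list]] = {}
        self.store: List[dict] = []                 # one unified data set

    def add_ontology(self, name: str, properties):
        # Architecture extensibility: a new ontology is data, not new code paths.
        self.ontologies[name] = set(properties)

    def extend_ontology(self, name: str, *extra: str):
        # Ontology extensibility: bolt on organization-specific properties.
        self.ontologies[name].update(extra)

    def add_source(self, name: str, fetch: Callable[[], List[dict]]):
        # Data source extensibility: any callable that yields records.
        self.sources[name] = fetch

    def add_robot(self, name: str, robot: Callable[[List[dict]], list]):
        # Robot extensibility: every robot reads the same unified store.
        self.robots[name] = robot

    def ingest(self):
        """Pull from every source, keeping only properties the ontology allows."""
        for source_name, fetch in self.sources.items():
            for rec in fetch():
                allowed = self.ontologies[rec["ontology"]]
                props = {k: v for k, v in rec.items() if k in allowed}
                self.store.append(
                    {"ontology": rec["ontology"], "source": source_name, **props})

    def run_robots(self):
        return {name: robot(self.store) for name, robot in self.robots.items()}
```

In this sketch, adding an ontology, a custom property such as a rack label, a new data source, or a new robot is a registration call, not a change to the platform's code.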
Employ an Intelligent, Flexible, Scalable Data Model
In development, vendors can choose whether to create data models that are very general or very specific. While very specific models may be pragmatic in the near term, they typically require manual modifications to accommodate new technologies, data types, and so on. Over time, these specific models tend to become brittle, leaving teams unable to react quickly to changes.
Instead, it is important to look for a platform that employs a property-graph based approach that has history awareness. With property graphs, even complex relational lookups are fast, because moving from one entity to a related one follows a stored relationship rather than requiring a join. As a result, property graphs provide an excellent structure for ontological inference. By comparison, using a traditional relational database management system (RDBMS) for this model would require an impractical number of joins across schemas and tables, introducing an unacceptable level of performance-degrading latency.
The optimal data model uses properties to describe the qualitative specifics of a given entity. A customer may have monitoring agents that discover some properties of an element, such as the manufacturer, firmware version, software type, and so on. In addition, teams can leverage other data sources to further decorate the element with additional facts.
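A minimal in-memory sketch of the property-graph idea, assuming hypothetical node names, edge labels (`runs_on`, `connects_to`), and properties. Both agents and other data sources write properties onto the same node, and a multi-hop question is answered by walking adjacency lists, where a relational design would typically need one self-join per hop. History awareness is deliberately left out to keep the sketch short.

```python
from collections import defaultdict, deque


class PropertyGraph:
    """Minimal in-memory property graph: nodes carry property dicts, edges carry labels."""

    def __init__(self):
        self.nodes = {}                     # node id -> property dict
        self.out_edges = defaultdict(list)  # node id -> [(label, target id)]

    def add_node(self, node_id, **props):
        # E.g. properties discovered by a monitoring agent.
        self.nodes.setdefault(node_id, {}).update(props)

    def decorate(self, node_id, **facts):
        # Another data source adds facts to an element that already exists.
        self.nodes[node_id].update(facts)

    def add_edge(self, src, label, dst):
        self.out_edges[src].append((label, dst))

    def reachable(self, start, labels):
        """Breadth-first traversal over edges whose label is in `labels`."""
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for label, nxt in self.out_edges[node]:
                if label in labels and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(start)
        return seen


def dependencies_of_kind(graph, start, kind):
    """Simple inference: which elements of `kind` does `start` transitively rely on?"""
    hops = {"runs_on", "connects_to"}  # assumed dependency-bearing edge labels
    return {n for n in graph.reachable(start, hops)
            if graph.nodes.get(n, {}).get("kind") == kind}
```

For example, if a `checkout` service runs on `vm-7`, which connects to `router-3`, then `dependencies_of_kind(g, "checkout", "router")` returns `{"router-3"}` without any join logic.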
Within the AI platform, all events should be captured in a time journal. Each time-stamped record is an immutable data point, and together these records provide a valuable way to build up an incremental observation of the environment. Through this approach, users and algorithmic analysts can observe structures and cross-domain interdependencies. For example, a team responsible for networks can also see how changes in the broader environment may affect specific devices, such as when an application update causes traffic to start flowing to a different router, triggering an unexpected workload spike.
Likewise, line of business leaders can assess how a backend infrastructure upgrade affected a business metric such as sales per hour. In development scenarios, this approach enables teams to track build deployments and compare them with changes in application performance, so they can spot issues and optimization opportunities more quickly and intuitively.
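A time journal of this kind can be sketched as an append-only list of frozen events, plus two queries: the value of a property as of a given time, and the average of a metric before and after a change such as a build deployment. The `Event` fields, entity names, and window arithmetic below are assumptions for illustration, not a description of the platform's storage.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    """One immutable, time-stamped observation (hypothetical schema)."""
    ts: int      # e.g. epoch seconds
    entity: str
    key: str
    value: Any


class TimeJournal:
    def __init__(self):
        self._events: List[Event] = []

    def record(self, event: Event) -> None:
        # Append-only: past entries are never edited, only superseded.
        if self._events and event.ts < self._events[-1].ts:
            raise ValueError("events must be appended in time order")
        self._events.append(event)

    def as_of(self, entity: str, key: str, ts: int) -> Optional[Any]:
        """Latest value of `key` for `entity` at or before `ts` (None if unseen)."""
        value = None
        for e in self._events:
            if e.ts > ts:
                break
            if e.entity == entity and e.key == key:
                value = e.value
        return value

    def series(self, entity: str, key: str, start: int, end: int) -> List[Tuple[int, Any]]:
        """All (ts, value) pairs for `key` with start <= ts < end."""
        return [(e.ts, e.value) for e in self._events
                if e.entity == entity and e.key == key and start <= e.ts < end]


def _mean(xs: List[float]) -> Optional[float]:
    return sum(xs) / len(xs) if xs else None


def before_after(journal: TimeJournal, entity: str, metric: str,
                 change_ts: int, window: int):
    """Mean of `metric` in the window before vs. after a change event."""
    before = [v for _, v in journal.series(entity, metric, change_ts - window, change_ts)]
    after = [v for _, v in journal.series(entity, metric, change_ts, change_ts + window)]
    return _mean(before), _mean(after)
```

For instance, recording latency samples around a `build` event for a `checkout` service lets `before_after` show whether that deployment coincided with a latency change, while `as_of` answers which build was live at any earlier moment.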
Through employing the design principles outlined above, advanced AI platforms with robust machine learning algorithms can provide users with an unparalleled mix of characteristics. These advanced AI platforms equip customers with the ability to gain value immediately, and to leverage the flexibility they need to gain maximum benefits over the long term. In our next post, we’ll examine the benefits that can be realized in more detail.
Formerly the chief architect of DX APM and now the chief architect of Broadcom AIOps, Erhan Giral has been working in the monitoring space for over twelve years and holds various patents around operational intelligence and application monitoring. Before joining Broadcom, he worked on various graph visualization and analytics frameworks and SDKs, serving diverse verticals such as network intelligence, bioinformatics, and finance.