ADP is a leading solution provider in the human capital management (HCM) market. To retain our leadership position, we rely increasingly on the quality of our software initiatives. This blog post describes how ADP has harnessed SRE models, BizOps strategies, and optimized metrics to launch IT transformation initiatives that position the business for long-term success.
A top provider and innovator in the HCM market, ADP delivers solutions that support more than 800,000 clients in 140 countries. The company’s offerings span such areas as human resources, talent, time management, benefits, and payroll. ADP has been around for more than 70 years and has been a pioneer in leveraging technology since its inception. We were early adopters of cloud models, and today we run global, follow-the-sun IT operations.
I’m a chief architect at ADP. Within the IT arena, we’re tasked with delivering value to the enterprise while navigating an age of automatic system restoration, cloud infrastructure, intelligent alerting, and highly automated software delivery systems. We’re constantly striving for greater efficiency and automation, both to streamline operations and to provide consistently optimized user experiences. Currently, we’re focused on initiatives relating to Kubernetes, DevOps, and open source, and we’re managing the transition to a model that’s based on Site Reliability Engineering (SRE) approaches.
Through these efforts, we’re continuing to pursue IT transformation and boost our operational excellence. In the following sections, I’ve outlined some of the key steps that have positioned us for success in this journey.
The bottom line is that measurement and analytics are how you continue to optimize and refine your operations. That’s why we set out to build a culture of maturity in how we select, validate, and promote KPIs.
Our monitoring tools continuously deliver thousands of metrics. None of these metrics are superfluous; they populate transaction traces and offer the visibility needed to triage performance problems. However, we also needed to distill these metrics to get to the KPI gold.
Toward that end, we embarked on a voyage of discovery that spanned several efforts. We started by establishing KPIs, and then worked through a process that included establishing trends and baselines, moving from static to dynamic thresholds, incorporating other streams of information, enhancing visibility through enrichment and validation, and more. Through these efforts, we could ultimately identify the golden signals, or “North Star” measures, that define critical parts of the business.
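To make the move from static to dynamic thresholds concrete, here is a minimal sketch, assuming a single metric sampled at regular intervals. Instead of comparing every sample with one fixed limit, it recomputes a baseline (the mean) and a spread (the standard deviation) from a rolling window of preceding samples and flags values more than `k` standard deviations away. The function name, the window of 10 samples, and `k = 3` are illustrative assumptions, not ADP’s actual tooling or settings.

```python
from statistics import mean, stdev


def dynamic_threshold_alerts(series, window=10, k=3.0):
    """Return indexes of samples that fall outside a rolling, data-driven band.

    For each sample, the baseline (mean) and spread (standard deviation) are
    recomputed from the preceding `window` samples, so the threshold adapts
    as the metric's normal behavior shifts.
    """
    alerts = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        lower, upper = mu - k * sigma, mu + k * sigma
        if not lower <= series[i] <= upper:
            alerts.append(i)
    return alerts


# Hypothetical response-time samples (ms) with one spike at index 11.
latency_ms = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 100, 250, 101]
print(dynamic_threshold_alerts(latency_ms))  # [11]
```

A rolling mean and standard deviation is only one way to build a dynamic threshold; seasonal baselines (for example, the same hour in the previous week) are a common refinement for metrics with daily or weekly cycles.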
Following are some keys to success in these efforts.
We wanted to make sure we were establishing metrics that would help us be more effective. The worst thing for IT is an application that doesn’t alert when there’s a problem, or that generates alerts constantly when there’s no problem. To prevent these issues, you need KPIs that capture the signature of the application.
A key part of establishing KPIs involved getting IT and business teams to trust specific metrics. We had to convince people that you don’t need to track thousands of metrics. It takes only a couple of metrics to define the signature of an application, whether it runs on one instance or thousands.
With that in mind, after an issue has occurred, part of our process is to go back, generate a baseline, and look at a couple of operational points in the preceding days or weeks. Based on this inspection, we can identify KPIs and show them to business teams. We can then explain that these specific KPIs are the ones to watch, because they are the ones that would have predicted the issue.
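The retrospective step can be sketched as follows, assuming each candidate metric is a list of samples at a fixed interval and that you know where the baseline period ends and where the incident began. It scores each metric by its largest z-score (distance from the baseline mean, in standard deviations) during the lead-up window and returns the top few. The function, the metric names, and the scoring rule are hypothetical illustrations of the idea, not ADP’s actual process.

```python
from statistics import mean, stdev


def rank_predictive_metrics(metrics, baseline_end, incident_start, top_n=2):
    """Rank metrics by how far they drifted from baseline before an incident.

    metrics: dict of metric name -> list of samples at a fixed interval.
    Samples [0, baseline_end) form the "normal" baseline period;
    samples [baseline_end, incident_start) are the lead-up to the incident.
    """
    scores = {}
    for name, samples in metrics.items():
        baseline = samples[:baseline_end]
        lead_up = samples[baseline_end:incident_start]
        mu = mean(baseline)
        # Guard against a perfectly flat baseline (stdev of 0.0).
        sigma = stdev(baseline) or 1e-9
        # Largest absolute z-score observed during the lead-up window.
        scores[name] = max(abs(x - mu) / sigma for x in lead_up)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical samples: the first 6 are the baseline, the last 3 precede the incident.
metrics = {
    "heap_used_pct": [60, 61, 59, 60, 62, 61, 70, 78, 85],
    "error_rate_pct": [0.5, 0.6, 0.4, 0.5, 0.5, 0.6, 1.5, 2.8, 4.0],
    "cpu_pct": [40, 42, 38, 41, 39, 40, 41, 40, 42],
}
print(rank_predictive_metrics(metrics, baseline_end=6, incident_start=9))
# ['error_rate_pct', 'heap_used_pct']
```

The output is a short list that can be shown to business teams: the metrics that moved before the issue, as opposed to the many that did not.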
Ultimately, it is important to have clear, meaningful KPIs that capture the imagination and rally the troops. For our application teams, a big part of that has meant focusing on the user experience. We’ve been looking to merge different sources of information, combining customer sentiment data with what we gather from APM tools.
Through these efforts, we can build dashboards that show both application performance KPIs and customer sentiment for an application. We use that visibility to assess whether our monitoring is correct. For example, if we know users are having a problem and we’re not seeing it in our tools, what metrics should we be looking at? We don’t want the help desk to be a better predictor of application performance than our monitoring tools.
Conversely, if we’re seeing issues in an application, but users aren’t being affected, we can use those insights to prioritize our efforts, and make sure we’re not spending time trying to tune aspects that don’t have any impact on the user experience.
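The two cases above (users affected with no alert, and alerts with no user impact) can be expressed as a small triage rule. This sketch assumes each time window carries an APM alert flag, an average customer sentiment score from -1.0 to 1.0, and a help desk ticket count; the `Window` type, its field names, and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """One time window of observations (hypothetical schema)."""
    apm_alerting: bool      # did APM tooling raise an alert in this window?
    sentiment: float        # average customer sentiment, -1.0 (negative) to 1.0
    help_desk_tickets: int  # help desk tickets opened in this window


def triage(w, sentiment_floor=0.0, ticket_ceiling=20):
    """Compare what monitoring saw with what users experienced."""
    users_affected = w.sentiment < sentiment_floor or w.help_desk_tickets > ticket_ceiling
    if users_affected and not w.apm_alerting:
        return "monitoring_gap"  # users hurt, tools silent: revisit which metrics we watch
    if w.apm_alerting and not users_affected:
        return "low_priority"    # tools alerting, no user impact: don't spend tuning effort here
    if w.apm_alerting and users_affected:
        return "incident"
    return "healthy"


print(triage(Window(apm_alerting=False, sentiment=-0.4, help_desk_tickets=35)))
# monitoring_gap
```

Counting how often each label appears over a quarter gives a rough measure of whether monitoring or the help desk is the better predictor.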
Through these activities, we were able to set the stage for understanding which provable KPIs result in golden signals—measurements that capture the essence of our enterprise and are key to a productive user experience.
If we want to continue to lead in the marketplace, we need to be able to develop practitioners from within, rather than being limited to what talent is available in the market. We need to support the development of our own human capital and build a pipeline for staff advancement, offering meaningful roles and career paths.
To succeed, we need more folks taking ownership of specific responsibilities within IT, so more people are contributing to making incremental changes and enhancing our software. We wanted to equip staff with the skills, technologies, and best practices that help make them reliable contributors.
Through our application of BizOps principles, we sought to bring IT and business together. Here, the big win was using corporate training resources and making training part of people’s development plans, which meant that eligibility for a bonus was tied to training. This really helped jumpstart our efforts, motivating teams to organize, change behaviors, and learn new skills.
To achieve our objectives, we focused on two key learning paths:
It was vital that staff move from a focus on managing IT resources to managing business implementations. We started by working to eliminate the lines that traditionally separated IT and business. A big part of that effort required us to establish views, dashboards, and reports that reflected both perspectives. This also helped us accelerate the shift away from having staff concentrate on individual metrics and toward the metrics that matter for good business outcomes.
We also wanted to harness the domain expertise of practitioners and help them move from solely being data consumers to becoming effective data analysts. We wanted to teach them how to search for trends and correlations, find the metrics that matter, visualize data, and use it to make a compelling case.
As part of the onboarding process, we stress the importance of staff using monitoring data. A portion of our teams must complete training and demonstrate competencies in activities that underpin new roles and responsibilities. As our technology implementations evolve, we continue to move people from novice to consumer, and ultimately to practitioner.
Longer term, we look to have some staff move into architect, application specialist, data engineer, and data analyst roles. Through these efforts, we’ve been able to build a pipeline of evolving roles and establish a large number of staff members who understand KPIs and who are adept at finding out what really matters in software and business systems.
The foundation for effective analytics begins with technology. To continue refining our capabilities, we’ve adopted an approach often referred to as the “AI hierarchy of needs.” The model is represented as a pyramid: it begins with the collection phase and moves through a number of steps, such as transforming and aggregating data, until you reach the top, which is AI and deep learning.
Source: Hacker Noon, “The AI Hierarchy of Needs,” Monica Rogati, June 12, 2017
As we move up the analytics pipeline, we significantly reduce the volume of data that we have to handle. This matters: an analytics-driven enterprise starts from vast amounts of data, so the pipeline needs to be curated at each stage to focus on what is meaningful and help us establish traction and scale.
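As a minimal sketch of how volume shrinks between stages, assume raw samples arrive as `(timestamp_seconds, metric_name, value)` tuples. The first stage aggregates them into per-minute means; the second promotes only metrics that have been validated as KPIs. Both function names, the metric names, and the one-minute bucket size are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean


def aggregate_per_minute(samples):
    """Collapse raw (timestamp_s, metric, value) samples into per-minute means."""
    buckets = defaultdict(list)
    for ts, metric, value in samples:
        buckets[(ts // 60, metric)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def promote_kpis(aggregates, kpis):
    """Keep only the aggregates for metrics that have been validated as KPIs."""
    return {key: value for key, value in aggregates.items() if key[1] in kpis}


# Hypothetical input: 3 metrics sampled once per second for 2 minutes.
raw = [(t, m, float(t)) for t in range(120)
       for m in ("latency_ms", "gc_pause_ms", "thread_count")]
per_minute = aggregate_per_minute(raw)
kpis_only = promote_kpis(per_minute, kpis={"latency_ms"})
print(len(raw), len(per_minute), len(kpis_only))  # 360 6 2
```

Each stage hands the next one a fraction of what it received, which is what keeps the upper layers of the pyramid tractable.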
Google accomplishes this by reducing time-series data to a rate, and identifying how the metric changes with time. In this way, you can avoid having to figure out what thresholds to set, and instead track anything that has an unexpected rate of change, whenever and wherever that appears.
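A sketch of the rate-based approach, assuming the metric is a monotonic counter (such as total requests served) sampled as `(timestamp_seconds, cumulative_count)` pairs. It converts the counter into a per-second rate and flags any interval whose rate departs from the median of the previous few intervals by more than a given factor. The function names, the window of 5 intervals, and the factor of 3 are assumptions for illustration; this is not a reproduction of Google’s tooling, only an example of alerting on rates of change rather than on fixed values.

```python
from statistics import median


def to_rates(samples):
    """Convert (timestamp_s, cumulative_count) samples into per-second rates.

    If the counter decreases (for example, after a process restart), the new
    value is treated as the increase since the reset.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        delta = c1 - c0 if c1 >= c0 else c1
        rates.append(delta / (t1 - t0))
    return rates


def unexpected_rate_changes(rates, window=5, factor=3.0):
    """Flag intervals whose rate differs from the recent median by more than `factor` times."""
    flagged = []
    for i in range(window, len(rates)):
        reference = median(rates[i - window:i])
        if reference == 0:
            if rates[i] > 0:
                flagged.append(i)
        elif rates[i] > reference * factor or rates[i] < reference / factor:
            flagged.append(i)
    return flagged


# Hypothetical request counter scraped every 10 s; traffic jumps tenfold between t=50 and t=60.
counter = [(0, 0), (10, 100), (20, 200), (30, 300), (40, 400),
           (50, 500), (60, 1500), (70, 1600)]
rates = to_rates(counter)              # [10.0, 10.0, 10.0, 10.0, 10.0, 100.0, 10.0]
print(unexpected_rate_changes(rates))  # [5]
```

No absolute limit such as “alert above 50 requests per second” appears anywhere; the check adapts to whatever the normal rate happens to be.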
As part of this process, you need expertise to validate KPIs for applications. An essential first step is reducing metrics volume. Instead of having hundreds or thousands of metrics per agent, we need half a dozen that correctly characterize the application. Only a small percentage of data gets promoted to the next level, which makes it easier to notice the unexpected reliably. This is the kind of effort our SREs dive into.
Ultimately, this process is critical. You can’t simply take a “boil the ocean” approach and hand your entire enterprise data set to a machine learning platform and expect it to figure everything out. Instead, you need to reduce the data to baseline trends and incidents. This is where machine learning can be applied. It can mimic what you are doing and predict relationships you had not considered before. Finally, to be able to trust the system, you need to have consistent data validation at each stage of the pipeline.
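Consistent validation between stages can start as simply as the check below, which assumes each stage passes along `(timestamp, value)` records. It drops records with missing or NaN values or out-of-order timestamps, and halts the pipeline if the share of bad records exceeds a tolerance, so a broken upstream stage fails loudly instead of silently feeding a model. The function name, the `ValidationError` type, the stage name, and the 1% tolerance are assumptions for illustration.

```python
import math


class ValidationError(Exception):
    """Raised when a pipeline stage produces too much bad data."""


def validate_stage(stage, records, max_bad_fraction=0.01):
    """Validate (timestamp, value) records before promoting them to the next stage.

    Drops records with missing or NaN values or non-increasing timestamps, and
    raises ValidationError if the share of dropped records exceeds the tolerance.
    """
    clean, bad = [], 0
    last_ts = -math.inf
    for ts, value in records:
        if value is None or math.isnan(value) or ts <= last_ts:
            bad += 1
            continue
        clean.append((ts, value))
        last_ts = ts
    if records and bad / len(records) > max_bad_fraction:
        raise ValidationError(f"{stage}: {bad} of {len(records)} records failed validation")
    return clean


try:
    validate_stage("per_minute_aggregates", [(0, 1.0), (1, None), (2, float("nan")), (3, 4.0)])
except ValidationError as err:
    print(err)  # per_minute_aggregates: 2 of 4 records failed validation
```

Running the same check at every stage boundary means that when a model’s output looks wrong, you can rule out (or pinpoint) bad input data first.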
Within our organization, we’re intent on applying best practices, particularly in the following areas:
We wanted to start by using new technology to scout out new opportunities, and get practical experience with analytics techniques. Moving forward, we can then take these techniques and apply them to our broader environment, including both our legacy monolithic environments and our containerized workloads. In architecting our solutions, we wanted to ensure that the infrastructure that gets produced in small pilot applications is reusable for bigger implementations.
Our approach was to start by identifying an application that we could put together in the short term and that would yield metrics that are both analytic and strongly business focused, such as sentiment, and then to align those metrics with classical IT performance monitoring. After that, we could repeat the process, identifying other candidates in the application portfolio to apply these capabilities to.
It was also important to start with a small team on the cutting edge of technology, and get something done that demonstrates cost reductions. This was invaluable in gaining buy-in and acquiring the insights we needed to continue to expand and refine our analytics infrastructure.
Our chatbot was the first application we started with. This effort began as part of an IT transformation initiative, with the initial objective of using automation to realize cost savings. It is a great starting point because it is an application that evolves and improves with analytics and machine learning. The bulk of the platform is built with open source technologies, which gives us an opportunity to evaluate new features and rapidly prototype new functionality.
The chatbot generates a large volume of data. We monitor from a classical IT performance point of view, and we also leverage monitoring to continually refine capabilities that the chatbot presents. The early results from this implementation are promising. Approximately 60% of work identified as toil was automated. In addition, in the first sprint alone, about 3,200 manual tickets were eliminated.
Moving forward, we’ll continue to look to analytics and machine learning to help us manage our performance from an IT perspective. Further, the same set of information is also going to business teams, so analysts can refine the application and uncover opportunities to enhance and grow service capabilities. Now, these capability decisions aren’t being made based on gut instincts; they’re based on hard measurements, trends, and opportunities.
I joined a number of industry experts and practitioners at the BizOps Virtual Summit, where I gave an in-depth presentation on our IT transformation strategies and results. To learn more about the success we’re experiencing, visit the BizOps Virtual Summit resource center page, where you can access my complete presentation as well as those of a range of other industry experts and practitioners.