Successful Data Science Initiatives: 5 Key Building Blocks

    There’s not much doubt that the potential for data science is massive. What many teams are finding is that the hurdles to realizing this potential are also massive. In this post, I look at some of the key building blocks that teams need to start with in order to realize maximum success from their data science initiatives.

    1. Build the Right Team

    “Let’s build the right data science team.” It’s very easy to say, not nearly so easy to pull off. We all understand that having data science expertise is now critical to the success of artificial intelligence (AI) initiatives, and ultimately to the business’ long-term prospects.

    The problem is that this is true for your organization, for your competitors, and for pretty much every other company in every other industry. One survey found that 67% of organizations are expanding their data science teams, and, between 2015 and 2019, AI-related hiring grew 74%. While the demand is pressing and widespread, the supply of top talent remains scarce. The same survey reported a shortfall of 250,000 data science experts in 2020. The following strategies can help overcome these obstacles.

    Develop Internal Talent

    Particularly as organizations move workloads to external clouds and automation continues to grow more widespread, many organizations will be able to redeploy some of their engineering experts and assign them to top-level, strategic data science initiatives. When done right, this can be a true win/win, where organizations build up staffing to drive key initiatives forward, and staff can build up experience in an area that will undoubtedly represent a hot job market in the long term.

    These internal staff members have the advantage of understanding your business, and they have skills in IT operations, data management, and analytics that can serve as a strong foundation. Look for engineers on staff who have some relevant experience and, perhaps more importantly, a drive to learn about this area. Offer these team members opportunities for education, and give them assignments to start working with the data science staff on hand.

    Find and Hire the Right People

    While it may be difficult to hire top data science expertise, it can be done. Finding and hiring a candidate who checks every box on your list of criteria may not be realistic. Start by identifying the must-haves in order to cast a wider net.

    Make sure you’re harnessing the networks of existing data science staff to learn about the teams doing innovative work. Get engaged in forums, user groups, and technology communities to build connections with organizations and people. Finally, keep in mind that an optimal team will be one that has a diverse set of backgrounds and areas of expertise. By bringing together people with different strengths, backgrounds, and skills, you can build a team whose members complement one another and deliver strong results.

    2. Use the Right Data

    In plotting an effective data science initiative, it is important to start by assessing the data available. In this effort, it can be helpful to assess data across these categories:

    • Known data sets. These existing data sets effectively represent the data that’s being used today to guide decisions, plans, and strategies. At a basic level, this is the CRM data that’s used to forecast sales numbers, for example. While this data is already being used and is well understood, data science teams can often leverage this data in new ways.
    • Noisy data. This category of data represents the redundant, inaccurate, irrelevant, and trivial data that can conspire against data science teams. As outlined in my blog post on building data lakes, this is often the result of teams amassing data without first really homing in on the objectives of an initiative and the questions that need to be answered. Ultimately, the volume of data aggregated, rather than being an asset, starts to work against teams in a very fundamental way.
    • Hidden or “dark” data. In their Information Technology Glossary, Gartner defines dark data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.” This data can currently exist in a range of disparate locations and domains. As organizations continue to pursue their digital initiatives, the volume of data that falls into this category can proliferate dramatically. It is this class of data that holds massive potential for data science teams. Often, it is through leveraging a combination of known data and this hidden data that teams can unlock significant value.

    By assessing the data available, and understanding its potential and downsides, teams can start to ensure they’re using the optimal mix of information to power their initiatives.
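
    As a rough illustration of what a first pass at spotting noisy data might look like, the sketch below counts exact duplicate records and missing values in a small set of CRM-style records. The function name profile_records, the field names, and the sample records are all hypothetical, chosen only for this example; a real assessment would go much further and would also look at accuracy and relevance.

```python
from collections import Counter


def profile_records(records, required_fields):
    """Summarize two simple noise signals in a list of dict records.

    Hypothetical helper for illustration only. It assumes every value
    in a record is hashable (strings, numbers, None).
    """
    # Exact duplicates: records whose field/value pairs are identical.
    seen = Counter(tuple(sorted(r.items())) for r in records)
    duplicates = sum(count - 1 for count in seen.values())

    # Missing values: required fields that are absent, None, or blank.
    missing = {
        field: sum(1 for r in records if r.get(field) in (None, ""))
        for field in required_fields
    }
    return {"total": len(records), "duplicates": duplicates, "missing": missing}


# Made-up sample data: one repeated row, one blank region, one missing amount.
records = [
    {"account": "Acme", "region": "EMEA", "amount": 1200},
    {"account": "Acme", "region": "EMEA", "amount": 1200},
    {"account": "Globex", "region": "", "amount": 800},
    {"account": "Initech", "region": "APAC", "amount": None},
]

report = profile_records(records, ["account", "region", "amount"])
print(report)
```

    Even a crude summary like this can help a team decide whether a data set is ready to use or needs cleanup first.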

    3. Democratize Data

    As outlined above, the dearth of data science talent is a problem today, and it isn’t going to suddenly disappear any time soon. While the process will take time, ultimately, the long-term solution is to democratize data science. It is only by empowering teams from across the organization to harness AI that businesses will be prepared to navigate the challenges of the future.

    For some time now, analysts have been writing about the concept of the citizen data scientist. At a high level, this approach refers to capabilities and practices that allow users to extract insights from data without needing to be as skilled and technically sophisticated as expert data scientists. By establishing platforms that make it easy for non-data scientists, and even non-technologists, to access data, ask questions, and experiment with models, businesses will be able to open up an entirely unprecedented level of AI-powered innovation. To learn more, see our blog post on cultivating the citizen data scientist.

    4. Use Data Responsibly

    Depending on an organization’s geographic reach or its industry, there may be a number of external privacy mandates in place, including regulations like the EU’s General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), the Health Insurance Portability and Accountability Act (HIPAA), and the Payment Card Industry Data Security Standard (PCI DSS), to name but a few. Complying with these types of regulations isn’t anything new. However, what is new is the need to navigate compliance while moving forward with data science initiatives.

    With data science, teams gain a new level of power in terms of how data can be mined, and with this increasing power comes increasing responsibility. To start, teams must establish policies and guardrails that ensure data science initiatives don’t encroach on, or outright violate, these rules.
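
    One possible guardrail, sketched below purely as an illustration, is to replace sensitive fields with keyed hashes before records reach an analysis environment. The field names, the pseudonymize function, and the inline key are assumptions made for this example. Keyed hashing is a form of pseudonymization, not anonymization, and whether it satisfies any particular regulation is a question for legal and compliance teams.

```python
import hashlib
import hmac

# Hypothetical list of fields treated as sensitive in this example.
SENSITIVE_FIELDS = {"email", "card_number"}


def pseudonymize(record, secret_key, sensitive_fields=SENSITIVE_FIELDS):
    """Return a copy of record with sensitive values replaced by keyed hashes.

    The same input and key always produce the same token, so records can
    still be joined on the token without exposing the raw value.
    """
    cleaned = {}
    for field, value in record.items():
        if field in sensitive_fields and value is not None:
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            cleaned[field] = digest.hexdigest()
        else:
            cleaned[field] = value
    return cleaned


# Illustrative only: a real key would come from a secrets manager, not source code.
key = b"example-key-not-for-production"
customer = {"email": "pat@example.com", "plan": "premium", "card_number": None}

safe = pseudonymize(customer, key)
print(safe)
```

    Non-sensitive fields such as the plan pass through unchanged, so analysts keep the attributes they need while raw identifiers stay out of the working data set.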

    Perhaps even more fundamentally, data science teams need to ensure they’re aligning with corporate standards for ethics and operating with transparency. Beyond legal risks and the potential for fines for non-compliance, failing to operate in an ethical manner can expose a business to backlash, endangering the most valuable of corporate assets: the customer and their loyalty. If not mitigated, these risks can therefore outweigh any of the potential upsides of a data science initiative.

    5. Harness the Power of Packaged Data Science Platforms

    In recent years, a wide range of workloads and services have been moved to the cloud, and data science isn’t an exception. For many organizations, the agility and scalability of these services have made cloud deployments the go-to alternative for analytics and big data initiatives from early on.

    Cloud infrastructure is only the beginning, however. Now, cloud-based services like Amazon SageMaker and Microsoft Azure ML bring together a range of technologies that represent a complete data science platform, significantly streamlining the effort required to move from setup to AI-fueled intelligence.

    Moving forward, advancements in automation will also offer profound advantages. Historically, data science initiatives have meant a significant amount of labor, particularly in terms of data aggregation and cleanup. Automation will fuel significant improvements in these areas. In addition, it will give teams the ability to build and test models with a minimum of manual effort.

    Conclusion

    Today, many business leaders are looking to leverage data science, but they’re not where they want to be yet. By starting with an understanding of the core building blocks that are required, teams can set the stage for the realization of the breakthroughs AI and machine learning can provide. To learn more, be sure to view our blog post, “Putting Machine Learning Algorithms to Work.”