Five Steps to Building a Data Strategy for AI

By Linton Ward, Distinguished Engineer, IBM

December 4, 2018

Our data-centric world is driving many organizations to apply advanced analytics that use artificial intelligence (AI). AI provides intelligent answers to challenging business questions. AI also enables highly personalized user experiences, built when data scientists and analysts uncover information in data that would otherwise go undetected with traditional analytics methods.

AI-driven analytics delve more deeply into organizational data, deriving smarter insights that can give businesses a powerful competitive edge.

Applying critical thinking to AI analytics

A well-considered data strategy is essential from the start. When organizations identify a business problem to be solved—and the decisions to be supported by the analytics—they reach the point where they need to think critically about the data required to solve that problem. Here’s a five-step process for helping ensure a successful AI analytics project.

1. Make a plan

Many enterprises struggle with data silos that make a unified view of analytical data difficult to achieve. Achieve clarity on the goals of your analytics project first. Then, identify potential data sources across the enterprise. Integrating this data may require a data lake in addition to conventional enterprise data warehouses.

For example, relational databases include a wealth of structured, quantitative data. Quantitative data is useful for answering questions such as how many units were sold, when, and with what other products. However, structured data is much less useful for questions such as which product might sell well with another or which new line of business to pursue. Augmenting structured data is necessary to answer these kinds of soft, strategic questions.
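
To make the distinction concrete, here is a minimal sketch of the kind of quantitative question structured data answers well. The table, column names, and values are purely illustrative, not any particular enterprise schema.

```python
# Illustrative only: a tiny set of order line items standing in for a relational table.
import pandas as pd

orders = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 3, 3],
    "product":  ["router", "cable", "router", "switch", "cable", "router"],
    "units":    [1, 2, 1, 4, 1, 2],
    "sold_at":  pd.to_datetime([
        "2018-11-01", "2018-11-01", "2018-11-03",
        "2018-11-07", "2018-11-07", "2018-11-07",
    ]),
})

# How many units were sold, and when? Aggregate by product and month.
units_by_month = orders.groupby(
    ["product", orders["sold_at"].dt.to_period("M")]
)["units"].sum()
print(units_by_month)

# With what other products? Count how often two products appear in the same order.
pairs = orders.merge(orders, on="order_id")
pairs = pairs[pairs["product_x"] < pairs["product_y"]]
print(pairs.groupby(["product_x", "product_y"]).size())
```

A question like "which new line of business should we pursue?" has no equivalent query; it depends on the qualitative data discussed in the next step.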

2. Bring together a diversity of data

The data required to answer strategic questions is often qualitative in nature. Qualitative data generally comes from unstructured sources, such as text documents or notes, external website content, social media posts, and images. Organizations need to determine how they can use such data to get additional value.

One example involves the Internet of Things. Organizations with sensor data streaming in from smart devices, for instance, might augment the quantitative data with engineering notes or other types of softer data to improve predictions of machine reliability and repairs.
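
As an illustration of what that augmentation can look like in practice, the sketch below pairs structured sensor readings with an unstructured technician note by timestamp. The device, fields, and note are hypothetical; this is not a specific IBM product API.

```python
# Illustrative only: attach the most recent engineering note to later sensor readings,
# so a reliability model can see both the vibration trend and what a technician observed.
import pandas as pd

readings = pd.DataFrame({
    "device_id": ["pump-7"] * 4,
    "timestamp": pd.to_datetime([
        "2018-11-20 08:00", "2018-11-20 09:00",
        "2018-11-20 10:00", "2018-11-20 11:00",
    ]),
    "vibration_mm_s": [2.1, 2.3, 4.8, 5.2],
})

notes = pd.DataFrame({
    "device_id": ["pump-7"],
    "timestamp": pd.to_datetime(["2018-11-20 09:30"]),
    "note": ["Bearing noise reported during inspection; lubrication scheduled."],
})

# For each reading, pick up the latest note (if any) recorded before it for the same device.
enriched = pd.merge_asof(
    readings.sort_values("timestamp"),
    notes.sort_values("timestamp"),
    on="timestamp",
    by="device_id",
    direction="backward",
)
print(enriched[["timestamp", "vibration_mm_s", "note"]])
```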

3. Define the data architecture

Organizations that have undergone mergers and acquisitions or have diverse lines of business often have many diverse data sets—including different views of the same data. This situation raises several questions: Who owns the data? What is the best version to use? What is the right data architecture?

To address these questions, organizations need to move beyond database administration to architect data across diverse sources. The data sources must then be integrated in a meaningful way. While the initial prototyping for an analytics project might be ad hoc, a repeatable data architecture and data flow is necessary for long-term success.

A repeatable data flow can pull in data from various endpoints, ranging from operational business processes to mobile devices and sensors. Enterprises may want to work with companies that offer the tools and expertise required for defining the data architecture.
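
The sketch below is a minimal, tool-agnostic illustration of what "repeatable" means here: each source is extracted by a named step, transformed the same way on every run, and landed in a common target. The endpoints and functions are hypothetical; in production this role is typically filled by an integration or orchestration platform.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class PipelineStep:
    name: str
    extract: Callable[[], list]        # pulls raw records from one endpoint
    transform: Callable[[list], list]  # applies the same cleanup on every run

def run_pipeline(steps: List[PipelineStep], load: Callable[[str, list], None]) -> None:
    """Run every step in order and land the results in a common target."""
    for step in steps:
        raw = step.extract()
        clean = step.transform(raw)
        load(step.name, clean)

# Hypothetical endpoints: an operational database, a mobile event feed, a sensor gateway.
steps = [
    PipelineStep("orders_db", lambda: [{"sku": "A1", "qty": 3}], lambda rows: rows),
    PipelineStep("mobile_events", lambda: [{"user": "u42", "action": "view"}], lambda rows: rows),
    PipelineStep("sensor_gateway", lambda: [{"device": "pump-7", "temp_c": 61.5}], lambda rows: rows),
]

run_pipeline(steps, load=lambda name, rows: print(f"loaded {len(rows)} rows from {name}"))
```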

4. Establish data governance

Another key consideration is data governance, which helps ensure that information culled from diverse sources is trustworthy, particularly for organizations in regulated industries. Along with protecting security and privacy, maintaining visibility into the data supply chain is critical: organizations need to know where the data came from in order to validate it. Credible analytics models require the ability to detect and track down any issues in the data pipeline.

Emerging technologies are bringing stronger data governance to advanced analytics. For example, Hortonworks, IBM, and others participate in the open source Apache Atlas project, whose mission is to bring data governance to data lake technology.
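
The sketch below is not the Apache Atlas API; it is a simplified illustration of the kind of lineage record such governance tools maintain, which is what lets an analyst trace a data set back to its sources when something looks wrong.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class LineageRecord:
    dataset: str            # the derived data set being described
    sources: List[str]      # upstream systems or tables it was built from
    transformation: str     # the job or script that produced it
    produced_at: datetime = field(default_factory=datetime.utcnow)

record = LineageRecord(
    dataset="analytics.customer_360",
    sources=["crm.accounts", "web.clickstream", "support.tickets"],
    transformation="nightly_customer_merge_v3",
)

# When a model misbehaves, the record answers "where did this data come from?"
print(f"{record.dataset} built from {', '.join(record.sources)} "
      f"by {record.transformation} at {record.produced_at:%Y-%m-%d %H:%M}")
```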

5. Maintain a safe data pipeline

Establishing policies and procedures that allow data to flow continuously into the analytics pipeline lets enterprises make the most of AI analytics. A vital step is to build security and privacy into both the design of the infrastructure and the software used to deliver this capability across the organization.
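
One concrete way to build privacy into the pipeline itself, sketched below under illustrative field names, is to pseudonymize personally identifiable fields before records ever reach the analytics environment. Real deployments would pair this with encryption, access control, and auditing.

```python
import hashlib

# Illustrative: which fields count as personally identifiable in this sketch.
PII_FIELDS = {"email", "phone"}

def pseudonymize(record: dict, salt: str) -> dict:
    """Replace PII values with salted hashes so records can still be joined but not read."""
    masked = {}
    for key, value in record.items():
        if key in PII_FIELDS:
            masked[key] = hashlib.sha256((salt + str(value)).encode()).hexdigest()[:16]
        else:
            masked[key] = value
    return masked

incoming = {"customer_id": 1001, "email": "jane@example.com", "phone": "555-0100", "spend": 249.90}
print(pseudonymize(incoming, salt="rotate-me-regularly"))
```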

Gaining competitive advantage through AI

Data is one of the most valuable assets in any organization and can yield a unique competitive advantage when coupled with the power of AI. By following the steps outlined here, organizations can identify, collect, integrate and manage the data that is essential to AI-driven analytics. Learn about an optimized IBM AI infrastructure reference architecture for advanced analytics enhanced with AI in your organization.

 
