Almost everyone is talking about artificial intelligence (AI). Some even believe that it will spark the next industrial revolution.1
But industries run on fuel, just as early steam locomotives ran on coal. Think about all the steps necessary to transform that fuel into useful work – finding the coal, mining and transporting it, refining it, loading it into the coal car behind the engine, and finally shoveling it into a boiler that burned it to produce steam that spun the wheels of the world.
Data is the fuel that will power AI-driven industries of the near future. And just like the fuels of previous industrial revolutions – coal, oil, uranium, falling water, even sunlight – data requires a deliberate process and many dedicated tools to transform it from noise into insight.
The process that prepares data to power industry is called the data pipeline. The first step in this pipeline is to gather data from all its available sources. This in itself can be a daunting task. Healthcare in the USA, for example, produces an enormous amount of data each year, from sources such as 1.2 billion clinical documents,2 40 million MRIs,3 and 80 million CT scans,4 among many others.
Next, data must be prepared. This is also not a trivial task. In fact, business analysts and knowledge workers may spend up to 80% of their time finding and preparing data – leaving only 20% for performing actual data analysis.5
AI applications are different from previous computer programs. They learn. This fundamental distinction introduces the next step in the AI data pipeline – model training. Once data is gathered and prepared, then carefully selected data sets must be fed into machine learning and/or deep learning models to “train” them to perform their desired functions.
Finally, when adequately trained, AI-driven applications are moved into production to generate value for their user communities.
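The pipeline stages described above – gather, prepare, train, and deploy for production – can be sketched end to end. The following is a deliberately tiny, hypothetical illustration: the "model" here is a one-parameter least-squares fit standing in for a real machine learning or deep learning model, and none of the function names come from IBM's or NVIDIA's software.

```python
# Hypothetical sketch of the four AI data pipeline stages:
# ingest -> prepare -> train -> infer. Not a real framework.

def ingest(sources):
    """Gather raw records from all available sources into one collection."""
    records = []
    for source in sources:
        records.extend(source)
    return records

def prepare(records):
    """Clean the data: drop incomplete records and normalize types."""
    return [(float(x), float(y))
            for x, y in records
            if x is not None and y is not None]

def train(dataset):
    """'Learn' a model y = w*x by ordinary least squares (no intercept)."""
    num = sum(x * y for x, y in dataset)
    den = sum(x * x for x, _ in dataset)
    return num / den  # the learned weight w

def infer(model, x):
    """Use the trained model in production to generate predictions."""
    return model * x

# Two "sources"; one record is incomplete and gets filtered out in prep.
sources = [[(1, 2), (2, 4)], [(3, 6), (None, 5)]]
model = train(prepare(ingest(sources)))
print(infer(model, 10))  # the clean data is perfectly linear, so w = 2.0
```

Real pipelines replace each of these toy functions with substantial infrastructure – ingest and preparation alone can dominate a project, as the 80/20 figure above suggests – but the stage boundaries are the same.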
For enterprises that intend to thrive in the era of AI, it’s crucial to understand that just as mining, transporting, and refining coal demand substantial effort and infrastructure investment, so too will effective and efficient AI data pipelines. To achieve acceptable levels of insight and accuracy, AI applications require access to immense amounts of training data and processing power.6 Such requirements can make the infrastructure transformation necessary to enable AI seem complex, high-risk, and costly.
Solution providers such as IBM and NVIDIA are stepping up to help make the preparation and execution of AI strategies for industry, research, and government faster, less risky, and more cost-efficient. One approach with a proven track record is converged infrastructure. These are essentially all-in-one solutions that integrate compute, networking, and storage in a single flexible platform intended to reduce design, deployment, and management overhead; lower costs; and simplify scalability. These solutions have steadily grown in popularity, with over half of enterprises planning to deploy them, while nearly a third already have.7
Converged infrastructure solutions reduce risk because all the elements have been pretested and validated. They speed the IT transformation process because in a single deployment an organization can move from outdated infrastructure to the latest technology. Converged solutions also offer the flexibility to mix and match components without worrying about compatibility or complex setup and configuration.
Converged infrastructure solutions can be designed to address specific application workloads and use cases. IBM and NVIDIA have done this with their new IBM Spectrum Storage for AI with NVIDIA DGX Systems offering. It’s an integrated compute and storage solution engineered to support the complete AI data pipeline lifecycle – from data ingest and preparation through training to inference – using the latest innovations of systems and software.
IBM Spectrum Storage for AI with NVIDIA DGX Systems provides the robust, ready-to-deploy infrastructure and software that AI projects need to ramp up quickly and grow confidently. Designed around NVIDIA DGX-1 and DGX-2 systems, IBM Spectrum Scale software-defined storage, and Mellanox networking, the converged solutions can be configured to start small and grow as organizations evolve. The NVIDIA DGX software stack includes access to the latest NVIDIA-optimized containers via the NGC container repository, plus the new RAPIDS framework to accelerate data science workflows.
IBM Spectrum Scale is deployed across NVMe-accelerated flash storage that has been tested to support 9 DGX-1 systems and 3 DGX-2 systems, delivering 40 GB/sec of data throughput per 2U enclosure, with linear scalability as more units are deployed. Multi-rack configurations are possible as well. The solution delivers AI workload performance comparable to that of local RAM. Spectrum Scale provides the flexibility to address storage requirements across the entire AI data pipeline – from ingest, through data classification, transformation, analytics, and model training, all the way to data archiving. It can also provide storage services across different storage choices, including IBM Cloud, AWS, IBM Cloud Object Storage, and tape, with shared metadata services provided by IBM Spectrum Discover. For more details about the IBM Spectrum Storage for AI with NVIDIA DGX Systems test results, read the reference architecture.
AI is not an option; it is a reality. Well-informed organizations realize that implementing effective AI-driven applications demands real commitment: IT infrastructure that can provide sufficient performance and throughput, with easily scalable systems designed to handle each stage of the AI data pipeline. You can use trial and error to get all the pieces right – or you can deploy purpose-engineered, validated solutions from vendors who are world leaders in AI technology. Once you have some experience, you may discover that the choice is an easy one to make.
1 World Economic Forum: The Fourth Industrial Revolution https://www.weforum.org/about/the-fourth-industrial-revolution-by-klaus-schwab
2 HIMSS: Health Story Project https://www.himss.org/sites/himssorg/files/FileDownloads/HIMSS%20Health%20Story%20Project_FactSheet.pdf
3 Forbes: Want Fries with That? A Brief History of Medical MRI, Starting with A McDonald’s, April 2018 https://www.forbes.com/sites/elliekincaid/2018/04/16/want-fries-with-that-a-brief-history-of-medical-mri-starting-with-a-mcdonalds/#6e64b0223de0
4 CBSNews: Report: Many medical imaging tests performed in U.S. are unnecessary, April 2015
5 InfoWorld: The 80/20 data science dilemma, September 2017 https://www.infoworld.com/article/3228245/the-80-20-data-science-dilemma.html
6 TechTarget Whatis definition: Deep Learning (https://searchenterpriseai.techtarget.com/definition/deep-learning-deep-neural-network)
7 ESG: Economic Value Validation: Quantifying the Value of VersaStack, a Converged Infrastructure Solution by IBM and Cisco, April 2016