Six Steps to Better AI Projects

By Andy Morris, IBM Cognitive Infrastructure

August 9, 2019

According to data from 451 Research’s Voice of the Enterprise: AI & Machine Learning 2H 2018 survey, 20% of respondents have already implemented machine learning in the enterprise and an additional 20% have a machine-learning initiative in the proof-of-concept stage.

Many organizations building their AI practices run into problems along the way. At this stage, enterprises are often held back by a shortage of skilled staff, the complexity of deploying and maintaining models, and the inefficient allocation of limited compute resources. 451 Research identifies six technical elements that could help enterprises launch more successful AI trials:

Start with an infrastructure optimized for AI development

Look for a solution that bundles prominent open-source deep-learning frameworks with development and management tools, so that enterprise users can more easily build and scale machine-learning pipelines. Common software includes the Caffe, TensorFlow, PyTorch and Keras frameworks, along with complementary modules such as Large Model Support (LMS) and SnapML.

Give your large models breathing room

The limited internal bandwidth of many accelerated servers is a significant hurdle for data scientists training complex deep-learning models on large data sets. Solutions that connect the CPU directly to the GPUs can significantly speed up data transfers between GPU memory and system memory. This lets users tackle projects in which the model or the data set is significantly larger than the limited memory available on the GPUs, which can lead to more accurate models and shorter training times.
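
To make the memory argument concrete, here is a minimal sketch of the general idea behind tensor swapping: when a model's tensors need more memory than the GPU has, some are parked in host (system) memory and copied back when needed, so the speed of the CPU-GPU link sets the cost. This is not how IBM's LMS decides what to swap; the greedy largest-first policy, the `plan_swaps` and `swap_seconds` helpers, the tensor sizes, the 16 GB capacity and the two link bandwidths are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Tensor:
    name: str
    size_gb: float

def plan_swaps(tensors, gpu_capacity_gb):
    """Hypothetical greedy policy: offload the largest tensors to host
    memory until the remaining ones fit in GPU memory.
    Returns (names kept on the GPU, names swapped to host memory)."""
    used = sum(t.size_gb for t in tensors)
    swapped = []
    for t in sorted(tensors, key=lambda x: x.size_gb, reverse=True):
        if used <= gpu_capacity_gb:
            break
        swapped.append(t.name)
        used -= t.size_gb
    resident = [t.name for t in tensors if t.name not in swapped]
    return resident, swapped

def swap_seconds(swapped_gb, link_gb_per_s):
    """Time to copy the swapped tensors out to host memory and back once."""
    return 2 * swapped_gb / link_gb_per_s

# Hypothetical model whose tensors need 18 GB on a 16 GB GPU.
acts = [Tensor("act1", 6.0), Tensor("act2", 5.0),
        Tensor("act3", 4.0), Tensor("act4", 3.0)]
resident, swapped = plan_swaps(acts, gpu_capacity_gb=16.0)
print(resident, swapped)  # ['act2', 'act3', 'act4'] ['act1']

swapped_gb = sum(t.size_gb for t in acts if t.name in swapped)
for label, bw in [("slower link", 16.0), ("faster link", 64.0)]:
    print(label, swap_seconds(swapped_gb, bw), "s per iteration")
```

The model still trains even though it does not fit on the GPU; what changes with link bandwidth is the per-iteration transfer overhead (0.75 s versus 0.1875 s in this made-up case), which is why a faster CPU-GPU connection matters.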

Watch the time you spend on training models

Another significant barrier for machine-learning projects is the substantial time it takes to train models, which can slow development cycles and delay project timelines. A distributed, GPU-accelerated machine-learning library that supports logistic regression, linear regression and support vector machine models can speed up training. Together with preinstalled models, this makes the build-and-train cycle significantly more efficient.
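
The article does not describe how the library distributes its work. As an illustration of the general pattern only, the sketch below trains logistic regression with plain synchronous data-parallel gradient descent: each worker computes a gradient on its own shard of the data, the gradients are summed (the step an all-reduce performs across real GPUs), and every worker applies the same update. It is written in pure Python for readability; the function names, worker count, learning rate, epoch count and toy data are assumptions, not the library's solver.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def shard_gradient(w, b, xs, ys):
    """Summed log-loss gradient over one worker's shard of the data."""
    gw = [0.0] * len(w)
    gb = 0.0
    for x, y in zip(xs, ys):
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
        for j, xj in enumerate(x):
            gw[j] += err * xj
        gb += err
    return gw, gb

def train_data_parallel(X, Y, n_workers=4, lr=0.5, epochs=200):
    n, d = len(X), len(X[0])
    # Split rows round-robin across hypothetical workers.
    shards = [(X[k::n_workers], Y[k::n_workers]) for k in range(n_workers)]
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Each worker computes a local gradient; on a real cluster these
        # would run concurrently, one per GPU.
        grads = [shard_gradient(w, b, xs, ys) for xs, ys in shards]
        # "All-reduce": sum the local gradients, then average over all rows.
        gw = [sum(g[0][j] for g in grads) / n for j in range(d)]
        gb = sum(g[1] for g in grads) / n
        # Every worker applies the identical update, so models stay in sync.
        w = [wi - lr * gj for wi, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(w, b, x):
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

X = [[-2, -1], [-1, -2], [-1, -1], [-2, -2], [1, 2], [2, 1], [1, 1], [2, 2]]
Y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_data_parallel(X, Y)
print([predict(w, b, x) for x in X])  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Because the summed shard gradients equal the full-batch gradient, adding workers changes where the arithmetic happens, not the result; the speedup comes from running the shards in parallel.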

Maximize available compute capacity

Scaling jobs across compute nodes is another challenge at the leading edge of deep learning. Consider Elastic Distributed Training (EDT), a feature that lets users both distribute jobs across multiple compute nodes and elastically allocate GPU resources. The dynamic scaling EDT enables makes it easier for researchers to prioritize machine-learning jobs. A second scaling option is the Distributed Deep Learning (DDL) feature, which lets users spread a training job across multiple servers with minimal communication overhead.
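
The article does not describe the policy EDT uses to grow or shrink a job's GPU share. The hypothetical allocator below only illustrates what elastic allocation means: every running job keeps a guaranteed minimum, spare GPUs are split by priority-weighted fair share, and the whole split is recomputed when the job mix changes. The `Job` fields, the `allocate` function and the weighting rule are assumptions for illustration, and priorities must be positive.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    priority: int      # must be > 0; higher means more important
    min_gpus: int = 1
    max_gpus: int = 8

def allocate(jobs, total_gpus):
    """Hypothetical elastic allocator: returns {job name: GPU count}."""
    alloc = {j.name: 0 for j in jobs}
    free = total_gpus
    ordered = sorted(jobs, key=lambda j: j.priority, reverse=True)
    # 1. Guarantee minimums in priority order; jobs that don't fit get 0 (wait).
    for j in ordered:
        if j.min_gpus <= free:
            alloc[j.name] = j.min_gpus
            free -= j.min_gpus
    # 2. Hand out spare GPUs one at a time to the running job with the
    #    smallest allocation relative to its priority (ties go to the
    #    higher-priority job because `ordered` is sorted that way).
    while free > 0:
        candidates = [j for j in ordered if 0 < alloc[j.name] < j.max_gpus]
        if not candidates:
            break
        best = min(candidates, key=lambda j: alloc[j.name] / j.priority)
        alloc[best.name] += 1
        free -= 1
    return alloc

jobs = [Job("A", priority=3), Job("B", priority=1)]
print(allocate(jobs, 8))  # {'A': 6, 'B': 2}

# A new job arrives: the pool is re-split instead of making C wait.
jobs.append(Job("C", priority=2))
print(allocate(jobs, 8))  # {'A': 4, 'B': 1, 'C': 3}
```

Because allocation is recomputed rather than fixed at submission time, job A shrinks from six GPUs to four when C arrives, which is the kind of prioritization that dynamic scaling makes possible.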

Optimize hyperparameter tuning

Defining and tuning hyperparameters is often a time-consuming and tedious part of the machine-learning process. Look for a hyperparameter optimization feature that automates this process by building and comparing a series of models in parallel.
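
The article does not say which search strategy the hyperparameter optimization feature uses. Random search is one common choice, and the sketch below shows its shape: sample candidate configurations, evaluate them concurrently, keep the best. The `validation_loss` function is a hypothetical stand-in for "train a model and score it on held-out data", and the search ranges, trial count and parallelism are assumptions.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def validation_loss(params):
    """Hypothetical stand-in for training and scoring one model.
    Its minimum is at learning_rate=0.01, batch_size=64."""
    lr, bs = params["learning_rate"], params["batch_size"]
    return (math.log10(lr) + 2) ** 2 + ((bs - 64) / 64) ** 2

def sample_params(rng):
    return {
        "learning_rate": 10 ** rng.uniform(-4, -1),  # log-uniform sampling
        "batch_size": rng.choice([16, 32, 64, 128, 256]),
    }

def random_search(n_trials=32, n_parallel=4, seed=0):
    rng = random.Random(seed)
    trials = [sample_params(rng) for _ in range(n_trials)]
    # Evaluate up to n_parallel configurations at a time; map() returns
    # results in the same order as `trials`.
    with ThreadPoolExecutor(max_workers=n_parallel) as pool:
        losses = list(pool.map(validation_loss, trials))
    best_loss, best = min(zip(losses, trials), key=lambda t: t[0])
    return best, best_loss

best, best_loss = random_search()
print(best, best_loss)
```

Threads are used here only for simplicity: in practice each trial would launch a separate training job (often on its own GPU), and the threads would mostly wait on those jobs, so the Python GIL is not the bottleneck.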

Reduce bottlenecks in job scheduling

Finally, consider Watson Machine Learning Accelerator, which uses IBM Spectrum Conductor, a Spark-based machine-learning workload lifecycle manager and scheduler. It helps ensure that researchers use compute resources at their full capacity. IBM views Spectrum Conductor as a real differentiator as enterprises manage access to compute resources that are still scarce and expensive. IBM says it has found data scientists overcompensating by reserving more CPU or GPU resources than they need, and claims Spectrum Conductor can schedule jobs more intelligently based on the nature of each job, helping enterprises scale their machine-learning efforts.
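
Spectrum Conductor's scheduling logic is not described here. As a generic illustration of why smarter scheduling reduces idle GPUs, the sketch below compares a strict priority queue, where one job that doesn't fit blocks everything behind it, with simple backfilling, where smaller jobs may use GPUs that would otherwise sit idle. The `Request` type, the `schedule` function and the example jobs are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Request:
    name: str
    priority: int   # higher runs first
    gpus: int       # GPUs the job asks for

def schedule(queue, free_gpus, backfill=True):
    """Start queued jobs highest priority first (stable for ties).
    Without backfill, the first job that doesn't fit blocks all later jobs.
    With backfill, later jobs that fit may start on the leftover GPUs.
    Returns (started names, waiting names, GPUs still free)."""
    started, waiting = [], []
    blocked = False
    for req in sorted(queue, key=lambda r: r.priority, reverse=True):
        if not blocked and req.gpus <= free_gpus:
            started.append(req.name)
            free_gpus -= req.gpus
        else:
            waiting.append(req.name)
            if not backfill:
                blocked = True
    return started, waiting, free_gpus

jobs = [Request("train-big", 3, 6), Request("hpo", 2, 4),
        Request("notebook", 1, 1), Request("eval", 1, 2)]
print(schedule(jobs, 8, backfill=False))  # (['train-big'], ['hpo', 'notebook', 'eval'], 2)
print(schedule(jobs, 8))                  # (['train-big', 'notebook'], ['hpo', 'eval'], 1)
```

This naive backfill can starve large jobs that keep getting overtaken by small ones; production schedulers typically add reservations or time limits to prevent that.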

According to 451 Research, IBM Watson Machine Learning Accelerator offers enterprises a way to alleviate these problems and “supercharge” their AI practice.

“Watson Machine Learning Accelerator provides an extensive number of tools that address several of the common concerns of data scientists. By bringing together IBM technology and expertise across the stack, IBM has created a software offering that should accelerate enterprise machine learning.”

451 Research calls Watson Machine Learning Accelerator a solution for businesses looking to “apply machine learning at scale within their organizations.” Data scientists don’t have to learn proprietary tools to take advantage of the administrative, development and management tools that come with Watson Machine Learning Accelerator. The administrative and management side of the software should appeal to IT leaders because it meets security and compliance needs while maximizing the utilization of accelerated server assets.

Customers such as BP, Wells Fargo and Deep Zen are already taking advantage of Watson Machine Learning Accelerator today to drive business insight in their organizations. Read more about how they are using it and what makes Watson Machine Learning Accelerator different in 451 Research’s report on this exciting technology for AI.

HPCwire