Managing infrastructure complexity in the age of AI
When most of us hear the term autonomous vehicles, we conjure up images of driverless Waymos or robotic transport trucks driving long-haul highway routes. While widespread use of autonomous vehicles is still a few years off, the technology is already being deployed in applications ranging from autonomous mining equipment to forklifts to driverless Zambonis. At most automotive manufacturers, computer-aided engineering (CAE) HPC data centers are being tapped to help enable autonomous systems and other new AI applications. These data centers need to accommodate the new workloads while controlling costs and minimizing disruption to existing operations.
New applications for AI in vehicles
Among the many applications of AI in the auto industry, autonomous vehicles get most of the attention, but there are degrees of autonomous operation. In addition to designing cars that drive themselves, manufacturers are developing semi-autonomous features including automated braking, lane-change assist, driver eye tracking, and self-parking. These innovations not only make vehicles safer and easier to drive, but they also provide critical competitive market differentiation.
Read also: AI in Action – Autonomous Vehicles
Human drivers make decisions with relative ease. We know how to brake, steer, and adjust our driving based on weather, traffic, and potential hazards. In autonomous vehicles, these same decisions need to be made by predictive algorithms that fuse inputs from many sources including LIDAR, cameras, inertial sensors, and GPS to arrive at decisions with a high level of confidence. Vehicle control systems run a variety of predictive models, continuously evaluating inputs and inferring results many times per second.
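As a simplified illustration of how confidences from multiple sensors might be combined, the toy sketch below fuses independent per-sensor detection probabilities using log-odds (naive Bayes) fusion. The sensor names and probability values are illustrative assumptions, not drawn from any real vehicle stack:

```python
import math

def fuse_detections(probabilities):
    """Fuse independent sensor confidences via log-odds (naive Bayes).

    Each probability is the chance, per one sensor, that an object
    (e.g., a pedestrian) is present. Assumes sensor errors are independent.
    """
    log_odds = sum(math.log(p / (1.0 - p)) for p in probabilities)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical per-sensor confidences for the same detected object
camera, lidar, radar = 0.70, 0.80, 0.60
fused = fuse_detections([camera, lidar, radar])
# Agreement across sensors pushes the fused confidence above any single sensor's
```

The point of the sketch is that agreement between modalities raises overall confidence, which is why production systems fuse many inputs rather than trusting any single sensor.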
Diverse applications for AI and machine learning
Soon, fast 5G networks will make it practical to stream more data directly from vehicles, enabling new applications and revenue models. This will drive the need for even more data handling and storage capacity. Also, autonomous systems are just one of many applications for AI in automotive. Manufacturers are adopting machine learning for a variety of applications including:
- Predictive maintenance and condition monitoring – models that proactively identify failures both in the vehicle and in the manufacturing process
- Quality control – systems that visually inspect manufactured components to spot defects earlier, improve quality, and avoid costly post-sale service issues
- Manufacturing optimization – learning systems that adjust manufacturing processes in real time to maximize yield given sensor readings and various process-related constraints
- Business-oriented applications – warranty reserve estimation, demand forecasting, propensity-to-buy modeling, etc.
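To make the condition-monitoring idea above concrete, here is a deliberately minimal sketch that flags sensor readings deviating sharply from a rolling baseline. The window size, threshold, and vibration values are illustrative assumptions, not production parameters:

```python
from statistics import mean, stdev

def flag_anomalies(readings, window=5, k=3.0):
    """Flag readings more than k standard deviations from a rolling baseline."""
    flags = []
    for i in range(window, len(readings)):
        base = readings[i - window:i]
        mu, sigma = mean(base), stdev(base)
        flags.append(abs(readings[i] - mu) > k * sigma)
    return flags

# Hypothetical vibration readings from a machine-tool spindle;
# the spike at index 6 represents an incipient fault
vibration = [1.0, 1.1, 0.9, 1.0, 1.05, 1.02, 5.0, 1.0]
flags = flag_anomalies(vibration)
```

Real predictive-maintenance models learn far richer patterns than a threshold rule, but the workflow is the same: establish normal behavior from historical data, then flag deviations early enough to act.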
Diverse frameworks make multi-tenancy a must
Most CAE centers run large-scale workloads including structural analysis, computational fluid dynamics, and crash simulations. Engineering teams share software tools and HPC clusters relying on workload managers to prioritize jobs and allocate resources appropriately.
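At its core, a workload manager orders queued jobs by priority and submission time before dispatching them to cluster resources. The toy sketch below illustrates that ordering with a heap; it is a conceptual illustration only, not how IBM Spectrum LSF or any production scheduler is implemented:

```python
import heapq

class JobQueue:
    """Toy job queue: highest priority first, ties broken by submission order."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # monotonically increasing submission index

    def submit(self, name, priority):
        # heapq is a min-heap, so negate priority to pop highest first
        heapq.heappush(self._heap, (-priority, self._counter, name))
        self._counter += 1

    def next_job(self):
        return heapq.heappop(self._heap)[2]

q = JobQueue()
q.submit("crash_simulation", priority=10)
q.submit("cfd_sweep", priority=5)
q.submit("model_training", priority=10)
order = [q.next_job() for _ in range(3)]
# crash_simulation and model_training (equal priority) run before cfd_sweep
```

Production schedulers layer fair-share policies, resource matching, and preemption on top of this basic ordering, which is what lets many engineering teams share one cluster safely.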
From an infrastructure standpoint, the systems used for engineering simulation are nearly identical to those required for large-scale data analytics and AI model training. These solutions all involve clusters of powerful Linux-based systems, GPUs, fast interconnects, high-performance storage, and distributed software frameworks.
Rather than maintain separate infrastructure silos for HPC and AI workloads, it makes sense to consolidate them on the same infrastructure. Organizations benefit from economies of scale, simplified management, and greater flexibility, and they can realize better productivity by sharing resources across applications.
The back-end is where the action is
Building and training the models that power in-vehicle control systems takes an enormous amount of computing power. Training a single deep learning model to recognize objects such as pedestrians, signs, and other vehicles can take days on a large HPC cluster. Models also need to be continuously validated and retrained against new data sets.
Data management is an especially hard challenge. A vehicle with half a dozen cameras and multiple radar and LIDAR sensors can generate anywhere from ~1.4 TB to ~19 TB per hour by some estimates [1]. Much of this raw data needs to be collected, parsed, aggregated, and transformed into formats suitable for model training and validation.
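A quick back-of-the-envelope calculation shows why this matters. Taking the ~1.4 to ~19 TB/hour range above, and assuming a hypothetical fleet size and daily drive time (both values are illustrative, not from the cited source):

```python
# Per-vehicle generation rates from the cited ~1.4-19 TB/hour estimate [1]
TB_PER_HOUR_LOW, TB_PER_HOUR_HIGH = 1.4, 19.0

# Illustrative assumptions about a test program (hypothetical values)
hours_per_day = 8   # assumed test-drive hours per vehicle per day
fleet_size = 50     # assumed number of instrumented test vehicles

daily_low = TB_PER_HOUR_LOW * hours_per_day * fleet_size    # 560 TB/day
daily_high = TB_PER_HOUR_HIGH * hours_per_day * fleet_size  # 7,600 TB (7.6 PB)/day
```

Even at the low end, a modest test fleet produces hundreds of terabytes per day, which is why ingest, storage, and transformation pipelines dominate autonomous-driving infrastructure planning.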
The more data we can gather and analyze, the better the predictive model and the safer the vehicle. Data handling at this scale requires fast in-memory distributed frameworks such as Spark along with HPC scale file and object storage.
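Conceptually, the parse-and-transform step looks like the plain-Python sketch below; at production scale this same logic would run as a distributed Spark job over high-performance file or object storage. The record fields, sensor names, and filtering rule are illustrative assumptions:

```python
# Hypothetical raw ingest records, one per captured sensor frame
raw_records = [
    {"sensor": "camera", "t": 0.0, "blob_mb": 12.0},
    {"sensor": "lidar",  "t": 0.0, "blob_mb": 3.5},
    {"sensor": "camera", "t": 0.1, "blob_mb": 12.0},
]

def to_training_rows(records, keep_sensor="camera"):
    """Filter one modality and project only the fields a trainer needs."""
    return [
        {"timestamp": r["t"], "size_mb": r["blob_mb"]}
        for r in records
        if r["sensor"] == keep_sensor
    ]

rows = to_training_rows(raw_records)
```

In a real pipeline this filter/project step would be one stage among many (decoding, labeling, sharding), but expressing each stage as a pure transformation is exactly what makes it easy to parallelize with a framework like Spark.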
Read also: High-performance Spark – Improving on Big Data’s Swiss Army Knife to Enable AI
Bringing advanced AI tools to the HPC data center
IBM Watson Machine Learning Accelerator (MLA) is an AI deep learning environment at home in both HPC data centers and the cloud. It runs on x86 or IBM Power servers and helps automate data preparation and model development. Watson MLA also provides a complete run-time environment for training, deployment, and ongoing management of machine learning and deep learning models. It is pre-integrated with popular AI frameworks and tools including TensorFlow, Caffe, Spark, PyTorch, and scikit-learn for simplified deployment.
With Watson MLA, the AI environment becomes dynamic and multitenant, enabling accelerated model development for multiple data science teams. It supports automated hyperparameter optimization, AutoML, and elastic training, re-allocating resources including GPUs between long-running models at runtime. It also supports IBM's Snap ML, which IBM reports can run machine learning models up to 46x faster than competing solutions.
Watson MLA is built on the same IBM Spectrum LSF foundation used for engineering simulation by leading automotive and aerospace firms. This means that customers can deploy a single shared environment that supports their full range of workloads from HPC simulation to AI model training and deployment. Also, since the foundation is open, users can easily add other distributed software frameworks and avoid the need for separate application-specific clusters.
The race to bring practical AI solutions to market is having a profound impact on CAE and HPC data centers. For firms deploying applications for autonomous systems and AI, IBM Watson Machine Learning Accelerator and IBM Spectrum Computing provide a unique opportunity to consolidate AI and engineering simulation environments for greater efficiency and flexibility.
References:
[1] Based on data from Waymo test vehicles – https://www.tuxera.com/blog/autonomous-and-adas-test-cars-produce-over-11-tb-of-data-per-day/