Autonomous vehicles will transform our daily lives and our communities. What seemed like science fiction a decade ago is now visible on the road, as test vehicles gather data, tune sensors and develop the artificial intelligence (AI) that will make cars self-driving and safer. Every major auto company, their suppliers and startups across the globe are applying the latest technology in an arms race toward a future where cars drive themselves.
It isn’t enough for the vehicle to navigate itself; it must also be prepared for the unexpected. There can’t be explicit instructions for every pedestrian, careless driver, obstruction or fault. Instead, the industry is using AI frameworks to build vehicle controls that recognize and react. Drawing on data from dozens of sensors, the controls distinguish a stationary rock from an animal crossing the street, and quickly steer, brake or accelerate to avoid a collision. To do this faster and more safely than a human driver, they use machine learning and deep learning to, in effect, “teach” systems to classify, measure and react to unanticipated scenarios based upon previous data. Every run of every vehicle adds more information, and that feedback improves the system.
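As a rough illustration of that sense-classify-react pipeline, here is a minimal Python sketch. The classifier and its inputs are hypothetical stand-ins for a trained deep-learning model and fused sensor data:

```python
# Minimal sketch of the sense -> classify -> react loop described above.
# The classifier below is a hypothetical stand-in for a trained model
# operating on fused lidar/radar/camera input.

def classify(sensor_frame):
    """Stand-in for a trained deep-learning classifier."""
    if sensor_frame["moving"] and sensor_frame["in_path"]:
        return "animal_crossing"
    if sensor_frame["in_path"]:
        return "stationary_obstacle"
    return "clear"

def react(label):
    """Map a classification to a control action."""
    return {
        "animal_crossing": "brake",
        "stationary_obstacle": "steer_around",
        "clear": "maintain_speed",
    }[label]

frame = {"moving": True, "in_path": True}  # e.g. an animal entering the lane
assert react(classify(frame)) == "brake"
```

In a real system the classification step is a neural network and the reaction step is a planning and control stack, but the separation of recognition from response is the same.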
As more vehicles travel, more data is captured. In fact, a lot more. Between lidar, radar, cameras, GPS, internal systems and mechanical sensors, industry experts estimate an autonomous vehicle can generate close to 10TB per hour. This data comes in many forms, but what is critical is that all of it is properly identified as coming from that one car at that one time, and kept in proper order so that the inputs, system recognition and reaction can be reconstructed, allowing humans to evaluate whether the vehicle performed optimally.
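That 10TB-per-hour figure compounds quickly. A back-of-the-envelope calculation shows why (the eight-hour drive day and 50-car fleet are assumptions for illustration only):

```python
# Rough fleet-level data volume, based on the ~10TB/hour estimate above.
# Drive hours and fleet size are illustrative assumptions.
TB_PER_HOUR = 10
DRIVE_HOURS_PER_DAY = 8
FLEET_SIZE = 50

per_car_day = TB_PER_HOUR * DRIVE_HOURS_PER_DAY   # 80 TB per car per day
fleet_day = per_car_day * FLEET_SIZE              # 4,000 TB (~4 PB) per day
print(per_car_day, fleet_day)
```

Even a modest test fleet lands in petabytes per day, which is why the storage architecture matters as much as the AI itself.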
Evaluating the effectiveness of the AI programs that drive the car requires a continuous process of testing. The current program is run against the latest version across many series of scenarios. This is a virtuous cycle of champion and challenger, repeated over a growing volume of data and unique test cases to improve the AI and the systems behind it. To support this testing, high-speed scalable storage feeds clusters of servers, each with multiple GPUs programmed to quickly crunch through the data. Scalability in a balanced system with sufficient data bandwidth, fast storage, networks and servers is critical to the team’s productivity.
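The champion-and-challenger cycle can be sketched in a few lines. The scoring function and toy models below are hypothetical placeholders for real scenario replays:

```python
# Hypothetical champion/challenger loop: the incumbent program is only
# replaced when the challenger scores better on the same scenarios.

def evaluate(model, scenarios):
    """Fraction of scenarios the model handles correctly (stubbed scorer)."""
    return sum(model(s) for s in scenarios) / len(scenarios)

def champion_challenger(champion, challenger, scenarios):
    if evaluate(challenger, scenarios) > evaluate(champion, scenarios):
        return challenger   # challenger is promoted to champion
    return champion         # champion retains its place

# Toy models: each returns True (pass) or False (fail) per scenario.
old_version = lambda s: s < 7    # handles 7 of 10 scenarios
new_version = lambda s: s < 9    # handles 9 of 10 scenarios
scenarios = list(range(10))

best = champion_challenger(old_version, new_version, scenarios)
assert best is new_version
```

Each cycle replays a growing library of scenarios, so the test set itself improves along with the models it evaluates.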
Managing the data flow
Properly managing this data is not a trivial task. In early research, data is stored within the test vehicle; cables are attached after testing, and high-speed dedicated networks move the terabytes of data to high-speed storage for computational analysis and human review. As development matures, more cars are on the road, which increases transfer complexity and shifts transfers to wireless networks. Data value will depend heavily on the metadata (the data about the data) that identifies the vehicle, sensor, time and location where that data was gathered. With proper metadata management, developers will be able to quickly assemble specialized training and testing sets to help develop a feature or test a particular situation.
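A sketch of that metadata-driven set building, with hypothetical metadata fields (vehicle, sensor, time of day, location) and paths, might look like this:

```python
# Hypothetical metadata catalog: each record points at a raw sensor file.
# Field names and paths are illustrative, not a real schema.
records = [
    {"vehicle": "car-12", "sensor": "lidar", "time_of_day": "night",
     "location": "intersection", "path": "s3://bucket/a.bin"},
    {"vehicle": "car-07", "sensor": "camera", "time_of_day": "day",
     "location": "highway", "path": "s3://bucket/b.bin"},
]

# Build a specialized training set: every lidar capture taken at night
# near an intersection, selected purely from metadata.
training_set = [
    r["path"] for r in records
    if r["sensor"] == "lidar"
    and r["time_of_day"] == "night"
    and r["location"] == "intersection"
]
assert training_set == ["s3://bucket/a.bin"]
```

The key point is that the raw sensor files never need to be scanned; the metadata alone identifies which data belongs in the set.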
Long-term data retention and rapid retrieval is a third challenge for autonomous vehicles. Development requires a large amount of data, and each new car on the road brings the potential for massive data growth. Every vehicle, every near-miss and every accident that has ever happened in a modern, instrumented vehicle is raw data for improving the autonomous vehicle’s AI systems. As improved systems and software are developed, automakers will need to demonstrate that previous incidents will be avoided in the future. Similarly, should human error or some other unique circumstance result in an incident, automakers will need a library of similar situations that ended well.
As we look at the data requirements for the autonomous vehicle, three distinct data patterns emerge.
- The high-speed and scalable data required for continuous development.
- Rapid data identification, tracking, and access using metadata to manage the complexity.
- Long-term storage that delivers low cost and high reliability, but can handle the rapid retrieval of random data.
Only IBM has the portfolio of solutions that can support these requirements: the leading scalable filesystem, IBM Spectrum Scale, to drive fast compute such as that delivered by NVIDIA DGX servers; the leading object storage, IBM Cloud Object Storage, which can grow efficiently and works with analytics packages such as Spark to rapidly identify and classify data; and the metadata engines to automate and track data movement between them as needed.