AWS customer Descartes Labs uses HPC to understand the world and to handle the flood of data that comes from sensors on the ground, in the water, and in space. The company has been cloud-based from the start, and focuses on geospatial applications that often involve petabytes of data.
CTO & Co-Founder Mike Warren told me that their intent is to never be limited by compute power. In the early days of his career, Mike worked on simulations of the universe and built multiple clusters and supercomputers including Loki, Avalon, and Space Simulator. Mike was one of the first to build clusters from commodity hardware, and has learned a lot along the way.
After retiring from Los Alamos National Lab, Mike co-founded Descartes Labs. In 2019, Descartes Labs used AWS to power a TOP500 run that delivered 1.93 PFLOPS, landing at position 136 on the TOP500 list for June 2019. That run made use of 41,472 cores on a cluster of C5 instances. Notably, Mike told me that they launched this run without any help from or coordination with the EC2 team (because Descartes Labs routinely runs production jobs of this magnitude for their customers, their account already had sufficiently high service quotas). To learn more about this run, read Thunder from the Cloud: 40,000 Cores Running in Concert on AWS. This is my favorite part of that story:
“We were granted access to a group of nodes in the AWS US-East 1 region for approximately $5,000 charged to the company credit card. The potential for democratization of HPC was palpable since the cost to run custom hardware at that speed is probably closer to $20 to $30 million. Not to mention a 6–12 month wait time.”
After the success of this run, Mike and his team decided to work on an even more substantial one for 2021, with a target of 7.5 PFLOPS. Working with the EC2 team, they obtained an EC2 On-Demand Capacity Reservation for a 48-hour period in early June. After some “small” runs that used just 1,024 instances at a time, they were ready to take their shot. They launched 4,096 EC2 instances (C5, C5d, R5, R5d, M5, and M5d) with a total of 172,692 cores. Here are the results:
- Rmax – 9.95 PFLOPS. This is the actual performance that was achieved: almost 10 quadrillion floating point operations per second.
- Rpeak – 15.11 PFLOPS. This is the theoretical peak performance.
- HPL Efficiency – 65.87%. The ratio of Rmax to Rpeak, or a measure of how well the hardware is utilized.
- N – 7,864,320. This is the size of the matrix that is inverted to perform the TOP500 benchmark. N² is about 61.85 trillion.
- P x Q – 64 x 128. This is a parameter for the run, and represents a processing grid.
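The arithmetic behind these figures is easy to check. The short Python sketch below recomputes the efficiency ratio, the number of matrix elements, and the size of the processing grid from the rounded values in the list. The function names are mine, chosen for illustration, and the 8 bytes per element is an assumption based on the benchmark using double-precision arithmetic. Because the inputs are rounded to two decimal places, the recomputed efficiency comes out at roughly 65.85%, a hair below the reported 65.87%, which was presumably derived from unrounded measurements.

```python
# Figures quoted in the results list (rounded as published).
RMAX_PFLOPS = 9.95     # achieved performance
RPEAK_PFLOPS = 15.11   # theoretical peak performance
N = 7_864_320          # order of the N x N benchmark matrix
P, Q = 64, 128         # processing grid dimensions

# Assumption: each matrix element is a 64-bit (8-byte) floating point value.
BYTES_PER_ELEMENT = 8


def hpl_efficiency(rmax: float, rpeak: float) -> float:
    """Return the ratio of achieved to theoretical peak performance."""
    return rmax / rpeak


def matrix_elements(n: int) -> int:
    """Return the number of elements in an n x n matrix."""
    return n * n


def matrix_bytes(n: int, bytes_per_element: int = BYTES_PER_ELEMENT) -> int:
    """Return the storage needed for the full n x n matrix, in bytes."""
    return matrix_elements(n) * bytes_per_element


def grid_processes(p: int, q: int) -> int:
    """Return the number of processes in a p x q processing grid."""
    return p * q


if __name__ == "__main__":
    print(f"Efficiency (from rounded figures): "
          f"{hpl_efficiency(RMAX_PFLOPS, RPEAK_PFLOPS):.2%}")
    print(f"Matrix elements (N squared): {matrix_elements(N):,}")
    print(f"Matrix storage: {matrix_bytes(N) / 1e12:.1f} TB")
    print(f"Processes in the grid: {grid_processes(P, Q):,}")
```

Under that 8-byte assumption, the matrix alone occupies close to 495 TB, which gives a sense of why the run was spread across thousands of instances.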
Read the full blog to learn how Descartes Labs built one of the world’s most powerful supercomputers on AWS and took it down in only 24 minutes.