Some of today’s most popular consumer technology devices were born at Amazon Lab126. The California-based research and development organization has created such high-profile devices as the Amazon Kindle e-reader and the Amazon Echo smart speaker.
Amazon Lab126 Devices teams use high-performance computing (HPC) capacity and machine learning to scale their design environments, which accelerates product development, improves efficiency, and shortens time-to-market. However, the organization's aging, costly on-premises HPC environment could not deliver the scalability and ease of use the teams required. “We run large simulations with long runtimes, such as looking at mechanical and thermal responses of consumer devices under certain conditions,” says Shankar Ganapathysubramanian, senior manager of the architecture team at Amazon Lab126. “We needed more compute capacity to support these workloads.” Amit Gaikwad, senior manager of wireless engineering for Amazon Lab126, adds, “We were architecting and building more customer-facing solutions, and the on-premises HPC environment didn’t give us the scalability and speed we needed.”
In 2018, Amazon Lab126 built a flexible HPC reference framework on AWS. The framework replaced the on-premises HPC solution and provides a multi-user R&D environment for scale-out workloads such as HPC and machine learning. It combines compute-intensive Amazon Elastic Compute Cloud (Amazon EC2) instances with a fast network backbone, virtually unlimited storage, and budget and cost management in a single, simpler environment. For data storage, it relies on Amazon Elastic Block Store (Amazon EBS) and Amazon Elastic File System (Amazon EFS). Amazon Lab126 also uses Amazon FSx for Lustre for the most I/O-intensive workloads and AWS Backup to make the cluster more fault-resilient. Crozes says, “AWS Backup was the perfect solution for automating the protection of the production environment. It would have taken us many iterations to create a solution like that, which protects all the teams’ data, manages retention/lifecycle, and is simple to use.”
Running HPC Jobs Three Times Faster and Driving Product Innovation
Amazon Lab126 product designers and engineers have seen performance gains on the new HPC cluster. For example, the wireless device connectivity team shortened cycle times for structural drop simulations, which study how cell phones behave when they hit the ground or another surface. “We saw a threefold increase in speed for our entire design cycle by using a scale-out computing HPC framework on AWS,” says Ganapathysubramanian. “We can run more simulations now because it’s easier to parallelize the workloads. Using the on-premises HPC solution, it would often take weeks to generate data. Now we can do it in hours.”
With the new framework on AWS, Amazon Devices designers and engineers can scale on demand to meet the requirements of specific workloads. “We have very large runtimes that require a lot of compute just to analyze wireless connectivity data,” says Gaikwad. “Using this solution, our engineers across the globe can scale the solution three times faster than before. And they can scale down just as easily, so if they don’t need 100 GPUs for a job, they don’t have to use them.”
Because of the scalability and simplicity of the AWS-based HPC environment, Amazon Devices teams are spending less time on hardware management and more time on innovation.
Amazon Lab126 is now entering the next phase of its HPC solution, powered by the scale-out computing framework on AWS. “We will continue to address the needs of our customers,” says Jake Boswell, senior manager of design technology for Amazon Lab126. “We’re looking to make the reference architecture even simpler and to extend the framework into additional areas to support innovation.”