Overview
Climate technology leader BlocPower wanted to build a powerful, cost-effective data processing pipeline so that it could process over 100 million building energy profiles and better understand how to optimize energy efficiency across the United States. The company sources its energy profiles using EnergyPlus, the US Department of Energy’s open-source whole-building modeling engine. BlocPower needed to adopt a set of high performance computing (HPC) solutions that would be compatible with the EnergyPlus C++ software development kit.
BlocPower turned to Amazon Web Services (AWS) and adopted several highly efficient, cloud-based data-processing solutions that balance performance and cost. Within 3 months, the company finished building its entire data pipeline in the cloud, which empowered it to deploy BlocMaps, a software-as-a-service (SaaS) solution that provides actionable insights for decarbonizing buildings to property owners, utility companies, municipalities, states, and other groups. By working on AWS, BlocPower has processed over 30 TB of data at speeds 16,000 times greater than it could previously, helping it use data-driven insights to promote environmental justice and equitable housing in underserved communities.
Opportunity | Searching for Cost-Effective HPC
BlocPower aims to make buildings in America smarter, greener, and healthier. The company is committed to fostering diversity, equity, and inclusion, and its workforce consists of 60 percent minorities and 30 percent women. BlocPower has helped thousands of low- and moderate-income building owners, tenants, and building managers in 24 cities across New York State, California, Wisconsin, and Massachusetts understand the possibilities of energy efficiency and renewable energy retrofits of their buildings. Additionally, BlocPower has successfully implemented electrification, solar, and other energy efficiency measures in over 1,200 buildings as of 2022.
BlocPower believes that the United States needs to electrify to reduce risks from climate change. To accelerate electrification projects and engineer practical energy solutions, BlocPower collects data from over 100 million buildings from external sources, such as the Department of Energy’s National Laboratories. These laboratories store their data using intermediate data format files, which require BlocPower to use EnergyPlus to process and render simulations of individual buildings. “There are over 130 million buildings in the United States that account for about 30 percent of our carbon emissions,” says Ankur Garg, director of data architecture and analytics at BlocPower. “However, most of the data from these buildings is not compiled in a clean way so that we can run analytics on top of it.”
BlocPower sought to build a data processing pipeline that uses HPC to run intermediate data format files through EnergyPlus, extract the necessary data, and scale to support massive parallel processing. Because the company has been cloud native to AWS since 2016, it turned to AWS to find scalable compute and data processing solutions that would work alongside the C++ software development kit. BlocPower learned about AWS Batch, which provides fully managed batch processing at virtually any scale. “To process data on premises, it would’ve cost us potentially millions of dollars,” says Garg. “We can scale to process our data using AWS Batch for a few hundred dollars every month.”
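As a rough illustration of this kind of pipeline, the sketch below builds the parameters for queuing one per-building EnergyPlus simulation on AWS Batch with boto3. The queue name, job definition name, and S3 key layout are invented for illustration, not BlocPower’s actual resources.

```python
# Hypothetical sketch: queuing one EnergyPlus run per building on AWS Batch.
# All resource names here are assumptions, not BlocPower's configuration.

def build_energyplus_job(building_id: str, idf_s3_key: str) -> dict:
    """Build the parameter dict for a boto3 batch.submit_job call."""
    return {
        "jobName": f"energyplus-{building_id}",
        "jobQueue": "energyplus-hpc-queue",   # assumed job queue name
        "jobDefinition": "energyplus-sim:1",  # assumed job definition
        "containerOverrides": {
            # Tell the container which input file to fetch; the container
            # entrypoint would download it from S3 and run EnergyPlus.
            "environment": [
                {"name": "IDF_S3_KEY", "value": idf_s3_key},
                {"name": "BUILDING_ID", "value": building_id},
            ],
        },
    }

params = build_energyplus_job("bldg-000123", "raw/idf/bldg-000123.idf")
# A real submission would be:
#   import boto3
#   boto3.client("batch").submit_job(**params)
```

Submitting one such job per input file is what lets AWS Batch fan the work out across as many containers as the compute environment allows.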
“Our data processing would’ve taken thousands of hours on premises. Using AWS Batch, we can process that data in under 1 hour.”
Ankur Garg
Director of Data Architecture and Analytics, BlocPower
Solution | Building a Scalable Data Processing Pipeline on AWS
BlocPower containerizes its workloads using Amazon Elastic Container Service (Amazon ECS), which makes it easy to run highly secure, reliable, and scalable containers. Using this service, the company quickly spun up 500 containers that host 32 virtual CPUs each, all of which are orchestrated by AWS Batch. BlocPower has accelerated its data processing speeds by 16,000 times and processed over 30 TB of data. “Our data processing would’ve taken thousands of hours on premises,” says Garg. “Using AWS Batch, we can process that data in under 1 hour.”
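The shape of such a containerized job can be captured in an AWS Batch job definition. The sketch below shows one sized at 32 vCPUs, matching the containers described above; the container image URI, memory figure, and command are placeholders.

```python
# Illustrative AWS Batch job definition for a 32-vCPU container, registered
# once and reused for thousands of parallel jobs. Image, memory, and command
# are assumptions, not BlocPower's actual configuration.

job_definition = {
    "jobDefinitionName": "energyplus-sim",
    "type": "container",
    "containerProperties": {
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/energyplus:latest",
        "vcpus": 32,          # matches the 32-vCPU containers described above
        "memory": 65536,      # MiB; assumed sizing
        "command": ["run-simulation.sh"],
    },
}
# Registration would be:
#   boto3.client("batch").register_job_definition(**job_definition)
```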
BlocPower’s HPC compute environment uses Amazon Elastic Compute Cloud (Amazon EC2), which provides secure and resizable compute capacity for virtually any workload. To optimize its compute costs, BlocPower adopted a diversified set of Amazon EC2 Spot Instances, which let companies run fault-tolerant workloads at discounts of up to 90 percent compared with On-Demand prices. “Using Spot Instances made our data processing very cost effective,” says Garg. BlocPower also runs its workloads on Amazon EC2 C6g Instances, which deliver better price performance for compute-intensive workloads.
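A cost-optimized compute environment along these lines could be defined as below: managed Spot capacity diversified across Graviton-based C6g instance sizes. The names, subnet and security group IDs, roles, and vCPU ceiling are placeholders (the ceiling simply reflects 500 containers times 32 vCPUs).

```python
# Sketch of a Spot-based AWS Batch compute environment with diversified
# C6g instance types. All identifiers are placeholders, not real resources.

compute_environment = {
    "computeEnvironmentName": "energyplus-spot-ce",
    "type": "MANAGED",
    "computeResources": {
        "type": "SPOT",  # fault-tolerant workloads at deep Spot discounts
        "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
        "minvCpus": 0,
        "maxvCpus": 16000,  # 500 containers x 32 vCPUs
        "instanceTypes": ["c6g.8xlarge", "c6g.12xlarge", "c6g.16xlarge"],
        "subnets": ["subnet-aaaa1111"],        # placeholder subnet
        "securityGroupIds": ["sg-bbbb2222"],   # placeholder security group
        "instanceRole": "ecsInstanceRole",     # placeholder instance profile
    },
    "serviceRole": "AWSBatchServiceRole",      # placeholder service role
}
# Creation would be:
#   boto3.client("batch").create_compute_environment(**compute_environment)
```

Diversifying across several instance sizes is what lets the scheduler fall back to whichever Spot pools currently have capacity, keeping interruptions rare.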
The company hosts its raw and refined data lakes in Amazon Simple Storage Service (Amazon S3), an object storage service built to retrieve virtually any amount of data from anywhere. Since adopting this solution, BlocPower has scaled to import over 100 million files into Amazon S3 buckets. BlocPower also relies on Amazon Redshift, which uses SQL to analyze structured and semistructured data across data warehouses, operational databases, and data lakes. To maximize its cost savings, BlocPower runs its clusters in bursts using Amazon Redshift Serverless, which makes it easier to run and scale analytics without having to manage data warehouse infrastructure. Using these solutions, the company has streamlined its data management, improved the performance of its query processes, and can run advanced analytics that help it gain insights into improving the energy efficiency of buildings. For visualizing its data, the company uses Amazon QuickSight, a cloud-native, serverless business intelligence service.
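The storage-to-analytics hand-off might look like the sketch below: raw files land in an S3 data lake, and analysts query refined tables through the Redshift Data API against a Redshift Serverless workgroup, with no cluster to manage. The bucket, workgroup, database, and table names are invented for illustration.

```python
# Hedged sketch of the S3 data lake plus Redshift Serverless analytics path.
# Bucket, workgroup, and table names are assumptions, not real resources.

RAW_BUCKET = "example-raw-energy-profiles"  # assumed bucket name

def s3_upload_params(building_id: str, local_path: str) -> dict:
    """Parameters for a boto3 s3.upload_file(...) call."""
    return {
        "Filename": local_path,
        "Bucket": RAW_BUCKET,
        "Key": f"raw/idf/{building_id}.idf",
    }

def redshift_query_params(state: str) -> dict:
    """Parameters for a boto3 redshift-data execute_statement call
    against a Redshift Serverless workgroup."""
    return {
        "WorkgroupName": "analytics-serverless",  # assumed workgroup
        "Database": "energy",
        "Sql": (
            "SELECT building_id, annual_kwh "
            "FROM refined.energy_profiles WHERE state = :state"
        ),
        "Parameters": [{"name": "state", "value": state}],
    }

# Real calls would be:
#   boto3.client("s3").upload_file(**s3_upload_params("bldg-1", "/tmp/b.idf"))
#   boto3.client("redshift-data").execute_statement(**redshift_query_params("NY"))
```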
After BlocPower completed its data processing pipeline, it quickly deployed BlocMaps, a SaaS solution that provides users with climate justice data and the tools to create a sustainable electrification program and address inequities in their communities. The company received DevOps training from the AWS team, which helped it finish the development of this SaaS solution in 3 months. “AWS has provided complimentary trainings,” says Garg. “The AWS team is very supportive and takes time to help us.” On the backend of its SaaS offering, BlocPower uses machine learning to deliver relevant information to users. To run its models, the company uses Amazon SageMaker, which helps users build, train, and deploy machine learning models for virtually any use case with fully managed infrastructure, tools, and workflows.
To learn more from AWS HPC engineers, subscribe to the HPC Tech Short YouTube channel and follow the AWS HPC Blog.