This post was contributed by Ampersand’s Jeffrey Enos, Senior Machine Learning Engineer, Daniel Gerlanc, Senior Director for Data Science, and Brandon Willard, Data Science Lead.
Ampersand is a data-driven TV advertising technology company that provides aggregated TV audience impression insights and planning on 42 million households, in every media market, across more than 165 networks and apps and in all dayparts (broadcast day segments). With a commitment to privacy, Ampersand builds viewer profiles that help advertisers understand which networks, and at what times, their ads are most likely to be seen by their desired audience.
The Ampersand Data Science (ADS) team estimated that building their statistical models would require up to 600,000 physical CPU hours, which would not be feasible without a massively parallel, large-scale architecture in the cloud. After trying several solutions, Ampersand turned to AWS Batch, a highly scalable batch and ML scheduler and orchestrator that gives users access to a large quantity of compute resources and allows them to run their containerized jobs with the best fit of CPU, memory, and GPU resources. The scalability of AWS Batch enabled Ampersand to reduce their computation time by more than 500x through massive scaling, while optimizing their costs using Amazon EC2 Spot.
In this blog post, we will provide an overview of how Ampersand built their TV audience impressions (“impressions”) models at scale on AWS, review the architecture they have been using, and discuss optimizations they conducted to run their workload efficiently on AWS Batch.
Modeling TV impressions at scale
ADS builds statistical models that predict impressions. Using insights from these models, Ampersand constructs successful advertising campaigns for its clients.
The company’s data scientists use Bayesian methods to predict future impressions for different combinations of geographic regions, demographics, and cable television networks (e.g., CNN, ESPN, AMC) over time. ADS refers to a single region, demographic, network combination as a DNR.
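To make the idea of a DNR concrete, here is a minimal Python sketch that enumerates combinations of region, demographic, and network. The class name, field names, and category values are all hypothetical and chosen for illustration; the post does not describe how ADS represents DNRs internally.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class DNR:
    """One region/demographic/network combination (field names are illustrative)."""
    region: str
    demographic: str
    network: str


# Hypothetical example values; the real category lists are not given in the post.
regions = ["region_a", "region_b"]
demographics = ["adults_18_34", "adults_35_54"]
networks = ["CNN", "ESPN", "AMC"]

# Every combination is its own modeling unit.
dnrs = [DNR(r, d, n) for r, d, n in product(regions, demographics, networks)]

print(len(dnrs))  # 2 regions * 2 demographics * 3 networks
print(dnrs[0])
```

Because the number of combinations is the product of the three category counts, the list of modeling units grows quickly as categories are added.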
A single model is built for each DNR that predicts how many people are watching at any given time. Each model is a fully Bayesian Hidden Markov Model (HMM) that explicitly characterizes the “states” of impressions, with the most basic states being “viewing” and “not-viewing”. Each state has corresponding parameters that are used to help predict transitions from one state to the other at any given time. The models are estimated with Markov Chain Monte Carlo (MCMC) and produce sample results that can be used directly to estimate arbitrary functions of the predicted impressions and to propagate uncertainty. More specifically, these sample results are combined to make predictions for higher-level aggregate categories. You can experiment with HMMs using this Python notebook.
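The two-state idea and the way samples combine into aggregates can be sketched with NumPy alone. This is an illustrative toy, not ADS's model: the transition probabilities, the Poisson emission for the “viewing” state, and the rate of 50 are assumptions, and the simulated paths merely stand in for real MCMC output.

```python
import numpy as np

rng = np.random.default_rng(0)

# State 0 = "not-viewing", state 1 = "viewing" (all numbers are assumed).
# Row i holds the probabilities of moving from state i to each state.
transition = np.array([[0.9, 0.1],
                       [0.3, 0.7]])
initial = np.array([0.8, 0.2])
viewing_rate = 50.0  # assumed mean impressions per period while "viewing"


def simulate_hmm(n_steps, rng):
    """Draw a hidden state path and the impressions it emits."""
    states = np.empty(n_steps, dtype=int)
    states[0] = rng.choice(2, p=initial)
    for t in range(1, n_steps):
        states[t] = rng.choice(2, p=transition[states[t - 1]])
    # "Not-viewing" periods emit zero; "viewing" periods emit Poisson counts.
    impressions = np.where(states == 1, rng.poisson(viewing_rate, size=n_steps), 0)
    return states, impressions


states, impressions = simulate_hmm(24, rng)
print(states)
print(impressions)

# Stand-in for per-DNR sample results: each row is one simulated path.
n_draws, n_steps = 500, 24
draws_a = np.stack([simulate_hmm(n_steps, rng)[1] for _ in range(n_draws)])
draws_b = np.stack([simulate_hmm(n_steps, rng)[1] for _ in range(n_draws)])

# Summing draw-by-draw gives samples of the aggregate, so any summary of the
# aggregate (mean, interval, ...) can be computed from the samples directly.
aggregate = draws_a + draws_b
period_total = aggregate.sum(axis=1)
low, high = np.percentile(period_total, [5, 95])
print(period_total.mean(), (low, high))
```

Working with samples rather than point estimates is what lets uncertainty carry through to the aggregate: the interval on `period_total` comes from the draws themselves, not from a separate approximation. Adding draws row by row as done here treats the two stand-in DNRs as independent, which is a simplification of this sketch.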
ADS writes their model specifications and fits their models using open-source Python packages. Their HMM implementation uses the open-source packages Aesara and AePPL, projects for which Brandon Willard is the lead developer. A similar implementation using the standard PyMC3 NUTS sampler alongside a custom forward-filtering backward-sampling (FFBS) step is provided by Ampersand’s own pymc3-hmm package.
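For readers unfamiliar with FFBS, the following is a generic textbook version for a discrete-state HMM, written in plain NumPy. It is not the pymc3-hmm implementation and does not use that package's API; the function name `ffbs`, the emission model (exact zeros when “not-viewing”, Poisson counts when “viewing”), and all numeric values are assumptions made for illustration.

```python
from math import lgamma

import numpy as np


def ffbs(likelihood, transition, initial, rng):
    """Sample one hidden state path given per-step state likelihoods.

    likelihood[t, k] = p(y_t | state_t = k); transition[i, j] = p(j | i).
    """
    n_steps, n_states = likelihood.shape
    filtered = np.empty((n_steps, n_states))

    # Forward filtering: p(state_t | y_1..t), normalised at every step.
    predicted = initial
    for t in range(n_steps):
        unnorm = predicted * likelihood[t]
        filtered[t] = unnorm / unnorm.sum()
        predicted = filtered[t] @ transition

    # Backward sampling: draw the last state, then each earlier state
    # conditional on the state already drawn for the following step.
    states = np.empty(n_steps, dtype=int)
    states[-1] = rng.choice(n_states, p=filtered[-1])
    for t in range(n_steps - 2, -1, -1):
        weights = filtered[t] * transition[:, states[t + 1]]
        states[t] = rng.choice(n_states, p=weights / weights.sum())
    return states


# Assumed toy setup: state 0 = "not-viewing", state 1 = "viewing".
transition = np.array([[0.9, 0.1],
                       [0.3, 0.7]])
initial = np.array([0.8, 0.2])
rate = 50.0
y = np.array([0, 0, 47, 52, 0, 55, 49, 0])  # made-up observed impressions

# State 0 can only produce zeros; state 1 produces Poisson(rate) counts.
poisson_pmf = np.exp([yi * np.log(rate) - rate - lgamma(yi + 1) for yi in y])
likelihood = np.column_stack([(y == 0).astype(float), poisson_pmf])

rng = np.random.default_rng(0)
path = ffbs(likelihood, transition, initial, rng)
print(path)
```

In this toy setup any positive count forces the “viewing” state, because the “not-viewing” likelihood is exactly zero there. In a full sampler, a step like this would alternate with updates of the remaining model parameters, which the source attributes to NUTS.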
Ampersand’s computational ML architecture on AWS
ADS’s data architecture relies on several AWS services to ingest aggregated impressions data, transform it, and then use it to fit their models at scale. We will go over the flow of the architecture, then discuss optimizations ADS conducted to run at scale…