Most users of HPC or Batch systems need to analyze data with multiple operations to get meaningful results. That’s really driven by the nature of scientific research or engineering processes – it’s rare that a single task generates the insight you need.
AWS Batch customers are no exception, of course, which is why Batch supports job dependencies to create relationships between jobs. In today’s post, we’ll walk you through how to encode job dependencies in Batch, using a simple machine learning pipeline as an example.
Our scenario
Consider the following high-level diagram, which depicts a simple machine learning pipeline.
The diagram shows a straightforward set of serial steps (or jobs) that transform data into a trained model that can be used for inference within some other system. Even for this simple workflow, there are a few things to note: (1) each step can have different requirements for CPU, memory, storage, or even a GPU, so it makes sense to encapsulate each one as a separate job definition; (2) later steps in the chain depend on data from earlier steps; and because of that, (3) if an earlier step does not succeed, it makes no sense to provision resources for the dependent steps that follow.
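For example, the training step might be the only one that needs a GPU. A minimal sketch of registering a job definition for that step with the AWS CLI might look like the following; the container image and resource values are placeholders, not values from a real pipeline:

```bash
# Hypothetical job definition for a GPU-backed training step.
# The image URI, account ID, and resource values are placeholders.
aws batch register-job-definition \
    --job-definition-name train-model \
    --type container \
    --container-properties '{
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/train:latest",
        "resourceRequirements": [
            {"type": "VCPU",   "value": "4"},
            {"type": "MEMORY", "value": "16384"},
            {"type": "GPU",    "value": "1"}
        ]
    }'
```

Each of the other steps would get its own job definition, with resource requirements sized to its work.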
When using AWS Batch, you have a choice for how to encode the dependencies between jobs:
- Use the SubmitJob API to define the dependencies as you submit the job requests. These job dependencies are directly encoded within, and managed by, Batch.
- Define the job dependencies outside of Batch using a workflow framework, such as Apache Airflow or AWS Step Functions.
I’ll describe both of these methods, starting with directly encoding job dependencies using the Batch API. To show how to define dependencies outside of Batch, I’ll use AWS Step Functions, which integrates natively with AWS Batch.
Encoding job dependencies at runtime with AWS Batch
When you submit a job request to AWS Batch, you have the option of defining a dependency on a previously submitted job. This is what we mean by “runtime”: you build up the workflow iteratively, referring to earlier API requests in subsequent ones, rather than formally defining all of the job dependencies before any requests are acknowledged by the Batch service.
The following example shows how to submit a job using the AWS CLI, then use the returned job ID in a subsequent job request to define the dependency.
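Here’s a minimal sketch of that pattern; the job queue and job definition names are placeholder values:

```bash
# Submit the first job and capture its job ID.
# "ml-queue" and the job definition names are placeholder values.
PREPROCESS_JOB_ID=$(aws batch submit-job \
    --job-name preprocess-data \
    --job-queue ml-queue \
    --job-definition preprocess-data \
    --query jobId --output text)

# Submit the training job, which stays in the PENDING state
# until the preprocessing job succeeds.
aws batch submit-job \
    --job-name train-model \
    --job-queue ml-queue \
    --job-definition train-model \
    --depends-on jobId=${PREPROCESS_JOB_ID}
```

If the first job fails, Batch marks the dependent job as FAILED without ever provisioning resources for it, which is exactly the short-circuit behavior we want.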
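For comparison, here’s a sketch of the second approach from the list above: a hypothetical two-state AWS Step Functions state machine that runs the same two jobs in sequence using the service’s native Batch integration. The state machine name, IAM role, queue, and job definition names are all placeholders:

```bash
# Hypothetical two-step state machine using Step Functions' native
# Batch integration. The role ARN, queue, and job definition names
# are placeholders for resources in your own account.
aws stepfunctions create-state-machine \
    --name ml-pipeline \
    --role-arn arn:aws:iam::123456789012:role/StepFunctionsBatchRole \
    --definition '{
      "Comment": "Serial ML pipeline: each state waits for its Batch job",
      "StartAt": "PreprocessData",
      "States": {
        "PreprocessData": {
          "Type": "Task",
          "Resource": "arn:aws:states:::batch:submitJob.sync",
          "Parameters": {
            "JobName": "preprocess-data",
            "JobQueue": "ml-queue",
            "JobDefinition": "preprocess-data"
          },
          "Next": "TrainModel"
        },
        "TrainModel": {
          "Type": "Task",
          "Resource": "arn:aws:states:::batch:submitJob.sync",
          "Parameters": {
            "JobName": "train-model",
            "JobQueue": "ml-queue",
            "JobDefinition": "train-model"
          },
          "End": true
        }
      }
    }'
```

Because each Task state uses the `.sync` integration pattern, Step Functions waits for the submitted job to reach a terminal state before moving on, and fails the execution if the job fails, giving you the same short-circuit behavior as a dependency chain in Batch, with the workflow definition living outside of Batch.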
Read the full blog to learn more about Encoding workflow dependencies in AWS Batch.
Reminder: You can learn a lot from AWS HPC engineers by subscribing to the HPC Tech Short YouTube channel and following the AWS HPC Blog.