Practical Tips for HPC and AI Container Management

By Andy Morris, IBM Cognitive Infrastructure

July 16, 2019

Containers, and Docker in particular, are having a dramatic impact on how organizations build applications. While the technology behind containers isn’t new, containers have found themselves at the heart of the DevOps movement – a new way of building scalable cloud services. Fueled by agile development methods and the need to deploy “always-on” services, developers have embraced microservices architectures and CI/CD 1 pipelines. Containers fit the bill perfectly as a way to automatically package, tag, distribute, and manage these modular software components.

The innovation of containers

At a high level, containers virtualize an OS instance, making it sharable by multiple tenants such that each tenant is unaware of the others. With containers, users can essentially package up and freeze software functionality, complete with libraries and config files, and easily re-deploy the container across any system or cloud.

Under the covers, a container is essentially a collection of Linux processes running on a shared kernel. Processes in each container have their own view of the process space, network, and file system. Container images store only the “deltas” from the underlying OS, and since starting and stopping processes on a running Linux instance is fast, containers can be created or destroyed in mere seconds.

While there are many container implementations (LXC, rkt, lmctfy, etc.), Docker has emerged as the de facto standard. In HPC circles, Singularity 2 is also popular owing to its single-image file format 3, its MPI and InfiniBand support, and its avoidance of a Docker daemon on each host.

It’s worth noting that containers are not portable across binary architectures. Although Docker is the same across platforms, IBM Power Systems users will want to download Docker for Power and obtain pre-built Docker images from Power Systems repositories.

Container managers

Just as there are multiple container formats, there are also several container managers. The lines get blurry between container runtimes (Docker, Singularity, LXC, etc.) and container managers (Swarm, Kubernetes, Mesos, etc.). Your choice of container manager will depend on your needs:

  • Docker CE / Docker EE: Docker provides a free container runtime and a commercial enterprise edition. Users with simple requirements – deploying a few monolithic containers, for example – may run Docker alone and avoid the complexity of a full-blown container manager.
  • Kubernetes: For enterprise requirements, Kubernetes has emerged as the preferred platform for containerized applications. IBM Cloud Private (ICP) and Red Hat OpenShift are full-featured Kubernetes environments that run on-premises and across multiple clouds on both Intel and IBM Power Systems.
  • IBM Spectrum LSF: For HPC users embracing containers, IBM Spectrum LSF has multiple enhancements aimed at managing containerized workloads. It transparently launches and manages Docker and Singularity containers and provides HPC and AI-specific features not available in Kubernetes 4.

Regardless of how you manage containerized apps, there are some best practices to consider when building and deploying containers:

Keep control over your base images: When building application services in containers, a best practice is to use minimalist base images and consider making file systems read-only. Deploy only the services that you need in each container. Doing so reduces download time, and also reduces the “attack-surface” available to malicious actors.
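As a sketch, a Dockerfile following this practice might start from a minimal base image and install only what the service needs (the image tag, package, and file names here are illustrative):

```dockerfile
# Start from a minimal base image rather than a full OS distribution
FROM alpine:3.9

# Install only the packages the service actually needs
RUN apk add --no-cache python3

# Copy in just the application, not the whole build context
COPY app.py /opt/app/app.py

# Drop privileges before the process starts
USER nobody
CMD ["python3", "/opt/app/app.py"]
```

A read-only root file system can then be enforced at launch time, for example with `docker run --read-only`, so that a compromised process cannot modify the container’s contents.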

Use a private registry: While public registries are convenient, relying on third-parties for images is risky. IBM provides options for serving containers including a fully managed private cloud registry and registries included with IBM Cloud Private and Red Hat OpenShift. Security can be improved with digitally signed images and continuous vulnerability scanning.
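In Kubernetes, pulling from a private registry is largely a matter of using a fully qualified image name and supplying a pull secret; a minimal sketch, in which the registry host, image, and secret names are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: myapp
    # Fully qualified image name pointing at the private registry
    image: registry.example.com/team/myapp:1.0
  # Credentials for the registry, stored as a docker-registry secret
  imagePullSecrets:
  - name: my-registry-key
```

The referenced secret can be created with `kubectl create secret docker-registry my-registry-key --docker-server=… --docker-username=… --docker-password=…`, keeping registry credentials out of the pod definition itself.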

Automate your build/deployment pipeline: Automating your pipeline is an important way to improve efficiency, reliability, and security of your apps. DevOps workflows can be automated using IBM Open Toolchain and a variety of popular CI/CD tools.
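As one illustration of such a pipeline – here in GitHub Actions syntax, with a placeholder registry and hypothetical secret names – each push can automatically build, tag, and publish an image:

```yaml
name: build-and-push
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    # Tag the image with the commit SHA for traceability
    - run: docker build -t registry.example.com/myapp:${{ github.sha }} .
    # Authenticate against the private registry before pushing
    - run: echo "${{ secrets.REGISTRY_PASSWORD }}" | docker login registry.example.com -u "${{ secrets.REGISTRY_USER }}" --password-stdin
    - run: docker push registry.example.com/myapp:${{ github.sha }}
```

The same build-tag-push pattern maps onto most CI/CD tools; what matters is that every image in the registry is traceable to a specific commit.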

[Read: IBM best practices for implementing a CI/CD secure container image pipeline for your K8s apps.]

Avoid running containers as root: Applying the principle of least privilege is another security best practice. In the application’s YAML file, define a proper securityContext 5 to avoid containers running as root. For HPC users, IBM Spectrum LSF avoids the problem of containers running as children of the Docker daemon (which runs as root by default) by launching containers under the user that submitted the job.
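A minimal securityContext sketch (pod and image names illustrative) might look like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  securityContext:
    # Refuse to start any container whose image wants to run as UID 0
    runAsNonRoot: true
    runAsUser: 1000
  containers:
  - name: myapp
    image: registry.example.com/team/myapp:1.0
    securityContext:
      # Block setuid binaries and other privilege-escalation paths
      allowPrivilegeEscalation: false
```

With runAsNonRoot set, the kubelet rejects the pod at startup if the image attempts to run as root, rather than discovering the problem after a breach.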

Don’t store credentials in containers: Applications frequently need credentials or tokens to access databases or web-services. It’s surprising how many developers store credentials in Pod definitions, embed them in containers, or store them on a mounted volume for convenience. A far better practice is to use Secrets 6 in Kubernetes to keep credentials separate from your application.
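A sketch of the Secrets approach (the secret name, keys, and credential values below are placeholders): the credentials live in a Secret object, and the pod references them at run time without ever embedding them in the image.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
stringData:
  # stringData accepts plain text; Kubernetes stores it base64-encoded
  username: appuser
  password: changeme
---
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: myapp
    image: registry.example.com/team/myapp:1.0
    env:
    # Inject the credentials as environment variables at run time
    - name: DB_USER
      valueFrom:
        secretKeyRef:
          name: db-credentials
          key: username
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: db-credentials
          key: password
```

Rotating a credential then means updating the Secret, not rebuilding and redistributing every container image that uses it.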

NVIDIA Docker is your friend: For users running GPU workloads (TensorFlow, PyTorch, or Caffe), NVIDIA Docker simplifies deployment. NVIDIA Docker provides version-agnostic CUDA images so that applications compiled against different CUDA libraries can share the same Docker host and underlying GPU. IBM provides freely available NVIDIA Docker images for multiple deep learning applications at DockerHub 7.
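In a Kubernetes cluster with the NVIDIA device plugin installed, a pod requests GPUs through the standard resource mechanism; a sketch, with an illustrative pod name and image tag:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training-job
spec:
  containers:
  - name: trainer
    # A CUDA-enabled framework image; the tag is illustrative
    image: tensorflow/tensorflow:latest-gpu
    resources:
      limits:
        # Ask the NVIDIA device plugin for one GPU
        nvidia.com/gpu: 1
```

With plain Docker, the equivalent is `nvidia-docker run` or, in Docker 19.03 and later, `docker run --gpus all`.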

[Read also: Ensuring cross-cloud compatibility for GPU workloads.]

Regardless of how you are using and managing containers, these simple tips can help you build containerized applications that will be more reliable, more secure, and easier to scale and maintain.

 


Resources:

  1. CI/CD refers to continuous integration / continuous deployment.
  2. Learn more about Singularity at https://sylabs.io/singularity/
  3. Docker files are made up of multiple layers stored as discrete files https://docs.docker.com/v17.09/engine/userguide/storagedriver/imagesandcontainers/#container-and-layers
  4. Many examples, including GPU-aware scheduling, gang-scheduling, advanced reservations, hierarchical fairshare scheduling, SLA scheduler, time-based configuration, job dependencies, job arrays, and more.
  5. Configure a Security Context for a Pod or Container – https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
  6. Secrets in Kubernetes – https://www.ibm.com/support/knowledgecenter/en/SSBS6K_3.1.1/manage_applications/create_secrets.html
  7. IBM Power NVIDIA Docker images – https://hub.docker.com/r/ibmcom/powerai/