Wallaroo provides support for ML models that use GPUs. The following templates demonstrate how to create a nodepool in different cloud providers, then assign that nodepool to an existing cluster. These steps can be used in conjunction with Wallaroo Enterprise Install Guides.
Note that deploying pipelines with GPU support is only available for Wallaroo Enterprise.
For standard Wallaroo installations, GPU nodepools must include the following taints and labels:
Taint | Label |
---|---|
wallaroo.ai/pipelines=true:NoSchedule | wallaroo.ai/node-purpose: pipelines and a {custom label}, for example: wallaroo/gpu:true |
For custom tolerations and labels, see the Taints and Tolerations Guide.
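As a quick check of the taint and label requirements above, the following sketch lists the nodes that carry the wallaroo.ai/node-purpose=pipelines label together with the taints set on each one. It assumes kubectl is already configured for the cluster; list_pipeline_nodes is a hypothetical helper name used only for illustration and is not part of Wallaroo.

```shell
#!/usr/bin/env bash
# Sketch: list nodes with the required Wallaroo pipeline label and show
# the key and effect of every taint on each node.
# list_pipeline_nodes is a hypothetical helper name (an assumption).
list_pipeline_nodes() {
  kubectl get nodes \
    --selector "wallaroo.ai/node-purpose=pipelines" \
    --output "custom-columns=NAME:.metadata.name,TAINT_KEYS:.spec.taints[*].key,TAINT_EFFECTS:.spec.taints[*].effect"
}
```

Running list_pipeline_nodes after a nodepool has scaled up should show wallaroo.ai/pipelines among the taint keys for each GPU node. An empty result usually means no node with the label is currently running.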
The following script creates an Azure AKS nodepool with NVIDIA Tesla K80 GPUs using the Standard_NC6 machine type, and autoscales from 0 to 3 nodes. Each node has one GPU in this example, so the maximum .gpu() value that a pipeline step can request is 1.
For detailed steps on adding GPUs to a cluster, see the Microsoft Azure guide Use GPUs for compute-intensive workloads on Azure Kubernetes Service (AKS).
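The one-GPU-per-node limit can be confirmed once a GPU node is running. The sketch below prints the number of GPUs each labeled node reports as allocatable. It assumes the NVIDIA device plugin exposes GPUs under the nvidia.com/gpu resource name, and gpus_per_node is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: show how many GPUs each Wallaroo pipeline node advertises.
# Assumes GPUs are exposed as the "nvidia.com/gpu" resource (an assumption
# about the device plugin in use). The backslash escapes the dot inside
# the resource name so it is not read as a path separator.
gpus_per_node() {
  kubectl get nodes \
    --selector "wallaroo.ai/node-purpose=pipelines" \
    --output "custom-columns=NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu"
}
```

Because the nodepool scales from 0, the list may be empty until a pipeline that requests a GPU has been deployed.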
Note that the labels are required as part of the Wallaroo pipeline deployment with GPU support. The label below is an example, but a label must be provided.
RESOURCE_GROUP="YOUR RESOURCE GROUP"
CLUSTER_NAME="YOUR CLUSTER NAME"
GPU_NODEPOOL_NAME="YOUR GPU NODEPOOL NAME"
az extension add --name aks-preview
az extension update --name aks-preview
az feature register --namespace "Microsoft.ContainerService" --name "GPUDedicatedVHDPreview"
az provider register -n Microsoft.ContainerService
az aks nodepool add \
--resource-group $RESOURCE_GROUP \
--cluster-name $CLUSTER_NAME \
--name $GPU_NODEPOOL_NAME \
--node-count 0 \
--node-vm-size Standard_NC6 \
--node-taints "wallaroo.ai/pipelines=true:NoSchedule" \
--labels wallaroo.ai/node-purpose=pipelines \
--aks-custom-headers UseGPUDedicatedVHD=true \
--enable-cluster-autoscaler \
--min-count 0 \
--max-count 3
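After the command completes, the nodepool settings can be read back to confirm that the taint, label, and autoscaling range were applied. This is a minimal sketch that reuses the RESOURCE_GROUP, CLUSTER_NAME, and GPU_NODEPOOL_NAME variables from the script above. The helper name show_aks_gpu_nodepool is hypothetical, and the queried field names (vmSize, nodeLabels, nodeTaints, minCount, maxCount) are assumed to match the agent pool output of your Azure CLI version.

```shell
#!/usr/bin/env bash
# Sketch: read back the GPU nodepool's size, labels, taints and autoscale range.
# show_aks_gpu_nodepool is a hypothetical helper name; the JMESPath field
# names are assumptions about the agent pool JSON returned by the CLI.
show_aks_gpu_nodepool() {
  az aks nodepool show \
    --resource-group "$RESOURCE_GROUP" \
    --cluster-name "$CLUSTER_NAME" \
    --name "$GPU_NODEPOOL_NAME" \
    --query "{vmSize:vmSize, labels:nodeLabels, taints:nodeTaints, minCount:minCount, maxCount:maxCount}" \
    --output json
}
```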
The following script creates a Google GKE nodepool that requests one nvidia-tesla-a100 accelerator per node on the a2-ultragpu-1g machine type, and autoscales from 0 to 1 nodes. Each node has one GPU in this example, so the maximum .gpu() value that a pipeline step can request is 1.
Google GKE automatically adds the following taint to the created nodepool.
NO_SCHEDULE nvidia.com/gpu present
Note that the labels are required as part of the Wallaroo pipeline deployment with GPU support. The label below is an example, but a label must be provided.
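# Placeholder values; replace with your own.
GCP_PROJECT="YOUR GCP PROJECT"
GCP_CLUSTER="YOUR CLUSTER NAME"
REGION="YOUR REGION"
GPU_NODEPOOL_NAME="YOUR GPU NODEPOOL NAME"
# wallaroo.ai/node-purpose is a required label. At least one custom label is also required:
# replace {add custom label here} with it, separated from the first label by a comma.
gcloud container \
--project $GCP_PROJECT \
node-pools create $GPU_NODEPOOL_NAME \
--cluster $GCP_CLUSTER \
--region $REGION \
--machine-type "a2-ultragpu-1g" \
--accelerator "type=nvidia-tesla-a100,count=1,gpu-driver-version=default" \
--image-type "COS_CONTAINERD" \
--disk-type "pd-balanced" \
--disk-size "100" \
--node-labels wallaroo.ai/node-purpose=pipelines,{add custom label here} \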
--node-taints=wallaroo.ai/pipelines=true:NoSchedule \
--metadata disable-legacy-endpoints=true \
--scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
--num-nodes "1" \
--enable-autoscaling \
--min-nodes "0" \
--max-nodes "1" \
--location-policy "BALANCED" \
--enable-autoupgrade \
--enable-autorepair \
--max-surge-upgrade 1 \
--max-unavailable-upgrade 0
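The created nodepool can be inspected to confirm the accelerator, labels, taints, and autoscaling settings. The sketch below assumes the same GCP_PROJECT, GCP_CLUSTER, REGION, and GPU_NODEPOOL_NAME variables used by the script above; describe_gke_gpu_nodepool is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the parts of the nodepool description that matter for
# Wallaroo GPU deployments. describe_gke_gpu_nodepool is a hypothetical
# helper name; the variables are the ones used when creating the nodepool.
describe_gke_gpu_nodepool() {
  gcloud container node-pools describe "$GPU_NODEPOOL_NAME" \
    --project "$GCP_PROJECT" \
    --cluster "$GCP_CLUSTER" \
    --region "$REGION" \
    --format "yaml(config.machineType,config.accelerators,config.labels,config.taints,autoscaling)"
}
```

The taints section of the output should list both wallaroo.ai/pipelines and the nvidia.com/gpu taint that GKE adds automatically.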
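The following steps create an AWS EKS nodepool with GPU nodes.
These steps require the ability to run kubectl commands against the cluster.
Note that the labels are required as part of the Wallaroo pipeline deployment with GPU support. The label below is an example, but a label must be provided.
The sample config file below uses the instance type g5.2xlarge. Modify as required.
Create the nodepool with the following command:
eksctl create nodegroup --config-file=<path>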
Sample config file:
# aws-gpu-nodepool.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: YOUR CLUSTER NAME HERE # This must match the name of the existing cluster
  region: YOUR REGION HERE

managedNodeGroups:
- name: YOUR NODEPOOL NAME HERE
  instanceType: g5.2xlarge
  minSize: 1
  maxSize: 3
  labels:
    wallaroo.ai/node-purpose: "pipelines"
    wallaroo-gpu-label: "true"
  taints:
    - key: wallaroo.ai/pipelines
      value: "true"
      effect: NoSchedule
  tags:
    k8s.io/cluster-autoscaler/node-template/label/k8s.dask.org/node-purpose: engine
    k8s.io/cluster-autoscaler/node-template/taint/k8s.dask.org/dedicated: "true:NoSchedule"
  iam:
    withAddonPolicies:
      autoScaler: true
  containerRuntime: containerd
  amiFamily: AmazonLinux2
  availabilityZones:
    - INSERT YOUR ZONE HERE
  volumeSize: 100
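Once eksctl has finished, the new nodegroup and its nodes can be listed to confirm the result. This is a minimal sketch: EKS_CLUSTER_NAME and AWS_REGION are hypothetical variables that should match the metadata in the config file, the helper names are for illustration only, and the selector uses the example custom label wallaroo-gpu-label from the sample config.

```shell
#!/usr/bin/env bash
# Hypothetical variables (assumptions); set them to the values used in
# the metadata section of the config file.
EKS_CLUSTER_NAME="${EKS_CLUSTER_NAME:-YOUR CLUSTER NAME HERE}"
AWS_REGION="${AWS_REGION:-YOUR REGION HERE}"

# List the nodegroups that eksctl manages for the cluster.
list_eks_nodegroups() {
  eksctl get nodegroup --cluster "$EKS_CLUSTER_NAME" --region "$AWS_REGION"
}

# List the nodes that carry the example custom GPU label from the config file.
list_gpu_labeled_nodes() {
  kubectl get nodes --selector "wallaroo-gpu-label=true"
}
```

If a different custom label is used in the config file, change the selector in list_gpu_labeled_nodes to match it.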
The following tutorials demonstrate deploying a pipeline with the specified architecture.