Wallaroo provides support for ML models that use GPUs. The following templates demonstrate how to create a nodepool in different cloud providers, then assign that nodepool to an existing cluster. These steps can be used in conjunction with Wallaroo Enterprise Install Guides.
Nodepool | Taints | Labels | Description |
---|---|---|---|
default | N/A | wallaroo.ai/node-purpose: general | For general Wallaroo services. No taints are applied to this nodepool, so any process without an assigned deployment label can run in this space. |
persistent | wallaroo.ai/persistent=true:NoSchedule | wallaroo.ai/node-purpose: persistent | For Wallaroo services with persistentVolume settings, including JupyterHub, Minio, etc. |
pipelines-x86 | wallaroo.ai/pipelines=true:NoSchedule | wallaroo.ai/node-purpose: pipelines | For deploying pipelines on the default x86 architecture. The taint and label must be applied to any nodepool used for model deployments. |
{custom} | wallaroo.ai/pipelines=true:NoSchedule | wallaroo.ai/node-purpose: pipelines plus a custom label, for example: wallaroo/gpu:true | Custom-named nodepools used to access non-default architectures (GPU, ARM, etc.). |
The specific nodepool names may differ based on your cloud service provider's naming requirements; check with the provider for nodepool naming requirements and adjust as needed.
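The taint and label work as a pair: the taint keeps unrelated workloads off the nodepool, while the label lets targeted workloads select it. Wallaroo generates the matching toleration and node selector automatically when a deployment label is configured, so the following is an illustration only: a minimal sketch, with a hypothetical pod name and image, showing how a pod carrying both settings schedules onto a tainted pipelines nodepool.
# Illustration only: a minimal pod that can schedule onto a tainted
# pipelines nodepool. Wallaroo generates the equivalent toleration and
# node selector automatically when a deployment label is configured.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pipelines-scheduling-demo   # hypothetical name
spec:
  nodeSelector:
    wallaroo.ai/node-purpose: pipelines
  tolerations:
    - key: "wallaroo.ai/pipelines"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  containers:
    - name: demo
      image: busybox   # hypothetical image
      command: ["sleep", "3600"]
EOF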
The following steps add a new node to an existing OpenShift cluster. For full details, see the OpenShift guide Adding worker nodes to an on-premise cluster.
The following examples detail adding new nodepools with CUDA-compatible GPUs.
The following procedure adds a new nodepool named pipelines-l4 to an existing cluster through the ibmcloud command line interface (CLI), using the gx3.16x80.l4 VPC flavor, which has the following specifications (as encoded in the flavor name): 16 vCPUs, 80 GB RAM, and one NVIDIA L4 GPU.
For additional flavors, see VPC flavors. Modify as needed for the organization's requirements.
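The flavors available in a given zone can also be listed directly from the CLI; the zone below is an example value:
# List the worker node flavors available in the zone;
# GPU flavors use the gx prefix.
ibmcloud oc flavors --zone us-south-1 --provider vpc-gen2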
The following software or runtimes are required for Wallaroo 2025.1. Most are automatically available through the supported cloud providers.
Software or Runtime | Description | Minimum Supported Version | Preferred Version(s) |
---|---|---|---|
OpenShift | Container platform | 4.17 | 4.18 |
Kubernetes | Cluster deployment management | 1.29, with Container Management set to containerd | 1.31 |
kubectl | Kubernetes administrative console application | 1.31 | 1.31 |
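The installed versions can be checked against this table with a few quick commands, assuming kubectl and the OpenShift oc client are on the path:
# Report the kubectl client version.
kubectl version --client
# Report the OpenShift client and cluster versions.
oc version
# Show the Kubernetes version and container runtime for each node.
kubectl get nodes -o wide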
The following steps create the nodepool with CUDA hardware using the ibmcloud CLI.
Retrieve the following information. A scripted sketch for capturing both values follows this list.

- VPC_ID: The ID of the IBM Cloud® Virtual Private Cloud, retrieved with ibmcloud oc vpcs. For example: VPC_ID="r006-cac4bfbe-d04d-481a-a099-ba243ea64afd"
- SUBNET_ID_1: The ID of the subnet used, retrieved with the following command, which returns zone details including the subnet ID: ibmcloud oc subnets --provider vpc-gen2 --vpc-id <your-vpc-id> --zone <your-zone-name>
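Optionally, both values can be captured directly into shell variables. This is a minimal sketch, assuming jq is installed and that the --output json flag and the JSON field names below match the installed ibmcloud CLI version; verify the output structure before relying on it.
# Sketch: capture the VPC and subnet IDs into shell variables.
# Assumes jq is available and that the JSON field names ('id') match
# the current ibmcloud CLI output -- verify before use.
VPC_ID=$(ibmcloud oc vpcs --output json | jq -r '.[0].id')
SUBNET_ID_1=$(ibmcloud oc subnets --provider vpc-gen2 --vpc-id "$VPC_ID" \
  --zone us-south-1 --output json | jq -r '.[0].id')
echo "VPC_ID=$VPC_ID SUBNET_ID_1=$SUBNET_ID_1"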
Set the environment variables, modifying as needed to match the target cluster. The variable WORKER_POOL_NAME sets the name of the new nodepool. This example uses pipelines-l4.
# Set the environment variables
CLUSTER_NAME="samplecluster"
ZONE_1="us-south-1"
SUBNET_ID_1="0717-6f46918e-2107-48ae-b023-eb053601697b"
L4_GPU_FLAVOR="gx3.16x80.l4"
# The name of the new nodepool.
WORKER_POOL_NAME="pipelines-l4"
# The value of the accelerator label applied to the nodepool.
ACCELERATION_LABEL="l4"
Create the nodepool and add it to the target cluster with the standard label and the custom accelerator label.
# Create the nodepool and add it to the cluster.
ibmcloud oc worker-pool create vpc-gen2 --cluster "$CLUSTER_NAME" --name "$WORKER_POOL_NAME" --flavor "$L4_GPU_FLAVOR" --size-per-zone 1 \
--label wallaroo.ai/node-purpose=pipelines \
--label wallaroo.ai/accelerator=$ACCELERATION_LABEL
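Optionally, confirm the worker pool was created with the expected flavor, size, and labels before adding it to a zone:
# Optional check: display the worker pool's flavor, size, and labels.
ibmcloud oc worker-pool get --cluster "$CLUSTER_NAME" --worker-pool "$WORKER_POOL_NAME"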
Add the nodepool to the zone and subnet.
# add to the zones and subnet
ibmcloud oc zone add vpc-gen2 --zone "$ZONE_1" --subnet-id "$SUBNET_ID_1" --cluster "$CLUSTER_NAME" --worker-pool "$WORKER_POOL_NAME"
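Provisioning the new workers takes several minutes. Their status can be checked from the CLI until they reach the Ready state:
# Check the provisioning status of the new workers.
ibmcloud oc workers --cluster "$CLUSTER_NAME" --worker-pool "$WORKER_POOL_NAME"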
Wait for the nodepool's worker nodes to be ready.
# wait for the nodepool to be ready
echo "Waiting for nodes in '$WORKER_POOL_NAME' pool to be ready before tainting..."
kubectl wait --for=condition=Ready node -l ibm-cloud.kubernetes.io/worker-pool-name=$WORKER_POOL_NAME --timeout=15m
Add the standard taints. If non-standard taints are used, modify as needed.
# apply the taints
echo "Tainting '$WORKER_POOL_NAME' pool nodes..."
kubectl taint nodes -l ibm-cloud.kubernetes.io/worker-pool-name=$WORKER_POOL_NAME wallaroo.ai/pipelines=true:NoSchedule --overwrite
kubectl taint nodes -l ibm-cloud.kubernetes.io/worker-pool-name=$WORKER_POOL_NAME nvidia.com/gpu=$ACCELERATION_LABEL:NoSchedule --overwrite
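To verify the taints were applied, each node's taints can be printed with kubectl; the jsonpath output below lists each node name alongside its taints:
# Verify the taints on the new nodes.
kubectl get nodes -l ibm-cloud.kubernetes.io/worker-pool-name=$WORKER_POOL_NAME \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'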
IMPORTANT NOTE: Verify that the label is communicated to developers for model deployment. Labels are required for deploying models in Wallaroo with GPUs enabled. For more details, see Deployment Configuration with the Wallaroo SDK: GPU Support
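To collect the exact label values to hand off to developers, the accelerator label can be displayed as a column; this is a minimal check, assuming kubectl access to the cluster.
# List pipeline nodes along with their accelerator label values.
kubectl get nodes -l wallaroo.ai/node-purpose=pipelines -L wallaroo.ai/accelerator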
The following tutorials demonstrate deploying a pipeline with the specified architecture.