Create CUDA GPU Nodepools for OpenShift Air-Gap Clusters

How to create CUDA GPU nodepools for OpenShift clusters.

Wallaroo provides support for ML models that use GPUs. The following templates demonstrate how to create a nodepool in different cloud providers, then assign that nodepool to an existing cluster. These steps can be used in conjunction with Wallaroo Enterprise Install Guides.

| Nodepool | Taints | Labels | Description |
|---|---|---|---|
| default | N/A | wallaroo.ai/node-purpose: general | For general Wallaroo services. No taints are applied to this nodepool, allowing any process not assigned a deployment label to run in this space. |
| persistent | wallaroo.ai/persistent=true:NoSchedule | wallaroo.ai/node-purpose: persistent | For Wallaroo services with persistentVolume settings, including JupyterHub, Minio, etc. |
| pipelines-x86 | wallaroo.ai/pipelines=true:NoSchedule | wallaroo.ai/node-purpose: pipelines | For deploying pipelines on default x86 architectures. These taints and labels must be applied to any nodepool used for model deployments. |
| {custom} | wallaroo.ai/pipelines=true:NoSchedule | wallaroo.ai/node-purpose: pipelines, plus a custom label, for example: wallaroo/gpu:true | Custom-named nodepools used to access non-default architectures (GPU, ARM, etc.). |

Specific nodepool names may differ based on your cloud provider's naming requirements; check with your cloud services provider and adjust as needed.
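
Once nodepools exist, standard Kubernetes tooling can confirm that the expected labels and taints from the table above are in place. The following is a minimal verification sketch using kubectl:

    # List nodes with their wallaroo.ai/node-purpose label values.
    kubectl get nodes -L wallaroo.ai/node-purpose

    # Print each node's name and taints, one node per line.
    kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'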

Generic OpenShift Add Nodes Procedure

New worker nodes are added to an existing OpenShift cluster using the standard OpenShift procedure. For full details, see the OpenShift guide Adding worker nodes to an on-premise cluster.
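
At a high level, that procedure boots the new host with the worker ignition configuration, after which the node's certificate signing requests (CSRs) must be approved before it joins the cluster. The following is a sketch of the approval step using standard oc commands; consult the OpenShift guide linked above for the authoritative procedure.

    # List pending certificate signing requests from the new worker.
    oc get csr

    # Approve pending CSRs. Approval typically happens in two rounds:
    # the client CSR first, then the serving CSR that appears afterward.
    oc get csr -o name | xargs oc adm certificate approve

    # Confirm the new node registers and reaches Ready status.
    oc get nodes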

Add CUDA Nodepool to Cloud Environment Examples

The following examples detail adding new nodepools with CUDA-compatible GPUs.

Add Nodepool to IBM Cloud OpenShift Cluster Procedure

The following procedure adds a new nodepool named pipelines-l4 to an existing cluster using the ibmcloud command line interface (CLI) with the gx3.16x80.l4 VPC flavor, which has the following specifications:

  • 16 cores
  • 80GB memory
  • 32Gbps network speed
  • 1 L4 GPU

For additional flavors, see VPC flavors. Modify as needed for the organization's requirements.
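
To compare the worker flavors available in a zone before committing to one, the ibmcloud CLI provides a flavors listing. A minimal sketch, assuming the us-south-1 zone used in the example below:

    # List available worker flavors for VPC Gen 2 clusters in a zone;
    # GPU flavors (such as the gx3 family) appear alongside CPU-only flavors.
    ibmcloud oc flavors --zone us-south-1 --provider vpc-gen2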

Install Software Requirements

The following software or runtimes are required for Wallaroo 2025.1. Most are automatically available through the supported cloud providers.

| Software or Runtime | Description | Minimum Supported Version | Preferred Version(s) |
|---|---|---|---|
| OpenShift | Container Platform | 4.17 | 4.18 |
| Kubernetes | Cluster deployment management | 1.29, with Container Management set to containerd | 1.31 |
| kubectl | Kubernetes administrative console application | 1.31 | 1.31 |
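
Once the cluster is available, the installed versions can be checked against the table above with the standard clients:

    # Report the client, server, and OpenShift versions.
    oc version

    # Report the kubectl client version.
    kubectl version --client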

Create the OpenShift IBM Cloud CUDA Nodepool

The following steps create the nodepool with CUDA hardware using the ibmcloud CLI.

  1. Retrieve the following information:

    • VPC_ID: The ID of the IBM Cloud® Virtual Private Cloud. Retrieved with ibmcloud oc vpcs. For example:
      • VPC_ID="r006-cac4bfbe-d04d-481a-a099-ba243ea64afd"
    • SUBNET_ID_1: The subnet ID used. This is retrieved with the following command:
      • ibmcloud oc subnets --provider vpc-gen2 --vpc-id <your-vpc-id> --zone <your-zone-name>
  2. Set the environment variables. Modify as needed to match the target cluster. The variable WORKER_POOL_NAME sets the name of the new nodepool; this example uses pipelines-l4.

    # Set the environment variables
    CLUSTER_NAME="samplecluster"
    ZONE_1="us-south-1"
    SUBNET_ID_1="0717-6f46918e-2107-48ae-b023-eb053601697b"
    L4_GPU_FLAVOR="gx3.16x80.l4"
    # The name of the new worker pool (nodepool).
    WORKER_POOL_NAME="pipelines-l4"
    # The accelerator label value applied to the nodepool.
    ACCELERATION_LABEL="l4"
    

    The following command retrieves zone details, including the subnet ID.

    ibmcloud oc subnets --provider vpc-gen2 --vpc-id <your-vpc-id> --zone <your-zone-name>
    
  3. Create the nodepool and add it to the target cluster with the standard and custom labels.

    # Create the nodepool and add it to the cluster.
    ibmcloud oc worker-pool create vpc-gen2 --cluster "$CLUSTER_NAME" --name "$WORKER_POOL_NAME" --flavor "$L4_GPU_FLAVOR" --size-per-zone 1 \
        --label wallaroo.ai/node-purpose=pipelines \
        --label wallaroo.ai/accelerator=$ACCELERATION_LABEL
    
  4. Add the nodepool to the zone and subnet.

    # add to the zones and subnet
    ibmcloud oc zone add vpc-gen2 --zone "$ZONE_1" --subnet-id "$SUBNET_ID_1" --cluster "$CLUSTER_NAME" --worker-pool "$WORKER_POOL_NAME"
    
  5. Wait for the nodepool's worker nodes to be ready.

    # wait for the nodepool to be ready
    echo "Waiting for nodes in '$WORKER_POOL_NAME' pool to be ready before tainting..."
    kubectl wait --for=condition=Ready node -l ibm-cloud.kubernetes.io/worker-pool-name=$WORKER_POOL_NAME --timeout=15m
    
  6. Add the standard taints. If non-standard taints are used, modify as needed. A verification sketch follows this procedure.

    # apply the taints
    echo "Tainting '$WORKER_POOL_NAME' pool nodes..."
    kubectl taint nodes -l ibm-cloud.kubernetes.io/worker-pool-name=$WORKER_POOL_NAME wallaroo.ai/pipelines=true:NoSchedule --overwrite
    kubectl taint nodes -l ibm-cloud.kubernetes.io/worker-pool-name=$WORKER_POOL_NAME nvidia.com/gpu=$ACCELERATION_LABEL:NoSchedule --overwrite
    

    IMPORTANT NOTE: Verify that the label is communicated to developers for model deployment. Labels are required for deploying models in Wallaroo with GPUs enabled. For more details, see Deployment Configuration with the Wallaroo SDK: GPU Support.
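
As referenced in step 6, the following sketch verifies the finished nodepool: that the worker pool exists, that its nodes carry the expected labels, and that the taints were applied. It reuses the environment variables set in step 2.

    # Confirm the worker pool exists in the cluster.
    ibmcloud oc worker-pools --cluster "$CLUSTER_NAME"

    # Confirm the nodes carry the pipelines and accelerator labels.
    kubectl get nodes -l wallaroo.ai/node-purpose=pipelines -L wallaroo.ai/accelerator

    # Confirm the taints were applied to the new nodepool's nodes.
    kubectl describe nodes -l ibm-cloud.kubernetes.io/worker-pool-name=$WORKER_POOL_NAME | grep -A2 Taints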

Deployment Tutorials

The following tutorials demonstrate deploying a pipeline with the specified architecture.