Large Language Models Infrastructure Requirements
The following details how to set up the Kubernetes infrastructure for Large Language Model (LLM) packaging and deployments.
For access to these sample models and a demonstration on using LLMs with Wallaroo:
- Contact your Wallaroo Support Representative OR
- Schedule Your Wallaroo.AI Demo Today
For LLMs, the main infrastructure considerations are:
- Ephemeral Storage: Ephemeral storage is the temporary node-local storage needed for the initial LLM upload and packaging for deployment. Typically this is assigned to the nodepool labeled general.
  - Recommended size: 250 Gi.
- Instance Type with GPU and AI Accelerators: The nodes the LLMs are deployed to typically use GPUs and other AI accelerators to deliver inference performance in near real time.
The following configuration options are based on the Wallaroo Infrastructure Configuration Guides with modifications for the heavy requirements LLMs can bring.
For full details on setting up a Kubernetes cluster and installing Wallaroo, see the Wallaroo Install guides.
Nodepools Explained
The Kubernetes cluster hosting the Wallaroo instance typically has the following nodepools:
- general: The general nodepool is where most Wallaroo services run, including:
  - Model packaging
  - Wallaroo dashboard services
- Other nodepools: Other nodepools can be configured to run specific Wallaroo services. For deploying LLM models, the provided Cloud Configurations detail how to create specific nodepools optimized for LLM model deployments.
Quantized LLM Nodepools Required Taints and Tolerations
Nodepools hosting LLM deployments require the following Kubernetes taints and labels.
Taint | Label |
---|---|
wallaroo.ai/pipelines=true:NoSchedule | wallaroo.ai/node-purpose: pipelines |
The examples provided in Cloud Configurations include these taints as part of the configuration details.
GPU Enabled LLM Nodepool Taints and Labels
Nodepools set up for LLM deployment include the following taints and labels to ensure the models are deployed to the correct nodepool and receive the resources their service requirements call for. For nodepools with GPUs, custom deployment labels are a required part of the model’s deployment configuration.
Taint | Label |
---|---|
wallaroo.ai/pipelines=true:NoSchedule | wallaroo.ai/node-purpose: pipelines (Required); wallaroo/gpu:true |
The examples provided in Cloud Configurations include sample labels as part of the configuration details.
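To make the relationship between the taints and labels above and pod scheduling more concrete, the following is a minimal Python sketch of standard Kubernetes semantics: a pod lands on a tainted node only if it carries a matching toleration, and it is steered to a labeled node through a node selector. The function names (`parse_taint`, `toleration_for`, `node_selector_for`) are hypothetical helpers written for this illustration; they are not part of the Wallaroo SDK, and Wallaroo applies the scheduling settings itself based on the deployment configuration.

```python
def parse_taint(taint):
    """Split a taint written as 'key=value:Effect' into its three parts."""
    key_value, effect = taint.rsplit(":", 1)
    key, _, value = key_value.partition("=")
    return {"key": key, "value": value, "effect": effect}


def toleration_for(taint):
    """Build the Kubernetes toleration that matches the taint exactly."""
    parts = parse_taint(taint)
    return {
        "key": parts["key"],
        "operator": "Equal",
        "value": parts["value"],
        "effect": parts["effect"],
    }


def node_selector_for(labels):
    """Turn 'key: value' label strings into a nodeSelector mapping."""
    selector = {}
    for label in labels:
        key, _, value = label.partition(":")
        selector[key.strip()] = value.strip()
    return selector


# Values taken from the GPU nodepool table above.
toleration = toleration_for("wallaroo.ai/pipelines=true:NoSchedule")
selector = node_selector_for(
    ["wallaroo.ai/node-purpose: pipelines", "wallaroo/gpu:true"]
)
print(toleration)
print(selector)
```

The toleration uses the `Equal` operator so that it matches only this specific taint value, which is the stricter of the two operators Kubernetes offers.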
Cloud Configurations
The following details how to configure a Wallaroo installation for different cloud platforms for LLM deployments. For the general nodepool, these are modifications to the Wallaroo Install guides, and are best performed before installing Wallaroo.
GPU-based nodepools can be added at any time, though adding them during the initial install process is recommended. Note that GPU nodepools require two labels:
- wallaroo.ai/node-purpose: pipelines (Required)
- Any custom label; these are used when deploying LLMs to specify the nodepool to use.
For details on how to set deployment configurations for model deployment, see the Wallaroo Deployment Configuration guide.
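The two-label requirement for GPU nodepools can be checked before a nodepool is created. The following is a small sketch under stated assumptions: `check_gpu_nodepool_labels` is a hypothetical helper written for this guide, not a Wallaroo or Kubernetes API, and it only inspects a plain dictionary of labels.

```python
REQUIRED_KEY = "wallaroo.ai/node-purpose"
REQUIRED_VALUE = "pipelines"


def check_gpu_nodepool_labels(labels):
    """Return a list of problems; an empty list means the labels look valid."""
    problems = []
    if labels.get(REQUIRED_KEY) != REQUIRED_VALUE:
        problems.append(
            "missing required label {}: {}".format(REQUIRED_KEY, REQUIRED_VALUE)
        )
    # Any label other than the required one counts as a custom label.
    custom = [key for key in labels if key != REQUIRED_KEY]
    if not custom:
        problems.append("at least one custom label is required")
    return problems


good = {"wallaroo.ai/node-purpose": "pipelines", "wallaroo.ai/accelerator": "a100"}
bad = {"wallaroo.ai/node-purpose": "pipelines"}
print(check_gpu_nodepool_labels(good))
print(check_gpu_nodepool_labels(bad))
```

Returning a list of problems rather than raising on the first one lets a setup script report every labeling issue in a single pass.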
Amazon Web Services
For deployments of LLMs in Amazon Web Services (AWS), the following configuration is recommended. Modify depending on your particular requirements.
- Ephemeral Storage: 250 GB
- Recommended Instance Type: p3.16xlarge
The following GPU nodepool and general configurations are for Amazon eksctl deployments.
AWS GPU Nodepool Sample
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: YOUR CLUSTER NAME HERE # This must match the name of the existing cluster
  region: YOUR REGION HERE

managedNodeGroups:
- name: YOUR NODEPOOL NAME HERE
  instanceType: p3.16xlarge
  minSize: 0
  maxSize: 1
  labels:
    wallaroo.ai/node-purpose: "pipelines" # required label
    wallaroo.ai/accelerator: "a100" # custom label - at least one custom label is required
  taints:
    - key: wallaroo.ai/pipelines
      value: "true"
      effect: NoSchedule
  tags:
    k8s.io/cluster-autoscaler/node-template/label/k8s.dask.org/node-purpose: pipelines
    k8s.io/cluster-autoscaler/node-template/taint/k8s.dask.org/dedicated: "true:NoSchedule"
  iam:
    withAddonPolicies:
      autoScaler: true
  containerRuntime: containerd
  amiFamily: AmazonLinux2
  availabilityZones:
    - INSERT YOUR ZONE HERE
  volumeSize: 100
AWS General Nodepool Sample
- name: general
  instanceType: m5.2xlarge
  desiredCapacity: 3
  volumeSize: 250
  containerRuntime: containerd
  amiFamily: AmazonLinux2
  availabilityZones:
    - us-east-1a
  labels:
    wallaroo.ai/node-purpose: general
Microsoft Azure
For deployments of LLMs in Microsoft Azure, the following configuration is recommended. Modify depending on your particular requirements.
- Ephemeral Storage: 250 GB
- Recommended Instance Type: Standard_NC24ads_A100_v4
The following GPU nodepool and mainpool configurations are based on using the Azure Command-Line Interface (CLI).
Azure GPU Nodepool Sample
RESOURCE_GROUP="YOUR RESOURCE GROUP"
CLUSTER_NAME="YOUR CLUSTER NAME"
GPU_NODEPOOL_NAME="YOUR GPU NODEPOOL NAME"
az extension add --name aks-preview
az extension update --name aks-preview
az feature register --namespace "Microsoft.ContainerService" --name "GPUDedicatedVHDPreview"
az provider register -n Microsoft.ContainerService
az aks nodepool add \
--resource-group $RESOURCE_GROUP \
--cluster-name $CLUSTER_NAME \
--name $GPU_NODEPOOL_NAME \
--node-count 0 \
--node-vm-size Standard_NC24ads_A100_v4 \
--node-taints wallaroo.ai/pipelines=true:NoSchedule \
--aks-custom-headers UseGPUDedicatedVHD=true \
--enable-cluster-autoscaler \
--min-count 0 \
--max-count 1 \
--labels wallaroo.ai/node-purpose=pipelines {add custom label here} # node-purpose is a required label; custom label - at least one custom label is required
Azure Mainpool Nodepool Sample
az aks create \
--resource-group $WALLAROO_RESOURCE_GROUP \
--name $WALLAROO_CLUSTER \
--node-count 3 \
--generate-ssh-keys \
--vm-set-type VirtualMachineScaleSets \
--load-balancer-sku standard \
--node-vm-size $WALLAROO_VM_SIZE \
--node-osdisk-size 250 \
--nodepool-name general \
--attach-acr $WALLAROO_CONTAINER_REGISTRY \
--kubernetes-version=1.30 \
--zones 1 \
--location $WALLAROO_GROUP_LOCATION \
--nodepool-labels wallaroo.ai/node-purpose=general
Google Cloud Platform
- Ephemeral Storage: 250 GB
- Recommended Instance Type: a2-ultragpu-1g
The following GPU nodepool and mainpool configurations are based on using the Google gcloud Command Line Interface (CLI).
GCP GPU Nodepool Sample
Before setting up the GCP GPU nodepool, install the NVIDIA drivers on the Kubernetes cluster with the following command.
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded-latest.yaml
GCP_PROJECT="YOUR GCP PROJECT"
GCP_CLUSTER="YOUR CLUSTER NAME"
GPU_NODEPOOL_NAME="YOUR GPU NODEPOOL NAME"
REGION="YOUR REGION"
# node-purpose is a required label; at least one custom label is also required.
# Separate multiple labels with commas.
gcloud container \
--project $GCP_PROJECT \
node-pools create $GPU_NODEPOOL_NAME \
--cluster $GCP_CLUSTER \
--region $REGION \
--machine-type "a2-ultragpu-1g" \
--accelerator "type=nvidia-tesla-a100,count=1,gpu-driver-version=default" \
--image-type "COS_CONTAINERD" \
--disk-type "pd-balanced" \
--disk-size "100" \
--node-labels wallaroo.ai/node-purpose=pipelines,{add custom label here} \
--node-taints=wallaroo.ai/pipelines=true:NoSchedule \
--metadata disable-legacy-endpoints=true \
--scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
--num-nodes "1" \
--enable-autoscaling \
--min-nodes "0" \
--max-nodes "1" \
--location-policy "BALANCED" \
--enable-autoupgrade \
--enable-autorepair \
--max-surge-upgrade 1 \
--max-unavailable-upgrade 0
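The Azure and GCP samples pass node labels in different shapes: the Azure CLI `--labels` option takes space-separated `key=value` pairs, while the gcloud `--node-labels` option takes a single comma-separated list. The following sketch renders both forms from one dictionary so the custom label placeholder can be filled in consistently. The helper names `az_labels` and `gcloud_labels` and the `wallaroo.ai/accelerator` custom label are illustrative assumptions, not part of either CLI.

```python
def az_labels(labels):
    """Render labels for the Azure CLI: space-separated key=value pairs."""
    return " ".join("{}={}".format(key, value) for key, value in labels.items())


def gcloud_labels(labels):
    """Render labels for gcloud: one comma-separated key=value list."""
    return ",".join("{}={}".format(key, value) for key, value in labels.items())


# The required label plus one hypothetical custom label.
labels = {
    "wallaroo.ai/node-purpose": "pipelines",
    "wallaroo.ai/accelerator": "a100",
}
print("--labels " + az_labels(labels))
print("--node-labels " + gcloud_labels(labels))
```

Keeping the labels in one dictionary and rendering them per CLI avoids the two nodepool definitions drifting apart when a custom label is renamed.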
GCP Mainpool Nodepool Sample
gcloud container clusters \
create $WALLAROO_CLUSTER \
--region $WALLAROO_GCP_REGION \
--node-locations $WALLAROO_NODE_LOCATION \
--machine-type $DEFAULT_VM_SIZE \
--disk-size 250 \
--network $WALLAROO_GCP_NETWORK_NAME \
--create-subnetwork name=$WALLAROO_GCP_SUBNETWORK_NAME \
--enable-ip-alias \
--node-labels=wallaroo.ai/node-purpose=general \
--cluster-version=1.30