Deployment on ROCm



LLMs deployed under the Wallaroo SGLang framework can take advantage of AMD ROCm, which provides GPU acceleration for Generative AI workloads on x64 infrastructures.

AI/ML models can be deployed to centralized Wallaroo Ops instances and edge devices on a variety of infrastructures and processors. The target infrastructure and acceleration are set during the model upload and packaging stage.

LLMs packaged with the Wallaroo SGLang framework and assigned the ROCm accelerator during upload and automated model packaging can be deployed to Wallaroo Ops instances or multicloud deployments.

ROCm Support

For details on using AMD ROCm with Wallaroo and setting up a demonstration, see the following sections.

Model Packaging and Deployments Prerequisites for ROCm

To upload and package a model for Wallaroo Ops or multicloud edge deployments, the following prerequisites must be met.

  • Wallaroo Ops
    • At least one ROCm node deployed in the cluster.
  • Edge Devices
    • Enable Edge Registry Services in the Wallaroo instance to publish the pipeline to an OCI (Open Container Initiative) registry for edge deployments.
    • A ROCm-supported AMD GPU on the edge device.

AI Workloads for ROCm via the Wallaroo SDK

The Wallaroo SDK provides ROCm support for models uploaded for Wallaroo Ops or multicloud edge deployments.

Upload Models for ROCm via the Wallaroo SDK

Models are uploaded to Wallaroo via the wallaroo.client.upload_model method. The hardware acceleration is set with the optional accel parameter, which accepts a wallaroo.engine_config.Acceleration object.

wallaroo.client.upload_model takes the following parameters. For more details on model uploads, see Automated Model Packaging.

  • name: string (Required). The name of the model. Model names are unique per workspace. Models uploaded with the same name are assigned as a new version of the model.
  • path: string (Required). The path to the model file being uploaded.
  • framework: string (Required). The framework of the model, from wallaroo.framework.
  • input_schema: pyarrow.lib.Schema (Optional for native Wallaroo runtimes, Required for non-native Wallaroo runtimes). The input schema in Apache Arrow schema format.
  • output_schema: pyarrow.lib.Schema (Optional for native Wallaroo runtimes, Required for non-native Wallaroo runtimes). The output schema in Apache Arrow schema format.
  • convert_wait: bool (Optional).
    • True: Waits in the script for the model conversion to complete.
    • False: Proceeds with the script without waiting for the model conversion process to complete.
  • accel: wallaroo.engine_config.Acceleration (Optional). The AI hardware accelerator used. If a model is intended for use with a hardware accelerator, it is assigned at this step.
    • wallaroo.engine_config.Acceleration._None (Default): No accelerator is assigned. This works for all infrastructures.
    • wallaroo.engine_config.Acceleration.ROCm: AI acceleration for AMD ROCm.

Upload Model for AMD ROCm Acceleration Example

The following demonstrates uploading a model for deployment with AMD ROCm acceleration.

import wallaroo

# set the Wallaroo client
wl = wallaroo.Client()

# upload the model and save the reference to a variable
rocm_model = wl.upload_model(
    name="sample_model",
    path=model_file_path,
    framework=framework, # the wallaroo.framework.Framework
    input_schema = input_schema, # input schema in PyArrow Schema format
    output_schema = output_schema, # output schema in PyArrow Schema format
    accel = wallaroo.engine_config.Acceleration.ROCm
)

Deploy Models for ROCm via the Wallaroo SDK

Models are added to a pipeline as pipeline steps. The pipeline is then deployed through the wallaroo.pipeline.Pipeline.deploy(deployment_config: Optional[wallaroo.deployment_config.DeploymentConfig] = None) method.

For full details, see Pipeline Deployment Configuration.

When deploying a model in a Wallaroo Ops instance, the deployment configuration inherits the model's architecture setting. No additional changes are needed to set the architecture when deploying the model. Other settings, such as the number of CPUs, can be changed without modifying the architecture setting.

To change the architecture or acceleration settings for a model deployment, re-upload the model as either a new model or a new model version to ensure maximum compatibility with the hardware infrastructure.

The following demonstrates deploying a generic AI/ML model with the ROCm accelerator. For this example, the model is deployed with a pre-determined deployment configuration saved to deployment_config.

# create the pipeline
pipeline = wl.build_pipeline("sample_pipeline")

# add the ROCm accelerated model as a pipeline step
pipeline.add_model_step(rocm_model)

# deploy the pipeline with the deployment configuration
pipeline.deploy(deployment_config)
name: sample_pipeline
created: 2024-03-05 16:18:38.768602+00:00
last_updated: 2024-04-03 21:46:21.865211+00:00
deployed: True
arch: x86
accel: rocm
tags:
versions: d033152c-494c-44a6-8981-627c6b6ad72e
steps: sample_model
published: False

Publish Pipeline for ROCm via the Wallaroo SDK

Publishing the pipeline uses the wallaroo.pipeline.Pipeline.publish(deployment_config: Optional[wallaroo.deployment_config.DeploymentConfig]) method.

This requires that the Wallaroo Ops instance have Edge Registry Services enabled.

A deployment configuration must be included with the pipeline publish, even if no changes to the cpus, memory, etc are made. For more detail on deployment configurations, see Pipeline Deployment Configuration.

The deployment configuration for the pipeline publish inherits the model’s architecture. Options such as the number of cpus, amount of memory, etc can be adjusted without impacting the model’s architecture settings.

Pipelines do not need to be deployed in the Wallaroo Ops instance before publishing the pipeline.

For more information, see Wallaroo SDK Essentials Guide: Pipeline Edge Publication.

The following demonstrates deploying the generic model uploaded earlier.

# default deployment configuration
publish = pipeline.publish(deployment_config=wallaroo.DeploymentConfigBuilder().build())
display(publish)

Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing................ Published.
ID: 15
Pipeline Name: sample_pipeline
Pipeline Version: d033152c-494c-44a6-8981-627c6b6ad72e
Status: Published
Engine URL: registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-rocm:v2026.1.0-main
Pipeline URL: registry.example.com/uat/pipelines/sample_pipeline:d033152c-494c-44a6-8981-627c6b6ad72e
Helm Chart URL: oci://registry.example.com/uat/charts/sample_pipeline
Helm Chart Reference: registry.example.com/uat/charts@sha256:7e2a314d9024cc2529be3e902eb24ac241f1e0819fc07e47bf26dd2e6e64f183
Helm Chart Version: 0.0.1-d033152c-494c-44a6-8981-627c6b6ad72e
Engine Config: {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'rocm', 'arch': 'x86', 'gpu': True}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}}
User Images: []
Created By: john.hummel@wallaroo.ai
Created At: 2024-04-03 21:50:14.306316+00:00
Updated At: 2024-04-03 21:50:14.306316+00:00
Replaces:
Docker Run Command:

docker run \
    -p $EDGE_PORT:8080 \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=registry.example.com/uat/pipelines/sample_pipeline:d033152c-494c-44a6-8981-627c6b6ad72e \
    -e CONFIG_CPUS=1 registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-rocm:v2026.1.0-main

Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables.
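For example, the required environment variables might be set as follows before running the docker command; the credential values shown are placeholders for your own registry credentials.

```shell
# Placeholders: substitute the credentials for your OCI registry.
export EDGE_PORT=8080
export OCI_USERNAME=registry-user
export OCI_PASSWORD=registry-password

echo "Edge port: $EDGE_PORT"
```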
Helm Install Command:

helm install --atomic $HELM_INSTALL_NAME \
    oci://registry.example.com/uat/charts/sample_pipeline \
    --namespace $HELM_INSTALL_NAMESPACE \
    --version 0.0.1-d033152c-494c-44a6-8981-627c6b6ad72e \
    --set ociRegistry.username=$OCI_USERNAME \
    --set ociRegistry.password=$OCI_PASSWORD

Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables.
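For example, the helm environment variables might be set as follows; the release and namespace names shown are placeholders chosen for illustration.

```shell
# Placeholders: substitute values for your environment.
export HELM_INSTALL_NAME=sample-pipeline-edge
export HELM_INSTALL_NAMESPACE=wallaroo-edge
export OCI_USERNAME=registry-user
export OCI_PASSWORD=registry-password

echo "Installing $HELM_INSTALL_NAME in namespace $HELM_INSTALL_NAMESPACE"
```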

Tutorials

The following tutorials demonstrate deploying models on the ROCm accelerator.