Deployment on ROCm
LLMs deployed under the Wallaroo SGLang framework can take advantage of AMD ROCm, which provides Generative AI acceleration on x64 infrastructure.
AI/ML models can be deployed in centralized Wallaroo Ops instances and on edge devices across a variety of infrastructures and processors. The infrastructure is set during the model upload and packaging stage.
LLMs packaged with the Wallaroo SGLang framework, with the ROCm accelerator specified during upload and automated model packaging, can be deployed on Wallaroo Ops instances or multicloud deployments.
ROCm Support
For details on using AMD ROCm with Wallaroo and setting up a demonstration:
- Contact your Wallaroo Support Representative OR
- Schedule Your Wallaroo.AI Demo Today
Model Packaging and Deployments Prerequisites for ROCm
To upload and package a model for Wallaroo Ops or multicloud edge deployments, the following prerequisites must be met.
- Wallaroo Ops
- At least one ROCm node deployed in the cluster.
- Edge Devices
- Enable Edge Registry Services in the Wallaroo instance to publish the pipeline to an OCI (Open Container Initiative) registry for edge deployments.
- ROCm processor support for the edge device.
AI Workloads for ROCm via the Wallaroo SDK
The Wallaroo SDK provides ROCm support for models uploaded for Wallaroo Ops or multicloud edge deployments.
Upload Models for ROCm via the Wallaroo SDK
Models are uploaded to Wallaroo via the wallaroo.client.upload_model method. The accelerator is set with the optional accel parameter, which accepts a wallaroo.engine_config.Acceleration object.
wallaroo.client.upload_model has the following parameters. For more details on model uploads, see Automated Model Packaging.
| Parameter | Type | Description |
|---|---|---|
| name | string (Required) | The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model. |
| path | string (Required) | The path to the model file being uploaded. |
| framework | string (Required) | The framework of the model from wallaroo.framework. |
| input_schema | pyarrow.lib.Schema | The input schema in Apache Arrow schema format. |
| output_schema | pyarrow.lib.Schema | The output schema in Apache Arrow schema format. |
| convert_wait | bool (Optional) | If True (the default), waits for the model packaging process to complete before returning; if False, returns immediately while packaging continues in the background. |
| accel | wallaroo.engine_config.Acceleration (Optional) | The AI hardware accelerator used. If a model is intended for use with a hardware accelerator, it should be assigned at this step. |
Upload Model for AMD ROCm Acceleration Example
The following demonstrates uploading a model for deployment with AMD ROCm acceleration.
import wallaroo
# set the Wallaroo client
wl = wallaroo.Client()
# upload the model and save the reference to a variable
rocm_model = wl.upload_model(
name="sample_model",
path=model_file_path,
framework=framework, # the wallaroo.framework.Framework
input_schema = input_schema, # input schema in PyArrow Schema format
    output_schema = output_schema, # output schema in PyArrow Schema format
accel = wallaroo.engine_config.Acceleration.ROCm
)
Deploy Models for ROCm via the Wallaroo SDK
Models are added to pipelines as pipeline steps. Models are then deployed through the wallaroo.pipeline.Pipeline.deploy(deployment_config: Optional[wallaroo.deployment_config.DeploymentConfig] = None) method.
For full details, see Pipeline Deployment Configuration.
When deploying a model in a Wallaroo Ops instance, the deployment configuration inherits the model's architecture setting. No additional changes are needed to set the architecture when deploying the model. Other settings, such as the number of CPUs, can be changed without modifying the architecture setting.
To change the architecture or acceleration settings for model deployment, models should be re-uploaded as either a new model or a new model version for maximum compatibility with the hardware infrastructure.
The following demonstrates deploying a generic AI/ML model with the architecture set to ROCm. For this example, the model is deployed with a pre-determined deployment configuration saved to deployment_config.
# create the pipeline
pipeline = wl.build_pipeline("sample_pipeline")
# set the pipeline model step as the model set to the ROCm accelerator
pipeline.add_model_step(rocm_model)
# deploy the pipeline with the deployment configuration
pipeline.deploy(deployment_config)
| name | sample_pipeline |
| created | 2024-03-05 16:18:38.768602+00:00 |
| last_updated | 2024-04-03 21:46:21.865211+00:00 |
| deployed | True |
| arch | x86 |
| accel | rocm |
| tags | |
| versions | d033152c-494c-44a6-8981-627c6b6ad72e |
| steps | sample_model |
| published | False |
Publish Pipeline for ROCm via the Wallaroo SDK
Publishing the pipeline uses the method wallaroo.pipeline.Pipeline.publish(deployment_config: Optional[wallaroo.deployment_config.DeploymentConfig]).
This requires that the Wallaroo Ops instance have Edge Registry Services enabled.
A deployment configuration must be included with the pipeline publish, even if no changes are made to the CPUs, memory, or other settings. For more detail on deployment configurations, see Pipeline Deployment Configuration.
The deployment configuration for the pipeline publish inherits the model's architecture. Options such as the number of CPUs and the amount of memory can be adjusted without impacting the model's architecture settings.
Pipelines do not need to be deployed in the Wallaroo Ops instance before publishing the pipeline.
For more information, see Wallaroo SDK Essentials Guide: Pipeline Edge Publication.
The following demonstrates publishing the pipeline with the model uploaded earlier.
# default deployment configuration
publish = pipeline.publish(deployment_config=wallaroo.DeploymentConfigBuilder().build())
display(publish)
Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing................ Published.
| ID | 15 |
| Pipeline Name | sample_pipeline |
| Pipeline Version | d033152c-494c-44a6-8981-627c6b6ad72e |
| Status | Published |
| Engine URL | registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-ppc64le:v2026.1.0-main |
| Pipeline URL | registry.example.com/uat/pipelines/sample_pipeline:d033152c-494c-44a6-8981-627c6b6ad72e |
| Helm Chart URL | oci://registry.example.com/uat/charts/sample_pipeline |
| Helm Chart Reference | registry.example.com/uat/charts@sha256:7e2a314d9024cc2529be3e902eb24ac241f1e0819fc07e47bf26dd2e6e64f183 |
| Helm Chart Version | 0.0.1-d033152c-494c-44a6-8981-627c6b6ad72e |
| Engine Config | {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'rocm', 'arch': 'x86', 'gpu': True}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}} |
| User Images | [] |
| Created By | john.hummel@wallaroo.ai |
| Created At | 2024-04-03 21:50:14.306316+00:00 |
| Updated At | 2024-04-03 21:50:14.306316+00:00 |
| Replaces | |
| Docker Run Command | Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables. |
| Helm Install Command | Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables. |
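The exact Docker Run Command is generated as part of the publish output. As a rough sketch only: the flag layout and the PIPELINE_URL variable below are assumptions rather than confirmed Wallaroo conventions, and the registry paths are copied from the example publish output above; substitute the Engine URL and Pipeline URL from your own publish.

```shell
# Sketch only, under assumed conventions: use the command from your own publish output.
export EDGE_PORT=8080
export OCI_USERNAME="<registry username>"
export OCI_PASSWORD="<registry password or token>"

docker run -p $EDGE_PORT:8080 \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=registry.example.com/uat/pipelines/sample_pipeline:d033152c-494c-44a6-8981-627c6b6ad72e \
    registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-ppc64le:v2026.1.0-main
```

The edge device must also provide the ROCm processor support noted in the prerequisites above.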
Tutorials
The following tutorials demonstrate deploying models on the ROCm accelerator.