Deployment on ROCm
LLMs deployed under the Wallaroo SGLang framework can take advantage of AMD ROCm, which provides Generative AI acceleration on x64 infrastructure.
AI/ML models can be deployed in centralized Wallaroo Ops instances and on edge devices across a variety of infrastructures and processors. The infrastructure is set during the model upload and packaging stage.
LLMs packaged with the Wallaroo SGLang framework, with the ROCm accelerator specified during upload and automated model packaging, can be deployed on Wallaroo Ops instances or multicloud deployments.
ROCm Support
For details on using AMD ROCm with Wallaroo and setting up a demonstration:
- Contact your Wallaroo Support Representative OR
- Schedule Your Wallaroo.AI Demo Today
Model Packaging and Deployments Prerequisites for ROCm
To upload and package a model for Wallaroo Ops or multicloud edge deployments, the following prerequisites must be met.
- Wallaroo Ops
- At least one ROCm node deployed in the cluster.
- Edge Devices
- Enable Edge Registry Services in the Wallaroo instance to publish the pipeline to an OCI (Open Container Initiative) registry for edge deployments.
- ROCm processor support for the edge device.
AI Workloads for ROCm via the Wallaroo SDK
The Wallaroo SDK provides ROCm support for models uploaded for Wallaroo Ops or multicloud edge deployments.
Upload Models for ROCm via the Wallaroo SDK
Models are uploaded to Wallaroo via the wallaroo.client.upload_model method. The accelerator is set with the optional accel parameter, which accepts a wallaroo.engine_config.Acceleration object.
wallaroo.client.upload_model has the following parameters. For more details on model uploads, see Automated Model Packaging.
| Parameter | Type | Description |
|---|---|---|
| name | string (Required) | The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model. |
| path | string (Required) | The path to the model file being uploaded. |
| framework | string (Required) | The framework of the model from wallaroo.framework. |
| input_schema | pyarrow.lib.Schema | The input schema in Apache Arrow schema format. |
| output_schema | pyarrow.lib.Schema | The output schema in Apache Arrow schema format. |
| convert_wait | bool (Optional) | If True (the default), waits for the model packaging process to complete before returning; if False, returns immediately while packaging continues in the background. |
| accel | wallaroo.engine_config.Acceleration (Optional) | The AI hardware accelerator used. If a model is intended for use with a hardware accelerator, it should be assigned at this step. |
Upload Model for AMD ROCm Acceleration Example
The following demonstrates uploading a model for deployment with AMD ROCm acceleration.
import wallaroo
# set the Wallaroo client
wl = wallaroo.Client()
# upload the model and save the reference to a variable
rocm_model = wl.upload_model(
name="sample_model",
path=model_file_path,
framework=framework, # the wallaroo.framework.Framework
input_schema = input_schema, # input schema in PyArrow Schema format
    output_schema = output_schema, # output schema in PyArrow Schema format
accel = wallaroo.engine_config.Acceleration.ROCm
)
Deploy Models for ROCm via the Wallaroo SDK
Models are added to pipelines as pipeline steps. Models are then deployed through the wallaroo.pipeline.Pipeline.deploy(deployment_config: Optional[wallaroo.deployment_config.DeploymentConfig] = None) method.
For full details, see Pipeline Deployment Configuration.
When deploying a model in a Wallaroo Ops instance, the deployment configuration inherits the model's architecture setting. No additional changes are needed to set the architecture when deploying the model. Other settings, such as the number of CPUs, can be changed without modifying the architecture setting.
To change the architecture or acceleration settings for model deployment, models should be re-uploaded as either a new model or a new model version for maximum compatibility with the hardware infrastructure.
The following demonstrates deploying a generic AI/ML model with the architecture set to ROCm. For this example, the model is deployed with a pre-determined deployment configuration saved to deployment_config.
# create the pipeline
pipeline = wl.build_pipeline("sample_pipeline")
# set the pipeline model step as the model set to the ROCm accelerator
pipeline.add_model_step(rocm_model)
# deploy the pipeline with the deployment configuration
pipeline.deploy(deployment_config)
| name | sample_pipeline |
| created | 2024-03-05 16:18:38.768602+00:00 |
| last_updated | 2024-04-03 21:46:21.865211+00:00 |
| deployed | True |
| arch | x86 |
| accel | rocm |
| tags | |
| versions | d033152c-494c-44a6-8981-627c6b6ad72e |
| steps | sample_model |
| published | False |
Publish Pipeline for ROCm via the Wallaroo SDK
Publishing the pipeline uses the method wallaroo.pipeline.Pipeline.publish(deployment_config: Optional[wallaroo.deployment_config.DeploymentConfig]).
This requires that the Wallaroo Ops instance have Edge Registry Services enabled.
A deployment configuration must be included with the pipeline publish, even if no changes are made to the CPUs, memory, or other settings. For more detail on deployment configurations, see Pipeline Deployment Configuration.
The deployment configuration for the pipeline publish inherits the model's architecture. Options such as the number of CPUs and the amount of memory can be adjusted without impacting the model's architecture settings.
Pipelines do not need to be deployed in the Wallaroo Ops instance before publishing the pipeline.
For more information, see Wallaroo SDK Essentials Guide: Pipeline Edge Publication.
The following demonstrates publishing the pipeline with the model uploaded earlier.
# default deployment configuration
publish = pipeline.publish(deployment_config=wallaroo.DeploymentConfigBuilder().build())
display(publish)
Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing................ Published.
| ID | 15 |
| Pipeline Name | sample_pipeline |
| Pipeline Version | d033152c-494c-44a6-8981-627c6b6ad72e |
| Status | Published |
| Engine URL | registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-ppc64le:v2026.1.0-main |
| Pipeline URL | registry.example.com/uat/pipelines/sample_pipeline:d033152c-494c-44a6-8981-627c6b6ad72e |
| Helm Chart URL | oci://registry.example.com/uat/charts/sample_pipeline |
| Helm Chart Reference | registry.example.com/uat/charts@sha256:7e2a314d9024cc2529be3e902eb24ac241f1e0819fc07e47bf26dd2e6e64f183 |
| Helm Chart Version | 0.0.1-d033152c-494c-44a6-8981-627c6b6ad72e |
| Engine Config | {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'rocm', 'arch': 'x86', 'gpu': True}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}} |
| User Images | [] |
| Created By | john.hummel@wallaroo.ai |
| Created At | 2024-04-03 21:50:14.306316+00:00 |
| Updated At | 2024-04-03 21:50:14.306316+00:00 |
| Replaces | |
| Docker Run Command | Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables. |
| Helm Install Command | Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables. |
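The exact Docker Run Command is generated as part of the publish output. As a rough sketch only: the flag layout and the PIPELINE_URL variable below are assumptions rather than confirmed Wallaroo conventions, and the registry paths are copied from the example publish output above; substitute the Engine URL and Pipeline URL from your own publish.

```shell
# Sketch only, under assumed conventions: use the command from your own publish output.
export EDGE_PORT=8080
export OCI_USERNAME="<registry username>"
export OCI_PASSWORD="<registry password or token>"

docker run -p $EDGE_PORT:8080 \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=registry.example.com/uat/pipelines/sample_pipeline:d033152c-494c-44a6-8981-627c6b6ad72e \
    registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-ppc64le:v2026.1.0-main
```

The edge device must also provide the ROCm processor support noted in the prerequisites above.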
Tutorials
The following tutorials demonstrate deploying models on the ROCm accelerator.