Inference with Acceleration Libraries

How to package models to run with hardware accelerators

Wallaroo supports deploying models on hardware accelerators that increase inference speed and performance.

Hardware accelerators are set when a model is uploaded. That setting determines what hardware is allocated when the model is deployed in Wallaroo Ops and in multicloud deployments.

Prerequisites

The following prerequisites must be met before uploading and deploying models with hardware accelerators.

Supported Accelerators

The following accelerators are supported:

Accelerator | ARM Support | X64/X86 Support | Description
None | Yes | Yes | The default acceleration, used for all scenarios and architectures.
AIO | Yes | No | AIO acceleration for Ampere Optimized trained models, only available with ARM processors.
Jetson | Yes | No | Nvidia Jetson acceleration is a CUDA based accelerator used with edge deployments on ARM processors.
CUDA | Yes | Yes | Nvidia CUDA acceleration supported by both ARM and X64/X86 processors. This is intended for deployment with GPUs.
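
The support matrix above can be captured in a small lookup that checks an accelerator and architecture pair before a model is uploaded. This is a minimal sketch in plain Python: the SUPPORTED_ARCHITECTURES table and the is_supported helper are hypothetical names used for illustration and are not part of the Wallaroo SDK, and the lowercase strings are an assumption modeled on the values this page prints in its example output (arm, x86, aio, cuda).

```python
# Hypothetical lookup built from the support table above; not part of the Wallaroo SDK.
SUPPORTED_ARCHITECTURES = {
    "none": {"arm", "x86"},    # default, used for all architectures
    "aio": {"arm"},            # Ampere Optimized models, ARM only
    "jetson": {"arm"},         # Nvidia Jetson edge deployments, ARM only
    "cuda": {"arm", "x86"},    # Nvidia CUDA, both architectures
}


def is_supported(accel: str, arch: str) -> bool:
    """Return True when the accelerator is listed for the architecture."""
    return arch.lower() in SUPPORTED_ARCHITECTURES.get(accel.lower(), set())


print(is_supported("aio", "arm"))   # True
print(is_supported("aio", "x86"))   # False
```

A check like this only mirrors the table; the Wallaroo instance remains the authority on what can actually be deployed.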

Set Accelerator via the Wallaroo SDK

Accelerators are set during model upload and packaging. To change the accelerator used with a model, re-upload the model with the new accelerator setting for maximum compatibility and support.

Model Accelerator at Model Upload

Models uploaded to Wallaroo have the accelerator set through the optional accel parameter of the wallaroo.client.upload_model method, which takes a wallaroo.engine_config.Acceleration value. For more details on model uploads and other parameters, see Automated Model Packaging.

Parameter | Type | Description
name | string (Required) | The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model.
path | string (Required) | The path to the model file being uploaded.
framework | string (Required) | The framework of the model from wallaroo.framework.
input_schema | pyarrow.lib.Schema | The input schema in Apache Arrow schema format.
  • Native Wallaroo Runtimes: (Optional)
  • Non-Native Wallaroo Runtimes: (Required)
output_schema | pyarrow.lib.Schema | The output schema in Apache Arrow schema format.
  • Native Wallaroo Runtimes: (Optional)
  • Non-Native Wallaroo Runtimes: (Required)
convert_wait | bool (Optional) | Controls whether the script waits for the model conversion:
  • True: Waits in the script for the model conversion to complete.
  • False: Proceeds with the script without waiting for the model conversion to complete.
arch | wallaroo.engine_config.Architecture (Optional) | The architecture the model is deployed to. If a model is intended for deployment to an ARM architecture, it must be specified during this step. Values include:
  • X86 (Default): x86 based architectures.
  • ARM: ARM based architectures.
accel | wallaroo.engine_config.Acceleration (Optional) | The AI hardware accelerator used. If a model is intended for use with a hardware accelerator, it should be assigned at this step. Values include:
  • wallaroo.engine_config.Acceleration._None (Default): No accelerator is assigned. This works for all infrastructures.
  • wallaroo.engine_config.Acceleration.AIO: AIO acceleration for Ampere Optimized trained models, only available with ARM processors.
  • wallaroo.engine_config.Acceleration.Jetson: Nvidia Jetson acceleration used with edge deployments with ARM processors.
  • wallaroo.engine_config.Acceleration.CUDA: Nvidia CUDA acceleration supported by both ARM and X64/X86 processors. This is intended for deployment with GPUs.
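
The schema rule in the parameter list (optional for native runtimes, required for non-native runtimes) can be checked before the upload call is made. The following is a rough sketch in plain Python, not Wallaroo code: check_upload_args is a hypothetical helper, the native_runtime flag is an assumption supplied by the caller because this page does not list which frameworks run natively, and the file path is a placeholder.

```python
from typing import Any, List, Optional


def check_upload_args(
    name: str,
    path: str,
    native_runtime: bool,
    input_schema: Optional[Any] = None,
    output_schema: Optional[Any] = None,
) -> List[str]:
    """Return a list of problems found; an empty list means nothing is missing."""
    problems = []
    if not name:
        problems.append("name is required")
    if not path:
        problems.append("path is required")
    # Non-native runtimes require both schemas; native runtimes may omit them.
    if not native_runtime:
        if input_schema is None:
            problems.append("input_schema is required for non-native runtimes")
        if output_schema is None:
            problems.append("output_schema is required for non-native runtimes")
    return problems


# Placeholder path; both schemas are missing, so two problems are reported.
print(check_upload_args("hf-summarization", "./models/model.zip", native_runtime=False))
```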

Deploy Models with Accelerator Settings

Models are added to a pipeline as pipeline steps. Models are then deployed through the wallaroo.pipeline.Pipeline.deploy(deployment_config: Optional[wallaroo.deployment_config.DeploymentConfig] = None) method.

For full details, see Pipeline Deployment Configuration.

When a model is deployed, the deployment configuration inherits the model's accelerator setting. No additional changes are needed to set the accelerator when deploying the model. Other settings, such as the number of CPUs, can be changed without modifying the accelerator setting.

To change the accelerator settings, models should be re-uploaded as either a new model or a new model version for maximum compatibility with the hardware infrastructure. For more information on uploading models or new model versions, see Model Accelerator at Model Upload.
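
The inheritance described above can be pictured as a merge: resource settings come from the deployment configuration, while the architecture and accelerator come from the model. The sketch below is an illustrative model only, not Wallaroo's implementation. The effective_engine_config function is a hypothetical name, and the dictionary layout is an assumption loosely modeled on the deployment configuration display in the CUDA example on this page.

```python
def effective_engine_config(model: dict, resources: dict) -> dict:
    """Combine deployment resources with the model's architecture and accelerator."""
    engine = {"resources": {"limits": dict(resources), "requests": dict(resources)}}
    # The architecture and accelerator are taken from the model, not from the resources.
    engine["arch"] = model["arch"]
    engine["accel"] = model["accel"]
    return {"engine": engine}


model = {"name": "computer-vision-resnet50-arm", "arch": "arm", "accel": "aio"}

# Changing the resources leaves the accelerator untouched.
small = effective_engine_config(model, {"cpu": 1.0, "memory": "2Gi"})
large = effective_engine_config(model, {"cpu": 4.0, "memory": "8Gi"})
print(small["engine"]["accel"], large["engine"]["accel"])  # aio aio
```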

Model Accelerator for Pipeline Publish

Publishing the pipeline uses the method wallaroo.pipeline.Pipeline.publish(deployment_config: Optional[wallaroo.deployment_config.DeploymentConfig]).

This requires that the Wallaroo Ops has Edge Registry Services enabled.

The deployment configuration for the pipeline publish inherits the model's accelerator. Options such as the number of CPUs and the amount of memory can be adjusted without impacting the model's accelerator settings.

A deployment configuration must be included with the pipeline publish, even if no changes to the CPUs, memory, or other settings are made. For more detail on deployment configurations, see Wallaroo SDK Essentials Guide: Pipeline Deployment Configuration.

Pipelines do not need to be deployed in the Wallaroo Ops instance before publishing the pipeline. This is useful in multicloud deployments to edge devices with different hardware accelerators than the Wallaroo Ops instance.

To change the model acceleration settings, upload the model as a new model or model version with the new acceleration settings.

For more information, see Wallaroo SDK Essentials Guide: Pipeline Edge Publication.

Model Deployment with Accelerators Via the Wallaroo SDK Examples

The following examples demonstrate how to:

  • Upload two models with accelerator settings:
    • A Resnet50 model set to ARM with the AIO accelerator.
    • A Hugging Face summarizer model set to the CUDA accelerator.
  • Add each model to a pipeline as a pipeline step with a deployment configuration, and show that the deployment configuration inherits the model's acceleration settings.
  • Publish each pipeline and show that the deployment configuration for the pipeline publish inherits the model's acceleration settings.

Resnet 50 Model for ARM and AIO

For this example, we will use a Resnet50 Computer Vision model.

We upload the model and set the architecture to ARM and the accelerator to AIO.

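import wallaroo
from wallaroo.framework import Framework

# connect to the Wallaroo instance
wl = wallaroo.Client()

model_name_arm = 'computer-vision-resnet50-arm'
model_file_name = './models/frcnn-resnet.pt.onnx'

# upload the model with the ARM architecture and the AIO accelerator
arm_model = wl.upload_model(model_name_arm,
                            model_file_name,
                            framework=Framework.ONNX,
                            arch=wallaroo.engine_config.Architecture.ARM,
                            accel=wallaroo.engine_config.Acceleration.AIO)
display(arm_model)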
Name | computer-vision-resnet50-arm
Version | 47743b5f-c88a-4150-a37f-9ad591eb4ee3
File Name | frcnn-resnet.pt.onnx
SHA | 43326e50af639105c81372346fb9ddf453fea0fe46648b2053c375360d9c1647
Status | ready
Image Path | None
Architecture | arm
Acceleration | aio
Updated At | 2024-03-Apr 22:13:40

We then build the pipeline, add the model as our model step, and deploy it with a deployment configuration that allocates 1 CPU and 2Gi of RAM. We then show the deployment configuration inherited the model’s acceleration settings.

pipeline_arm = wl.build_pipeline('acceleration-demonstration-arm')
pipeline_arm.clear()
pipeline_arm.add_model_step(arm_model)

deployment_config = wallaroo.DeploymentConfigBuilder() \
    .replica_count(1) \
    .cpus(1) \
    .memory("2Gi") \
    .build()

pipeline_arm.deploy(deployment_config = deployment_config)
display(pipeline_arm)
name | acceleration-demonstration-arm
created | 2024-04-01 18:36:26.347071+00:00
last_updated | 2024-04-03 22:14:42.912284+00:00
deployed | True
arch | arm
accel | aio
tags |
versions | 18329c99-4b9c-4a15-bc93-42e4d6b93fff, 2f1aa87e-edc2-4af7-8821-00ba54abf18e, 4c8ab1b1-f9c8-49d9-846a-54cad3a18b56, cbc520f2-5755-4f6b-8e89-b4374cb95fdf, 59ff6719-67f1-4359-a6b3-5565b9f6dc09, 39b91147-3b73-4f1a-a25f-500ef648bd6a, 45c0e8ba-b35d-4139-9675-aa7ffcc04dfc, 2d561d88-31f6-43c3-a84d-38cc1cd53fb8, ef9e2394-e29f-46dc-aaa4-eda0a304a71e, fe2b6f05-3623-4440-8258-9e5828bc7eaf, aa86387c-813a-40de-b07a-baf388e20d67
steps | computer-vision-resnet50-arm
published | True

We now publish the pipeline. Note that the Engine Config inherited the acceleration from the model.

# default deployment configuration
pub_arm = pipeline_arm.publish(deployment_config=wallaroo.DeploymentConfigBuilder().build())
display(pub_arm)
Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing....... Published.
ID | 87
Pipeline Name | acceleration-demonstration-arm
Pipeline Version | 890b56ee-2a0e-4ed1-ae96-c021ca801a7e
Status | Published
Engine URL | registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-aarch64:v2024.1.0-main-4870
Pipeline URL | registry.example.com/uat/pipelines/acceleration-demonstration-arm:890b56ee-2a0e-4ed1-ae96-c021ca801a7e
Helm Chart URL | oci://registry.example.com/uat/charts/acceleration-demonstration-arm
Helm Chart Reference | registry.example.com/uat/charts@sha256:15c50483f2010e2691d32d32ded595f20993fa7b043474962b0fa2b509b61510
Helm Chart Version | 0.0.1-890b56ee-2a0e-4ed1-ae96-c021ca801a7e
Engine Config | {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'aio', 'arch': 'arm', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}}
User Images | []
Created By | john.hummel@wallaroo.ai
Created At | 2024-04-03 22:17:03.122597+00:00
Updated At | 2024-04-03 22:17:03.122597+00:00
Replaces |
Docker Run Command
docker run \
    -p $EDGE_PORT:8080 \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=registry.example.com/uat/pipelines/acceleration-demonstration-arm:890b56ee-2a0e-4ed1-ae96-c021ca801a7e \
    -e CONFIG_CPUS=1 registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-aarch64:v2024.1.0-main-4870

Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables.
Helm Install Command
helm install --atomic $HELM_INSTALL_NAME \
    oci://registry.example.com/uat/charts/acceleration-demonstration-arm \
    --namespace $HELM_INSTALL_NAMESPACE \
    --version 0.0.1-890b56ee-2a0e-4ed1-ae96-c021ca801a7e \
    --set ociRegistry.username=$OCI_USERNAME \
    --set ociRegistry.password=$OCI_PASSWORD

Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables.
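
One way to confirm the inheritance is to read the architecture and accelerator out of the Engine Config shown in the publish output. The sketch below works on a plain dictionary copied from that output; how to obtain the dictionary from the publish object is not covered here, and accelerator_of is a hypothetical helper rather than an SDK function. It checks two locations, directly under engine and under engine.resources, as an assumption meant to cover both layouts that appear on this page.

```python
# Engine Config copied from the publish output above.
engine_config = {
    "engine": {
        "resources": {
            "limits": {"cpu": 1.0, "memory": "512Mi"},
            "requests": {"cpu": 1.0, "memory": "512Mi"},
            "accel": "aio",
            "arch": "arm",
            "gpu": False,
        }
    },
    "engineAux": {"autoscale": {"type": "none"}, "images": {}},
}


def accelerator_of(config: dict) -> tuple:
    """Return (arch, accel), looking under 'engine' first and then 'engine.resources'."""
    engine = config.get("engine", {})
    resources = engine.get("resources", {})
    arch = engine.get("arch", resources.get("arch"))
    accel = engine.get("accel", resources.get("accel"))
    return arch, accel


print(accelerator_of(engine_config))  # ('arm', 'aio')
```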

HF Summarizer with CUDA

For this example, we will use a Hugging Face summarizer model. When uploaded, this model is converted to the Wallaroo Containerized Runtime.

We upload the model and set the accelerator to CUDA.

# input_schema and output_schema are pyarrow schemas for this model (not shown here)
model = wl.upload_model('hf-summarization',
                        './models/hf-summarisation-bart-large-samsun.zip',
                        framework=Framework.HUGGING_FACE_SUMMARIZATION,
                        input_schema=input_schema,
                        output_schema=output_schema,
                        accel=wallaroo.engine_config.Acceleration.CUDA)
display(model)
Name | hf-summarization
Version | 47743b5f-c88a-4150-a37f-9ad591eb4ee3
File Name | hf-summarisation-bart-large-samsun.zip
SHA | ee71d066a83708e7ca4a3c07caf33fdc528bb000039b6ca2ef77fa2428dc6268
Status | ready
Image Path | proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs/mac-deploy:v2024.1.0-main-4921
Architecture | x86
Acceleration | cuda
Updated At | 2024-03-Apr 22:13:40

We then build the pipeline, add the model as our model step, and create a deployment configuration that allocates 1 CPU and 1 Gi of RAM to the engine, and 4 CPUs, 8 Gi of RAM, and 1 GPU to the Containerized Runtime. We then show that the deployment configuration inherited the model's acceleration settings without them being specified.

# build the pipeline and add the model as the pipeline step
pipeline = wl.build_pipeline('hf-summarization-pipeline')
pipeline.add_model_step(model)

# engine resources, then the resources for the model's Containerized Runtime
deployment_config = wallaroo.DeploymentConfigBuilder() \
    .cpus(1).memory('1Gi') \
    .sidekick_gpus(model, 1) \
    .sidekick_cpus(model, 4) \
    .sidekick_memory(model, '8Gi') \
    .deployment_label('wallaroo.ai/accelerator: a100') \
    .build()
display(deployment_config)

{'engine': {'cpu': 1,
  'resources': {'limits': {'cpu': 1, 'memory': '1Gi'},
   'requests': {'cpu': 1, 'memory': '1Gi'}},
  'node_selector': 'wallaroo.ai/accelerator: a100',
  'arch': 'x86',
  'accel': 'cuda'},
 'enginelb': {},
 'engineAux': {'images': {'hf-summarization-yns-65': {'resources': {'limits': {'nvidia.com/gpu': 1,
      'cpu': 4,
      'memory': '8Gi'},
     'requests': {'nvidia.com/gpu': 1, 'cpu': 4, 'memory': '8Gi'}}}}},
 'node_selector': {}}
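
The GPU allocation in a configuration like the one displayed above can be read programmatically. This is a small sketch over a trimmed copy of that dictionary. The gpu_limits helper is a hypothetical name, not an SDK call, and it assumes the nvidia.com/gpu key appears under each image's resources.limits as it does in the display.

```python
def gpu_limits(config: dict) -> dict:
    """Map each auxiliary image name to the Nvidia GPU count in its resource limits."""
    images = config.get("engineAux", {}).get("images", {})
    return {
        name: settings.get("resources", {}).get("limits", {}).get("nvidia.com/gpu", 0)
        for name, settings in images.items()
    }


# Trimmed copy of the configuration displayed above.
displayed_config = {
    "engine": {"cpu": 1, "arch": "x86", "accel": "cuda"},
    "engineAux": {
        "images": {
            "hf-summarization-yns-65": {
                "resources": {
                    "limits": {"nvidia.com/gpu": 1, "cpu": 4, "memory": "8Gi"},
                    "requests": {"nvidia.com/gpu": 1, "cpu": 4, "memory": "8Gi"},
                }
            }
        }
    },
}

print(gpu_limits(displayed_config))  # {'hf-summarization-yns-65': 1}
```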

We now publish the pipeline. Note that the Engine Config inherited the acceleration from the model.

# publish the pipeline; the accelerator is inherited from the model
pub_cuda = pipeline.publish(deployment_config)
display(pub_cuda)
Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing....... Published.
ID | 1
Pipeline Name | hf-summarization-pipeline
Pipeline Version | 6d453276-a4cf-4b01-90d7-78e9da1dd72a
Status | Published
Engine URL | ghcr.io/wallaroolabs/doc-samples/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-cuda:v2024.1.0-main-4921
Pipeline URL | ghcr.io/wallaroolabs/doc-samples/pipelines/hf-summarization-pipeline:6d453276-a4cf-4b01-90d7-78e9da1dd72a
Helm Chart URL | oci://ghcr.io/wallaroolabs/doc-samples/charts/hf-summarization-pipeline
Helm Chart Reference | ghcr.io/wallaroolabs/doc-samples/charts@sha256:a9406689f7429c16758447780c860ee41c78dc674280754eb2b377da1a9efbf4
Helm Chart Version | 0.0.1-6d453276-a4cf-4b01-90d7-78e9da1dd72a
Engine Config | {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'cuda', 'arch': 'x86', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}}
User Images | []
Created By | john.hansarick@wallaroo.ai
Created At | 2024-04-17 19:49:32.922418+00:00
Updated At | 2024-04-17 19:49:32.922418+00:00
Replaces |
Docker Run Command
docker run \
    -p $EDGE_PORT:8080 \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=ghcr.io/wallaroolabs/doc-samples/pipelines/hf-summarization-pipeline:6d453276-a4cf-4b01-90d7-78e9da1dd72a \
    -e CONFIG_CPUS=1 ghcr.io/wallaroolabs/doc-samples/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-cuda:v2024.1.0-main-4921

Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables.
Helm Install Command
helm install --atomic $HELM_INSTALL_NAME \
    oci://ghcr.io/wallaroolabs/doc-samples/charts/hf-summarization-pipeline \
    --namespace $HELM_INSTALL_NAMESPACE \
    --version 0.0.1-6d453276-a4cf-4b01-90d7-78e9da1dd72a \
    --set ociRegistry.username=$OCI_USERNAME \
    --set ociRegistry.password=$OCI_PASSWORD

Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables.
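
The Docker Run Command in the publish output depends on the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables. As a hedged sketch, the command can be assembled in Python so that a missing variable fails early with a KeyError instead of producing a broken command. The docker_run_args helper is a hypothetical name, the flags mirror the command shown above, and the demo values are placeholders rather than real credentials.

```python
import os
import shlex
from typing import List, Mapping


def docker_run_args(
    pipeline_url: str,
    engine_url: str,
    env: Mapping[str, str] = os.environ,
    cpus: int = 1,
) -> List[str]:
    """Build the docker run argument list; raises KeyError if a variable is unset."""
    return [
        "docker", "run",
        "-p", f"{env['EDGE_PORT']}:8080",
        "-e", f"OCI_USERNAME={env['OCI_USERNAME']}",
        "-e", f"OCI_PASSWORD={env['OCI_PASSWORD']}",
        "-e", f"PIPELINE_URL={pipeline_url}",
        "-e", f"CONFIG_CPUS={cpus}",
        engine_url,
    ]


# Placeholder environment for demonstration only.
demo_env = {"EDGE_PORT": "8080", "OCI_USERNAME": "example-user", "OCI_PASSWORD": "example-token"}
args = docker_run_args(
    "ghcr.io/wallaroolabs/doc-samples/pipelines/hf-summarization-pipeline:6d453276-a4cf-4b01-90d7-78e9da1dd72a",
    "ghcr.io/wallaroolabs/doc-samples/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-cuda:v2024.1.0-main-4921",
    env=demo_env,
)
print(shlex.join(args))
```

Returning a list rather than a single string keeps the arguments safe to pass to subprocess.run without shell quoting.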

Model Accelerator Deployment Troubleshooting

If the specified hardware accelerator or infrastructure is not available in the Wallaroo Ops cluster during deployment, the following error message is displayed:

Tutorials

The following examples are available to demonstrate uploading and publishing models with hardware accelerator support.