Inference on ARM Architecture

How to deploy ML models on ARM processors and infrastructure.

ML models can be deployed to centralized Wallaroo Ops instances and to edge devices across a variety of infrastructures and processors. The processor architecture is set during the model upload and packaging stage.

Models specified with the ARM architecture during the upload and automated model packaging can be deployed on Wallaroo Ops instances or multicloud deployments.

ARM Support from Cloud Providers

ARM processors for Kubernetes clusters in cloud environments are supported by the following providers.

Model Packaging and Deployments Prerequisites for ARM

To upload and package a model for Wallaroo Ops or multicloud edge deployments, the following prerequisites must be met.

  • Wallaroo Ops
  • Edge Devices
    • Enable Edge Registry Services in the Wallaroo instance to publish the pipeline to an OCI registry for edge deployments.
    • ARM processor support for the edge device.

AI Workloads for ARM via the Wallaroo SDK

The Wallaroo SDK provides ARM support for models uploaded for Wallaroo Ops or multicloud edge deployments.

Upload Models for ARM via the Wallaroo SDK

Models are uploaded to Wallaroo via the wallaroo.client.upload_model method. The infrastructure is set with the optional arch parameter, which accepts the wallaroo.engine_config.Architecture object.

wallaroo.client.upload_model has the following parameters. For more details on model uploads, see Automated Model Packaging.

  • name: string (Required). The name of the model. Model names are unique per workspace. Models uploaded with the same name are assigned as a new version of the model.
  • path: string (Required). The path to the model file being uploaded.
  • framework: string (Required). The framework of the model from wallaroo.framework.
  • input_schema: pyarrow.lib.Schema (Optional for native Wallaroo runtimes; Required for non-native Wallaroo runtimes). The input schema in Apache Arrow schema format.
  • output_schema: pyarrow.lib.Schema (Optional for native Wallaroo runtimes; Required for non-native Wallaroo runtimes). The output schema in Apache Arrow schema format.
  • convert_wait: bool (Optional).
    • True: Waits in the script for the model conversion to complete.
    • False: Proceeds with the script without waiting for the model conversion process to complete.
  • arch: wallaroo.engine_config.Architecture (Optional). The architecture the model is deployed to. If a model is intended for deployment to an ARM architecture, it must be specified during this step. Values include:
    • X86 (Default): x86-based architectures.
    • ARM: ARM-based architectures.
  • accel: wallaroo.engine_config.Acceleration (Optional). The AI hardware accelerator used. If a model is intended for use with a hardware accelerator, it should be assigned at this step. Values include:
    • wallaroo.engine_config.Acceleration._None (Default): No accelerator is assigned. This works for all infrastructures.
    • wallaroo.engine_config.Acceleration.AIO: AIO acceleration for Ampere Optimized trained models; only available with ARM processors.
    • wallaroo.engine_config.Acceleration.Jetson: Nvidia Jetson acceleration used with multicloud edge deployments with ARM processors.
    • wallaroo.engine_config.Acceleration.CUDA: Nvidia CUDA acceleration supported by both ARM and X64/X86 processors. This is intended for deployments with GPUs.
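
The architecture and accelerator compatibility rules above can be summarized in a small helper. This is a hypothetical illustration, not part of the Wallaroo SDK; the function name and the lowercase string values are assumptions for the sketch.

```python
# Hypothetical helper (NOT part of the Wallaroo SDK): checks an
# (architecture, accelerator) pairing against the rules listed above.
def accel_compatible(arch: str, accel: str) -> bool:
    if accel == "none":                 # no accelerator: valid on all infrastructures
        return True
    if accel == "cuda":                 # CUDA: both ARM and X64/X86 processors
        return arch in ("arm", "x86")
    if accel in ("aio", "jetson"):      # AIO and Jetson: ARM processors only
        return arch == "arm"
    return False                        # unknown accelerator

# A model uploaded with arch=ARM can pair with any listed accelerator:
assert all(accel_compatible("arm", a) for a in ("none", "aio", "jetson", "cuda"))
# AIO acceleration is rejected on x86:
assert not accel_compatible("x86", "aio")
```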

Deploy Models for ARM via the Wallaroo SDK

Models are added to pipelines as pipeline steps. Models are then deployed through the wallaroo.pipeline.Pipeline.deploy(deployment_config: Optional[wallaroo.deployment_config.DeploymentConfig] = None) method.

For full details, see Pipeline Deployment Configuration.

When deploying a model in a Wallaroo Ops instance, the deployment configuration inherits the model architecture setting. No additional changes are needed to set the architecture when deploying the model. Other settings, such as the number of CPUs, can be changed without modifying the architecture setting.

To change the architecture settings for model deployment, models should be re-uploaded as either a new model or a new model version for maximum compatibility with the hardware infrastructure. For more information on uploading models or new model versions, see Upload Models for ARM via the Wallaroo SDK.
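
As noted above, uploading under an existing model name registers a new version of that model rather than a separate model. The following toy sketch (plain Python, not Wallaroo code; all names are hypothetical) illustrates that bookkeeping:

```python
# Toy illustration (NOT Wallaroo code) of the versioning behavior described
# above: re-uploading under the same name appends a new model version.
registry: dict[str, list[dict]] = {}

def upload_model(name: str, arch: str) -> dict:
    # Each upload under an existing name becomes the next version.
    version = {"version": len(registry.get(name, [])) + 1, "arch": arch}
    registry.setdefault(name, []).append(version)
    return version

upload_model("hf-summarizer", "x86")        # initial upload: version 1, x86
v2 = upload_model("hf-summarizer", "arm")   # re-upload for ARM: version 2
assert v2 == {"version": 2, "arch": "arm"}
```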

Publish Pipeline for ARM via the Wallaroo SDK

Publishing the pipeline uses the method wallaroo.pipeline.Pipeline.publish(deployment_config: Optional[wallaroo.deployment_config.DeploymentConfig]).

This requires that the Wallaroo Ops instance have Edge Registry Services enabled.

A deployment configuration must be included with the pipeline publish, even if no changes are made to the CPUs, memory, or other settings. For more detail on deployment configurations, see Pipeline Deployment Configuration.

The deployment configuration for the pipeline publish inherits the model’s architecture. Options such as the number of CPUs and the amount of memory can be adjusted without impacting the model’s architecture settings.

Pipelines do not need to be deployed in the Wallaroo Ops instance before publishing the pipeline.

For more information, see Wallaroo SDK Essentials Guide: Pipeline Edge Publication.

Model Deployment on ARM Via the Wallaroo SDK Examples

The following examples demonstrate:

  • Uploading a model for packaging in the Wallaroo Ops instance with the arch set to ARM.
  • Creating a pipeline and adding the model as a pipeline step.
  • Deploying the pipeline and demonstrating the deployment configuration inherits the model’s architecture setting.
  • Publishing the pipeline to an OCI registry and demonstrating the deployment configuration inherits the model’s architecture setting.

Note that the arch and accel deployment configuration settings are not specified, as the deployment configuration inherits the model’s architecture settings.
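
To make that inheritance concrete, the sketch below (plain Python, not Wallaroo internals; the function and field names are assumptions for illustration) mimics how a deployment configuration that leaves arch and accel unset picks up the model's upload-time settings:

```python
# Sketch (NOT Wallaroo internals): a deployment configuration that does not
# set `arch` or `accel` falls back to the model's upload-time settings.
def effective_config(model: dict, deploy_overrides: dict) -> dict:
    cfg = {"arch": model["arch"], "accel": model["accel"]}  # inherited from the model
    cfg.update(deploy_overrides)  # cpus, memory, etc. may still be overridden
    return cfg

model = {"arch": "arm", "accel": "none"}  # as set during upload_model
cfg = effective_config(model, {"cpus": 4, "memory": "8Gi"})
assert cfg["arch"] == "arm" and cfg["accel"] == "none"  # architecture untouched
```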

Model Deployment on ARM Examples: Hugging Face Summarization Model

First we demonstrate uploading a Hugging Face Summarization model for ARM processor deployment. Note the arch setting is set to wallaroo.engine_config.Architecture.ARM.

import wallaroo
import pyarrow as pa

# `wl` is assumed to be an authenticated Wallaroo client, e.g. wl = wallaroo.Client()

input_schema = pa.schema([
    pa.field('inputs', pa.string()),
    pa.field('return_text', pa.bool_()),
    pa.field('return_tensors', pa.bool_()),
    pa.field('clean_up_tokenization_spaces', pa.bool_()),
    # pa.field('generate_kwargs', pa.map_(pa.string(), pa.null())), # dictionaries are not currently supported by the engine
])

output_schema = pa.schema([
    pa.field('summary_text', pa.string()),
])


model_name_arm = 'hf-summarizer-arm'
model_file_name = './models/hf_summarization.zip'

model_arm = wl.upload_model(model_name_arm, 
                        model_file_name, 
                        framework=wallaroo.framework.Framework.HUGGING_FACE_SUMMARIZATION, 
                        input_schema=input_schema, 
                        output_schema=output_schema,
                        arch=wallaroo.engine_config.Architecture.ARM
                        )

Waiting for model loading - this will take up to 10.0min.
Model is pending loading to a container runtime..
Model is attempting loading to a container runtime......................successful

Ready

Name: hf-summarizer-arm
Version: 712b3023-afba-4b8b-ac63-fc2c1a59c903
File Name: hf_summarization.zip
SHA: ee71d066a83708e7ca4a3c07caf33fdc528bb000039b6ca2ef77fa2428dc6268
Status: ready
Image Path: proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs/mac-deploy:v2024.1.0-main-4870
Architecture: arm
Acceleration: none
Updated At: 2024-03-Apr 21:42:17

We now create a pipeline and add the model as a pipeline step, then deploy the pipeline. The model was packaged as a Wallaroo Containerized Runtime, therefore the deployment configuration for the pipeline will specify the model deployment in the sidekick deployment. For more details on pipeline deployment configurations, see Pipeline Deployment Configuration.

Note that the arch and accel deployment configuration settings are not specified in the deployment configuration, as the deployment configuration inherits the model’s architecture settings.

# create the pipeline and add the model as a pipeline step
pipeline_arm = wl.build_pipeline('architecture-demonstration-arm')
pipeline_arm.add_model_step(model_arm)

# create the deployment configuration and specify 4 cpus with 8Gi RAM.  We do not have to specify the architecture;
# that is inherited from the model's `Architecture` setting.

from wallaroo.deployment_config import DeploymentConfigBuilder

deployment_config = DeploymentConfigBuilder() \
    .cpus(0.25).memory('1Gi') \
    .sidekick_cpus(model_arm, 4) \
    .sidekick_memory(model_arm, "8Gi") \
    .build()

pipeline_arm.deploy(deployment_config=deployment_config)

# display the pipeline details
display(pipeline_arm)
  
name: architecture-demonstration-arm
created: 2024-03-05 16:18:38.768602+00:00
last_updated: 2024-04-03 21:46:21.865211+00:00
deployed: True
arch: arm
accel: none
tags: 
versions: ae54ae3f-6c26-4584-b424-4c0207d95f3e, 77dd7f95-42b9-422d-a40e-6b678a00e7a8, 47258923-c616-471a-af49-f6504d3c0d22, 4e942b31-d34e-4764-a7fb-6dc27ac00a64, 88801051-5e25-4dda-a3bd-6e64b154f81e, 80c2e1fb-57ba-4ee8-a47b-b09494158769, bbdbc69d-7cc5-4f9b-a70f-6ebaef441075, 07b5ee82-95df-4f30-9128-f344a8df0625, d033152c-494c-44a6-8981-627c6b6ad72e
steps: hf-summarizer-arm
published: True

We now publish the pipeline to an OCI registry. It is not required that we deploy the pipeline before publishing it.

For our example, we will use the default deployment configuration. We note again that the Engine Config specifies the arm architecture, which was inherited from the model’s arch setting.

# default deployment configuration
pub_arm = pipeline_arm.publish(deployment_config=wallaroo.DeploymentConfigBuilder().build())
pub_arm

Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing................ Published.
ID: 86
Pipeline Name: architecture-demonstration-arm
Pipeline Version: fd5e3d64-9eea-492d-92b2-8bdb5b20ec83
Status: Published
Engine URL: registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-aarch64:v2024.1.0-main-4870
Pipeline URL: registry.example.com/uat/pipelines/architecture-demonstration-arm:fd5e3d64-9eea-492d-92b2-8bdb5b20ec83
Helm Chart URL: oci://registry.example.com/uat/charts/architecture-demonstration-arm
Helm Chart Reference: registry.example.com/uat/charts@sha256:7e2a314d9024cc2529be3e902eb24ac241f1e0819fc07e47bf26dd2e6e64f183
Helm Chart Version: 0.0.1-fd5e3d64-9eea-492d-92b2-8bdb5b20ec83
Engine Config: {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'none', 'arch': 'arm', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}}
User Images: []
Created By: john.hummel@wallaroo.ai
Created At: 2024-04-03 21:50:14.306316+00:00
Updated At: 2024-04-03 21:50:14.306316+00:00
Replaces: 
Docker Run Command:
docker run \
    -p $EDGE_PORT:8080 \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=registry.example.com/uat/pipelines/architecture-demonstration-arm:fd5e3d64-9eea-492d-92b2-8bdb5b20ec83 \
    -e CONFIG_CPUS=1 registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-aarch64:v2024.1.0-main-4870

Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables.
Helm Install Command:
helm install --atomic $HELM_INSTALL_NAME \
    oci://registry.example.com/uat/charts/architecture-demonstration-arm \
    --namespace $HELM_INSTALL_NAMESPACE \
    --version 0.0.1-fd5e3d64-9eea-492d-92b2-8bdb5b20ec83 \
    --set ociRegistry.username=$OCI_USERNAME \
    --set ociRegistry.password=$OCI_PASSWORD

Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables.

Model Deployment on ARM Examples: Resnet50 Computer Vision Model

For this example, we will use a Resnet50 Computer Vision model.

We upload the model and set the architecture to ARM.

import wallaroo
from wallaroo.framework import Framework
from wallaroo.engine_config import Architecture

# `wl` is assumed to be an authenticated Wallaroo client

model_name_arm = 'computer-vision-resnet50-arm'
model_file_name = './models/frcnn-resnet.pt.onnx'

arm_model = wl.upload_model(model_name_arm, 
                        model_file_name, 
                        framework=Framework.ONNX,
                        arch=Architecture.ARM)
display(arm_model)
Name: computer-vision-resnet50-arm
Version: 47743b5f-c88a-4150-a37f-9ad591eb4ee3
File Name: frcnn-resnet.pt.onnx
SHA: 43326e50af639105c81372346fb9ddf453fea0fe46648b2053c375360d9c1647
Status: ready
Image Path: None
Architecture: arm
Acceleration: none
Updated At: 2024-03-Apr 22:13:40

We then build the pipeline, add the model as our model step, and deploy it with a deployment configuration that allocates 1 CPU and 2Gi of RAM. We then show the deployment configuration inherited the model’s architecture setting.

pipeline_arm = wl.build_pipeline('architecture-demonstration-arm')
pipeline_arm.clear()
pipeline_arm.add_model_step(arm_model)

deployment_config = wallaroo.DeploymentConfigBuilder() \
    .replica_count(1) \
    .cpus(1) \
    .memory("2Gi") \
    .build()

pipeline_arm.deploy(deployment_config = deployment_config)
display(pipeline_arm)
name: architecture-demonstration-arm
created: 2024-04-01 18:36:26.347071+00:00
last_updated: 2024-04-03 22:14:42.912284+00:00
deployed: True
arch: arm
accel: none
tags: 
versions: 18329c99-4b9c-4a15-bc93-42e4d6b93fff, 2f1aa87e-edc2-4af7-8821-00ba54abf18e, 4c8ab1b1-f9c8-49d9-846a-54cad3a18b56, cbc520f2-5755-4f6b-8e89-b4374cb95fdf, 59ff6719-67f1-4359-a6b3-5565b9f6dc09, 39b91147-3b73-4f1a-a25f-500ef648bd6a, 45c0e8ba-b35d-4139-9675-aa7ffcc04dfc, 2d561d88-31f6-43c3-a84d-38cc1cd53fb8, ef9e2394-e29f-46dc-aaa4-eda0a304a71e, fe2b6f05-3623-4440-8258-9e5828bc7eaf, aa86387c-813a-40de-b07a-baf388e20d67
steps: computer-vision-resnet50-arm
published: True

We now publish the pipeline. Note that the Engine Config inherited the architecture from the model.

# use the default deployment configuration
pub_arm = pipeline_arm.publish(deployment_config=wallaroo.DeploymentConfigBuilder().build())
display(pub_arm)
Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing....... Published.
ID: 87
Pipeline Name: architecture-demonstration-arm
Pipeline Version: 890b56ee-2a0e-4ed1-ae96-c021ca801a7e
Status: Published
Engine URL: registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-aarch64:v2024.1.0-main-4870
Pipeline URL: registry.example.com/uat/pipelines/architecture-demonstration-arm:890b56ee-2a0e-4ed1-ae96-c021ca801a7e
Helm Chart URL: oci://registry.example.com/uat/charts/architecture-demonstration-arm
Helm Chart Reference: registry.example.com/uat/charts@sha256:15c50483f2010e2691d32d32ded595f20993fa7b043474962b0fa2b509b61510
Helm Chart Version: 0.0.1-890b56ee-2a0e-4ed1-ae96-c021ca801a7e
Engine Config: {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'none', 'arch': 'arm', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}}
User Images: []
Created By: john.hummel@wallaroo.ai
Created At: 2024-04-03 22:17:03.122597+00:00
Updated At: 2024-04-03 22:17:03.122597+00:00
Replaces: 
Docker Run Command:
docker run \
    -p $EDGE_PORT:8080 \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=registry.example.com/uat/pipelines/architecture-demonstration-arm:890b56ee-2a0e-4ed1-ae96-c021ca801a7e \
    -e CONFIG_CPUS=1 registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-aarch64:v2024.1.0-main-4870

Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables.
Helm Install Command:
helm install --atomic $HELM_INSTALL_NAME \
    oci://registry.example.com/uat/charts/architecture-demonstration-arm \
    --namespace $HELM_INSTALL_NAMESPACE \
    --version 0.0.1-890b56ee-2a0e-4ed1-ae96-c021ca801a7e \
    --set ociRegistry.username=$OCI_USERNAME \
    --set ociRegistry.password=$OCI_PASSWORD

Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables.

Tutorials

The following examples are available to demonstrate uploading and publishing models with ARM processor support.