Inference on ARM Architecture

How to deploy ML models on ARM processors and infrastructure.

ML models can be deployed to centralized Wallaroo Ops instances and to edge devices across a variety of infrastructures and processors. The processor architecture is set during the model upload and packaging stage.

Models specified with the ARM architecture during the upload and automated model packaging can be deployed on Wallaroo Ops instances or multicloud deployments.

ARM Support from Cloud Providers

ARM processors for Kubernetes clusters in cloud environments are supported by the following providers.

Model Packaging and Deployments Prerequisites for ARM

To upload and package a model for Wallaroo Ops or multicloud edge deployments, the following prerequisites must be met.

  • Wallaroo Ops
  • Edge Devices
    • Enable Edge Registry Services in the Wallaroo instance to publish the pipeline to an OCI registry for edge deployments.
    • ARM processor support for the edge device.

AI Workloads for ARM via the Wallaroo SDK

The Wallaroo SDK provides ARM support for models uploaded for Wallaroo Ops or multicloud edge deployments.

Upload Models for ARM via the Wallaroo SDK

Models are uploaded to Wallaroo via the wallaroo.client.upload_model method. The infrastructure is set with the optional arch parameter, which accepts the wallaroo.engine_config.Architecture object.

wallaroo.client.upload_model has the following parameters. For more details on model uploads, see Automated Model Packaging.

  • name: string (Required). The name of the model. Model names are unique per workspace. Models uploaded with the same name are assigned as a new version of the model.
  • path: string (Required). The path to the model file being uploaded.
  • framework: string (Required). The framework of the model from wallaroo.framework.
  • input_schema: pyarrow.lib.Schema (Optional for native Wallaroo runtimes; Required for non-native Wallaroo runtimes). The input schema in Apache Arrow schema format.
  • output_schema: pyarrow.lib.Schema (Optional for native Wallaroo runtimes; Required for non-native Wallaroo runtimes). The output schema in Apache Arrow schema format.
  • convert_wait: bool (Optional).
    • True: Waits in the script for the model conversion to complete.
    • False: Proceeds with the script without waiting for the model conversion process to complete.
  • arch: wallaroo.engine_config.Architecture (Optional). The architecture the model is deployed to. If a model is intended for deployment to an ARM architecture, it must be specified during this step. Values include:
    • X86 (Default): x86-based architectures.
    • ARM: ARM-based architectures.
  • accel: wallaroo.engine_config.Acceleration (Optional). The AI hardware accelerator used. If a model is intended for use with a hardware accelerator, it should be assigned at this step. Values include:
    • wallaroo.engine_config.Acceleration._None (Default): No accelerator is assigned. This works for all infrastructures.
    • wallaroo.engine_config.Acceleration.AIO: AIO acceleration for Ampere Optimized trained models; only available with ARM processors.
    • wallaroo.engine_config.Acceleration.Jetson: Nvidia Jetson acceleration used with multicloud edge deployments with ARM processors.
    • wallaroo.engine_config.Acceleration.CUDA: Nvidia CUDA acceleration supported by both ARM and X64/X86 processors. This is intended for deployments with GPUs.
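
The architecture and accelerator compatibility rules above can be summarized in a small helper. This is a hypothetical illustration, not part of the Wallaroo SDK; the function name and the lowercase string values are assumptions for the sketch.

```python
# Hypothetical helper (NOT part of the Wallaroo SDK): checks an
# (architecture, accelerator) pairing against the rules listed above.
def accel_compatible(arch: str, accel: str) -> bool:
    if accel == "none":                 # no accelerator: valid on all infrastructures
        return True
    if accel == "cuda":                 # CUDA: both ARM and X64/X86 processors
        return arch in ("arm", "x86")
    if accel in ("aio", "jetson"):      # AIO and Jetson: ARM processors only
        return arch == "arm"
    return False                        # unknown accelerator

# A model uploaded with arch=ARM can pair with any listed accelerator:
assert all(accel_compatible("arm", a) for a in ("none", "aio", "jetson", "cuda"))
# AIO acceleration is rejected on x86:
assert not accel_compatible("x86", "aio")
```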

Deploy Models for ARM via the Wallaroo SDK

Models are added to pipelines as pipeline steps. Models are then deployed through the wallaroo.pipeline.Pipeline.deploy(deployment_config: Optional[wallaroo.deployment_config.DeploymentConfig] = None) method.

For full details, see Pipeline Deployment Configuration.

When deploying a model in a Wallaroo Ops instance, the deployment configuration inherits the model architecture setting. No additional changes are needed to set the architecture when deploying the model. Other settings, such as the number of CPUs, can be changed without modifying the architecture setting.

To change the architecture settings for model deployment, models should be re-uploaded as either a new model or a new model version for maximum compatibility with the hardware infrastructure. For more information on uploading models or new model versions, see Upload Models for ARM via the Wallaroo SDK.
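
As noted above, uploading under an existing model name registers a new version of that model rather than a separate model. The following toy sketch (plain Python, not Wallaroo code; all names are hypothetical) illustrates that bookkeeping:

```python
# Toy illustration (NOT Wallaroo code) of the versioning behavior described
# above: re-uploading under the same name appends a new model version.
registry: dict[str, list[dict]] = {}

def upload_model(name: str, arch: str) -> dict:
    # Each upload under an existing name becomes the next version.
    version = {"version": len(registry.get(name, [])) + 1, "arch": arch}
    registry.setdefault(name, []).append(version)
    return version

upload_model("hf-summarizer", "x86")        # initial upload: version 1, x86
v2 = upload_model("hf-summarizer", "arm")   # re-upload for ARM: version 2
assert v2 == {"version": 2, "arch": "arm"}
```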

Publish Pipeline for ARM via the Wallaroo SDK

Publishing the pipeline uses the method wallaroo.pipeline.Pipeline.publish(deployment_config: Optional[wallaroo.deployment_config.DeploymentConfig]).

This requires that the Wallaroo Ops instance have Edge Registry Services enabled.

A deployment configuration must be included with the pipeline publish, even if no changes are made to the CPUs, memory, or other settings. For more detail on deployment configurations, see Pipeline Deployment Configuration.

The deployment configuration for the pipeline publish inherits the model’s architecture. Options such as the number of CPUs and the amount of memory can be adjusted without impacting the model’s architecture settings.

Pipelines do not need to be deployed in the Wallaroo Ops instance before publishing the pipeline.

For more information, see Wallaroo SDK Essentials Guide: Pipeline Edge Publication.

Model Deployment on ARM Via the Wallaroo SDK Examples

The following examples demonstrate:

  • Uploading a model for packaging in the Wallaroo Ops instance with the arch set to ARM.
  • Creating a pipeline and adding the model as a pipeline step.
  • Deploying the pipeline and demonstrating the deployment configuration inherits the model’s architecture setting.
  • Publishing the pipeline to an OCI registry and demonstrating the deployment configuration inherits the model’s architecture setting.

Note that the arch and accel deployment configuration settings are not specified, as the deployment configuration inherits the model’s architecture settings.
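
To make that inheritance concrete, the sketch below (plain Python, not Wallaroo internals; the function and field names are assumptions for illustration) mimics how a deployment configuration that leaves arch and accel unset picks up the model's upload-time settings:

```python
# Sketch (NOT Wallaroo internals): a deployment configuration that does not
# set `arch` or `accel` falls back to the model's upload-time settings.
def effective_config(model: dict, deploy_overrides: dict) -> dict:
    cfg = {"arch": model["arch"], "accel": model["accel"]}  # inherited from the model
    cfg.update(deploy_overrides)  # cpus, memory, etc. may still be overridden
    return cfg

model = {"arch": "arm", "accel": "none"}  # as set during upload_model
cfg = effective_config(model, {"cpus": 4, "memory": "8Gi"})
assert cfg["arch"] == "arm" and cfg["accel"] == "none"  # architecture untouched
```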

Model Deployment on ARM Examples: Hugging Face Summarization Model

First we demonstrate uploading a Hugging Face Summarization model for ARM processor deployment. Note the arch setting is set to wallaroo.engine_config.Architecture.ARM.

import wallaroo
import pyarrow as pa

# `wl` is assumed to be an authenticated Wallaroo client, e.g. wl = wallaroo.Client()

input_schema = pa.schema([
    pa.field('inputs', pa.string()),
    pa.field('return_text', pa.bool_()),
    pa.field('return_tensors', pa.bool_()),
    pa.field('clean_up_tokenization_spaces', pa.bool_()),
    # pa.field('generate_kwargs', pa.map_(pa.string(), pa.null())), # dictionaries are not currently supported by the engine
])

output_schema = pa.schema([
    pa.field('summary_text', pa.string()),
])


model_name_arm = 'hf-summarizer-arm'
model_file_name = './models/hf_summarization.zip'

model_arm = wl.upload_model(model_name_arm, 
                        model_file_name, 
                        framework=wallaroo.framework.Framework.HUGGING_FACE_SUMMARIZATION, 
                        input_schema=input_schema, 
                        output_schema=output_schema,
                        arch=wallaroo.engine_config.Architecture.ARM
                        )

Waiting for model loading - this will take up to 10.0min.
Model is pending loading to a container runtime..
Model is attempting loading to a container runtime......................successful

Ready

Name: hf-summarizer-arm
Version: 712b3023-afba-4b8b-ac63-fc2c1a59c903
File Name: hf_summarization.zip
SHA: ee71d066a83708e7ca4a3c07caf33fdc528bb000039b6ca2ef77fa2428dc6268
Status: ready
Image Path: proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs/mac-deploy:v2024.1.0-main-4870
Architecture: arm
Acceleration: none
Updated At: 2024-03-Apr 21:42:17

We now create a pipeline and add the model as a pipeline step, then deploy the pipeline. The model was packaged as a Wallaroo Containerized Runtime, therefore the deployment configuration for the pipeline will specify the model deployment in the sidekick deployment. For more details on pipeline deployment configurations, see Pipeline Deployment Configuration.

Note that the arch and accel deployment configuration settings are not specified in the deployment configuration, as the deployment configuration inherits the model’s architecture settings.

# create the pipeline and add the model as a pipeline step
pipeline_arm = wl.build_pipeline('architecture-demonstration-arm')
pipeline_arm.add_model_step(model_arm)

# create the deployment configuration and specify 4 cpus with 8Gi RAM.  We do not have to specify the architecture;
# that is inherited from the model's `Architecture` setting.

from wallaroo.deployment_config import DeploymentConfigBuilder

deployment_config = DeploymentConfigBuilder() \
    .cpus(0.25).memory('1Gi') \
    .sidekick_cpus(model_arm, 4) \
    .sidekick_memory(model_arm, "8Gi") \
    .build()

pipeline_arm.deploy(deployment_config=deployment_config)

# display the pipeline details
display(pipeline_arm)
  
name: architecture-demonstration-arm
created: 2024-03-05 16:18:38.768602+00:00
last_updated: 2024-04-03 21:46:21.865211+00:00
deployed: True
arch: arm
accel: none
tags: 
versions: ae54ae3f-6c26-4584-b424-4c0207d95f3e, 77dd7f95-42b9-422d-a40e-6b678a00e7a8, 47258923-c616-471a-af49-f6504d3c0d22, 4e942b31-d34e-4764-a7fb-6dc27ac00a64, 88801051-5e25-4dda-a3bd-6e64b154f81e, 80c2e1fb-57ba-4ee8-a47b-b09494158769, bbdbc69d-7cc5-4f9b-a70f-6ebaef441075, 07b5ee82-95df-4f30-9128-f344a8df0625, d033152c-494c-44a6-8981-627c6b6ad72e
steps: hf-summarizer-arm
published: True

We now publish the pipeline to an OCI registry. It is not required that we deploy the pipeline before publishing it.

For our example, we will use the default deployment configuration. We note again that the Engine Config specifies the arm architecture, which was inherited from the model’s arch setting.

# default deployment configuration
pub_arm = pipeline_arm.publish(deployment_config=wallaroo.DeploymentConfigBuilder().build())
pub_arm

Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing................ Published.
ID: 86
Pipeline Name: architecture-demonstration-arm
Pipeline Version: fd5e3d64-9eea-492d-92b2-8bdb5b20ec83
Status: Published
Engine URL: registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-aarch64:v2024.1.0-main-4870
Pipeline URL: registry.example.com/uat/pipelines/architecture-demonstration-arm:fd5e3d64-9eea-492d-92b2-8bdb5b20ec83
Helm Chart URL: oci://registry.example.com/uat/charts/architecture-demonstration-arm
Helm Chart Reference: registry.example.com/uat/charts@sha256:7e2a314d9024cc2529be3e902eb24ac241f1e0819fc07e47bf26dd2e6e64f183
Helm Chart Version: 0.0.1-fd5e3d64-9eea-492d-92b2-8bdb5b20ec83
Engine Config: {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'none', 'arch': 'arm', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}}
User Images: []
Created By: john.hummel@wallaroo.ai
Created At: 2024-04-03 21:50:14.306316+00:00
Updated At: 2024-04-03 21:50:14.306316+00:00
Replaces: 
Docker Run Command:
docker run \
    -p $EDGE_PORT:8080 \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=registry.example.com/uat/pipelines/architecture-demonstration-arm:fd5e3d64-9eea-492d-92b2-8bdb5b20ec83 \
    -e CONFIG_CPUS=1 registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-aarch64:v2024.1.0-main-4870

Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables.
Helm Install Command:
helm install --atomic $HELM_INSTALL_NAME \
    oci://registry.example.com/uat/charts/architecture-demonstration-arm \
    --namespace $HELM_INSTALL_NAMESPACE \
    --version 0.0.1-fd5e3d64-9eea-492d-92b2-8bdb5b20ec83 \
    --set ociRegistry.username=$OCI_USERNAME \
    --set ociRegistry.password=$OCI_PASSWORD

Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables.

Model Deployment on ARM Examples: Resnet50 Computer Vision Model

For this example, we will use a Resnet50 Computer Vision model.

We upload the model and set the architecture to ARM.

import wallaroo
from wallaroo.framework import Framework
from wallaroo.engine_config import Architecture

# `wl` is assumed to be an authenticated Wallaroo client

model_name_arm = 'computer-vision-resnet50-arm'
model_file_name = './models/frcnn-resnet.pt.onnx'

arm_model = wl.upload_model(model_name_arm, 
                        model_file_name, 
                        framework=Framework.ONNX,
                        arch=Architecture.ARM)
display(arm_model)
Name: computer-vision-resnet50-arm
Version: 47743b5f-c88a-4150-a37f-9ad591eb4ee3
File Name: frcnn-resnet.pt.onnx
SHA: 43326e50af639105c81372346fb9ddf453fea0fe46648b2053c375360d9c1647
Status: ready
Image Path: None
Architecture: arm
Acceleration: none
Updated At: 2024-03-Apr 22:13:40

We then build the pipeline, add the model as our model step, and deploy it with a deployment configuration that allocates 1 CPU and 2Gi of RAM. We then show the deployment configuration inherited the model’s architecture setting.

pipeline_arm = wl.build_pipeline('architecture-demonstration-arm')
pipeline_arm.clear()
pipeline_arm.add_model_step(arm_model)

deployment_config = wallaroo.DeploymentConfigBuilder() \
    .replica_count(1) \
    .cpus(1) \
    .memory("2Gi") \
    .build()

pipeline_arm.deploy(deployment_config = deployment_config)
display(pipeline_arm)
name: architecture-demonstration-arm
created: 2024-04-01 18:36:26.347071+00:00
last_updated: 2024-04-03 22:14:42.912284+00:00
deployed: True
arch: arm
accel: none
tags: 
versions: 18329c99-4b9c-4a15-bc93-42e4d6b93fff, 2f1aa87e-edc2-4af7-8821-00ba54abf18e, 4c8ab1b1-f9c8-49d9-846a-54cad3a18b56, cbc520f2-5755-4f6b-8e89-b4374cb95fdf, 59ff6719-67f1-4359-a6b3-5565b9f6dc09, 39b91147-3b73-4f1a-a25f-500ef648bd6a, 45c0e8ba-b35d-4139-9675-aa7ffcc04dfc, 2d561d88-31f6-43c3-a84d-38cc1cd53fb8, ef9e2394-e29f-46dc-aaa4-eda0a304a71e, fe2b6f05-3623-4440-8258-9e5828bc7eaf, aa86387c-813a-40de-b07a-baf388e20d67
steps: computer-vision-resnet50-arm
published: True

We now publish the pipeline. Note that the Engine Config inherited the architecture from the model.

# use the default deployment configuration
pub_arm = pipeline_arm.publish(deployment_config=wallaroo.DeploymentConfigBuilder().build())
display(pub_arm)
Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing....... Published.
ID: 87
Pipeline Name: architecture-demonstration-arm
Pipeline Version: 890b56ee-2a0e-4ed1-ae96-c021ca801a7e
Status: Published
Engine URL: registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-aarch64:v2024.1.0-main-4870
Pipeline URL: registry.example.com/uat/pipelines/architecture-demonstration-arm:890b56ee-2a0e-4ed1-ae96-c021ca801a7e
Helm Chart URL: oci://registry.example.com/uat/charts/architecture-demonstration-arm
Helm Chart Reference: registry.example.com/uat/charts@sha256:15c50483f2010e2691d32d32ded595f20993fa7b043474962b0fa2b509b61510
Helm Chart Version: 0.0.1-890b56ee-2a0e-4ed1-ae96-c021ca801a7e
Engine Config: {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'none', 'arch': 'arm', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}}
User Images: []
Created By: john.hummel@wallaroo.ai
Created At: 2024-04-03 22:17:03.122597+00:00
Updated At: 2024-04-03 22:17:03.122597+00:00
Replaces: 
Docker Run Command:
docker run \
    -p $EDGE_PORT:8080 \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=registry.example.com/uat/pipelines/architecture-demonstration-arm:890b56ee-2a0e-4ed1-ae96-c021ca801a7e \
    -e CONFIG_CPUS=1 registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-aarch64:v2024.1.0-main-4870

Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables.
Helm Install Command:
helm install --atomic $HELM_INSTALL_NAME \
    oci://registry.example.com/uat/charts/architecture-demonstration-arm \
    --namespace $HELM_INSTALL_NAMESPACE \
    --version 0.0.1-890b56ee-2a0e-4ed1-ae96-c021ca801a7e \
    --set ociRegistry.username=$OCI_USERNAME \
    --set ociRegistry.password=$OCI_PASSWORD

Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables.

Tutorials

The following examples are available to demonstrate uploading and publishing models with ARM processor support.