# Inference on ARM Architecture
ML models can be deployed to centralized Wallaroo Ops instances and edge devices on a variety of infrastructures and processors. The CPU architecture is set during the model upload and packaging stage. Models specified with the ARM architecture during upload and automated model packaging can be deployed on Wallaroo Ops instances or multicloud deployments.
## ARM Support from Cloud Providers

ARM processors for Kubernetes clusters in cloud environments are supported by the following providers:
- Amazon Web Services (AWS)
- Google Cloud Platform (GCP)
- Microsoft Azure
- Oracle Cloud Infrastructure (OCI)
## Model Packaging and Deployment Prerequisites for ARM

To upload and package a model for Wallaroo Ops or multicloud edge deployments, the following prerequisites must be met:
- Wallaroo Ops
  - At least one node with ARM support deployed in the cluster. For details on adding ARM nodes to a cluster, see Create ARM Nodepools for Kubernetes Clusters.
- Edge devices
  - Edge Registry Services enabled in the Wallaroo instance to publish the pipeline to an OCI registry for edge deployments.
  - ARM processor support on the edge device.
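To check the last prerequisite, you can verify what processor architecture a device reports before deploying to it. The following is a minimal sketch using only the Python standard library; the accepted identifiers are common ARM machine names, not a Wallaroo API:

```python
import platform

# Common ARM identifiers: 'aarch64' on Linux, 'arm64' on macOS;
# 32-bit ARM boards may report values such as 'armv7l'.
machine = platform.machine().lower()
is_arm = machine in ("aarch64", "arm64") or machine.startswith("arm")
print(f"machine={machine} arm={is_arm}")
```

Run this on the edge device itself; an x86 workstation will report `x86_64`.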
## AI Workloads for ARM via the Wallaroo SDK
The Wallaroo SDK provides ARM support for models uploaded for Wallaroo Ops or multicloud edge deployments.
### Upload Models for ARM via the Wallaroo SDK
Models are uploaded to Wallaroo via the `wallaroo.client.upload_model` method. The infrastructure is set with the optional `arch` parameter, which accepts a `wallaroo.engine_config.Architecture` value.

`wallaroo.client.upload_model` has the following parameters. For more details on model uploads, see Automated Model Packaging.
Parameter | Type | Description |
---|---|---|
name | string (Required) | The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model. |
path | string (Required) | The path to the model file being uploaded. |
framework | string (Required) | The framework of the model from wallaroo.framework. |
input_schema | pyarrow.lib.Schema (Required for non-native runtime models) | The input schema in Apache Arrow schema format. |
output_schema | pyarrow.lib.Schema (Required for non-native runtime models) | The output schema in Apache Arrow schema format. |
convert_wait | bool (Optional) | Whether the SDK session waits for the model packaging process to complete before continuing. |
arch | wallaroo.engine_config.Architecture (Optional) | The architecture the model is deployed to. If a model is intended for deployment to an ARM architecture, it must be specified during this step. Values include Architecture.X86 (the default) and Architecture.ARM. |
accel | wallaroo.engine_config.Acceleration (Optional) | The AI hardware accelerator used. If a model is intended for use with a hardware accelerator, it should be assigned at this step. |
### Deploy Models for ARM via the Wallaroo SDK
Models are added to a pipeline as pipeline steps. Models are then deployed through the `wallaroo.pipeline.Pipeline.deploy(deployment_config: Optional[wallaroo.deployment_config.DeploymentConfig] = None)` method.
For full details, see Pipeline Deployment Configuration.
When deploying a model in a Wallaroo Ops instance, the deployment configuration inherits the model architecture setting. No additional changes are needed to set the architecture when deploying the model. Other settings, such as the number of CPUs, can be changed without modifying the architecture setting.
To change the architecture settings for model deployment, models should be re-uploaded as either a new model or a new model version for maximum compatibility with the hardware infrastructure. For more information on uploading models or new model versions, see Upload Models for ARM via the Wallaroo SDK.
### Publish Pipeline for ARM via the Wallaroo SDK
Pipelines are published to an OCI registry via the `wallaroo.pipeline.publish(deployment_config: Optional[wallaroo.deployment_config.DeploymentConfig])` method. This requires that the Wallaroo Ops instance has Edge Registry Services enabled.
A deployment configuration must be included with the pipeline publish, even if no changes are made to the CPUs, memory, or other settings. For more detail on deployment configurations, see Pipeline Deployment Configuration.
The deployment configuration for the pipeline publish inherits the model’s architecture. Options such as the number of CPUs and amount of memory can be adjusted without impacting the model’s architecture settings.
Pipelines do not need to be deployed in the Wallaroo Ops instance before publishing the pipeline.
For more information, see Wallaroo SDK Essentials Guide: Pipeline Edge Publication.
### Model Deployment on ARM via the Wallaroo SDK Examples
The following examples demonstrate:

- Uploading a model for packaging in the Wallaroo Ops instance with the `arch` set to ARM.
- Creating a pipeline and adding the model as a pipeline step.
- Deploying the pipeline and demonstrating that the deployment configuration inherits the model’s architecture setting.
- Publishing the pipeline to an OCI registry and demonstrating that the deployment configuration inherits the model’s architecture setting.
Note that the `arch` and `accel` deployment configuration settings are not specified, as the deployment configuration inherits the model’s architecture settings.
#### Model Deployment on ARM Examples: Hugging Face Summarization Model
First we demonstrate uploading a Hugging Face summarization model for ARM processor deployment. Note the `arch` setting is set to `wallaroo.engine_config.Architecture.ARM`.
```python
import pyarrow as pa
import wallaroo

# wl is an authenticated wallaroo.Client connection to the Wallaroo Ops instance

input_schema = pa.schema([
    pa.field('inputs', pa.string()),
    pa.field('return_text', pa.bool_()),
    pa.field('return_tensors', pa.bool_()),
    pa.field('clean_up_tokenization_spaces', pa.bool_()),
    # pa.field('generate_kwargs', pa.map_(pa.string(), pa.null())), # dictionaries are not currently supported by the engine
])

output_schema = pa.schema([
    pa.field('summary_text', pa.string()),
])

model_name_arm = 'hf-summarizer-arm'
model_file_name = './models/hf_summarization.zip'

model_arm = wl.upload_model(model_name_arm,
                            model_file_name,
                            framework=wallaroo.framework.Framework.HUGGING_FACE_SUMMARIZATION,
                            input_schema=input_schema,
                            output_schema=output_schema,
                            arch=wallaroo.engine_config.Architecture.ARM
                            )
```
```
Waiting for model loading - this will take up to 10.0min.
Model is pending loading to a container runtime..
Model is attempting loading to a container runtime......................successful
Ready
```
Name | hf-summarizer-arm |
---|---|
Version | 712b3023-afba-4b8b-ac63-fc2c1a59c903 |
File Name | hf_summarization.zip |
SHA | ee71d066a83708e7ca4a3c07caf33fdc528bb000039b6ca2ef77fa2428dc6268 |
Status | ready |
Image Path | proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs/mac-deploy:v2024.1.0-main-4870 |
Architecture | arm |
Acceleration | none |
Updated At | 2024-03-Apr 21:42:17 |
We now create a pipeline and add the model as a pipeline step, then deploy the pipeline. The model was packaged as a Wallaroo Containerized Runtime, therefore the deployment configuration for the pipeline specifies the model deployment in the `sidekick` deployment. For more details on pipeline deployment configurations, see Pipeline Deployment Configuration.
Note that the `arch` and `accel` deployment configuration settings are not specified in the deployment configuration, as the deployment configuration inherits the model’s architecture settings.
```python
# create the pipeline and add the model as a pipeline step
pipeline_arm = wl.build_pipeline('architecture-demonstration-arm')
pipeline_arm.add_model_step(model_arm)

# create the deployment configuration: 0.25 CPU and 1Gi RAM for the engine, and
# 4 CPUs with 8Gi RAM for the model's sidekick container. We do not have to
# specify the architecture; that is inherited from the model's `Architecture` setting.
from wallaroo.deployment_config import DeploymentConfigBuilder

deployment_config = DeploymentConfigBuilder() \
    .cpus(0.25).memory('1Gi') \
    .sidekick_cpus(model_arm, 4) \
    .sidekick_memory(model_arm, "8Gi") \
    .build()

pipeline_arm.deploy(deployment_config=deployment_config)

# display the pipeline details
display(pipeline_arm)
```
name | architecture-demonstration-arm |
---|---|
created | 2024-03-05 16:18:38.768602+00:00 |
last_updated | 2024-04-03 21:46:21.865211+00:00 |
deployed | True |
arch | arm |
accel | none |
tags | |
versions | ae54ae3f-6c26-4584-b424-4c0207d95f3e, 77dd7f95-42b9-422d-a40e-6b678a00e7a8, 47258923-c616-471a-af49-f6504d3c0d22, 4e942b31-d34e-4764-a7fb-6dc27ac00a64, 88801051-5e25-4dda-a3bd-6e64b154f81e, 80c2e1fb-57ba-4ee8-a47b-b09494158769, bbdbc69d-7cc5-4f9b-a70f-6ebaef441075, 07b5ee82-95df-4f30-9128-f344a8df0625, d033152c-494c-44a6-8981-627c6b6ad72e |
steps | hf-summarizer-arm |
published | True |
We now publish the pipeline to an OCI registry. It is not required that we deploy the pipeline before publishing it. For our example, we will use the default deployment configuration. We note again that the Engine Config specifies the `arm` architecture, which was inherited from the model’s `arch` setting.
```python
# default deployment configuration
pub_arm = pipeline_arm.publish(deployment_config=wallaroo.DeploymentConfigBuilder().build())
pub_arm
```

```
Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing................ Published.
```
ID | 86 |
---|---|
Pipeline Name | architecture-demonstration-arm |
Pipeline Version | fd5e3d64-9eea-492d-92b2-8bdb5b20ec83 |
Status | Published |
Engine URL | registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-aarch64:v2024.1.0-main-4870 |
Pipeline URL | registry.example.com/uat/pipelines/architecture-demonstration-arm:fd5e3d64-9eea-492d-92b2-8bdb5b20ec83 |
Helm Chart URL | oci://registry.example.com/uat/charts/architecture-demonstration-arm |
Helm Chart Reference | registry.example.com/uat/charts@sha256:7e2a314d9024cc2529be3e902eb24ac241f1e0819fc07e47bf26dd2e6e64f183 |
Helm Chart Version | 0.0.1-fd5e3d64-9eea-492d-92b2-8bdb5b20ec83 |
Engine Config | {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'none', 'arch': 'arm', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}} |
User Images | [] |
Created By | john.hummel@wallaroo.ai |
Created At | 2024-04-03 21:50:14.306316+00:00 |
Updated At | 2024-04-03 21:50:14.306316+00:00 |
Replaces | |
Docker Run Command | Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables. |
Helm Install Command | Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables. |
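The Docker Run Command value itself did not survive extraction from the publish output. As an illustration only, the edge run command is generally assembled from the published Engine URL, the Pipeline URL, and the environment variables the note asks you to set. In the sketch below, the `PIPELINE_URL` engine setting name is an assumption and the credentials are placeholders; consult the actual publish output for the authoritative command.

```python
import os

# Placeholder credentials for illustration; real values come from your OCI registry.
env = {
    "EDGE_PORT": os.environ.get("EDGE_PORT", "8080"),
    "OCI_USERNAME": os.environ.get("OCI_USERNAME", "<registry user>"),
    "OCI_PASSWORD": os.environ.get("OCI_PASSWORD", "<registry token>"),
}

# Engine URL and Pipeline URL taken from the publish output above.
engine_url = "registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-aarch64:v2024.1.0-main-4870"
pipeline_url = "registry.example.com/uat/pipelines/architecture-demonstration-arm:fd5e3d64-9eea-492d-92b2-8bdb5b20ec83"

command = " ".join([
    "docker run",
    f"-p {env['EDGE_PORT']}:8080",        # expose the inference endpoint
    f"-e OCI_USERNAME={env['OCI_USERNAME']}",
    f"-e OCI_PASSWORD={env['OCI_PASSWORD']}",
    f"-e PIPELINE_URL={pipeline_url}",    # assumed engine setting name
    engine_url,
])
print(command)
```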
#### Model Deployment on ARM Examples: Resnet50 Computer Vision Model
For this example, we will use a Resnet50 computer vision model. We upload the model and set the architecture to `ARM`.
```python
from wallaroo.framework import Framework
from wallaroo.engine_config import Architecture

model_name_arm = 'computer-vision-resnet50-arm'
model_file_name = './models/frcnn-resnet.pt.onnx'

arm_model = wl.upload_model(model_name_arm,
                            model_file_name,
                            framework=Framework.ONNX,
                            arch=Architecture.ARM)
display(arm_model)
```
Name | computer-vision-resnet50-arm |
---|---|
Version | 47743b5f-c88a-4150-a37f-9ad591eb4ee3 |
File Name | frcnn-resnet.pt.onnx |
SHA | 43326e50af639105c81372346fb9ddf453fea0fe46648b2053c375360d9c1647 |
Status | ready |
Image Path | None |
Architecture | arm |
Acceleration | none |
Updated At | 2024-03-Apr 22:13:40 |
We then build the pipeline, add the model as our model step, and deploy it with a deployment configuration that allocates 1 CPU and 2Gi of RAM. We then show the deployment configuration inherited the model’s architecture setting.
```python
pipeline_arm = wl.build_pipeline('architecture-demonstration-arm')
pipeline_arm.clear()
pipeline_arm.add_model_step(arm_model)

deployment_config = wallaroo.DeploymentConfigBuilder() \
    .replica_count(1) \
    .cpus(1) \
    .memory("2Gi") \
    .build()

pipeline_arm.deploy(deployment_config=deployment_config)
display(pipeline_arm)
```
name | architecture-demonstration-arm |
---|---|
created | 2024-04-01 18:36:26.347071+00:00 |
last_updated | 2024-04-03 22:14:42.912284+00:00 |
deployed | True |
arch | arm |
accel | none |
tags | |
versions | 18329c99-4b9c-4a15-bc93-42e4d6b93fff, 2f1aa87e-edc2-4af7-8821-00ba54abf18e, 4c8ab1b1-f9c8-49d9-846a-54cad3a18b56, cbc520f2-5755-4f6b-8e89-b4374cb95fdf, 59ff6719-67f1-4359-a6b3-5565b9f6dc09, 39b91147-3b73-4f1a-a25f-500ef648bd6a, 45c0e8ba-b35d-4139-9675-aa7ffcc04dfc, 2d561d88-31f6-43c3-a84d-38cc1cd53fb8, ef9e2394-e29f-46dc-aaa4-eda0a304a71e, fe2b6f05-3623-4440-8258-9e5828bc7eaf, aa86387c-813a-40de-b07a-baf388e20d67 |
steps | computer-vision-resnet50-arm |
published | True |
We now publish the pipeline. Note that the Engine Config inherited the architecture from the model.
```python
# use the default deployment configuration
pub_arm = pipeline_arm.publish(deployment_config=wallaroo.DeploymentConfigBuilder().build())
display(pub_arm)
```

```
Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing....... Published.
```
ID | 87 |
---|---|
Pipeline Name | architecture-demonstration-arm |
Pipeline Version | 890b56ee-2a0e-4ed1-ae96-c021ca801a7e |
Status | Published |
Engine URL | registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-aarch64:v2024.1.0-main-4870 |
Pipeline URL | registry.example.com/uat/pipelines/architecture-demonstration-arm:890b56ee-2a0e-4ed1-ae96-c021ca801a7e |
Helm Chart URL | oci://registry.example.com/uat/charts/architecture-demonstration-arm |
Helm Chart Reference | registry.example.com/uat/charts@sha256:15c50483f2010e2691d32d32ded595f20993fa7b043474962b0fa2b509b61510 |
Helm Chart Version | 0.0.1-890b56ee-2a0e-4ed1-ae96-c021ca801a7e |
Engine Config | {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'none', 'arch': 'arm', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}} |
User Images | [] |
Created By | john.hummel@wallaroo.ai |
Created At | 2024-04-03 22:17:03.122597+00:00 |
Updated At | 2024-04-03 22:17:03.122597+00:00 |
Replaces | |
Docker Run Command | Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables. |
Helm Install Command | Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables. |
## Tutorials

The following tutorials demonstrate uploading and publishing models with ARM processor support.