Run AI Workloads with Hardware Accelerators: Aloha Tutorial

A demonstration of accelerating model deployment performance with optimization settings.

Features:

Models:

aloha

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

Run Anywhere With Acceleration Tutorial: Aloha Model

Wallaroo supports deploying models with accelerators that increase the inference speed and performance. These accelerators are set during model upload, and are carried through to model deployment and model edge deployment.

Supported Accelerators

The following accelerators are supported:

Accelerator	Description
`None`	The default acceleration, used for all scenarios and architectures.
`Aio`	Compatible only with the `ARM` architecture.
`Jetson`	Compatible only with the `ARM` architecture.
`CUDA`	Compatible with either `ARM` or `X86/X64` architectures.
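The compatibility rules in the table above can be sketched as a small lookup. This is an illustrative helper only: the names `COMPATIBLE_ARCHITECTURES` and `is_compatible` are hypothetical and are not part of the Wallaroo SDK, and the lowercase strings are an assumption chosen to match the `accel` and `arch` values shown in the publish output later in this tutorial.

```python
# Illustrative lookup built from the table above; not part of the Wallaroo SDK.
COMPATIBLE_ARCHITECTURES = {
    "none": {"x86", "arm"},    # default acceleration, all architectures
    "aio": {"arm"},            # ARM only
    "jetson": {"arm"},         # ARM only
    "cuda": {"x86", "arm"},    # either architecture
}


def is_compatible(accel: str, arch: str) -> bool:
    """Return True if the accelerator/architecture pair appears in the table."""
    return arch.lower() in COMPATIBLE_ARCHITECTURES.get(accel.lower(), set())


print(is_compatible("jetson", "arm"))  # True
print(is_compatible("jetson", "x86"))  # False
```

A check like this could be run before uploading a model to catch an unsupported combination early.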

Goal

Demonstrate uploading an Aloha model with the Jetson accelerator, then publishing the same model for edge deployment with the Jetson accelerator inherited from the model.

Resources

This tutorial provides the following:

Models:
- models/alohacnnlstm.zip: An open source model based on the Aloha CNN LSTM model that classifies domain names as either legitimate or used for nefarious purposes such as malware distribution.

Prerequisites

A deployed Wallaroo instance with Edge Registry Services and Edge Observability enabled.
The following Python libraries installed:
- wallaroo: The Wallaroo SDK. Included with the Wallaroo JupyterHub service by default.
- pandas: Pandas, mainly used for Pandas DataFrames.
- json: Used to format input data for inference requests.

Steps

Upload the model with the targeted accelerator set to Jetson.
Create the pipeline and add the model as a model step.
Deploy the model with a deployment configuration and show that the acceleration setting inherits the model’s accelerator.
Publish the pipeline to an OCI registry and show that the published pipeline’s deployment configuration inherits the model’s accelerator.

Import Libraries

The first step will be to import our libraries, and set variables used through this tutorial.

import wallaroo
from wallaroo.object import EntityNotFoundError

# to display dataframe tables
from IPython.display import display
# used to display dataframe information without truncating
import pandas as pd
pd.set_option('display.max_colwidth', None)
import pyarrow as pa

Connect to the Wallaroo Instance

The next step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and is available through the JupyterHub interface provided with your Wallaroo environment.

This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.

If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). For more information on Wallaroo Client settings, see the Client Connection guide.

# Login through local Wallaroo instance

wl = wallaroo.Client()

Create Workspace

We will create a workspace to manage our pipeline and models. The following variables will set the name of our sample workspace then set it as the current workspace.

Workspace, pipeline, and model names should be unique to each user. If multiple people run this tutorial in the same Wallaroo instance, add a unique suffix to the names below so the runs do not affect each other.
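One way to generate a random suffix for these names is sketched below. The helper `with_random_suffix` is hypothetical and uses only the Python standard library. The four-character lowercase suffix is an assumption for illustration, not a Wallaroo requirement.

```python
import random
import string


def with_random_suffix(base_name: str, length: int = 4) -> str:
    """Append a random lowercase suffix to a name (hypothetical helper)."""
    suffix = "".join(random.choices(string.ascii_lowercase, k=length))
    return f"{base_name}-{suffix}"


# The suffix differs on each run, so no fixed output is shown here.
print(with_random_suffix("accelerator-aloha-demonstration"))
```

The resulting string could be used as the workspace name, and the same approach applies to pipeline and model names.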

workspace_name = 'accelerator-aloha-demonstration'

workspace = wl.get_workspace(name=workspace_name, create_if_not_exist=True)

wl.set_current_workspace(workspace)

{'name': 'optimization-aloha-demonstration', 'id': 2900937, 'archived': False, 'created_by': 'b4a9aa3d-83fc-407a-b4eb-37796e96f1ef', 'created_at': '2024-04-01T21:20:03.337377+00:00', 'models': [{'name': 'aloha', 'versions': 1, 'owner_id': '""', 'last_update_time': datetime.datetime(2024, 4, 1, 21, 20, 4, 427650, tzinfo=tzutc()), 'created_at': datetime.datetime(2024, 4, 1, 21, 20, 4, 427650, tzinfo=tzutc())}], 'pipelines': [{'name': 'aloha-pipeline', 'create_time': datetime.datetime(2024, 4, 1, 21, 20, 7, 281931, tzinfo=tzutc()), 'definition': '[]'}]}

Set Model Accelerator

For our example, we will upload the model. The file name is ./models/alohacnnlstm.zip and the model will be called aloha.

Models are uploaded to Wallaroo via the wallaroo.client.upload_model method which takes the following arguments:

Parameter	Type	Description
path	String (Required)	The file path to the model.
framework	wallaroo.framework.Framework (Required)	The model’s framework. See Wallaroo SDK Essentials Guide: Model Uploads and Registrations for supported model frameworks.
input_schema	pyarrow.lib.Schema (Optional)	The model’s input schema. Only required for non-native Wallaroo frameworks. See Wallaroo SDK Essentials Guide: Model Uploads and Registrations for more details.
output_schema	pyarrow.lib.Schema (Optional)	The model’s output schema. Only required for non-native Wallaroo frameworks. See Wallaroo SDK Essentials Guide: Model Uploads and Registrations for more details.
convert_wait	bool (Optional)	Whether to wait in the SDK session for the auto-packaging process to complete for non-native Wallaroo frameworks.
arch	wallaroo.engine_config.Architecture (Optional)	The targeted architecture for the model. Options are `X86` (Default) and `ARM`.
accel	wallaroo.engine_config.Acceleration (Optional)	The targeted optimization for the model. Options are `None`: The default acceleration, used for all scenarios and architectures. `Aio`: Compatible only with the `ARM` architecture. `Jetson`: Compatible only with the `ARM` architecture. `CUDA`: Nvidia CUDA acceleration supported by both ARM and X64/X86 processors. This is intended for deployment with GPUs.

We upload the model and set the accel to wallaroo.engine_config.Acceleration.Jetson.


model_name = 'aloha'
model_file_name = './models/alohacnnlstm.zip'

from wallaroo.framework import Framework
from wallaroo.engine_config import Architecture, Acceleration

model = wl.upload_model(model_name, 
                        model_file_name,
                        framework=Framework.TENSORFLOW,
                        arch=Architecture.ARM,
                        accel=Acceleration.Jetson,
                        )

Display Model Details

Once the model is uploaded, we view the model details to verify that the accel setting is set to Jetson.

model

Name	aloha
Version	c8b7497f-0ef0-4336-b0d9-e608f4b11657
File Name	alohacnnlstm.zip
SHA	d71d9ffc61aaac58c2b1ed70a2db13d1416fb9d3f5b891e5e4e2e97180fe22f8
Status	ready
Image Path	None
Architecture	arm
Acceleration	jetson
Updated At	2024-02-Apr 17:58:13

Create the Pipeline

With the model uploaded, we build our pipeline and add the Aloha model as a pipeline step.

pipeline_name = 'aloha-pipeline'

aloha_pipeline = wl.build_pipeline(pipeline_name)

_ = aloha_pipeline.add_model_step(model)

Set Accelerator for Pipeline Publish

Publishing the pipeline uses the wallaroo.pipeline.Pipeline.publish() method. This requires that the Wallaroo Ops instance have Edge Registry Services enabled.

The deployment configuration for the pipeline publish inherits the model’s accelerator and architecture. Options such as the number of CPUs and the amount of memory can be adjusted without impacting the model’s accelerator or architecture settings.

Pipelines do not need to be deployed in the centralized Wallaroo Ops instance before publishing the pipeline. This is useful in multicloud deployments to edge devices with different hardware accelerators than the centralized Wallaroo Ops instance.

To change the model architecture or acceleration settings, upload the model as a new model or model version with the new architecture or acceleration settings.
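The inheritance behavior described above can be pictured with plain dictionaries. This is a conceptual sketch only, not Wallaroo SDK code or its internal implementation: `effective_engine_config` is a hypothetical function, and the dictionary keys are assumptions modeled on the Engine Config shown in the publish output.

```python
# Illustrative only: plain dictionaries standing in for the model's settings
# and the publish-time resource options. This is not Wallaroo SDK code.
def effective_engine_config(model_settings: dict, resources: dict) -> dict:
    """Combine resource options with the accel/arch fixed at model upload."""
    config = dict(resources)
    # Accelerator and architecture always come from the uploaded model,
    # regardless of what the resource options contain.
    config["accel"] = model_settings["accel"]
    config["arch"] = model_settings["arch"]
    return config


model_settings = {"accel": "jetson", "arch": "arm"}

default_cfg = effective_engine_config(model_settings, {"cpu": 1.0, "memory": "512Mi"})
custom_cfg = effective_engine_config(model_settings, {"cpu": 1.0, "memory": "1Gi"})

print(default_cfg["accel"], default_cfg["memory"])  # jetson 512Mi
print(custom_cfg["accel"], custom_cfg["memory"])    # jetson 1Gi
```

In both cases the resource values differ while the accelerator and architecture stay the same, which mirrors the two publishes performed in this section.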

For this example, we will publish the pipeline twice:

Publish the pipeline with a default deployment configuration.
Publish the pipeline with the cpu and memory specified.

For more information, see Wallaroo SDK Essentials Guide: Pipeline Edge Publication.

from wallaroo.deployment_config import DeploymentConfigBuilder

deploy_config = DeploymentConfigBuilder().build()

aloha_pipeline.publish(deployment_config=deploy_config)

Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing...................... Published.

ID 78

Pipeline Name aloha-pipeline

Pipeline Version 3db319b4-c0b5-47a1-94d3-1931a38cb3f9

Status Published

Engine URL sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-jetson:v2024.1.0-main-4833

Pipeline URL sample.registry.example.com/uat/pipelines/aloha-pipeline:3db319b4-c0b5-47a1-94d3-1931a38cb3f9

Helm Chart URL oci://sample.registry.example.com/uat/charts/aloha-pipeline

Helm Chart Reference sample.registry.example.com/uat/charts@sha256:60b9d5e44f4fd7adcc4a5296d497bc009cec67d919fe5bafdc3b9fa3768224fb

Helm Chart Version 0.0.1-3db319b4-c0b5-47a1-94d3-1931a38cb3f9

Engine Config {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'jetson', 'arch': 'arm', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}, 'enginelb': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'none', 'arch': 'x86', 'gpu': False}}}

User Images []

Created By john.hummel@wallaroo.ai

Created At 2024-04-02 17:58:14.636536+00:00

Updated At 2024-04-02 17:58:14.636536+00:00

Replaces

Docker Run Command

docker run \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=sample.registry.example.com/uat/pipelines/aloha-pipeline:3db319b4-c0b5-47a1-94d3-1931a38cb3f9 \
    -e CONFIG_CPUS=1 sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-jetson:v2024.1.0-main-4833

Note: Please set the OCI_USERNAME, and OCI_PASSWORD environment variables.

Helm Install Command

helm install --atomic $HELM_INSTALL_NAME \
    oci://sample.registry.example.com/uat/charts/aloha-pipeline \
    --namespace $HELM_INSTALL_NAMESPACE \
    --version 0.0.1-3db319b4-c0b5-47a1-94d3-1931a38cb3f9 \
    --set ociRegistry.username=$OCI_USERNAME \
    --set ociRegistry.password=$OCI_PASSWORD

Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables.
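To confirm that the published engine inherited the model's settings, the Engine Config shown in the output above can be inspected. The sketch below copies that Engine Config into a plain Python dictionary and reads the `accel` and `arch` values. How the dictionary is obtained from the publish result in a live session is not shown here, and the variable names are illustrative.

```python
# Engine Config copied from the publish output above.
engine_config = {
    'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'},
                             'requests': {'cpu': 1.0, 'memory': '512Mi'},
                             'accel': 'jetson', 'arch': 'arm', 'gpu': False}},
    'engineAux': {'autoscale': {'type': 'none'}, 'images': {}},
    'enginelb': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'},
                               'requests': {'cpu': 1.0, 'memory': '512Mi'},
                               'accel': 'none', 'arch': 'x86', 'gpu': False}},
}

# The engine section carries the accelerator and architecture set at model upload.
engine_resources = engine_config['engine']['resources']
print(engine_resources['accel'], engine_resources['arch'])  # jetson arm

# The load balancer section keeps the default settings.
lb_resources = engine_config['enginelb']['resources']
print(lb_resources['accel'], lb_resources['arch'])  # none x86
```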

We publish the pipeline again, this time changing the number of cpus and memory for the deployment configuration.

from wallaroo.deployment_config import DeploymentConfigBuilder

deploy_config_custom = (DeploymentConfigBuilder()
                     .replica_count(1)
                     .cpus(1)
                     .memory("1Gi")
                     .build()
                    )

aloha_pipeline.publish(deployment_config=deploy_config_custom)

Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing...................... Published.

ID 79

Pipeline Name aloha-pipeline

Pipeline Version e10ec783-5f55-4c28-b8d9-330ddda91474

Status Published

Engine URL sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-jetson:v2024.1.0-main-4833

Pipeline URL sample.registry.example.com/uat/pipelines/aloha-pipeline:e10ec783-5f55-4c28-b8d9-330ddda91474

Helm Chart URL oci://sample.registry.example.com/uat/charts/aloha-pipeline

Helm Chart Reference sample.registry.example.com/uat/charts@sha256:214908ea92d651121cb18bebfe97efd26344562d8ddfbec3b3618d3c68312ba9

Helm Chart Version 0.0.1-e10ec783-5f55-4c28-b8d9-330ddda91474

User Images []

Created By john.hummel@wallaroo.ai

Created At 2024-04-02 18:00:03.706533+00:00

Updated At 2024-04-02 18:00:03.706533+00:00

Replaces

Docker Run Command

docker run \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=sample.registry.example.com/uat/pipelines/aloha-pipeline:e10ec783-5f55-4c28-b8d9-330ddda91474 \
    -e CONFIG_CPUS=1 sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-jetson:v2024.1.0-main-4833

Note: Please set the OCI_USERNAME, and OCI_PASSWORD environment variables.

Helm Install Command

helm install --atomic $HELM_INSTALL_NAME \
    oci://sample.registry.example.com/uat/charts/aloha-pipeline \
    --namespace $HELM_INSTALL_NAMESPACE \
    --version 0.0.1-e10ec783-5f55-4c28-b8d9-330ddda91474 \
    --set ociRegistry.username=$OCI_USERNAME \
    --set ociRegistry.password=$OCI_PASSWORD

Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables.

ML models published to OCI registries via the Wallaroo SDK are provided with the Docker Run Command: a sample docker script for deploying the model on edge and multicloud environments.

For ML models deployed on Jetson accelerated hardware via Docker, the docker application is replaced by the nvidia-docker application. For details on installing nvidia-docker, see Installing the NVIDIA Container Toolkit. For example:

nvidia-docker run -v $PERSISTENT_VOLUME_DIR:/persist \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=ghcr.io/wallaroolabs/doc-samples/pipelines/sample-edge-deploy:446aeed9-2d52-47ae-9e5c-f2a05ef0d4d6 \
    -e EDGE_BUNDLE=abc123 \
    ghcr.io/wallaroolabs/doc-samples/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini:2024.1.0-5097
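The substitution of nvidia-docker for docker can also be scripted. The following is a minimal sketch, assuming the command layout of the Docker Run Command samples above. The helpers `container_runtime` and `build_run_command` are hypothetical and are not provided by the Wallaroo SDK. Only the Jetson case described above is handled.

```python
def container_runtime(accel: str) -> str:
    """Pick the container application for a given accelerator (hypothetical helper)."""
    return "nvidia-docker" if accel.lower() == "jetson" else "docker"


def build_run_command(accel: str, pipeline_url: str, engine_url: str, cpus: int = 1) -> str:
    """Assemble a run command in the same layout as the samples above."""
    parts = [
        f"{container_runtime(accel)} run",
        "-e OCI_USERNAME=$OCI_USERNAME",
        "-e OCI_PASSWORD=$OCI_PASSWORD",
        f"-e PIPELINE_URL={pipeline_url}",
        f"-e CONFIG_CPUS={cpus}",
        engine_url,
    ]
    # Join with a shell line continuation and indentation for readability.
    return " \\\n    ".join(parts)


print(build_run_command(
    "jetson",
    "sample.registry.example.com/uat/pipelines/aloha-pipeline:e10ec783-5f55-4c28-b8d9-330ddda91474",
    "sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-jetson:v2024.1.0-main-4833",
))
```

The OCI_USERNAME and OCI_PASSWORD environment variables still need to be set before running the generated command.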