Run AI Workloads with Hardware Accelerators: Aloha Tutorial

A demonstration of accelerating model deployment performance with optimization settings.

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

Run Anywhere With Acceleration Tutorial: Aloha Model

Wallaroo supports deploying models with accelerators that increase the inference speed and performance. These accelerators are set during model upload, and are carried through to model deployment and model edge deployment.

Supported Accelerators

The following accelerators are supported:

Accelerator | Description
------------|------------
None | The default acceleration, used for all scenarios and architectures.
Aio | Compatible only with the ARM architecture.
Jetson | Compatible only with the ARM architecture.
CUDA | NVIDIA CUDA acceleration, supported by both ARM and X64/X86 processors. This is intended for deployment with GPUs.
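
The compatibility rules listed above can be expressed as a small lookup. The following is a minimal sketch in plain Python, not part of the Wallaroo SDK: the names SUPPORTED_ARCHITECTURES and is_compatible are hypothetical helpers introduced here for illustration only.

```python
# Hypothetical helper, NOT part of the Wallaroo SDK: it only encodes the
# accelerator/architecture compatibility listed in this tutorial.
SUPPORTED_ARCHITECTURES = {
    "none": {"x86", "arm"},
    "aio": {"arm"},
    "jetson": {"arm"},
    "cuda": {"x86", "arm"},
}

def is_compatible(accel: str, arch: str) -> bool:
    """Return True if the accelerator is listed as compatible with the architecture."""
    return arch.lower() in SUPPORTED_ARCHITECTURES.get(accel.lower(), set())

# Check a combination before uploading a model with it.
print(is_compatible("jetson", "arm"))
print(is_compatible("jetson", "x86"))
```

A check like this could be run before uploading a model, so an unsupported combination is caught early instead of at deployment time.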

Goal

Demonstrate uploading an Aloha model with the Jetson accelerator, then publishing the same model for edge deployment with the Jetson accelerator inherited from the model.

Resources

This tutorial provides the following:

  • Models:
    • models/alohacnnlstm.zip: An open source model based on the Aloha CNN LSTM model for classifying Domain names as being either legitimate or being used for nefarious purposes such as malware distribution.

Prerequisites

  • A deployed Wallaroo instance with Edge Registry Services and Edge Observability enabled.
  • The following Python libraries installed:
    • wallaroo: The Wallaroo SDK. Included with the Wallaroo JupyterHub service by default.
    • pandas: Pandas, mainly used for Pandas DataFrames.
    • json: Used to format input data for inference requests.

Steps

  • Upload the model with the targeted accelerator set to Jetson.
  • Create the pipeline and add the model as a model step.
  • Deploy the model with a deployment configuration and show that the acceleration setting inherits the model’s accelerator.
  • Publish the pipeline to an OCI registry and show that the published pipeline’s deployment configuration inherits the model’s accelerator.

Import Libraries

The first step is to import our libraries and set the variables used throughout this tutorial.

import wallaroo
from wallaroo.object import EntityNotFoundError

# to display dataframe tables
from IPython.display import display
# used to display dataframe information without truncating
import pandas as pd
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_columns', None)
import pyarrow as pa

Connect to the Wallaroo Instance

The next step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and available through the JupyterHub interface provided with your Wallaroo environment.

This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.

If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). For more information on Wallaroo Client settings, see the Client Connection guide.

# Login through local Wallaroo instance

wl = wallaroo.Client()

Create Workspace

We will create a workspace to manage our pipeline and models. The following variables will set the name of our sample workspace then set it as the current workspace.

Workspace, pipeline, and model names should be unique to each user. This tutorial uses fixed names; if multiple people run it in the same Wallaroo instance, add a unique suffix to each name so they do not affect each other.

workspace_name = 'accelerator-aloha-demonstration'

workspace = wl.get_workspace(name=workspace_name, create_if_not_exist=True)

wl.set_current_workspace(workspace)
{'name': 'accelerator-aloha-demonstration', 'id': 41, 'archived': False, 'created_by': 'eed2002f-769f-4cbd-a189-8ca1e9bf496c', 'created_at': '2024-04-19T17:03:13.108611+00:00', 'models': [], 'pipelines': []}
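
A unique suffix can be generated with the Python standard library. The following is a minimal sketch: the helper with_suffix and the variable unique_workspace_name are hypothetical names introduced here, and the result would be passed to wl.get_workspace in place of the fixed name used above.

```python
import random
import string

def with_suffix(base_name: str, length: int = 4) -> str:
    """Append a random lowercase suffix so names do not collide between users."""
    suffix = "".join(random.choices(string.ascii_lowercase, k=length))
    return f"{base_name}-{suffix}"

# Hypothetical variable: use this value as the workspace name
# before the workspace is created.
unique_workspace_name = with_suffix("accelerator-aloha-demonstration")
print(unique_workspace_name)
```

The same helper could be applied to the pipeline and model names. Because the suffix changes on every run, store the generated name if the workspace needs to be retrieved again later.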

Set Model Accelerator

For our example, we will upload the model. The file name is ./models/alohacnnlstm.zip and the model will be called aloha.

Models are uploaded to Wallaroo via the wallaroo.client.upload_model method which takes the following arguments:

Parameter | Type | Description
----------|------|------------
name | String (Required) | The name of the model.
path | String (Required) | The file path to the model.
framework | wallaroo.framework.Framework (Required) | The model’s framework. See Wallaroo SDK Essentials Guide: Model Uploads and Registrations for supported model frameworks.
input_schema | pyarrow.lib.Schema (Optional) | The model’s input schema. Only required for non-native Wallaroo frameworks. See Wallaroo SDK Essentials Guide: Model Uploads and Registrations for more details.
output_schema | pyarrow.lib.Schema (Optional) | The model’s output schema. Only required for non-native Wallaroo frameworks. See Wallaroo SDK Essentials Guide: Model Uploads and Registrations for more details.
convert_wait | bool (Optional) | Whether to wait in the SDK session to complete the auto-packaging process for non-native Wallaroo frameworks.
arch | wallaroo.engine_config.Architecture (Optional) | The targeted architecture for the model. Options are X86 (default) and ARM.
accel | wallaroo.engine_config.Acceleration (Optional) | The targeted optimization for the model. Options are None (the default acceleration, used for all scenarios and architectures), Aio (compatible only with the ARM architecture), Jetson (compatible only with the ARM architecture), and CUDA (compatible with either ARM or X86/X64 architectures).

We upload the model and set the accel parameter to wallaroo.engine_config.Acceleration.Jetson.


model_name = 'aloha'
model_file_name = './models/alohacnnlstm.zip'

from wallaroo.framework import Framework
from wallaroo.engine_config import Architecture, Acceleration

model = wl.upload_model(model_name, 
                        model_file_name,
                        framework=Framework.TENSORFLOW,
                        arch=Architecture.ARM,
                        accel=Acceleration.Jetson,
                        )

Display Model Details

Once the model is uploaded, we view the model details to verify that the accel setting is set to Jetson.

model
Name | aloha
Version | 4f42d63e-aff1-4eb2-997a-eef23dc5582d
File Name | alohacnnlstm.zip
SHA | d71d9ffc61aaac58c2b1ed70a2db13d1416fb9d3f5b891e5e4e2e97180fe22f8
Status | ready
Image Path | None
Architecture | arm
Acceleration | jetson
Updated At | 2024-19-Apr 17:03:13

Create the Pipeline

With the model uploaded, we build our pipeline and add the Aloha model as a pipeline step.

pipeline_name = 'aloha-pipeline'

aloha_pipeline = wl.build_pipeline(pipeline_name)

_ = aloha_pipeline.add_model_step(model)

Set Accelerator for Pipeline Publish

Publishing the pipeline uses the wallaroo.pipeline.Pipeline.publish() method. This requires that the Wallaroo Ops instance have Edge Registry Services enabled.

The deployment configuration for the pipeline publish inherits the model’s accelerator and architecture. Options such as the number of cpus and the amount of memory can be adjusted without impacting the model’s accelerator or architecture settings.

Pipelines do not need to be deployed in the centralized Wallaroo Ops instance before publishing the pipeline. This is useful in deployments to edge devices with different hardware accelerators than the centralized Wallaroo Ops instance.

To change the model architecture or acceleration settings, upload the model as a new model or model version with the new architecture or acceleration settings.

For this example, we will publish the pipeline twice:

  • Publish the pipeline with a default deployment configuration.
  • Publish the pipeline with the cpu and memory specified.

Note that in both examples, the architecture and the acceleration inherit the model’s settings.

For more information, see Wallaroo SDK Essentials Guide: Pipeline Edge Publication.

from wallaroo.deployment_config import DeploymentConfigBuilder

deploy_config = wallaroo.DeploymentConfigBuilder().build()

aloha_pipeline.publish(deployment_config=deploy_config)
Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing.................... Published.
ID | 2
Pipeline Name | aloha-pipeline
Pipeline Version | f4336f6f-808f-4e46-b8bd-c0e2a407cb01
Status | Published
Engine URL | sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-jetson:v2024.1.0-main-4963
Pipeline URL | sample.registry.example.com/uat/pipelines/aloha-pipeline:f4336f6f-808f-4e46-b8bd-c0e2a407cb01
Helm Chart URL | oci://sample.registry.example.com/uat/charts/aloha-pipeline
Helm Chart Reference | sample.registry.example.com/uat/charts@sha256:acf1102592e193c7bfc5fa867856d81f8619f9205a52860aad7d1a671af7e853
Helm Chart Version | 0.0.1-f4336f6f-808f-4e46-b8bd-c0e2a407cb01
Engine Config | {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'jetson', 'arch': 'arm', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}}
User Images | []
Created By | john.hummel@wallaroo.ai
Created At | 2024-04-19 17:03:14.326070+00:00
Updated At | 2024-04-19 17:03:14.326070+00:00
Replaces |
Docker Run Command
docker run \
    -p $EDGE_PORT:8080 \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=sample.registry.example.com/uat/pipelines/aloha-pipeline:f4336f6f-808f-4e46-b8bd-c0e2a407cb01 \
    -e CONFIG_CPUS=1 sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-jetson:v2024.1.0-main-4963

Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables.
Helm Install Command
helm install --atomic $HELM_INSTALL_NAME \
    oci://sample.registry.example.com/uat/charts/aloha-pipeline \
    --namespace $HELM_INSTALL_NAMESPACE \
    --version 0.0.1-f4336f6f-808f-4e46-b8bd-c0e2a407cb01 \
    --set ociRegistry.username=$OCI_USERNAME \
    --set ociRegistry.password=$OCI_PASSWORD

Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables.
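
One way to confirm the inheritance is to inspect the Engine Config shown in the publish output. The following is a minimal sketch that works on the dictionary copied verbatim from the output above; how the dictionary is retrieved from the publish object is not shown here, and the helper name inherits_model_settings is hypothetical.

```python
# Engine Config copied from the publish output in this tutorial.
engine_config = {
    'engine': {
        'resources': {
            'limits': {'cpu': 1.0, 'memory': '512Mi'},
            'requests': {'cpu': 1.0, 'memory': '512Mi'},
            'accel': 'jetson',
            'arch': 'arm',
            'gpu': False,
        }
    },
    'engineAux': {'autoscale': {'type': 'none'}, 'images': {}},
}

def inherits_model_settings(config: dict, arch: str, accel: str) -> bool:
    """Hypothetical check: compare the published settings with the model's."""
    resources = config['engine']['resources']
    return resources.get('arch') == arch and resources.get('accel') == accel

# The model was uploaded with Architecture.ARM and Acceleration.Jetson.
print(inherits_model_settings(engine_config, arch='arm', accel='jetson'))
```

In this output the accel and arch keys sit under engine.resources, next to the cpu and memory limits, which is why the lookup goes through that path.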

We publish the pipeline again, this time changing the number of cpus and memory for the deployment configuration.

from wallaroo.deployment_config import DeploymentConfigBuilder

deploy_config_custom = (wallaroo.DeploymentConfigBuilder()
                     .replica_count(1)
                     .cpus(1)
                     .memory("1Gi")
                     .build()
                    )

aloha_pipeline.publish(deployment_config=deploy_config_custom)
Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing................ Published.
ID | 3
Pipeline Name | aloha-pipeline
Pipeline Version | d261c45e-de85-4997-971c-2bdcaf406014
Status | Published
Engine URL | sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-jetson:v2024.1.0-main-4963
Pipeline URL | sample.registry.example.com/uat/pipelines/aloha-pipeline:d261c45e-de85-4997-971c-2bdcaf406014
Helm Chart URL | oci://sample.registry.example.com/uat/charts/aloha-pipeline
Helm Chart Reference | sample.registry.example.com/uat/charts@sha256:b9e27a601d69deb4c79bc46d1a1597b2ec1cbd5cbf67cbe5e46f5aa88536d361
Helm Chart Version | 0.0.1-d261c45e-de85-4997-971c-2bdcaf406014
Engine Config | {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'jetson', 'arch': 'arm', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}}
User Images | []
Created By | john.hummel@wallaroo.ai
Created At | 2024-04-19 17:04:51.595162+00:00
Updated At | 2024-04-19 17:04:51.595162+00:00
Replaces |
Docker Run Command
docker run \
    -p $EDGE_PORT:8080 \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=sample.registry.example.com/uat/pipelines/aloha-pipeline:d261c45e-de85-4997-971c-2bdcaf406014 \
    -e CONFIG_CPUS=1 sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-jetson:v2024.1.0-main-4963

Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables.
Helm Install Command
helm install --atomic $HELM_INSTALL_NAME \
    oci://sample.registry.example.com/uat/charts/aloha-pipeline \
    --namespace $HELM_INSTALL_NAMESPACE \
    --version 0.0.1-d261c45e-de85-4997-971c-2bdcaf406014 \
    --set ociRegistry.username=$OCI_USERNAME \
    --set ociRegistry.password=$OCI_PASSWORD

Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables.

ML models published to OCI registries via the Wallaroo SDK are provided with the Docker Run Command: a sample docker script for deploying the model on edge and multicloud environments.

For ML models deployed on Jetson accelerated hardware via Docker, the docker run command is replaced by docker run --runtime nvidia --privileged --gpus all. For details on installing the NVIDIA Container Toolkit, see Installing the NVIDIA Container Toolkit. For example:

docker run --runtime nvidia --privileged --gpus all \
    -v $PERSISTENT_VOLUME_DIR:/persist \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=ghcr.io/wallaroolabs/doc-samples/pipelines/sample-edge-deploy:446aeed9-2d52-47ae-9e5c-f2a05ef0d4d6 \
    -e EDGE_BUNDLE=abc123 \
    ghcr.io/wallaroolabs/doc-samples/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-jetson:v2024.4.0-5849
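
The substitution described above can also be applied programmatically to a generated Docker Run Command. The following is a minimal sketch in plain Python: the helper to_jetson_command is hypothetical and not part of the Wallaroo SDK, and example-engine-image is a placeholder rather than a real image name.

```python
# Flags taken from the Jetson example in this tutorial.
JETSON_FLAGS = "--runtime nvidia --privileged --gpus all"

def to_jetson_command(docker_command: str) -> str:
    """Hypothetical helper: insert the Jetson flags right after 'docker run'."""
    prefix = "docker run"
    stripped = docker_command.lstrip()
    if not stripped.startswith(prefix):
        raise ValueError("expected a command starting with 'docker run'")
    return f"{prefix} {JETSON_FLAGS}{stripped[len(prefix):]}"

# Shortened stand-in for a generated Docker Run Command;
# 'example-engine-image' is a placeholder, not a real image.
standard_command = (
    "docker run \\\n"
    "    -p $EDGE_PORT:8080 \\\n"
    "    -e CONFIG_CPUS=1 example-engine-image"
)

print(to_jetson_command(standard_command))
```

The rest of the command, including the environment variables and the engine image, is left unchanged; only the flags after docker run are added.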