Run AI Workloads with Hardware Accelerators: Aloha Tutorial

A demonstration of accelerating model deployment performance with optimization settings.

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

Run Anywhere With Acceleration Tutorial: Aloha Model

Wallaroo supports deploying models with accelerators that increase the inference speed and performance. These accelerators are set during model upload, and are carried through to model deployment and model edge deployment.

Supported Accelerators

The following accelerators are supported:

  • None: The default acceleration, used for all scenarios and architectures.
  • Aio: Compatible only with the ARM architecture.
  • Jetson: Compatible only with the ARM architecture.
  • CUDA: Compatible with either ARM or X86/X64 architectures.
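The compatibility rules above can be sketched as a small lookup table. This is a hypothetical illustration in plain Python, not part of the Wallaroo SDK:

```python
# Hypothetical sketch of the accelerator/architecture compatibility
# rules above -- not part of the Wallaroo SDK.
ACCEL_COMPAT = {
    "none": {"x86", "arm"},    # default, works everywhere
    "aio": {"arm"},            # ARM only
    "jetson": {"arm"},         # ARM only
    "cuda": {"x86", "arm"},    # either architecture
}

def is_compatible(accel: str, arch: str) -> bool:
    """Return True if the accelerator supports the target architecture."""
    return arch in ACCEL_COMPAT.get(accel, set())

print(is_compatible("jetson", "arm"))   # True
print(is_compatible("jetson", "x86"))   # False
```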

Goal

Demonstrate uploading an Aloha model with the Jetson accelerator set at upload, then publishing the same model for edge deployment with the Jetson accelerator inherited from the model.

Resources

This tutorial provides the following:

  • Models:
    • models/alohacnnlstm.zip: An open source model based on the Aloha CNN LSTM model for classifying domain names as either legitimate or used for nefarious purposes such as malware distribution.

Prerequisites

  • A deployed Wallaroo instance with Edge Registry Services and Edge Observability enabled.
  • The following Python libraries installed:
    • wallaroo: The Wallaroo SDK. Included with the Wallaroo JupyterHub service by default.
    • pandas: Pandas, mainly used for Pandas DataFrames.
    • json: Used to format input data for inference requests.
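As a small illustration of how pandas and json work together here, inference input can be parsed from a JSON string into a pandas DataFrame. The field names below are placeholders, not the Aloha model's actual input schema:

```python
import json
import pandas as pd

# Display full column contents without truncation.
pd.set_option('display.max_colwidth', None)

# Placeholder records -- not the Aloha model's real input schema.
raw = '[{"domain": "example.com"}, {"domain": "suspicious-site.xyz"}]'
records = json.loads(raw)

# Convert the parsed records to a DataFrame for inference requests.
input_df = pd.DataFrame(records)
print(input_df.shape)  # (2, 1)
```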

Steps

  • Upload the model with the targeted accelerator set to Jetson.
  • Create the pipeline and add the model as a model step.
  • Deploy the model with a deployment configuration and show that the acceleration setting inherits the model’s accelerator.
  • Publish the pipeline to an OCI registry and show that the published pipeline’s deployment configuration inherits the model’s accelerator.

Import Libraries

The first step is to import our libraries and set the variables used throughout this tutorial.

import wallaroo
from wallaroo.object import EntityNotFoundError

# to display dataframe tables
from IPython.display import display
# used to display dataframe information without truncating
import pandas as pd
pd.set_option('display.max_colwidth', None)
import pyarrow as pa

Connect to the Wallaroo Instance

The next step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and available through the Jupyter Hub interface provided with your Wallaroo environment.

This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.

If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). For more information on Wallaroo Client settings, see the Client Connection guide.

# Login through local Wallaroo instance

wl = wallaroo.Client()

Create Workspace

We will create a workspace to manage our pipeline and models. The following sets the name of our sample workspace, then sets it as the current workspace.

Workspace, pipeline, and model names should be unique to each user, so we’ll add a randomly generated suffix so multiple people can run this tutorial in a Wallaroo instance without affecting each other.
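One way to generate such a suffix, using only the Python standard library (the tutorial code below uses a fixed workspace name for simplicity):

```python
import random
import string

# Append a short random suffix so names don't collide between users
# running this tutorial on the same Wallarooinstance.
suffix = ''.join(random.choices(string.ascii_lowercase + string.digits, k=4))
workspace_name = f'accelerator-aloha-demonstration-{suffix}'
print(workspace_name)
```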

workspace_name = 'accelerator-aloha-demonstration'

workspace = wl.get_workspace(name=workspace_name, create_if_not_exist=True)

wl.set_current_workspace(workspace)
{'name': 'accelerator-aloha-demonstration', 'id': 2900937, 'archived': False, 'created_by': 'b4a9aa3d-83fc-407a-b4eb-37796e96f1ef', 'created_at': '2024-04-01T21:20:03.337377+00:00', 'models': [{'name': 'aloha', 'versions': 1, 'owner_id': '""', 'last_update_time': datetime.datetime(2024, 4, 1, 21, 20, 4, 427650, tzinfo=tzutc()), 'created_at': datetime.datetime(2024, 4, 1, 21, 20, 4, 427650, tzinfo=tzutc())}], 'pipelines': [{'name': 'aloha-pipeline', 'create_time': datetime.datetime(2024, 4, 1, 21, 20, 7, 281931, tzinfo=tzutc()), 'definition': '[]'}]}

Set Model Accelerator

For our example, we will upload the model. The file name is ./models/alohacnnlstm.zip and the model will be called aloha.

Models are uploaded to Wallaroo via the wallaroo.client.upload_model method which takes the following arguments:

  • path (String, Required): The file path to the model.
  • framework (wallaroo.framework.Framework, Required): The model’s framework. See Wallaroo SDK Essentials Guide: Model Uploads and Registrations for supported model frameworks.
  • input_schema (pyarrow.lib.Schema, Optional): The model’s input schema. Only required for non-Native Wallaroo frameworks. See Wallaroo SDK Essentials Guide: Model Uploads and Registrations for more details.
  • output_schema (pyarrow.lib.Schema, Optional): The model’s output schema. Only required for non-Native Wallaroo frameworks. See Wallaroo SDK Essentials Guide: Model Uploads and Registrations for more details.
  • convert_wait (bool, Optional): Whether to wait in the SDK session for the auto-packaging process to complete for non-native Wallaroo frameworks.
  • arch (wallaroo.engine_config.Architecture, Optional): The targeted architecture for the model. Options are:
    • X86 (Default)
    • ARM
  • accel (wallaroo.engine_config.Acceleration, Optional): The targeted acceleration for the model. Options are:
    • None: The default acceleration, used for all scenarios and architectures.
    • Aio: Compatible only with the ARM architecture.
    • Jetson: Compatible only with the ARM architecture.
    • CUDA: NVIDIA CUDA acceleration supported by both ARM and X64/X86 processors. This is intended for deployment with GPUs.

We upload the model and set the accel parameter to wallaroo.engine_config.Acceleration.Jetson.


model_name = 'aloha'
model_file_name = './models/alohacnnlstm.zip'

from wallaroo.framework import Framework
from wallaroo.engine_config import Architecture, Acceleration

model = wl.upload_model(model_name, 
                        model_file_name,
                        framework=Framework.TENSORFLOW,
                        arch=Architecture.ARM,
                        accel=Acceleration.Jetson,
                        )

Display Model Details

Once the model is uploaded, we view the model details to verify the accel setting is set to Jetson.

model
Name: aloha
Version: c8b7497f-0ef0-4336-b0d9-e608f4b11657
File Name: alohacnnlstm.zip
SHA: d71d9ffc61aaac58c2b1ed70a2db13d1416fb9d3f5b891e5e4e2e97180fe22f8
Status: ready
Image Path: None
Architecture: arm
Acceleration: jetson
Updated At: 2024-02-Apr 17:58:13

Create the Pipeline

With the model uploaded, we build our pipeline and add the Aloha model as a pipeline step.

pipeline_name = 'aloha-pipeline'

aloha_pipeline = wl.build_pipeline(pipeline_name)

_ = aloha_pipeline.add_model_step(model)

Set Accelerator for Pipeline Publish

Publishing the pipeline uses the wallaroo.pipeline.Pipeline.publish() method. This requires that the Wallaroo Ops instance have Edge Registry Services enabled.

The deployment configuration for the pipeline publish inherits the model’s accelerator and architecture. Options such as the number of CPUs, amount of memory, etc., can be adjusted without impacting the model’s accelerator or architecture settings.

Pipelines do not need to be deployed in the centralized Wallaroo Ops instance before publishing the pipeline. This is useful in multicloud deployments to edge devices with different hardware accelerators than the centralized Wallaroo Ops instance.

To change the model architecture or acceleration settings, upload the model as a new model or model version with the new architecture or acceleration settings.

For this example, we will publish the pipeline twice:

  • Publish the pipeline with a default deployment configuration.
  • Publish the pipeline with the CPUs and memory specified.

For more information, see Wallaroo SDK Essentials Guide: Pipeline Edge Publication.

from wallaroo.deployment_config import DeploymentConfigBuilder

deploy_config = DeploymentConfigBuilder().build()

aloha_pipeline.publish(deployment_config=deploy_config)
Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing...................... Published.
ID: 78
Pipeline Name: aloha-pipeline
Pipeline Version: 3db319b4-c0b5-47a1-94d3-1931a38cb3f9
Status: Published
Engine URL: sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-jetson:v2024.1.0-main-4833
Pipeline URL: sample.registry.example.com/uat/pipelines/aloha-pipeline:3db319b4-c0b5-47a1-94d3-1931a38cb3f9
Helm Chart URL: oci://sample.registry.example.com/uat/charts/aloha-pipeline
Helm Chart Reference: sample.registry.example.com/uat/charts@sha256:60b9d5e44f4fd7adcc4a5296d497bc009cec67d919fe5bafdc3b9fa3768224fb
Helm Chart Version: 0.0.1-3db319b4-c0b5-47a1-94d3-1931a38cb3f9
Engine Config: {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'jetson', 'arch': 'arm', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}, 'enginelb': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'none', 'arch': 'x86', 'gpu': False}}}
User Images: []
Created By: john.hummel@wallaroo.ai
Created At: 2024-04-02 17:58:14.636536+00:00
Updated At: 2024-04-02 17:58:14.636536+00:00
Replaces:
Docker Run Command
docker run \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=sample.registry.example.com/uat/pipelines/aloha-pipeline:3db319b4-c0b5-47a1-94d3-1931a38cb3f9 \
    -e CONFIG_CPUS=1 sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-jetson:v2024.1.0-main-4833

Note: Please set the OCI_USERNAME, and OCI_PASSWORD environment variables.
Helm Install Command
helm install --atomic $HELM_INSTALL_NAME \
    oci://sample.registry.example.com/uat/charts/aloha-pipeline \
    --namespace $HELM_INSTALL_NAMESPACE \
    --version 0.0.1-3db319b4-c0b5-47a1-94d3-1931a38cb3f9 \
    --set ociRegistry.username=$OCI_USERNAME \
    --set ociRegistry.password=$OCI_PASSWORD

Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables.
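The inheritance is visible in the Engine Config from the publish output above: the engine entry carries the model’s jetson accelerator and arm architecture, while the load balancer (enginelb) keeps the defaults. A quick check against that dict, abbreviated here to the relevant fields:

```python
# Engine Config from the publish output above, abbreviated to the
# accelerator and architecture fields.
engine_config = {
    'engine': {'resources': {'accel': 'jetson', 'arch': 'arm', 'gpu': False}},
    'enginelb': {'resources': {'accel': 'none', 'arch': 'x86', 'gpu': False}},
}

# The inference engine inherits the model's accelerator and architecture...
assert engine_config['engine']['resources']['accel'] == 'jetson'
assert engine_config['engine']['resources']['arch'] == 'arm'
# ...while the load balancer keeps the defaults.
assert engine_config['enginelb']['resources']['accel'] == 'none'
print("accelerator inherited:", engine_config['engine']['resources']['accel'])
```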

We publish the pipeline again, this time changing the number of CPUs and the memory for the deployment configuration.

from wallaroo.deployment_config import DeploymentConfigBuilder

deploy_config_custom = (DeploymentConfigBuilder()
                     .replica_count(1)
                     .cpus(1)
                     .memory("1Gi")
                     .build()
                    )

aloha_pipeline.publish(deployment_config=deploy_config_custom)
Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing...................... Published.
ID: 79
Pipeline Name: aloha-pipeline
Pipeline Version: e10ec783-5f55-4c28-b8d9-330ddda91474
Status: Published
Engine URL: sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-jetson:v2024.1.0-main-4833
Pipeline URL: sample.registry.example.com/uat/pipelines/aloha-pipeline:e10ec783-5f55-4c28-b8d9-330ddda91474
Helm Chart URL: oci://sample.registry.example.com/uat/charts/aloha-pipeline
Helm Chart Reference: sample.registry.example.com/uat/charts@sha256:214908ea92d651121cb18bebfe97efd26344562d8ddfbec3b3618d3c68312ba9
Helm Chart Version: 0.0.1-e10ec783-5f55-4c28-b8d9-330ddda91474
Engine Config: {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'jetson', 'arch': 'arm', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}, 'enginelb': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'none', 'arch': 'x86', 'gpu': False}}}
User Images: []
Created By: john.hummel@wallaroo.ai
Created At: 2024-04-02 18:00:03.706533+00:00
Updated At: 2024-04-02 18:00:03.706533+00:00
Replaces:
Docker Run Command
docker run \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=sample.registry.example.com/uat/pipelines/aloha-pipeline:e10ec783-5f55-4c28-b8d9-330ddda91474 \
    -e CONFIG_CPUS=1 sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-jetson:v2024.1.0-main-4833

Note: Please set the OCI_USERNAME, and OCI_PASSWORD environment variables.
Helm Install Command
helm install --atomic $HELM_INSTALL_NAME \
    oci://sample.registry.example.com/uat/charts/aloha-pipeline \
    --namespace $HELM_INSTALL_NAMESPACE \
    --version 0.0.1-e10ec783-5f55-4c28-b8d9-330ddda91474 \
    --set ociRegistry.username=$OCI_USERNAME \
    --set ociRegistry.password=$OCI_PASSWORD

Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables.

ML models published to OCI registries via the Wallaroo SDK include the Docker Run Command: a sample docker script for deploying the model in edge and multicloud environments.

For ML models deployed on Jetson-accelerated hardware via Docker, the docker application is replaced by the nvidia-docker application. For details on installing nvidia-docker, see Installing the NVIDIA Container Toolkit. For example:

nvidia-docker run -v $PERSISTENT_VOLUME_DIR:/persist \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=ghcr.io/wallaroolabs/doc-samples/pipelines/sample-edge-deploy:446aeed9-2d52-47ae-9e5c-f2a05ef0d4d6 \
    -e EDGE_BUNDLE=abc123 \
    ghcr.io/wallaroolabs/doc-samples/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini:2024.1.0-5097