Run AI Workloads with Hardware Accelerators: Aloha Tutorial
This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.
Run Anywhere With Acceleration Tutorial: Aloha Model
Wallaroo supports deploying models with accelerators that increase the inference speed and performance. These accelerators are set during model upload, and are carried through to model deployment and model edge deployment.
Supported Accelerators
The following accelerators are supported:
Accelerator | Description |
---|---|
None | The default acceleration, used for all scenarios and architectures. |
Aio | Compatible only with the ARM architecture. |
Jetson | Compatible only with the ARM architecture. |
CUDA | NVIDIA CUDA acceleration, supported on both ARM and X64/X86 processors. This is intended for deployment with GPUs. |
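The compatibility rules in the table above can be sketched as a small lookup. This is only an illustration: the lowercase keys, the `is_supported` helper, and the assumption that "all architectures" means ARM and X86 are not part of the Wallaroo SDK.

```python
# Compatibility rules transcribed from the Supported Accelerators table.
# Keys and helper name are hypothetical; they are not Wallaroo SDK identifiers.
ALL_ARCHITECTURES = {"x86", "arm"}  # assumption: the two architectures named in the table

SUPPORTED_ARCHITECTURES = {
    "none": ALL_ARCHITECTURES,   # default, used for all architectures
    "aio": {"arm"},              # ARM only
    "jetson": {"arm"},           # ARM only
    "cuda": {"x86", "arm"},      # ARM and X64/X86, intended for GPUs
}


def is_supported(accel: str, arch: str) -> bool:
    """Return True if the table allows this accelerator on this architecture."""
    return arch.lower() in SUPPORTED_ARCHITECTURES.get(accel.lower(), set())


print(is_supported("jetson", "arm"))  # True
print(is_supported("jetson", "x86"))  # False
```

A check like this could be run before upload to catch a mismatched accelerator and architecture pair early.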
Goal
Demonstrate uploading an Aloha model with the Jetson accelerator, then publishing the same model for edge deployment with the Jetson accelerator inherited from the model.
Resources
This tutorial provides the following:
- Models:
  - models/alohacnnlstm.zip: An open source model based on the Aloha CNN LSTM model for classifying domain names as either legitimate or used for nefarious purposes such as malware distribution.
Prerequisites
- A deployed Wallaroo instance with Edge Registry Services and Edge Observability enabled.
- The following Python libraries installed:
Steps
- Upload the model with the targeted accelerator set to Jetson.
- Create the pipeline and add the model as a model step.
- Deploy the model with a deployment configuration and show that the acceleration setting inherits the model’s accelerator.
- Publish the pipeline to an OCI registry and show that the published pipeline’s deployment configuration inherits the model’s accelerator.
Import Libraries
The first step is to import our libraries and set the variables used throughout this tutorial.
import wallaroo
from wallaroo.object import EntityNotFoundError
# to display dataframe tables
from IPython.display import display
# used to display dataframe information without truncating
import pandas as pd
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_columns', None)
import pyarrow as pa
Connect to the Wallaroo Instance
The next step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and available through the JupyterHub interface provided with your Wallaroo environment.
This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection in a variable that can be referenced later.
If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). For more information on Wallaroo Client settings, see the Client Connection guide.
# Login through local Wallaroo instance
wl = wallaroo.Client()
Create Workspace
We will create a workspace to manage our pipeline and models. The following variables will set the name of our sample workspace then set it as the current workspace.
Workspace, pipeline, and model names should be unique to each user. This tutorial uses fixed names; when several people share a Wallaroo instance, append a unique suffix (for example, a randomly generated one) to these names so that runs do not affect each other.
workspace_name = 'accelerator-aloha-demonstration'
workspace = wl.get_workspace(name=workspace_name, create_if_not_exist=True)
wl.set_current_workspace(workspace)
{'name': 'accelerator-aloha-demonstration', 'id': 41, 'archived': False, 'created_by': 'eed2002f-769f-4cbd-a189-8ca1e9bf496c', 'created_at': '2024-04-19T17:03:13.108611+00:00', 'models': [], 'pipelines': []}
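The random suffix mentioned above is not shown in the tutorial's code. A minimal sketch of one way to generate it follows, assuming a short suffix of lowercase letters is acceptable; the helper name `with_random_suffix` and the variable `unique_workspace_name` are hypothetical.

```python
import random
import string


def with_random_suffix(base_name: str, length: int = 4) -> str:
    """Append a dash and `length` random lowercase letters to base_name."""
    suffix = "".join(random.choice(string.ascii_lowercase) for _ in range(length))
    return f"{base_name}-{suffix}"


# Hypothetical usage: derive a per-user workspace name from the tutorial's base name.
unique_workspace_name = with_random_suffix("accelerator-aloha-demonstration")
print(unique_workspace_name)
```

The resulting name could then be passed to wl.get_workspace in place of the fixed workspace_name, and the same helper applied to the pipeline and model names.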
Set Model Accelerator
For our example, we will upload the model. The file name is ./models/alohacnnlstm.zip and the model will be called aloha.
Models are uploaded to Wallaroo via the wallaroo.client.upload_model method, which takes the following arguments:
Parameter | Type | Description |
---|---|---|
path | String (Required) | The file path to the model. |
framework | wallaroo.framework.Framework (Required) | The model’s framework. See Wallaroo SDK Essentials Guide: Model Uploads and Registrations for supported model frameworks. |
input_schema | pyarrow.lib.Schema (Optional) | The model’s input schema. Only required for non-native Wallaroo frameworks. See Wallaroo SDK Essentials Guide: Model Uploads and Registrations for more details. |
output_schema | pyarrow.lib.Schema (Optional) | The model’s output schema. Only required for non-native Wallaroo frameworks. See Wallaroo SDK Essentials Guide: Model Uploads and Registrations for more details. |
convert_wait | bool (Optional) | Whether to wait in the SDK session to complete the auto-packaging process for non-native Wallaroo frameworks. |
arch | wallaroo.engine_config.Architecture (Optional) | The targeted architecture for the model. |
accel | wallaroo.engine_config.Acceleration (Optional) | The targeted optimization for the model. See Supported Accelerators for the options. |
We upload the model and set the accel parameter to wallaroo.engine_config.Acceleration.Jetson.
model_name = 'aloha'
model_file_name = './models/alohacnnlstm.zip'
from wallaroo.framework import Framework
from wallaroo.engine_config import Architecture, Acceleration
model = wl.upload_model(model_name,
                        model_file_name,
                        framework=Framework.TENSORFLOW,
                        arch=Architecture.ARM,
                        accel=Acceleration.Jetson,
                        )
Display Model Details
Once the model is uploaded, we view the model details to verify that the accel setting is set to Jetson.
model
Name | aloha |
Version | 4f42d63e-aff1-4eb2-997a-eef23dc5582d |
File Name | alohacnnlstm.zip |
SHA | d71d9ffc61aaac58c2b1ed70a2db13d1416fb9d3f5b891e5e4e2e97180fe22f8 |
Status | ready |
Image Path | None |
Architecture | arm |
Acceleration | jetson |
Updated At | 2024-19-Apr 17:03:13 |
Create the Pipeline
With the model uploaded, we build our pipeline and add the Aloha model as a pipeline step.
pipeline_name = 'aloha-pipeline'
aloha_pipeline = wl.build_pipeline(pipeline_name)
_ = aloha_pipeline.add_model_step(model)
Set Accelerator for Pipeline Publish
Publishing the pipeline uses the wallaroo.pipeline.Pipeline.publish() method. This requires that the Wallaroo Ops instance have Edge Registry Services enabled.
The deployment configuration for the pipeline publish inherits the model’s accelerator and architecture. Options such as the number of CPUs and the amount of memory can be adjusted without impacting the model’s accelerator or architecture settings.
Pipelines do not need to be deployed in the centralized Wallaroo Ops instance before publishing the pipeline. This is useful in deployments to edge devices with different hardware accelerators than the centralized Wallaroo Ops instance.
To change the model architecture or acceleration settings, upload the model as a new model or model version with the new architecture or acceleration settings.
For this example, we will publish the pipeline twice:
- Publish the pipeline with a default deployment configuration.
- Publish the pipeline with the cpu and memory specified.
Note that in both examples, the architecture and the acceleration are inherited from the model’s settings.
For more information, see Wallaroo SDK Essentials Guide: Pipeline Edge Publication.
from wallaroo.deployment_config import DeploymentConfigBuilder
deploy_config = wallaroo.DeploymentConfigBuilder().build()
aloha_pipeline.publish(deployment_config=deploy_config)
Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing.................... Published.
ID | 2 | |
Pipeline Name | aloha-pipeline | |
Pipeline Version | f4336f6f-808f-4e46-b8bd-c0e2a407cb01 | |
Status | Published | |
Engine URL | sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-jetson:v2024.1.0-main-4963 | |
Pipeline URL | sample.registry.example.com/uat/pipelines/aloha-pipeline:f4336f6f-808f-4e46-b8bd-c0e2a407cb01 | |
Helm Chart URL | oci://sample.registry.example.com/uat/charts/aloha-pipeline | |
Helm Chart Reference | sample.registry.example.com/uat/charts@sha256:acf1102592e193c7bfc5fa867856d81f8619f9205a52860aad7d1a671af7e853 | |
Helm Chart Version | 0.0.1-f4336f6f-808f-4e46-b8bd-c0e2a407cb01 | |
Engine Config | {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'jetson', 'arch': 'arm', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}} | |
User Images | [] | |
Created By | john.hummel@wallaroo.ai | |
Created At | 2024-04-19 17:03:14.326070+00:00 | |
Updated At | 2024-04-19 17:03:14.326070+00:00 | |
Replaces | ||
Docker Run Command | Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables. | |
Helm Install Command | Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables. | |
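The notes in the publish output list environment variables that must be set before running the generated commands. As a rough sketch, they could be checked from Python before launching anything. The variable names come from the notes above; the `missing_vars` helper and the example values are hypothetical.

```python
import os

# Variable names taken from the notes in the publish output.
REQUIRED_DOCKER_VARS = ["EDGE_PORT", "OCI_USERNAME", "OCI_PASSWORD"]
REQUIRED_HELM_VARS = ["HELM_INSTALL_NAME", "HELM_INSTALL_NAMESPACE",
                      "OCI_USERNAME", "OCI_PASSWORD"]


def missing_vars(required, env=None):
    """Return the names from `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example with a made-up environment that lacks OCI_PASSWORD.
example_env = {"EDGE_PORT": "8080", "OCI_USERNAME": "user"}
print(missing_vars(REQUIRED_DOCKER_VARS, example_env))  # ['OCI_PASSWORD']
```

Calling missing_vars(REQUIRED_DOCKER_VARS) with no second argument checks the current process environment instead.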
We publish the pipeline again, this time changing the number of cpus and memory for the deployment configuration.
from wallaroo.deployment_config import DeploymentConfigBuilder
deploy_config_custom = (wallaroo.DeploymentConfigBuilder()
                        .replica_count(1)
                        .cpus(1)
                        .memory("1Gi")
                        .build()
                        )
aloha_pipeline.publish(deployment_config=deploy_config_custom)
Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing................ Published.
ID | 3 | |
Pipeline Name | aloha-pipeline | |
Pipeline Version | d261c45e-de85-4997-971c-2bdcaf406014 | |
Status | Published | |
Engine URL | sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-jetson:v2024.1.0-main-4963 | |
Pipeline URL | sample.registry.example.com/uat/pipelines/aloha-pipeline:d261c45e-de85-4997-971c-2bdcaf406014 | |
Helm Chart URL | oci://sample.registry.example.com/uat/charts/aloha-pipeline | |
Helm Chart Reference | sample.registry.example.com/uat/charts@sha256:b9e27a601d69deb4c79bc46d1a1597b2ec1cbd5cbf67cbe5e46f5aa88536d361 | |
Helm Chart Version | 0.0.1-d261c45e-de85-4997-971c-2bdcaf406014 | |
Engine Config | {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'jetson', 'arch': 'arm', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}} | |
User Images | [] | |
Created By | john.hummel@wallaroo.ai | |
Created At | 2024-04-19 17:04:51.595162+00:00 | |
Updated At | 2024-04-19 17:04:51.595162+00:00 | |
Replaces | ||
Docker Run Command | Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables. | |
Helm Install Command | Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables. | |
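Both publish results report the same accelerator and architecture in their Engine Config. A minimal sketch of reading those two fields follows, using the Engine Config value copied from the output above as a plain dictionary; the `accel_and_arch` helper is hypothetical and is not a Wallaroo SDK call.

```python
# Engine Config value copied from the publish output shown in this tutorial.
engine_config = {
    "engine": {
        "resources": {
            "limits": {"cpu": 1.0, "memory": "512Mi"},
            "requests": {"cpu": 1.0, "memory": "512Mi"},
            "accel": "jetson",
            "arch": "arm",
            "gpu": False,
        }
    },
    "engineAux": {"autoscale": {"type": "none"}, "images": {}},
}


def accel_and_arch(config: dict) -> tuple:
    """Pull the accelerator and architecture out of an Engine Config dict."""
    resources = config["engine"]["resources"]
    return resources["accel"], resources["arch"]


print(accel_and_arch(engine_config))  # ('jetson', 'arm')
```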
ML models published to OCI registries via the Wallaroo SDK are provided with the Docker Run Command: a sample docker script for deploying the model on edge and multicloud environments.
For ML models deployed on Jetson accelerated hardware via Docker, the docker run command is replaced by docker run --runtime nvidia --privileged --gpus all. For details on installing the NVIDIA Container Toolkit, see Installing the NVIDIA Container Toolkit. For example:
docker run --runtime nvidia --privileged --gpus all \
    -v $PERSISTENT_VOLUME_DIR:/persist \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=ghcr.io/wallaroolabs/doc-samples/pipelines/sample-edge-deploy:446aeed9-2d52-47ae-9e5c-f2a05ef0d4d6 \
    -e EDGE_BUNDLE=abc123 \
    ghcr.io/wallaroolabs/doc-samples/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-jetson:v2024.4.0-5849
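The substitution described above, swapping the leading docker run for the Jetson variant, can be sketched in Python. This is an illustrative helper rather than anything the Wallaroo SDK provides; the `to_jetson_command` name and the sample command (including its image name and port mapping) are made up for the example.

```python
import shlex

# Flags from the Jetson docker run command shown above.
JETSON_PREFIX = ["docker", "run", "--runtime", "nvidia", "--privileged", "--gpus", "all"]


def to_jetson_command(docker_run_command: str) -> str:
    """Replace the leading `docker run` with the Jetson variant."""
    args = shlex.split(docker_run_command)
    if args[:2] != ["docker", "run"]:
        raise ValueError("expected a command starting with 'docker run'")
    # Note: shlex.join quotes shell metacharacters, so arguments such as
    # $OCI_USERNAME would be single-quoted and no longer expanded by the shell.
    return shlex.join(JETSON_PREFIX + args[2:])


# Hypothetical sample command; not taken from a real publish output.
sample = "docker run -p 8080:8080 -e EDGE_BUNDLE=abc123 example.registry/engine:tag"
print(to_jetson_command(sample))
```

Because of the quoting behavior noted in the comment, this approach suits commands whose values are already filled in; commands that rely on shell variable expansion are easier to edit by hand.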