Inference with Acceleration Libraries: Deploy on Jetson Example

How to deploy models on Jetson.

The following example walks through uploading, deploying, publishing, and edge deploying a model with Wallaroo on the ARM architecture with NVIDIA Jetson acceleration. The example uses a computer vision ResNet50 model in the ONNX framework.

The first step is to upload the model, setting the architecture and AI accelerator.

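import wallaroo
from wallaroo.framework import Framework

# wl is assumed to be a connected client, for example wl = wallaroo.Client()
model = wl.upload_model(
    name="computer-vision-resnet50-arm",
    path="./models/frcnn-resnet.pt.onnx",
    framework=Framework.ONNX,
    arch=wallaroo.engine_config.Architecture.ARM,
    accel=wallaroo.engine_config.Acceleration.Jetson,
)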

display(model)

Name	computer-vision-resnet50-arm
Version	47743b5f-c88a-4150-a37f-9ad591eb4ee3
File Name	frcnn-resnet.pt.onnx
SHA	43326e50af639105c81372346fb9ddf453fea0fe46648b2053c375360d9c1647
Status	ready
Image Path	None
Architecture	arm
Acceleration	jetson
Updated At	2024-03-Apr 22:13:40

With the model uploaded, we deploy it by:

Adding the model to a pipeline as a pipeline step.
Setting the deployment configuration - the resources allocated to the model from the cluster. For this example, we allocate 1 CPU and 2 Gi RAM. Note that we do not specify what type of accelerator or processor architecture is used - this is set at the model level.
Deploying the model. At this point, the model is ready to accept inference requests until it is undeployed.

pipeline_arm = wl.build_pipeline('acceleration-demonstration-arm')
pipeline_arm.clear()
pipeline_arm.add_model_step(model)

deployment_config = wallaroo.DeploymentConfigBuilder() \
    .replica_count(1) \
    .cpus(1) \
    .memory("2Gi") \
    .build()

pipeline_arm.deploy(deployment_config=deployment_config)
display(pipeline_arm)

name	acceleration-demonstration-arm
created	2024-04-01 18:36:26.347071+00:00
last_updated	2024-04-03 22:14:42.912284+00:00
deployed	True
arch	arm
accel	aio
tags
versions	18329c99-4b9c-4a15-bc93-42e4d6b93fff, 2f1aa87e-edc2-4af7-8821-00ba54abf18e, 4c8ab1b1-f9c8-49d9-846a-54cad3a18b56, cbc520f2-5755-4f6b-8e89-b4374cb95fdf, 59ff6719-67f1-4359-a6b3-5565b9f6dc09, 39b91147-3b73-4f1a-a25f-500ef648bd6a, 45c0e8ba-b35d-4139-9675-aa7ffcc04dfc, 2d561d88-31f6-43c3-a84d-38cc1cd53fb8, ef9e2394-e29f-46dc-aaa4-eda0a304a71e, fe2b6f05-3623-4440-8258-9e5828bc7eaf, aa86387c-813a-40de-b07a-baf388e20d67
steps	computer-vision-resnet50-arm
published	True

Publishing the model stores a copy of the model and the inference engine in an OCI (Open Container Initiative) Registry that is set by the Wallaroo platform operations administrator. Once published, it is ready for deployment in any edge or multi-cloud environment with the same AI Accelerator and Architecture settings.

A template of the docker run command is included with the publish return.

We now publish the pipeline. Note that the Engine Config inherited the acceleration from the model.

# publish with the deployment configuration defined above
pub_arm = pipeline_arm.publish(deployment_config=deployment_config)
display(pub_arm)

Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing....... Published.

ID 87

Pipeline Name acceleration-demonstration-arm

Pipeline Version 890b56ee-2a0e-4ed1-ae96-c021ca801a7e

Status Published

Engine URL registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-aarch64:v2024.2.0-main-4870

Pipeline URL registry.example.com/uat/pipelines/acceleration-demonstration-arm:890b56ee-2a0e-4ed1-ae96-c021ca801a7e

Helm Chart URL oci://registry.example.com/uat/charts/acceleration-demonstration-arm

Helm Chart Reference registry.example.com/uat/charts@sha256:15c50483f2010e2691d32d32ded595f20993fa7b043474962b0fa2b509b61510

Helm Chart Version 0.0.1-890b56ee-2a0e-4ed1-ae96-c021ca801a7e

Engine Config {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'jetson', 'arch': 'arm', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}}

User Images []

Created By john.hummel@wallaroo.ai

Created At 2024-04-03 22:17:03.122597+00:00

Updated At 2024-04-03 22:17:03.122597+00:00

Replaces

Docker Run Command

docker run \
    -p $EDGE_PORT:8080 \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=registry.example.com/uat/pipelines/acceleration-demonstration-arm:890b56ee-2a0e-4ed1-ae96-c021ca801a7e \
    -e CONFIG_CPUS=1 \
    registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-aarch64:v2024.2.0-main-4870

Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables.

Helm Install Command

helm install --atomic $HELM_INSTALL_NAME \
    oci://registry.example.com/uat/charts/acceleration-demonstration-arm \
    --namespace $HELM_INSTALL_NAMESPACE \
    --version 0.0.1-890b56ee-2a0e-4ed1-ae96-c021ca801a7e \
    --set ociRegistry.username=$OCI_USERNAME \
    --set ociRegistry.password=$OCI_PASSWORD

Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables.
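
The note that the Engine Config inherits its settings from the model can be checked directly against the publish output. The following is a minimal sketch: the dictionary is copied from the Engine Config line above, and the helper name inherited_settings is hypothetical. How the dictionary is read from the publish object is not shown here.

```python
# Engine Config copied verbatim from the publish output in this example.
engine_config = {
    'engine': {
        'resources': {
            'limits': {'cpu': 1.0, 'memory': '512Mi'},
            'requests': {'cpu': 1.0, 'memory': '512Mi'},
            'accel': 'jetson',
            'arch': 'arm',
            'gpu': False,
        }
    },
    'engineAux': {'autoscale': {'type': 'none'}, 'images': {}},
}


def inherited_settings(config):
    """Return (arch, accel) as recorded under engine.resources."""
    resources = config["engine"]["resources"]
    return resources["arch"], resources["accel"]


arch, accel = inherited_settings(engine_config)
print(arch, accel)  # prints: arm jetson
```

If either value differs from what was set at model upload, the published pipeline will not match the target edge hardware.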

Once published, the model is deployed on edge or multi-cloud environments through the docker run template. Before deploying, set the following environment variables:

$EDGE_PORT: The network port used to submit inference requests to the deployed model.
$OCI_USERNAME: The user name or identifier to authenticate to the OCI (Open Container Initiative) Registry where the model was published.
$OCI_PASSWORD: The password or token to authenticate to the OCI (Open Container Initiative) Registry where the model was published.
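
Before running the template, it can help to confirm that these variables are actually set. The following is a minimal sketch using only the Python standard library; the helper names missing_edge_vars and valid_port are hypothetical and are not part of Wallaroo.

```python
import os

# Variables named in the docker run template.
REQUIRED_VARS = ("EDGE_PORT", "OCI_USERNAME", "OCI_PASSWORD")


def missing_edge_vars(env):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def valid_port(value):
    """True when value is a decimal TCP port number between 1 and 65535."""
    return value.isascii() and value.isdigit() and 1 <= int(value) <= 65535


missing = missing_edge_vars(os.environ)
if missing:
    print("Set these variables before deploying:", ", ".join(missing))
elif not valid_port(os.environ["EDGE_PORT"]):
    print("EDGE_PORT must be a port number between 1 and 65535")
```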

For ML models deployed on Jetson accelerated hardware via Docker, replace the docker command with nvidia-docker. For details on installing nvidia-docker, see Installing the NVIDIA Container Toolkit. For example:

nvidia-docker run \
    -p $EDGE_PORT:8080 \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=registry.example.com/uat/pipelines/acceleration-demonstration-arm:890b56ee-2a0e-4ed1-ae96-c021ca801a7e \
    registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-aarch64:v2024.2.0-main-4870
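
When the deployment is scripted, the same command can be assembled as an argument list rather than typed by hand. The following is a minimal sketch: the function name edge_run_command and the placeholder credentials are hypothetical, while the pipeline and engine URLs are copied from the publish output in this example.

```python
import shlex


def edge_run_command(pipeline_url, engine_url, env, runtime="nvidia-docker"):
    """Assemble the run command from the publish template as an argument list."""
    return [
        runtime, "run",
        "-p", f"{env['EDGE_PORT']}:8080",
        "-e", f"OCI_USERNAME={env['OCI_USERNAME']}",
        "-e", f"OCI_PASSWORD={env['OCI_PASSWORD']}",
        "-e", f"PIPELINE_URL={pipeline_url}",
        engine_url,
    ]


# Values copied from the publish output in this example.
PIPELINE_URL = "registry.example.com/uat/pipelines/acceleration-demonstration-arm:890b56ee-2a0e-4ed1-ae96-c021ca801a7e"
ENGINE_URL = "registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-aarch64:v2024.2.0-main-4870"

# Placeholder values so the sketch runs anywhere; a real script would read os.environ.
example_env = {"EDGE_PORT": "8080", "OCI_USERNAME": "user", "OCI_PASSWORD": "secret"}

cmd = edge_run_command(PIPELINE_URL, ENGINE_URL, example_env)
print(shlex.join(cmd))
# To launch the container, pass the list to subprocess.run(cmd, check=True).
```

Passing runtime="docker" produces the non-accelerated form of the same command.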

Once deployed, the model is ready to accept inference requests through the specified $EDGE_PORT.
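
An inference request to the edge deployment is an HTTP POST to that port. The following is a minimal sketch using only the Python standard library. The endpoint path /infer, the pandas-records content type, and the input field name tensor are assumptions that this example does not confirm; check them against the edge deployment documentation for the installed Wallaroo version. The helper name build_inference_request is hypothetical.

```python
import json
import urllib.request


def build_inference_request(port, path, records, host="localhost"):
    """Build (without sending) a JSON POST request for the edge endpoint."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}{path}",
        data=body,
        # Assumed content type for pandas-records style JSON input.
        headers={"Content-Type": "application/json; format=pandas-records"},
        method="POST",
    )


# "/infer" and the "tensor" field are assumptions, not taken from this example.
req = build_inference_request(8080, "/infer", [{"tensor": [0.0, 0.0, 0.0]}])
print(req.get_method(), req.full_url)

# Sending is left commented out so the sketch has no side effects:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(resp.read().decode("utf-8"))
```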