Inference with Acceleration Libraries: Deploy on AIO Example
How to deploy models on AIO.
Wallaroo supports deploying models with hardware accelerators that increase inference speed and performance.
Deploying models with AI hardware accelerators through Wallaroo uses the following process, depending on whether the model is deployed in Wallaroo or in an edge or multi-cloud environment.
The following prerequisites must be met before uploading and deploying models with hardware accelerators.
The following accelerators are supported:
Accelerator | ARM Support | X64/X86 Support | Intel GPU | Nvidia GPU | Description |
---|---|---|---|---|---|
None | N/A | N/A | N/A | N/A | The default acceleration, used for all scenarios and architectures. |
AIO | √ | X | X | X | AIO acceleration for Ampere Optimized trained models, only available with ARM processors. |
Jetson | √ | X | X | √ | Nvidia Jetson acceleration used with edge deployments with ARM processors. |
CUDA | √ | √ | X | √ | Nvidia CUDA acceleration supported by both ARM and X64/X86 processors. Intended for deployment with Nvidia GPUs. |
OpenVINO | X | √ | √ | X | Intel OpenVINO acceleration: an AI accelerator from Intel compatible with x86/64 architectures. Aimed at edge and multi-cloud deployments with or without Intel GPUs. |
Accelerators are set during model upload and packaging. To change the accelerator used with a model, re-upload the model with the new accelerator setting for maximum compatibility and support.
Models uploaded to Wallaroo have the accelerator set via the wallaroo.client.upload_model method's optional accel: wallaroo.engine_config.Acceleration parameter. For more details on model uploads and other parameters, see Automated Model Packaging.
Parameter | Type | Description |
---|---|---|
name | string (Required) | The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model. |
path | string (Required) | The path to the model file being uploaded. |
framework | string (Required) | The framework of the model from wallaroo.framework. |
input_schema | pyarrow.lib.Schema | The input schema in Apache Arrow schema format. |
output_schema | pyarrow.lib.Schema | The output schema in Apache Arrow schema format. |
arch | wallaroo.engine_config.Architecture (Optional) | The architecture the model is deployed to. If a model is intended for deployment to an ARM architecture, it must be specified during this step. Values include X86 (the default) and ARM. |
accel | wallaroo.engine_config.Acceleration (Optional) | The AI hardware accelerator used. If a model is intended for use with a hardware accelerator, it should be assigned at this step. Supported values are listed in the accelerator table above. |
The following is a generic template for uploading models with the Wallaroo SDK.
model = wl.upload_model(name={Model Name},
                        path={Model File Path},
                        framework=Framework.{Wallaroo Framework},
                        arch=wallaroo.engine_config.Architecture.{X86 | ARM}, # defaults to X86
                        accel=wallaroo.engine_config.Acceleration.{Accelerator}) # defaults to None
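For example, here is a minimal sketch of an AIO upload. The model name, file path, framework, and schemas below are illustrative placeholders, assuming a hypothetical ONNX model with a single float tensor input and output:

```python
import wallaroo
import pyarrow as pa
from wallaroo.framework import Framework

wl = wallaroo.Client()

# hypothetical input/output schemas for a single-tensor model
input_schema = pa.schema([pa.field("input", pa.list_(pa.float32()))])
output_schema = pa.schema([pa.field("output", pa.list_(pa.float32()))])

model = wl.upload_model(
    name="aio-example-model",                # placeholder model name
    path="./models/aio-example-model.onnx",  # placeholder file path
    framework=Framework.ONNX,
    input_schema=input_schema,
    output_schema=output_schema,
    arch=wallaroo.engine_config.Architecture.ARM,   # AIO is only available on ARM
    accel=wallaroo.engine_config.Acceleration.AIO,
)
```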
Deploying models in the Wallaroo Ops center takes the following steps:
When deploying a model, the deployment configuration inherits the model's accelerator and architecture settings. Other settings, such as the number of CPUs, amount of RAM, etc., are modified through the deployment configuration without changing the accelerator setting; for full details, see Pipeline Deployment Configuration. If no deployment configuration is provided, the default resource allocations are used.
To change the accelerator settings, models should be re-uploaded as either a new model or a new model version for maximum compatibility with the hardware infrastructure. For more information on uploading models or new model versions, see Model Accelerator at Model Upload.
The following is a generic template for deploying models in Wallaroo with a deployment configuration.
# create the pipeline
pipeline = wl.build_pipeline(pipeline_name={Pipeline Name})
# add the model as a pipeline step
pipeline.add_model_step(model)
# create a deployment configuration
deployment_config = wallaroo.DeploymentConfigBuilder() \
.cpus({Number of CPUs}) \
.memory("{Amount of RAM}") \
.build()
# deploy the model with the deployment configuration
pipeline.deploy(deployment_config = deployment_config)
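Filled in, the template might look like the following sketch; the pipeline name and resource values are illustrative only:

```python
# build the pipeline and add the AIO-accelerated model as a pipeline step
pipeline = wl.build_pipeline(pipeline_name="aio-example-pipeline")
pipeline.add_model_step(model)

# allocate CPU and memory; the model's accelerator setting is inherited
deployment_config = wallaroo.DeploymentConfigBuilder() \
    .cpus(4) \
    .memory("8Gi") \
    .build()

pipeline.deploy(deployment_config=deployment_config)
```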
Deploying a model in an edge or multi-cloud environment takes two steps after uploading the model:

1. Publish the pipeline to an OCI (Open Container Initiative) Registry via the method wallaroo.pipeline.publish(deployment_config). This requires that the Wallaroo Ops instance has Edge Registry Services enabled.
2. Deploy the published pipeline via docker or helm.
The deployment configuration for the pipeline publish inherits the model's accelerator. Options such as the number of CPUs, amount of memory, etc., can be adjusted without impacting the model's accelerator settings.
A deployment configuration must be included with the pipeline publish, even if no changes to the CPUs, memory, etc., are made. For more detail on deployment configurations, see Wallaroo SDK Essentials Guide: Pipeline Deployment Configuration.
Pipelines do not need to be deployed in the Wallaroo Ops Center before publishing the pipeline. This is useful in multi-cloud deployments to edge devices with different hardware accelerators than the Wallaroo Ops instance.
To change the model acceleration settings, upload the model as a new model or model version with the new acceleration settings.
For more information, see Wallaroo SDK Essentials Guide: Pipeline Edge Publication.
The Wallaroo SDK publish includes the docker run and helm install fields. These provide "copy and paste" scripts for deploying the model in edge and multi-cloud environments.
The following template shows publishing the model to the OCI (Open Container Initiative) Registry associated with the Wallaroo Ops Center, and the abbreviated output.
# a deployment configuration is required, even when using the default resource allocations
deployment_config = wallaroo.DeploymentConfigBuilder().build()
publish = pipeline.publish(deployment_config=deployment_config)
display(publish)
Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing....... Published.
Field | Value |
---|---|
ID | 87 |
...(additional rows) | |
Docker Run Command | Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables. |
Helm Install Command | Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables. |
Models published via the Wallaroo SDK return docker run and helm install templates for deploying the model in edge and multi-cloud environments. The docker run command requires the following environment variables set before executing:

- $PERSISTENT_VOLUME_DIR (Optional: used with Edge and Multi-Cloud Observability): The directory path for the model deployment's persistent volume. This stores session and other data for connecting back to the Wallaroo instance for inference logs and other uses.
- $EDGE_PORT: The network port used to submit inference requests to the deployed model.
- $OCI_USERNAME: The user name or identifier to authenticate to the OCI (Open Container Initiative) Registry where the model was published.
- $OCI_PASSWORD: The password or token to authenticate to the OCI Registry where the model was published.

The following docker run template is returned by Wallaroo during pipeline publish for deploying the model in an edge or multi-cloud environment.
docker run -v $PERSISTENT_VOLUME_DIR:/persist \
-p $EDGE_PORT:8080 \
-e OCI_USERNAME=$OCI_USERNAME \
-e OCI_PASSWORD=$OCI_PASSWORD \
-e PIPELINE_URL={PIPELINE_URL}:{PIPELINE_VERSION} \
{Wallaroo_Engine_URL}:{WALLAROO_ENGINE_VERSION}
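Once the container is running, inference requests can be submitted to the mapped port. The following is a minimal sketch using Python's requests library; the host, pipeline name, payload, and the /pipelines/{pipeline name} endpoint path are assumptions for illustration and may vary by Wallaroo version:

```python
import requests

# assumed edge host and pipeline name; the port matches the EDGE_PORT mapping above
url = "http://localhost:8080/pipelines/aio-example-pipeline"

# illustrative pandas-records payload matching the hypothetical input schema
data = [{"input": [1.0, 2.0, 3.0, 4.0]}]

response = requests.post(
    url,
    json=data,
    headers={"Content-Type": "application/json; format=pandas-records"},
)
print(response.json())
```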
The following helm install
template is returned by Wallaroo during pipeline publish for deploying the model in an edge and multi-cloud environment.
helm install --atomic $HELM_INSTALL_NAME \
{HELM CHART REGISTRY URL} \
--namespace $HELM_INSTALL_NAMESPACE \
--version {HELM VERSION} \
--set ociRegistry.username=$OCI_USERNAME \
--set ociRegistry.password=$OCI_PASSWORD
Deploying ML models with Intel OpenVINO hardware and Intel GPUs in edge and multi-cloud environments via docker run requires additional parameters.
For ML models deployed on OpenVINO hardware with Intel GPUs, the docker run command must include the following options:
--rm -it --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* ) --ulimit nofile=262144:262144 --cap-add=sys_nice
For example, the following docker run template demonstrates deploying a Wallaroo published model on OpenVINO hardware with Intel GPUs:
docker run -v $PERSISTENT_VOLUME_DIR:/persist \
--rm -it --device /dev/dri \
--group-add=$(stat -c "%g" /dev/dri/render* ) \
--ulimit nofile=262144:262144 --cap-add=sys_nice \
-p $EDGE_PORT:8080 \
-e OCI_USERNAME=$OCI_USERNAME \
-e OCI_PASSWORD=$OCI_PASSWORD \
-e PIPELINE_URL={PIPELINE_URL}:{PIPELINE_VERSION} \
{Wallaroo_Engine_URL}:{WALLAROO_ENGINE_VERSION}
ML models published to OCI registries via the Wallaroo SDK are provided with the Docker Run Command: a sample docker script for deploying the model on edge and multi-cloud environments. For more details, see Edge and Multicloud Model Publish and Deploy.
For ML models deployed on Jetson accelerated hardware via Docker, the docker application is replaced by the nvidia-docker application. For details on installing nvidia-docker, see Installing the NVIDIA Container Toolkit. For example:
nvidia-docker run -v $PERSISTENT_VOLUME_DIR:/persist \
-e OCI_USERNAME=$OCI_USERNAME \
-e OCI_PASSWORD=$OCI_PASSWORD \
-e PIPELINE_URL=ghcr.io/wallaroolabs/doc-samples/pipelines/sample-edge-deploy:446aeed9-2d52-47ae-9e5c-f2a05ef0d4d6 \
-e EDGE_BUNDLE=abc123 \
ghcr.io/wallaroolabs/doc-samples/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini:2024.2.1
If the specified hardware accelerator or infrastructure is not available in the Wallaroo Ops cluster during deployment, an error message is displayed indicating the unavailable resources.
The following examples are available to demonstrate uploading and publishing models with hardware accelerator support.
How to deploy models on AIO.
How to deploy models on CUDA.
How to deploy models on Jetson.
How to deploy models on OpenVINO.
How to deploy models on OpenVINO with Intel GPUs.