Intel OpenVINO
The following details deploying models on Intel OpenVINO AI accelerators.
Upload Model
Accelerators are set during model upload and packaging. To change the accelerator used with a model, re-upload the model with the new accelerator setting for maximum compatibility and support.
Models uploaded to Wallaroo have the accelerator set via the wallaroo.client.upload_model method’s Optional accel: wallaroo.engine_config.Acceleration parameter. For more details on model uploads and other parameters, see Automated Model Packaging.
The following shows the process of uploading, deploying, publishing, and edge deploying a model with Wallaroo for X86 deployment with Intel OpenVINO acceleration.
The first step is to upload the model. In this step the AI Accelerator is set.
model = wl.upload_model(name='sample_model',
                        path='model_file_path',
                        framework=framework,
                        input_schema=input_schema,
                        output_schema=output_schema,
                        accel=Acceleration.OpenVINO)
Deploy and Infer in the Cloud
Deploying models in the cloud in the Wallaroo Ops center takes the following steps:
- Create a pipeline and add the model as a pipeline step.
- Deploy the model. This allocates resources for the model's use.
When deploying a model, the deployment configuration inherits the model’s accelerator and architecture settings. Other settings, such as the number of CPUs, amount of RAM, etc., can be changed with the deployment configuration without modifying the accelerator setting; for full details, see Pipeline Deployment Configuration. If no deployment configuration is provided, the default resource allocations are used.
To change the accelerator settings, models should be re-uploaded as either a new model or a new model version for maximum compatibility with the hardware infrastructure. For more information on uploading models or new model versions, see Model Accelerator at Model Upload.
The following is a generic template for deploying models in Wallaroo with a deployment configuration.
# create the pipeline
pipeline = wl.build_pipeline(pipeline_name={Pipeline Name})
# add the model as a pipeline step
pipeline.add_model_step(model)
# create a deployment configuration
deployment_config = wallaroo.DeploymentConfigBuilder() \
.cpus({Number of CPUs}) \
.memory("{Amount of RAM}") \
.build()
# deploy the model with the deployment configuration
pipeline.deploy(deployment_config = deployment_config)
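Once the pipeline is deployed, it accepts inference requests, for example as a pandas DataFrame whose columns match the model's input schema. A minimal sketch, assuming a hypothetical model expecting a column named `tensor` holding a 10-element float list (the column name and shape are assumptions for illustration):

```python
import pandas as pd

# Hypothetical single-row input matching the model's input schema.
df = pd.DataFrame({'tensor': [[0.1] * 10]})

# With the deployed pipeline object from the steps above in scope:
# results = pipeline.infer(df)
# pipeline.undeploy()  # release the allocated resources when finished
```

The inference call itself is commented out here because it requires a live deployed pipeline; wallaroo.pipeline.infer returns the results in the same format as the submitted input.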
Deploy in Edge and Multi-cloud Environments
Deploying a model in an Edge and Multi-cloud environment takes two steps after uploading the model:
- Publish the Model: The model with its AI Accelerator settings is published to an OCI (Open Container Initiative) Registry.
- Deploy the Model on Edge: The model is deployed in an environment with hardware matching the AI Accelerator and architecture via docker or helm.
Publish the Model
Publishing the pipeline uses the method wallaroo.pipeline.publish(deployment_config).
This requires that the Wallaroo Ops has Edge Registry Services enabled.
The deployment configuration for the pipeline publish inherits the model’s accelerator. Options such as the number of CPUs, amount of memory, etc., can be adjusted without impacting the model’s accelerator settings.
A deployment configuration must be included with the pipeline publish, even if no changes to the CPUs, memory, etc. are made. For more detail on deployment configurations, see Wallaroo SDK Essentials Guide: Pipeline Deployment Configuration.
Pipelines do not need to be deployed in the Wallaroo Ops Center before publishing the pipeline. This is useful in multi-cloud deployments to edge devices with different hardware accelerators than the Wallaroo Ops instance.
To change the model acceleration settings, upload the model as a new model or model version with the new acceleration settings.
For more information, see Wallaroo SDK Essentials Guide: Pipeline Edge Publication.
The Wallaroo SDK publish output includes the Docker Run Command, Podman Run Command, and Helm Install Command fields. These provide “copy and paste” scripts for deploying the model in edge and multi-cloud environments.
The following template shows publishing the model to the OCI (Open Container Initiative) Registry associated with the Wallaroo Ops Center, and the abbreviated output.
pipeline.publish(deployment_config=deployment_config)
Waiting for pipeline publish… It may take up to 600 sec.
Pipeline is publishing……. Published.
| ID | 30 |
| Pipeline Name | sample-pipeline |
| Pipeline Version | cbd1bb61-0711-4fef-b792-dfb427899935 |
| Status | Published |
| Engine URL | sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-openvino:v2025.2.2-6555 |
| Pipeline URL | sample.registry.example.com/uat/pipelines/sample-pipeline:cbd1bb61-0711-4fef-b792-dfb427899935 |
| Helm Chart URL | oci://sample.registry.example.com/uat/charts/sample-pipeline |
| Helm Chart Reference | sample.registry.example.com/uat/charts@sha256:95981154bedd6af5b13557f98989f2f5da4520a0eea66a3701da9195a6056728 |
| Helm Chart Version | 0.0.1-cbd1bb61-0711-4fef-b792-dfb427899935 |
| Engine Config | {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'openvino', 'arch': 'x86', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}} |
| User Images | [] |
| Created By | example.user@wallaroo.ai |
| Created At | 2024-06-25 18:35:52.841129+00:00 |
| Updated At | 2024-06-25 18:35:52.841129+00:00 |
| Replaces | |
| Docker Run Command | Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables. |
| Podman Run Command | Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables. |
| Helm Install Command | Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables. |
ML models deployed on OpenVINO-accelerated hardware via Docker must include the following options:
--rm -it --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* ) --ulimit nofile=262144:262144 --cap-add=sys_nice
For the example above, this becomes:
docker run \
--rm -it --device /dev/dri \
--group-add=$(stat -c "%g" /dev/dri/render* ) \
--ulimit nofile=262144:262144 --cap-add=sys_nice \
-p $EDGE_PORT:8080 \
-e OCI_USERNAME=$OCI_USERNAME \
-e OCI_PASSWORD=$OCI_PASSWORD \
-e PIPELINE_URL=sample.registry.example.com/uat/pipelines/sample-pipeline:cbd1bb61-0711-4fef-b792-dfb427899935 \
-e CONFIG_CPUS=1 sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-openvino:v2024.2.0-main-530
Tutorials
The following examples are available to demonstrate uploading and publishing models with OpenVINO support.