Intel OpenVINO
The following details deploying models on Intel OpenVINO AI accelerators.
Upload Model
Accelerators are set during model upload and packaging. To change the accelerator used with a model, re-upload the model with the new accelerator setting for maximum compatibility and support.
Models uploaded to Wallaroo have the accelerator set via the wallaroo.client.upload_model method’s Optional accel: wallaroo.engine_config.Acceleration parameter. For more details on model uploads and other parameters, see Automated Model Packaging.
The following shows the process of uploading, deploying, publishing, and edge deploying a model with Wallaroo for X86 deployment with Intel OpenVINO acceleration.
The first step is to upload the model. In this step the AI Accelerator is set.
model = wl.upload_model(name='sample_model',
                        path='model_file_path',
                        framework=framework,
                        input_schema=input_schema,
                        output_schema=output_schema,
                        accel=Acceleration.OpenVINO)
Deploy and Infer in the Cloud
Deploying models in the cloud in the Wallaroo Ops center takes the following steps:
- Create a pipeline and add the model as a pipeline step.
- Deploy the model. This allocates resources for the model's use.
When deploying a model, the deployment configuration inherits the model’s accelerator and architecture settings. Other settings, such as the number of CPUs, amount of RAM, etc., can be changed with the deployment configuration without modifying the accelerator setting; for full details, see Pipeline Deployment Configuration. If no deployment configuration is provided, the default resource allocations are used.
To change the accelerator settings, models should be re-uploaded as either a new model or a new model version for maximum compatibility with the hardware infrastructure. For more information on uploading models or new model versions, see Model Accelerator at Model Upload.
The following is a generic template for deploying models in Wallaroo with a deployment configuration.
# create the pipeline
pipeline = wl.build_pipeline(pipeline_name={Pipeline Name})
# add the model as a pipeline step
pipeline.add_model_step(model)
# create a deployment configuration
deployment_config = wallaroo.DeploymentConfigBuilder() \
.cpus({Number of CPUs}) \
.memory("{Amount of RAM}") \
.build()
# deploy the model with the deployment configuration
pipeline.deploy(deployment_config = deployment_config)
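Once the pipeline is deployed, it accepts inference requests, for example as a pandas DataFrame whose columns match the model's input schema. A minimal sketch, assuming a hypothetical model expecting a column named `tensor` holding a 10-element float list (the column name and shape are assumptions for illustration):

```python
import pandas as pd

# Hypothetical single-row input matching the model's input schema.
df = pd.DataFrame({'tensor': [[0.1] * 10]})

# With the deployed pipeline object from the steps above in scope:
# results = pipeline.infer(df)
# pipeline.undeploy()  # release the allocated resources when finished
```

The inference call itself is commented out here because it requires a live deployed pipeline; wallaroo.pipeline.infer returns the results in the same format as the submitted input.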
Deploy in Edge and Multi-cloud Environments
Deploying a model in an Edge and Multi-cloud environment takes two steps after uploading the model:
- Publish the Model: The model with its AI Accelerator settings is published to an OCI (Open Container Initiative) Registry.
- Deploy the Model on Edge: The model is deployed in an environment with hardware matching the AI Accelerator and architecture via docker or helm.
Publish the Model
Publishing the pipeline uses the method wallaroo.pipeline.publish(deployment_config).
This requires that the Wallaroo Ops has Edge Registry Services enabled.
The deployment configuration for the pipeline publish inherits the model’s accelerator. Options such as the number of CPUs, amount of memory, etc., can be adjusted without impacting the model’s accelerator settings.
A deployment configuration must be included with the pipeline publish, even if no changes to the CPUs, memory, etc. are made. For more detail on deployment configurations, see Wallaroo SDK Essentials Guide: Pipeline Deployment Configuration.
Pipelines do not need to be deployed in the Wallaroo Ops Center before publishing the pipeline. This is useful in multi-cloud deployments to edge devices with different hardware accelerators than the Wallaroo Ops instance.
To change the model acceleration settings, upload the model as a new model or model version with the new acceleration settings.
For more information, see Wallaroo SDK Essentials Guide: Pipeline Edge Publication.
The Wallaroo SDK publish output includes the Docker Run Command, Podman Run Command, and Helm Install Command fields. These provide “copy and paste” scripts for deploying the model in edge and multi-cloud environments.
The following template shows publishing the model to the OCI (Open Container Initiative) Registry associated with the Wallaroo Ops Center, and the abbreviated output.
pipeline.publish(deployment_config=deployment_config)
Waiting for pipeline publish… It may take up to 600 sec.
Pipeline is publishing……. Published.
| ID | 30 |
| Pipeline Name | sample-pipeline |
| Pipeline Version | cbd1bb61-0711-4fef-b792-dfb427899935 |
| Status | Published |
| Engine URL | sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-openvino:v2025.2.2-6555 |
| Pipeline URL | sample.registry.example.com/uat/pipelines/sample-pipeline:cbd1bb61-0711-4fef-b792-dfb427899935 |
| Helm Chart URL | oci://sample.registry.example.com/uat/charts/sample-pipeline |
| Helm Chart Reference | sample.registry.example.com/uat/charts@sha256:95981154bedd6af5b13557f98989f2f5da4520a0eea66a3701da9195a6056728 |
| Helm Chart Version | 0.0.1-cbd1bb61-0711-4fef-b792-dfb427899935 |
| Engine Config | {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'openvino', 'arch': 'x86', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}} |
| User Images | [] |
| Created By | example.user@wallaroo.ai |
| Created At | 2024-06-25 18:35:52.841129+00:00 |
| Updated At | 2024-06-25 18:35:52.841129+00:00 |
| Replaces | |
| Docker Run Command | Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables. |
| Podman Run Command | Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables. |
| Helm Install Command | Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables. |
ML models deployed on OpenVINO-accelerated hardware via Docker must include the following options:
--rm -it --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* ) --ulimit nofile=262144:262144 --cap-add=sys_nice
For the example above, this becomes:
docker run \
--rm -it --device /dev/dri \
--group-add=$(stat -c "%g" /dev/dri/render* ) \
--ulimit nofile=262144:262144 --cap-add=sys_nice \
-p $EDGE_PORT:8080 \
-e OCI_USERNAME=$OCI_USERNAME \
-e OCI_PASSWORD=$OCI_PASSWORD \
-e PIPELINE_URL=sample.registry.example.com/uat/pipelines/sample-pipeline:cbd1bb61-0711-4fef-b792-dfb427899935 \
-e CONFIG_CPUS=1 sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-openvino:v2024.2.0-main-530
Tutorials
The following examples are available to demonstrate uploading and publishing models with OpenVINO support.