Inference with Acceleration Libraries: Deploy on Intel OpenVINO

How to deploy models on OpenVINO.

The following shows the process of uploading, deploying, publishing, and edge deploying a model with Wallaroo for x86 deployment with Intel OpenVINO acceleration. For this example, the model is a Keras ML model.

The first step is to upload the model. The AI Accelerator is set during this step through the accel parameter.

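# imports used by the code in this example
import pyarrow as pa
import wallaroo
from wallaroo.framework import Framework
from wallaroo.engine_config import Acceleration

# connect to the Wallaroo instance (assumes the default client login flow)
wl = wallaroo.Client()

# define the input and output schemas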

input_schema = pa.schema([
    pa.field('input', pa.list_(pa.float64(), list_size=10))
])
output_schema = pa.schema([
    pa.field('output', pa.list_(pa.float64(), list_size=32))
])

# upload the model

model = wl.upload_model(name='ynskerasmodelov', 
                        path='./models/single_io_keras_sequential_model.h5', 
                        framework=Framework.KERAS, 
                        input_schema=input_schema, 
                        output_schema=output_schema, 
                        accel=Acceleration.OpenVINO)
display(model)
  
Name            ynskerasmodelov
Version         7d893f8a-1800-4ae3-8a09-ae8d5aa6d0e6
File Name       single_io_keras_sequential_model.h5
SHA             f7e49538e38bebe066ce8df97bac8be239ae8c7d2733e500c8cd633706ae95a8
Status          ready
Image Path      proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs/mac-deploy-openvino:v2024.2.0-main-5305
Architecture    x86
Acceleration    openvino
Updated At      2024-25-Jun 18:21:42
Workspace id    20
Workspace name  example.user@wallaroo.ai - Default Workspace
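
The input schema above declares a single field, input, holding a fixed-size list of 10 float values. As a rough illustration of what a conforming record looks like, the following standard-library sketch builds one and rejects lists of the wrong length. The helper name make_record is hypothetical and is not part of the Wallaroo SDK.

```python
import json

INPUT_FIELD = "input"   # field name from the input schema above
INPUT_LIST_SIZE = 10    # list_size from the input schema above


def make_record(values):
    """Return one inference record, or raise if it does not fit the input schema.

    Hypothetical helper for illustration only; not a Wallaroo SDK function.
    """
    values = list(values)
    if len(values) != INPUT_LIST_SIZE:
        raise ValueError(
            f"'{INPUT_FIELD}' needs exactly {INPUT_LIST_SIZE} values, got {len(values)}"
        )
    # float() rejects non-numeric entries and converts ints to floats
    return {INPUT_FIELD: [float(v) for v in values]}


record = make_record(range(10))
print(json.dumps([record]))
```

Checking the list length before submitting data catches schema mismatches on the client side rather than at inference time.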

With the model uploaded, it is deployed through the following steps:

  • Adding the model to a pipeline as a pipeline step.
  • Setting the deployment configuration - the resources allocated to the model from the cluster. For this example, we allocate 1 CPU and 0.5 Gi of RAM for the model. Note that we do not specify the accelerator type or processor architecture here - these are set at the model level.
  • Deploying the model. Once deployed, the model accepts inference requests until it is undeployed.

# create the pipeline
pipeline = wl.build_pipeline("kerasaopenvino1")

# add the model as a pipeline step
pipeline.add_model_step(model)

# set the deployment configuration
deployment_config = wallaroo.deployment_config.DeploymentConfigBuilder() \
    .cpus(0.25).memory('1Gi') \
    .sidekick_cpus(model, 1) \
    .sidekick_memory(model, '0.5Gi') \
    .build()


# deploy the model with the deployment configuration
pipeline.deploy(deployment_config=deployment_config)
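
The deployment configuration above sets resources in two places: the engine (0.25 CPU, 1Gi) and the model's own container (1 CPU, 0.5Gi). The following sketch tallies them, assuming the two requests are simply additive from the cluster's point of view. The helper gi is hypothetical and only handles the Gi and Mi suffixes.

```python
def gi(value):
    """Convert a memory string such as '1Gi' or '512Mi' to GiB as a float."""
    units = {"Gi": 1.0, "Mi": 1.0 / 1024}
    for suffix, factor in units.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    raise ValueError(f"unsupported memory unit in {value!r}")


# Values copied from the deployment configuration above.
engine = {"cpus": 0.25, "memory": "1Gi"}
model_container = {"cpus": 1, "memory": "0.5Gi"}

# Assumption: engine and model container requests add up on the cluster.
total_cpus = engine["cpus"] + model_container["cpus"]
total_memory_gi = gi(engine["memory"]) + gi(model_container["memory"])

print(f"total CPUs: {total_cpus}, total memory: {total_memory_gi} Gi")
# prints: total CPUs: 1.25, total memory: 1.5 Gi
```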

Publishing the model stores a copy of the model and the inference engine in an OCI (Open Container Initiative) Registry that is set by the Wallaroo platform operations administrator. Once published, it is ready for deployment in any edge or multi-cloud environment with the same AI Accelerator and Architecture settings.

A template of the docker run command is included with the publish return.

We now publish the pipeline. Note that the Engine Config in the output inherits the acceleration and architecture from the model.

pipeline.publish(deployment_config=deployment_config)

Waiting for pipeline publish… It may take up to 600 sec.
Pipeline is publishing……. Published.

ID                    30
Pipeline Name         kerasaopenvino1
Pipeline Version      cbd1bb61-0711-4fef-b792-dfb427899935
Status                Published
Engine URL            sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-openvino:v2024.2.0-main-5305
Pipeline URL          sample.registry.example.com/uat/pipelines/kerasaopenvino1:cbd1bb61-0711-4fef-b792-dfb427899935
Helm Chart URL        oci://sample.registry.example.com/uat/charts/kerasaopenvino1
Helm Chart Reference  sample.registry.example.com/uat/charts@sha256:95981154bedd6af5b13557f98989f2f5da4520a0eea66a3701da9195a6056728
Helm Chart Version    0.0.1-cbd1bb61-0711-4fef-b792-dfb427899935
Engine Config         {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'openvino', 'arch': 'x86', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}}
User Images           []
Created By            example.user@wallaroo.ai
Created At            2024-06-25 18:35:52.841129+00:00
Updated At            2024-06-25 18:35:52.841129+00:00
Replaces

Docker Run Command
docker run \
    -p $EDGE_PORT:8080 \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=sample.registry.example.com/uat/pipelines/kerasaopenvino1:cbd1bb61-0711-4fef-b792-dfb427899935 \
    -e CONFIG_CPUS=1 sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-openvino:v2024.2.0-main-5305

Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables.

Helm Install Command
helm install --atomic $HELM_INSTALL_NAME \
    oci://sample.registry.example.com/uat/charts/kerasaopenvino1 \
    --namespace $HELM_INSTALL_NAMESPACE \
    --version 0.0.1-cbd1bb61-0711-4fef-b792-dfb427899935 \
    --set ociRegistry.username=$OCI_USERNAME \
    --set ociRegistry.password=$OCI_PASSWORD

Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables.
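
Since a published pipeline is meant for targets with the same AI Accelerator and Architecture settings, it can be useful to check those two values before choosing a host. The sketch below parses the Engine Config shown in the publish output and compares its accel and arch fields against an intended target. It works on the printed text only; the helper target_matches is hypothetical, and how the same values are exposed on the SDK's publish object is not assumed here.

```python
import ast

# Engine Config string copied from the publish output above.
engine_config_text = (
    "{'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, "
    "'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'openvino', "
    "'arch': 'x86', 'gpu': False}}, "
    "'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}}"
)

# The text is a Python dict literal (single quotes, False), so
# ast.literal_eval parses it where json.loads would fail.
engine_config = ast.literal_eval(engine_config_text)
resources = engine_config["engine"]["resources"]


def target_matches(resources, accel, arch):
    """Return True if the published engine targets the given accel and arch."""
    return resources.get("accel") == accel and resources.get("arch") == arch


print(target_matches(resources, "openvino", "x86"))
```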

Once published, the model is deployed on edge or multi-cloud environments through the docker run template. Before deploying, set the following environment variables:

  • $EDGE_PORT: The network port used to submit inference requests to the deployed model.
  • $OCI_USERNAME: The user name or identifier to authenticate to the OCI (Open Container Initiative) Registry where the model was published.
  • $OCI_PASSWORD: The password or token to authenticate to the OCI (Open Container Initiative) Registry where the model was published.

docker run \
    -p $EDGE_PORT:8080 \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e PIPELINE_URL=sample.registry.example.com/uat/pipelines/kerasaopenvino1:cbd1bb61-0711-4fef-b792-dfb427899935 \
    -e CONFIG_CPUS=1 sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-openvino:v2024.2.0-main-5305
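
If the docker run command is launched from a script, a missing variable is easy to overlook. The following sketch, using only the Python standard library, checks that the three variables are set and assembles the same command as an argument list. The function build_docker_run is a hypothetical helper; the pipeline URL and engine image are copied from the publish output above.

```python
import os
import shlex

REQUIRED_VARS = ("EDGE_PORT", "OCI_USERNAME", "OCI_PASSWORD")

# Values copied from the publish output above.
PIPELINE_URL = (
    "sample.registry.example.com/uat/pipelines/"
    "kerasaopenvino1:cbd1bb61-0711-4fef-b792-dfb427899935"
)
ENGINE_IMAGE = (
    "sample.registry.example.com/uat/engines/proxy/wallaroo/"
    "ghcr.io/wallaroolabs/fitzroy-mini-openvino:v2024.2.0-main-5305"
)


def build_docker_run(env):
    """Return the docker run argument list, or raise if a variable is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("unset environment variables: " + ", ".join(missing))
    return [
        "docker", "run",
        "-p", f"{env['EDGE_PORT']}:8080",
        "-e", f"OCI_USERNAME={env['OCI_USERNAME']}",
        "-e", f"OCI_PASSWORD={env['OCI_PASSWORD']}",
        "-e", f"PIPELINE_URL={PIPELINE_URL}",
        "-e", "CONFIG_CPUS=1",
        ENGINE_IMAGE,
    ]


if __name__ == "__main__":
    try:
        # The printed command contains the registry password; avoid logging it.
        print(shlex.join(build_docker_run(os.environ)))
    except RuntimeError as err:
        print(err)
```

The returned list can be passed to subprocess.run directly, which avoids shell quoting issues with passwords that contain special characters.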