Inference with Acceleration Libraries: Deploy on Intel OpenVINO with Intel GPUs
The following shows the process of uploading, deploying, publishing, and edge deploying a model with Wallaroo for x86 deployment with Intel OpenVINO acceleration. For this example, the model is a YOLOv8 ONNX model.
The first step is to upload the model. The AI Accelerator is set at this step through the accel parameter.
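The upload call references the Wallaroo client wl and the input_schema and output_schema variables. A minimal sketch of how these might be prepared is shown below; the pyarrow field names and tensor sizes are illustrative assumptions rather than the exact schemas from the original model, and the enum import paths assume the standard SDK module layout.
# connect to the Wallaroo instance and import the upload helpers
import wallaroo
from wallaroo.framework import Framework
from wallaroo.engine_config import Acceleration
import pyarrow as pa

wl = wallaroo.Client()

# hypothetical input/output schemas for a YOLOv8 ONNX model; the field
# names and shapes here are assumptions for illustration only
input_schema = pa.schema([
    pa.field('images', pa.list_(pa.float32(), list_size=640 * 640 * 3))
])
output_schema = pa.schema([
    pa.field('output0', pa.list_(pa.float32()))
])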
# upload the model
model = wl.upload_model(name='yolov8x-onnx',
                        path='./models/yolov8x.onnx',
                        framework=Framework.ONNX,
                        input_schema=input_schema,
                        output_schema=output_schema,
                        accel=Acceleration.OpenVINO)
display(model)
Name | yolov8x-onnx |
Version | 7d893f8a-1800-4ae3-8a09-ae8d5aa6d0e6 |
File Name | yolov8x.onnx |
SHA | f7e49538e38bebe066ce8df97bac8be239ae8c7d2733e500c8cd633706ae95a8 |
Status | ready |
Image Path | proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs/mac-deploy-openvino:v2024.2.0-main-5305 |
Architecture | x86 |
Acceleration | openvino |
Updated At | 2024-25-Jun 18:21:42 |
Workspace id | 20 |
Workspace name | example.user@wallaroo.ai - Default Workspace |
With the model uploaded, deployment consists of:
- Adding the model to a pipeline as a pipeline step.
- Setting the deployment configuration - the resources allocated to the model from the cluster. For this example, we allocate 1 CPU and 0.5 Gi RAM to the model. Note that we do not specify the accelerator or processor architecture here; those are set at the model level.
- Deploying the model. At this point, the model is ready to accept inference requests until it is undeployed; a sample inference request follows the deployment code below.
# create the pipeline
pipeline = wl.build_pipeline("yolov8x-onnx-ov-bench")
# add the model as a pipeline step
pipeline.add_model_step(model)
# set the deployment configuration
deployment_config = wallaroo.deployment_config.DeploymentConfigBuilder() \
.cpus(0.25).memory('1Gi') \
.sidekick_cpus(model, 1) \
.sidekick_memory(model, '0.5Gi') \
.build()
# deploy the model with the deployment configuration
pipeline.deploy(deployment_config=deployment_config)
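With the pipeline deployed, a quick inference verifies that the model is serving requests. The following is a minimal sketch using the SDK's infer_from_file method; the Apache Arrow input file named here is a hypothetical set of preprocessed image tensors, not a file from the original example.
# run a sample inference against the deployed pipeline; the input file
# below is a hypothetical Arrow table of preprocessed image tensors
results = pipeline.infer_from_file('./data/yolov8-sample-input.arrow')
display(results)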
Publishing the model stores a copy of the model and the inference engine in an OCI (Open Container Initiative) Registry that is set by the Wallaroo platform operations administrator. Once published, it is ready for deployment in any edge or multi-cloud environment with the same AI Accelerator and Architecture settings.
A template of the docker run command is included with the publish return.
We now publish the pipeline. Note that the Engine Config inherited the acceleration from the model.
pipeline.publish(deployment_config=deployment_config)
Waiting for pipeline publish… It may take up to 600 sec.
Pipeline is publishing……. Published.
ID | 30 |
Pipeline Name | yolov8x-onnx-ov-bench |
Pipeline Version | cbd1bb61-0711-4fef-b792-dfb427899935 |
Status | Published |
Engine URL | sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-openvino:v2024.2.0-main-5305 |
Pipeline URL | sample.registry.example.com/uat/pipelines/yolov8x-onnx-ov-bench:cbd1bb61-0711-4fef-b792-dfb427899935 |
Helm Chart URL | oci://sample.registry.example.com/uat/charts/yolov8x-onnx-ov-bench |
Helm Chart Reference | sample.registry.example.com/uat/charts@sha256:95981154bedd6af5b13557f98989f2f5da4520a0eea66a3701da9195a6056728 |
Helm Chart Version | 0.0.1-cbd1bb61-0711-4fef-b792-dfb427899935 |
Engine Config | {'engine': {'resources': {'limits': {'cpu': 1.0, 'memory': '512Mi'}, 'requests': {'cpu': 1.0, 'memory': '512Mi'}, 'accel': 'openvino', 'arch': 'x86', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': {}}} |
User Images | [] |
Created By | example.user@wallaroo.ai |
Created At | 2024-06-25 18:35:52.841129+00:00 |
Updated At | 2024-06-25 18:35:52.841129+00:00 |
Replaces | |
Docker Run Command | Note: Please set the EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables. |
Helm Install Command | Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables. |
The docker run command for ML models deployed on OpenVINO accelerated hardware must include the following options:
--rm -it --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* ) --ulimit nofile=262144:262144 --cap-add=sys_nice
For the example above, this becomes:
docker run \
--rm -it --device /dev/dri \
--group-add=$(stat -c "%g" /dev/dri/render* ) \
--ulimit nofile=262144:262144 --cap-add=sys_nice \
-p $EDGE_PORT:8080 \
-e OCI_USERNAME=$OCI_USERNAME \
-e OCI_PASSWORD=$OCI_PASSWORD \
-e PIPELINE_URL=sample.registry.example.com/uat/pipelines/yolov8x-onnx-ov-bench:cbd1bb61-0711-4fef-b792-dfb427899935 \
-e CONFIG_CPUS=1 sample.registry.example.com/uat/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini-openvino:v2024.2.0-main-5305
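Once the edge container is running, inference requests are made over HTTP. The sketch below assumes the Wallaroo edge inference endpoint pattern of /pipelines/<pipeline name> on the published EDGE_PORT and a pandas-records JSON payload; the host, port, field name, and tensor size are assumptions for illustration and should be adjusted to the actual deployment.
import requests

# hypothetical inference request against the edge deployment; adjust the
# host, port, endpoint path, and payload to match the actual deployment
edge_url = "http://localhost:8080/pipelines/yolov8x-onnx-ov-bench"
payload = [{"images": [0.0] * (640 * 640 * 3)}]  # one flattened sample tensor

response = requests.post(
    edge_url,
    json=payload,
    headers={"Content-Type": "application/json; format=pandas-records"},
)
print(response.json())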