Model Drift Detection for Edge Deployments: Preparation
This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.
Model Drift Detection for Edge Deployments Tutorial: Preparation
The Model Insights feature lets you monitor how the environment your model operates in may be changing in ways that affect its predictions, so that you can intervene (retrain) in an efficient and timely manner. Changes in the inputs, known as data drift, can occur due to errors in the data processing pipeline or to changes in the environment, such as shifts in user preference or behavior.
Wallaroo Run Anywhere allows models to be deployed to edge and other locations and have their inference result logs uploaded to the Wallaroo Ops center. Wallaroo assays can include the inference results from one or more deployment locations and compare the results from any one or several locations against an established baseline for model drift detection.
This notebook demonstrates Wallaroo Run Anywhere with model drift observability using Wallaroo assays. It walks through the following process:
- Preparation: This notebook focuses on setting up the conditions for model edge deployments to different locations. This includes:
- Setting up a workspace, pipeline, and model for deriving the price of a house based on inputs.
- Performing a sample set of inferences to verify the model deployment.
- Publishing the deployed model to an Open Container Initiative (OCI) Registry, and using that publish to deploy the model to two different edge locations.
- Model Drift by Location:
- Perform inference requests on each of the model edge deployments.
- Perform the steps in creating an assay:
- Build an assay baseline with a specified location for inference results.
- Preview the assay and show different assay configurations based on selecting the inference data from the Wallaroo Ops model deployment versus the edge deployment.
- Create the assay.
- View assay results.
This notebook focuses on Preparation.
Goal
Model insights monitors the output of the house price model over a designated time window and compares it to an expected baseline distribution. We measure the performance of model deployments in different locations and compare that to the baseline to detect model drift.
Resources
This tutorial provides the following:
- Models:
  - models/rf_model.onnx: The champion model that has been used in this environment for some time.
- Various inputs:
  - smallinputs.df.json: A set of house inputs that tends to generate low house price values.
  - biginputs.df.json: A set of house inputs that tends to generate high house price values.
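As a quick sketch, the sample input files can be loaded as pandas DataFrames for batch inference. The ./data/ path is an assumption; adjust it to wherever the files live in your copy of the tutorial repository.
import pandas as pd
# Paths are assumptions -- point them at the tutorial's data files.
low_price_inputs = pd.read_json('./data/smallinputs.df.json', orient="records")
high_price_inputs = pd.read_json('./data/biginputs.df.json', orient="records")
# Each row carries a "tensor" column matching the model's input format.
print(low_price_inputs.shape, high_price_inputs.shape)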
Prerequisites
- A deployed Wallaroo instance with Edge Registry Services and Edge Observability enabled.
- The following Python libraries installed:
  - wallaroo: The Wallaroo SDK, included with the Wallaroo JupyterHub service.
  - pandas: Used for formatting the inference data.
- An x64 Docker deployment to deploy the model on an edge location.
Steps
- Deploy a sample ML model used to determine house prices based on a set of input parameters.
- Publish the model deployment configuration to an OCI registry.
- Add edge locations to the pipeline publish.
- Deploy the model to two different edge locations.
Import Libraries
The first step is to import our libraries and set the variables used throughout this tutorial.
import wallaroo
from wallaroo.object import EntityNotFoundError
from wallaroo.framework import Framework
# used to display DataFrame information without truncating
from IPython.display import display
import pandas as pd
pd.set_option('display.max_colwidth', None)
import datetime
import time
workspace_name = f'run-anywhere-assay-demonstration-tutorial'
main_pipeline_name = f'assay-demonstration-tutorial'
model_name_control = f'house-price-estimator'
model_file_name_control = './models/rf_model.onnx'
# Set the name of the assay
assay_name="ops assay example"
edge_assay_name = "edge assay example"
combined_assay_name = "combined assay example"
# ignoring warnings for demonstration
import warnings
warnings.filterwarnings('ignore')
Connect to the Wallaroo Instance
The first step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and available through the Jupyter Hub interface provided with your Wallaroo environment.
This is accomplished using the wallaroo.Client()
command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.
If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client()
. For more information on Wallaroo Client settings, see the Client Connection guide.
wl = wallaroo.Client()
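When connecting from outside the internal JupyterHub service, the client is typically given the instance address. The commented sketch below is an assumption for illustration only; the exact parameters depend on your SDK version and instance configuration, as covered in the Client Connection guide.
# Sketch: connecting from outside JupyterHub (placeholder URL -- replace with your instance address).
# wl = wallaroo.Client(api_endpoint="https://wallaroo.example.com", auth_type="sso")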
Create Workspace
We will create a workspace to manage our pipeline and models, then set it as the current workspace.
Workspace, pipeline, and model names should be unique to each user, so if multiple people run this tutorial in the same Wallaroo instance, add a suffix to these names so the runs do not affect each other.
workspace = wl.get_workspace(name=workspace_name, create_if_not_exist=True)
wl.set_current_workspace(workspace)
{'name': 'run-anywhere-assay-demonstration-tutorial', 'id': 15, 'archived': False, 'created_by': 'fb2916bc-551e-4a76-88e8-0f7d7720a0f9', 'created_at': '2024-07-30T15:55:03.564943+00:00', 'models': [], 'pipelines': []}
Upload The Champion Model
For our example, we will upload the champion model that has been trained to derive house prices from a variety of inputs. The model file is rf_model.onnx
, and is uploaded with the name house-price-estimator
.
housing_model_control = (wl.upload_model(model_name_control,
model_file_name_control,
framework=Framework.ONNX)
.configure(tensor_fields=["tensor"])
)
Build the Pipeline
This pipeline represents an existing situation where a model is deployed and used for inferences in a production environment. We’ll call it assay-demonstration-tutorial
, set housing_model_control
as a pipeline step, then run a few sample inferences.
This pipeline will be a simple one - just a single pipeline step.
mainpipeline = wl.build_pipeline(main_pipeline_name)
# clear the steps if used before
mainpipeline.clear()
mainpipeline.add_model_step(housing_model_control)
#minimum deployment config
deploy_config = wallaroo.DeploymentConfigBuilder().replica_count(1).cpus(0.5).memory("1Gi").build()
mainpipeline.deploy(deployment_config = deploy_config)
name | assay-demonstration-tutorial |
---|---|
created | 2024-07-30 15:55:05.574830+00:00 |
last_updated | 2024-07-30 15:55:05.980656+00:00 |
deployed | True |
workspace_id | 15 |
workspace_name | run-anywhere-assay-demonstration-tutorial |
arch | x86 |
accel | none |
tags | |
versions | 35614bc9-6a13-4f99-b446-78e03e3e9a65, 1d8c4af6-69d5-4305-a355-37b2a2f07bcb |
steps | house-price-estimator |
published | False |
mainpipeline.status()
{'status': 'Running',
'details': [],
'engines': [{'ip': '10.28.1.26',
'name': 'engine-6fcfc77b76-7r7cb',
'status': 'Running',
'reason': None,
'details': [],
'pipeline_statuses': {'pipelines': [{'id': 'assay-demonstration-tutorial',
'status': 'Running',
'version': '35614bc9-6a13-4f99-b446-78e03e3e9a65'}]},
'model_statuses': {'models': [{'name': 'house-price-estimator',
'sha': 'e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6',
'status': 'Running',
'version': 'e24fe787-af57-46ac-8582-39074c5d5294'}]}}],
'engine_lbs': [{'ip': '10.28.1.25',
'name': 'engine-lb-6b59985857-c6cm7',
'status': 'Running',
'reason': None,
'details': []}],
'sidekicks': []}
Testing
We’ll use two inferences as a quick sample test: one for a house that should be valued at around $700k, the other for a house valued at around $1.5 million.
normal_input = pd.DataFrame.from_records({"tensor": [[4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0, 8.0, 2900.0, 0.0, 47.6063, -122.02, 2970.0, 5251.0, 12.0, 0.0, 0.0]]})
result = mainpipeline.infer(normal_input)
display(result)
time | in.tensor | out.variable | anomaly.count | |
---|---|---|---|---|
0 | 2024-07-30 15:55:19.976 | [4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0, 8.0, 2900.0, 0.0, 47.6063, -122.02, 2970.0, 5251.0, 12.0, 0.0, 0.0] | [718013.7] | 0 |
large_house_input = pd.DataFrame.from_records({'tensor': [[4.0, 3.0, 3710.0, 20000.0, 2.0, 0.0, 2.0, 5.0, 10.0, 2760.0, 950.0, 47.6696, -122.261, 3970.0, 20000.0, 79.0, 0.0, 0.0]]})
large_house_result = mainpipeline.infer(large_house_input)
display(large_house_result)
time | in.tensor | out.variable | anomaly.count | |
---|---|---|---|---|
0 | 2024-07-30 15:55:20.186 | [4.0, 3.0, 3710.0, 20000.0, 2.0, 0.0, 2.0, 5.0, 10.0, 2760.0, 950.0, 47.6696, -122.261, 3970.0, 20000.0, 79.0, 0.0, 0.0] | [1514079.4] | 0 |
Undeploy Main Pipeline
With the sample inferences complete, we will undeploy the main pipeline and return the resources back to the Wallaroo instance.
mainpipeline.undeploy()
name | assay-demonstration-tutorial |
---|---|
created | 2024-07-30 15:55:05.574830+00:00 |
last_updated | 2024-07-30 15:55:05.980656+00:00 |
deployed | False |
workspace_id | 15 |
workspace_name | run-anywhere-assay-demonstration-tutorial |
arch | x86 |
accel | none |
tags | |
versions | 35614bc9-6a13-4f99-b446-78e03e3e9a65, 1d8c4af6-69d5-4305-a355-37b2a2f07bcb |
steps | house-price-estimator |
published | False |
Edge Deployment
We can now deploy the pipeline to an edge device. This will require the following steps:
- Publish the pipeline: Publishes the pipeline to the OCI registry.
- Add Edge: Add the edge location to the pipeline publish.
- Deploy Edge: Deploy the edge device with the edge location settings.
Publish Pipeline
Publishing the pipeline uses the wallaroo.pipeline.publish()
command. This requires that the Wallaroo Ops instance have Edge Registry Services enabled.
The following publishes the pipeline to the OCI registry and displays the container details. For more information, see Wallaroo SDK Essentials Guide: Pipeline Edge Publication.
assay_pub = mainpipeline.publish()
Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing..... Published.
Add Edge Location
The edge location is added with the wallaroo.pipeline_publish.add_edge(name)
method. This returns the OCI registration information and the EDGE_BUNDLE
information. The EDGE_BUNDLE
data is a base64 encoded set of parameters identifying the pipeline the edge device is associated with, the workspace, and other data.
For full details, see Wallaroo SDK Essentials Guide: Pipeline Edge Publication: Edge Observability.
For this example, we will add two locations:
- houseprice-edge-demonstration-01
- houseprice-edge-demonstration-02
These will be used in later steps for demonstrating inferences through different locations.
edge_name_01 = "houseprice-edge-demonstration-01"
edge_publish_01 = assay_pub.add_edge(edge_name_01)
display(edge_publish_01)
ID | 1 | |
Pipeline Name | assay-demonstration-tutorial | |
Pipeline Version | e1abe92f-82d2-494a-8d96-dbd5810dc198 | |
Status | Published | |
Engine URL | ghcr.io/wallaroolabs/doc-samples/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini:v2024.2.0-main-5455 | |
Pipeline URL | ghcr.io/wallaroolabs/doc-samples/pipelines/assay-demonstration-tutorial:e1abe92f-82d2-494a-8d96-dbd5810dc198 | |
Helm Chart URL | oci://ghcr.io/wallaroolabs/doc-samples/charts/assay-demonstration-tutorial | |
Helm Chart Reference | ghcr.io/wallaroolabs/doc-samples/charts@sha256:1e13d208c02930ef2ee5d9ebc24f38bc44383cb9866c568d88009d2741e0aba5 | |
Helm Chart Version | 0.0.1-e1abe92f-82d2-494a-8d96-dbd5810dc198 | |
Engine Config | {'engine': {'resources': {'limits': {'cpu': 4.0, 'memory': '3Gi'}, 'requests': {'cpu': 4.0, 'memory': '3Gi'}, 'accel': 'none', 'arch': 'x86', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': None}} | |
User Images | [] | |
Created By | john.hansarick@wallaroo.ai | |
Created At | 2024-07-30 15:55:59.141648+00:00 | |
Updated At | 2024-07-30 15:55:59.141648+00:00 | |
Replaces | ||
Docker Run Command |
Note: Please set the PERSISTENT_VOLUME_DIR, EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables. | |
Helm Install Command |
Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables. |
edge_name_02 = "houseprice-edge-demonstration-02"
edge_publish_02 = assay_pub.add_edge(edge_name_02)
display(edge_publish_02)
ID | 1 | |
Pipeline Name | assay-demonstration-tutorial | |
Pipeline Version | e1abe92f-82d2-494a-8d96-dbd5810dc198 | |
Status | Published | |
Engine URL | ghcr.io/wallaroolabs/doc-samples/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini:v2024.2.0-main-5455 | |
Pipeline URL | ghcr.io/wallaroolabs/doc-samples/pipelines/assay-demonstration-tutorial:e1abe92f-82d2-494a-8d96-dbd5810dc198 | |
Helm Chart URL | oci://ghcr.io/wallaroolabs/doc-samples/charts/assay-demonstration-tutorial | |
Helm Chart Reference | ghcr.io/wallaroolabs/doc-samples/charts@sha256:1e13d208c02930ef2ee5d9ebc24f38bc44383cb9866c568d88009d2741e0aba5 | |
Helm Chart Version | 0.0.1-e1abe92f-82d2-494a-8d96-dbd5810dc198 | |
Engine Config | {'engine': {'resources': {'limits': {'cpu': 4.0, 'memory': '3Gi'}, 'requests': {'cpu': 4.0, 'memory': '3Gi'}, 'accel': 'none', 'arch': 'x86', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': None}} | |
User Images | [] | |
Created By | john.hansarick@wallaroo.ai | |
Created At | 2024-07-30 15:55:59.141648+00:00 | |
Updated At | 2024-07-30 15:55:59.141648+00:00 | |
Replaces | ||
Docker Run Command |
Note: Please set the PERSISTENT_VOLUME_DIR, EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables. | |
Helm Install Command |
Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables. |
DevOps Deployment
The edge deployment is performed with docker run
, docker compose
, or helm
installations. For our examples, we’ll verify the following variables are set for the docker run
deployment:
- $PERSISTENT_VOLUME_DIR: The location of the persistent volume storage for the deployment.
- $EDGE_PORT: The external port to access the edge deployment endpoints. By default, this is port 8080. Since there are two deployments, verify that each uses a separate port; for our examples, we’ll use ports 8080 and 8081.
- $OCI_USERNAME: The OCI registry username.
- $OCI_PASSWORD: The OCI registry password.
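The following is a representative docker run sketch built from these variables, not the exact command for this publish. The container port mapping, the /persist mount point, and the EDGE_BUNDLE environment variable are assumptions for illustration; use the Docker Run Command generated in the pipeline publish output above as the authoritative invocation.
# Representative sketch only -- the publish output provides the exact command for each edge location.
docker run -p $EDGE_PORT:8080 \
  -v $PERSISTENT_VOLUME_DIR:/persist \
  -e OCI_USERNAME=$OCI_USERNAME \
  -e OCI_PASSWORD=$OCI_PASSWORD \
  -e EDGE_BUNDLE=<EDGE_BUNDLE value returned by add_edge> \
  ghcr.io/wallaroolabs/doc-samples/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini:v2024.2.0-main-5455
Run one container per edge location, each with its own EDGE_PORT and the EDGE_BUNDLE returned by that location's add_edge call.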
For more details on model edge deployments with Wallaroo, see Model Operations: Run Anywhere.
Next Steps
The next notebook “Wallaroo Run Anywhere Model Drift Observability with Assays” details creating assay baselines from model edge deployments, and using the data from one or more edge locations to detect model drift.