Model Drift Detection for Edge Deployments: Preparation

How to detect model drift in Wallaroo Run Anywhere deployments using the house price model as an example.

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

The Model Insights feature lets you monitor how the environment that your model operates within may be changing in ways that affect its predictions, so that you can intervene (retrain) in an efficient and timely manner. Changes in the inputs, known as data drift, can occur due to errors in the data processing pipeline or due to changes in the environment such as user preference or behavior.

Wallaroo Run Anywhere allows models to be deployed on edge and other locations, and have their inference result logs uploaded to the Wallaroo Ops center. Wallaroo assays extend model drift detection to include the inference results from one or more deployment locations, and compare the results from any one location or from multiple locations against an established baseline.

This notebook demonstrates Wallaroo Run Anywhere with model drift observability through Wallaroo assays. The tutorial walks through the following process:

  • Preparation: This notebook focuses on setting up the conditions for model edge deployments to different locations. This includes:
    • Setting up a workspace, pipeline, and model for deriving the price of a house based on inputs.
    • Performing a sample set of inferences to verify the model deployment.
    • Publishing the deployed model to an Open Container Initiative (OCI) Registry, and using that to deploy the model to two different edge locations.
  • Model Drift by Location:
    • Perform inference requests on each of the model edge deployments.
    • Perform the steps in creating an assay:
      • Build an assay baseline with a specified location for inference results.
      • Preview the assay and show different assay configurations based on selecting the inference data from the Wallaroo Ops model deployment versus the edge deployment.
      • Create the assay.
      • View assay results.

This notebook focuses on Preparation.

Goal

Model insights monitors the output of the house price model over a designated time window and compares it to an expected baseline distribution. We measure the performance of model deployments in different locations and compare that to the baseline to detect model drift.
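To make "compares it to an expected baseline distribution" concrete, the following is a minimal sketch of one common drift score, the Population Stability Index (PSI), computed over fixed bins. It is a generic illustration in plain Python and is not presented as how Wallaroo assays calculate their scores. The function names, the bin edges, and the sample prices are all assumptions made for this example.

```python
import math

def bin_fractions(values, edges):
    """Return the fraction of values in each bin [edges[i], edges[i+1]).

    Values below the first edge go to the first bin, and values at or
    above the last edge go to the last bin.
    """
    if not values:
        raise ValueError("values must not be empty")
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        idx = 0
        for i in range(n_bins):
            if v >= edges[i]:
                idx = i
        counts[idx] += 1
    return [c / len(values) for c in counts]

def psi(baseline, window, edges, eps=1e-6):
    """Population Stability Index between a baseline and a later window."""
    score = 0.0
    for b, w in zip(bin_fractions(baseline, edges), bin_fractions(window, edges)):
        # Clamp empty bins so the logarithm is always defined.
        b = max(b, eps)
        w = max(w, eps)
        score += (w - b) * math.log(w / b)
    return score

# Hypothetical house price bins and samples, for illustration only.
edges = [0, 500_000, 1_000_000, 1_500_000, 2_000_000]
baseline = [450_000, 520_000, 610_000, 700_000, 718_000, 830_000, 950_000, 1_200_000]
shifted = [900_000, 1_100_000, 1_300_000, 1_450_000, 1_514_000, 1_600_000, 1_750_000, 1_900_000]

print(psi(baseline, baseline, edges))  # identical distributions score zero
print(psi(baseline, shifted, edges))   # a shifted window scores higher
```

A score of zero means the window matches the baseline bin for bin. The score grows as the window's distribution moves away from the baseline, which is the kind of signal used to flag drift.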

Resources

This tutorial provides the following:

  • Models:
    • models/rf_model.onnx: The champion model that has been used in this environment for some time.
    • Various inputs:
      • smallinputs.df.json: A set of house inputs that tends to generate low house price values.
      • biginputs.df.json: A set of house inputs that tends to generate high house price values.

Prerequisites

  • A deployed Wallaroo instance with Edge Registry Services and Edge Observability enabled.
  • The following Python libraries installed:
    • wallaroo: The Wallaroo SDK. Included with the Wallaroo JupyterHub service by default.
    • pandas: Pandas, mainly used for Pandas DataFrames.
  • An x64 Docker deployment to deploy the model on an edge location.

Steps

  • Deploy a sample ML model used to determine house prices based on a set of input parameters.
  • Publish the model deployment configuration to an OCI registry.
  • Add edge locations to the pipeline publish.
  • Deploy the model to two different edge locations.

Import Libraries

The first step is to import our libraries and set the variables used throughout this tutorial.

import wallaroo
from wallaroo.object import EntityNotFoundError
from wallaroo.framework import Framework

# used to display DataFrame information without truncating
from IPython.display import display
import pandas as pd
pd.set_option('display.max_colwidth', None)

import datetime
import time

workspace_name = 'run-anywhere-assay-demonstration-tutorial'
main_pipeline_name = 'assay-demonstration-tutorial'
model_name_control = 'house-price-estimator'
model_file_name_control = './models/rf_model.onnx'

# Set the names of the assays
assay_name = "ops assay example"
edge_assay_name = "edge assay example"
combined_assay_name = "combined assay example"

# ignoring warnings for demonstration
import warnings
warnings.filterwarnings('ignore')


Connect to the Wallaroo Instance

The next step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and available through the JupyterHub interface provided with your Wallaroo environment.

This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection in a variable that can be referenced later.

If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). For more information on Wallaroo Client settings, see the Client Connection guide.

wl = wallaroo.Client()

Create Workspace

We will create a workspace to manage our pipeline and models. The following code retrieves the workspace named by workspace_name, creating it if it does not already exist, then sets it as the current workspace.

Workspace, pipeline, and model names should be unique to each user. The names set earlier in this tutorial are fixed, so when several people run the tutorial in the same Wallaroo instance, each person should add a unique suffix to the names to avoid affecting one another.
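One way to generate such a suffix is sketched below using only the Python standard library. The helper name random_suffix and the suffixed_* variables are hypothetical and are not used elsewhere in this tutorial, which keeps the fixed names set earlier.

```python
import random
import string

def random_suffix(length=4):
    """Return a random string of lowercase letters, e.g. for unique names."""
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(length))

suffix = random_suffix()

# Hypothetical suffixed names; substitute them for the fixed names if needed.
suffixed_workspace_name = f'run-anywhere-assay-demonstration-tutorial-{suffix}'
suffixed_pipeline_name = f'assay-demonstration-tutorial-{suffix}'

print(suffixed_workspace_name)
print(suffixed_pipeline_name)
```

Lowercase letters keep the generated names in the same lowercase, dash-separated style as the names used throughout this tutorial.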

workspace = wl.get_workspace(name=workspace_name, create_if_not_exist=True)

wl.set_current_workspace(workspace)
{'name': 'run-anywhere-assay-demonstration-tutorial', 'id': 10, 'archived': False, 'created_by': '07256c6a-1f1e-4cc8-bff8-94c9fb7cb843', 'created_at': '2024-04-19T18:44:04.24582+00:00', 'models': [], 'pipelines': []}

Upload The Champion Model

For our example, we will upload the champion model that has been trained to derive house prices from a variety of inputs. The model file is rf_model.onnx, and is uploaded with the name house-price-estimator.

housing_model_control = (wl.upload_model(model_name_control, 
                                        model_file_name_control, 
                                        framework=Framework.ONNX)
                                        .configure(tensor_fields=["tensor"])
                        )

Build the Pipeline

This pipeline represents an existing production deployment in which a model is already serving inferences. We’ll call it assay-demonstration-tutorial, set housing_model_control as a pipeline step, then run a few sample inferences.

This pipeline will be a simple one - just a single pipeline step.

mainpipeline = wl.build_pipeline(main_pipeline_name)
# clear the steps if used before
mainpipeline.clear()

mainpipeline.add_model_step(housing_model_control)

#minimum deployment config
deploy_config = wallaroo.DeploymentConfigBuilder().replica_count(1).cpus(0.5).memory("1Gi").build()

mainpipeline.deploy(deployment_config = deploy_config)

name          assay-demonstration-tutorial
created       2024-04-19 18:44:05.950549+00:00
last_updated  2024-04-19 18:44:06.310958+00:00
deployed      True
arch          x86
accel         none
tags
versions      528f3051-4934-438b-8548-f459208276ae, 8459e1e1-92bc-4403-83f7-fc81dd88a369
steps         house-price-estimator
published     False

mainpipeline.status()
{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.28.0.199',
   'name': 'engine-8664bd4d97-7tp7t',
   'status': 'Running',
   'reason': None,
   'details': [],
   'pipeline_statuses': {'pipelines': [{'id': 'assay-demonstration-tutorial',
      'status': 'Running',
      'version': '528f3051-4934-438b-8548-f459208276ae'}]},
   'model_statuses': {'models': [{'name': 'house-price-estimator',
      'sha': 'e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6',
      'status': 'Running',
      'version': '68d5113b-3742-41cd-ac4b-8583df55b134'}]}}],
 'engine_lbs': [{'ip': '10.28.2.228',
   'name': 'engine-lb-d7cc8fc9c-gzw5f',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': []}

Testing

We’ll use two inferences as a quick sample test: one for a house that should be valued at around $700k, and the other for a house that should be valued at around $1.5 million.

normal_input = pd.DataFrame.from_records({"tensor": [[4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0, 8.0, 2900.0, 0.0, 47.6063, -122.02, 2970.0, 5251.0, 12.0, 0.0, 0.0]]})
result = mainpipeline.infer(normal_input)
display(result)

|   | time | in.tensor | out.variable | anomaly.count |
| 0 | 2024-04-19 18:44:20.982 | [4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0, 8.0, 2900.0, 0.0, 47.6063, -122.02, 2970.0, 5251.0, 12.0, 0.0, 0.0] | [718013.7] | 0 |

large_house_input = pd.DataFrame.from_records({'tensor': [[4.0, 3.0, 3710.0, 20000.0, 2.0, 0.0, 2.0, 5.0, 10.0, 2760.0, 950.0, 47.6696, -122.261, 3970.0, 20000.0, 79.0, 0.0, 0.0]]})
large_house_result = mainpipeline.infer(large_house_input)
display(large_house_result)

|   | time | in.tensor | out.variable | anomaly.count |
| 0 | 2024-04-19 18:44:21.148 | [4.0, 3.0, 3710.0, 20000.0, 2.0, 0.0, 2.0, 5.0, 10.0, 2760.0, 950.0, 47.6696, -122.261, 3970.0, 20000.0, 79.0, 0.0, 0.0] | [1514079.4] | 0 |
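
The "around $700k" and "around $1.5 million" expectations can be turned into a quick automated check. The sketch below is illustrative only. The helper name close_to_expected and the 5% tolerance are assumptions, and the prices are copied from the out.variable column of the sample outputs rather than read from the result DataFrames.

```python
import math

def close_to_expected(predicted, expected, rel_tol=0.05):
    """Return True when predicted is within rel_tol of expected."""
    return math.isclose(predicted, expected, rel_tol=rel_tol)

# Values copied from the out.variable column of the two sample inferences.
normal_price = 718013.7
large_price = 1514079.4

print(close_to_expected(normal_price, 700_000))    # True
print(close_to_expected(large_price, 1_500_000))   # True
print(close_to_expected(normal_price, 1_500_000))  # False
```

With the pipeline deployed, the same check could be applied to the value held in the first row of the out.variable column instead of a hard-coded number.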

Undeploy Main Pipeline

With the sample inferences complete, we will undeploy the main pipeline and return the resources back to the Wallaroo instance.

mainpipeline.undeploy()

name          assay-demonstration-tutorial
created       2024-04-19 18:44:05.950549+00:00
last_updated  2024-04-19 18:44:06.310958+00:00
deployed      False
arch          x86
accel         none
tags
versions      528f3051-4934-438b-8548-f459208276ae, 8459e1e1-92bc-4403-83f7-fc81dd88a369
steps         house-price-estimator
published     False

Edge Deployment

We can now deploy the pipeline to an edge device. This will require the following steps:

  • Publish the pipeline: Publishes the pipeline to the OCI registry.
  • Add Edge: Add the edge location to the pipeline publish.
  • Deploy Edge: Deploy the edge device with the edge location settings.

Publish Pipeline

Pipelines are published with the wallaroo.pipeline.publish() command. This requires that the Wallaroo Ops instance have Edge Registry Services enabled.

The following publishes the pipeline to the OCI registry and displays the container details. For more information, see Wallaroo SDK Essentials Guide: Pipeline Edge Publication.

assay_pub = mainpipeline.publish()
Waiting for pipeline publish... It may take up to 600 sec.
Pipeline is publishing....... Published.

Add Edge Location

The edge location is added with the wallaroo.pipeline_publish.add_edge(name) method. This returns the OCI registration information, and the EDGE_BUNDLE information. The EDGE_BUNDLE data is a base64 encoded set of parameters for the pipeline that the edge device is associated with, the workspace, and other data.

For full details, see Wallaroo SDK Essentials Guide: Pipeline Edge Publication: Edge Observability.
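
Because the EDGE_BUNDLE is base64 encoded, its contents can be inspected with the Python standard library. The sketch below assumes the decoded bundle is a series of export KEY=VALUE lines, which is how the bundles in this tutorial's output appear to decode. The helper name decode_edge_bundle and the sample bundle are hypothetical. A real bundle comes from the add_edge output and appears to include a join token, so treat the decoded contents as sensitive.

```python
import base64

def decode_edge_bundle(bundle):
    """Decode a base64 bundle and return its KEY=VALUE pairs as a dict."""
    text = base64.b64decode(bundle).decode('utf-8')
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Drop the leading shell keyword if present.
        if line.startswith('export '):
            line = line[len('export '):]
        if '=' in line:
            key, value = line.split('=', 1)
            settings[key] = value
    return settings

# Hypothetical sample bundle built here for illustration only.
sample_text = 'export BUNDLE_VERSION=1\nexport EDGE_NAME=example-edge-01\n'
sample_bundle = base64.b64encode(sample_text.encode('utf-8')).decode('ascii')

settings = decode_edge_bundle(sample_bundle)
print(settings['EDGE_NAME'])
```

Decoding a bundle this way is a convenient check that an edge deployment was given the bundle for the intended edge location.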

For this example, we will add two locations:

  • houseprice-edge-demonstration-01
  • houseprice-edge-demonstration-02

These will be used in later steps for demonstrating inferences through different locations.

edge_name_01 = "houseprice-edge-demonstration-01"
edge_publish_01 = assay_pub.add_edge(edge_name_01)
display(edge_publish_01)

ID                    1
Pipeline Name         assay-demonstration-tutorial
Pipeline Version      2c70d1a3-1430-421d-b00e-7d5d8f93413a
Status                Published
Engine URL            ghcr.io/wallaroolabs/doc-samples/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini:v2024.1.0-main-4963
Pipeline URL          ghcr.io/wallaroolabs/doc-samples/pipelines/assay-demonstration-tutorial:2c70d1a3-1430-421d-b00e-7d5d8f93413a
Helm Chart URL        oci://ghcr.io/wallaroolabs/doc-samples/charts/assay-demonstration-tutorial
Helm Chart Reference  ghcr.io/wallaroolabs/doc-samples/charts@sha256:a4a307ef23cb5e1759abee88ec30442f5eb8343bf9cd764e76215bb14bd085c4
Helm Chart Version    0.0.1-2c70d1a3-1430-421d-b00e-7d5d8f93413a
Engine Config         {'engine': {'resources': {'limits': {'cpu': 4.0, 'memory': '3Gi'}, 'requests': {'cpu': 4.0, 'memory': '3Gi'}, 'accel': 'none', 'arch': 'x86', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': None}}
User Images           []
Created By            john.hansarick@wallaroo.ai
Created At            2024-04-19 18:44:59.893730+00:00
Updated At            2024-04-19 18:44:59.893730+00:00
Replaces

Docker Run Command:
docker run -v $PERSISTENT_VOLUME_DIR:/persist \
    -p $EDGE_PORT:8080 \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e EDGE_BUNDLE=ZXhwb3J0IEJVTkRMRV9WRVJTSU9OPTEKZXhwb3J0IENPTkZJR19DUFVTPTQKZXhwb3J0IEVER0VfTkFNRT1ob3VzZXByaWNlLWVkZ2UtZGVtb25zdHJhdGlvbi0wMQpleHBvcnQgT1BTQ0VOVEVSX0hPU1Q9ZG9jLXRlc3QuZWRnZS53YWxsYXJvb2NvbW11bml0eS5uaW5qYQpleHBvcnQgUElQRUxJTkVfVVJMPWdoY3IuaW8vd2FsbGFyb29sYWJzL2RvYy1zYW1wbGVzL3BpcGVsaW5lcy9hc3NheS1kZW1vbnN0cmF0aW9uLXR1dG9yaWFsOjJjNzBkMWEzLTE0MzAtNDIxZC1iMDBlLTdkNWQ4ZjkzNDEzYQpleHBvcnQgSk9JTl9UT0tFTj0xMzVmZWVkNC00ZGU0LTQ2N2ItOWRiYi02NDY0NzBmNDg1YTUKZXhwb3J0IE9DSV9SRUdJU1RSWT1naGNyLmlv\
    -e PIPELINE_URL=ghcr.io/wallaroolabs/doc-samples/pipelines/assay-demonstration-tutorial:2c70d1a3-1430-421d-b00e-7d5d8f93413a \
    -e CONFIG_CPUS=4 ghcr.io/wallaroolabs/doc-samples/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini:v2024.1.0-main-4963

Note: Please set the PERSISTENT_VOLUME_DIR, EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables.

Helm Install Command:
helm install --atomic $HELM_INSTALL_NAME \
    oci://ghcr.io/wallaroolabs/doc-samples/charts/assay-demonstration-tutorial \
    --namespace $HELM_INSTALL_NAMESPACE \
    --version 0.0.1-2c70d1a3-1430-421d-b00e-7d5d8f93413a \
    --set ociRegistry.username=$OCI_USERNAME \
    --set ociRegistry.password=$OCI_PASSWORD \
    --set edgeBundle=ZXhwb3J0IEJVTkRMRV9WRVJTSU9OPTEKZXhwb3J0IENPTkZJR19DUFVTPTQKZXhwb3J0IEVER0VfTkFNRT1ob3VzZXByaWNlLWVkZ2UtZGVtb25zdHJhdGlvbi0wMQpleHBvcnQgT1BTQ0VOVEVSX0hPU1Q9ZG9jLXRlc3QuZWRnZS53YWxsYXJvb2NvbW11bml0eS5uaW5qYQpleHBvcnQgUElQRUxJTkVfVVJMPWdoY3IuaW8vd2FsbGFyb29sYWJzL2RvYy1zYW1wbGVzL3BpcGVsaW5lcy9hc3NheS1kZW1vbnN0cmF0aW9uLXR1dG9yaWFsOjJjNzBkMWEzLTE0MzAtNDIxZC1iMDBlLTdkNWQ4ZjkzNDEzYQpleHBvcnQgSk9JTl9UT0tFTj0xMzVmZWVkNC00ZGU0LTQ2N2ItOWRiYi02NDY0NzBmNDg1YTUKZXhwb3J0IE9DSV9SRUdJU1RSWT1naGNyLmlv

Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables.

edge_name_02 = "houseprice-edge-demonstration-02"
edge_publish_02 = assay_pub.add_edge(edge_name_02)
display(edge_publish_02)

ID                    1
Pipeline Name         assay-demonstration-tutorial
Pipeline Version      2c70d1a3-1430-421d-b00e-7d5d8f93413a
Status                Published
Engine URL            ghcr.io/wallaroolabs/doc-samples/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini:v2024.1.0-main-4963
Pipeline URL          ghcr.io/wallaroolabs/doc-samples/pipelines/assay-demonstration-tutorial:2c70d1a3-1430-421d-b00e-7d5d8f93413a
Helm Chart URL        oci://ghcr.io/wallaroolabs/doc-samples/charts/assay-demonstration-tutorial
Helm Chart Reference  ghcr.io/wallaroolabs/doc-samples/charts@sha256:a4a307ef23cb5e1759abee88ec30442f5eb8343bf9cd764e76215bb14bd085c4
Helm Chart Version    0.0.1-2c70d1a3-1430-421d-b00e-7d5d8f93413a
Engine Config         {'engine': {'resources': {'limits': {'cpu': 4.0, 'memory': '3Gi'}, 'requests': {'cpu': 4.0, 'memory': '3Gi'}, 'accel': 'none', 'arch': 'x86', 'gpu': False}}, 'engineAux': {'autoscale': {'type': 'none'}, 'images': None}}
User Images           []
Created By            john.hansarick@wallaroo.ai
Created At            2024-04-19 18:44:59.893730+00:00
Updated At            2024-04-19 18:44:59.893730+00:00
Replaces

Docker Run Command:
docker run -v $PERSISTENT_VOLUME_DIR:/persist \
    -p $EDGE_PORT:8080 \
    -e OCI_USERNAME=$OCI_USERNAME \
    -e OCI_PASSWORD=$OCI_PASSWORD \
    -e EDGE_BUNDLE=ZXhwb3J0IEJVTkRMRV9WRVJTSU9OPTEKZXhwb3J0IENPTkZJR19DUFVTPTQKZXhwb3J0IEVER0VfTkFNRT1ob3VzZXByaWNlLWVkZ2UtZGVtb25zdHJhdGlvbi0wMgpleHBvcnQgT1BTQ0VOVEVSX0hPU1Q9ZG9jLXRlc3QuZWRnZS53YWxsYXJvb2NvbW11bml0eS5uaW5qYQpleHBvcnQgUElQRUxJTkVfVVJMPWdoY3IuaW8vd2FsbGFyb29sYWJzL2RvYy1zYW1wbGVzL3BpcGVsaW5lcy9hc3NheS1kZW1vbnN0cmF0aW9uLXR1dG9yaWFsOjJjNzBkMWEzLTE0MzAtNDIxZC1iMDBlLTdkNWQ4ZjkzNDEzYQpleHBvcnQgSk9JTl9UT0tFTj1jN2FhYmU3OS1iYWY5LTQ4OWEtODg1Mi1hYmJiNDg4NjcxM2YKZXhwb3J0IE9DSV9SRUdJU1RSWT1naGNyLmlv\
    -e PIPELINE_URL=ghcr.io/wallaroolabs/doc-samples/pipelines/assay-demonstration-tutorial:2c70d1a3-1430-421d-b00e-7d5d8f93413a \
    -e CONFIG_CPUS=4 ghcr.io/wallaroolabs/doc-samples/engines/proxy/wallaroo/ghcr.io/wallaroolabs/fitzroy-mini:v2024.1.0-main-4963

Note: Please set the PERSISTENT_VOLUME_DIR, EDGE_PORT, OCI_USERNAME, and OCI_PASSWORD environment variables.

Helm Install Command:
helm install --atomic $HELM_INSTALL_NAME \
    oci://ghcr.io/wallaroolabs/doc-samples/charts/assay-demonstration-tutorial \
    --namespace $HELM_INSTALL_NAMESPACE \
    --version 0.0.1-2c70d1a3-1430-421d-b00e-7d5d8f93413a \
    --set ociRegistry.username=$OCI_USERNAME \
    --set ociRegistry.password=$OCI_PASSWORD \
    --set edgeBundle=ZXhwb3J0IEJVTkRMRV9WRVJTSU9OPTEKZXhwb3J0IENPTkZJR19DUFVTPTQKZXhwb3J0IEVER0VfTkFNRT1ob3VzZXByaWNlLWVkZ2UtZGVtb25zdHJhdGlvbi0wMgpleHBvcnQgT1BTQ0VOVEVSX0hPU1Q9ZG9jLXRlc3QuZWRnZS53YWxsYXJvb2NvbW11bml0eS5uaW5qYQpleHBvcnQgUElQRUxJTkVfVVJMPWdoY3IuaW8vd2FsbGFyb29sYWJzL2RvYy1zYW1wbGVzL3BpcGVsaW5lcy9hc3NheS1kZW1vbnN0cmF0aW9uLXR1dG9yaWFsOjJjNzBkMWEzLTE0MzAtNDIxZC1iMDBlLTdkNWQ4ZjkzNDEzYQpleHBvcnQgSk9JTl9UT0tFTj1jN2FhYmU3OS1iYWY5LTQ4OWEtODg1Mi1hYmJiNDg4NjcxM2YKZXhwb3J0IE9DSV9SRUdJU1RSWT1naGNyLmlv

Note: Please set the HELM_INSTALL_NAME, HELM_INSTALL_NAMESPACE, OCI_USERNAME, and OCI_PASSWORD environment variables.

DevOps Deployment

The edge deployment is performed with docker run, docker compose, or helm installations. For our examples, we’ll verify the following variables are set for the docker run deployment:

  • $PERSISTENT_VOLUME_DIR: The location of the persistent volume storage for the deployment.
  • $EDGE_PORT: The external port to access the edge deployment endpoints. By default, this is port 8080. Since there are two deployments, verify that each uses a separate port. For our examples, we’ll use ports 8080 and 8081.
  • $OCI_USERNAME: The OCI registry username.
  • $OCI_PASSWORD: The OCI registry password.
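
The variable checks listed above can be sketched in plain Python before running either docker run command. The helper names, the sample directory paths, and the placeholder credentials below are hypothetical and only illustrate the idea.

```python
import os

REQUIRED_VARS = ['PERSISTENT_VOLUME_DIR', 'EDGE_PORT', 'OCI_USERNAME', 'OCI_PASSWORD']

def missing_vars(env, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]

def ports_are_distinct(*envs):
    """Return True when every environment uses a different EDGE_PORT."""
    ports = [env.get('EDGE_PORT') for env in envs]
    return len(set(ports)) == len(ports)

# Hypothetical settings for the two edge deployments.
edge_01_env = {
    'PERSISTENT_VOLUME_DIR': './edge01-persist',
    'EDGE_PORT': '8080',
    'OCI_USERNAME': 'example-user',
    'OCI_PASSWORD': 'example-password',
}
edge_02_env = dict(edge_01_env, PERSISTENT_VOLUME_DIR='./edge02-persist', EDGE_PORT='8081')

print(missing_vars(edge_01_env))                     # nothing missing
print(ports_are_distinct(edge_01_env, edge_02_env))  # separate ports
print(missing_vars(os.environ))                      # check the current shell
```

Passing os.environ checks the variables exported in the current shell, while the dictionaries show the per-deployment values that would differ between the two edge locations.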

For more details on model edge deployments with Wallaroo, see Model Operations: Run Anywhere.

Next Steps

The next notebook “Wallaroo Run Anywhere Model Drift Observability with Assays” details creating assay baselines from model edge deployments, and using the data from one or more edge locations to detect model drift.