NLP Classification: Sentiment Analysis

Wallaroo Use Case Tutorials focused on solving Sentiment Analysis problems with NLP Classification models

1 - Sentiment Analysis: Upload and Deploy

How to upload and deploy a Sentiment Analysis NLP Classification model to Wallaroo.

Tutorial Notebook 1: Build and Deploy a Model

For this tutorial, let’s pretend that you work for a movie promotion company tracking how people feel about movies. You have developed an ML model that scores reviews written in the Internet Movie Database (IMDB) to gauge whether a review is positive or negative.

For this tutorial, you will train an ML model using data from the Large Movie Review Dataset; sample data can be downloaded from the aclIMDB dataset. (Or use the pre-trained model in the ./models folder.)

Before we start, let’s load some libraries that we will need for this notebook (note that this may not be a complete list).

  • IMPORTANT NOTE: This tutorial is geared towards a Wallaroo 2023.2.1 environment.
# preload needed libraries 

import wallaroo
from wallaroo.object import EntityNotFoundError
from wallaroo.framework import Framework

# used to display DataFrame information without truncating
from IPython.display import display
import pandas as pd
pd.set_option('display.max_colwidth', None)

import json

import datetime
import time

# used for unique connection names

import string
import random

Exercise: Build a model

Sample data can be downloaded from the aclIMDB dataset, or used directly from the ./data/aclimdb folder. Some other approaches to categorizing the original text are available in this ‘Fetching data, training a classifier’ example.

At the end of the exercise, you should have a notebook and possibly other artifacts to produce a model for classifying review sentiment. For the purposes of the exercise, please use a framework that can be converted to ONNX, such as scikit-learn or XGBoost.

For assistance converting a model to ONNX, see the Wallaroo Model Conversion Tutorials for some examples.

NOTE

If you prefer to shortcut this step, you can use one of the pre-trained model files in the models subdirectory.
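If you do train your own, here is a minimal sketch using scikit-learn (the dataset path and hyperparameters are assumptions; note that the pre-trained models in the ./models folder use a separate embedder with tokenized inputs, so this is just one possible approach):

from sklearn.datasets import load_files
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# hypothetical path - adjust to wherever you unpacked the aclImdb archive
reviews = load_files('./data/aclimdb/train', categories=['neg', 'pos'],
                     encoding='utf-8', decode_error='replace')

# TF-IDF features feeding a logistic regression classifier
sentiment = Pipeline([
    ('tfidf', TfidfVectorizer(max_features=20000)),
    ('clf', LogisticRegression(max_iter=1000)),
])
sentiment.fit(reviews.data, reviews.target)

# sanity check on the training data
print(sentiment.score(reviews.data, reviews.target))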

## Blank space for training model, if needed

Getting Ready to deploy

Wallaroo natively supports models in the ONNX and TensorFlow frameworks, and other frameworks via containerization. For this exercise, we assume that you have a model that can be converted to the ONNX framework. The first step to deploying in Wallaroo, then, is to convert your model to ONNX and add some extra functions to your processing modules so Wallaroo can call them.


Exercise: Convert your Model to ONNX

Take the model that you created in the previous exercises, and convert it to ONNX. If you need help, see the Wallaroo Conversion Tutorials, or other conversion documentation.

At the end of this exercise, you should have your model as a standalone artifact, for example, a file called model.onnx.

NOTE

If you prefer to shortcut this exercise, you can use one of the pre-converted onnx files in the models directory.
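For instance, converting the scikit-learn pipeline sketched earlier might look like the following (a notional sketch using skl2onnx; conversion of text vectorizers has caveats, so check the skl2onnx documentation for details):

from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import StringTensorType

# assumes `sentiment` is the scikit-learn pipeline trained in the earlier sketch;
# text pipelines take a single string column as input
onnx_model = convert_sklearn(sentiment,
                             initial_types=[('input', StringTensorType([None, 1]))])

with open('model.onnx', 'wb') as f:
    f.write(onnx_model.SerializeToString())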

# Blank space for converting model, if needed

Get ready to work with Wallaroo

Now that you have a model ready to go, you can log into Wallaroo and set up a workspace to organize your deployment artifacts. A Wallaroo workspace is a place to organize the deployment artifacts for a project, and to collaborate with other team members. For more information, see the Wallaroo 101.

Logging into Wallaroo via the cluster’s integrated JupyterLab is quite straightforward:

# Login through local Wallaroo instance 
wl = wallaroo.Client()

See the documentation if you are logging into Wallaroo some other way.
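For example, connecting from outside the cluster via SSO might look something like this (a notional sketch; the endpoint URLs here are placeholders for your instance’s values):

# notional example - replace the endpoints with your instance's URLs
wl = wallaroo.Client(api_endpoint="https://wallaroo.example.com",
                     auth_endpoint="https://keycloak.wallaroo.example.com",
                     auth_type="sso")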

Once you are logged in, you can create a workspace and set it as your working environment. To make the first exercise easier, here is a convenience function to get or create a workspace:

# return the workspace called <name>, or create it if it does not exist.
# this function assumes your connection to wallaroo is called wl
def get_workspace(name):
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

Then logging in and creating a workspace looks something like this:

# Login through local Wallaroo instance 
wl = wallaroo.Client()

Setting up the workspace may resemble this. Verify that the workspace name is unique across the Wallaroo instance.

# workspace names need to be globally unique, so add a random suffix to ensure this;
# especially important if the "main" workspace name is potentially a common one

suffix = ''.join(random.choice(string.ascii_lowercase) for i in range(4))
workspace_name = "my-workspace" + suffix

workspace = get_workspace(workspace_name)

# set your current workspace to the workspace that you just created
wl.set_current_workspace(workspace)

# optionally, examine your current workspace
wl.get_current_workspace()

Exercise: Log in and create a workspace

Log into Wallaroo and create a workspace for this tutorial, then set that new workspace as your current working environment. Make sure you remember the name that you gave the workspace, as you will need it for later notebooks.

Notes

  • Workspace names must be globally unique, so don’t pick something too common. The “random suffix” trick in the code snippet is one way to try to generate a unique workspace name, if you suspect you are using a common name.

At the end of the exercise, you should be in a new workspace to do further work.

# Login through local Wallaroo instance

wl = wallaroo.Client()
## Blank spot to connect to the workspace

# random suffix available for testing; append it to the workspace name if needed
suffix = ''.join(random.choice(string.ascii_lowercase) for i in range(4))

workspace_name = "tutorial-workspace"

workspace = get_workspace(workspace_name)

# set your current workspace to the workspace that you just created
wl.set_current_workspace(workspace)

# optionally, examine your current workspace
wl.get_current_workspace()
{'name': 'tutorial-workspace-jch', 'id': 19, 'archived': False, 'created_by': '0a36fba2-ad42-441b-9a8c-bac8c68d13fa', 'created_at': '2023-08-03T19:34:42.324336+00:00', 'models': [{'name': 'tutorial-model', 'versions': 1, 'owner_id': '""', 'last_update_time': datetime.datetime(2023, 8, 3, 19, 36, 31, 13200, tzinfo=tzutc()), 'created_at': datetime.datetime(2023, 8, 3, 19, 36, 31, 13200, tzinfo=tzutc())}], 'pipelines': [{'name': 'tutorialpipeline-jch', 'create_time': datetime.datetime(2023, 8, 3, 19, 36, 31, 732163, tzinfo=tzutc()), 'definition': '[]'}]}

Deploy a Simple Single-Step Pipeline

Once your model is in the ONNX format, and you have a workspace to work in, you can easily upload your model to Wallaroo’s production platform with just a few lines of code. For example, if you have a model called model.onnx, and you wish to upload it to Wallaroo with the name mymodel, then upload the model as follows (once you are in the appropriate workspace):


my_model = wl.upload_model("mymodel", "model.onnx", framework=Framework.ONNX).configure()

See Wallaroo SDK Essentials Guide: Model Uploads and Registrations: ONNX for full details.

The function upload_model() returns a handle to the uploaded model that you will continue to work with in the SDK.

Once the model has been uploaded, you can create a pipeline that contains the model. The pipeline is the mechanism that manages deployments. A pipeline contains a series of steps - sequential sets of models which take in the data from the preceding step, process it through the model, then return a result. Some pipelines can have just one step, while others may have multiple models with multiple steps or arranged for A/B testing. Deployed pipelines allocate resources and can then process data either through local files or through a deployment URL.

So for your model to accept inferences, you must add it to a pipeline. You can create a single-step pipeline called mypipeline as follows.

# create the pipeline
my_pipeline = wl.build_pipeline("mypipeline").add_model_step(my_model)

# deploy the pipeline
my_pipeline = my_pipeline.deploy()

Deploying the pipeline means that resources from the cluster are allocated to the pipeline, and it is ready to accept inferences. You can “turn off” the pipeline with the call pipeline.undeploy(), which returns the resources back to the cluster. This is an important step - leaving pipelines deployed when they’re no longer needed ties up resources that may be needed by other pipelines or services.

See Wallaroo SDK Essentials Guide: Pipeline Management for full details.

Additional Data Processing Steps

In some cases, additional steps are used to format the data either before it reaches the model (pre-process) or after the model’s output (post-process).

For this example, we have two models: one that prepares the data (the embedder) and the sentiment model itself. Both are ONNX models, and can be deployed as follows:

sentiment_model = sentiment_model.configure(runtime="onnx", tensor_fields=["flatten_1"])

pipeline.add_model_step(embedder)
pipeline.add_model_step(sentiment_model)

Note the additional configuration for our sample sentiment model: since it takes the output of the embedder as its input, we specify which tensor field it should read (flatten_1).

Inference data is passed to the first ML model, then the results of that model are passed to the next step. Any ML model can serve as a step in the sequence, provided it is trained to accept input in the format that the previous step provides, and to produce output in the format the next step expects.

For convenience, pre- and post-process steps can be written as Python scripts and deployed to Wallaroo as pipeline model steps, making this data processing easier to manage.

For information on how to set up a pipeline step as a Python script for hosting models or for pre or post processing, see Wallaroo SDK Essentials Guide: Model Uploads and Registrations: Python Models.
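As a sketch of the kind of logic a post-process step might contain, the function below maps the sentiment score to a label (the exact entry-point signature Wallaroo expects for Python models is covered in the linked guide; the column name and the 0.5 threshold here are assumptions):

import pandas as pd

# hypothetical post-process logic: turn the one-element score list in the
# model's "dense_1" output into a human-readable label for downstream systems
def postprocess(df: pd.DataFrame) -> pd.DataFrame:
    scores = df['dense_1'].map(lambda v: float(v[0]))
    labels = scores.map(lambda s: 'positive' if s >= 0.5 else 'negative')
    return pd.DataFrame({'score': scores, 'label': labels})

# notional usage on a frame shaped like the pipeline output
example = pd.DataFrame({'dense_1': [[0.898], [0.056]]})
print(postprocess(example))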

More Hints

  • workspace = wl.get_current_workspace() gives you a handle to the current workspace
  • then workspace.models() will return a list of the models in the workspace
  • and workspace.pipelines() will return a list of the pipelines in the workspace

Exercise: Upload and deploy your model

Upload and deploy the ONNX model that you created in the previous exercise. For simplicity, do any needed pre-processing in the notebook.

At the end of the exercise, you should have a model and a deployed pipeline in your workspace.

## blank space to upload model, and create the pipeline

from wallaroo.framework import Framework

embedder = wl.upload_model('embedder', '../models/embedder.onnx', framework=Framework.ONNX)
sentiment_model = wl.upload_model('sentiment', '../models/sentiment_model.onnx', framework=Framework.ONNX).configure(runtime="onnx", tensor_fields=["flatten_1"])

pipeline = wl.build_pipeline("sentiment-analysis").add_model_step(embedder).add_model_step(sentiment_model)

pipeline.deploy()
name          sentiment-analysis
created       2023-08-11 15:34:49.622995+00:00
last_updated  2023-08-11 15:43:22.426116+00:00
deployed      True
tags
versions      cf689a7d-e51a-4d58-96fd-a024ebd3ddba, d1a918df-04fe-4b98-8d1e-01f7831aab44, 7668ad9a-c12d-40a5-8370-c681c87e4786, 8028e5fc-81b7-45a0-8347-61e0e17e20c4
steps         sentiment

Sending Data to your Pipeline

ONNX models generally expect their input as an array in a dictionary, keyed by input name. In Wallaroo, the default input name is “tensor”. So (outside of Wallaroo), an ONNX model that expected three numeric values as its input would expect input data similar to the below: (Note: The below examples are only notional, they aren’t intended to work with our example models.)

# one datum
singleton = {'tensor': [[1, 2, 3]] }

# two datums
two_inputs = {'tensor': [[1, 2, 3], [4, 5, 6]] }

In the Wallaroo SDK, you can send a pandas DataFrame representation of this dictionary (pandas record format) to the pipeline, via the pipeline.infer() method.

import pandas as pd

# one datum (notional example)
sdf = pd.DataFrame(singleton)
sdf
#       tensor
# 0  [1, 2, 3]

# send the datum to a pipeline for inference
# notional example - not the sentiment model!
result = my_pipeline.infer(sdf)

# two datums
# Note that the value of 'tensor' must be a list, not a numpy array 
twodf = pd.DataFrame(two_inputs)
twodf
#      tensor
# 0  [1, 2, 3]
# 1  [4, 5, 6]

# send the data to a pipeline for inference
# notional example, not the sentiment model!
result = my_pipeline.infer(twodf)

To send data to a pipeline via the inference URL (for example, via CURL), you need the JSON representation of these data frames.

#
# notional examples, not the sentiment model!
#
sdf.to_json(orient='records')
# '[{"tensor":[1,2,3]}]'

twodf.to_json(orient='records')
# '[{"tensor":[1,2,3]},{"tensor":[4,5,6]}]'
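For example, posting that JSON to the pipeline’s inference endpoint might look like this (a notional sketch; the URL is a placeholder, and your deployment will also require authentication headers):

import requests

# notional example - copy your pipeline's inference URL from the Wallaroo
# dashboard or SDK; authentication headers are deployment-specific
url = 'https://wallaroo.example.com/v1/api/pipelines/infer/mypipeline'
headers = {'Content-Type': 'application/json; format=pandas-records'}

response = requests.post(url, data=twodf.to_json(orient='records'), headers=headers)
print(response.json())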

If the JSON data is in a file, you can send it to the pipeline from within the SDK via the pipeline.infer_from_file() method.

In either case, a successful inference will return a data frame of inference results. The model inference(s) will be in the column out.<outputname>.

For more details, see Wallaroo SDK Essentials Guide: Inference Management.

Converting inputs

If your input data is in a standard tabular format, then you need to convert to pandas record format to send the data to your pipeline. See the pandas DataFrame documentation for methods on how to import a CSV file to a DataFrame.
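For example (a notional sketch; the file name is hypothetical):

# read a CSV file with one column per model input into a DataFrame
raw = pd.read_csv('./data/test_data.csv')
display(raw.head())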

The sample data provided is already tokenized and available in Apache Arrow, JSON, and pandas record formats in the ./data folder. If using the pre-built models, try loading ./data/test_data_50K.df.json into a pandas DataFrame and proceeding from there.

To help with the following exercises, here are some convenience functions you might find useful for doing this conversion. These functions convert input data in standard tabular format (in a pandas DataFrame) to the pandas record format that the model expects.

# pull a single datum from a data frame 
# and convert it to the format the model expects
def get_singleton(df, i):
    singleton = df.iloc[i,:].to_numpy().tolist()
    sdict = {'tensor': [singleton]}
    return pd.DataFrame.from_dict(sdict)

# pull a batch of data from a data frame
# and convert to the format the model expects
def get_batch(df, first=0, nrows=1):
    last = first + nrows
    batch = df.iloc[first:last, :].to_numpy().tolist()
    return pd.DataFrame.from_dict({'tensor': batch})

Execute the following code block to see examples of what get_singleton and get_batch do.

# RUN ME!

print('''TOY data for a model that takes inputs var1, var2, var3.
The dataframe is called df.
Pretend the model is in a Wallaroo pipeline called "toypipeline"''')

df = pd.DataFrame({
    'var1': [1, 3, 5],
    'var2': [33, 88, 45],
    'var3': [6, 20, 5]
})

display(df)

# create a model input from the first row
# this is now in the format that a model would accept
singleton = get_singleton(df, 0)

print('''The command "singleton = get_singleton(df, 0)" converts
the first row of the data frame into the format that Wallaroo pipelines accept.
You could now get a prediction by: "toypipeline.infer(singleton)".
''')
display(singleton)

# create a batch of queries from the entire dataframe
batch = get_batch(df, nrows=2)

print('''The command "batch = get_batch(df, nrows=2)" converts
the the first two rows of the data frame into a batch format that Wallaroo pipelines accept.
You could now get a batch prediction by: "toypipeline.infer(batch)".
''')
display(batch)
TOY data for a model that takes inputs var1, var2, var3.
The dataframe is called df.
Pretend the model is in a Wallaroo pipeline called "toypipeline"

   var1  var2  var3
0     1    33     6
1     3    88    20
2     5    45     5

The command "singleton = get_singleton(df, 0)" converts
the first row of the data frame into the format that Wallaroo pipelines accept.
You could now get a prediction by: "toypipeline.infer(singleton)".

       tensor
0  [1, 33, 6]

The command "batch = get_batch(df, nrows=2)" converts
the first two rows of the data frame into a batch format that Wallaroo pipelines accept.
You could now get a batch prediction by: "toypipeline.infer(batch)".

        tensor
0   [1, 33, 6]
1  [3, 88, 20]

Exercise: Send data to your pipeline for inference.

Create some test data from the IMDB review data and send it to the pipeline that you deployed in the previous exercise.

  • If you used the pre-provided models, then you can use test_data.csv from the data directory. This can be loaded directly into your sample pandas DataFrame - check the pandas documentation for a handy function for doing that. (We say your DataFrame because sometimes people try to use the example code above rather than their own data.)

  • Start easy, with just one datum; retrieve the inference results. You can try small batches, as well. Use the above example as a guide.

  • Examine the inference results; observe what the model prediction column is called; it should be of the form out.<outputname>.

For more hints about the different ways of sending data to the pipeline, and to see an example of the inference result format, see the “Running Inferences” section of Wallaroo 101.

At the end of the exercise, you should have a set of inference results that you got through the Wallaroo pipeline.

##  blank space to create test data, and send some data to your model

df = pd.read_json('../data/test_data_50K.df.json')

singleton = get_singleton(df, 0)
display(singleton)

single_result = pipeline.infer(singleton)
display(single_result)

multiple_batch = get_batch(df, nrows=5)
multiple_result = pipeline.infer(multiple_batch)
display(multiple_result)
 	tensor
0	[[11.0, 6.0, 1.0, 12.0, 112.0, 13.0, 14.0, 73.0, …]]

 	time	in.tensor	out.dense_1	check_failures
0	2023-08-11 15:44:16.706	[[11.0, 6.0, 1.0, 12.0, 112.0, …]]	[0.8980188]	0

 	time	in.tensor	out.dense_1	check_failures
0	2023-08-11 15:44:17.093	[[11.0, 6.0, 1.0, 12.0, 112.0, …]]	[0.8980188]	0
1	2023-08-11 15:44:17.093	[[54.0, 548.0, 86.0, 70.0, 1213.0, …]]	[0.056596935]	0
2	2023-08-11 15:44:17.093	[[1.0, 9259.0, 6.0, 8.0, 1.0, …]]	[0.9260802]	0
3	2023-08-11 15:44:17.093	[[10.0, 25.0, 107.0, 1.0, 343.0, …]]	[0.926919]	0
4	2023-08-11 15:44:17.093	[[10.0, 37.0, 1.0, 49.0, 2.0, …]]	[0.6618577]	0

Undeploying Your Pipeline

You should always undeploy your pipelines when you are done with them, or don’t need them for a while. This releases the resources that the pipeline is using for other processes to use. You can always redeploy the pipeline when you need it again. As a reminder, here are the commands to deploy and undeploy a pipeline:


# when the pipeline is deployed, it's ready to receive data and infer
pipeline.deploy()

# "turn off" the pipeline and releaase its resources
pipeline.undeploy()

If you are continuing on to the next notebook now, you can leave the pipeline deployed to keep working; but if you are taking a break, then you should undeploy.

## blank space to undeploy the pipeline, if needed

pipeline.undeploy()
name          sentiment-analysis
created       2023-08-11 15:34:49.622995+00:00
last_updated  2023-08-11 15:43:22.426116+00:00
deployed      False
tags
versions      cf689a7d-e51a-4d58-96fd-a024ebd3ddba, d1a918df-04fe-4b98-8d1e-01f7831aab44, 7668ad9a-c12d-40a5-8370-c681c87e4786, 8028e5fc-81b7-45a0-8347-61e0e17e20c4
steps         sentiment

Congratulations!

You have now

  • Successfully trained a model
  • Converted your model and uploaded it to Wallaroo
  • Created and deployed a simple single-step pipeline
  • Successfully sent data to your pipeline for inference

In the next notebook, you will look at two different ways to evaluate your model against the real world environment.

2 - Sentiment Analysis: Validation Rules

How to set validation rules for model inference inputs and outputs with Wallaroo

Tutorial Notebook 2: Observability Part 1 - Validation Rules

In the previous notebooks you uploaded the models and artifacts, then deployed the models to production through provisioning workspaces and pipelines. Now you’re ready to put your feet up! But to keep your models operational, your work’s not done once the model is in production. You must continue to monitor the behavior and performance of the model to ensure that it provides value to the business.

In this notebook, you will learn about adding validation rules to pipelines.

Preliminaries

In the blocks below we will preload some required libraries; we will also redefine some of the convenience functions that you saw in the previous notebooks.

After that, you should log into Wallaroo and set your working environment to the workspace that you created for this tutorial. Please refer to Notebook 1 to refresh yourself on how to log in and set your working environment to the appropriate workspace.

# preload needed libraries 

import wallaroo
from wallaroo.object import EntityNotFoundError
from wallaroo.framework import Framework

# used to display DataFrame information without truncating
from IPython.display import display
import pandas as pd
pd.set_option('display.max_colwidth', None)

import json
import datetime
import time

# used for unique connection names

import string
import random
## convenience functions from the previous notebooks
## these functions assume your connection to wallaroo is called wl

# return the workspace called <name>, or create it if it does not exist.
# this function assumes your connection to wallaroo is called wl
def get_workspace(name):
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

# pull a single datum from a data frame 
# and convert it to the format the model expects
def get_singleton(df, i):
    singleton = df.iloc[i,:].to_numpy().tolist()
    sdict = {'tensor': [singleton]}
    return pd.DataFrame.from_dict(sdict)

# pull a batch of data from a data frame
# and convert to the format the model expects
def get_batch(df, first=0, nrows=1):
    last = first + nrows
    batch = df.iloc[first:last, :].to_numpy().tolist()
    return pd.DataFrame.from_dict({'tensor': batch})

# Get the most recent version of a model in the workspace
# Assumes that the most recent version is the last in the list of versions.
# wl.get_current_workspace().models() returns a list of models in the current workspace

def get_model(mname):
    modellist = wl.get_current_workspace().models()
    model = [m.versions()[-1] for m in modellist if m.name() == mname]
    if len(model) <= 0:
        raise KeyError(f"model {mname} not found in this workspace")
    return model[0]

# get a pipeline by name in the workspace
def get_pipeline(pname):
    plist = wl.get_current_workspace().pipelines()
    pipeline = [p for p in plist if p.name() == pname]
    if len(pipeline) <= 0:
        raise KeyError(f"pipeline {pname} not found in this workspace")
    return pipeline[0]
## blank space to log in and go to correct workspace

wl = wallaroo.Client()

workspace_name = f"tutorial-workspace"

workspace = get_workspace(workspace_name)

# set your current workspace to the workspace that you just created
wl.set_current_workspace(workspace)

# optionally, examine your current workspace
wl.get_current_workspace()
{'name': 'tutorial-workspace-jch', 'id': 19, 'archived': False, 'created_by': '0a36fba2-ad42-441b-9a8c-bac8c68d13fa', 'created_at': '2023-08-03T19:34:42.324336+00:00', 'models': [{'name': 'tutorial-model', 'versions': 1, 'owner_id': '""', 'last_update_time': datetime.datetime(2023, 8, 3, 19, 36, 31, 13200, tzinfo=tzutc()), 'created_at': datetime.datetime(2023, 8, 3, 19, 36, 31, 13200, tzinfo=tzutc())}, {'name': 'embedder', 'versions': 2, 'owner_id': '""', 'last_update_time': datetime.datetime(2023, 8, 11, 15, 43, 19, 353975, tzinfo=tzutc()), 'created_at': datetime.datetime(2023, 8, 11, 15, 34, 48, 164613, tzinfo=tzutc())}, {'name': 'sentiment', 'versions': 2, 'owner_id': '""', 'last_update_time': datetime.datetime(2023, 8, 11, 15, 43, 20, 40661, tzinfo=tzutc()), 'created_at': datetime.datetime(2023, 8, 11, 15, 34, 48, 913135, tzinfo=tzutc())}], 'pipelines': [{'name': 'tutorialpipeline-jch', 'create_time': datetime.datetime(2023, 8, 3, 19, 36, 31, 732163, tzinfo=tzutc()), 'definition': '[]'}, {'name': 'sentiment-analysis', 'create_time': datetime.datetime(2023, 8, 11, 15, 34, 49, 622995, tzinfo=tzutc()), 'definition': '[]'}]}

Model Validation Rules

A simple way to try to keep your model’s behavior up to snuff is to make sure that it receives inputs that it expects, and that its output is something that downstream systems can handle. This can entail specifying rules that document what you expect, and either enforcing these rules (by refusing to make a prediction), or at least logging an alert that the expectations described by your validation rules have been violated. As the developer of the model, the data scientist (along with relevant subject matter experts) will often be the person in the best position to specify appropriate validation rules.

In our IMDB sentiment model, scores typically fall around 50-90%. So any score below 50% may indicate either a very negative review, or that something is off with the model’s predictions and should be investigated further.

Note that in this specific example, a model prediction outside the specified range may not necessarily be “wrong”; but out-of-range predictions are likely unusual enough that you may want to “sanity-check” the model’s behavior in these situations.

Wallaroo has functionality for specifying simple validation rules on model input and output values.

pipeline.add_validation(<rulename>, <expression>)

Here, <rulename> is the name of the rule, and <expression> is a simple logical expression that the data scientist expects to be true. This means if the expression proves false, then a check_failure flag is set in the inference results.

To add a validation step to a simple one-step pipeline, you need a handle to the pipeline (here called pipeline), and a handle to the model in the pipeline (here called model). Then you can specify an expected prediction range as follows:

  • Get the pipeline.
  • Depending on its existing steps, clear the pipeline and re-add the models as steps.
  • Add the validation.
  • Deploy the pipeline to set the validation as part of its steps.
# get the existing pipeline (in your workspace)
pipeline = get_pipeline("mypipeline")

# you also need a handle to the model in this single-step pipeline.
# here are two ways to do it:
#
# (1) If you know the name of the model, you can also just use the get_model() convenience function above.
# In this example, the model has been uploaded to wallaroo with the name "mymodel"

model = get_model("mymodel") 

# (2) To get the model without knowing its name (for a single-step pipeline)
model = pipeline.model_configs()[0].model()

# specify the bound
lo_bnd = 0.5 # 50%

#
# an example validation rule
#

# validation rule: the sentiment score should be at least 50%
pipeline = pipeline.add_validation("less than 50%", model.outputs[0][0] >= lo_bnd)

# deploy the pipeline to set the steps
pipeline.deploy()

When data is passed to the pipeline for inferences, the pipeline will log a check failure whenever one of the validation expressions evaluates to false - for the rule above, whenever the sentiment score falls below 0.5.

You can also find check failures in the logs:

logs = pipeline.logs()
logs.loc[logs['check_failures'] > 0]

Exercise: Add validation rules to your model pipeline

Add some simple validation rules to the model pipeline that you created in a previous exercise.

  • Add an upper bound or a lower bound to the model predictions
  • Try to create predictions that fall both in and out of the specified range
  • Look through the logs to find the check failures.

HINT 1: since the purpose of this exercise is to try out validation rules, it might be a good idea to take a small data set and make predictions on that data set first, then set the validation rules based on those predictions, so that you can see the check failures trigger.

Don’t forget to undeploy your pipeline after you are done to free up resources.

## blank space to get your pipeline and run a small batch of data through it to see the range of predictions

embedder_name = 'embedder'
sentiment_model_name = 'sentiment'

pipeline_name = 'sentiment-analysis'

pipeline = get_pipeline(pipeline_name)

embedder_model = get_model(embedder_name)
sentiment_model = get_model(sentiment_model_name)

pipeline.clear()
pipeline.add_model_step(embedder_model)
pipeline.add_model_step(sentiment_model)

pipeline.deploy()

df = pd.read_json('../data/test_data_50K.df.json')

singleton = get_singleton(df, 0)
display(singleton)

single_result = pipeline.infer(singleton)
display(single_result)
 	tensor
0	[[11.0, 6.0, 1.0, 12.0, 112.0, 13.0, 14.0, 73.0, …]]

 	time	in.tensor	out.dense_1	check_failures
0	2023-08-11 16:18:23.435	[[11.0, 6.0, 1.0, 12.0, 112.0, …]]	[0.8980188]	0
# blank space to set a validation rule on the pipeline and check if it triggers as expected

# flag any sentiment score at or below 50%
hi_bnd = 50.0

pipeline = pipeline.add_validation("less than 50%", sentiment_model.outputs[0][0] > hi_bnd / 100)

pipeline.deploy()

multiple_batch = get_batch(df, nrows=5)
multiple_result = pipeline.infer(multiple_batch)
display(multiple_result)
 	time	in.tensor	out.dense_1	check_failures
0	2023-08-11 16:18:26.141	[[11.0, 6.0, 1.0, 12.0, 112.0, …]]	[0.8980188]	0
1	2023-08-11 16:18:26.141	[[54.0, 548.0, 86.0, 70.0, 1213.0, …]]	[0.056596935]	1
2	2023-08-11 16:18:26.141	[[1.0, 9259.0, 6.0, 8.0, 1.0, …]]	[0.9260802]	0
3	2023-08-11 16:18:26.141	[[10.0, 25.0, 107.0, 1.0, 343.0, …]]	[0.926919]	0
4	2023-08-11 16:18:26.141	[[10.0, 37.0, 1.0, 49.0, 2.0, …]]	[0.6618577]	0

Congratulations!

In this tutorial you have

  • Set a validation rule on your sentiment model pipeline.
  • Detected model predictions that failed the validation rule.

In the next notebook, you will learn how to monitor the distribution of model outputs for drift away from expected behavior.

Cleaning up.

At this point, if you are not continuing on to the next notebook, undeploy your pipeline to give the resources back to the environment.

## blank space to undeploy the pipeline

pipeline.undeploy()
name          sentiment-analysis
created       2023-08-11 15:34:49.622995+00:00
last_updated  2023-08-11 16:18:23.859761+00:00
deployed      False
tags
versions      3665c777-a8ee-409c-a058-40e90aeb60f7, 5bdf82eb-25f0-41c5-8812-06da87b5d2e1, 71fdf82d-9e04-4977-a69b-1e49e9452b5c, 1210dd5d-38db-4561-8721-694ac690063a, f7853459-c4f5-4b71-a3b4-520e3b257245, cf689a7d-e51a-4d58-96fd-a024ebd3ddba, d1a918df-04fe-4b98-8d1e-01f7831aab44, 7668ad9a-c12d-40a5-8370-c681c87e4786, 8028e5fc-81b7-45a0-8347-61e0e17e20c4
steps         sentiment

3 - Sentiment Analysis: Drift Detection

How to use Wallaroo to detect model drift with assays

Tutorial Notebook 3: Observability Part 2 - Drift Detection

In the previous notebook you learned how to add simple validation rules to a pipeline, to monitor whether outputs (or inputs) stray out of some expected range. In this notebook, you will monitor the distribution of the pipeline’s predictions to see if the model, or the environment that it runs in, has changed.

Preliminaries

In the blocks below we will preload some required libraries; we will also redefine some of the convenience functions that you saw in the previous notebooks.

After that, you should log into Wallaroo and set your working environment to the workspace that you created for this tutorial. Please refer to Notebook 1 to refresh yourself on how to log in and set your working environment to the appropriate workspace.

# preload needed libraries 

import wallaroo
from wallaroo.object import EntityNotFoundError

# used to display DataFrame information without truncating
from IPython.display import display
import pandas as pd
pd.set_option('display.max_colwidth', None)

import json
import datetime
import time

# used for unique connection names

import string
import random
## convenience functions from the previous notebooks
## these functions assume your connection to wallaroo is called wl

# return the workspace called <name>, or create it if it does not exist.
# this function assumes your connection to wallaroo is called wl
def get_workspace(name):
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

# pull a single datum from a data frame 
# and convert it to the format the model expects
def get_singleton(df, i):
    singleton = df.iloc[i,:].to_numpy().tolist()
    sdict = {'tensor': [singleton]}
    return pd.DataFrame.from_dict(sdict)

# pull a batch of data from a data frame
# and convert to the format the model expects
def get_batch(df, first=0, nrows=1):
    last = first + nrows
    batch = df.iloc[first:last, :].to_numpy().tolist()
    return pd.DataFrame.from_dict({'tensor': batch})

# Get the most recent version of a model in the workspace
# Assumes that the most recent version is the last in the list of versions.
# wl.get_current_workspace().models() returns a list of models in the current workspace

def get_model(mname):
    modellist = wl.get_current_workspace().models()
    model = [m.versions()[-1] for m in modellist if m.name() == mname]
    if len(model) <= 0:
        raise KeyError(f"model {mname} not found in this workspace")
    return model[0]

# get a pipeline by name in the workspace
def get_pipeline(pname):
    plist = wl.get_current_workspace().pipelines()
    pipeline = [p for p in plist if p.name() == pname]
    if len(pipeline) <= 0:
        raise KeyError(f"pipeline {pname} not found in this workspace")
    return pipeline[0]
## blank space to log in and go to correct workspace

wl = wallaroo.Client()

workspace_name = f"tutorial-workspace"

workspace = get_workspace(workspace_name)

# set your current workspace to the workspace that you just created
wl.set_current_workspace(workspace)

# optionally, examine your current workspace
wl.get_current_workspace()
{'name': 'tutorial-workspace-jch', 'id': 19, 'archived': False, 'created_by': '0a36fba2-ad42-441b-9a8c-bac8c68d13fa', 'created_at': '2023-08-03T19:34:42.324336+00:00', 'models': [{'name': 'tutorial-model', 'versions': 1, 'owner_id': '""', 'last_update_time': datetime.datetime(2023, 8, 3, 19, 36, 31, 13200, tzinfo=tzutc()), 'created_at': datetime.datetime(2023, 8, 3, 19, 36, 31, 13200, tzinfo=tzutc())}, {'name': 'embedder', 'versions': 2, 'owner_id': '""', 'last_update_time': datetime.datetime(2023, 8, 11, 15, 43, 19, 353975, tzinfo=tzutc()), 'created_at': datetime.datetime(2023, 8, 11, 15, 34, 48, 164613, tzinfo=tzutc())}, {'name': 'sentiment', 'versions': 2, 'owner_id': '""', 'last_update_time': datetime.datetime(2023, 8, 11, 15, 43, 20, 40661, tzinfo=tzutc()), 'created_at': datetime.datetime(2023, 8, 11, 15, 34, 48, 913135, tzinfo=tzutc())}], 'pipelines': [{'name': 'tutorialpipeline-jch', 'create_time': datetime.datetime(2023, 8, 3, 19, 36, 31, 732163, tzinfo=tzutc()), 'definition': '[]'}, {'name': 'sentiment-analysis', 'create_time': datetime.datetime(2023, 8, 11, 15, 34, 49, 622995, tzinfo=tzutc()), 'definition': '[]'}]}

Monitoring for Drift: Shift Happens.

In machine learning, you use data and known answers to train a model to make predictions for new previously unseen data. You do this with the assumption that the future unseen data will be similar to the data used during training: the future will look somewhat like the past.

But the conditions that existed when a model was created, trained and tested can change over time, due to various factors.

A good model should be robust to some amount of change in the environment; however, if the environment changes too much, your models may no longer be making the correct decisions. This situation is known as concept drift; too much drift can obsolete your models, requiring periodic retraining.

Let’s consider the example we’ve been working on: movie review sentiment. You may notice over time that there has been a change in the mix of reviews coming in: for example, a dramatic increase or decrease in negative reviews (or more precisely, reviews that the model scores as negative).

Such a change could be due to many factors: the release of a string of poorly received movies; a shift in the audience writing reviews; a change in how reviews are collected. Whatever the cause, detecting such a change quickly is crucial, so that the business can react quickly in the appropriate manner, whether that means simply retraining the model on fresher data, or a pivot in business strategy.

In Wallaroo you can monitor your models for signs of drift through the model monitoring and insight capability called Assays. Assays help you track changes in the environment that your model operates within, which can affect the model’s outcome. They do this by tracking the model’s predictions and/or the data coming into the model against an established baseline. If the distribution of monitored values in the current observation window differs too much from the baseline distribution, the assay will flag it. The figure below shows an example of a running scheduled assay.

Figure: A daily assay that’s been running for a month. The dots represent the difference between the distribution of values in the daily observation window, and the baseline. When that difference exceeds the specified threshold (indicated by a red dot) an alert is set.

This next set of exercises will walk you through setting up an assay to monitor the predictions of your sentiment model, in order to detect drift.

NOTE

An assay is a monitoring process that typically runs over an extended, ongoing period of time. For example, one might set up an assay that every day monitors the previous 24 hours’ worth of predictions and compares it to a baseline. For the purposes of these exercises, we’ll be compressing processes that normally would take hours or days into minutes.


Exercise Prep: Create some datasets for demonstrating assays

Because assays are designed to detect changes in distributions, let’s try to set up data with different distributions to test with. Take your IMDB data and create two sets: a set with lower scored results, and a set with higher scored results. You can split however you choose.

The idea is that we will pretend that the set of higher results represents the “typical” mix of reviews when you set your baseline. Later, the “lower” results can be used to compare against the baseline to trigger an assay alert.

  • If you are using the pre-provided models to do these exercises, you can use the provided data sets lowscore.df.json and highscore.df.json. This is to establish our baseline as a set of known values, so the lower scores will trigger our assay alerts.
low_data = pd.read_json('lowscore.df.json')
high_data = pd.read_json('highscore.df.json')

Note that the data in these files are already in the form expected by the models, so you don’t need to use the get_singleton or get_batch convenience functions to infer.

At the end of this exercise, you should have two sets of data to demonstrate assays. In the discussion below, we’ll refer to these sets as low_data and high_data.

# blank spot to split or download data

low_data = pd.read_json('../data/lowscore.df.json')
high_data = pd.read_json('../data/highscore.df.json')

We will use this data to set up some “historical data” in the sentiment model pipeline that you build in the assay exercises.

Setting up a baseline for the assay

In order to know whether the distribution of your model’s predictions has changed, you need a baseline to compare them to. This baseline should represent how you expect the model to behave at the time it was trained. This might be approximated by the distribution of the model’s predictions over some “typical” period of time. For example, we might collect the predictions of our model over the first few days after it’s been deployed. For these exercises, we’ll compress that to a few minutes. Currently, to set up a Wallaroo assay the pipeline must have been running for some period of time, and the assumption is that this period of time is “typical”, and that the distributions of the inputs and the outputs of the model during this period of time are “typical.”

Exercise Prep: Run some inferences and set some time stamps

Here, we simulate having a pipeline that’s been running for a long enough period of time to set up an assay.

To send enough data through the pipeline to create assays, you execute something like the following code (using the appropriate names for your pipelines and models). Note that this step will take a little while, because of the sleep interval.

You will need the timestamps baseline_start and baseline_end for the next exercises.

# get your pipeline (in this example named "mypipeline")

pipeline = get_pipeline("mypipeline")
pipeline.deploy()

## Run some baseline data
# Where the baseline data will start
baseline_start = datetime.datetime.now()

# the number of samples we'll use for the baseline
nsample = 500

# Wait 30 seconds to set this data apart from the rest
# then send the data in batch
time.sleep(30)

# get a sample
baseline_sample = high_data.sample(nsample, replace=True).reset_index(drop=True)
pipeline.infer(baseline_sample)

# Set the baseline end
baseline_end = datetime.datetime.now()
# blank space to get pipeline and set up baseline data

## blank space to get your pipeline and run a small batch of data through it to see the range of predictions

embedder_name = 'embedder'
sentiment_model_name = 'sentiment'

pipeline_name = 'sentiment-analysis'

pipeline = get_pipeline(pipeline_name)

embedder_model = get_model(embedder_name)
sentiment_model = get_model(sentiment_model_name)

pipeline.clear()
pipeline.add_model_step(embedder_model)
pipeline.add_model_step(sentiment_model)

pipeline.deploy()

df = pd.read_json('../data/test_data_50K.df.json')

singleton = get_singleton(df, 0)
display(singleton)

single_result = pipeline.infer(singleton)
display(single_result)
 	tensor
0	[[11.0, 6.0, 1.0, 12.0, 112.0, 13.0, 14.0, 73.0, …]]

 	time	in.tensor	out.dense_1	check_failures
0	2023-08-11 16:37:27.521	[[11.0, 6.0, 1.0, 12.0, 112.0, …]]	[0.8980188]	0

Before setting up an assay on this pipeline’s output, we may want to look at the distribution of the predictions over our selected baseline period. To do that, we’ll create an assay_builder that specifies the pipeline, the model in the pipeline, and the baseline period. We’ll also specify that we want to look at the output of the model, which in the example code is named variable, and would appear as out.variable in the logs.

# print out one of the logs to get the name of the output variable
display(pipeline.logs(limit=1))

# get the model name directly off the pipeline (you could just hard code this, if you know the name)

model_name = pipeline.model_configs()[0].model().name()

assay_builder = ( wl.build_assay(assay_name, pipeline, model_name, 
                     baseline_start, baseline_end)
                    .add_iopath("output variable 0") ) # specify that we are looking at the first output of the output variable "variable"

where baseline_start and baseline_end are the beginning and end of the baseline periods as datetime.datetime objects.

You can then examine the distribution of variable over the baseline period:

assay_builder.baseline_histogram()
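The builder also offers other views of the baseline, as used in the Model Insights tutorial (assuming the same methods are available in your SDK version):

assay_builder.baseline_kde()   # kernel density estimate of the baseline distribution
assay_builder.baseline_ecdf()  # empirical cumulative distribution of the baseline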

Exercise: Create an assay builder and set a baseline

Create an assay builder to monitor the output of your sentiment pipeline. The baseline period should be from baseline_start to baseline_end.

  • You will need to know the name of your output variable, and the name of the model in the pipeline.

Examine the baseline distribution.

## Blank space to create an assay builder and examine the baseline distribution

display(pipeline.logs(limit=1))
model_name = pipeline.model_configs()[0].model().name()
display(model_name)

import datetime
import time
baseline_start = datetime.datetime.now()
time.sleep(5)

pipeline.infer(high_data)

time.sleep(5)

baseline_end = datetime.datetime.now()

assay_builder = ( wl.build_assay('sample imdb assay', pipeline, sentiment_model_name, 
                     baseline_start, baseline_end)
                    .add_iopath("output dense_1 0") )

assay_builder.baseline_histogram()
Warning: There are more logs available. Please set a larger limit or request a file using export_logs.
 	time	in.tensor	out.dense_1	check_failures
0	2023-08-11 16:49:45.089	[10.0, 216.0, 11.0, 17.0, 20.0, 245.0, …]	[0.871647]	0
'embedder'

An assay should detect if the distribution of model predictions changes from the above distribution over regularly sampled observation windows. This is called drift.

To show drift, we’ll run more data through the pipeline – first some data drawn from the same distribution as the baseline (high_data). Then, we will gradually introduce more data from a different distribution (low_data). We should see the difference between the baseline distribution and the distribution in the observation window increase.

To set up the data, you should do something like the below. It will take a while to run, because of all the sleep intervals.

You will need the assay_window_end for a later exercise.

IMPORTANT NOTE: To generate the data for the assay, this process may take 4-5 minutes. Because the shortest period of time for an assay window is 1 minute, the intervals of inference data are spaced to fall within that time period. Here’s an example based on a house price model.

# Set the start for our assay window period.
assay_window_start = datetime.datetime.now()

# Run a set of house values, spread across a "longer" period of time

# run "typical" data
for x in range(4):
    pipeline.infer(lowprice_data.sample(2*nsample, replace=True).reset_index(drop=True))
    time.sleep(25)
    
# run a mix
for x in range(3):
    pipeline.infer(lowprice_data.sample(nsample, replace=True).reset_index(drop=True))
    pipeline.infer(highprice_data.sample(nsample, replace=True).reset_index(drop=True))
    time.sleep(25)
    
# high price houses dominate the sample
for x in range(3):
    pipeline.infer(highprice_data.sample(2*nsample, replace=True).reset_index(drop=True))
    time.sleep(25)

# End our assay window period
assay_window_end = datetime.datetime.now()

Exercise Prep: Run some inferences and set some time stamps

Run more data through the pipeline, manifesting a drift, like the example above. It may take around 10 minutes, depending on how you stagger the inferences.

## Blank space to run more data

assay_window_start = datetime.datetime.now()

# set the sample size
nsample = 500

# run "typical" data with "low" data
for x in range(4):
    pipeline.infer(high_data.sample(2*nsample, replace=True).reset_index(drop=True))
    time.sleep(35)
    pipeline.infer(low_data.sample(nsample, replace=True).reset_index(drop=True))
    time.sleep(35)
    
# End our assay window period
assay_window_end = datetime.datetime.now()

Defining the Assay Parameters

Now we’re finally ready to set up an assay!

The Observation Window

Once a baseline period has been established, you must define the window of observations that will be compared to the baseline. For instance, you might want to set up an assay that runs every 12 hours, collects the previous 24 hours’ predictions and compares the distribution of predictions within that 24 hour window to the baseline. To set such a comparison up would look like this:

assay_builder.window_builder().add_width(hours=24).add_interval(hours=12)

In other words width is the width of the observation window, and interval is how often an assay (comparison) is run. The default value of width is 24 hours; the default value of interval is to set it equal to width. The units can be specified in one of: minutes, hours, days, weeks.

The Comparison Threshold

Given an observation window and a baseline distribution, an assay computes the distribution of predictions in the observation window. It then calculates the “difference” (or “distance”) between the observed distribution and the baseline distribution. For the assay’s default distance metric (which we will use here), a good starting threshold is 0.1. Since a different value may work best for a specific situation, you can try interactive assay runs on historical data to find a good threshold, as we do in these exercises.

To set the assay threshold for the assays to 0.1:

assay_builder.add_alert_threshold(0.1)

Running an Assay on Historical Data

In this exercise, you will build an interactive assay over historical data. To do this, you need an end time (endtime).

Depending on the inference history, the window and interval may need adjusting. If using the previously generated data, an interval and window as short as 1 minute may be useful.

Assuming you have an assay builder with the appropriate window parameters and threshold set, an interactive run and a look at the results would look like this.

# set the end of the interactive run
assay_builder.add_run_until(endtime)

# set the window

assay_builder.window_builder().add_width(hours=24).add_interval(hours=12)

assay_results = assay_builder.build().interactive_run()
df = assay_results.to_dataframe() # to return the results as a table
assay_results.chart_scores() # to plot the run

Exercise: Create an interactive assay

Use the assay_builder you created in the previous exercise to set up an interactive assay.

  • The assay should run every minute, on a window that is a minute wide.
  • Set the alert threshold to 0.1.
  • You can use assay_window_end (or a later timestamp) as the end of the interactive run.

Examine the assay results. Do you see any drift?

To try other ways of examining the assay results, see the “Interactive Assay Runs” section of the Model Insights tutorial.

# blank space for setting assay parameters, creating and examining an interactive assay

# set the end of the interactive run
assay_builder.add_run_until(assay_window_end)

# use 1-minute windows and intervals to cover the values generated above
assay_builder.window_builder().add_width(minutes=1).add_interval(minutes=1)
assay_builder.add_alert_threshold(0.1)
assay_results = assay_builder.build().interactive_run()
df = assay_results.to_dataframe() # to return the results as a table
assay_results.chart_scores() # to plot the run
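To drill into individual observation windows, something like the following may help (a sketch assuming the per-window chart method and result columns used in the Model Insights tutorial):

# chart the first window's distribution against the baseline
assay_results[0].chart()

# list only the windows whose score breached the alert threshold
display(df[df['score'] > df['alert_threshold']])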

Scheduling an Assay to run on ongoing data

(We won’t be doing an exercise here, this is for future reference).

Once you are satisfied with the parameters you have set, you can schedule an assay to run regularly.

# create a fresh assay builder with the correct parameters
assay_builder = ( wl.build_assay(assay_name, pipeline, model_name, 
                     baseline_start, baseline_end)
                    .add_iopath("output variable 0") )

# this assay runs every 24 hours on a 24 hour window
assay_builder.window_builder().add_width(hours=24)
assay_builder.add_alert_threshold(0.1)

# now schedule the assay
assay_id = assay_builder.upload()

You can use the assay id later to get the assay results.

You have now walked through setting up a basic assay and running it over historical data.

Congratulations!

In this tutorial you have

  • Deployed a single step sentiment model pipeline and sent data to it.
  • Set validation rules on the pipeline.
  • Set up an assay on the pipeline to monitor for drift in its predictions.

Great job!

Cleaning up.

Now that the tutorial is complete, don’t forget to undeploy your pipeline to free up the resources.

# blank space to undeploy your pipeline

pipeline.undeploy()
name          sentiment-analysis
created       2023-08-11 15:34:49.622995+00:00
last_updated  2023-08-11 16:37:16.355561+00:00
deployed      False
tags
versions      61e2fb5c-53de-48ea-b060-fa4616d6257b, 3665c777-a8ee-409c-a058-40e90aeb60f7, 5bdf82eb-25f0-41c5-8812-06da87b5d2e1, 71fdf82d-9e04-4977-a69b-1e49e9452b5c, 1210dd5d-38db-4561-8721-694ac690063a, f7853459-c4f5-4b71-a3b4-520e3b257245, cf689a7d-e51a-4d58-96fd-a024ebd3ddba, d1a918df-04fe-4b98-8d1e-01f7831aab44, 7668ad9a-c12d-40a5-8370-c681c87e4786, 8028e5fc-81b7-45a0-8347-61e0e17e20c4
steps         sentiment