Wallaroo Model Observability: Anomaly Detection with House Price Prediction

How to detect anomalous model inputs or outputs using a house price prediction model as an example.

The following tutorials are available from the Wallaroo Tutorials Repository.


The following tutorial demonstrates the use case of detecting anomalies: inference input or output data that does not match typical validations.

Wallaroo provides validations to detect anomalous data from inference inputs and outputs. Validations are added to a Wallaroo pipeline with the wallaroo.pipeline.add_validations method.

Adding validations takes the format:

pipeline.add_validations(
    validation_name_01 = polars.col(in|out.{column_name}) EXPRESSION,
    validation_name_02 = polars.col(in|out.{column_name}) EXPRESSION
    ...{additional rules}
)
  • validation_name: The user-provided name of the validation. The names must match Python variable naming requirements.
    • IMPORTANT NOTE: Using the name count as a validation name returns an error. Any validation rules named count are dropped from the request and a warning is returned.
  • polars.col(in|out.{column_name}): Specifies the input or output for a specific field aka “column” in an inference result. Wallaroo inference requests are in the format in.{field_name} for inputs and out.{field_name} for outputs.
  • EXPRESSION: The expression to validate. When the expression returns True, an anomaly is detected.

The polars library version 0.18.5 is used to create the validation rule. This is installed by default with the Wallaroo SDK. This provides a powerful range of comparisons to organizations tracking anomalous data from their ML models.
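
For example, the comparison behavior can be checked locally with a small polars DataFrame that mimics an inference output column. The following is a minimal sketch using a hypothetical out.dense_1 list column; it is not a Wallaroo API call, just the polars expression on its own.

import polars as pl

# Hypothetical inference outputs: each row holds a one-element list,
# matching the way Wallaroo output fields are returned.
sample = pl.DataFrame({"out.dense_1": [[0.15], [0.97], [0.5]]})

# The same style of expression used in add_validations: True marks an anomaly.
display(sample.with_columns(
    (pl.col("out.dense_1").list.get(0) > 0.9).alias("fraud")
))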

When validations are added to a pipeline, inference request outputs return the following fields:

Field | Type | Description
anomaly.count | Integer | The total of all validations that returned True.
anomaly.{validation_name} | Bool | The output of the validation {validation_name}.

When a validation returns True, an anomaly is detected.

For example, adding the validation fraud to the following pipeline returns anomaly.count of 1 when the validation fraud returns True. The validation fraud returns True when the output field dense_1 at index 0 is greater than 0.9.

sample_pipeline = wallaroo.client.build_pipeline("sample-pipeline")
sample_pipeline.add_model_step(model)

# import polars to build the validation expression
import polars as pl

# add the validation
sample_pipeline.add_validations(
    fraud=pl.col("out.dense_1").list.get(0) > 0.9,
)

# deploy the pipeline
sample_pipeline.deploy()

# sample inference
display(sample_pipeline.infer_from_file("dev_high_fraud.json", data_format='pandas-records'))
 | time | in.tensor | out.dense_1 | anomaly.count | anomaly.fraud
0 | 2024-02-02 16:05:42.152 | [1.0678324729, 18.1555563975, -1.6589551058, 5…] | [0.981199] | 1 | True
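
Because the inference result is a pandas DataFrame, anomalous rows can be isolated with standard pandas filtering. A minimal sketch, assuming the result above is stored in a variable named results:

# Keep only the rows where at least one validation returned True.
anomalies = results[results["anomaly.count"] > 0]

# Or filter on a single validation by name.
fraud_rows = results[results["anomaly.fraud"] == True]
display(fraud_rows)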

Detecting Anomalies from Inference Request Results

When an inference request is submitted to a Wallaroo pipeline with validations, the following fields are output:

Field | Type | Description
anomaly.count | Integer | The total of all validations that returned True.
anomaly.{validation_name} | Bool | The output of each pipeline validation {validation_name}.

For example, adding the validation fraud to the following pipeline returns anomaly.count of 1 when the validation fraud returns True.

sample_pipeline = wallaroo.client.build_pipeline("sample-pipeline")
sample_pipeline.add_model_step(model)

# import polars to build the validation expression
import polars as pl

# add the validation
sample_pipeline.add_validations(
    fraud=pl.col("out.dense_1").list.get(0) > 0.9,
)

# deploy the pipeline
sample_pipeline.deploy()

# sample inference
display(sample_pipeline.infer_from_file("dev_high_fraud.json", data_format='pandas-records'))
 | time | in.tensor | out.dense_1 | anomaly.count | anomaly.fraud
0 | 2024-02-02 16:05:42.152 | [1.0678324729, 18.1555563975, -1.6589551058, 5…] | [0.981199] | 1 | True

Anomaly Detection Demonstration

The following demonstrates how to:

  • Upload a house price ML model trained to predict house prices based on a set of inputs. This model outputs the field variable as a float, which is the predicted house price.
  • Add the house price model as a pipeline step.
  • Add the validation too_high to detect when a house price exceeds a certain value.
  • Deploy the pipeline and perform sample inferences on it.
  • Perform sample inferences to show when the too_high validation returns True and False.
  • Perform sample inferences with different dataset parameters to show how to enable or disable certain fields from displaying in the inference results.

Prerequisites

  • Wallaroo version 2023.4.1 and above.
  • polars version 0.18.5. This is installed by default with the Wallaroo SDK.
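
As a quick environment check before starting, the installed polars version can be confirmed from the same notebook; a minimal sketch:

import polars as pl

# The Wallaroo SDK installs polars 0.18.5 by default; confirm the version in use.
print(pl.__version__)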

Tutorial Steps

Load Libraries

The first step is to import the libraries used in this notebook.

import wallaroo
wallaroo.__version__
'2023.4.1+379cb6b8a'

Connect to the Wallaroo Instance through the User Interface

The next step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and available through the Jupyter Hub interface provided with your Wallaroo environment.

This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.

If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). For more information on Wallaroo Client settings, see the Client Connection guide.

wl = wallaroo.Client()

Create a New Workspace

We’ll use the SDK below to create our workspace, then assign it as our current workspace. The current workspace is used by the Wallaroo SDK to determine where to upload models, create pipelines, and so on. We’ll also set up variables for our model and pipeline names, so there is one spot to change them to whatever fits your organization’s standards best.

Before starting, verify that the workspace name is unique in your Wallaroo instance.

def get_workspace(name, client):
    """Return the workspace with the given name, creating it if it does not exist."""
    workspace = None
    for ws in client.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = client.create_workspace(name)
    return workspace

workspace_name = 'validation-house-price-demonstration'
pipeline_name = 'validation-demo'
model_name = 'anomaly-housing-model'
model_file_name = './models/rf_model.onnx'
workspace = get_workspace(workspace_name, wl)
wl.set_current_workspace(workspace)
{'name': 'validation-house-price-demonstration', 'id': 25, 'archived': False, 'created_by': 'c97d480f-6064-4537-b18e-40fb1864b4cd', 'created_at': '2024-02-08T21:52:50.354176+00:00', 'models': [{'name': 'anomaly-housing-model', 'versions': 1, 'owner_id': '""', 'last_update_time': datetime.datetime(2024, 2, 8, 21, 52, 51, 671284, tzinfo=tzutc()), 'created_at': datetime.datetime(2024, 2, 8, 21, 52, 51, 671284, tzinfo=tzutc())}], 'pipelines': [{'name': 'validation-demo', 'create_time': datetime.datetime(2024, 2, 8, 21, 52, 52, 879885, tzinfo=tzutc()), 'definition': '[]'}]}

Upload the Model

Upload the model to the Wallaroo workspace with the wallaroo.client.upload_model method. Our house price ML model is an ONNX model that runs in the Wallaroo Default Runtime, so all we need is the model name, the model file path, and the framework type wallaroo.framework.Framework.ONNX.

model = wl.upload_model(model_name,
                        model_file_name,
                        framework=wallaroo.framework.Framework.ONNX)
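
Displaying the returned model object is a quick way to confirm the upload succeeded; a minimal sketch (the exact attributes shown, and the accessor method names, can vary slightly between SDK releases):

# Show the uploaded model's metadata such as its name, version, and file SHA.
display(model)

# The model name can also be read back programmatically.
display(model.name())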

Build the Pipeline

Pipelines are built with the wallaroo.client.build_pipeline method, which takes the pipeline name. This creates the pipeline in our current workspace. Note that if a pipeline with the same name already exists in this workspace, this method retrieves that pipeline for this SDK session.

Once the pipeline is created, we add the house price model as our pipeline step.

sample_pipeline = wl.build_pipeline(pipeline_name)
sample_pipeline.clear()
sample_pipeline = sample_pipeline.add_model_step(model)

# Inspect the ONNX file directly to confirm the model's input and output field names.
import onnx

onnx_model = onnx.load(model_file_name)
output = [node.name for node in onnx_model.graph.output]

input_all = [node.name for node in onnx_model.graph.input]
input_initializer = [node.name for node in onnx_model.graph.initializer]
net_feed_input = list(set(input_all) - set(input_initializer))

print('Inputs: ', net_feed_input)
print('Outputs: ', output)
Inputs:  ['float_input']
Outputs:  ['variable']
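
If you also want to confirm the expected tensor shapes, the same onnx graph objects expose the declared dimensions. The following is a minimal sketch using the standard onnx protobuf fields; it is optional and only for inspection:

import onnx

# Reload the ONNX graph and print the declared shape of each model input.
# A dim_value of 0 (or a named dim_param) indicates a dynamic dimension.
onnx_model = onnx.load(model_file_name)
for node in onnx_model.graph.input:
    dims = [d.dim_value if d.dim_value else d.dim_param
            for d in node.type.tensor_type.shape.dim]
    print(node.name, dims)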

Add Validation

Now we add our validation to our new pipeline. We will give it the following configuration.

  • Validation Name: too_high
  • Validation Field: out.variable
  • Validation Field Index: 0
  • Validation Expression: Values greater than 1000000.0.

The polars library is required for creating the validation. We will import the polars library, then add our validation to the pipeline.

  • IMPORTANT NOTE: Validation names must be unique per pipeline. If a validation with the same name is added, both are included in the pipeline validations, but only the most recent validation with that name is displayed with the inference results. Anomalies detected by multiple validations of the same name are all added to the anomaly.count inference result field.

import polars as pl

sample_pipeline = sample_pipeline.add_validations(
    too_high=pl.col("out.variable").list.get(0) > 1000000.0
)
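
Before deploying, the expression can be sanity-checked locally against a polars DataFrame shaped like the pipeline output. This is a minimal sketch with hypothetical prices, separate from the pipeline itself:

# Hypothetical out.variable values shaped like the pipeline output (one-element lists).
check = pl.DataFrame({"out.variable": [[450000.0], [1250000.0]]})

# Evaluate the same expression used in the too_high validation: True marks an anomaly.
display(check.with_columns(
    (pl.col("out.variable").list.get(0) > 1000000.0).alias("too_high")
))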

Display Pipeline And Validation Steps

The wallaroo.pipeline.steps() method shows the current pipeline steps. The added validations appear in the Check step. This is shown here for demonstration purposes to confirm the validation was added to the pipeline.

sample_pipeline.steps()
[{'ModelInference': {'models': [{'name': 'anomaly-housing-model', 'version': '9a76a2cf-9ea3-4978-8fd5-005d0280e661', 'sha': 'e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6'}]}},
 {'Check': {'tree': ['{"Alias":[{"BinaryExpr":{"left":{"Function":{"input":[{"Column":"out.variable"},{"Literal":{"Int32":0}}],"function":{"ListExpr":"Get"},"options":{"collect_groups":"ApplyFlat","fmt_str":"","input_wildcard_expansion":false,"auto_explode":true,"cast_to_supertypes":false,"allow_rename":false,"pass_name_to_apply":false,"changes_length":false,"check_lengths":true,"allow_group_aware":true}}},"op":"Gt","right":{"Literal":{"Float64":1000000.0}}}},"too_high"]}']}}]

Deploy Pipeline

With the pipeline steps set and the validations created, we deploy the pipeline. Because of its small size, we only allocate 0.25 CPUs from the cluster for the pipeline’s use.

deploy_config = wallaroo.deployment_config.DeploymentConfigBuilder() \
    .cpus(0.25)\
    .build()

sample_pipeline.deploy(deployment_config=deploy_config)
Waiting for undeployment - this will take up to 45s ..................................... ok
Waiting for deployment - this will take up to 45s ......... ok
name | validation-demo
created | 2024-02-08 21:52:52.879885+00:00
last_updated | 2024-02-08 22:14:13.217863+00:00
deployed | True
arch | None
tags |
versions | 2a8d204a-f359-4f02-b558-950dbab28dc6, 424c7f24-ca65-45af-825f-64d3e9f8e8c8, 190226c9-a536-4457-9851-a68ef968b6fc, e10707fd-75b8-4386-8466-e58dc13d2828, 53b87a42-8498-475a-bf30-73fdebbf85cc
steps | anomaly-housing-model
published | False
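
Deployment can also be verified programmatically before sending inferences. A minimal sketch; the exact layout of the returned dictionary may vary between releases, but the status field reads Running once the pipeline is ready:

# Query the deployment status; 'status' is 'Running' when the pipeline is ready.
display(sample_pipeline.status())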

Sample Inferences

Sample inferences are performed with the wallaroo.pipeline.infer_from_file method, which takes either a pandas Record JSON file or an Apache Arrow table as the input.

For our demonstration, we will use the following pandas Record JSON file with the following sample data:

  • ./data/test-1000.df.json: A sample set of 1,000 houses that generates a range of predicted values.

The inference request returns a pandas DataFrame.

Each of the inference outputs will include the following fields:

Field | Type | Description
time | DateTime | The DateTime of the inference request.
in.{input_field_name} | Input Dependent | Each input field submitted is labeled as in.{input_field_name} in the inference request result. For our example, this is float_input, so the input field in the returned inference request is in.float_input.
out.{model_output_field_name} | Output Dependent | Each field output by the ML model is labeled as out.{model_output_field_name} in the inference request result. For our example, the house price model returns variable as its output field, so the output field in the returned inference request is out.variable.
anomaly.count | Integer | The total number of validations that returned True.
anomaly.{validation_name} | Bool | Each validation added to the pipeline is returned as anomaly.{validation_name}, and returns True if the validation detected an anomaly or False if it did not. For our example, we will have anomaly.too_high returned.

results = sample_pipeline.infer_from_file('./data/test-1000.df.json')
# first 20 results
display(results.head(20))

# only results that trigger the anomaly too_high
results.loc[results['anomaly.too_high'] == True]
 | time | in.float_input | out.variable | anomaly.count | anomaly.too_high
0 | 2024-02-08 22:14:47.075 | [4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0,... | [718013.75] | 0 | False
1 | 2024-02-08 22:14:47.075 | [2.0, 2.5, 2170.0, 6361.0, 1.0, 0.0, 2.0, 3.0,... | [615094.56] | 0 | False
2 | 2024-02-08 22:14:47.075 | [3.0, 2.5, 1300.0, 812.0, 2.0, 0.0, 0.0, 3.0, ... | [448627.72] | 0 | False
3 | 2024-02-08 22:14:47.075 | [4.0, 2.5, 2500.0, 8540.0, 2.0, 0.0, 0.0, 3.0,... | [758714.2] | 0 | False
4 | 2024-02-08 22:14:47.075 | [3.0, 1.75, 2200.0, 11520.0, 1.0, 0.0, 0.0, 4.... | [513264.7] | 0 | False
5 | 2024-02-08 22:14:47.075 | [3.0, 2.0, 2140.0, 4923.0, 1.0, 0.0, 0.0, 4.0,... | [668288.0] | 0 | False
6 | 2024-02-08 22:14:47.075 | [4.0, 3.5, 3590.0, 5334.0, 2.0, 0.0, 2.0, 3.0,... | [1004846.5] | 1 | True
7 | 2024-02-08 22:14:47.075 | [3.0, 2.0, 1280.0, 960.0, 2.0, 0.0, 0.0, 3.0, ... | [684577.2] | 0 | False
8 | 2024-02-08 22:14:47.075 | [4.0, 2.5, 2820.0, 15000.0, 2.0, 0.0, 0.0, 4.0... | [727898.1] | 0 | False
9 | 2024-02-08 22:14:47.075 | [3.0, 2.25, 1790.0, 11393.0, 1.0, 0.0, 0.0, 3.... | [559631.1] | 0 | False
10 | 2024-02-08 22:14:47.075 | [3.0, 1.5, 1010.0, 7683.0, 1.5, 0.0, 0.0, 5.0,... | [340764.53] | 0 | False
11 | 2024-02-08 22:14:47.075 | [3.0, 2.0, 1270.0, 1323.0, 3.0, 0.0, 0.0, 3.0,... | [442168.06] | 0 | False
12 | 2024-02-08 22:14:47.075 | [4.0, 1.75, 2070.0, 9120.0, 1.0, 0.0, 0.0, 4.0... | [630865.6] | 0 | False
13 | 2024-02-08 22:14:47.075 | [4.0, 1.0, 1620.0, 4080.0, 1.5, 0.0, 0.0, 3.0,... | [559631.1] | 0 | False
14 | 2024-02-08 22:14:47.075 | [4.0, 3.25, 3990.0, 9786.0, 2.0, 0.0, 0.0, 3.0... | [909441.1] | 0 | False
15 | 2024-02-08 22:14:47.075 | [4.0, 2.0, 1780.0, 19843.0, 1.0, 0.0, 0.0, 3.0... | [313096.0] | 0 | False
16 | 2024-02-08 22:14:47.075 | [4.0, 2.5, 2130.0, 6003.0, 2.0, 0.0, 0.0, 3.0,... | [404040.8] | 0 | False
17 | 2024-02-08 22:14:47.075 | [3.0, 1.75, 1660.0, 10440.0, 1.0, 0.0, 0.0, 3.... | [292859.5] | 0 | False
18 | 2024-02-08 22:14:47.075 | [3.0, 2.5, 2110.0, 4118.0, 2.0, 0.0, 0.0, 3.0,... | [338357.88] | 0 | False
19 | 2024-02-08 22:14:47.075 | [4.0, 2.25, 2200.0, 11250.0, 1.5, 0.0, 0.0, 5.... | [682284.6] | 0 | False

 | time | in.float_input | out.variable | anomaly.count | anomaly.too_high
6 | 2024-02-08 22:14:47.075 | [4.0, 3.5, 3590.0, 5334.0, 2.0, 0.0, 2.0, 3.0,... | [1004846.5] | 1 | True
30 | 2024-02-08 22:14:47.075 | [4.0, 3.0, 3710.0, 20000.0, 2.0, 0.0, 2.0, 5.0... | [1514079.8] | 1 | True
40 | 2024-02-08 22:14:47.075 | [4.0, 4.5, 5120.0, 41327.0, 2.0, 0.0, 0.0, 3.0... | [1204324.8] | 1 | True
63 | 2024-02-08 22:14:47.075 | [4.0, 3.0, 4040.0, 19700.0, 2.0, 0.0, 0.0, 3.0... | [1028923.06] | 1 | True
110 | 2024-02-08 22:14:47.075 | [4.0, 2.5, 3470.0, 20445.0, 2.0, 0.0, 0.0, 4.0... | [1412215.3] | 1 | True
130 | 2024-02-08 22:14:47.075 | [4.0, 2.75, 2620.0, 13777.0, 1.5, 0.0, 2.0, 4.... | [1223839.1] | 1 | True
133 | 2024-02-08 22:14:47.075 | [5.0, 2.25, 3320.0, 13138.0, 1.0, 0.0, 2.0, 4.... | [1108000.1] | 1 | True
154 | 2024-02-08 22:14:47.075 | [4.0, 2.75, 3800.0, 9606.0, 2.0, 0.0, 0.0, 3.0... | [1039781.25] | 1 | True
160 | 2024-02-08 22:14:47.075 | [5.0, 3.5, 4150.0, 13232.0, 2.0, 0.0, 0.0, 3.0... | [1042119.1] | 1 | True
210 | 2024-02-08 22:14:47.075 | [4.0, 3.5, 4300.0, 70407.0, 2.0, 0.0, 0.0, 3.0... | [1115275.0] | 1 | True
239 | 2024-02-08 22:14:47.075 | [4.0, 3.25, 5010.0, 49222.0, 2.0, 0.0, 0.0, 5.... | [1092274.1] | 1 | True
248 | 2024-02-08 22:14:47.075 | [4.0, 3.75, 4410.0, 8112.0, 3.0, 0.0, 4.0, 3.0... | [1967344.1] | 1 | True
255 | 2024-02-08 22:14:47.075 | [4.0, 3.0, 4750.0, 21701.0, 1.5, 0.0, 0.0, 5.0... | [2002393.5] | 1 | True
271 | 2024-02-08 22:14:47.075 | [5.0, 3.25, 5790.0, 13726.0, 2.0, 0.0, 3.0, 3.... | [1189654.4] | 1 | True
281 | 2024-02-08 22:14:47.075 | [3.0, 3.0, 3570.0, 6250.0, 2.0, 0.0, 2.0, 3.0,... | [1124493.3] | 1 | True
282 | 2024-02-08 22:14:47.075 | [3.0, 2.75, 3170.0, 34850.0, 1.0, 0.0, 0.0, 5.... | [1227073.8] | 1 | True
283 | 2024-02-08 22:14:47.075 | [4.0, 2.75, 3260.0, 19542.0, 1.0, 0.0, 0.0, 4.... | [1364650.3] | 1 | True
285 | 2024-02-08 22:14:47.075 | [4.0, 2.75, 4020.0, 18745.0, 2.0, 0.0, 4.0, 4.... | [1322835.9] | 1 | True
323 | 2024-02-08 22:14:47.075 | [3.0, 3.0, 2480.0, 5500.0, 2.0, 0.0, 3.0, 3.0,... | [1100884.1] | 1 | True
351 | 2024-02-08 22:14:47.075 | [5.0, 4.0, 4660.0, 9900.0, 2.0, 0.0, 2.0, 4.0,... | [1058105.0] | 1 | True
360 | 2024-02-08 22:14:47.075 | [4.0, 3.5, 3770.0, 8501.0, 2.0, 0.0, 0.0, 3.0,... | [1169643.0] | 1 | True
398 | 2024-02-08 22:14:47.075 | [3.0, 2.25, 2390.0, 7875.0, 1.0, 0.0, 1.0, 3.0... | [1364149.9] | 1 | True
414 | 2024-02-08 22:14:47.075 | [5.0, 3.5, 5430.0, 10327.0, 2.0, 0.0, 2.0, 3.0... | [1207858.6] | 1 | True
443 | 2024-02-08 22:14:47.075 | [5.0, 4.0, 4360.0, 8030.0, 2.0, 0.0, 0.0, 3.0,... | [1160512.8] | 1 | True
497 | 2024-02-08 22:14:47.075 | [4.0, 2.5, 4090.0, 11225.0, 2.0, 0.0, 0.0, 3.0... | [1048372.4] | 1 | True
513 | 2024-02-08 22:14:47.075 | [4.0, 3.25, 3320.0, 8587.0, 3.0, 0.0, 0.0, 3.0... | [1130661.0] | 1 | True
520 | 2024-02-08 22:14:47.075 | [5.0, 3.75, 4170.0, 8142.0, 2.0, 0.0, 2.0, 3.0... | [1098628.8] | 1 | True
530 | 2024-02-08 22:14:47.075 | [4.0, 4.25, 3500.0, 8750.0, 1.0, 0.0, 4.0, 5.0... | [1140733.8] | 1 | True
535 | 2024-02-08 22:14:47.075 | [4.0, 3.5, 4460.0, 16271.0, 2.0, 0.0, 2.0, 3.0... | [1208638.0] | 1 | True
556 | 2024-02-08 22:14:47.075 | [4.0, 3.5, 4285.0, 9567.0, 2.0, 0.0, 1.0, 5.0,... | [1886959.4] | 1 | True
623 | 2024-02-08 22:14:47.075 | [4.0, 3.25, 4240.0, 25639.0, 2.0, 0.0, 3.0, 3.... | [1156651.3] | 1 | True
624 | 2024-02-08 22:14:47.075 | [4.0, 3.5, 3440.0, 9776.0, 2.0, 0.0, 0.0, 3.0,... | [1124493.3] | 1 | True
634 | 2024-02-08 22:14:47.075 | [4.0, 3.25, 4700.0, 38412.0, 2.0, 0.0, 0.0, 3.... | [1164589.4] | 1 | True
651 | 2024-02-08 22:14:47.075 | [3.0, 3.0, 3920.0, 13085.0, 2.0, 1.0, 4.0, 4.0... | [1452224.5] | 1 | True
658 | 2024-02-08 22:14:47.075 | [3.0, 3.25, 3230.0, 7800.0, 2.0, 0.0, 3.0, 3.0... | [1077279.3] | 1 | True
671 | 2024-02-08 22:14:47.075 | [3.0, 3.5, 3080.0, 6495.0, 2.0, 0.0, 3.0, 3.0,... | [1122811.8] | 1 | True
685 | 2024-02-08 22:14:47.075 | [4.0, 2.5, 4200.0, 35267.0, 2.0, 0.0, 0.0, 3.0... | [1181336.0] | 1 | True
686 | 2024-02-08 22:14:47.075 | [4.0, 3.25, 4160.0, 47480.0, 2.0, 0.0, 0.0, 3.... | [1082353.3] | 1 | True
698 | 2024-02-08 22:14:47.075 | [4.0, 4.5, 5770.0, 10050.0, 1.0, 0.0, 3.0, 5.0... | [1689843.3] | 1 | True
711 | 2024-02-08 22:14:47.075 | [3.0, 2.5, 5403.0, 24069.0, 2.0, 1.0, 4.0, 4.0... | [1946437.3] | 1 | True
720 | 2024-02-08 22:14:47.075 | [5.0, 3.0, 3420.0, 18129.0, 2.0, 0.0, 0.0, 3.0... | [1325961.0] | 1 | True
722 | 2024-02-08 22:14:47.075 | [3.0, 3.25, 4560.0, 13363.0, 1.0, 0.0, 4.0, 3.... | [2005883.1] | 1 | True
726 | 2024-02-08 22:14:47.075 | [5.0, 3.5, 4200.0, 5400.0, 2.0, 0.0, 0.0, 3.0,... | [1052898.0] | 1 | True
737 | 2024-02-08 22:14:47.075 | [4.0, 3.25, 2980.0, 7000.0, 2.0, 0.0, 3.0, 3.0... | [1156206.5] | 1 | True
740 | 2024-02-08 22:14:47.075 | [4.0, 4.5, 6380.0, 88714.0, 2.0, 0.0, 0.0, 3.0... | [1355747.1] | 1 | True
782 | 2024-02-08 22:14:47.075 | [5.0, 4.25, 4860.0, 9453.0, 1.5, 0.0, 1.0, 5.0... | [1910823.8] | 1 | True
798 | 2024-02-08 22:14:47.075 | [4.0, 2.5, 2790.0, 5450.0, 2.0, 0.0, 0.0, 3.0,... | [1097757.4] | 1 | True
818 | 2024-02-08 22:14:47.075 | [4.0, 4.0, 4620.0, 130208.0, 2.0, 0.0, 0.0, 3.... | [1164589.4] | 1 | True
827 | 2024-02-08 22:14:47.075 | [4.0, 2.5, 3340.0, 10422.0, 2.0, 0.0, 0.0, 3.0... | [1103101.4] | 1 | True
828 | 2024-02-08 22:14:47.075 | [5.0, 3.5, 3760.0, 10207.0, 2.0, 0.0, 0.0, 3.0... | [1489624.5] | 1 | True
901 | 2024-02-08 22:14:47.075 | [4.0, 2.25, 4470.0, 60373.0, 2.0, 0.0, 0.0, 3.... | [1208638.0] | 1 | True
912 | 2024-02-08 22:14:47.075 | [3.0, 2.25, 2960.0, 8330.0, 1.0, 0.0, 3.0, 4.0... | [1178314.0] | 1 | True
919 | 2024-02-08 22:14:47.075 | [4.0, 3.25, 5180.0, 19850.0, 2.0, 0.0, 3.0, 3.... | [1295531.3] | 1 | True
941 | 2024-02-08 22:14:47.075 | [4.0, 3.75, 3770.0, 4000.0, 2.5, 0.0, 0.0, 5.0... | [1182821.0] | 1 | True
965 | 2024-02-08 22:14:47.075 | [6.0, 4.0, 5310.0, 12741.0, 2.0, 0.0, 2.0, 3.0... | [2016006.0] | 1 | True
973 | 2024-02-08 22:14:47.075 | [5.0, 2.0, 3540.0, 9970.0, 2.0, 0.0, 3.0, 3.0,... | [1085835.8] | 1 | True
997 | 2024-02-08 22:14:47.075 | [4.0, 3.25, 2910.0, 1880.0, 2.0, 0.0, 3.0, 5.0... | [1060847.5] | 1 | True
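
The anomaly columns can also be aggregated directly from the returned DataFrame to summarize how many inferences tripped the validation; a minimal sketch:

# Count how many of the 1,000 inferences were flagged as too_high.
print(results['anomaly.too_high'].sum())

# Distribution of anomaly counts across all rows.
print(results['anomaly.count'].value_counts())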

Other Validation Examples

The following are additional examples of validations.

Multiple Validations

The following uses multiple validations to check for anomalies. We keep too_high, which detects outputs greater than 1000000.0. The second validation, too_low, triggers an anomaly when out.variable is under 250000.0.

After the validations are added, the pipeline is redeployed to “set” them.

sample_pipeline = sample_pipeline.add_validations(
    too_low=pl.col("out.variable").list.get(0) < 250000.0
)

deploy_config = wallaroo.deployment_config.DeploymentConfigBuilder() \
    .cpus(0.1)\
    .build()
sample_pipeline.undeploy()
sample_pipeline.deploy(deployment_config=deploy_config)
Waiting for undeployment - this will take up to 45s ..................................... ok
Waiting for deployment - this will take up to 45s .............. ok
name | validation-demo
created | 2024-02-08 21:52:52.879885+00:00
last_updated | 2024-02-08 22:16:50.265231+00:00
deployed | True
arch | None
tags |
versions | 053580e2-f73d-4b63-8c9c-0b5e06be96c2, 2a8d204a-f359-4f02-b558-950dbab28dc6, 424c7f24-ca65-45af-825f-64d3e9f8e8c8, 190226c9-a536-4457-9851-a68ef968b6fc, e10707fd-75b8-4386-8466-e58dc13d2828, 53b87a42-8498-475a-bf30-73fdebbf85cc
steps | anomaly-housing-model
published | False

results = sample_pipeline.infer_from_file('./data/test-1000.df.json')
# first 20 results
display(results.head(20))

# only results that trigger the anomaly too_high
results.loc[results['anomaly.too_high'] == True]

# only results that trigger the anomaly too_low
results.loc[results['anomaly.too_low'] == True]
 | time | in.float_input | out.variable | anomaly.count | anomaly.too_high | anomaly.too_low
0 | 2024-02-08 22:17:23.630 | [4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0,... | [718013.75] | 0 | False | False
1 | 2024-02-08 22:17:23.630 | [2.0, 2.5, 2170.0, 6361.0, 1.0, 0.0, 2.0, 3.0,... | [615094.56] | 0 | False | False
2 | 2024-02-08 22:17:23.630 | [3.0, 2.5, 1300.0, 812.0, 2.0, 0.0, 0.0, 3.0, ... | [448627.72] | 0 | False | False
3 | 2024-02-08 22:17:23.630 | [4.0, 2.5, 2500.0, 8540.0, 2.0, 0.0, 0.0, 3.0,... | [758714.2] | 0 | False | False
4 | 2024-02-08 22:17:23.630 | [3.0, 1.75, 2200.0, 11520.0, 1.0, 0.0, 0.0, 4.... | [513264.7] | 0 | False | False
5 | 2024-02-08 22:17:23.630 | [3.0, 2.0, 2140.0, 4923.0, 1.0, 0.0, 0.0, 4.0,... | [668288.0] | 0 | False | False
6 | 2024-02-08 22:17:23.630 | [4.0, 3.5, 3590.0, 5334.0, 2.0, 0.0, 2.0, 3.0,... | [1004846.5] | 1 | True | False
7 | 2024-02-08 22:17:23.630 | [3.0, 2.0, 1280.0, 960.0, 2.0, 0.0, 0.0, 3.0, ... | [684577.2] | 0 | False | False
8 | 2024-02-08 22:17:23.630 | [4.0, 2.5, 2820.0, 15000.0, 2.0, 0.0, 0.0, 4.0... | [727898.1] | 0 | False | False
9 | 2024-02-08 22:17:23.630 | [3.0, 2.25, 1790.0, 11393.0, 1.0, 0.0, 0.0, 3.... | [559631.1] | 0 | False | False
10 | 2024-02-08 22:17:23.630 | [3.0, 1.5, 1010.0, 7683.0, 1.5, 0.0, 0.0, 5.0,... | [340764.53] | 0 | False | False
11 | 2024-02-08 22:17:23.630 | [3.0, 2.0, 1270.0, 1323.0, 3.0, 0.0, 0.0, 3.0,... | [442168.06] | 0 | False | False
12 | 2024-02-08 22:17:23.630 | [4.0, 1.75, 2070.0, 9120.0, 1.0, 0.0, 0.0, 4.0... | [630865.6] | 0 | False | False
13 | 2024-02-08 22:17:23.630 | [4.0, 1.0, 1620.0, 4080.0, 1.5, 0.0, 0.0, 3.0,... | [559631.1] | 0 | False | False
14 | 2024-02-08 22:17:23.630 | [4.0, 3.25, 3990.0, 9786.0, 2.0, 0.0, 0.0, 3.0... | [909441.1] | 0 | False | False
15 | 2024-02-08 22:17:23.630 | [4.0, 2.0, 1780.0, 19843.0, 1.0, 0.0, 0.0, 3.0... | [313096.0] | 0 | False | False
16 | 2024-02-08 22:17:23.630 | [4.0, 2.5, 2130.0, 6003.0, 2.0, 0.0, 0.0, 3.0,... | [404040.8] | 0 | False | False
17 | 2024-02-08 22:17:23.630 | [3.0, 1.75, 1660.0, 10440.0, 1.0, 0.0, 0.0, 3.... | [292859.5] | 0 | False | False
18 | 2024-02-08 22:17:23.630 | [3.0, 2.5, 2110.0, 4118.0, 2.0, 0.0, 0.0, 3.0,... | [338357.88] | 0 | False | False
19 | 2024-02-08 22:17:23.630 | [4.0, 2.25, 2200.0, 11250.0, 1.5, 0.0, 0.0, 5.... | [682284.6] | 0 | False | False

 | time | in.float_input | out.variable | anomaly.count | anomaly.too_high | anomaly.too_low
21 | 2024-02-08 22:17:23.630 | [2.0, 2.0, 1390.0, 1302.0, 2.0, 0.0, 0.0, 3.0,... | [249227.8] | 1 | False | True
69 | 2024-02-08 22:17:23.630 | [3.0, 1.75, 1050.0, 9871.0, 1.0, 0.0, 0.0, 5.0... | [236238.66] | 1 | False | True
83 | 2024-02-08 22:17:23.630 | [3.0, 1.75, 1070.0, 8100.0, 1.0, 0.0, 0.0, 4.0... | [236238.66] | 1 | False | True
95 | 2024-02-08 22:17:23.630 | [3.0, 2.5, 1340.0, 3011.0, 2.0, 0.0, 0.0, 3.0,... | [244380.27] | 1 | False | True
124 | 2024-02-08 22:17:23.630 | [4.0, 1.5, 1200.0, 10890.0, 1.0, 0.0, 0.0, 5.0... | [241330.19] | 1 | False | True
... | ... | ... | ... | ... | ... | ...
939 | 2024-02-08 22:17:23.630 | [3.0, 1.0, 1150.0, 4800.0, 1.5, 0.0, 0.0, 4.0,... | [240834.92] | 1 | False | True
946 | 2024-02-08 22:17:23.630 | [2.0, 1.0, 780.0, 6250.0, 1.0, 0.0, 0.0, 3.0, ... | [236815.78] | 1 | False | True
948 | 2024-02-08 22:17:23.630 | [1.0, 1.0, 620.0, 8261.0, 1.0, 0.0, 0.0, 3.0, ... | [236815.78] | 1 | False | True
962 | 2024-02-08 22:17:23.630 | [3.0, 1.0, 1190.0, 7500.0, 1.0, 0.0, 0.0, 5.0,... | [241330.19] | 1 | False | True
991 | 2024-02-08 22:17:23.630 | [2.0, 1.0, 870.0, 8487.0, 1.0, 0.0, 0.0, 4.0, ... | [236238.66] | 1 | False | True

62 rows × 6 columns
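
With more than one validation in place, rows flagged by any validation can be pulled with the anomaly.count field instead of checking each validation column individually; a minimal sketch:

# Rows flagged by either validation (too_high or too_low).
flagged = results.loc[results['anomaly.count'] > 0]
display(flagged)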

Compound Validations

The following combines multiple field checks into a single validation. For this, we will check for values of out.variable that are between 500000 and 1000000.

Each expression is enclosed in parentheses () and joined with the & operator. For example:

  • Expression 1: pl.col("out.variable").list.get(0) < 1000000.0
  • Expression 2: pl.col("out.variable").list.get(0) > 500000.0
  • Compound Expression: (pl.col("out.variable").list.get(0) < 1000000.0) & (pl.col("out.variable").list.get(0) > 500000.0)
sample_pipeline = sample_pipeline.add_validations(
    in_between=(pl.col("out.variable").list.get(0) < 1000000.0) & (pl.col("out.variable").list.get(0) > 500000.0)
)

deploy_config = wallaroo.deployment_config.DeploymentConfigBuilder() \
    .cpus(0.1)\
    .build()
sample_pipeline.undeploy()
sample_pipeline.deploy(deployment_config=deploy_config)
Waiting for undeployment - this will take up to 45s .................................... ok
Waiting for deployment - this will take up to 45s ............. ok
name | validation-demo
created | 2024-02-08 21:52:52.879885+00:00
last_updated | 2024-02-08 22:18:18.613710+00:00
deployed | True
arch | None
tags |
versions | 016728d7-3948-467f-8ecc-e0594f406884, 053580e2-f73d-4b63-8c9c-0b5e06be96c2, 2a8d204a-f359-4f02-b558-950dbab28dc6, 424c7f24-ca65-45af-825f-64d3e9f8e8c8, 190226c9-a536-4457-9851-a68ef968b6fc, e10707fd-75b8-4386-8466-e58dc13d2828, 53b87a42-8498-475a-bf30-73fdebbf85cc
steps | anomaly-housing-model
published | False

results = sample_pipeline.infer_from_file('./data/test-1000.df.json')

results.loc[results['anomaly.in_between'] == True] 
 | time | in.float_input | out.variable | anomaly.count | anomaly.in_between | anomaly.too_high | anomaly.too_low
0 | 2024-02-08 22:18:32.886 | [4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0,... | [718013.75] | 1 | True | False | False
1 | 2024-02-08 22:18:32.886 | [2.0, 2.5, 2170.0, 6361.0, 1.0, 0.0, 2.0, 3.0,... | [615094.56] | 1 | True | False | False
3 | 2024-02-08 22:18:32.886 | [4.0, 2.5, 2500.0, 8540.0, 2.0, 0.0, 0.0, 3.0,... | [758714.2] | 1 | True | False | False
4 | 2024-02-08 22:18:32.886 | [3.0, 1.75, 2200.0, 11520.0, 1.0, 0.0, 0.0, 4.... | [513264.7] | 1 | True | False | False
5 | 2024-02-08 22:18:32.886 | [3.0, 2.0, 2140.0, 4923.0, 1.0, 0.0, 0.0, 4.0,... | [668288.0] | 1 | True | False | False
... | ... | ... | ... | ... | ... | ... | ...
989 | 2024-02-08 22:18:32.886 | [4.0, 2.75, 2500.0, 4950.0, 2.0, 0.0, 0.0, 3.0... | [700271.56] | 1 | True | False | False
993 | 2024-02-08 22:18:32.886 | [3.0, 2.5, 2140.0, 8925.0, 2.0, 0.0, 0.0, 3.0,... | [669645.5] | 1 | True | False | False
995 | 2024-02-08 22:18:32.886 | [3.0, 2.5, 2900.0, 23550.0, 1.0, 0.0, 0.0, 3.0... | [827411.0] | 1 | True | False | False
998 | 2024-02-08 22:18:32.886 | [3.0, 1.75, 2910.0, 37461.0, 1.0, 0.0, 0.0, 4.... | [706823.56] | 1 | True | False | False
999 | 2024-02-08 22:18:32.886 | [3.0, 2.0, 2005.0, 7000.0, 1.0, 0.0, 0.0, 3.0,... | [581003.0] | 1 | True | False | False

395 rows × 7 columns
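
polars also provides Expr.is_between for range checks, which can make compound range validations easier to read. The following is an alternative sketch only; verify the bound handling against the polars version bundled with your SDK, since is_between's behavior has changed between releases:

# Equivalent range check using is_between; closed="none" excludes both bounds,
# matching the strict < and > comparisons used above.
in_between_alt = pl.col("out.variable").list.get(0).is_between(
    500000.0, 1000000.0, closed="none"
)

# It could then be added like any other validation, for example:
# sample_pipeline.add_validations(in_between_alt=in_between_alt)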

Specify Dataset

Wallaroo inference requests allow datasets to be excluded or included with the dataset_exclude and dataset parameters.

Parameter | Type | Description
dataset_exclude | List(String) | The list of datasets to exclude. Values include metadata (returns inference time per model, last model used, and other parameters) and anomaly (the anomaly results of all validations added to the pipeline).
dataset | List(String) | The list of datasets and fields to include.

For our example, we will exclude the anomaly dataset, but include the datasets 'time', 'in', 'out', and the field 'anomaly.count'. Note that while we exclude anomaly, we override that by setting the anomaly field 'anomaly.count' in our dataset parameter.

sample_pipeline.infer_from_file('./data/test-1000.df.json', 
                                dataset_exclude=['anomaly'], 
                                dataset=['time', 'in', 'out', 'anomaly.count']
                                )
 | time | in.float_input | out.variable | anomaly.count
0 | 2024-02-08 22:19:04.558 | [4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0,... | [718013.75] | 1
1 | 2024-02-08 22:19:04.558 | [2.0, 2.5, 2170.0, 6361.0, 1.0, 0.0, 2.0, 3.0,... | [615094.56] | 1
2 | 2024-02-08 22:19:04.558 | [3.0, 2.5, 1300.0, 812.0, 2.0, 0.0, 0.0, 3.0, ... | [448627.72] | 0
3 | 2024-02-08 22:19:04.558 | [4.0, 2.5, 2500.0, 8540.0, 2.0, 0.0, 0.0, 3.0,... | [758714.2] | 1
4 | 2024-02-08 22:19:04.558 | [3.0, 1.75, 2200.0, 11520.0, 1.0, 0.0, 0.0, 4.... | [513264.7] | 1
... | ... | ... | ... | ...
995 | 2024-02-08 22:19:04.558 | [3.0, 2.5, 2900.0, 23550.0, 1.0, 0.0, 0.0, 3.0... | [827411.0] | 1
996 | 2024-02-08 22:19:04.558 | [4.0, 1.75, 2700.0, 7875.0, 1.5, 0.0, 0.0, 4.0... | [441960.38] | 0
997 | 2024-02-08 22:19:04.558 | [4.0, 3.25, 2910.0, 1880.0, 2.0, 0.0, 3.0, 5.0... | [1060847.5] | 1
998 | 2024-02-08 22:19:04.558 | [3.0, 1.75, 2910.0, 37461.0, 1.0, 0.0, 0.0, 4.... | [706823.56] | 1
999 | 2024-02-08 22:19:04.558 | [3.0, 2.0, 2005.0, 7000.0, 1.0, 0.0, 0.0, 3.0,... | [581003.0] | 1

1000 rows × 4 columns
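
The same parameters can be combined the other way around, for example skipping the large input tensors while keeping every anomaly field. This sketch assumes 'anomaly' is accepted as a dataset name, as implied by the dataset_exclude values listed above:

# Return only the inference time, the model outputs, and all anomaly fields.
sample_pipeline.infer_from_file('./data/test-1000.df.json',
                                dataset=['time', 'out', 'anomaly'])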

Undeploy the Pipeline

With the demonstration complete, we undeploy the pipeline and return the resources to the cluster.

sample_pipeline.undeploy()
Waiting for undeployment - this will take up to 45s .................................... ok
name | validation-demo
created | 2024-02-08 21:52:52.879885+00:00
last_updated | 2024-02-08 22:18:18.613710+00:00
deployed | False
arch | None
tags |
versions | 016728d7-3948-467f-8ecc-e0594f406884, 053580e2-f73d-4b63-8c9c-0b5e06be96c2, 2a8d204a-f359-4f02-b558-950dbab28dc6, 424c7f24-ca65-45af-825f-64d3e9f8e8c8, 190226c9-a536-4457-9851-a68ef968b6fc, e10707fd-75b8-4386-8466-e58dc13d2828, 53b87a42-8498-475a-bf30-73fdebbf85cc
steps | anomaly-housing-model
published | False