The following tutorials are available from the Wallaroo Tutorials Repository.
The following tutorial demonstrates the use case of detecting anomalies: inference input or output data that falls outside expected values.
Wallaroo provides validations to detect anomalous data from inference inputs and outputs. Validations are added to a Wallaroo pipeline with the `wallaroo.pipeline.add_validations` method.
Adding validations takes the format:
pipeline.add_validations(
validation_name_01 = polars.col(in|out.{column_name}) EXPRESSION,
validation_name_02 = polars.col(in|out.{column_name}) EXPRESSION
...{additional rules}
)
`validation_name`: The user-provided name of the validation. The names must match Python variable naming requirements. Using `count` as a validation name returns a warning: any validation rules named `count` are dropped upon request and a warning is returned.
`polars.col(in|out.{column_name})`: Specifies the input or output for a specific field, aka "column", in an inference result. Wallaroo inference requests are in the format `in.{field_name}` for inputs, and `out.{field_name}` for outputs.
`EXPRESSION`: The expression to validate. When the expression returns `True`, an anomaly is detected.
The `polars` library version 0.18.5 is used to create the validation rule. This is installed by default with the Wallaroo SDK. Polars expressions provide a wide range of comparisons for tracking anomalous data from ML models.
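Conceptually, each validation is a boolean test evaluated per inference row, and `anomaly.count` is the number of tests that returned `True`. The following pure-Python sketch models that behavior, including the documented handling of a rule named `count`. The helper `evaluate_validations`, the plain callables standing in for polars expressions, and the `ValueError` for invalid names are illustrative assumptions, not the Wallaroo implementation.

```python
import keyword
import warnings
from typing import Any, Callable, Dict

# A rule takes one inference row (field name -> value) and returns True for an anomaly.
Rule = Callable[[Dict[str, Any]], bool]


def evaluate_validations(row: Dict[str, Any], rules: Dict[str, Rule]) -> Dict[str, Any]:
    """Hypothetical model: compute the anomaly.* fields for a single inference row."""
    flags: Dict[str, bool] = {}
    for name, rule in rules.items():
        # Validation names must be valid Python variable names (this sketch raises if not).
        if not name.isidentifier() or keyword.iskeyword(name):
            raise ValueError(f"{name!r} is not a valid Python variable name")
        # Mirrors the documented behavior: a rule named `count` is dropped with a warning.
        if name == "count":
            warnings.warn("validation rule named 'count' dropped", UserWarning)
            continue
        flags[f"anomaly.{name}"] = bool(rule(row))
    # anomaly.count is the number of validations that returned True.
    return {"anomaly.count": sum(flags.values()), **flags}


row = {"out.dense_1": [0.981199]}
print(evaluate_validations(row, {"fraud": lambda r: r["out.dense_1"][0] > 0.9}))
# {'anomaly.count': 1, 'anomaly.fraud': True}
```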
When validations are added to a pipeline, inference request outputs return the following fields:
Field | Type | Description |
---|---|---|
anomaly.count | Integer | The total of all validations that returned True. |
anomaly.{validation_name} | Bool | The output of the validation `{validation_name}`. |
When a validation returns `True`, an anomaly is detected.
For example, adding the validation `fraud` to the following pipeline returns `anomaly.count` of `1` when the validation `fraud` returns `True`. The validation `fraud` returns `True` when the output field `dense_1` at index 0 is greater than 0.9.
import wallaroo
import polars as pl

wl = wallaroo.Client()

# `model` is a model previously uploaded with wl.upload_model()
sample_pipeline = wl.build_pipeline("sample-pipeline")
sample_pipeline.add_model_step(model)
# add the validation: anomaly when the first element of out.dense_1 exceeds 0.9
sample_pipeline.add_validations(
    fraud=pl.col("out.dense_1").list.get(0) > 0.9,
)
# deploy the pipeline
sample_pipeline.deploy()
# sample inference
display(sample_pipeline.infer_from_file("dev_high_fraud.json", data_format='pandas-records'))
index | time | in.tensor | out.dense_1 | anomaly.count | anomaly.fraud |
---|---|---|---|---|---|
0 | 2024-02-02 16:05:42.152 | [1.0678324729, 18.1555563975, -1.6589551058, 5…] | [0.981199] | 1 | True |
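The `fraud` rule reads the first element of the `out.dense_1` list, which is what `.list.get(0)` selects, and compares it with 0.9. The following minimal pure-Python sketch applies the same check to the output value shown above. Plain lists stand in for the polars list column, and `fraud_rule` is a hypothetical name.

```python
from typing import List

FRAUD_THRESHOLD = 0.9


def fraud_rule(dense_1: List[float]) -> bool:
    """Plain-Python equivalent of pl.col("out.dense_1").list.get(0) > 0.9 for one row."""
    return dense_1[0] > FRAUD_THRESHOLD


outputs = [[0.981199]]  # out.dense_1 from the sample inference above
flags = [fraud_rule(o) for o in outputs]
anomaly_count = sum(flags)
print(flags, anomaly_count)  # [True] 1
```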
The following demonstrates how to:
- Upload a house price ML model that outputs the field `variable` as a float, which is the predicted house price.
- Add the validation `too_high` to detect when a house price exceeds a certain value.
- Perform sample inferences to show when the `too_high` validation returns `True` and `False`.
This tutorial uses `polars` version 0.18.5, which is installed by default with the Wallaroo SDK.
The first step is to import the libraries used in this notebook.
import wallaroo
The next step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and available through the JupyterHub interface provided with your Wallaroo environment.
This is accomplished using the `wallaroo.Client()` command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.
If logging into the Wallaroo instance through the internal JupyterHub service, use `wl = wallaroo.Client()`. For more information on Wallaroo Client settings, see the Client Connection guide.
wl = wallaroo.Client()
We’ll use the SDK below to create our workspace, then assign it as our current workspace. The current workspace is where the Wallaroo SDK uploads models, creates pipelines, and so on. We’ll also set up variables for our model and pipeline names here, so there is one place to change them to fit your organization’s standards.
Before starting, verify that the workspace name is unique in your Wallaroo instance.
workspace_name = 'validation-house-price-demonstration'
pipeline_name = 'validation-demo'
model_name = 'anomaly-housing-model'
model_file_name = './models/rf_model.onnx'
workspace = wl.get_workspace(name=workspace_name, create_if_not_exist=True)
wl.set_current_workspace(workspace)
{'name': 'validation-house-price-demonstration', 'id': 9, 'archived': False, 'created_by': 'fb2916bc-551e-4a76-88e8-0f7d7720a0f9', 'created_at': '2024-07-29T20:21:47.038823+00:00', 'models': [{'name': 'anomaly-housing-model', 'versions': 1, 'owner_id': '""', 'last_update_time': datetime.datetime(2024, 7, 29, 20, 21, 48, 414367, tzinfo=tzutc()), 'created_at': datetime.datetime(2024, 7, 29, 20, 21, 48, 414367, tzinfo=tzutc())}], 'pipelines': [{'name': 'validation-demo', 'create_time': datetime.datetime(2024, 7, 29, 20, 21, 48, 755346, tzinfo=tzutc()), 'definition': '[]'}]}
Upload the model to the Wallaroo workspace with the `wallaroo.client.upload_model` method. Our house price ML model is a Wallaroo Default Runtime of type ONNX, so all we need is the model name, the model file path, and the framework type of `wallaroo.framework.Framework.ONNX`.
model = (wl.upload_model(model_name,
model_file_name,
framework=wallaroo.framework.Framework.ONNX)
)
Pipelines are built with the `wallaroo.client.build_pipeline` method, which takes the pipeline name. This creates the pipeline in our current workspace. Note that if there are any existing pipelines with the same name in this workspace, this method will retrieve that pipeline for this SDK session.
Once the pipeline is created, we clear any existing steps and add the house price model as our pipeline step.
sample_pipeline = wl.build_pipeline(pipeline_name)
sample_pipeline.clear()
sample_pipeline = sample_pipeline.add_model_step(model)
Now we add our validation to our new pipeline. We will give it the following configuration:
- Validation name: `too_high`
- Validation field: `out.variable`
- Validation field index: `0`
- Validation expression: values greater than `1000000.0`.
The `polars` library is required for creating the validation. We will import the `polars` library, then add our validation to the pipeline.
… `anomaly.count` inference result field.
import polars as pl
sample_pipeline = sample_pipeline.add_validations(
too_high=pl.col("out.variable").list.get(0) > 1000000.0
)
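The comparison is strictly greater than, so a prediction of exactly `1000000.0` is not flagged. The following plain-Python sketch applies the same test to a few `out.variable` values taken from the inference results in this tutorial (rows 0, 6, and 30), plus a hypothetical boundary value that does not appear in the results. It is for illustration only; the pipeline itself evaluates the polars expression.

```python
TOO_HIGH = 1000000.0

# out.variable values (one-element lists); the "boundary" entry is hypothetical
predictions = {
    "row 0": [718013.75],
    "row 6": [1004846.5],
    "row 30": [1514079.8],
    "boundary": [1000000.0],
}

# Same logic as pl.col("out.variable").list.get(0) > 1000000.0
too_high = {key: value[0] > TOO_HIGH for key, value in predictions.items()}
print(too_high)
# {'row 0': False, 'row 6': True, 'row 30': True, 'boundary': False}
```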
The method `wallaroo.pipeline.steps()` shows the current pipeline steps. The added validations are in the `Check` field. This is used here for demonstration purposes, to show the validation added to the pipeline.
sample_pipeline.steps()
[{'ModelInference': {'models': [{'name': 'anomaly-housing-model', 'version': 'd340bfca-2152-4a4e-9375-1abebfbcb647', 'sha': 'e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6'}]}},
{'Check': {'tree': ['{"Alias":[{"BinaryExpr":{"left":{"Function":{"input":[{"Column":"out.variable"},{"Literal":{"Int32":0}}],"function":{"ListExpr":"Get"},"options":{"collect_groups":"ApplyFlat","fmt_str":"","input_wildcard_expansion":false,"auto_explode":true,"cast_to_supertypes":false,"allow_rename":false,"pass_name_to_apply":false,"changes_length":false,"check_lengths":true,"allow_group_aware":true}}},"op":"Gt","right":{"Literal":{"Float64":1000000.0}}}},"too_high"]}']}}]
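The `Check` entry stores each validation as a serialized polars expression tree in a JSON string. The following sketch uses the standard `json` module to extract the validation name, column, list index, operator, and threshold from the string shown above. The key layout (`Alias`, `BinaryExpr`, `op`, `right`) is read from this single output and is an assumption, not a documented format. If `steps()` returns the list of dicts displayed above, the string would be available at `sample_pipeline.steps()[1]["Check"]["tree"][0]`; that access path is also an assumption.

```python
import json

# The Check tree string copied from the steps() output above
tree = (
    '{"Alias":[{"BinaryExpr":{"left":{"Function":{"input":[{"Column":"out.variable"},'
    '{"Literal":{"Int32":0}}],"function":{"ListExpr":"Get"},"options":{"collect_groups":'
    '"ApplyFlat","fmt_str":"","input_wildcard_expansion":false,"auto_explode":true,'
    '"cast_to_supertypes":false,"allow_rename":false,"pass_name_to_apply":false,'
    '"changes_length":false,"check_lengths":true,"allow_group_aware":true}}},'
    '"op":"Gt","right":{"Literal":{"Float64":1000000.0}}}},"too_high"]}'
)

# "Alias" holds [expression, validation_name]
expr, name = json.loads(tree)["Alias"]
binary = expr["BinaryExpr"]
func_inputs = binary["left"]["Function"]["input"]
column = func_inputs[0]["Column"]              # field the rule reads
index = func_inputs[1]["Literal"]["Int32"]     # argument to .list.get()
op = binary["op"]                              # comparison operator
threshold = binary["right"]["Literal"]["Float64"]

print(name, column, index, op, threshold)  # too_high out.variable 0 Gt 1000000.0
```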
With the pipeline steps set and the validations created, we deploy the pipeline. Because of its size, we will only allocate 0.25 CPU from the cluster for the pipeline’s use.
deploy_config = wallaroo.deployment_config.DeploymentConfigBuilder() \
.cpus(0.25)\
.build()
sample_pipeline.deploy(deployment_config=deploy_config)
name | validation-demo |
---|---|
created | 2024-07-29 20:21:48.755346+00:00 |
last_updated | 2024-07-29 20:30:31.307477+00:00 |
deployed | True |
workspace_id | 9 |
workspace_name | validation-house-price-demonstration |
arch | x86 |
accel | none |
tags | |
versions | abb554f1-9d8e-47f5-8f24-ad4d7b9488c5, bae26ed5-972a-4456-8182-a87d5e676ea4, 55f16b02-18ed-4ddb-a4d4-c1f2a1b0f416, b4d30cf4-709f-44bb-a642-c857bda9eb4b, 4abc82f7-f548-4f18-ae50-6b348b81a56d, 58347b92-8ba8-40b1-b42d-8aaa393057bd |
steps | anomaly-housing-model |
published | False |
Sample inferences are performed with the method `wallaroo.pipeline.infer_from_file`, which takes either a pandas Record JSON file or an Apache Arrow table as the input.
For our demonstration, we will use the following pandas Record JSON file with the following sample data:
`./data/test-1000.df.json`: A sample set of 1000 houses used to generate a range of predicted values.
The inference request returns a pandas DataFrame.
Each of the inference outputs will include the following fields:
Field | Type | Description |
---|---|---|
time | DateTime | The DateTime of the inference request. |
in.{input_field_name} | Input Dependent | Each input field submitted is labeled as `in.{input_field_name}` in the inference request result. For our example, this is `float_input`, so the input field in the returned inference request is `in.float_input`. |
out.{model_output_field_name} | Output Dependent | Each field output by the ML model is labeled as `out.{model_output_field_name}` in the inference request result. For our example, the house price model returns `variable` as its output field, so the output field in the returned inference request is `out.variable`. |
anomaly.count | Integer | The total number of validations that returned `True`. |
anomaly.{validation_name} | Bool | Each validation added to the pipeline is returned as `anomaly.{validation_name}`, and returns `True` if the validation detects an anomaly, or `False` if it does not. For our example, `anomaly.too_high` is returned. |
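The naming convention in this table can be expressed as a small function that predicts the result columns from the input fields, output fields, and validation names. This is a hypothetical helper for illustration. The alphabetical ordering of the `anomaly.{validation_name}` columns is an assumption based on the sample outputs in this tutorial, where `anomaly.in_between` is listed before `anomaly.too_high` and `anomaly.too_low` even though it was added last.

```python
from typing import Iterable, List


def expected_columns(inputs: Iterable[str], outputs: Iterable[str],
                     validations: Iterable[str]) -> List[str]:
    """Hypothetical helper: build the column names of an inference result."""
    columns = ["time"]
    columns += [f"in.{name}" for name in inputs]
    columns += [f"out.{name}" for name in outputs]
    columns.append("anomaly.count")
    # Validation columns sorted by name, matching the sample outputs in this tutorial
    columns += [f"anomaly.{name}" for name in sorted(validations)]
    return columns


print(expected_columns(["float_input"], ["variable"], ["too_high"]))
# ['time', 'in.float_input', 'out.variable', 'anomaly.count', 'anomaly.too_high']
```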
results = sample_pipeline.infer_from_file('./data/test-1000.df.json')
# first 20 results
display(results.head(20))
# only results that trigger the anomaly too_high
results.loc[results['anomaly.too_high'] == True]
index | time | in.float_input | out.variable | anomaly.count | anomaly.too_high |
---|---|---|---|---|---|
0 | 2024-07-29 20:31:03.265 | [4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0,... | [718013.75] | 0 | False |
1 | 2024-07-29 20:31:03.265 | [2.0, 2.5, 2170.0, 6361.0, 1.0, 0.0, 2.0, 3.0,... | [615094.56] | 0 | False |
2 | 2024-07-29 20:31:03.265 | [3.0, 2.5, 1300.0, 812.0, 2.0, 0.0, 0.0, 3.0, ... | [448627.72] | 0 | False |
3 | 2024-07-29 20:31:03.265 | [4.0, 2.5, 2500.0, 8540.0, 2.0, 0.0, 0.0, 3.0,... | [758714.2] | 0 | False |
4 | 2024-07-29 20:31:03.265 | [3.0, 1.75, 2200.0, 11520.0, 1.0, 0.0, 0.0, 4.... | [513264.7] | 0 | False |
5 | 2024-07-29 20:31:03.265 | [3.0, 2.0, 2140.0, 4923.0, 1.0, 0.0, 0.0, 4.0,... | [668288.0] | 0 | False |
6 | 2024-07-29 20:31:03.265 | [4.0, 3.5, 3590.0, 5334.0, 2.0, 0.0, 2.0, 3.0,... | [1004846.5] | 1 | True |
7 | 2024-07-29 20:31:03.265 | [3.0, 2.0, 1280.0, 960.0, 2.0, 0.0, 0.0, 3.0, ... | [684577.2] | 0 | False |
8 | 2024-07-29 20:31:03.265 | [4.0, 2.5, 2820.0, 15000.0, 2.0, 0.0, 0.0, 4.0... | [727898.1] | 0 | False |
9 | 2024-07-29 20:31:03.265 | [3.0, 2.25, 1790.0, 11393.0, 1.0, 0.0, 0.0, 3.... | [559631.1] | 0 | False |
10 | 2024-07-29 20:31:03.265 | [3.0, 1.5, 1010.0, 7683.0, 1.5, 0.0, 0.0, 5.0,... | [340764.53] | 0 | False |
11 | 2024-07-29 20:31:03.265 | [3.0, 2.0, 1270.0, 1323.0, 3.0, 0.0, 0.0, 3.0,... | [442168.06] | 0 | False |
12 | 2024-07-29 20:31:03.265 | [4.0, 1.75, 2070.0, 9120.0, 1.0, 0.0, 0.0, 4.0... | [630865.6] | 0 | False |
13 | 2024-07-29 20:31:03.265 | [4.0, 1.0, 1620.0, 4080.0, 1.5, 0.0, 0.0, 3.0,... | [559631.1] | 0 | False |
14 | 2024-07-29 20:31:03.265 | [4.0, 3.25, 3990.0, 9786.0, 2.0, 0.0, 0.0, 3.0... | [909441.1] | 0 | False |
15 | 2024-07-29 20:31:03.265 | [4.0, 2.0, 1780.0, 19843.0, 1.0, 0.0, 0.0, 3.0... | [313096.0] | 0 | False |
16 | 2024-07-29 20:31:03.265 | [4.0, 2.5, 2130.0, 6003.0, 2.0, 0.0, 0.0, 3.0,... | [404040.8] | 0 | False |
17 | 2024-07-29 20:31:03.265 | [3.0, 1.75, 1660.0, 10440.0, 1.0, 0.0, 0.0, 3.... | [292859.5] | 0 | False |
18 | 2024-07-29 20:31:03.265 | [3.0, 2.5, 2110.0, 4118.0, 2.0, 0.0, 0.0, 3.0,... | [338357.88] | 0 | False |
19 | 2024-07-29 20:31:03.265 | [4.0, 2.25, 2200.0, 11250.0, 1.5, 0.0, 0.0, 5.... | [682284.6] | 0 | False |
index | time | in.float_input | out.variable | anomaly.count | anomaly.too_high |
---|---|---|---|---|---|
6 | 2024-07-29 20:31:03.265 | [4.0, 3.5, 3590.0, 5334.0, 2.0, 0.0, 2.0, 3.0,... | [1004846.5] | 1 | True |
30 | 2024-07-29 20:31:03.265 | [4.0, 3.0, 3710.0, 20000.0, 2.0, 0.0, 2.0, 5.0... | [1514079.8] | 1 | True |
40 | 2024-07-29 20:31:03.265 | [4.0, 4.5, 5120.0, 41327.0, 2.0, 0.0, 0.0, 3.0... | [1204324.8] | 1 | True |
63 | 2024-07-29 20:31:03.265 | [4.0, 3.0, 4040.0, 19700.0, 2.0, 0.0, 0.0, 3.0... | [1028923.06] | 1 | True |
110 | 2024-07-29 20:31:03.265 | [4.0, 2.5, 3470.0, 20445.0, 2.0, 0.0, 0.0, 4.0... | [1412215.3] | 1 | True |
130 | 2024-07-29 20:31:03.265 | [4.0, 2.75, 2620.0, 13777.0, 1.5, 0.0, 2.0, 4.... | [1223839.1] | 1 | True |
133 | 2024-07-29 20:31:03.265 | [5.0, 2.25, 3320.0, 13138.0, 1.0, 0.0, 2.0, 4.... | [1108000.1] | 1 | True |
154 | 2024-07-29 20:31:03.265 | [4.0, 2.75, 3800.0, 9606.0, 2.0, 0.0, 0.0, 3.0... | [1039781.25] | 1 | True |
160 | 2024-07-29 20:31:03.265 | [5.0, 3.5, 4150.0, 13232.0, 2.0, 0.0, 0.0, 3.0... | [1042119.1] | 1 | True |
210 | 2024-07-29 20:31:03.265 | [4.0, 3.5, 4300.0, 70407.0, 2.0, 0.0, 0.0, 3.0... | [1115275.0] | 1 | True |
239 | 2024-07-29 20:31:03.265 | [4.0, 3.25, 5010.0, 49222.0, 2.0, 0.0, 0.0, 5.... | [1092274.1] | 1 | True |
248 | 2024-07-29 20:31:03.265 | [4.0, 3.75, 4410.0, 8112.0, 3.0, 0.0, 4.0, 3.0... | [1967344.1] | 1 | True |
255 | 2024-07-29 20:31:03.265 | [4.0, 3.0, 4750.0, 21701.0, 1.5, 0.0, 0.0, 5.0... | [2002393.5] | 1 | True |
271 | 2024-07-29 20:31:03.265 | [5.0, 3.25, 5790.0, 13726.0, 2.0, 0.0, 3.0, 3.... | [1189654.4] | 1 | True |
281 | 2024-07-29 20:31:03.265 | [3.0, 3.0, 3570.0, 6250.0, 2.0, 0.0, 2.0, 3.0,... | [1124493.3] | 1 | True |
282 | 2024-07-29 20:31:03.265 | [3.0, 2.75, 3170.0, 34850.0, 1.0, 0.0, 0.0, 5.... | [1227073.8] | 1 | True |
283 | 2024-07-29 20:31:03.265 | [4.0, 2.75, 3260.0, 19542.0, 1.0, 0.0, 0.0, 4.... | [1364650.3] | 1 | True |
285 | 2024-07-29 20:31:03.265 | [4.0, 2.75, 4020.0, 18745.0, 2.0, 0.0, 4.0, 4.... | [1322835.9] | 1 | True |
323 | 2024-07-29 20:31:03.265 | [3.0, 3.0, 2480.0, 5500.0, 2.0, 0.0, 3.0, 3.0,... | [1100884.1] | 1 | True |
351 | 2024-07-29 20:31:03.265 | [5.0, 4.0, 4660.0, 9900.0, 2.0, 0.0, 2.0, 4.0,... | [1058105.0] | 1 | True |
360 | 2024-07-29 20:31:03.265 | [4.0, 3.5, 3770.0, 8501.0, 2.0, 0.0, 0.0, 3.0,... | [1169643.0] | 1 | True |
398 | 2024-07-29 20:31:03.265 | [3.0, 2.25, 2390.0, 7875.0, 1.0, 0.0, 1.0, 3.0... | [1364149.9] | 1 | True |
414 | 2024-07-29 20:31:03.265 | [5.0, 3.5, 5430.0, 10327.0, 2.0, 0.0, 2.0, 3.0... | [1207858.6] | 1 | True |
443 | 2024-07-29 20:31:03.265 | [5.0, 4.0, 4360.0, 8030.0, 2.0, 0.0, 0.0, 3.0,... | [1160512.8] | 1 | True |
497 | 2024-07-29 20:31:03.265 | [4.0, 2.5, 4090.0, 11225.0, 2.0, 0.0, 0.0, 3.0... | [1048372.4] | 1 | True |
513 | 2024-07-29 20:31:03.265 | [4.0, 3.25, 3320.0, 8587.0, 3.0, 0.0, 0.0, 3.0... | [1130661.0] | 1 | True |
520 | 2024-07-29 20:31:03.265 | [5.0, 3.75, 4170.0, 8142.0, 2.0, 0.0, 2.0, 3.0... | [1098628.8] | 1 | True |
530 | 2024-07-29 20:31:03.265 | [4.0, 4.25, 3500.0, 8750.0, 1.0, 0.0, 4.0, 5.0... | [1140733.8] | 1 | True |
535 | 2024-07-29 20:31:03.265 | [4.0, 3.5, 4460.0, 16271.0, 2.0, 0.0, 2.0, 3.0... | [1208638.0] | 1 | True |
556 | 2024-07-29 20:31:03.265 | [4.0, 3.5, 4285.0, 9567.0, 2.0, 0.0, 1.0, 5.0,... | [1886959.4] | 1 | True |
623 | 2024-07-29 20:31:03.265 | [4.0, 3.25, 4240.0, 25639.0, 2.0, 0.0, 3.0, 3.... | [1156651.3] | 1 | True |
624 | 2024-07-29 20:31:03.265 | [4.0, 3.5, 3440.0, 9776.0, 2.0, 0.0, 0.0, 3.0,... | [1124493.3] | 1 | True |
634 | 2024-07-29 20:31:03.265 | [4.0, 3.25, 4700.0, 38412.0, 2.0, 0.0, 0.0, 3.... | [1164589.4] | 1 | True |
651 | 2024-07-29 20:31:03.265 | [3.0, 3.0, 3920.0, 13085.0, 2.0, 1.0, 4.0, 4.0... | [1452224.5] | 1 | True |
658 | 2024-07-29 20:31:03.265 | [3.0, 3.25, 3230.0, 7800.0, 2.0, 0.0, 3.0, 3.0... | [1077279.3] | 1 | True |
671 | 2024-07-29 20:31:03.265 | [3.0, 3.5, 3080.0, 6495.0, 2.0, 0.0, 3.0, 3.0,... | [1122811.8] | 1 | True |
685 | 2024-07-29 20:31:03.265 | [4.0, 2.5, 4200.0, 35267.0, 2.0, 0.0, 0.0, 3.0... | [1181336.0] | 1 | True |
686 | 2024-07-29 20:31:03.265 | [4.0, 3.25, 4160.0, 47480.0, 2.0, 0.0, 0.0, 3.... | [1082353.3] | 1 | True |
698 | 2024-07-29 20:31:03.265 | [4.0, 4.5, 5770.0, 10050.0, 1.0, 0.0, 3.0, 5.0... | [1689843.3] | 1 | True |
711 | 2024-07-29 20:31:03.265 | [3.0, 2.5, 5403.0, 24069.0, 2.0, 1.0, 4.0, 4.0... | [1946437.3] | 1 | True |
720 | 2024-07-29 20:31:03.265 | [5.0, 3.0, 3420.0, 18129.0, 2.0, 0.0, 0.0, 3.0... | [1325961.0] | 1 | True |
722 | 2024-07-29 20:31:03.265 | [3.0, 3.25, 4560.0, 13363.0, 1.0, 0.0, 4.0, 3.... | [2005883.1] | 1 | True |
726 | 2024-07-29 20:31:03.265 | [5.0, 3.5, 4200.0, 5400.0, 2.0, 0.0, 0.0, 3.0,... | [1052898.0] | 1 | True |
737 | 2024-07-29 20:31:03.265 | [4.0, 3.25, 2980.0, 7000.0, 2.0, 0.0, 3.0, 3.0... | [1156206.5] | 1 | True |
740 | 2024-07-29 20:31:03.265 | [4.0, 4.5, 6380.0, 88714.0, 2.0, 0.0, 0.0, 3.0... | [1355747.1] | 1 | True |
782 | 2024-07-29 20:31:03.265 | [5.0, 4.25, 4860.0, 9453.0, 1.5, 0.0, 1.0, 5.0... | [1910823.8] | 1 | True |
798 | 2024-07-29 20:31:03.265 | [4.0, 2.5, 2790.0, 5450.0, 2.0, 0.0, 0.0, 3.0,... | [1097757.4] | 1 | True |
818 | 2024-07-29 20:31:03.265 | [4.0, 4.0, 4620.0, 130208.0, 2.0, 0.0, 0.0, 3.... | [1164589.4] | 1 | True |
827 | 2024-07-29 20:31:03.265 | [4.0, 2.5, 3340.0, 10422.0, 2.0, 0.0, 0.0, 3.0... | [1103101.4] | 1 | True |
828 | 2024-07-29 20:31:03.265 | [5.0, 3.5, 3760.0, 10207.0, 2.0, 0.0, 0.0, 3.0... | [1489624.5] | 1 | True |
901 | 2024-07-29 20:31:03.265 | [4.0, 2.25, 4470.0, 60373.0, 2.0, 0.0, 0.0, 3.... | [1208638.0] | 1 | True |
912 | 2024-07-29 20:31:03.265 | [3.0, 2.25, 2960.0, 8330.0, 1.0, 0.0, 3.0, 4.0... | [1178314.0] | 1 | True |
919 | 2024-07-29 20:31:03.265 | [4.0, 3.25, 5180.0, 19850.0, 2.0, 0.0, 3.0, 3.... | [1295531.3] | 1 | True |
941 | 2024-07-29 20:31:03.265 | [4.0, 3.75, 3770.0, 4000.0, 2.5, 0.0, 0.0, 5.0... | [1182821.0] | 1 | True |
965 | 2024-07-29 20:31:03.265 | [6.0, 4.0, 5310.0, 12741.0, 2.0, 0.0, 2.0, 3.0... | [2016006.0] | 1 | True |
973 | 2024-07-29 20:31:03.265 | [5.0, 2.0, 3540.0, 9970.0, 2.0, 0.0, 3.0, 3.0,... | [1085835.8] | 1 | True |
997 | 2024-07-29 20:31:03.265 | [4.0, 3.25, 2910.0, 1880.0, 2.0, 0.0, 3.0, 5.0... | [1060847.5] | 1 | True |
The following are additional examples of validations.
The following uses multiple validations to check for anomalies. We still use `too_high`, which detects outputs that are greater than `1000000.0`. The second validation, `too_low`, triggers an anomaly when `out.variable` is under `250000.0`.
After the validations are added, the pipeline is redeployed to “set” them.
sample_pipeline = sample_pipeline.add_validations(
too_low=pl.col("out.variable").list.get(0) < 250000.0
)
deploy_config = wallaroo.deployment_config.DeploymentConfigBuilder() \
.cpus(0.1)\
.build()
sample_pipeline.undeploy()
sample_pipeline.deploy(deployment_config=deploy_config)
name | validation-demo |
---|---|
created | 2024-07-29 20:21:48.755346+00:00 |
last_updated | 2024-07-29 20:31:52.145946+00:00 |
deployed | True |
workspace_id | 9 |
workspace_name | validation-house-price-demonstration |
arch | x86 |
accel | none |
tags | |
versions | bc5edde2-0e3e-46bd-aedf-7fc974ec6845, abb554f1-9d8e-47f5-8f24-ad4d7b9488c5, bae26ed5-972a-4456-8182-a87d5e676ea4, 55f16b02-18ed-4ddb-a4d4-c1f2a1b0f416, b4d30cf4-709f-44bb-a642-c857bda9eb4b, 4abc82f7-f548-4f18-ae50-6b348b81a56d, 58347b92-8ba8-40b1-b42d-8aaa393057bd |
steps | anomaly-housing-model |
published | False |
results = sample_pipeline.infer_from_file('./data/test-1000.df.json')
# first 20 results
display(results.head(20))
# only results that trigger the anomaly too_high
results.loc[results['anomaly.too_high'] == True]
# only results that trigger the anomaly too_low
results.loc[results['anomaly.too_low'] == True]
index | time | in.float_input | out.variable | anomaly.count | anomaly.too_high | anomaly.too_low |
---|---|---|---|---|---|---|
0 | 2024-07-29 20:32:31.437 | [4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0,... | [718013.75] | 0 | False | False |
1 | 2024-07-29 20:32:31.437 | [2.0, 2.5, 2170.0, 6361.0, 1.0, 0.0, 2.0, 3.0,... | [615094.56] | 0 | False | False |
2 | 2024-07-29 20:32:31.437 | [3.0, 2.5, 1300.0, 812.0, 2.0, 0.0, 0.0, 3.0, ... | [448627.72] | 0 | False | False |
3 | 2024-07-29 20:32:31.437 | [4.0, 2.5, 2500.0, 8540.0, 2.0, 0.0, 0.0, 3.0,... | [758714.2] | 0 | False | False |
4 | 2024-07-29 20:32:31.437 | [3.0, 1.75, 2200.0, 11520.0, 1.0, 0.0, 0.0, 4.... | [513264.7] | 0 | False | False |
5 | 2024-07-29 20:32:31.437 | [3.0, 2.0, 2140.0, 4923.0, 1.0, 0.0, 0.0, 4.0,... | [668288.0] | 0 | False | False |
6 | 2024-07-29 20:32:31.437 | [4.0, 3.5, 3590.0, 5334.0, 2.0, 0.0, 2.0, 3.0,... | [1004846.5] | 1 | True | False |
7 | 2024-07-29 20:32:31.437 | [3.0, 2.0, 1280.0, 960.0, 2.0, 0.0, 0.0, 3.0, ... | [684577.2] | 0 | False | False |
8 | 2024-07-29 20:32:31.437 | [4.0, 2.5, 2820.0, 15000.0, 2.0, 0.0, 0.0, 4.0... | [727898.1] | 0 | False | False |
9 | 2024-07-29 20:32:31.437 | [3.0, 2.25, 1790.0, 11393.0, 1.0, 0.0, 0.0, 3.... | [559631.1] | 0 | False | False |
10 | 2024-07-29 20:32:31.437 | [3.0, 1.5, 1010.0, 7683.0, 1.5, 0.0, 0.0, 5.0,... | [340764.53] | 0 | False | False |
11 | 2024-07-29 20:32:31.437 | [3.0, 2.0, 1270.0, 1323.0, 3.0, 0.0, 0.0, 3.0,... | [442168.06] | 0 | False | False |
12 | 2024-07-29 20:32:31.437 | [4.0, 1.75, 2070.0, 9120.0, 1.0, 0.0, 0.0, 4.0... | [630865.6] | 0 | False | False |
13 | 2024-07-29 20:32:31.437 | [4.0, 1.0, 1620.0, 4080.0, 1.5, 0.0, 0.0, 3.0,... | [559631.1] | 0 | False | False |
14 | 2024-07-29 20:32:31.437 | [4.0, 3.25, 3990.0, 9786.0, 2.0, 0.0, 0.0, 3.0... | [909441.1] | 0 | False | False |
15 | 2024-07-29 20:32:31.437 | [4.0, 2.0, 1780.0, 19843.0, 1.0, 0.0, 0.0, 3.0... | [313096.0] | 0 | False | False |
16 | 2024-07-29 20:32:31.437 | [4.0, 2.5, 2130.0, 6003.0, 2.0, 0.0, 0.0, 3.0,... | [404040.8] | 0 | False | False |
17 | 2024-07-29 20:32:31.437 | [3.0, 1.75, 1660.0, 10440.0, 1.0, 0.0, 0.0, 3.... | [292859.5] | 0 | False | False |
18 | 2024-07-29 20:32:31.437 | [3.0, 2.5, 2110.0, 4118.0, 2.0, 0.0, 0.0, 3.0,... | [338357.88] | 0 | False | False |
19 | 2024-07-29 20:32:31.437 | [4.0, 2.25, 2200.0, 11250.0, 1.5, 0.0, 0.0, 5.... | [682284.6] | 0 | False | False |
index | time | in.float_input | out.variable | anomaly.count | anomaly.too_high | anomaly.too_low |
---|---|---|---|---|---|---|
21 | 2024-07-29 20:32:31.437 | [2.0, 2.0, 1390.0, 1302.0, 2.0, 0.0, 0.0, 3.0,... | [249227.8] | 1 | False | True |
69 | 2024-07-29 20:32:31.437 | [3.0, 1.75, 1050.0, 9871.0, 1.0, 0.0, 0.0, 5.0... | [236238.66] | 1 | False | True |
83 | 2024-07-29 20:32:31.437 | [3.0, 1.75, 1070.0, 8100.0, 1.0, 0.0, 0.0, 4.0... | [236238.66] | 1 | False | True |
95 | 2024-07-29 20:32:31.437 | [3.0, 2.5, 1340.0, 3011.0, 2.0, 0.0, 0.0, 3.0,... | [244380.27] | 1 | False | True |
124 | 2024-07-29 20:32:31.437 | [4.0, 1.5, 1200.0, 10890.0, 1.0, 0.0, 0.0, 5.0... | [241330.19] | 1 | False | True |
... | ... | ... | ... | ... | ... | ... |
939 | 2024-07-29 20:32:31.437 | [3.0, 1.0, 1150.0, 4800.0, 1.5, 0.0, 0.0, 4.0,... | [240834.92] | 1 | False | True |
946 | 2024-07-29 20:32:31.437 | [2.0, 1.0, 780.0, 6250.0, 1.0, 0.0, 0.0, 3.0, ... | [236815.78] | 1 | False | True |
948 | 2024-07-29 20:32:31.437 | [1.0, 1.0, 620.0, 8261.0, 1.0, 0.0, 0.0, 3.0, ... | [236815.78] | 1 | False | True |
962 | 2024-07-29 20:32:31.437 | [3.0, 1.0, 1190.0, 7500.0, 1.0, 0.0, 0.0, 5.0,... | [241330.19] | 1 | False | True |
991 | 2024-07-29 20:32:31.437 | [2.0, 1.0, 870.0, 8487.0, 1.0, 0.0, 0.0, 4.0, ... | [236238.66] | 1 | False | True |
62 rows × 6 columns
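With two validations, each row gets one flag per rule, and `anomaly.count` totals them. Because these two thresholds do not overlap, a row can trip at most one of them, so `anomaly.count` is 0 or 1 here. The following plain-Python sketch uses `out.variable` values from rows 0, 6, and 21 of the results above; the lambdas are stand-ins for the polars expressions, not the pipeline's own evaluation.

```python
TOO_HIGH = 1000000.0
TOO_LOW = 250000.0

# Plain-Python stand-ins for the two polars rules, applied to out.variable[0]
rules = {
    "too_high": lambda price: price > TOO_HIGH,
    "too_low": lambda price: price < TOO_LOW,
}

# out.variable[0] for rows 0, 6, and 21 of the results above
prices = {0: 718013.75, 6: 1004846.5, 21: 249227.8}

summary = {}
for idx, price in prices.items():
    flags = {name: rule(price) for name, rule in rules.items()}
    # (anomaly.count, individual flags)
    summary[idx] = (sum(flags.values()), flags)
    print(idx, summary[idx])
# 0 (0, {'too_high': False, 'too_low': False})
# 6 (1, {'too_high': True, 'too_low': False})
# 21 (1, {'too_high': False, 'too_low': True})
```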
The following combines multiple conditions into a single validation. For this, we will check for values of `out.variable` that are between 500000 and 1000000.
Each expression is wrapped in parentheses `()`, and the expressions are joined with the `&` (and) operator. For example, the two expressions:
pl.col("out.variable").list.get(0) < 1000000.0
pl.col("out.variable").list.get(0) > 500000.0
are combined into:
(pl.col("out.variable").list.get(0) < 1000000.0) & (pl.col("out.variable").list.get(0) > 500000.0)
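The parentheses are required because Python's `&` operator binds more tightly than `<` and `>`. Without them, `a < 1000000.0 & a > 500000.0` is parsed as the chained comparison `a < (1000000.0 & a) > 500000.0`, which is not the intended range check. The following sketch shows this with plain floats, where the unparenthesized form fails outright; with polars expressions it likewise does not produce the intended rule. The function names are hypothetical.

```python
def in_between(price: float) -> bool:
    # Parenthesized: each comparison is evaluated first, then combined with &
    return (price < 1000000.0) & (price > 500000.0)


def without_parentheses(price: float) -> bool:
    # Parsed as price < (1000000.0 & price) > 500000.0
    return price < 1000000.0 & price > 500000.0


print(in_between(718013.75), in_between(448627.72))  # True False

try:
    without_parentheses(718013.75)
except TypeError as err:
    # float & float is not defined, so the misparsed expression raises
    print("TypeError:", err)
```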
sample_pipeline = sample_pipeline.add_validations(
in_between=(pl.col("out.variable").list.get(0) < 1000000.0) & (pl.col("out.variable").list.get(0) > 500000.0)
)
deploy_config = wallaroo.deployment_config.DeploymentConfigBuilder() \
.cpus(0.1)\
.build()
sample_pipeline.undeploy()
sample_pipeline.deploy(deployment_config=deploy_config)
name | validation-demo |
---|---|
created | 2024-07-29 20:21:48.755346+00:00 |
last_updated | 2024-07-29 20:33:20.980095+00:00 |
deployed | True |
workspace_id | 9 |
workspace_name | validation-house-price-demonstration |
arch | x86 |
accel | none |
tags | |
versions | d9c8f3f8-3418-48df-88a8-cf1ad812af70, bc5edde2-0e3e-46bd-aedf-7fc974ec6845, abb554f1-9d8e-47f5-8f24-ad4d7b9488c5, bae26ed5-972a-4456-8182-a87d5e676ea4, 55f16b02-18ed-4ddb-a4d4-c1f2a1b0f416, b4d30cf4-709f-44bb-a642-c857bda9eb4b, 4abc82f7-f548-4f18-ae50-6b348b81a56d, 58347b92-8ba8-40b1-b42d-8aaa393057bd |
steps | anomaly-housing-model |
published | False |
results = sample_pipeline.infer_from_file('./data/test-1000.df.json')
results.loc[results['anomaly.in_between'] == True]
index | time | in.float_input | out.variable | anomaly.count | anomaly.in_between | anomaly.too_high | anomaly.too_low |
---|---|---|---|---|---|---|---|
0 | 2024-07-29 20:33:57.296 | [4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0,... | [718013.75] | 1 | True | False | False |
1 | 2024-07-29 20:33:57.296 | [2.0, 2.5, 2170.0, 6361.0, 1.0, 0.0, 2.0, 3.0,... | [615094.56] | 1 | True | False | False |
3 | 2024-07-29 20:33:57.296 | [4.0, 2.5, 2500.0, 8540.0, 2.0, 0.0, 0.0, 3.0,... | [758714.2] | 1 | True | False | False |
4 | 2024-07-29 20:33:57.296 | [3.0, 1.75, 2200.0, 11520.0, 1.0, 0.0, 0.0, 4.... | [513264.7] | 1 | True | False | False |
5 | 2024-07-29 20:33:57.296 | [3.0, 2.0, 2140.0, 4923.0, 1.0, 0.0, 0.0, 4.0,... | [668288.0] | 1 | True | False | False |
... | ... | ... | ... | ... | ... | ... | ... |
989 | 2024-07-29 20:33:57.296 | [4.0, 2.75, 2500.0, 4950.0, 2.0, 0.0, 0.0, 3.0... | [700271.56] | 1 | True | False | False |
993 | 2024-07-29 20:33:57.296 | [3.0, 2.5, 2140.0, 8925.0, 2.0, 0.0, 0.0, 3.0,... | [669645.5] | 1 | True | False | False |
995 | 2024-07-29 20:33:57.296 | [3.0, 2.5, 2900.0, 23550.0, 1.0, 0.0, 0.0, 3.0... | [827411.0] | 1 | True | False | False |
998 | 2024-07-29 20:33:57.296 | [3.0, 1.75, 2910.0, 37461.0, 1.0, 0.0, 0.0, 4.... | [706823.56] | 1 | True | False | False |
999 | 2024-07-29 20:33:57.296 | [3.0, 2.0, 2005.0, 7000.0, 1.0, 0.0, 0.0, 3.0,... | [581003.0] | 1 | True | False | False |
395 rows × 7 columns
Wallaroo inference requests allow datasets to be excluded or included with the `dataset_exclude` and `dataset` parameters.
Parameter | Type | Description |
---|---|---|
dataset_exclude | List(String) | The list of datasets to exclude, such as `anomaly`. |
dataset | List(String) | The list of datasets and fields to include. |
For our example, we will exclude the `anomaly` dataset, but include the datasets `'time'`, `'in'`, `'out'`, and `'anomaly.count'`. Note that while we exclude `anomaly`, we override that by including the anomaly field `'anomaly.count'` in our `dataset` parameter.
sample_pipeline.infer_from_file('./data/test-1000.df.json',
dataset_exclude=['anomaly'],
dataset=['time', 'in', 'out', 'anomaly.count']
)
index | time | in.float_input | out.variable | anomaly.count |
---|---|---|---|---|
0 | 2024-07-29 20:33:59.274 | [4.0, 2.5, 2900.0, 5505.0, 2.0, 0.0, 0.0, 3.0,... | [718013.75] | 1 |
1 | 2024-07-29 20:33:59.274 | [2.0, 2.5, 2170.0, 6361.0, 1.0, 0.0, 2.0, 3.0,... | [615094.56] | 1 |
2 | 2024-07-29 20:33:59.274 | [3.0, 2.5, 1300.0, 812.0, 2.0, 0.0, 0.0, 3.0, ... | [448627.72] | 0 |
3 | 2024-07-29 20:33:59.274 | [4.0, 2.5, 2500.0, 8540.0, 2.0, 0.0, 0.0, 3.0,... | [758714.2] | 1 |
4 | 2024-07-29 20:33:59.274 | [3.0, 1.75, 2200.0, 11520.0, 1.0, 0.0, 0.0, 4.... | [513264.7] | 1 |
... | ... | ... | ... | ... |
995 | 2024-07-29 20:33:59.274 | [3.0, 2.5, 2900.0, 23550.0, 1.0, 0.0, 0.0, 3.0... | [827411.0] | 1 |
996 | 2024-07-29 20:33:59.274 | [4.0, 1.75, 2700.0, 7875.0, 1.5, 0.0, 0.0, 4.0... | [441960.38] | 0 |
997 | 2024-07-29 20:33:59.274 | [4.0, 3.25, 2910.0, 1880.0, 2.0, 0.0, 3.0, 5.0... | [1060847.5] | 1 |
998 | 2024-07-29 20:33:59.274 | [3.0, 1.75, 2910.0, 37461.0, 1.0, 0.0, 0.0, 4.... | [706823.56] | 1 |
999 | 2024-07-29 20:33:59.274 | [3.0, 2.0, 2005.0, 7000.0, 1.0, 0.0, 0.0, 3.0,... | [581003.0] | 1 |
1000 rows × 4 columns
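The include/exclude behavior in this example can be modeled as prefix matching on column names, where an exact entry in `dataset` overrides a broader exclusion. The following pure-Python sketch reproduces the columns returned above. `select_columns` and its matching rules are inferred from this one example and are assumptions, not Wallaroo's documented algorithm.

```python
from typing import List


def _matches(column: str, selector: str) -> bool:
    # A selector matches the column itself or any field nested under it ("in" -> "in.float_input")
    return column == selector or column.startswith(selector + ".")


def select_columns(columns: List[str], dataset: List[str], dataset_exclude: List[str]) -> List[str]:
    """Hypothetical model of the dataset / dataset_exclude filtering."""
    selected = []
    for column in columns:
        included = any(_matches(column, d) for d in dataset)
        excluded = any(_matches(column, e) for e in dataset_exclude)
        explicitly_named = column in dataset  # exact names override a broader exclusion
        if included and (not excluded or explicitly_named):
            selected.append(column)
    return selected


columns = ["time", "in.float_input", "out.variable", "anomaly.count",
           "anomaly.in_between", "anomaly.too_high", "anomaly.too_low"]
print(select_columns(columns, ["time", "in", "out", "anomaly.count"], ["anomaly"]))
# ['time', 'in.float_input', 'out.variable', 'anomaly.count']
```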
With the demonstration complete, we undeploy the pipeline and return the resources back to the cluster.
sample_pipeline.undeploy()
name | validation-demo |
---|---|
created | 2024-07-29 20:21:48.755346+00:00 |
last_updated | 2024-07-29 20:33:20.980095+00:00 |
deployed | False |
workspace_id | 9 |
workspace_name | validation-house-price-demonstration |
arch | x86 |
accel | none |
tags | |
versions | d9c8f3f8-3418-48df-88a8-cf1ad812af70, bc5edde2-0e3e-46bd-aedf-7fc974ec6845, abb554f1-9d8e-47f5-8f24-ad4d7b9488c5, bae26ed5-972a-4456-8182-a87d5e676ea4, 55f16b02-18ed-4ddb-a4d4-c1f2a1b0f416, b4d30cf4-709f-44bb-a642-c857bda9eb4b, 4abc82f7-f548-4f18-ae50-6b348b81a56d, 58347b92-8ba8-40b1-b42d-8aaa393057bd |
steps | anomaly-housing-model |
published | False |