Wallaroo SDK Essentials Guide: Anomaly Detection
Add Validations Method
Wallaroo provides validations to detect anomalous data from inference inputs and outputs.
Validations are added to a Wallaroo pipeline with the wallaroo.pipeline.add_validations method.
Adding validations to a pipeline takes the format:
pipeline.add_validations(
    validation_name_01 = polars.Expr,
    validation_name_02 = polars.Expr
    ...{additional validations}
)
Validation expressions use the Expression API of the polars Python library, version 0.18.5.
Add Validations Method Parameters
wallaroo.pipeline.add_validations takes the following parameters.
Field | Type | Description |
---|---|---|
{validation name} | Python variable | The name of the validation. This must match Python variable naming conventions. The {validation name} may not be count. Any validations submitted with the name count are ignored and a warning is returned. Any other validations in the add_validations request are added to the pipeline. |
{expression} | Polars Expression | The expression to validate the inference input or output data against. Must be in polars version 0.18.5 polars.Expr format. Expressions are typically set in the format below. |
Expressions are typically in the following format.
Validation Section | Description | Example |
---|---|---|
Data to Evaluate | The data to evaluate from either an inference input or output. | polars.col("in|out.{column_name}").list.get({index}) Retrieves the value at the given list index from the named input or output column. |
Condition | The expression to perform on the data. If it returns True, then an anomaly is detected. | < 0.90 Returns True if the data is less than 0.90. |
Add Validations Method Successful Returns
N/A: Nothing is returned on a successful add_validations request.
Add Validations Method Warning Returns
If any validations violate a warning condition, a warning is returned. Warning conditions include the following.
Warning | Cause | Result |
---|---|---|
count is not allowed. | A validation named count was included in the add_validations request. count is a reserved name. | All validations other than count are added to the pipeline. |
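The behavior in the table above can be mimicked on the client side. The following is a sketch only: split_reserved and RESERVED_NAMES are hypothetical names, not part of the Wallaroo SDK, and plain strings stand in for polars expressions so the example has no dependencies.

```python
import warnings

# Assumption: count is the only reserved validation name, per the table above.
RESERVED_NAMES = {"count"}

def split_reserved(validations):
    """Return (accepted, rejected) validations, keyed by name."""
    accepted = {k: v for k, v in validations.items() if k not in RESERVED_NAMES}
    rejected = {k: v for k, v in validations.items() if k in RESERVED_NAMES}
    for name in rejected:
        warnings.warn(f"{name} is not allowed.")
    return accepted, rejected

# Placeholder strings stand in for polars expressions.
requested = {"fraud": "<expr>", "count": "<expr>", "too_low": "<expr>"}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    accepted, rejected = split_reserved(requested)

print(sorted(accepted))  # ['fraud', 'too_low']
print(len(caught))       # 1
```

Because add_validations takes keyword arguments, a dictionary filtered like this could then be passed on as pipeline.add_validations(**accepted).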
Validation Examples
Common Data Selection Expressions
The following sample expressions demonstrate different methods of selecting which model input or output data to validate.
- polars.col("in|out.{column_name}").list.get(index): Returns the value at the given index of a specific field. For example, pl.col("out.dense_1") selects the inference output field dense_1, and list.get(0) returns the first value in that list. Most output values from a Wallaroo inference result are a List of at least length 1, making this a common validation expression.
- polars.col("in.price_ranges").list.max(): Returns the maximum value from the list of values in the inference request input field price_ranges.
- polars.col("out.price_ranges").mean(): Returns the mean of all values from the output field price_ranges.
For example, the following validation fraud detects values of the inference output field dense_1 that are greater than 0.9, indicating a transaction has a high likelihood of fraud:
import polars as pl
pipeline.add_validations(
    fraud = pl.col("out.dense_1").list.get(0) > 0.9
)
The following inference output shows the detected anomaly from an inference output:
| | time | in.tensor | out.dense_1 | anomaly.count | anomaly.fraud |
---|---|---|---|---|---|
0 | 2024-02-02 16:05:42.152 | [1.0678324729, 18.1555563975, -1.6589551058, 5… | [0.981199] | 1 | True |
Detecting Input Anomalies
The following validation checks the inference input, which holds one week of daily sales figures:
| | week | site_id | sales_count |
---|---|---|---|
0 | [28] | [site0001] | [1357, 1247, 350, 1437, 952, 757, 1831] |
To detect any daily sales figure that falls below 500 units, the validation is:
import polars as pl
pipeline.add_validations(
    minimum_sales=pl.col("in.sales_count").list.min() < 500
)
pipeline.deploy()
pipeline.infer_from_file(previous_week_sales)
For the input provided, the minimum_sales validation would return True, indicating an anomaly.
| | time | out.predicted_sales | anomaly.count | anomaly.minimum_sales |
---|---|---|---|---|
0 | 2023-10-31 16:57:13.771 | [1527] | 1 | True |
Detecting Output Anomalies
The following validation detects an anomaly in an output.
- fraud: Detects when the inference output field dense_1 at index 0 is greater than 0.9, indicating fraud.
import polars as pl
# create the pipeline
sample_pipeline = wallaroo.client.build_pipeline("sample-pipeline")
# add a model step
sample_pipeline.add_model_step(ccfraud_model)
# add validations to the pipeline
sample_pipeline.add_validations(
    fraud=pl.col("out.dense_1").list.get(0) > 0.9
)
sample_pipeline.deploy()
sample_pipeline.infer_from_file("dev_high_fraud.json")
| | time | in.tensor | out.dense_1 | anomaly.count | anomaly.fraud |
---|---|---|---|---|---|
0 | 2024-02-02 16:05:42.152 | [1.0678324729, 18.1555563975, -1.6589551058, 5... | [0.981199] | 1 | True |
Multiple Validations
The following demonstrates multiple validations added to a pipeline at once and their results from inference requests. Two validations that track the same output field and index are applied to a pipeline:
- fraud: Detects an anomaly when the inference output field dense_1 at index 0 is greater than 0.9.
- too_low: Detects an anomaly when the inference output field dense_1 at index 0 is lower than 0.05.
sample_pipeline.add_validations(
    fraud=pl.col("out.dense_1").list.get(0) > 0.9,
    too_low=pl.col("out.dense_1").list.get(0) < 0.05
)
Two separate inferences, where the output of the first is over 0.9 and the output of the second is under 0.05, would return the following.
sample_pipeline.infer_from_file("high_fraud_example.json")
| | time | in.tensor | out.dense_1 | anomaly.count | anomaly.fraud | anomaly.too_low |
---|---|---|---|---|---|---|
0 | 2024-02-02 16:05:42.152 | [1.0678324729, 18.1555563975, -1.6589551058, 5… | [0.981199] | 1 | True | False |
sample_pipeline.infer_from_file("low_fraud_example.json")
| | time | in.tensor | out.dense_1 | anomaly.count | anomaly.fraud | anomaly.too_low |
---|---|---|---|---|---|---|
0 | 2024-02-02 16:05:38.452 | [1.0678324729, 0.2177810266, -1.7115145262, 0…. | [0.0014974177] | 1 | False | True |
The following example tracks two validations for a model that takes the previous week’s sales and projects the next week’s average sales in the field predicted_sales.
- minimum_sales=pl.col("in.sales_count").list.min() < 500: The lowest value in the input field sales_count is under 500.
- average_sales_too_low=pl.col("out.predicted_sales").list.get(0) < 500: The output field predicted_sales is less than 500.
The following inputs return the following values. Note how the anomaly.count value reflects the number of validations that detect an anomaly.
Input 1:
In this example, one day had sales under 500, which triggers the minimum_sales validation to return True. The predicted sales are above 500, causing the average_sales_too_low validation to return False.
| | week | site_id | sales_count |
---|---|---|---|
0 | [28] | [site0001] | [1357, 1247, 350, 1437, 952, 757, 1831] |
Output 1:
| | time | out.predicted_sales | anomaly.count | anomaly.minimum_sales | anomaly.average_sales_too_low |
---|---|---|---|---|---|
0 | 2023-10-31 16:57:13.771 | [1527] | 1 | True | False |
Input 2:
In this example, multiple days have sales under 500, which triggers the minimum_sales validation to return True. The predicted average sales for the next week are below 500, causing the average_sales_too_low validation to return True.
| | week | site_id | sales_count |
---|---|---|---|
0 | [29] | [site0001] | [497, 617, 350, 200, 150, 400, 110] |
Output 2:
| | time | out.predicted_sales | anomaly.count | anomaly.minimum_sales | anomaly.average_sales_too_low |
---|---|---|---|---|---|
0 | 2023-10-31 16:57:13.771 | [325] | 2 | True | True |
Input 3:
In this example, no daily sales figures are below 500, so the minimum_sales validation returns False. The predicted sales for the next week are below 500, causing the average_sales_too_low validation to return True.
| | week | site_id | sales_count |
---|---|---|---|
0 | [30] | [site0001] | [617, 525, 513, 517, 622, 757, 508] |
Output 3:
| | time | out.predicted_sales | anomaly.count | anomaly.minimum_sales | anomaly.average_sales_too_low |
---|---|---|---|---|---|
0 | 2023-10-31 16:57:13.771 | [497] | 1 | False | True |
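The relationship between the two validations and anomaly.count across the three inputs can be summarized in plain Python. This is a sketch under the assumption, taken from the note above, that anomaly.count equals the number of validations returning True; evaluate_week is a hypothetical helper, not an SDK function.

```python
def evaluate_week(sales_count, predicted_sales):
    """Mirror the two validations and count how many detected an anomaly."""
    flags = {
        "minimum_sales": min(sales_count) < 500,
        "average_sales_too_low": predicted_sales[0] < 500,
    }
    result = {"anomaly.count": sum(flags.values())}
    result.update({f"anomaly.{name}": value for name, value in flags.items()})
    return result

# (input sales_count, output predicted_sales) for Inputs 1 to 3 above.
weeks = [
    ([1357, 1247, 350, 1437, 952, 757, 1831], [1527]),
    ([497, 617, 350, 200, 150, 400, 110], [325]),
    ([617, 525, 513, 517, 622, 757, 508], [497]),
]

results = [evaluate_week(sales, predicted) for sales, predicted in weeks]
counts = [r["anomaly.count"] for r in results]
print(counts)  # [1, 2, 1]
```

The counts line up with Outputs 1 through 3: one validation fires for the first and third weeks, and both fire for the second.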
Compound Validations
The following combines multiple field checks into a single validation. For this, we will check for values of out.dense_1 that are between 0.001 and 0.9.
Each expression is wrapped in parentheses (), and the expressions are joined with the & operator. For example:
- Expression 1: pl.col("out.dense_1").list.get(0) < 0.9
- Expression 2: pl.col("out.dense_1").list.get(0) > 0.001
- Compound Expression: (pl.col("out.dense_1").list.get(0) < 0.9) & (pl.col("out.dense_1").list.get(0) > 0.001)
sample_pipeline = sample_pipeline.add_validations(
    in_between_2=(pl.col("out.dense_1").list.get(0) < 0.9) & (pl.col("out.dense_1").list.get(0) > 0.001)
)
results = sample_pipeline.infer_from_file("./data/cc_data_1k.df.json")
results.loc[results['anomaly.in_between_2'] == True]
| | time | in.dense_input | out.dense_1 | anomaly.count | anomaly.fraud | anomaly.in_between_2 | anomaly.too_low |
---|---|---|---|---|---|---|---|
4 | 2024-02-08 17:48:49.305 | [0.5817662108, 0.097881551, 0.1546819424, 0.47... | [0.0010916889] | 1 | False | True | False |
7 | 2024-02-08 17:48:49.305 | [1.0379636346, -0.152987302, -1.0912561862, -0... | [0.0011294782] | 1 | False | True | False |
8 | 2024-02-08 17:48:49.305 | [0.1517283662, 0.6589966337, -0.3323713647, 0.... | [0.0018743575] | 1 | False | True | False |
9 | 2024-02-08 17:48:49.305 | [-0.1683100246, 0.7070470317, 0.1875234948, -0... | [0.0011520088] | 1 | False | True | False |
10 | 2024-02-08 17:48:49.305 | [0.6066235674, 0.0631839305, -0.0802961973, 0.... | [0.0016568303] | 1 | False | True | False |
... | ... | ... | ... | ... | ... | ... | ... |
982 | 2024-02-08 17:48:49.305 | [-0.0932906169, 0.2837744937, -0.061094265, 0.... | [0.0010192394] | 1 | False | True | False |
983 | 2024-02-08 17:48:49.305 | [0.0991458877, 0.5813808183, -0.3863062246, -0... | [0.0020678043] | 1 | False | True | False |
992 | 2024-02-08 17:48:49.305 | [1.0458395446, 0.2492453605, -1.5260449285, 0.... | [0.0013128221] | 1 | False | True | False |
998 | 2024-02-08 17:48:49.305 | [1.0046377125, 0.0343666504, -1.3512533246, 0.... | [0.0011070371] | 1 | False | True | False |
1000 | 2024-02-08 17:48:49.305 | [0.6118805301, 0.1726081102, 0.4310545502, 0.5... | [0.0012498498] | 1 | False | True | False |