Wallaroo SDK Essentials Guide: Anomaly Detection

How to detect model input and output anomalies via the Wallaroo SDK.

Add Validations Method

Wallaroo provides validations to detect anomalous data from inference inputs and outputs.

Validations are added to a Wallaroo pipeline with the wallaroo.pipeline.add_validations method.

Adding validations to a pipeline takes the format:

pipeline.add_validations(
    validation_name_01 = polars.Expr,
    validation_name_02 = polars.Expr
    ...{additional validations}
)

Validation expressions use the polars Python library, version 0.18.5, in polars.Expr format.

Add Validations Method Parameters

wallaroo.pipeline.add_validations takes the following parameters.

| Field | Type | Description |
|---|---|---|
| {validation name} | Python variable | The name of the validation. This must match Python variable naming conventions. The name count is reserved: any validation submitted with the name count is ignored and a warning is returned, while the other validations in the add_validations request are added to the pipeline. |
| {expression} | polars.Expr | The polars (version 0.18.5) expression to validate the inference input or output data against. Expressions are typically set in the format below. |

Expressions are typically in the following format.

| Validation Section | Description | Example |
|---|---|---|
| Data to Evaluate | The data to evaluate from either an inference input or output. | polars.col(in/out.{column_name}).list.get({index}) retrieves the data from the named input or output column at the given list index. |
| Condition | The expression to perform on the data. If it returns True, an anomaly is detected. | < 0.90 returns True if the data is less than 0.90. |
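
For instance, the two sections combine into a single expression. The sketch below assumes an inference output field named dense_1 whose value is a list, as in the examples later in this guide; the validation name low_confidence is illustrative:

import polars as pl

# Data to evaluate: the first value of the list in the output field `dense_1`.
# Condition: an anomaly is detected when that value is less than 0.90.
pipeline.add_validations(
    low_confidence=pl.col("out.dense_1").list.get(0) < 0.90
)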

Add Validations Method Successful Returns

N/A: Nothing is returned on a successful add_validations request.

Add Validations Method Warning Returns

If any validations violate a warning condition, a warning is returned. Warning conditions include the following.

| Warning | Cause | Result |
|---|---|---|
| count is not allowed. | A validation named count was included in the add_validations request; count is a reserved name. | All validations other than count are added to the pipeline. |
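
As a sketch of this warning condition, assume an add_validations request that includes a validation named count alongside another validation; based on the behavior described above, the count entry is ignored with a warning while the other validation is added:

import polars as pl

pipeline.add_validations(
    # `count` is a reserved name: this validation is ignored and a warning is returned.
    count=pl.col("out.dense_1").list.get(0) > 0.9,
    # This validation is still added to the pipeline.
    fraud=pl.col("out.dense_1").list.get(0) > 0.9
)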

Validation Examples

Common Data Selection Expressions

The following sample expressions demonstrate different methods of selecting which model input or output data to validate.

  • polars.col(in|out.{column_name}).list.get(index): Returns the value at the given list index of a specific field. For example, pl.col("out.dense_1") selects the field dense_1 from the inference output, and list.get(0) returns the first value in that list. Most output values from a Wallaroo inference result are a List of at least length 1, making this a common validation expression.
  • polars.col(in.price_ranges).list.max(): Returns the maximum value from the list of values in the inference input field price_ranges.
  • polars.col(out.price_ranges).mean(): Returns the mean of all values from the output field price_ranges.
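
As an illustrative sketch only (the thresholds below are assumptions, not values from a specific model), these selection expressions plug into add_validations in the same way:

import polars as pl

pipeline.add_validations(
    # Anomaly when the largest value in the input list `price_ranges` exceeds an assumed threshold of 1000.
    max_price_too_high=pl.col("in.price_ranges").list.max() > 1000,
    # Anomaly when the mean of the output field `price_ranges` falls below an assumed threshold of 100.
    mean_price_too_low=pl.col("out.price_ranges").mean() < 100
)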

For example, the following validation fraud detects values for the output of an inference request in the field dense_1 that are greater than 0.9, indicating a transaction has a high likelihood of fraud:

import polars as pl

pipeline.add_validations(
    fraud=pl.col("out.dense_1").list.get(0) > 0.9
)

The following inference output shows the detected anomaly from an inference output:

| | time | in.tensor | out.dense_1 | anomaly.count | anomaly.fraud |
|---|---|---|---|---|---|
| 0 | 2024-02-02 16:05:42.152 | [1.0678324729, 18.1555563975, -1.6589551058, 5… | [0.981199] | 1 | True |

Detecting Input Anomalies

The following validation tests the inputs from sales figures for a week’s worth of sales:

| | week | site_id | sales_count |
|---|---|---|---|
| 0 | [28] | [site0001] | [1357, 1247, 350, 1437, 952, 757, 1831] |

To detect when any daily sales figure falls below 500 units, the validation is:

import polars as pl

pipeline.add_validations(
    minimum_sales=pl.col("in.sales_count").list.min() < 500
)

pipeline.deploy()

pipeline.infer_from_file(previous_week_sales)

For the input provided, the minimum_sales validation would return True, indicating an anomaly.

| | time | out.predicted_sales | anomaly.count | anomaly.minimum_sales |
|---|---|---|---|---|
| 0 | 2023-10-31 16:57:13.771 | [1527] | 1 | True |

Detecting Output Anomalies

The following validation detects an anomaly from an inference output.

  • fraud: Detects when an inference output for the field dense_1 at index 0 is greater than 0.9, indicating fraud.
import polars as pl

# create the pipeline
sample_pipeline = wallaroo.client.build_pipeline("sample-pipeline")

# add a model step
sample_pipeline.add_model_step(ccfraud_model)

# add validations to the pipeline
sample_pipeline.add_validations(
    fraud=pl.col("out.dense_1").list.get(0) > 0.9
    )
sample_pipeline.deploy()

sample_pipeline.infer_from_file("dev_high_fraud.json")

| | time | in.tensor | out.dense_1 | anomaly.count | anomaly.fraud |
|---|---|---|---|---|---|
| 0 | 2024-02-02 16:05:42.152 | [1.0678324729, 18.1555563975, -1.6589551058, 5... | [0.981199] | 1 | True |

Multiple Validations

The following demonstrates multiple validations added to a pipeline at once and their results from inference requests. Two validations that track the same output field and index are applied to a pipeline:

  • fraud: Detects an anomaly when the inference output field dense_1 at index 0 is greater than 0.9.
  • too_low: Detects an anomaly when the inference output field dense_1 at index 0 is lower than 0.05.
sample_pipeline.add_validations(
    fraud=pl.col("out.dense_1").list.get(0) > 0.9,
    too_low=pl.col("out.dense_1").list.get(0) < 0.05
    )

Two separate inferences, where the output of the first is over 0.9 and the output of the second is under 0.05, return the following results.

sample_pipeline.infer_from_file("high_fraud_example.json")

| | time | in.tensor | out.dense_1 | anomaly.count | anomaly.fraud | anomaly.too_low |
|---|---|---|---|---|---|---|
| 0 | 2024-02-02 16:05:42.152 | [1.0678324729, 18.1555563975, -1.6589551058, 5… | [0.981199] | 1 | True | False |
sample_pipeline.infer_from_file("low_fraud_example.json")

| | time | in.tensor | out.dense_1 | anomaly.count | anomaly.fraud | anomaly.too_low |
|---|---|---|---|---|---|---|
| 0 | 2024-02-02 16:05:38.452 | [1.0678324729, 0.2177810266, -1.7115145262, 0… | [0.0014974177] | 1 | False | True |

The following example tracks two validations for a model that takes the previous week’s sales and projects the next week’s average sales with the field predicted_sales.

  • minimum_sales=pl.col("in.sales_count").list.min() < 500: Detects an anomaly when the minimum value in the input field sales_count is under 500.
  • average_sales_too_low=pl.col("out.predicted_sales").list.get(0) < 500: Detects an anomaly when the output field predicted_sales is less than 500.
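
A minimal sketch of adding both validations, assuming a pipeline object (here named sales_pipeline, an illustrative name) with the sales model already attached:

import polars as pl

sales_pipeline.add_validations(
    # Anomaly when any daily figure in the input field `sales_count` falls below 500 units.
    minimum_sales=pl.col("in.sales_count").list.min() < 500,
    # Anomaly when the projected sales in the output field `predicted_sales` are below 500 units.
    average_sales_too_low=pl.col("out.predicted_sales").list.get(0) < 500
)

sales_pipeline.deploy()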

The following inputs produce the results shown below. Note how the anomaly.count value reflects the number of validations that detected an anomaly.

Input 1:

In this example, one day had sales under 500, which triggers the minimum_sales validation to return True. The predicted sales are above 500, causing the average_sales_too_low validation to return False.

| | week | site_id | sales_count |
|---|---|---|---|
| 0 | [28] | [site0001] | [1357, 1247, 350, 1437, 952, 757, 1831] |

Output 1:

| | time | out.predicted_sales | anomaly.count | anomaly.minimum_sales | anomaly.average_sales_too_low |
|---|---|---|---|---|---|
| 0 | 2023-10-31 16:57:13.771 | [1527] | 1 | True | False |

Input 2:

In this example, multiple days have sales under 500, which triggers the minimum_sales validation to return True. The predicted average sales for the next week are below 500, causing the average_sales_too_low validation to return True.

| | week | site_id | sales_count |
|---|---|---|---|
| 0 | [29] | [site0001] | [497, 617, 350, 200, 150, 400, 110] |

Output 2:

| | time | out.predicted_sales | anomaly.count | anomaly.minimum_sales | anomaly.average_sales_too_low |
|---|---|---|---|---|---|
| 0 | 2023-10-31 16:57:13.771 | [325] | 2 | True | True |

Input 3:

In this example, no daily sales figures are below 500, so the minimum_sales validation returns False. The predicted sales for the next week are below 500, causing the average_sales_too_low validation to return True.

| | week | site_id | sales_count |
|---|---|---|---|
| 0 | [30] | [site0001] | [617, 525, 513, 517, 622, 757, 508] |

Output 3:

| | time | out.predicted_sales | anomaly.count | anomaly.minimum_sales | anomaly.average_sales_too_low |
|---|---|---|---|---|---|
| 0 | 2023-10-31 16:57:13.771 | [497] | 1 | False | True |

Compound Validations

The following combines multiple field checks into a single validation. For this, we check for values of out.dense_1 that are between 0.001 and 0.9.

Each sub-expression is wrapped in parentheses () and joined with the & operator. For example:

  • Expression 1: pl.col("out.dense_1").list.get(0) < 0.9
  • Expression 2: pl.col("out.dense_1").list.get(0) > 0.001
  • Compound Expression: (pl.col("out.dense_1").list.get(0) < 0.9) & (pl.col("out.dense_1").list.get(0) > 0.001)
sample_pipeline.add_validations(
    in_between_2=(pl.col("out.dense_1").list.get(0) < 0.9) & (pl.col("out.dense_1").list.get(0) > 0.001)
)

results = sample_pipeline.infer_from_file("./data/cc_data_1k.df.json")

results.loc[results['anomaly.in_between_2'] == True] 

| | time | in.dense_input | out.dense_1 | anomaly.count | anomaly.fraud | anomaly.in_between_2 | anomaly.too_low |
|---|---|---|---|---|---|---|---|
| 4 | 2024-02-08 17:48:49.305 | [0.5817662108, 0.097881551, 0.1546819424, 0.47... | [0.0010916889] | 1 | False | True | False |
| 7 | 2024-02-08 17:48:49.305 | [1.0379636346, -0.152987302, -1.0912561862, -0... | [0.0011294782] | 1 | False | True | False |
| 8 | 2024-02-08 17:48:49.305 | [0.1517283662, 0.6589966337, -0.3323713647, 0.... | [0.0018743575] | 1 | False | True | False |
| 9 | 2024-02-08 17:48:49.305 | [-0.1683100246, 0.7070470317, 0.1875234948, -0... | [0.0011520088] | 1 | False | True | False |
| 10 | 2024-02-08 17:48:49.305 | [0.6066235674, 0.0631839305, -0.0802961973, 0.... | [0.0016568303] | 1 | False | True | False |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 982 | 2024-02-08 17:48:49.305 | [-0.0932906169, 0.2837744937, -0.061094265, 0.... | [0.0010192394] | 1 | False | True | False |
| 983 | 2024-02-08 17:48:49.305 | [0.0991458877, 0.5813808183, -0.3863062246, -0... | [0.0020678043] | 1 | False | True | False |
| 992 | 2024-02-08 17:48:49.305 | [1.0458395446, 0.2492453605, -1.5260449285, 0.... | [0.0013128221] | 1 | False | True | False |
| 998 | 2024-02-08 17:48:49.305 | [1.0046377125, 0.0343666504, -1.3512533246, 0.... | [0.0011070371] | 1 | False | True | False |
| 1000 | 2024-02-08 17:48:49.305 | [0.6118805301, 0.1726081102, 0.4310545502, 0.5... | [0.0012498498] | 1 | False | True | False |