Pipelines represent how data is submitted to your uploaded Machine Learning (ML) models. Pipelines allow you to:
Submit information through an uploaded file or through the Pipeline’s Deployment URL.
Have the Pipeline submit the information to one or more models in sequence.
Once complete, output the result from the model(s).
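As a rough mental model, each input passes through the pipeline's steps in order, and each step's output becomes the next step's input. The sketch below is plain Python rather than the Wallaroo SDK. `scale` and `classify` are hypothetical stand-ins for uploaded models.

```python
from typing import Callable, Iterable, List

# A hypothetical "model": any callable that maps a list of floats to a list of floats.
Step = Callable[[List[float]], List[float]]


def run_pipeline(steps: Iterable[Step], data: List[float]) -> List[float]:
    """Pass data through each step in sequence; the last step's output is the result."""
    result = data
    for step in steps:
        result = step(result)
    return result


# Two hypothetical steps: a scaler followed by a thresholding "classifier".
def scale(values: List[float]) -> List[float]:
    return [v / 10.0 for v in values]


def classify(values: List[float]) -> List[float]:
    return [1.0 if v > 0.5 else 0.0 for v in values]


print(run_pipeline([scale, classify], [2.0, 7.0, 9.0]))  # [0.0, 1.0, 1.0]
```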
How to Create and Use a Pipeline
Pipelines can be created through the Wallaroo Dashboard and the Wallaroo SDK. For specifics on using the SDK, see the Wallaroo SDK Guide. For more detailed instructions and step-by-step examples with real models and data, see the Wallaroo Tutorials.
The following instructions are focused on how to use the Wallaroo Dashboard for creating, deploying, and undeploying pipelines.
How to Create a Pipeline using the Wallaroo Dashboard
Prerequisites
Before creating a pipeline through the Wallaroo Dashboard, a model must be uploaded into the workspace through the SDK. For more information, see the Wallaroo SDK Essentials Guide.
IMPORTANT NOTICE
Pipeline names are not required to be unique. You can have 50 pipelines all named my-pipeline, which can cause confusion in determining which pipeline to use.
It is recommended that organizations agree on a naming convention and select an existing pipeline to use rather than creating a new one each time. See the SDK guides for more information on how to select an existing pipeline.
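Because duplicate names are allowed, it can help to check for them before picking a pipeline by name. The following is a minimal sketch, assuming you already have simple name/id records for your pipelines (for example, summarized from the SDK's list_pipelines output). The record shape and both helper functions are hypothetical and not part of the Wallaroo SDK.

```python
from collections import Counter
from typing import Dict, List


def duplicate_pipeline_names(pipelines: List[Dict[str, str]]) -> Dict[str, int]:
    """Return every pipeline name used more than once, with its number of occurrences."""
    counts = Counter(p["name"] for p in pipelines)
    return {name: n for name, n in counts.items() if n > 1}


def select_pipeline(pipelines: List[Dict[str, str]], name: str) -> Dict[str, str]:
    """Return the only pipeline with this name, refusing to guess when the name is ambiguous."""
    matches = [p for p in pipelines if p["name"] == name]
    if not matches:
        raise LookupError(f"no pipeline named {name!r}")
    if len(matches) > 1:
        raise LookupError(f"{len(matches)} pipelines named {name!r}; select one by id instead")
    return matches[0]


# Hypothetical records summarizing existing pipelines (name and id only).
pipelines = [
    {"name": "my-pipeline", "id": "1"},
    {"name": "my-pipeline", "id": "2"},
    {"name": "ccfraudpipeline", "id": "3"},
]
print(duplicate_pipeline_names(pipelines))  # {'my-pipeline': 2}
print(select_pipeline(pipelines, "ccfraudpipeline")["id"])  # 3
```

Raising on an ambiguous name, rather than silently returning the first match, forces the caller to disambiguate by id.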
To create a pipeline:
From the Wallaroo Dashboard, set the current workspace from the top left dropdown list.
Select View Pipelines from the pipeline’s row.
From the upper right hand corner, select Create Pipeline.
Enter the following:
Pipeline Name: The name of the new pipeline. Although Wallaroo does not enforce it, pipeline names should be unique across the Wallaroo instance.
Add Pipeline Step: Select the models to be used as the pipeline steps.
When finished, select Next.
Review the name of the pipeline and the steps. If any adjustments need to be made, select either Back to rename the pipeline or Add Step(s) to change the pipeline’s steps.
When finished, select Build to create the pipeline in this workspace. The pipeline will be built and be ready for deployment within a minute.
How to Deploy and Undeploy a Pipeline using the Wallaroo Dashboard
Deployed pipelines create new namespaces in the Kubernetes environment where the Wallaroo instance is deployed, and allocate resources from the Kubernetes environment to run the pipeline and its steps.
To deploy a pipeline:
From the Wallaroo Dashboard, set the current workspace from the top left dropdown list.
Select View Pipelines from the pipeline’s row.
Select the pipeline to deploy.
From the right navigation panel, select Deploy.
A popup module will request verification to deploy the pipeline. Select Deploy again to deploy the pipeline.
Undeploying a pipeline returns resources back to the Kubernetes environment and removes the namespaces created when the pipeline was deployed.
To undeploy a pipeline:
From the Wallaroo Dashboard, set the current workspace from the top left dropdown list.
Select View Pipelines from the pipeline’s row.
Select the pipeline to undeploy.
From the right navigation panel, select Undeploy.
A popup module will request verification to undeploy the pipeline. Select Undeploy again to undeploy the pipeline.
How to View Pipeline Details and Metrics
To view a pipeline’s details:
From the Wallaroo Dashboard, set the current workspace from the top left dropdown list.
Select View Pipelines from the pipeline’s row.
To view details on the pipeline, select the name of the pipeline.
A list of the pipeline’s details will be displayed.
To view a pipeline’s metrics:
From the Wallaroo Dashboard, set the current workspace from the top left dropdown list.
Select View Pipelines from the pipeline’s row.
To view details on the pipeline, select the name of the pipeline.
A list of the pipeline’s details will be displayed.
Select Metrics. Use the time period drop down to select the time range to display metrics for:
Requests per second
Cluster inference rate
Inference latency
The Audit Log and Anomaly Log are available to view further details of the pipeline’s activities.
Pipeline Details
The following is available from the Pipeline Details page:
The name of the pipeline.
The pipeline ID: This is in UUID format.
Pipeline steps: The steps and the models in each pipeline step.
Version History: how the pipeline has been updated over time.
1 - Wallaroo Pipeline Tag Management
How to manage tags and pipelines.
Tags can be used to label, search, and track pipelines across a Wallaroo instance. The following guide will demonstrate how to:
Create a tag for a specific pipeline.
Remove a tag for a specific pipeline.
The example shown uses the pipeline ccfraudpipeline.
Steps
Add a New Tag to a Pipeline
To add a tag to a pipeline using the Wallaroo Dashboard:
Log into your Wallaroo instance.
Select the workspace the pipelines are associated with.
Select View Pipelines.
From the Pipeline Select Dashboard page, select the pipeline to update.
From the Pipeline Dashboard page, select the + icon under the name of the pipeline and its hash value.
Enter the name of the new tag. When complete, press Enter. The tag will be set for this pipeline.
Remove a Tag from a Pipeline
To remove a tag from a pipeline:
IMPORTANT NOTE
Once a tag is deleted from a pipeline, the deletion cannot be undone.
Log into your Wallaroo instance.
Select the workspace the pipelines are associated with.
Select View Pipelines.
From the Pipeline Select Dashboard page, select the pipeline to update.
From the Pipeline Dashboard page, select the X for the tag to delete. The tag will be removed from the pipeline.
Wallaroo SDK Tag Management
Tags are applied to either model versions or pipelines. This allows organizations to track different versions of models, and search for what pipelines have been used for specific purposes such as testing versus production use.
Create Tag
Tags are created with the Wallaroo client command create_tag(String tagname). This creates the tag and makes it available for use.
The tag will be saved to the variable currentTag to be used in the rest of these examples.
# Now we create our tag
currentTag = wl.create_tag("My Great Tag")
List Tags
Tags are listed with the Wallaroo client command list_tags(), which shows all tags and what models and pipelines they have been assigned to.
Tags are used with pipelines to track different pipelines that are built or deployed with different features or functions.
Add Tag to Pipeline
Tags are added to a pipeline through the Wallaroo Tag add_to_pipeline(pipeline_id) method, where pipeline_id is the pipeline’s integer id.
For this example, we will add currentTag to tagtest_pipeline, then verify it has been added through the list_tags and list_pipelines commands.
# add this tag to the pipeline
currentTag.add_to_pipeline(tagtest_pipeline.id())
{'pipeline_pk_id': 1, 'tag_pk_id': 1}
Search Pipelines by Tag
Pipelines can be searched through the Wallaroo Client search_pipelines(search_term) method, where search_term is a string value for tags assigned to the pipelines.
In this example, the text “My Great Tag” that corresponds to currentTag will be searched for and displayed.
wl.search_pipelines('My Great Tag')
| name | version | creation_time | last_updated_time | deployed | tags | steps |
|---|---|---|---|---|---|---|
| tagtestpipeline | 5a4ff3c7-1a2d-4b0a-ad9f-78941e6f5677 | 2022-29-Nov 17:15:21 | 2022-29-Nov 17:15:21 | (unknown) | My Great Tag | |
Remove Tag from Pipeline
Tags are removed from a pipeline with the Wallaroo Tag remove_from_pipeline(pipeline_id) command, where pipeline_id is the integer value of the pipeline’s id.
For this example, currentTag will be removed from tagtest_pipeline. This will be verified through the list_tags and search_pipelines commands.
## remove from pipeline
currentTag.remove_from_pipeline(tagtest_pipeline.id())
{'pipeline_pk_id': 1, 'tag_pk_id': 1}
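Conceptually, tags and pipelines form a many-to-many mapping: adding a tag links it to a pipeline, removing it unlinks it, and a search returns the pipelines linked to a tag. The following in-memory sketch illustrates only that behavior. TagRegistry and its methods are hypothetical, are not the Wallaroo SDK (which stores tags on the server), and assume exact matching of the search term.

```python
from collections import defaultdict
from typing import Dict, List, Set


class TagRegistry:
    """Hypothetical in-memory model of tag-to-pipeline assignments (not the Wallaroo SDK)."""

    def __init__(self) -> None:
        # Each tag maps to the set of pipeline names it is assigned to.
        self._pipelines_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def add_tag(self, tag: str, pipeline: str) -> None:
        self._pipelines_by_tag[tag].add(pipeline)

    def remove_tag(self, tag: str, pipeline: str) -> None:
        # discard() does nothing if the tag was never assigned to this pipeline.
        self._pipelines_by_tag[tag].discard(pipeline)

    def search(self, search_term: str) -> List[str]:
        """Sorted names of pipelines carrying a tag equal to search_term (exact match assumed)."""
        return sorted(self._pipelines_by_tag.get(search_term, set()))


registry = TagRegistry()
registry.add_tag("My Great Tag", "tagtestpipeline")
print(registry.search("My Great Tag"))  # ['tagtestpipeline']
registry.remove_tag("My Great Tag", "tagtestpipeline")
print(registry.search("My Great Tag"))  # []
```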
2 - Wallaroo Assays Management
How to create and use assays to monitor model inputs and outputs.
Model Insights and Interactive Analysis Introduction
Wallaroo provides the ability to perform interactive analysis so organizations can explore the data from a pipeline and learn how the data is behaving. With this information and the knowledge of your particular business use case you can then choose appropriate thresholds for persistent automatic assays as desired.
IMPORTANT NOTE
Model insights operates over time and is difficult to demo in a notebook without pre-canned data. We assume you have an active pipeline that has been running and making predictions over time and show you the code you may use to analyze your pipeline.
Monitoring tasks called assays monitor a model’s predictions or the data coming into the model against an established baseline. Changes in the distribution of this data can be an indication of model drift, or of a change in the environment that the model was trained for. This can indicate whether a model needs to be retrained or whether the environment data should be analyzed for accuracy or other needs.
Assay Details
Assays contain the following attributes:
| Attribute | Default | Description |
|---|---|---|
| Name | | The name of the assay. Assay names must be unique. |
| Baseline Data | | Data that is known to be “typical” (typically distributed) and can be used to determine whether the distribution of new data has changed. |
| Schedule | Every 24 hours at 1 AM | New assays are configured to run a new analysis every 24 hours starting at the end of the baseline period. This period can be configured through the SDK. |
| Group Results | Daily | Groups assay results by Daily (the default), Weekly, or Monthly. |
| Metric | PSI | Population Stability Index (PSI) is an entropy-based measure of the difference between distributions. Maximum Difference of Bins measures the maximum difference between the baseline and current distributions (as estimated using the bins). Sum of the Difference of Bins sums up the difference of occurrences in each bin between the baseline and current distributions. |
| Threshold | 0.1 | The threshold for deciding whether the difference between distributions, as evaluated by the above metric, is large (the distributions are different) or small (the distributions are similar). The default of 0.1 is generally a good threshold when using PSI as the metric. |
| Number of Bins | 5 | Sets the number of bins used to partition the baseline data for comparison against how future data falls into these bins. By default, the binning scheme is percentile (quantile) based. The binning scheme can be configured (see Bin Mode, below). Note that the total number of bins includes the set number plus the left_outlier and the right_outlier, so the total number of bins is the set number + 2. |
| Bin Mode | Quantile | Sets the binning scheme. Quantile binning defines the bins using percentile ranges (each bin holds the same percentage of the baseline data). Equal binning defines the bins using equally spaced data value ranges, like a histogram. Custom allows users to set the range of values for each bin, with the Left Outlier always starting at Min (below the minimum values detected from the baseline) and the Right Outlier always ending at Max (above the maximum values detected from the baseline). |
| Bin Weight | Equally Weighted | The bin weights can be set to either Equally Weighted (the default), where each bin is weighted equally, or Custom, where the bin weights can be adjusted depending on which are considered more important for detecting model drift. |
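To make the three metrics concrete, the following sketch computes them from per-bin proportions (the share of baseline and window values falling in each bin) and compares the result against the default 0.1 threshold. It uses the standard textbook PSI formula, sum of (window - baseline) × ln(window / baseline) over bins. The small epsilon floor for empty bins and the use of absolute differences for the two bin-difference metrics are assumptions; Wallaroo's implementation may differ in such details. The bin proportions are hypothetical, and outlier bins are omitted for brevity.

```python
import math
from typing import List

EPSILON = 1e-6  # assumed floor so empty bins do not produce log(0) or division by zero
THRESHOLD = 0.1  # the default threshold described above


def psi(baseline: List[float], window: List[float]) -> float:
    """Population Stability Index over matching per-bin proportions."""
    total = 0.0
    for b, w in zip(baseline, window):
        b, w = max(b, EPSILON), max(w, EPSILON)
        total += (w - b) * math.log(w / b)
    return total


def max_diff(baseline: List[float], window: List[float]) -> float:
    """Largest absolute difference between matching bins."""
    return max(abs(w - b) for b, w in zip(baseline, window))


def sum_diff(baseline: List[float], window: List[float]) -> float:
    """Sum of absolute differences between matching bins."""
    return sum(abs(w - b) for b, w in zip(baseline, window))


# Hypothetical proportions for five bins: the window has shifted mass toward the last bin.
baseline = [0.2, 0.2, 0.2, 0.2, 0.2]
window = [0.1, 0.2, 0.2, 0.2, 0.3]
score = psi(baseline, window)
print(round(score, 4))  # 0.1099
print(score > THRESHOLD)  # True
```

Here the shift of 10% of the data from the first bin to the last is just enough to push PSI above 0.1.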
Manage Assays via the Wallaroo Dashboard
Assays can be created and used via the Wallaroo Dashboard.
Accessing Assays Through the Pipeline Dashboard
Assays created through the Wallaroo Dashboard are accessed through the Pipeline Dashboard through the following process.
Log into the Wallaroo Dashboard.
Select the workspace containing the pipeline with the models being monitored from the Change Current Workspace and Workspace Management drop down.
Select View Pipelines.
Select the pipeline containing the models being monitored.
Select Insights.
The Wallaroo Assay Dashboard contains the following elements. For more details of each configuration type, see the Model Insights and Assays Introduction.
(A) Filter Assays: Filter assays by the following:
Name
Status:
Active: The assay is currently running.
Paused: The assay is paused until restarted.
Drift Detected: One or more drifts have been detected.
Sort By
Sort by Creation Date: Sort by the most recent Assays first.
Last Assay Run: Sort by the most recent Assay Last Run date.
(B) Create Assay: Create a new Assay.
(C) Pause/Start Assay: Pause or Start an assay.
(D) Collapse Assay: Collapse or Expand the assay for view.
(E) Time Period for Assay Data: Set the time period for data to be used in displaying the assay results.
Show Assay Details: View assay details. See Assay Details View for more details.
(F) Assay Events: Select an individual assay event to see more details. See View Assay Alert Details for more information.
Assay Details View
The following details are visible by selecting the Assay View Details icon:
(A) Assay Name: The name of the assay displayed.
(B) Model: The model being monitored.
(C) Baseline: The time period used to generate the baseline.
(D) Last Run: The date and time the assay was last run.
(E) Next Run: The future date and time the assay will be run again. NOTE: If the assay is paused, then it will not run at the scheduled time. When unpaused, the date will be updated to the next date and time that the assay will be run.
(F) Aggregation Type: The aggregation type used with the assay.
(G) Threshold: The threshold value used for the assay.
(H) Metric: The metric type used for the assay.
(I) Number of Bins: The number of bins used for the assay.
(J) Bin Weight: The weight applied to each bin.
(K) Bin Mode: The type of bin mode applied to each bin.
View Assay Alert Details
To view details on an assay alert:
Select the data with available alert data.
Hover the mouse over a specific Assay Event Alert to view the date and time of the event and the alert value.
Select the Assay Event Alert to view the Baseline and Window details of the alert including the left_outlier and right_outlier.
Hover over a bar chart graph to view additional details.
Select the ⊗ symbol to exit the Assay Event Alert details and return to the Assay View.
Build an Assay Through the Pipeline Dashboard
To create a new assay through the Wallaroo Pipeline Dashboard:
Log into the Wallaroo Dashboard.
Select the workspace containing the pipeline with the models being monitored from the Change Current Workspace and Workspace Management drop down.
Select View Pipelines.
Select the pipeline containing the models being monitored.
Select Insights.
Select +Create Assay.
On the Assay Name module, enter the following:
<figure>
<img src="/images/wallaroo-pipeline-management/wallaroo-assays-management/wallaroo-assay-assay-name-module.png"
alt="Assay Name Module" width="800"/>
</figure>
Assay Name: The name of the new assay.
Select Model to monitor: Select the model that will be monitored by the assay.
(A) Select the data to use for the baseline. This can either be set with a preset recent time period (last 30 seconds, last 60 seconds, etc.) or with a custom date range.
Once selected, a preview graph of the baseline values will be displayed (B). Note that this may take a few seconds to generate.
Select Next to continue.
On the Settings Module:
Set the date and time range to view values generated by the assay. This can either be set with a preset recent time period (last 30 seconds, last 60 seconds, etc.) or with a custom date range.
New assays are configured to run a new analysis every 24 hours starting at the end of the baseline period. For information on how to adjust the scheduling period and other settings for the assay scheduling window, see the SDK section on how to Schedule Assay.
Assays are built with the Wallaroo client.build_assay(assayName, pipeline, modelName, baselineStart, baselineEnd) method, which returns a wallaroo.assay_config.AssayBuilder object. The method requires the following parameters:
| Parameter | Type | Description |
|---|---|---|
| assayName | String | The human friendly name of the created assay. |
| pipeline | Wallaroo.pipeline | The pipeline the assay is assigned to. |
| modelName | String | The model to perform the assay on. |
| baselineStart | DateTime | When to start the baseline period. |
| baselineEnd | DateTime | When to end the baseline period. |
When called, this method polls the pipeline between the baseline start and end periods to establish what values are considered normal outputs for the specified model.
By default, assays run a new analysis every 24 hours starting at the end of the baseline period, using a 24 hour observation window.
In this example, an assay named “example assay” will be created and stored in the variable assay_builder.
By default assays are scheduled to run every 24 hours starting immediately after the baseline period ends. This scheduled period is referred to as the assay window and has the following properties:
width: The period of data included in the analysis. By default this is 24 hours.
interval: How often the analysis is run (every 5 minutes, every 24 hours, etc.). By default this is the window width.
start: When the analysis should start. By default this is at the end of the baseline period.
These are adjusted through the assay's window_builder method, which provides the following methods:
add_width: Sets the width of the window.
add_interval: Sets how often the analysis is run.
In this example, the assay will be set to run an analysis every 12 hours on the previous 24 hours of data:
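The following stdlib-only sketch illustrates the resulting schedule rather than the SDK call. It assumes each analysis window covers [start + i × interval, start + i × interval + width), so a 24 hour width with a 12 hour interval produces windows that overlap by 12 hours, while the default (interval equal to width) produces back-to-back windows. The assay_windows helper and the baseline end time are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def assay_windows(
    start: datetime, width: timedelta, interval: timedelta, count: int
) -> List[Tuple[datetime, datetime]]:
    """Return the first `count` (window_start, window_end) pairs.

    Each window covers `width` of data, and a new window begins every `interval`.
    """
    return [(start + i * interval, start + i * interval + width) for i in range(count)]


baseline_end = datetime(2023, 1, 1, 0, 0)  # hypothetical end of the baseline period
for begin, end in assay_windows(baseline_end, timedelta(hours=24), timedelta(hours=12), 3):
    print(begin, "->", end)
# 2023-01-01 00:00:00 -> 2023-01-02 00:00:00
# 2023-01-01 12:00:00 -> 2023-01-02 12:00:00
# 2023-01-02 00:00:00 -> 2023-01-03 00:00:00
```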
Interactive baselines can be run against an assay to generate a list of the values that are established in the baseline. This is done through the AssayBuilder.interactive_baseline_run() method, which returns the following:
| Parameter | Type | Description |
|---|---|---|
| count | Integer | The number of records evaluated. |
| min | Float | The minimum value found. |
| max | Float | The maximum value found. |
| mean | Float | The mean value derived from the values evaluated. |
| median | Float | The median value derived from the values evaluated. |
| std | Float | The standard deviation from the values evaluated. |
| start | DateTime | The start date for the records to evaluate. |
| end | DateTime | The end date for the records to evaluate. |
In this example, an interactive baseline will be run against a new assay, and the results displayed:
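As an illustration of what those summary fields mean, the following sketch computes them with Python's statistics module from a list of (timestamp, value) records. It is not the SDK's interactive_baseline_run. The data are hypothetical, the use of the population standard deviation is an assumption, and start and end are approximated by the earliest and latest record timestamps.

```python
import statistics
from datetime import datetime
from typing import Dict, List, Tuple


def baseline_summary(records: List[Tuple[datetime, float]]) -> Dict[str, object]:
    """Summarize (timestamp, value) records into count/min/max/mean/median/std/start/end."""
    times = [t for t, _ in records]
    values = [v for _, v in records]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.pstdev(values),  # population standard deviation (assumption)
        "start": min(times),  # earliest record timestamp (assumed stand-in for the start date)
        "end": max(times),  # latest record timestamp (assumed stand-in for the end date)
    }


# Hypothetical baseline: eight hourly model outputs.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
records = [(datetime(2023, 1, 1, hour), v) for hour, v in enumerate(values)]
summary = baseline_summary(records)
print(summary["mean"], summary["median"], summary["std"])  # 5.0 4.5 2.0
```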
Histogram, kernel density estimate (KDE), and Empirical Cumulative Distribution (ecdf) charts can be generated from an assay to provide a visual representation of the values evaluated and where they fit within the established baseline.
These methods are part of the AssayBuilder object and are as follows:
| Method | Description |
|---|---|
| baseline_histogram() | Creates a histogram chart from the assay baseline. |
| baseline_kde() | Creates a kernel density estimate (KDE) chart from the assay baseline. |
| baseline_ecdf() | Creates an Empirical Cumulative Distribution (ecdf) chart from the assay baseline. |
In this example, each of the three different charts will be generated from an assay:
assay_builder.baseline_histogram()
assay_builder.baseline_kde()
assay_builder.baseline_ecdf()
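These methods render charts. As a sketch of what an ECDF chart plots, the following computes, for each distinct baseline value x, the fraction of baseline values less than or equal to x. The ecdf_points helper and the sample values are hypothetical and independent of the SDK.

```python
from bisect import bisect_right
from typing import List, Tuple


def ecdf_points(baseline: List[float]) -> List[Tuple[float, float]]:
    """(x, fraction of baseline values <= x) for each distinct baseline value, in order."""
    ordered = sorted(baseline)
    n = len(ordered)
    return [(x, bisect_right(ordered, x) / n) for x in sorted(set(ordered))]


# Hypothetical baseline values.
print(ecdf_points([1.0, 2.0, 2.0, 3.0, 5.0]))
# [(1.0, 0.2), (2.0, 0.6), (3.0, 0.8), (5.0, 1.0)]
```

A window whose ECDF curve sits well to the left or right of the baseline's curve has shifted toward lower or higher values.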
Run Interactive Assay
Instead of waiting for the next scheduled run, users can run an assay on demand through the wallaroo.assay_config.interactive_run method. This is usually done with an assay created through the wallaroo.client.build_assay method, which returns a wallaroo.assay_config.AssayBuilder object.
The following example creates the AssayBuilder object then runs an interactive assay.
As defined under Assay Details, bins can be adjusted by number of bins, bin mode, and bin weight.
Number of Bins
The number of bins can be changed from the default of 5 through the wallaroo.assay_config.summarizer_builder.add_num_bins method. Note that the total number of bins includes the set bins plus the left_outlier and right_outlier bins, so the total number of bins is the set number of bins + 2.
The following example shows how to change the number of bins to 10 in an assay, then displays the assay results in a chart with 12 bins in total (10 manually set, 1 left_outlier, 1 right_outlier).
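To show where the extra two bins come from, the following sketch builds quantile edges for 10 bins from a baseline and counts window values per bin, with values below the baseline minimum going to left_outlier and values above the maximum going to right_outlier. It is a plain-Python illustration, not the SDK's summarizer. The interval convention (each interior bin includes its upper edge) and the sample data are assumptions.

```python
import statistics
from bisect import bisect_left
from typing import List


def quantile_edges(baseline: List[float], num_bins: int) -> List[float]:
    """num_bins + 1 edges so each interior bin holds about the same share of the baseline."""
    cuts = statistics.quantiles(baseline, n=num_bins, method="inclusive")
    return [min(baseline)] + cuts + [max(baseline)]


def bin_counts(edges: List[float], values: List[float]) -> List[int]:
    """Counts per bin as [left_outlier, bin 1..n, right_outlier], i.e. n + 2 entries."""
    n = len(edges) - 1
    counts = [0] * (n + 2)
    for v in values:
        if v < edges[0]:
            counts[0] += 1  # left_outlier: below the baseline minimum
        elif v > edges[-1]:
            counts[n + 1] += 1  # right_outlier: above the baseline maximum
        else:
            # Interior bin k covers (edges[k-1], edges[k]]; the minimum itself goes in bin 1.
            counts[max(bisect_left(edges, v), 1)] += 1
    return counts


baseline = [float(x) for x in range(1, 101)]  # hypothetical baseline values
edges = quantile_edges(baseline, num_bins=10)
counts = bin_counts(edges, baseline + [-5.0, 250.0])  # hypothetical window with two outliers
print(len(counts))  # 12
print(counts[0], counts[-1])  # 1 1
```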
BinMode.QUANTILE (Default): Defines the bins using percentile ranges (each bin holds the same percentage of the baseline data).
BinMode.EQUAL defines the bins using equally spaced data value ranges, like a histogram.
Custom (BinMode.PROVIDED) allows users to set the range of values for each bin, with the Left Outlier always starting at Min (below the minimum values detected from the baseline) and the Right Outlier always ending at Max (above the maximum values detected from the baseline). When using BinMode.PROVIDED the edges are passed as an array value.
Bin modes are set through the wallaroo.assay_config.summarizer_builder.add_bin_mode method.
The following examples will demonstrate changing the bin mode to equal, then setting custom provided values.
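As a plain-Python contrast between the modes: quantile edges give each bin the same share of the baseline, whereas equal edges split the baseline's value range into equally wide intervals, and provided edges are used as given. The sketch below computes equal-width edges and validates a provided edge list. Both helpers are hypothetical, and the strictly-increasing check is an assumed sanity rule rather than documented SDK behavior.

```python
from typing import List


def equal_edges(baseline: List[float], num_bins: int) -> List[float]:
    """num_bins + 1 equally spaced edges spanning the baseline's min and max."""
    lo, hi = min(baseline), max(baseline)
    step = (hi - lo) / num_bins
    return [lo + i * step for i in range(num_bins)] + [hi]


def provided_edges(edges: List[float]) -> List[float]:
    """Custom (provided) edges are used as given; assumed to be strictly increasing."""
    if any(b <= a for a, b in zip(edges, edges[1:])):
        raise ValueError("bin edges must be strictly increasing")
    return list(edges)


baseline = [0.0, 2.5, 4.0, 7.5, 10.0]  # hypothetical baseline values
print(equal_edges(baseline, 5))  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
print(provided_edges([0.0, 1.0, 5.0, 10.0]))  # [0.0, 1.0, 5.0, 10.0]
```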
Bin weights can be adjusted so that bins with more importance are given more prominence in the final assay score. This is done through the wallaroo.assay_config.summarizer_builder.add_bin_weights method, where the weights are assigned as array values matching the bins.
The following example has 10 bins (12 total including the left_outlier and the right_outlier bins), with a weight of 0 assigned to the first six bins and 1 to the last six, and shows the resulting score from these weights.
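As a rough illustration of how weights change a score, the following sketch multiplies each bin's PSI contribution by its weight. This weighting scheme is an assumption made for illustration, not a statement of how Wallaroo combines weights. With weights of 0 for the first six of 12 bins, a shift confined to those bins contributes nothing; the bin proportions are hypothetical.

```python
import math
from typing import List

EPSILON = 1e-6  # assumed floor to avoid log(0) for empty bins


def weighted_psi(baseline: List[float], window: List[float], weights: List[float]) -> float:
    """PSI where each bin's contribution is multiplied by its weight (assumed scheme)."""
    if not len(baseline) == len(window) == len(weights):
        raise ValueError("baseline, window, and weights must have one entry per bin")
    total = 0.0
    for b, w, weight in zip(baseline, window, weights):
        b, w = max(b, EPSILON), max(w, EPSILON)
        total += weight * (w - b) * math.log(w / b)
    return total


# Hypothetical proportions for 12 bins: [left_outlier, bins 1-10, right_outlier].
baseline = [0.0] + [0.1] * 10 + [0.0]
window = [0.0, 0.05, 0.15] + [0.1] * 8 + [0.0]  # shift only in bins 1 and 2
weights = [0] * 6 + [1] * 6

print(weighted_psi(baseline, window, weights))  # 0.0
print(round(weighted_psi(baseline, window, [1] * 12), 4))  # 0.0549
```

Zero weights effectively tell the assay to ignore drift in those value ranges, so they should be used only when those bins are genuinely unimportant.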