Wallaroo Glossary

Definitions for Wallaroo terms and concepts.
Assays: An assay in Wallaroo is a series of built-in, automated validation checks used for data analysis. Data scientists define and use assays to troubleshoot models in Wallaroo by generating drift detection reports and alerts on a given model's data inputs or inference results. Wallaroo assays consist of the following:
  • baseline: A set of data within expected values.
  • window: An assay window sets the interval of time between assay runs and the width (the time period) of the data to analyze and compare against the baseline. Typically the window runs at 24-hour intervals with a width of 24 hours.
  • threshold: The amount of variance allowed between the established baseline and the analyzed data.
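To make the baseline, window, and threshold concrete, here is a minimal sketch. It assumes the Population Stability Index (PSI) over equal-width bins as the drift metric and uses an arbitrary example threshold of 0.1; both are illustrative choices, not Wallaroo's implementation. The names `psi`, `assay_windows`, `run_assay`, and `AssayResult` are hypothetical. Windows open every `interval` and cover `width` of data, defaulting to the 24-hour/24-hour setup described above.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

def psi(baseline, window, bins=5):
    """Population Stability Index over equal-width bins spanning both samples."""
    lo = min(min(baseline), min(window))
    hi = max(max(baseline), max(window))
    width = (hi - lo) / bins or 1.0  # guard against identical constant samples

    def proportions(values):
        counts = [0] * bins
        for v in values:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        # floor each share at a tiny epsilon so empty bins don't hit log(0)
        return [max(c / len(values), 1e-6) for c in counts]

    b, w = proportions(baseline), proportions(window)
    return sum((wi - bi) * math.log(wi / bi) for bi, wi in zip(b, w))

@dataclass
class AssayResult:
    window_start: datetime
    score: float
    alert: bool

def assay_windows(start, end, interval=timedelta(hours=24), width=timedelta(hours=24)):
    """Yield (window_start, window_end): a new window every `interval`, each `width` long."""
    t = start
    while t + width <= end:
        yield t, t + width
        t += interval

def run_assay(baseline, readings, start, end, threshold=0.1):
    """Compare each window of (timestamp, value) readings against the baseline."""
    results = []
    for w_start, w_end in assay_windows(start, end):
        values = [v for ts, v in readings if w_start <= ts < w_end]
        if values:  # skip windows with no inference data
            score = psi(baseline, values)
            results.append(AssayResult(w_start, score, score > threshold))
    return results

# Usage: day 1 matches the baseline, day 2 is shifted upward by 5.
baseline = [i / 10 for i in range(100)]
start = datetime(2024, 1, 1)
readings = [(start + timedelta(minutes=i), v) for i, v in enumerate(baseline)]
readings += [(start + timedelta(days=1, minutes=i), v + 5) for i, v in enumerate(baseline)]
for r in run_assay(baseline, readings, start, start + timedelta(days=2)):
    print(r.window_start.date(), round(r.score, 3), "ALERT" if r.alert else "ok")
```

Only the shifted second day exceeds the threshold. A real deployment would choose the metric, binning, and threshold to suit the data.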
Data Connectors and Connections: Data connectors encapsulate the details of a data source or sink scoped to a specific Wallaroo workspace. This allows customers to specify the external data stores that the Wallaroo platform uses to ingest data for running models, and to stream inference results and logs to external data stores and other services. Data connectors are managed in Wallaroo through pipeline orchestration.
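The workspace scoping of connections can be pictured as a registry keyed first by workspace, then by connection name. This is a hypothetical sketch. `Connection`, `ConnectionRegistry`, the `kind`/`role` fields, and the `s3://example-bucket/...` URLs are illustrative placeholders, not Wallaroo's connection API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Connection:
    name: str
    kind: str    # storage type label, e.g. "S3" (illustrative only)
    role: str    # "source" (ingest for inference) or "sink" (export results/logs)
    details: Dict[str, str] = field(default_factory=dict)  # URLs, credentials, etc.

class ConnectionRegistry:
    """Connections are stored and resolved per workspace, never globally."""

    def __init__(self) -> None:
        self._by_workspace: Dict[str, Dict[str, Connection]] = {}

    def add(self, workspace: str, conn: Connection) -> None:
        self._by_workspace.setdefault(workspace, {})[conn.name] = conn

    def get(self, workspace: str, name: str) -> Optional[Connection]:
        # A connection defined in another workspace is invisible here.
        return self._by_workspace.get(workspace, {}).get(name)

# Usage with placeholder values:
registry = ConnectionRegistry()
registry.add("fraud-ws", Connection("daily-input", "S3", "source",
                                    {"url": "s3://example-bucket/input/"}))
registry.add("fraud-ws", Connection("results-out", "S3", "sink",
                                    {"url": "s3://example-bucket/results/"}))
print(registry.get("fraud-ws", "daily-input"))
print(registry.get("housing-ws", "daily-input"))  # None: different workspace
```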
Engine Replicas: Engine replicas are instances of the Wallaroo inference engine that are dynamically created to allocate compute resources for running inferences on deployed models.
Logs: Wallaroo provides Pipeline Logs as part of its architecture. These are records of inference requests and their results, including the input data, the output results, any shadow outputs, check failures, and other details.
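As a rough illustration of what a pipeline log record carries, the sketch below models one record per inference request, with the fields listed above. The field names, the `records_with_failures` and `shadow_disagreements` helpers, and the sample data are all hypothetical; they do not reflect Wallaroo's actual log schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List

@dataclass
class InferenceLogRecord:
    time: datetime
    inputs: Dict[str, Any]
    outputs: Dict[str, Any]
    # shadow model name -> that model's outputs for the same request
    shadow_outputs: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    # names of validation checks this request failed
    check_failures: List[str] = field(default_factory=list)

def records_with_failures(logs):
    return [r for r in logs if r.check_failures]

def shadow_disagreements(logs, key):
    """Records where any shadow model's `key` output differs from the primary output."""
    return [r for r in logs
            if any(s.get(key) != r.outputs.get(key) for s in r.shadow_outputs.values())]

now = datetime(2024, 1, 1, tzinfo=timezone.utc)
logs = [
    InferenceLogRecord(now, {"amount": 10.0}, {"fraud": 0}, {"challenger": {"fraud": 0}}),
    InferenceLogRecord(now, {"amount": -5.0}, {"fraud": 1}, {"challenger": {"fraud": 0}},
                       ["amount_non_negative"]),
]
print(len(records_with_failures(logs)), len(shadow_disagreements(logs, "fraud")))
```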
MLOps APIs: MLOps APIs are a set of endpoints that allow external systems in a customer's ecosystem (CI/CD, ML platforms, etc.) to interact with the Wallaroo platform programmatically and perform the necessary model operations. MLOps APIs support user management, workspace management, model upload, pipeline deployment, model version management, pipeline version management, pipeline inferencing, model serving, inference log generation, and model monitoring assay generation.
Model: A model, or machine learning (ML) model, is an algorithm developed using historical datasets (also known as training data) to generate a specific set of insights. Trained models can operate on future (non-training) datasets and offer predictions (also known as inferences). Inferences help inform decisions based on the similarity between historical data and future data.
Some examples of using an ML model are:
  • Approving credit card transactions based on fraud predictions.
  • Recommending a specific therapy to a patient based on diagnosis predictions.
  • Recommending a specific product to purchase in an e-commerce experience based on the consumer's likelihood of being interested in it, their predicted shopping budget, and the projected revenue from this consumer.

In Wallaroo, a model refers to the object that results from converting a model file artifact. A model file is typically produced by training a model outside of Wallaroo (for example, a .zip or .onnx file). Uploading the model file so that it can run in a given Wallaroo runtime (ONNX, TensorFlow, etc.) results in a Wallaroo model object. Model artifacts imported to Wallaroo may include other files related to a given model, such as preprocessing files, postprocessing files, training sets, and notebooks.
Model Artifacts: Artifacts, or model artifacts, are the specific files and elements used or generated during model training to develop, test, and track the algorithm from early experimentation to a fully trained model. Artifacts are intended to represent everything an AI team needs to run and track a model from development to production.
Artifacts typically include:
  • Test datasets
  • Model worksheets/notebooks
  • Model test results
  • Model files generated from training a model (.onnx files, .zip files, etc.)
  • Pre-processing methods to prepare the data for consumption by the model.
  • Post-processing methods to format the data for use by external services.

As models transition from the development stage to the production stage, it is important to keep track of model artifacts. This ensures a smooth transition from development to production and enables the AI teams developing the models to continuously optimize and tune them using production insights.
Model Serving: Model serving is the process of integrating an ML model with the operations that consume its predictions to make decisions. In Wallaroo, model serving is managed through ML pipelines, which expose an integration endpoint (also called an inference endpoint) for consuming the model's predictions/inferences.
Model Version: Model version refers to the version of the model object in Wallaroo. In Wallaroo, a new model version is created when a new model file (artifact) is uploaded under the same model object name.
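The versioning rule above (same name plus new artifact produces a new version) can be sketched as a small in-memory registry. This is a hypothetical illustration. `ModelRegistry`, `upload`, and `latest` are invented names, and the integer version numbers and SHA-256 digests are illustrative choices, not how Wallaroo identifies model versions.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class ModelVersion:
    name: str
    version: int
    sha256: str  # digest of the uploaded artifact bytes

class ModelRegistry:
    def __init__(self) -> None:
        self._versions: Dict[str, List[ModelVersion]] = {}  # name -> versions, oldest first

    def upload(self, name: str, artifact: bytes) -> ModelVersion:
        """Uploading under an existing name appends a new version to that model."""
        history = self._versions.setdefault(name, [])
        mv = ModelVersion(name, len(history) + 1, hashlib.sha256(artifact).hexdigest())
        history.append(mv)
        return mv

    def latest(self, name: str) -> ModelVersion:
        return self._versions[name][-1]

    def versions(self, name: str) -> List[ModelVersion]:
        return list(self._versions.get(name, []))

# Usage: two uploads under "ccfraud" produce versions 1 and 2.
reg = ModelRegistry()
v1 = reg.upload("ccfraud", b"onnx-bytes-v1")
v2 = reg.upload("ccfraud", b"onnx-bytes-v2")
other = reg.upload("housing", b"xyz")
print(reg.latest("ccfraud").version, other.version)
```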
Pipeline: Pipelines, or Wallaroo ML pipelines, are the vehicle for deploying, serving, and monitoring ML models in the Wallaroo platform. A pipeline contains all the artifacts required for a specific ML process:
  • Trained models
  • Intermediate data processing required for the models
  • Data postprocessing required for exporting the models’ outputs to other services.

The simplest pipeline is a single model; more complicated pipelines can include chained models, or multiple models compared in an A/B test or other types of tests (shadow, canary, etc.). All artifacts deployed in the pipeline (preprocessing or intermediate models, the core models, and postprocessing) are represented within the pipeline as pipeline steps.
Once deployed, pipelines use the underlying compute resources orchestrated by the Wallaroo engine to serve and monitor ML models in the Wallaroo platform.
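The chained-step structure described above can be sketched as an ordered list of callables, where each step's output feeds the next. This is a hypothetical sketch, not the Wallaroo SDK. `Pipeline`, `add_step`, `infer`, and the three step functions are invented, and `fraud_model` is a fixed linear score standing in for a trained model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Step = Callable[[Record], Record]

@dataclass
class Pipeline:
    name: str
    steps: List[Step] = field(default_factory=list)

    def add_step(self, step: Step) -> "Pipeline":
        self.steps.append(step)
        return self  # allow chaining add_step calls

    def infer(self, record: Record) -> Record:
        # each step's output becomes the next step's input
        for step in self.steps:
            record = step(record)
        return record

def preprocess(record: Record) -> Record:
    return {"features": [record["amount"] / 1000.0, float(record["is_foreign"])]}

def fraud_model(record: Record) -> Record:
    # stand-in for a trained model: a fixed linear score
    x = record["features"]
    return {"score": 0.6 * x[0] + 0.4 * x[1]}

def postprocess(record: Record) -> Record:
    return {"fraud": record["score"] > 0.5, "score": round(record["score"], 3)}

pipeline = Pipeline("ccfraud").add_step(preprocess).add_step(fraud_model).add_step(postprocess)
print(pipeline.infer({"amount": 250.0, "is_foreign": True}))
```

An A/B or shadow pipeline would replace a single model step with a step that routes or duplicates the record across several models. That variant is left out here to keep the sketch short.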
Pipeline Orchestration: Pipeline orchestration allows ML pipelines in Wallaroo to run models, along with their pre- and postprocessing requirements, based on specific triggers or a schedule. Pipeline orchestration supports data scientists running repeatable, large-scale experiments in Wallaroo on evolving datasets. As currently defined, pipeline orchestration would offer the ability to schedule pipelines in Wallaroo to run on a given schedule, using predefined connections to ingest data from external data sources for running models in Wallaroo, and to stream model results and logs to external data stores for analysis outside of Wallaroo.
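The scheduled ingest, infer, and export cycle can be sketched as a loop over run times. This is a hypothetical illustration. The fixed-interval `schedule`, the `orchestrate` helper, and the in-memory dictionary and list standing in for source and sink connections are assumptions, not Wallaroo's orchestration API.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List

Record = Dict[str, float]

def schedule(start: datetime, every: timedelta, count: int) -> List[datetime]:
    """The first `count` run times of a fixed-interval schedule."""
    return [start + i * every for i in range(count)]

def orchestrate(run_times: List[datetime],
                read_source: Callable[[datetime], List[Record]],
                pipeline: Callable[[Record], Record],
                write_sink: Callable[[datetime, List[Record]], None]) -> None:
    """For each scheduled run: ingest from the source, infer, and export to the sink."""
    for run_at in run_times:
        batch = read_source(run_at)
        results = [pipeline(record) for record in batch]
        write_sink(run_at, results)

# Usage with in-memory stand-ins for the source and sink connections.
runs = schedule(datetime(2024, 1, 1, 2, 0), timedelta(days=1), 3)
incoming = {runs[0]: [{"x": 1.0}], runs[1]: [{"x": 2.0}, {"x": 3.0}]}
exported = []

def double(record: Record) -> Record:  # toy "pipeline"
    return {"y": record["x"] * 2}

orchestrate(runs, lambda t: incoming.get(t, []), double,
            lambda t, res: exported.append((t, res)))
for run_at, res in exported:
    print(run_at.isoformat(), res)
```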
Pipeline Step: A pipeline step is one stage of processing in an ML pipeline. Most commonly a step is a model, but steps also include data processing and transformation algorithms that prepare incoming data for running inferences in the model, or outgoing data for consumption by other services.
Product User Persona: Product user personas in Wallaroo align with specific job titles and responsibilities within a Wallaroo customer's organization. Wallaroo supports the following personas:
  • The Wallaroo platform admin
  • The data/ML scientist
  • The ML engineer

A fourth persona, the analyst, is being considered to complement the role of the ML/data scientist by focusing on aligning model analytics with business analytics.
User Type: User types in Wallaroo are split into two categories:
  • Platform User (default): Users who work within their assigned workspaces and collaborate with other users as needed in the context of a workspace in Wallaroo.
  • Platform Admin: Users who set up the Wallaroo instance and have access to all workspaces in Wallaroo regardless of their membership.
Wallaroo Admin Console: The Wallaroo Admin Console is the interface Wallaroo administrators use to manage Wallaroo platform configurations in the cloud environment where Wallaroo is installed. The admin console is also used to manage installations and version updates of the Wallaroo platform.
Wallaroo Engine: The Wallaroo Engine is a distributed computing and orchestration framework, written in Rust, that ensures the underlying computational resources are used efficiently to deploy, run inferences on, manage, and observe ML models in Wallaroo. The engine offers a set of runtime environments that run models developed with the most common ML frameworks (TensorFlow, scikit-learn, XGBoost, PyTorch, Hugging Face, etc.) with high performance and reduced infrastructure overhead.
Workspace: Workspaces are used to segment groups of models and pipelines into separate environments. Each workspace can have users who manage it as workspace owners and users who access it as workspace collaborators, controlling the models and pipelines assigned to that workspace. For more information, see the Workspace Management Guide.
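Workspace membership and the two user types combine into a simple access rule: platform admins can reach every workspace, while platform users reach only the workspaces they belong to. The sketch below is a hypothetical model of that rule. `UserType`, `Role`, `Workspace`, `User`, `can_access`, and `can_manage` are invented names. Granting management only to owners is an assumption, because the text does not say whether admins can also manage membership.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List

class UserType(Enum):
    PLATFORM_USER = "user"    # default
    PLATFORM_ADMIN = "admin"

class Role(Enum):
    OWNER = "owner"
    COLLABORATOR = "collaborator"

@dataclass
class Workspace:
    name: str
    members: Dict[str, Role] = field(default_factory=dict)  # user email -> role
    models: List[str] = field(default_factory=list)
    pipelines: List[str] = field(default_factory=list)

@dataclass
class User:
    email: str
    user_type: UserType = UserType.PLATFORM_USER

def can_access(user: User, ws: Workspace) -> bool:
    # platform admins see every workspace regardless of membership
    return user.user_type is UserType.PLATFORM_ADMIN or user.email in ws.members

def can_manage(user: User, ws: Workspace) -> bool:
    # assumption: only workspace owners manage the workspace
    return ws.members.get(user.email) is Role.OWNER

ws = Workspace("fraud-detection",
               {"ana@example.com": Role.OWNER, "bo@example.com": Role.COLLABORATOR})
admin = User("root@example.com", UserType.PLATFORM_ADMIN)
print(can_access(admin, ws), can_manage(User("bo@example.com"), ws))
```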