Wallaroo Operations Guide

The Wallaroo Operations Guide is made to help users and system administrators with their Wallaroo instance. The guides are broken down into the following format:

Install Guides: How to install Wallaroo Community or Enterprise in different environments.
User Management: How to invite users into your Wallaroo instance and manage them.
Workspace Management: How to create a Workspace and manage its users, models, and pipelines.
Model Management: How to convert, upload and replace ML Models into a Wallaroo workspace.
Pipeline Management: How to build, update and deploy pipelines.

Some other resources you may find useful:

Wallaroo Developer Guides: The SDK commands that you’ll use to work with everything Wallaroo can do for you.
Wallaroo Tutorials: A set of tutorials that can be used directly with the Jupyter Hub service built into Wallaroo.

1 - Wallaroo Install Guides

How to set up Wallaroo in the minimum number of steps

This guide is targeted towards system administrators and data scientists who want to work with the easiest, fastest, and free method of running your own machine learning models. Some knowledge of the following will be useful in working with this guide:

Working knowledge of Linux distributions, particularly Ubuntu.
Either Google Cloud Platform (GCP), Amazon Web Services (AWS), or Microsoft Azure experience.
Working knowledge of Kubernetes, mainly kubectl and kots.
Desire to see your models working in the cloud.

Select either Wallaroo Community or Enterprise for the general steps on how to install Wallaroo. Organizations that already have a prepared environment can skip directly to the respective installation guide for their edition of Wallaroo.

Step	Description	Average Setup Time
Setup Environment	Create an environment that meets the Wallaroo prerequisites	30 minutes
Install Wallaroo	Install Wallaroo into a prepared environment	15 minutes

Step	Description	Average Setup Time
Setup Environment	Create an environment that meets the Wallaroo prerequisites	30 minutes
Install Wallaroo	Install Wallaroo into a prepared environment	15 minutes
Configure Wallaroo	Update Wallaroo post-install with DNS integration and user setup.	Variable

Note the differences between the Wallaroo Community and Wallaroo Enterprise. Wallaroo Community is limited to a maximum of 32 cores and 2 pipelines. For organizations that require more resources, the Wallaroo Enterprise Edition may be more appropriate.

For more information, Contact Us so we can help you find out which is better for your needs.

1.1 - Wallaroo Prerequisites Guide

Software and other local system requirements before installing Wallaroo

General Time to Completion: 30 minutes.

Before installing Wallaroo version, verify that the following hardware and software requirements are met.

Environment Requirements

Environment Hardware Requirements

The following system requirements are required for the minimum settings for running Wallaroo in a Kubernetes cloud cluster.

Minimum number of nodes: 4
Minimum Number of CPU Cores: 8
Minimum RAM per node: 16 GB
Minimum Storage: A total of 625 GB of storage will be allocated for the entire cluster based on 5 users with up to four pipelines with five steps per pipeline, with 50 GB allocated per node, including 50 GB specifically for the Jupyter Hub service. Enterprise users who deploy additional pipelines will require an additional 50 GB of storage per lab node deployed.

Wallaroo recommends at least 16 cores total to enable all services. At less than 16 cores, services will have to be disabled to allow basic functionality as detailed in this table.

Note that even when disabling these services, Wallaroo performance may be impacted by the models, pipelines, and data used. The greater the size of the models and steps in a pipeline, the more resources will be required for Wallaroo to operate efficiently. Pipeline resources are set by the pipeline configuration to control how many resources are allocated from the cluster to maintain peak effectiveness for other Wallaroo services. See the following guides for more details.


Cluster Size		8 core	16 core	32 core	Description
Inference		✔	✔	✔	The Wallaroo inference engine that performs inference requests from deployed pipelines.
Dashboard		✔	✔	✔	The graphics user interface for configuring workspaces, deploying pipelines, tracking metrics, and other uses.
Jupyter HUB/Lab					The JupyterHub service for running Python scripts, JupyterNotebooks, and other related tasks within the Wallaroo instance.
	Single Lab	✔	✔	✔
	Multiple Labs	✘	✔	✔
Prometheus		✔	✔	✔	Used for collecting and reporting on metrics. Typical metrics are values such as CPU utilization and memory usage.
	Alerting	✘	✔	✔
	Model Validation	✘	✔	✔
	Dashboard Graphs	✔	✔	✔
Plateau		✘	✔	✔	A Wallaroo developed service for storing inference logs at high speed. This is not a long term service; organizations are encouraged to store logs in long term solutions if required.
	Model Insights	✘	✔	✔
Python API
	Model Conversion	✔	✔	✔	Converts models into a native runtime for use with the Wallaroo inference engine.

To install Wallaroo with minimum services, a configuration file will be used as parts of the kots based installation. For full details on the Wallaroo installation process, see the Wallaroo Install Guides.

Enterprise Network Requirements

The following network requirements are required for the minimum settings for running Wallaroo:

For Wallaroo Enterprise users: 200 IP addresses are required to be allocated per cloud environment.
For Wallaroo Community users: 98 IP addresses are required to be allocated per cloud environment.
DNS services integration is required for Wallaroo Enterprise edition. See the DNS Integration Guide for the instructions on configuring Wallaroo Enterprise with your DNS services.
DNS services integration is required to provide access to the various supporting services that are part of the Wallaroo instance. These include:
- Simplified user authentication and management.
- Centralized services for accessing the Wallaroo Dashboard, Wallaroo SDK and Authentication.
- Collaboration features allowing teams to work together.
- Managed security, auditing and traceability.

Environment Software Requirements

The following software or runtimes are required for Wallaroo 2023.2.1. Most are automatically available through the supported cloud providers.

Software or Runtime	Description	Minimum Supported Version	Preferred Version(s)
Kubernetes	Cluster deployment management	1.23	1.25
containerd	Container Management	1.7.0	1.7.0
kubectl	Kubernetes administrative console application	1.26	1.26

Node Selectors

Wallaroo uses different nodes for various services, which can be assigned to a different node pool to contain resources separate from other nodes. The following nodes selectors can be configured:

ML Engine node selector
ML Engine Load Balance node selector
Database Node Selector
Grafana node selector
Prometheus node selector
Each Lab * Node Selector

For Kots based installs:
- kots Version 1.91.3

For Helm installs:
- helm: Install Helm.
  - Minimum supported version: Helm 3.11.2
- krew: Install Krew
- krew preflight and krew support-bundle.

Cost Calculators

Organizations that intend to install Wallaroo into a Cloud environment can obtain an estimate of environment costs. The Wallaroo Install Guides list recommended virtual machine types and other settings that can be used to calculate costs for the environment.

For more information, see the pricing calculators for the following cloud services:

Kubernetes Admin Requirements

Before installing Wallaroo, the administrative node managing the Kubernetes cluster will require these tools.

kubectl
- For Kots based installs:
  - kots Version 1.91.3
- For Helm installs:
  - helm: Install Helm.
    - Minimum supported version: Helm 3.11.2
  - krew: Install Krew
  - krew preflight and krew support-bundle.

The following are quick guides on how to install kubectl and kots to install and perform updates to Wallaroo. For a helm based installation, see the How to Install Wallaroo Enterprise via Helm guides.

kubectl Quick Install Guide

The following are quick guides for installing kubectl for different operating systems. For more details, see the instructions for your specific environment.

kubectl Install For Deb Package based Linux Systems

For users running a deb based package system such as Ubuntu Linux, the following commands will install kubectl and kots into the local system. They assume the user has sudo level access to the system.

Update the apt-get repository:
```
sudo apt-get update
```

Install the prerequisite software apt-transport-https, ca-certificates, and curl.

sudo apt-get install -y \
    apt-transport-https \
    ca-certificates curl

Download the install the Google Cloud repository key:

sudo curl -fsSLo \
    /usr/share/keyrings/kubernetes-archive-keyring.gpg \
    https://packages.cloud.google.com/apt/doc/apt-key.gpg

Install the Google Cloud repository into the local repository configuration:

echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" \
    | sudo tee /etc/apt/sources.list.d/kubernetes.list

Update the apt-get repository, then install kubectl:

sudo apt-get update

sudo apt-get install -y kubectl

Verify the kubectl installation:
```
kubectl version --client
```

kubectl Install For macOS Using Homebrew

To install kubectl on a macOS system using Homebrew:

Issue the brew install command:
```
brew install kubectl
```
Verify the installation:
```
kubectl version --client
```

kots Quick Install Guide

The following are quick guides for installing kots for different operating systems. For more details, see the instructions for your specific environment.

IMPORTANT NOTE
As of this time, Wallaroo requires kots version 1.91.3. Please verify that version is installed before starting the Wallaroo installation process.

Install curl.
1. For deb based Linux systems, update the apt-get repository and install curl:
```
sudo apt-get update
sudo apt-get install curl
```
2. For macOS based systems curl is installed by default.
Install kots by downloading the script and piping it into the bash shell:
```
curl https://kots.io/install/1.91.3 | REPL_USE_SUDO=y bash
```

Manual Kots Install

A manual method to install KOTS is:

Download from https://github.com/replicatedhq/kots/releases/tag/v1.91.3. Linux and MacOS are supported.
Unpack the release
Rename the kots executable to kubectl-kots.
Copy the renamed kubectl-kots to anywhere on the PATH.

Next, verify successful installation.

~ kubectl kots version

Replicated KOTS 1.91.3

1.2 - Wallaroo Enterprise Install Guides

1.2.1 - Wallaroo Enterprise Comprehensive Install Guide

How to set up Wallaroo Enterprise, environments, and other configurations.

This guide is targeted towards system administrators and data scientists who want to work with the easiest, fastest, and comprehensive method of running your own machine learning models.

A typical installation of Wallaroo follows this process:

Step	Description	Average Setup Time
Setup Environment	Create an environment that meets the Wallaroo prerequisites	30 minutes
Install Wallaroo	Install Wallaroo into a prepared environment	15 minutes
Configure Wallaroo	Update Wallaroo with required post-install configurations.	Variable

Some knowledge of the following will be useful in working with this guide:

Working knowledge of Linux distributions, particularly Ubuntu.
A cloud provider including Google Cloud Platform (GCP), Amazon Web Services (AWS), or Microsoft Azure experience.
Working knowledge of Kubernetes, mainly kubectl and kots or helm.

For more information, Contact Us for additional details.

The following software or runtimes are required for Wallaroo 2023.2.1. Most are automatically available through the supported cloud providers.

Software or Runtime	Description	Minimum Supported Version	Preferred Version(s)
Kubernetes	Cluster deployment management	1.23	1.25
containerd	Container Management	1.7.0	1.7.0
kubectl	Kubernetes administrative console application	1.26	1.26

Custom Configurations

Wallaroo can be configured with custom installations depending on your organization’s needs. The following options are available:

Install Wallaroo to Specific Nodes: How to specify what nodes in a Kubernetes cluster to install Wallaroo to.
Install Wallaroo with Minimum Services: How to install Wallaroo with reduced services to fit into a lower-resource environment.
Taints and Tolerances Guide: How to configure Wallaroo for specific taints and tolerances so ensure that only Wallaroo services are running in specific nodes.

Environment Setup Guides

The following setup guides are used to set up the environment that will host the Wallaroo instance. Verify that the environment is prepared and meets the Wallaroo Prerequisites Guide.

Uninstall Guides

The following is a short version of the uninstallation procedure to remove a previously installed version of Wallaroo. For full details, see the How to Uninstall Wallaroo. These instructions assume administrative use of the Kubernetes command kubectl.

To uninstall a previously installed Wallaroo instance:

Delete any Wallaroo pipelines still deployed with the command kubectl delete namespace {namespace}. Typically these are the pipeline name with some numerical ID. For example, in the following list of namespaces the namespace ccfraud-pipeline-21 correspond to the Wallaroo pipeline ccfraud-pipeline. Verify these are Wallaroo pipelines before deleting.

  -> kubectl get namespaces
    NAME			    STATUS        AGE
    default		        Active        7d4h
    kube-node-lease	    Active		    7d4h
    kube-public		    Active		    7d4h
    ccfraud-pipeline-21    Active         4h23m
    wallaroo             Active         3d6h

  -> kubectl delete namespaces ccfraud-pipeline-21

Use the following bash script or run the commands individually. Warning: If the selector is incorrect or missing from the kubectl command, the cluster could be damaged beyond repair. For a default installation, the selector and namespace will be wallaroo.
```
#!/bin/bash
kubectl delete ns wallaroo && \ 
kubectl delete all,secret,configmap,clusterroles,clusterrolebindings,storageclass,crd \
--selector app.kubernetes.io/part-of=wallaroo --selector kots.io/app-slug=wallaroo
```

Wallaroo can now be reinstalled into this environment.

Environment Setup Guides

AWS Cluster for Wallaroo Enterprise Instructions

The following instructions are made to assist users set up their Amazon Web Services (AWS) environment for running Wallaroo Enterprise using AWS Elastic Kubernetes Service (EKS).

These represent a recommended setup, but can be modified to fit your specific needs.

AWS Prerequisites

To install Wallaroo in your AWS environment based on these instructions, the following prerequisites must be met:

Register an AWS account: https://aws.amazon.com/ and assign the proper permissions according to your organization’s needs.
The Kubernetes cluster must include the following minimum settings:
- Nodes must be OS type Linux with using the containerd driver.
- Role-based access control (RBAC) must be enabled.
- Minimum of 4 nodes, each node with a minimum of 8 CPU cores and 16 GB RAM. 50 GB will be allocated per node for a total of 625 GB for the entire cluster.
- RBAC is enabled.
- Recommended Aws Machine type: c5.4xlarge. For more information, see the AWS Instance Types.
Installed eksctl version 0.101.0 and above.
If the cluster will utilize autoscaling, install the Cluster Autoscaler on AWS.

IMPORTANT NOTE
Organizations that intend to stop and restart their Kubernetes environment on an intentional or regular basis are recommended to use a single availability zone for their nodes. This minimizes issues such as persistent volumes in different availability zones, etc.
Organizations that intend to use Wallaroo Enterprise in a high availability cluster are encouraged to follow best practices including using separate availability zones for redundancy, etc.

AWS Environment Setup Steps

The following steps are guidelines to assist new users in setting up their AWS environment for Wallaroo. Feel free to replace these with commands with ones that match your needs.

These commands make use of the command line tool eksctl which streamlines the process in creating Amazon Elastic Kubernetes Service clusters for our Wallaroo environment.

The following are used for the example commands below. Replace them with your specific environment settings:

AWS Cluster Name: wallarooAWS
Create an AWS EKS Cluster

The following eksctl configuration file is an example of setting up the AWS environment for a Wallaroo cluster, including the static and adaptive nodepools. Adjust these names and settings based on your organizations requirements.

This sample YAML file can be downloaded from here:wallaroo_enterprise_aws_install.yaml

Or copied from here:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: wallarooAWS
  region: us-east-1
  version: "1.25"

addons:
  - name: aws-ebs-csi-driver

iam:
  withOIDC: true
  serviceAccounts:
  - metadata:
      name: cluster-autoscaler
      namespace: kube-system
      labels: {aws-usage: "cluster-ops"}
    wellKnownPolicies:
      autoScaler: true
    roleName: eksctl-cluster-autoscaler-role

nodeGroups:
  - name: mainpool
    instanceType: m5.2xlarge
    desiredCapacity: 3
    containerRuntime: containerd
    amiFamily: AmazonLinux2
    availabilityZones:
      - us-east-1a
  - name: postgres
    instanceType: m5.2xlarge
    desiredCapacity: 1
    taints:
      - key: wallaroo.ai/postgres
        value: "true"
        effect: NoSchedule
    containerRuntime: containerd
    amiFamily: AmazonLinux2
    availabilityZones:
      - us-east-1a
  - name: engine-lb
    instanceType: c5.4xlarge
    minSize: 1
    maxSize: 3
    taints:
      - key: wallaroo.ai/enginelb
        value: "true"
        effect: NoSchedule
    tags:
      k8s.io/cluster-autoscaler/node-template/label/k8s.dask.org/node-purpose: engine-lb
      k8s.io/cluster-autoscaler/node-template/taint/k8s.dask.org/dedicated: "true:NoSchedule"
    iam:
      withAddonPolicies:
        autoScaler: true
    containerRuntime: containerd
    amiFamily: AmazonLinux2
    availabilityZones:
      - us-east-1a
  - name: engine
    instanceType: c5.2xlarge
    minSize: 1
    maxSize: 3
    taints:
      - key: wallaroo.ai/engine
        value: "true"
        effect: NoSchedule
    tags:
      k8s.io/cluster-autoscaler/node-template/label/k8s.dask.org/node-purpose: engine
      k8s.io/cluster-autoscaler/node-template/taint/k8s.dask.org/dedicated: "true:NoSchedule"
    iam:
      withAddonPolicies:
        autoScaler: true
    containerRuntime: containerd
    amiFamily: AmazonLinux2
    availabilityZones:
      - us-east-1a

Create the Cluster

Create the cluster with the following command, which creates the environment and sets the correct Kubernetes version.

eksctl create cluster -f wallaroo_enterprise_aws_install.yaml

During the process the Kubernetes credentials will be copied into the local environment. To verify the setup is complete, use the kubectl get nodes command to display the available nodes as in the following example:

kubectl get nodes

NAME                                           STATUS   ROLES    AGE     VERSION
ip-192-168-21-253.us-east-2.compute.internal   Ready    <none>   13m     v1.23.8-eks-9017834
ip-192-168-30-36.us-east-2.compute.internal    Ready    <none>   13m     v1.23.8-eks-9017834
ip-192-168-38-31.us-east-2.compute.internal    Ready    <none>   9m46s   v1.23.8-eks-9017834
ip-192-168-55-123.us-east-2.compute.internal   Ready    <none>   12m     v1.23.8-eks-9017834
ip-192-168-79-70.us-east-2.compute.internal    Ready    <none>   13m     v1.23.8-eks-9017834
ip-192-168-37-222.us-east-2.compute.internal   Ready    <none>   13m     v1.23.8-eks-9017834

Azure Cluster for Wallaroo Enterprise Instructions

The following instructions are made to assist users set up their Microsoft Azure Kubernetes environment for running Wallaroo Enterprise. These represent a recommended setup, but can be modified to fit your specific needs.

If your prepared to install the environment now, skip to Setup Environment Steps.

There are two methods we’ve detailed here on how to setup your Kubernetes cloud environment in Azure:

Quick Setup Script: Download a bash script to automatically set up the Azure environment through the Microsoft Azure command line interface az.
Manual Setup Guide: A list of the az commands used to create the environment through manual commands.
Azure Prerequisites

To install Wallaroo in your Microsoft Azure environment, the following prerequisites must be met:

Register a Microsoft Azure account: https://azure.microsoft.com/.
Install the Microsoft Azure CLI and complete the Azure CLI Get Started Guide to connect your az application to your Microsoft Azure account.
The Kubernetes cluster must include the following minimum settings:
- Nodes must be OS type Linux the containerd driver as the default.
- Role-based access control (RBAC) must be enabled.
- Minimum of 4 nodes, each node with a minimum of 8 CPU cores and 16 GB RAM. 50 GB will be allocated per node for a total of 625 GB for the entire cluster.
- RBAC is enabled.
- Minimum machine type is set to to Standard_D8s_v4.

IMPORTANT NOTE
Organizations that intend to stop and restart their Kubernetes environment on an intentional or regular basis are recommended to use a single availability zone for their nodes. This minimizes issues such as persistent volumes in different availability zones, etc.
Organizations that intend to use Wallaroo Enterprise in a high availability cluster are encouraged to follow best practices including using separate availability zones for redundancy, etc.

Standard Setup Variables

The following variables are used in the Quick Setup Script and the Manual Setup Guide detailed below. Modify them as best fits your organization.

Variable Name	Default Value	Description
WALLAROO_RESOURCE_GROUP	wallaroogroup	The Azure Resource Group used for the KUbernetes environment.
WALLAROO_GROUP_LOCATION	eastus	The region that the Kubernetes environment will be installed to.
WALLAROO_CONTAINER_REGISTRY	wallarooacr	The Azure Container Registry used for the Kubernetes environment.
WALLAROO_CLUSTER	wallarooaks	The name of the Kubernetes cluster that Wallaroo is installed to.
WALLAROO_SKU_TYPE	Base	The Azure Kubernetes Service SKU type.
WALLAROO_VM_SIZE	Standard_D8s_v4	The VM type used for the standard Wallaroo cluster nodes.
POSTGRES_VM_SIZE	Standard_D8s_v4	The VM type used for the postgres nodepool.
ENGINELB_VM_SIZE	Standard_D8s_v4	The VM type used for the engine-lb nodepool.
ENGINE_VM_SIZE	Standard_F8s_v2	The VM type used for the engine nodepool.

Setup Environment Steps
Quick Setup Script

A sample script is available here, and creates an Azure Kubernetes environment ready for use with Wallaroo Enterprise. This script requires the following prerequisites listed above and uses the variables listed in Standard Setup Variables. Modify them as best fits your organization’s needs.

The following script is available for download: wallaroo_enterprise_azure_expandable.bash

The following steps are geared towards a standard Linux or macOS system that supports the prerequisites listed above. Modify these steps based on your local environment.

Download the script above.
In a terminal window set the script status as execute with the command chmod +x wallaroo_enterprise_install_azure_expandable.bash.
Modify the script variables listed above based on your requirements.
Run the script with either bash wallaroo_enterprise_install_azure_expandable.bash or ./wallaroo_enterprise_install_azure_expandable.bash from the same directory as the script.

Manual Setup Guide

The following steps are guidelines to assist new users in setting up their Azure environment for Wallaroo.
The process uses the variables listed in Standard Setup Variables. Modify them as best fits your organization’s needs.

See the Azure Command-Line Interface for full details on commands and settings.

Setting up an Azure AKS environment is based on the Azure Kubernetes Service tutorial, streamlined to show the minimum steps in setting up your own Wallaroo environment in Azure.

This follows these major steps:

Set Variables

The following are the variables used for the rest of the commands. Modify them as fits your organization’s needs.

WALLAROO_RESOURCE_GROUP=wallaroogroup
WALLAROO_GROUP_LOCATION=eastus
WALLAROO_CONTAINER_REGISTRY=wallarooacr
WALLAROO_CLUSTER=wallarooaks
WALLAROO_SKU_TYPE=Base
WALLAROO_VM_SIZE=Standard_D8s_v4
POSTGRES_VM_SIZE=Standard_D8s_v4
ENGINELB_VM_SIZE=Standard_D8s_v4
ENGINE_VM_SIZE=Standard_F8s_v2

Create an Azure Resource Group

To create an Azure Resource Group for Wallaroo in Microsoft Azure, use the following template:

az group create --name $WALLAROO_RESOURCE_GROUP --location $WALLAROO_GROUP_LOCATION

(Optional): Set the default Resource Group to the one recently created. This allows other Azure commands to automatically select this group for commands such as az aks list, etc.

az configure --defaults group={Resource Group Name}

For example:

az configure --defaults group=wallarooGroup

Create an Azure Container Registry

An Azure Container Registry(ACR) manages the container images for services includes Kubernetes. The template for setting up an Azure ACR that supports Wallaroo is the following:

az acr create -n $WALLAROO_CONTAINER_REGISTRY \
-g $WALLAROO_RESOURCE_GROUP \
--sku $WALLAROO_SKU_TYPE \
--location $WALLAROO_GROUP_LOCATION

Create an Azure Kubernetes Services

Now we can create our Kubernetes service in Azure that will host our Wallaroo with the az aks create command.

az aks create \
--resource-group $WALLAROO_RESOURCE_GROUP \
--name $WALLAROO_CLUSTER \
--node-count 3 \
--generate-ssh-keys \
--vm-set-type VirtualMachineScaleSets \
--load-balancer-sku standard \
--node-vm-size $WALLAROO_VM_SIZE \
--nodepool-name mainpool \
--attach-acr $WALLAROO_CONTAINER_REGISTRY \
--kubernetes-version=1.23.15 \
--zones 1 \
--location $WALLAROO_GROUP_LOCATION

Wallaroo Enterprise Nodepools

Wallaroo Enterprise supports autoscaling and static nodepools. The following commands are used to create both to support the Wallaroo Enterprise cluster.

The following static nodepools are set up to support the Wallaroo cluster for postgres. Update the VM_SIZE based on your requirements.

az aks nodepool add \
--resource-group $WALLAROO_RESOURCE_GROUP \
--cluster-name $WALLAROO_CLUSTER \
--name postgres \
--node-count 1 \
--node-vm-size $POSTGRES_VM_SIZE \
--no-wait \
--node-taints wallaroo.ai/postgres=true:NoSchedule \
--zones 1

The following autoscaling nodepools are used for the engineLB and the engine nodepools. Adjust the settings based on your organizations requirements.

az aks nodepool add \
--resource-group $WALLAROO_RESOURCE_GROUP \
--cluster-name $WALLAROO_CLUSTER \
--name enginelb \
--node-count 1 \
--node-vm-size $ENGINELB_VM_SIZE \
--no-wait \
--enable-cluster-autoscaler \
--max-count 3 \
--min-count 1 \
--node-taints wallaroo.ai/enginelb=true:NoSchedule \
--labels wallaroo-node-type=enginelb \
--zones 1

az aks nodepool add \
--resource-group $WALLAROO_RESOURCE_GROUP \
--cluster-name $WALLAROO_CLUSTER \
--name engine \
--node-count 1 \
--node-vm-size $ENGINE_VM_SIZE \
--no-wait \
--enable-cluster-autoscaler \
--max-count 3 \
--min-count 1 \
--node-taints wallaroo.ai/engine=true:NoSchedule \
--labels wallaroo-node-type=engine \
--zones 1

For additional settings such as customizing the node pools for your Wallaroo Kubernetes cluster to customize the type of virtual machines used and other settings, see the Microsoft Azure documentation on using system node pools.

Download Wallaroo Kubernetes Configuration

Once the Kubernetes environment is complete, associate it with the local Kubernetes configuration by importing the credentials through the following template command:

az aks get-credentials --resource-group $WALLAROO_RESOURCE_GROUP --name $WALLAROO_CLUSTER

Verify the cluster is available through the kubectl get nodes command.

kubectl get nodes

NAME                               STATUS   ROLES   AGE   VERSION
aks-engine-99896855-vmss000000     Ready    agent   40m   v1.23.8
aks-enginelb-54433467-vmss000000   Ready    agent   48m   v1.23.8
aks-mainpool-37402055-vmss000000   Ready    agent   81m   v1.23.8
aks-mainpool-37402055-vmss000001   Ready    agent   81m   v1.23.8
aks-mainpool-37402055-vmss000002   Ready    agent   81m   v1.23.8
aks-postgres-40215394-vmss000000   Ready    agent   52m   v1.23.8

The following instructions are made to assist users set up their Google Cloud Platform (GCP) Kubernetes environment for running Wallaroo. These represent a recommended setup, but can be modified to fit your specific needs. In particular, these instructions will provision a GKE cluster with 56 CPUs in total. Please ensure that your project’s resource limits support that.

Quick Setup Script: Download a bash script to automatically set up the GCP environment through the Google Cloud Platform command line interface gcloud.
Manual Setup Guide: A list of the gcloud commands used to create the environment through manual commands.
GCP Prerequisites

Organizations that wish to run Wallaroo in their Google Cloud Platform environment must complete the following prerequisites:

Register a Google Cloud Account: https://cloud.google.com/
Create a Google Cloud project: https://cloud.google.com/resource-manager/docs/creating-managing-projects
Install gcloud and run gcloud init or gcloud init –console on the local system used to set up your environment: https://cloud.google.com/sdk/docs/install
Enable the Google Compute Engine(GCE): https://cloud.google.com/endpoints/docs/openapi/enable-api
Enable the Google Kubernetes Engine(GKE) on your project: https://console.cloud.google.com/apis/enableflow?apiid=container.googleapis.com
Select a default Computer Engine region and zone: https://cloud.google.com/compute/docs/regions-zones.

IMPORTANT NOTE
Organizations that intend to stop and restart their Kubernetes environment on an intentional or regular basis are recommended to use a single availability zone for their nodes. This minimizes issues such as persistent volumes in different availability zones, etc.
Organizations that intend to use Wallaroo Enterprise in a high availability cluster are encouraged to follow best practices including using separate availability zones for redundancy, etc.

Standard Setup Variables

The following variables are used in the Quick Setup Script and the Manual Setup Guide. Modify them as best fits your organization.

Variable Name	Default Value	Description
WALLAROO_GCP_PROJECT	wallaroo	The name of the Google Project used for the Wallaroo instance.
WALLAROO_CLUSTER	wallaroo	The name of the Kubernetes cluster for the Wallaroo instance.
WALLAROO_GCP_REGION	us-central1	The region the Kubernetes environment is installed to. Update this to your GCP Computer Engine region.
WALLAROO_NODE_LOCATION	us-central1-f	The location the Kubernetes nodes are installed to. Update this to your GCP Compute Engine Zone.
WALLAROO_GCP_NETWORK_NAME	wallaroo-network	The Google network used with the Kubernetes environment.
WALLAROO_GCP_SUBNETWORK_NAME	wallaroo-subnet-1	The Google network subnet used with the Kubernets environment.
DEFAULT_VM_SIZE	e2-standard-8	The VM type used for the default nodepool.
POSTGRES_VM_SIZE	n2-standard-8	The VM type used for the postgres nodepool.
ENGINELB_VM_SIZE	c2-standard-8	The VM type used for the engine-lb nodepool.
ENGINE_VM_SIZE	c2-standard-8	The VM type used for the engine nodepool.

Quick Setup Script

A sample script is available here, and creates a Google Kubernetes Engine cluster ready for use with Wallaroo Enterprise. This script requires the prerequisites listed above and uses the variables as listed in Standard Setup Variables

The following script is available for download: wallaroo_enterprise_gcp_expandable.bash

The following steps are geared towards a standard Linux or macOS system that supports the prerequisites listed above. Modify these steps based on your local environment.

Download the script above.
In a terminal window set the script status as execute with the command chmod +x bash wallaroo_enterprise_gcp_expandable.bash.
Modify the script variables listed above based on your requirements.
Run the script with either bash wallaroo_enterprise_gcp_expandable.bash or ./wallaroo_enterprise_gcp_expandable.bash from the same directory as the script.

Set Variables

The following are the variables used in the environment setup process. Modify them as best fits your organization’s needs.

WALLAROO_GCP_PROJECT=wallaroo
WALLAROO_CLUSTER=wallaroo
WALLAROO_GCP_REGION=us-central1
WALLAROO_NODE_LOCATION=us-central1-f
WALLAROO_GCP_NETWORK_NAME=wallaroo-network
WALLAROO_GCP_SUBNETWORK_NAME=wallaroo-subnet-1
DEFAULT_VM_SIZE=n2-standard-8
POSTGRES_VM_SIZE=n2-standard-8
ENGINELB_VM_SIZE=c2-standard-8
ENGINE_VM_SIZE=c2-standard-8

Manual Setup Guide

The following steps are guidelines to assist new users in setting up their GCP environment for Wallaroo. The variables used in the commands are as listed in Standard Setup Variables listed above. Feel free to replace these with ones that match your needs.

See the Google Cloud SDK for full details on commands and settings.

Create a GCP Network

First create a GCP network that is used to connect to the cluster with the gcloud compute networks create command. For more information, see the gcloud compute networks create page.

gcloud compute networks \
create $WALLAROO_GCP_NETWORK_NAME \
--bgp-routing-mode regional \
--subnet-mode custom

Verify it’s creation by listing the GCP networks:

gcloud compute networks list

Create the GCP Wallaroo Cluster

Once the network is created, the gcloud container clusters create command is used to create a cluster. For more information see the gcloud container clusters create page.

The following is a recommended format, replacing the {} listed variables based on your setup. For Google GKE containerd is enabled by default.

gcloud container clusters \
create $WALLAROO_CLUSTER \
--region $WALLAROO_GCP_REGION \
--node-locations $WALLAROO_NODE_LOCATION \
--machine-type $DEFAULT_VM_SIZE \
--network $WALLAROO_GCP_NETWORK_NAME \
--create-subnetwork name=$WALLAROO_GCP_SUBNETWORK_NAME \
--enable-ip-alias \
--cluster-version=1.23

The command can take several minutes to complete based on the size and complexity of the clusters. Verify the process is complete with the clusters list command:

gcloud container clusters list

Wallaroo Enterprise Nodepools

The following static nodepools can be set based on your organizations requirements. Adjust the settings or names based on your requirements.

gcloud container node-pools create postgres \
--cluster=$WALLAROO_CLUSTER \
--machine-type=$POSTGRES_VM_SIZE \
--num-nodes=1 \
--region $WALLAROO_GCP_REGION \
--node-taints wallaroo.ai/postgres=true:NoSchedule

The following autoscaling nodepools are used for the engine load balancers and Wallaroo engine. Again, replace names and virtual machine types based on your organizations requirements.

gcloud container node-pools create engine-lb \
--cluster=$WALLAROO_CLUSTER \
--machine-type=$ENGINELB_VM_SIZE \
--enable-autoscaling \
--num-nodes=1 \
--min-nodes=0 \
--max-nodes=3 \
--region $WALLAROO_GCP_REGION \
--node-taints wallaroo-engine-lb=true:NoSchedule,wallaroo.ai/enginelb=true:NoSchedule \
--node-labels wallaroo-node-type=engine-lb

gcloud container node-pools create engine \
--cluster=$WALLAROO_CLUSTER \
--machine-type=$ENGINE_VM_SIZE \
--enable-autoscaling \
--num-nodes=1 \
--min-nodes=0 \
--max-nodes=3 \
--region $WALLAROO_GCP_REGION \
--node-taints wallaroo.ai/engine=true:NoSchedule \
--node-labels=wallaroo-node-type=engine

Retrieving Kubernetes Credentials

Once the GCP cluster is complete, the Kubernetes credentials can be installed into the local administrative system with the gcloud container clusters get-credentials (https://cloud.google.com/sdk/gcloud/reference/container/clusters/get-credentials) command:

gcloud container clusters \
get-credentials $WALLAROO_CLUSTER \
--region $WALLAROO_GCP_REGION

To verify the Kubernetes credentials for your cluster have been installed locally, use the kubectl get nodes command. This will display the nodes in the cluster as demonstrated below:

kubectl get nodes

NAME                                         STATUS   ROLES    AGE   VERSION
gke-wallaroo-default-pool-863f02db-7xd4   Ready    <none>   39m   v1.21.6-gke.1503
gke-wallaroo-default-pool-863f02db-8j2d   Ready    <none>   39m   v1.21.6-gke.1503
gke-wallaroo-default-pool-863f02db-hn06   Ready    <none>   39m   v1.21.6-gke.1503
gke-wallaroo-engine-3946eaca-4l3s         Ready    <none>   89s   v1.21.6-gke.1503
gke-wallaroo-engine-lb-2e33a27f-64wb      Ready    <none>   26m   v1.21.6-gke.1503
gke-wallaroo-postgres-d22d73d3-5qp5       Ready    <none>   28m   v1.21.6-gke.1503

Troubleshooting
- What does the error ‘Insufficient project quota to satisfy request: resource “CPUS_ALL_REGIONS”’ mean?
  Make sure that the Compute Engine Zone and Region are properly set based on your organization’s requirements. The instructions above default to us-central1, so change that zone to install your Wallaroo instance in the correct location.

Single Node Linux

Organizations can run Wallaroo within a single node Linux environment that meet the prerequisites.

The following guide is based on installing Wallaroo Enterprise into virtual machines based on Ubuntu 22.04 hosted in Google Cloud Platform (GCP), Amazon Web Services (AWS) and Microsoft Azure. For other environments and configurations, consult your Wallaroo support representative.

Prerequisites

Before starting the bare Linux installation, the following conditions must be met:

Have a Wallaroo Enterprise license file. For more information, you can request a demonstration.
A Linux bare-metal system or virtual machine with at least 32 cores and 64 GB RAM with Ubuntu 20.04 installed.
- See the Install Wallaroo with Minimum Services for installing Wallaroo with reduced services.
650 GB allocated for the root partition, plus 50 GB allocated per node and another 50 GB for the JupyterHub service. Enterprise users who deploy additional pipelines will require an additional 50 GB of storage per lab node deployed.
Ensure memory swapping is disabled by removing it from /etc/fstab if needed.
DNS services for integrating your Wallaroo Enterprise instance. See the DNS Integration Guide for the instructions on configuring Wallaroo Enterprise with your DNS services.
IMPORTANT NOTE
- Wallaroo requires out-bound network connections to download the required container images and other tasks. For situations that require limiting out-bound access, refer to the air-gap installation instructions or contact your Wallaroo support representative. Also note that if Wallaroo is being installed into a cloud environment such as Google Cloud Platform, Microsoft Azure, Amazon Web Services, etc, then additional considerations such as networking, DNS, certificates, and other considerations must be accounted for. For IP address restricted environments, see the Air Gap Installation Guide.
- The steps below are based on minimum requirements for install Wallaroo in a single node environment.
- For situations that require limiting external IP access or other questions, refer to your Wallaroo support representative.
Template Single Node Scripts

The following template scripts are provided as examples on how to create single node virtual machines that meet the requirements listed above in AWS, GCP, and Microsoft Azure environments.

AWS VM Template Script
Dependencies
- AWS CLI
- IAM permissions to create resources. See IAM policies for Amazon EC2.

Download template script here: aws-single-node-vm.bash

# Variables

# The name of the virtual machine
NAME=$USER-demo-vm                     # eg bob-demo-vm

# The image used : ubuntu/images/2023.2.1/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230208
IMAGE_ID=ami-0557a15b87f6559cf

# Instance type meeting the Wallaroo requirements.
INSTANCE_TYPE=c6i.8xlarge # c6a.8xlarge is also acceptable

# key name - generate keys using Amazon EC2 Key Pairs
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html
# Wallaroo people: https://us-east-1.console.aws.amazon.com/ec2/home?region=us-east-1#KeyPairs:v=3 - 
MYKEY=DocNode


# We will whitelist the our source IP for maximum security -- just use 0.0.0.0/0 if you don't care.
MY_IP=$(curl -s https://checkip.amazonaws.com)/32

# Create security group in the Default VPC
aws ec2 create-security-group --group-name $NAME --description "$USER demo" --no-cli-pager

# Open port 22 and 443
aws ec2 authorize-security-group-ingress --group-name $NAME --protocol tcp --port 22 --cidr $MY_IP --no-cli-pager
aws ec2 authorize-security-group-ingress --group-name $NAME --protocol tcp --port 443 --cidr $MY_IP --no-cli-pager

# increase Boot device size to 650 GB
# Change the location from `/tmp/device.json` as required.
# cat <<EOF > /tmp/device.json 
# [{
#   "DeviceName": "/dev/sda1",
#   "Ebs": { 
#     "VolumeSize": 650,
#     "VolumeType": "gp2"
#   }
# }]
# EOF

# Launch instance with a 650 GB Boot device.
aws ec2 run-instances --image-id $IMAGE_ID --count 1 --instance-type $INSTANCE_TYPE \
    --no-cli-pager \
    --key-name $MYKEY \
    --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":650,"VolumeType":"gp2"}}]'  \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$NAME}]" \
    --security-groups $NAME

# Sample output:
# {
#     "Instances": [
#         {
#             ...
#             "InstanceId": "i-0123456789abcdef",     # Keep this instance-id for later
#             ...
#         }
#     ]
# }

#INSTANCEID=YOURINSTANCE
      
# After several minutes, a public IP will be known. This command will retrieve it.
# aws ec2 describe-instances  --output text --instance-id $INSTANCEID \
#    --query 'Reservations[*].Instances[*].{ip:PublicIpAddress}'

# Sample Output
# 12.23.34.56

# KEYFILE=KEYFILELOCATION       #usually ~/.ssh/key.pem - verify this is the same as the key above.
# SSH to the VM - replace $INSTANCEIP
#ssh -i $KEYFILE ubuntu@$INSTANCEIP

# Stop the VM - replace the $INSTANCEID
#aws ec2 stop-instances --instance-id $INSTANCEID

# Restart the VM
#aws ec2 start-instances --instance-id $INSTANCEID

# Clean up - destroy VM
#aws ec2 terminate-instances --instance-id $INSTANCEID

Azure VM Template Script
Dependencies
- Azure CLI

Download template script here: azure-single-node-vm.bash

#!/bin/bash

# Variables list.  Update as per your organization's settings
NAME=$USER-demo-vm                          # eg bob-demo-vm
RESOURCEGROUP=YOURRESOURCEGROUP
LOCATION=eastus
IMAGE=Canonical:0001-com-ubuntu-server-jammy:22_04-lts:22.04.202301140

# Pick a location
az account list-locations  -o table |egrep 'US|----|Name'

# Create resource group
az group create -l $LOCATION --name $USER-demo-$(date +%y%m%d)

# Create VM. This will create ~/.ssh/id_rsa and id_rsa.pub - store these for later use.
az vm create --resource-group $RESOURCEGROUP --name $NAME --image $IMAGE  --generate-ssh-keys \
   --size Standard_D32s_v4 --os-disk-size-gb 500 --public-ip-sku Standard

# Sample output
# {
#   "location": "eastus",
#   "privateIpAddress": "10.0.0.4",
#   "publicIpAddress": "20.127.249.196",    <-- Write this down as MYPUBIP
#   "resourceGroup": "mnp-demo-230213",
#   ...
# }

# SSH port is open by default. This adds an application port.
az vm open-port --resource-group $RESOURCEGROUP --name $NAME --port 443

# SSH to the VM - assumes that ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub from above are availble.
# ssh $MYPUBIP

# Use this Stop the VM ("deallocate" frees resources and billing; "stop" does not)
# az vm deallocate --resource-group $RESOURCEGROUP --name $NAME

# Restart the VM
# az vm start --resource-group $RESOURCEGROUP --name $NAME

GCP VM Template Script

Dependencies:

Gcloud CLI
GCP Project ID

Download template script here: gcp-single-node-vm.bash

# Settings

NAME=$USER-demo-$(date +%y%m%d)      # eg bob-demo-230210
ZONE=us-west1-a                      # For a complete list, use `gcloud compute zones list | egrep ^us-`
PROJECT=wallaroo-dev-253816          # Insert the GCP Project ID here.  This is the one for Wallaroo.

# Create VM

IMAGE=projects/ubuntu-os-cloud/global/images/2023.2.1/ubuntu-2204-jammy-v20230114

# Port 22 and 443 open by default
gcloud compute instances create $NAME \
    --project=$PROJECT \
    --zone=$ZONE \
    --machine-type=e2-standard-32 \
    --network-interface=network-tier=STANDARD,subnet=default \
    --maintenance-policy=MIGRATE \
    --provisioning-model=STANDARD \
    --no-service-account \
    --no-scopes \
    --tags=https-server \
    --create-disk=boot=yes,image=${IMAGE},size=500,type=pd-standard \
    --no-shielded-secure-boot \
    --no-shielded-vtpm \
    --no-shielded-integrity-monitoring \
    --reservation-affinity=any


# Get the external IP address
gcloud compute instances describe $NAME --zone $ZONE --format='get(networkInterfaces[0].accessConfigs[0].natIP)'

# SSH to the VM
#gcloud compute ssh $NAME --zone $ZONE

# SCP file to the instance - replace $FILE with the file path.  Useful for copying up the license file up to the instance.

#gcloud compute scp --zone $ZONE $FILE $NAME:~/

# SSH port forward to the VM
#gcloud compute ssh $NAME --zone $ZONE -- -NL 8800:localhost:8800

# Suspend the VM
#gcloud compute instances stop $NAME --zone $ZONE

# Restart the VM
#gcloud compute instances start $NAME --zone $ZONE

Kubernetes Installation Steps

The following script and steps will install the Kubernetes version and requirements into the Linux node that supports a Wallaroo single node installation.

The process includes these major steps:

Install Kubernetes
Install Kots Version
Install Kubernetes

curl is installed in the default scripts provided above. Verify that it is installed if using some other platform.

Verify that the Ubuntu distribution is up to date, and reboot if necessary after updating.
```
sudo apt update
sudo apt upgrade
```
Start the Kubernetes installation with the following script, substituting the URL path as appropriate for your license.
For Wallaroo versions 2022.4 and below:
```
curl https://kurl.sh/9398a3a | sudo bash
```
For Wallaroo versions 2023.1 and later, the install is based on the license channel. For example, if your license uses the EE channel, then the path is /wallaroo-ee; that is, /wallaroo- plus the lower-case channel name. Note that the Kubernetes install channel must match the License version. Check with your Wallaroo support representative with any questions about your version.
```
curl https://kurl.sh/wallaroo-ee | sudo bash
```
1. If prompted with This application is incompatible with memory swapping enabled. Disable swap to continue? (Y/n), reply Y.

Set up the Kubernetes configuration with the following commands:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
chmod u+w $HOME/.kube/config
echo 'export KUBECONFIG=$HOME/.kube/config' >> ~/.bashrc

Log out, and log back in as the same user. Verify the installation was successful with the following:
```
kubectl get nodes
```
It should return results similar to the following:
```
NAME     STATUS   ROLES                  AGE     VERSION
wallux   Ready    control-plane,master   6m26s   v1.23.6
```

Install Kots

Install kots with the following process.

Run the following script and provide your password for the sudo based commands when prompted.
```
curl https://kots.io/install/1.91.3 | REPL_USE_SUDO=y bash
```
Verify kots was installed with the following command:
```
kubectl kots version
```
It should return results similar to the following:
```
Replicated KOTS 1.91.3
```

Connection Options

Once Kubernetes has been set up on the Linux node, users can opt to copy the Kubernetes configuration to a local system, updating the IP address and other information as required. See the Configure Access to Multiple Clusters.

The easiest method is to create a SSH tunnel to the Linux node. Usually this will be in the format:

ssh $IP -L8800:localhost:8800

For example, in an AWS instance that may be as follows, replaying $KEYFILE with the link to the keyfile and $IP with the IP address of the Linux node.

ssh -i $KEYFILE ubuntu@$IP -L8800:localhost:8800

In a GCP instance, gcloud can be used as follows, replacing $NAME with the name of the GCP instance, $ZONE with the zone it was installed into.

gcloud compute ssh $NAME --zone $ZONE -- -NL 8800:localhost:8800

Port forwarding port 8800 is used for kots based installation to access the Wallaroo Administrative Dashboard.

Install Wallaroo

Organizations that use cloud services such as Google Cloud Platform (GCP), Amazon Web Services (AWS), or Microsoft Azure can install Wallaroo Enterprise through the following process. These instructions also work with Single Node Linux based installations.

Before installation, the following prerequisites must be met:

Have a Wallaroo Enterprise license file. For more information, you can request a demonstration.
Set up a cloud Kubernetes environment that meets the requirements. Clusters must meet the following minimum specifications:
- Minimum number of nodes: 4
- Minimum Number of CPU Cores: 8
- Minimum RAM: 16 GB
- A total of 625 GB of storage will be allocated for the entire cluster based on 5 users with up to four pipelines with five steps per pipeline, with 50 GB allocated per node, including 50 GB specifically for the Jupyter Hub service. Enterprise users who deploy additional pipelines will require an additional 50 GB of storage per lab node deployed.
- Runtime: containerd is required.
DNS services for integrating your Wallaroo Enterprise instance. See the DNS Integration Guide for the instructions on configuring Wallaroo Enterprise with your DNS services.

IMPORTANT NOTE

Wallaroo requires out-bound network connections to download the required container images and other tasks. For situations that require limiting out-bound access, refer to the air-gap installation instructions or contact your Wallaroo support representative.

Wallaroo Enterprise can be installed either interactively or automatically through the kubectl and kots applications.

Automated Install

To automatically install Wallaroo into the namespace wallaroo, specify the administrative password and the license file during the installation as in the following format with the following variables:

NAMESPACE: The namespace for the Wallaroo Enterprise install, typically wallaroo.
LICENSEFILE: The location of the Wallaroo Enterprise license file.
SHAREDPASSWORD: The password of for the Wallaroo Administrative Dashboard.

kubectl kots install wallaroo/ee -n $NAMESPACE --license-file $LICENSEFILE --shared-password $SHAREDPASSWORD

For example, the following settings translate to the following install command:

NAMESPACE: wallaroo.
LICENSEFILE: myWallaroolicense.yaml
SHAREDPASSWORD: snugglebunnies

kubectl kots install wallaroo/ee -n wallaroo --license-file myWallaroolicense.yaml --shared-password wallaroo

Interactive Install

The Interactive Install process allows users to adjust the configuration settings before Wallaroo is deployed. It requires users be able to access the Wallaroo Administrative Dashboard through a browser, typically on port 8080.

IMPORTANT NOTE: Users who install Wallaroo through another node such as in the single node installation can port use SSH tunneling to access the Wallaroo Administrative Dashboard. For example:
```
ssh IP -L8800:localhost:8800
```

Install the Wallaroo Enterprise Edition using kots install wallaroo/ee, specifying the namespace to install Wallaroo into. For example, if wallaroo is the namespace, then the command is:
```
kubectl kots install wallaroo/ee --namespace wallaroo
```

Wallaroo Enterprise Edition will be downloaded and installed into your Kubernetes environment in the namespace specified. When prompted, set the default password for the Wallaroo environment. When complete, Wallaroo Enterprise Edition will display the URL for the Admin Console, and how to end the Admin Console from running.

• Deploying Admin Console
• Creating namespace ✓
• Waiting for datastore to be ready ✓
    Enter a new password to be used for the Admin Console: •••••••••••••
  • Waiting for Admin Console to be ready ✓

• Press Ctrl+C to exit
• Go to http://localhost:8800 to access the Admin Console

Configure Wallaroo

Once installed, Wallaroo will continue to run until terminated.

To relaunch the Wallaroo Administrative Dashboard and make changes or updates, use the following command:

kubectl-kots admin-console --namespace wallaroo

DNS Services

Wallaroo Enterprise requires integration into your organizations DNS services.

The DNS Integration Guide details adding the Wallaroo instance to an organizations DNS services.

User Management

User management is handled through the Wallaroo instance Keycloak service. See the Wallaroo User Management for full guides on setting up users, identity providers, and other user configuration options.

1.2.2 - Wallaroo Enterprise Simple Install Guide

How to set up Wallaroo Enterprise for prepared environments.

The following guide is prepared for organizations that have an environment that meets the prerequisites for installing Wallaroo, and want to jump directly to the installation process.

For a complete guide that includes environment setup for different cloud providers, select the Wallaroo Enterprise Comprehensive Install Guide.

Some knowledge of the following will be useful in working with this guide:

Working knowledge of Linux distributions, particularly Ubuntu.
A cloud provider including Google Cloud Platform (GCP), Amazon Web Services (AWS), or Microsoft Azure experience.
Working knowledge of Kubernetes, mainly kubectl and kots or helm.
The following software or runtimes are required for Wallaroo 2023.2.1. Most are automatically available through the supported cloud providers.

Software or Runtime	Description	Minimum Supported Version	Preferred Version(s)
Kubernetes	Cluster deployment management	1.23	1.25
containerd	Container Management	1.7.0	1.7.0
kubectl	Kubernetes administrative console application	1.26	1.26

Install Wallaroo

Install the Wallaroo Enterprise Edition using kots install wallaroo/ee, specifying the namespace to install Wallaroo into. For example, if wallaroo is the namespace, then the command is:
```
kubectl kots install wallaroo/ee --namespace wallaroo
```

• Deploying Admin Console
• Creating namespace ✓
• Waiting for datastore to be ready ✓
    Enter a new password to be used for the Admin Console: •••••••••••••
  • Waiting for Admin Console to be ready ✓

• Press Ctrl+C to exit
• Go to http://localhost:8800 to access the Admin Console

Configure Wallaroo

Once installed, Wallaroo will continue to run until terminated.

To relaunch the Wallaroo Administrative Dashboard and make changes or updates, use the following command:

kubectl-kots admin-console --namespace wallaroo

DNS Services

Wallaroo Enterprise requires integration into your organizations DNS services.

The DNS Integration Guide details adding the Wallaroo instance to an organizations DNS services.

User Management

1.2.3 - Wallaroo Enterprise Air Gap Install Guide

Average Install Time

45 minutes depending on system performance and network connections.

Organizations that require Wallaroo be installed into an “air gap” environment - where the Wallaroo instance does not connect to the public Internet - can use these instructions to install Wallaroo into an existing Kubernetes cluster.

This guide assumes knowledge of how to use Kubernetes and work with internal clusters. The following conditions must be completed before starting an air gap installation of Wallaroo:

A Kubernetes cluster is installed and meets the prerequisites listed below.
A private container registry is available to the cluster, along with push and read credentials for that registry. This service is required for the air gap installation process to have images pushed and pulled for the installation. For examples on setting up a private container registry service, see the Docker Documentation “Deploy a registry server”. See Example Registry Service Install for an example of installing an unsecure private registry for testing purposes. For more details on setting up a container registry in a cloud environment, see the related documentation for your preferred cloud provider:
All commands will assume a method of connecting to the cluster remotely (VPN, etc).

If all prerequisites are met, skip directly to Install Instructions

General Time to Completion: 30 minutes.

Before installing Wallaroo version, verify that the following hardware and software requirements are met.

Environment Requirements

Environment Hardware Requirements

The following system requirements are required for the minimum settings for running Wallaroo in a Kubernetes cloud cluster.

Minimum number of nodes: 4
Minimum Number of CPU Cores: 8
Minimum RAM per node: 16 GB
Minimum Storage: A total of 625 GB of storage will be allocated for the entire cluster based on 5 users with up to four pipelines with five steps per pipeline, with 50 GB allocated per node, including 50 GB specifically for the Jupyter Hub service. Enterprise users who deploy additional pipelines will require an additional 50 GB of storage per lab node deployed.

Wallaroo recommends at least 16 cores total to enable all services. At less than 16 cores, services will have to be disabled to allow basic functionality as detailed in this table.


Cluster Size		8 core	16 core	32 core	Description
Inference		✔	✔	✔	The Wallaroo inference engine that performs inference requests from deployed pipelines.
Dashboard		✔	✔	✔	The graphics user interface for configuring workspaces, deploying pipelines, tracking metrics, and other uses.
Jupyter HUB/Lab					The JupyterHub service for running Python scripts, JupyterNotebooks, and other related tasks within the Wallaroo instance.
	Single Lab	✔	✔	✔
	Multiple Labs	✘	✔	✔
Prometheus		✔	✔	✔	Used for collecting and reporting on metrics. Typical metrics are values such as CPU utilization and memory usage.
	Alerting	✘	✔	✔
	Model Validation	✘	✔	✔
	Dashboard Graphs	✔	✔	✔
Plateau		✘	✔	✔	A Wallaroo developed service for storing inference logs at high speed. This is not a long term service; organizations are encouraged to store logs in long term solutions if required.
	Model Insights	✘	✔	✔
Python API
	Model Conversion	✔	✔	✔	Converts models into a native runtime for use with the Wallaroo inference engine.

Enterprise Network Requirements

The following network requirements are required for the minimum settings for running Wallaroo:

For Wallaroo Enterprise users: 200 IP addresses are required to be allocated per cloud environment.
For Wallaroo Community users: 98 IP addresses are required to be allocated per cloud environment.
DNS services integration is required for Wallaroo Enterprise edition. See the DNS Integration Guide for the instructions on configuring Wallaroo Enterprise with your DNS services.
DNS services integration is required to provide access to the various supporting services that are part of the Wallaroo instance. These include:
- Simplified user authentication and management.
- Centralized services for accessing the Wallaroo Dashboard, Wallaroo SDK and Authentication.
- Collaboration features allowing teams to work together.
- Managed security, auditing and traceability.

Environment Software Requirements

The following software or runtimes are required for Wallaroo 2023.2.1. Most are automatically available through the supported cloud providers.

Software or Runtime	Description	Minimum Supported Version	Preferred Version(s)
Kubernetes	Cluster deployment management	1.23	1.25
containerd	Container Management	1.7.0	1.7.0
kubectl	Kubernetes administrative console application	1.26	1.26

Node Selectors

Wallaroo uses different nodes for various services, which can be assigned to a different node pool to contain resources separate from other nodes. The following nodes selectors can be configured:

ML Engine node selector
ML Engine Load Balance node selector
Database Node Selector
Grafana node selector
Prometheus node selector
Each Lab * Node Selector

Install Instructions

The installation is broken into the following major processes:

Install Instructions
Troubleshooting
Example Registry Service Install
- Private Container Registry Service Install Process

Download Assets

The Wallaroo delivery team the URL and password to your organization’s License and Air Gap Download page. The following links are provided:

(A) Wallaroo Enterprise License File: The Wallaroo enterprise license file for this account. This is downloaded as a yaml file.
(B) Wallaroo Airgap Installation File: The air gap installation file that includes the necessary containers for the Wallaroo installation. This is typically about 6 GB in size. By selecting the link icon, the Wallaroo Airgap Installation File URL will be copied to the clipboard that can be used for curl or similar download commands. This file is typically downloaded as wallaroo.airgap.
IMPORTANT NOTE
If the Wallaroo Air Gap Bundle link is not available, contact your Wallaroo support representative.
(C) KOTS CLI: The installation files to install kots into the node that manages the Kubernetes cluster. This file is typically downloaded as kots_linux_amd64.tar.gz.
(D) KOTS Airgap Bundle: A set of files required by the Kubernetes environment to install Wallaroo via the air gap method. This file is typically downloaded as kotsadm.tar.gz.

Download these files either through the provided License and Airgap Download page, or by copying the links from the page and using the following command line commands into node performing the air gap installation with curl as follows:

Wallaroo Enterprise License File:

curl -LO {Link to Wallaroo Enterprise License File}

Airgap Installation File. Note the use of the -Lo option to download the Wallaroo air gap file as wallaroo.airgap, and the use of the single quotes around the Wallaroo Air Gap Installation File URL.
```
curl -Lo wallaroo.airgap '{Wallaroo Airgap Installation File URL}'
```
KOTS CLI
```
curl -LO {Link to KOTS CLI}
```
KOTS Airgap Bundle
```
curl -LO {Link to KOTS Airgap Bundle}
```

Place these files onto the air gap server or node that administrates the Kubernetes cluster. Once these files are on the node, the cluster can be air gapped and the required software installed through the next steps.

Install Kots

Install kots into the node managing the Kubernetes cluster with the following commands:

Extract the archive:
```
tar zxvf kots_linux_amd64.tar.gz kots
```
Install kots to the /usr/local/bin directory. Adjust this directory to match the location of the kubectl command.
```
sudo mv kots /usr/local/bin/kubectl-kots
```
Verify the kots installation by checking the version. The result should be similar to the following:
```
kubectl kots version
Replicated KOTS 1.91.3
```

Install the Kots Admin Console

This step will Extract the KOTS Admin Console container images and push them into a private registry. Registry credentials provided in this step must have push access. These credentials will not be stored anywhere or reused later.

This requires the following:

Private Registry Host: The URL of the private registry host used by the Kubernetes cluster.
Private Registry Port: The port of the private registry used by the Kubernetes cluster (5000 by default).
KOTS Airgap Bundle (default: kotsadm.tar.gz): Downloaded as part of Download Assets step.
Registry Push Username: The username with push access to the private registry.
Registry Push Password: The password of the registry user with push access to the private registry.

This command takes the following format:

kubectl kots admin-console push-images {KOTS Airgap Bundle} \
    {Private Registry Host}:{Private Registry Port} \
    --registry-username {Registry Push Username} \
    --registry-password {Registry Push Password}

Adjust the command based on your organizations registry setup.

Install Wallaroo Airgap

This step will install the Wallaroo air gap file into the Kubernetes cluster through the Kots Admin images.

Registry credentials provided in this step only need to have read access, and they will be stored in a Kubernetes secret in the same namespace where Admin Console will be installed. These credentials will be used to pull the images, and will be automatically created as an imagePullSecret on all of the Admin Console pods.

This requires the following:

Private Registry Host: The URL of the private registry host used by the Kubernetes cluster.
Private Registry Port: The port of the private registry used by the Kubernetes cluster (5000 by default).
Wallaroo Namespace (default: wallaroo): The kubernetes namespace used to install the Wallaroo isntance.
Wallaroo Airgap Installation File (default: wallaroo.airgap): Downloaded as part of Download Assets step.
Wallaroo License File: Downloaded as part of Download Assets step.
Registry Read Username: The username with read access to the private registry.
Registry Read Password: The password of the registry user with read access to the private registry.

The command will take the following format. Note that the option --license-file {Wallaroo License File} is required. This will point to the license REQUIRED for an air gap installation.

kubectl kots install wallaroo/ea \
    --kotsadm-registry {Private Registry Host}:{Private Registry Port} \
    --registry-username {Registry Read Username} --registry-password {Registry Read Password} \
    --airgap-bundle {Wallaroo Airgap Installation File} \
    --namespace {Wallaroo Namespace} \
    --license-file {Wallaroo License File}

The following flags can be added to speed up the configuration process:

--shared-password {Wallaroo Admin Dashboard Password}: The password used to access the Wallaroo Admin Dashboard.
--config-values config.yaml: Sets up the Wallaroo instance configuration based on the supplied yaml file.
--no-port-forward: Does not forward port 8800 for use.
--skip-preflights: Skip the standard preflight checks and launch the Wallaroo instance.

For example, the following will install Wallaroo Enterprise into the namespace wallaroo using the provided license file, using the shared password wallaroo and skipping the preflight checks:

kubectl kots install wallaroo/ea \
    --kotsadm-registry private.host:5000 \
    --registry-username xxx --registry-password yyy \
    --airgap-bundle wallaroo.airgap \
    --namespace wallaroo \
    --license-file license.yaml \
    --shared-password wallaroo \
    --skip-preflights

When complete, a link to the Wallaroo Admin Console will be made available unless the option --no-port-forward is selected.

  • Press Ctrl+C to exit
  • Go to http://localhost:8800 to access the Admin Console

Using Ctrl+C will disable the Wallaroo Admin Console, but the Wallaroo instance and services will continue to run in the cluster.

To reenable the Wallaroo Admin Console, use the following command:

kubectl-kots admin-console --namespace {Wallaroo Namespace}

Preflight Checks

Preflight checks will verify that the Wallaroo instance meets the prerequisites. If any fail, check your Kubernetes environment and verify they are in alignment.

Preflight checks will be skipped if Wallaroo was installed with the --skip-preflights option.

Wallaroo Admin Console

If no license file was provided through the command line, it can be provided through the Wallaroo Admin Console on port 8800. To access the Wallaroo Admin Console, some method of port forwarding through the jump box will have to be configured to the air gapped cluster.

Status Checks

While the installer allocates resources and deploys workloads, the status page will show as Missing or Unavailable. If it stays in this state for more than twenty minutes, proceed to troubleshooting or contact Wallaroo technical support.

Once the application has become ready, the status indication will turn green and ready Ready.

Troubleshooting

At any time, the administration console can create troubleshooting bundles for Wallaroo technical support to assess product health and help with problems. Support bundles contain logs and configuration files which can be examined before downloading and transmitting to Wallaroo. The console also has a configurable redaction mechanism in cases where sensitive information such as passwords, tokens, or PII (Personally Identifiable Information) need to be removed from logs in the bundle.

To manage support bundles:

Log into the administration console.
Select the Troubleshoot tab.
Select Analyze Wallaroo.
Select Download bundle to save the bundle file as a compressed archive. Depending on your browser settings the file download location can be specified.
Send the file to Wallaroo technical support.

At any time, any existing bundle can be examined and downloaded from the Troubleshoot tab.

Example Registry Service Install

The following example demonstrates how to set up an unsecure local registry service that can be used for testing. This process is not advised for production systems, and it only provided as an example for testing the air gap install process. This example uses an Ubuntu 20.04 instance as the installation environment.

This example assumes that the containerd service is installed and used by the Kubernetes cluster.

Private Container Registry Service Install Process

To install a demo container registry service on an Ubuntu 20.04 instance:

Install the registry service:

sudo apt update
sudo apt install docker-registry jq

Replace the file /etc/docker/registry/config.yml with the following. Note that this configures the service with no security:

version: 0.1
log:
fields:
    service: registry
storage:
cache:
    blobdescriptor: inmemory
filesystem:
    rootdirectory: /var/lib/docker-registry
http:
addr: :5000
headers:
    X-Content-Type-Options: [nosniff]
health:
storagedriver:
    enabled: true
    interval: 10s
    threshold: 3

Update the containerd service as follows, replacing YOUR-HOST-HERE with the hostname of the registry service configured above. Comment out any existing registry entries and replace with the new insecure registry service:

    [plugins."io.containerd.grpc.v1.cri".registry]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."YOUR-HOST-HERE:5000"]
        endpoint = ["http://YOUR-HOST-HERE:5000"]
    [plugins."io.containerd.grpc.v1.cri".registry.configs]
        [plugins."io.containerd.grpc.v1.cri".registry.configs."YOUR-HOST-HERE:5000".tls]
        insecure_skip_verify = true

    # [plugins."io.containerd.grpc.v1.cri".registry]
    #   config_path = ""
    #   [plugins."io.containerd.grpc.v1.cri".registry.auths]
    #   [plugins."io.containerd.grpc.v1.cri".registry.configs]
    #   [plugins."io.containerd.grpc.v1.cri".registry.headers]
    #   [plugins."io.containerd.grpc.v1.cri".registry.mirrors]

Restart the registry service and containerd service.

sudo systemctl restart docker-registry
sudo systemctl restart containerd

1.2.4 - Wallaroo Enterprise Helm Setup and Install Guides

Organizations that prefer to use the Helm package manager for Kubernetes can install Wallaroo versions 2022.4 and above via Helm.

The following procedures demonstrates how to install Wallaroo using Helm. For more information on settings and options for a Helm based install, see the Wallaroo Helm Reference Guides.

1.2.4.1 - Wallaroo Helm Standard Cloud Install Procedures

General Time to Completion: 30 minutes.

Before installing Wallaroo version, verify that the following hardware and software requirements are met.

Environment Requirements

Environment Hardware Requirements

The following system requirements are required for the minimum settings for running Wallaroo in a Kubernetes cloud cluster.

Minimum number of nodes: 4
Minimum Number of CPU Cores: 8
Minimum RAM per node: 16 GB
Minimum Storage: A total of 625 GB of storage will be allocated for the entire cluster based on 5 users with up to four pipelines with five steps per pipeline, with 50 GB allocated per node, including 50 GB specifically for the Jupyter Hub service. Enterprise users who deploy additional pipelines will require an additional 50 GB of storage per lab node deployed.

Wallaroo recommends at least 16 cores total to enable all services. At less than 16 cores, services will have to be disabled to allow basic functionality as detailed in this table.


Cluster Size		8 core	16 core	32 core	Description
Inference		✔	✔	✔	The Wallaroo inference engine that performs inference requests from deployed pipelines.
Dashboard		✔	✔	✔	The graphics user interface for configuring workspaces, deploying pipelines, tracking metrics, and other uses.
Jupyter HUB/Lab					The JupyterHub service for running Python scripts, JupyterNotebooks, and other related tasks within the Wallaroo instance.
	Single Lab	✔	✔	✔
	Multiple Labs	✘	✔	✔
Prometheus		✔	✔	✔	Used for collecting and reporting on metrics. Typical metrics are values such as CPU utilization and memory usage.
	Alerting	✘	✔	✔
	Model Validation	✘	✔	✔
	Dashboard Graphs	✔	✔	✔
Plateau		✘	✔	✔	A Wallaroo developed service for storing inference logs at high speed. This is not a long term service; organizations are encouraged to store logs in long term solutions if required.
	Model Insights	✘	✔	✔
Python API
	Model Conversion	✔	✔	✔	Converts models into a native runtime for use with the Wallaroo inference engine.

Enterprise Network Requirements

The following network requirements are required for the minimum settings for running Wallaroo:

For Wallaroo Enterprise users: 200 IP addresses are required to be allocated per cloud environment.
For Wallaroo Community users: 98 IP addresses are required to be allocated per cloud environment.
DNS services integration is required for Wallaroo Enterprise edition. See the DNS Integration Guide for the instructions on configuring Wallaroo Enterprise with your DNS services.
DNS services integration is required to provide access to the various supporting services that are part of the Wallaroo instance. These include:
- Simplified user authentication and management.
- Centralized services for accessing the Wallaroo Dashboard, Wallaroo SDK and Authentication.
- Collaboration features allowing teams to work together.
- Managed security, auditing and traceability.

Environment Software Requirements

The following software or runtimes are required for Wallaroo 2023.2.1. Most are automatically available through the supported cloud providers.

Software or Runtime	Description	Minimum Supported Version	Preferred Version(s)
Kubernetes	Cluster deployment management	1.23	1.25
containerd	Container Management	1.7.0	1.7.0
kubectl	Kubernetes administrative console application	1.26	1.26

Node Selectors

Wallaroo uses different nodes for various services, which can be assigned to a different node pool to contain resources separate from other nodes. The following nodes selectors can be configured:

ML Engine node selector
ML Engine Load Balance node selector
Database Node Selector
Grafana node selector
Prometheus node selector
Each Lab * Node Selector

For Helm installs:
- helm: Install Helm.
  - Minimum supported version: Helm 3.11.2
- krew: Install Krew
- krew preflight and krew support-bundle.

Kubernetes Installation Instructions

This sample Helm installation procedure has the following steps:

Install Kubernetes
Install Helm
Install Krew
Install Krew Support Tools

Install Kubernetes

This example requires the user use a Cloud Kubernetes installation.

Setup the Kubernetes Cloud cluster as defined in the Wallaroo Enterprise Environment Setup Guides.

Install Helm

The follow the instructions from the Installing Helm guide for your environment.

Install Krew

The following instructions were taken from the Install Krew guide.

To install the kubectl plugin krew:

Verify that git is installed in the local system.

Run the following to install krew:

(
set -x; cd "$(mktemp -d)" &&
OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')" &&
KREW="krew-${OS}_${ARCH}" &&
curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
tar zxvf "${KREW}.tar.gz" &&
./"${KREW}" install krew
)

Once complete, add the following to the .bashrc file:

export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"

Install Support Tools

Install the preflight and support-bundle Krew tools via the following commands:

kubectl krew install preflight

kubectl krew install support-bundle

Install Wallaroo via Helm

Wallaroo Provided Data

Members of the Wallaroo support staff will provide the following information:

Wallaroo Container Registration Login: Commands to login to the Wallaroo container registry.
Preflight and Support Bundle configuration files: The files preflight.yaml and support-bundle.yaml are used in the commands below to complete the preflight process and generate the support bundle package as needed for troubleshooting needs.
Preflight verification command: The commands to verify that the Kubernetes environment meets the requirements for the Wallaroo install.
Install Wallaroo Command: Instructions on installations into the Kubernetes environment using Helm through the Wallaroo container registry.

The following steps are used with these command and configuration files to install Wallaroo Enterprise via Helm.

The first step in the Wallaroo installation process via Helm is to connect to the Kubernetes environment that will host the Wallaroo Enterprise instance and login into the Wallaroo container registry through the command provided by the Wallaroo support staff. The command will take the following format, replacing $YOURUSERNAME and $YOURPASSWORD with the respective username and password provided.

helm registry login registry.replicated.com --username $YOURUSERNAME --password $YOURPASSWORD

Preflight Verification

IMPORTANT NOTE

The preflight test is not programmatically enforced during installation via Helm and should be performed manually before installation. If the Kubernetes environment does not meet the requirements the Wallaroo installation may fail or perform erratically. Please verify that all preflight test run successfully before proceeding to install Wallaroo.

Preflight verification is performed with the following command, using the preflight.yaml configuration file provided by the Wallaroo support representative as listed above.

kubectl preflight --interactive=false preflight.yaml

If successful, the tests will show PASS for each preflight requirement as in the following example:

name: cluster-resources    status: running         completed: 0    total: 2
name: cluster-resources    status: completed       completed: 1    total: 2
name: cluster-info         status: running         completed: 1    total: 2
name: cluster-info         status: completed       completed: 2    total: 2

   --- PASS Required Kubernetes Version
      --- Your cluster meets the recommended and required versions of Kubernetes.
   --- PASS Container Runtime
      --- Containerd container runtime was found.
   --- PASS Check Kubernetes environment.
      --- KURL is a supported distribution
   --- PASS Cluster Resources
      --- Cluster resources are satisfactory
   --- PASS Every node in the cluster must have at least 12Gi of memory
      --- All nodes have at least 12 GB of memory capacity
   --- PASS Every node in the cluster must have at least 8 cpus allocatable.
      --- All nodes have at least 8 CPU capacity
--- PASS   wallaroo
PASS

The following instructions detail how to install Wallaroo Enterprise via Helm for Kubernetes cloud environments such as Microsoft Azure, Amazon Web Service, and Google Cloud Platform.

IMPORTANT NOTE

These instructions are for Wallaroo Enterprise only.

Install Wallaroo

With the preflight checks and prerequisites met, Wallaroo can be installed via Helm through the following process:

Create namespace. By default, the namespace wallaroo is used:
```
kubectl create namespace wallaroo
```

Set the new namespace as the current namespace:

kubectl config set-context --current --namespace wallaroo

Set the TLS certificate secret in the Kubernetes environment:
1. Create the certificate and private key. It is recommended to name it after the domain name of your Wallaroo instance. For example: wallaroo.example.com. For production environments, organizations are recommended to use certificates from their certificate authority. Note that the Wallaroo SDK will not connect from an external connection without valid certificates. For more information on using DNS settings and certificates, see the Wallaroo DNS Integration Guide.
2. Create the Kubernetes secret from the certificates created in the previous step, replacing $TLSCONFIG with the name of the Kubernetes secret. Store the secret name for a the step Configure local values file.
```
kubectl create secret tls $TLSCONFIG --cert=$TLSSECRETS --key=$TLSSECRETS
```
  For example, if $TLSCONFIG is my-tls-secrets with example.com.crt and key example.com.key, then the command would be translated as
```
kubectl create secret tls my-tls-secrets --cert=example.com.crt --key=example.com.key
```
Configure local values file: The default Helm install of Wallaroo contains various default settings. The local values file overwrites values based on the organization needs. The following represents the minimum mandatory values for a Wallaroo installation using certificates and the default LoadBalancer for a cloud Kubernetes cluster. The configuration details below is saved as local-values.yaml for these examples.
For information on taints and tolerations settings, see the Taints and Tolerations Guide.
Note the following required settings:
- domainPrefix and domainSuffix: Used to set the DNS settings for the Wallaroo instance. For more information, see the Wallaroo DNS Integration Guide.
- replImagePrefix: proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs: Sets the Replicated installation containe proxy. Set to proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs unless using a private container registry. Contact a Wallaroo Support representative for details.
- deploymentStage and custTlsSecretName: These are set for use with the Kubernetes secret created in the previous step. External connections through the Wallaroo SDK require valid certificates.
- generate_secrets: Secrets for administrative and other users can be generated by the Helm install process, or set manually. This setting scrambles the passwords during installation.
- apilb: Sets the apilb service options including the following:
  - serviceType: LoadBalancer: Uses the default LoadBalancer setting for the Kubernetes cloud service the Wallaroo instance is installed into. Replace with the specific service connection settings as required.
  - external_inference_endpoints_enabled: true: This setting is required for performing external SDK inferences to a Wallaroo instance. For more information, see the Wallaroo Model Endpoints Guide

domainPrefix: "" # optional if using a DNS Prefix
domainSuffix: {Your Wallaroo DNS Suffix}

deploymentStage: cust
custTlsSecretName: cust-cert-secret

generate_secrets: true

apilb:
  serviceType: LoadBalancer
  external_inference_endpoints_enabled: true

dashboard:
  clientName: "xx" # Insert the name displayed in the Wallaroo Dashboard

arbEx:
  enabled: true

nats:
  enabled: true

orchestration:
  enabled: true

pipelines:
  enabled: false

imageRegistry: proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs
replImagePrefix: proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs

minio:
  persistence:
    size: 25Gi     # Minio model storage disk size. Smaller than 10Gi is not recommended.

models:
  enabled: true

pythonAPIServer:
  enabled: true

Install Wallaroo: The Wallaroo support representative will provide the installation command for the Helm install that will use the Wallaroo container registry. This assumes that the preflight checks were successful. This command uses the following format:
```
helm install $RELEASE $REGISTRYURL --version $VERSION--values $LOCALVALUES.yaml
```
Where:
1. $RELEASE: The name of the Helm release. By default, wallaroo.
2. $REGISTRYURL: The URl for the Wallaroo container registry service.
3. $VERSION: The version of Wallaroo to install. For this example, 2022.4.0-main-2297.
4. $LOCALVALUES: The .yaml file containing the local values overrides. For this example, local-values.yaml.
For example, for the registration wallaroo the command would be:
```
helm install wallaroo oci://registry.replicated.com/wallaroo/EE/wallaroo --version 2022.4.0-main-2297 --values local-values.yaml
```

Verify the Installation: Once the installation is complete, verify the installation with the helm test $RELEASE command. With the settings above, this would be:

helm test wallaroo

A successful installation will resemble the following:

NAME: wallaroo
LAST DEPLOYED: Wed Dec 21 09:15:23 2022
NAMESPACE: wallaroo
STATUS: deployed
REVISION: 1
TEST SUITE:     wallaroo-fluent-bit-test-connection
Last Started:   Wed Dec 21 11:58:34 2022
Last Completed: Wed Dec 21 11:58:37 2022
Phase:          Succeeded
TEST SUITE:     wallaroo-test-connections-hook
Last Started:   Wed Dec 21 11:58:37 2022
Last Completed: Wed Dec 21 11:58:41 2022
Phase:          Succeeded
TEST SUITE:     wallaroo-test-objects-hook
Last Started:   Wed Dec 21 11:58:41 2022
Last Completed: Wed Dec 21 11:58:53 2022
Phase:          Succeeded

At this point, the installation is complete and can be accessed through the fully qualified domain names set in the installation process above. Verify that the DNS settings are accurate before attempting to connect to the Wallaroo instance. For more information, see the Wallaroo DNS Integration Guide.

To add the initial users if they were not set up through Helm values, see the Wallaroo Enterprise User Management guide.

Troubleshoot Wallaroo

If issues are detected in the Wallaroo instance, a support bundle file is generated using the support-bundle.yaml file provided by the Wallaroo support representative.

This creates a collection of log files, configuration files and other details into a .tar.gz file in the same directory as the command is run from in the format support-bundle-YYYY-MM-DDTHH-MM-SS.tar.gz. This file is submitted to the Wallaroo support team for review.

This support bundle is generated through the following command:

kubectl support-bundle support-bundle.yaml --interactive=false

Uninstall

To uninstall Wallaroo via Helm, use the following command replacing the $RELEASE with the name of the release used to install Wallaroo. By default, this is wallaroo:

helm uninstall wallaroo

It is also recommended to remove the wallaroo namespace after the helm uninstall is complete.

IMPORTANT NOTE

Do not remove the Wallaroo namespace until after the helm uninstall is complete. Removing the namespace first can leave resources hanging and can cause issues when trying to reinstall Wallaroo via Helm.

kubectl delete namespace wallaroo

1.2.4.2 - Wallaroo Helm Reference Guides

The following guides include reference details related to installing Wallaroo via Helm.

1.2.4.2.1 - Wallaroo Helm Reference Table

A Helm chart for the control plane for Wallaroo

Configuration

The following table lists the configurable parameters of the Wallaroo chart and their default values.

Parameter	Description	Default
`kubernetes_distribution`	One of: aks, eks, gke, or kurl. May be safe to leave defaulted.	`""`
`imageRegistry`	imageRegistry where images are pulled from	`"ghcr.io/wallaroolabs"`
`imageTag`	imageTag that images default to - can be overridden for each component	`"main"`
`replImagePrefix`	imageRegistry where images are pulled from, as overridden by Kots	`"ghcr.io/wallaroolabs"`
`assays.enabled`	Controls the display of Assay data in the Dashboard	`true`
`custTlsSecretName`	Name of existing Kubernetes TLS type secret	`""`
`deploymentStage`	Deployment stage, must be set to “cust” when deployed	`"dev"`
`custTlsCert`	Customer provided certificate chain when deploymentStage is “cust”.	`""`
`custTlsKey`	Customer provided private key when deploymentStage is “cust”.	`""`
`nodeSelector`	Global node selector	`{}`
`tolerations`	Global tolerations	`[{"key": "wallaroo", "operator": "Exists", "effect": "NoSchedule"}]`
`domainPrefix`	DNS prefix of Wallaroo endpoints, can be empty for none	`"xxx"`
`domainSuffix`	DNS suffix of Wallaroo endpoints, MUST be provided	`"yyy"`
`externalIpOverride`	Used in cases where we can’t accurately determine our external, inbound IP address. Normally “”.	`""`
`imagePullPolicy`	Global policy saying when K8s pulls images: Always, Never, or IfNotPresent.	`"Always"`
`wallarooSecretName`	Secret name for pulling Wallaroo images	`"regcred"`
`apilb.nodeSelector`	standard node selector for API-LB	`{}`
`apilb.annotations`	Annotations for api-lb service	`{}`
`apilb.serviceType`	Service type of api-lb service	`"ClusterIP"`
`apilb.external_inference_endpoints_enabled`	Enable external URL inference endpoints: pipeline inference endpoints that are accessible outside of the Wallaroo cluster.	`true`
`jupyter.enabled`	If true, a jupyer hub was deployed which components can point to.	`false`
`keycloak.user`	administrative username	`"admin"`
`keycloak.password`	default admin password: overridden if generate_secrets is true	`"admin"`
`keycloak.provider.clientId`	upstream client id	`""`
`keycloak.provider.clientSecret`	upstream client secret	`""`
`keycloak.provider.name`	human name for provider	`""`
`keycloak.provider.id`	Type of provider, one of: “github”, “google”, or “OIDC”	`""`
`keycloak.provider.authorizationUrl`	URL to contact the upstream client for auth requests	`null`
`keycloak.provider.clientAuthMethod`	client auth method - Must be client_secret_post for OIDC provider type, leave blank otherwise.	`null`
`keycloak.provider.displayName`	human name for provider, displayed to end user in login dialogs	`null`
`keycloak.provider.tokenUrl`	Used only for ODIC, see token endpoint under Azure endpoints.	`null`
`dbcleaner.schedule`	when the cleaner runs, default is every eight hours	`"* /8 * *"`
`dbcleaner.maxAgeDays`	delete older than this many days	`"30"`
`plateau.enabled`	Enable Plateau deployment	`true`
`plateau.diskSize`	Disk space to allocate. Smaller than 100Gi is not recommended.	`"100Gi"`
`telemetry.enabled`	Used only for our CE product. Leave disabled for EE/Helm installs.	`false`
`dashboard.enabled`	Enable dashboard service	`true`
`dashboard.clientName`	Customer display name which appears at the top of the dashboard window.	`"Fitzroy Macropods, LLC"`
`minio.imagePullSecrets`	Must override for helm + private registry; eg `-name: "some-secret"`	`[]`
`minio.image.repository`	Must override for helm + private registry	`"quay.io/minio/minio"`
`minio.mcImage.repository`	Must override for helm + private registry	`"quay.io/minio/mc"`
`minio.persistence.size`	Minio model storage disk size. Smaller than 10Gi is not recommended.	`"10Gi"`
`fluent-bit.imagePullSecrets`	Must override for helm + private registry; eg `-name: "some-secret"`	`[]`
`fluent-bit.image.repository`	Must override for helm + private registry	`"cr.fluentbit.io/fluent/fluent-bit"`
`helmTests.enabled`	When enabled, create “helm test” resources.	`true`
`helmTests.nodeSelector`	When `helm test` is run, this selector places the test pods.	`{}`
`pythonAPIServer.enabled`	This service is used for model conversion.	`false`
`explainabilityServer.enabled`	Enable the model explainability service	`false`
`replImagePrefix`	Sets the replicated image prefix for installation containers. Set to `replImagePrefix: proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs` unless otherwise instructed.

1.2.4.2.2 - Wallaroo Helm Reference Details

post_delete_hook

This hook runs when you do helm uninstall unless …

you give –no-hooks to helm
you set the enable flag to False at INSTALL time.

imageRegistry

Registry and Tag portion of Wallaroo images. Third party images are not included. Tag is
computed at runtime and overridden. In online Helm installs, these should not be touched; in
airgap Helm installs imageRegistry must be overridden to local registry.

generate_secrets

If true, generate random secrets for several services at install time.
If false, use the generic defaults listed here, which can also be overridden by caller.

assays

This is a (currently) Dashboard-specific feature flag to control the display of Assays.

custTlsSecretName

To provide TLS certificates, (1) set deploymentStage to “cust”, then (2) provide EITHER the
name of an existing Kubernetes TLS secret in custTlsSecret OR provide base64 encoded secrets
in custTlsCert and custTlsKey.

domainPrefix

DNS specification for our named external service endpoints.

To form URLs, we concatenate the optional domainPrefix, the service name in question, and then
the domainSuffix. Their values are based on license, type, and customer config inputs. They
MUST be overriden per install via helm values, or by Replicated.

Community – prefix/suffix in license

domainPrefix	domainSuffix	dashboard_fqdn	thing_fqdn (thing = jup, kc, etc)
""	wallaroo.community	(never)	(never)
cust123	wallaroo.community	cust123.wallaroo.community	cust123.thing.wallaroo.community

Enterprise et al – prefix/suffix from config

domainPrefix	domainSuffix	dashboard_fqdn	thing_fqdn (thing = jup, kc, etc)
""	wl.bigco	wl.bigco	thing.wl.
cust123	wl.bigco	cust123.wl.bigco	cust123.thing.wl.bigco

wallarooSecretName

In online Helm installs, an image pull secret is created and this is its name. The secret allows
the Kubernetes node to pull images from proxy.replicated.com. In airgap Helm installs, a local
Secret of type docker-registry must be created and this value set to its name.

privateModelRegistry

If the customer has specified a private model container registry, the enable flag will reflect
and the secret will be populated. registry, username, and password are mandatory. email
is optional. registry is of the form “hostname:port”.

apilb

Main ingress LB for Wallaroo services.

The Kubernetes Ingress object is not used, instead we deploy a single Envoy load balancer with a
single IP in all cases, which serves: TLS termination, authentication (JWT) checking, and both
host based and path based application routing. Customer should be aware of two values in particular.

api.serviceType defaults to ClusterIP. If api.serviceType is set to LoadBalancer, cloud
services will allocate a hosted LB service, in which case the apilb.annotations should be
provided, in order to pass configuration such as “internal” or “external” to the cloud service.

Example:
apilb:
serviceType: LoadBalancer
annotations: service.beta.kubernetes.io/aws-load-balancer-internal: “true”

keycloak

Wallaroo can connect to a variety of identity providers, broker OpenID Connect authentication
requests, and then limit access to endpoints. This section configures a https://www.keycloak.org
installation. If a provider is specified here, Keycloak will configure itself to use that on
install. If no providers are specified here, the administrator must login to the Keycloak
service as the administrative user and either add users by hand or create an auth provider. In
general, a client must be created upstream and a URL, client ID, and secret (token) for that
client is entered here.

dbcleaner

Manage retention for fluentbit table. This contains log message outputs from orchestration tasks.

plateau

Plateau is a low-profile fixed-footprint log processor / event store for fast storage of
inference results. The amount of disk space provisioned is adjustable. Smaller than “100Gi” is
not recommended for performance reasons.

pythonAPIServer

Model conversion is an optional service that allows converting non-onnx models (keras, sklearn,
and xgboost) to onnx and adding them to your pipeline, without extensive manual conversion or
processing steps. This allows more rapid iteration over models or experiments.

wsProxy

This controls the wsProxy, and should only be enabled if nats (ArbEx) is also enabled.
wsProxy is required for the Dashboard to subscribe to events and show notifications.

orchestration

Pipeline orchestration is general task execution service that allows users to upload arbitrary
code and have it executed on their behalf by the system. nats and arbex must be enabled.

models

The model server supports model autoconversion and requires nats and arbitrary execution to be
enabled.

1.2.5 - Wallaroo Enterprise Environment Setup Samples

The following are examples of setting up an environment capable of hosting Wallaroo that meets the Wallaroo installation prerequisites.

Environment Setup Guides

The following setup guides are used to set up the environment that will host the Wallaroo instance. Verify that the environment is prepared and meets the Wallaroo Prerequisites Guide.

Uninstall Guides

To uninstall a previously installed Wallaroo instance:

  -> kubectl get namespaces
    NAME			    STATUS        AGE
    default		        Active        7d4h
    kube-node-lease	    Active		    7d4h
    kube-public		    Active		    7d4h
    ccfraud-pipeline-21    Active         4h23m
    wallaroo             Active         3d6h

  -> kubectl delete namespaces ccfraud-pipeline-21

Use the following bash script or run the commands individually. Warning: If the selector is incorrect or missing from the kubectl command, the cluster could be damaged beyond repair. For a default installation, the selector and namespace will be wallaroo.
```
#!/bin/bash
kubectl delete ns wallaroo && \ 
kubectl delete all,secret,configmap,clusterroles,clusterrolebindings,storageclass,crd \
--selector app.kubernetes.io/part-of=wallaroo --selector kots.io/app-slug=wallaroo
```

Wallaroo can now be reinstalled into this environment.

Environment Setup Guides

AWS Cluster for Wallaroo Enterprise Instructions

The following instructions are made to assist users set up their Amazon Web Services (AWS) environment for running Wallaroo Enterprise using AWS Elastic Kubernetes Service (EKS).

These represent a recommended setup, but can be modified to fit your specific needs.

AWS Prerequisites

To install Wallaroo in your AWS environment based on these instructions, the following prerequisites must be met:

Register an AWS account: https://aws.amazon.com/ and assign the proper permissions according to your organization’s needs.
The Kubernetes cluster must include the following minimum settings:
- Nodes must be OS type Linux with using the containerd driver.
- Role-based access control (RBAC) must be enabled.
- Minimum of 4 nodes, each node with a minimum of 8 CPU cores and 16 GB RAM. 50 GB will be allocated per node for a total of 625 GB for the entire cluster.
- RBAC is enabled.
- Recommended Aws Machine type: c5.4xlarge. For more information, see the AWS Instance Types.
Installed eksctl version 0.101.0 and above.
If the cluster will utilize autoscaling, install the Cluster Autoscaler on AWS.

IMPORTANT NOTE
Organizations that intend to stop and restart their Kubernetes environment on an intentional or regular basis are recommended to use a single availability zone for their nodes. This minimizes issues such as persistent volumes in different availability zones, etc.
Organizations that intend to use Wallaroo Enterprise in a high availability cluster are encouraged to follow best practices including using separate availability zones for redundancy, etc.

AWS Environment Setup Steps

The following steps are guidelines to assist new users in setting up their AWS environment for Wallaroo. Feel free to replace these with commands with ones that match your needs.

These commands make use of the command line tool eksctl which streamlines the process in creating Amazon Elastic Kubernetes Service clusters for our Wallaroo environment.

The following are used for the example commands below. Replace them with your specific environment settings:

AWS Cluster Name: wallarooAWS
Create an AWS EKS Cluster

This sample YAML file can be downloaded from here:wallaroo_enterprise_aws_install.yaml

Or copied from here:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: wallarooAWS
  region: us-east-1
  version: "1.25"

addons:
  - name: aws-ebs-csi-driver

iam:
  withOIDC: true
  serviceAccounts:
  - metadata:
      name: cluster-autoscaler
      namespace: kube-system
      labels: {aws-usage: "cluster-ops"}
    wellKnownPolicies:
      autoScaler: true
    roleName: eksctl-cluster-autoscaler-role

nodeGroups:
  - name: mainpool
    instanceType: m5.2xlarge
    desiredCapacity: 3
    containerRuntime: containerd
    amiFamily: AmazonLinux2
    availabilityZones:
      - us-east-1a
  - name: postgres
    instanceType: m5.2xlarge
    desiredCapacity: 1
    taints:
      - key: wallaroo.ai/postgres
        value: "true"
        effect: NoSchedule
    containerRuntime: containerd
    amiFamily: AmazonLinux2
    availabilityZones:
      - us-east-1a
  - name: engine-lb
    instanceType: c5.4xlarge
    minSize: 1
    maxSize: 3
    taints:
      - key: wallaroo.ai/enginelb
        value: "true"
        effect: NoSchedule
    tags:
      k8s.io/cluster-autoscaler/node-template/label/k8s.dask.org/node-purpose: engine-lb
      k8s.io/cluster-autoscaler/node-template/taint/k8s.dask.org/dedicated: "true:NoSchedule"
    iam:
      withAddonPolicies:
        autoScaler: true
    containerRuntime: containerd
    amiFamily: AmazonLinux2
    availabilityZones:
      - us-east-1a
  - name: engine
    instanceType: c5.2xlarge
    minSize: 1
    maxSize: 3
    taints:
      - key: wallaroo.ai/engine
        value: "true"
        effect: NoSchedule
    tags:
      k8s.io/cluster-autoscaler/node-template/label/k8s.dask.org/node-purpose: engine
      k8s.io/cluster-autoscaler/node-template/taint/k8s.dask.org/dedicated: "true:NoSchedule"
    iam:
      withAddonPolicies:
        autoScaler: true
    containerRuntime: containerd
    amiFamily: AmazonLinux2
    availabilityZones:
      - us-east-1a

Create the Cluster

Create the cluster with the following command, which creates the environment and sets the correct Kubernetes version.

eksctl create cluster -f wallaroo_enterprise_aws_install.yaml

kubectl get nodes

NAME                                           STATUS   ROLES    AGE     VERSION
ip-192-168-21-253.us-east-2.compute.internal   Ready    <none>   13m     v1.23.8-eks-9017834
ip-192-168-30-36.us-east-2.compute.internal    Ready    <none>   13m     v1.23.8-eks-9017834
ip-192-168-38-31.us-east-2.compute.internal    Ready    <none>   9m46s   v1.23.8-eks-9017834
ip-192-168-55-123.us-east-2.compute.internal   Ready    <none>   12m     v1.23.8-eks-9017834
ip-192-168-79-70.us-east-2.compute.internal    Ready    <none>   13m     v1.23.8-eks-9017834
ip-192-168-37-222.us-east-2.compute.internal   Ready    <none>   13m     v1.23.8-eks-9017834

Azure Cluster for Wallaroo Enterprise Instructions

If your prepared to install the environment now, skip to Setup Environment Steps.

There are two methods we’ve detailed here on how to setup your Kubernetes cloud environment in Azure:

Quick Setup Script: Download a bash script to automatically set up the Azure environment through the Microsoft Azure command line interface az.
Manual Setup Guide: A list of the az commands used to create the environment through manual commands.
Azure Prerequisites

To install Wallaroo in your Microsoft Azure environment, the following prerequisites must be met:

Register a Microsoft Azure account: https://azure.microsoft.com/.
Install the Microsoft Azure CLI and complete the Azure CLI Get Started Guide to connect your az application to your Microsoft Azure account.
The Kubernetes cluster must include the following minimum settings:
- Nodes must be OS type Linux the containerd driver as the default.
- Role-based access control (RBAC) must be enabled.
- Minimum of 4 nodes, each node with a minimum of 8 CPU cores and 16 GB RAM. 50 GB will be allocated per node for a total of 625 GB for the entire cluster.
- RBAC is enabled.
- Minimum machine type is set to to Standard_D8s_v4.

IMPORTANT NOTE
Organizations that intend to stop and restart their Kubernetes environment on an intentional or regular basis are recommended to use a single availability zone for their nodes. This minimizes issues such as persistent volumes in different availability zones, etc.
Organizations that intend to use Wallaroo Enterprise in a high availability cluster are encouraged to follow best practices including using separate availability zones for redundancy, etc.

Standard Setup Variables

The following variables are used in the Quick Setup Script and the Manual Setup Guide detailed below. Modify them as best fits your organization.

Variable Name	Default Value	Description
WALLAROO_RESOURCE_GROUP	wallaroogroup	The Azure Resource Group used for the KUbernetes environment.
WALLAROO_GROUP_LOCATION	eastus	The region that the Kubernetes environment will be installed to.
WALLAROO_CONTAINER_REGISTRY	wallarooacr	The Azure Container Registry used for the Kubernetes environment.
WALLAROO_CLUSTER	wallarooaks	The name of the Kubernetes cluster that Wallaroo is installed to.
WALLAROO_SKU_TYPE	Base	The Azure Kubernetes Service SKU type.
WALLAROO_VM_SIZE	Standard_D8s_v4	The VM type used for the standard Wallaroo cluster nodes.
POSTGRES_VM_SIZE	Standard_D8s_v4	The VM type used for the postgres nodepool.
ENGINELB_VM_SIZE	Standard_D8s_v4	The VM type used for the engine-lb nodepool.
ENGINE_VM_SIZE	Standard_F8s_v2	The VM type used for the engine nodepool.

Setup Environment Steps
Quick Setup Script

The following script is available for download: wallaroo_enterprise_azure_expandable.bash

The following steps are geared towards a standard Linux or macOS system that supports the prerequisites listed above. Modify these steps based on your local environment.

Download the script above.
In a terminal window set the script status as execute with the command chmod +x wallaroo_enterprise_install_azure_expandable.bash.
Modify the script variables listed above based on your requirements.
Run the script with either bash wallaroo_enterprise_install_azure_expandable.bash or ./wallaroo_enterprise_install_azure_expandable.bash from the same directory as the script.

Manual Setup Guide

See the Azure Command-Line Interface for full details on commands and settings.

Setting up an Azure AKS environment is based on the Azure Kubernetes Service tutorial, streamlined to show the minimum steps in setting up your own Wallaroo environment in Azure.

This follows these major steps:

Set Variables

The following are the variables used for the rest of the commands. Modify them as fits your organization’s needs.

WALLAROO_RESOURCE_GROUP=wallaroogroup
WALLAROO_GROUP_LOCATION=eastus
WALLAROO_CONTAINER_REGISTRY=wallarooacr
WALLAROO_CLUSTER=wallarooaks
WALLAROO_SKU_TYPE=Base
WALLAROO_VM_SIZE=Standard_D8s_v4
POSTGRES_VM_SIZE=Standard_D8s_v4
ENGINELB_VM_SIZE=Standard_D8s_v4
ENGINE_VM_SIZE=Standard_F8s_v2

Create an Azure Resource Group

To create an Azure Resource Group for Wallaroo in Microsoft Azure, use the following template:

az group create --name $WALLAROO_RESOURCE_GROUP --location $WALLAROO_GROUP_LOCATION

(Optional): Set the default Resource Group to the one recently created. This allows other Azure commands to automatically select this group for commands such as az aks list, etc.

az configure --defaults group={Resource Group Name}

For example:

az configure --defaults group=wallarooGroup

Create an Azure Container Registry

An Azure Container Registry(ACR) manages the container images for services includes Kubernetes. The template for setting up an Azure ACR that supports Wallaroo is the following:

az acr create -n $WALLAROO_CONTAINER_REGISTRY \
-g $WALLAROO_RESOURCE_GROUP \
--sku $WALLAROO_SKU_TYPE \
--location $WALLAROO_GROUP_LOCATION

Create an Azure Kubernetes Services

Now we can create our Kubernetes service in Azure that will host our Wallaroo with the az aks create command.

az aks create \
--resource-group $WALLAROO_RESOURCE_GROUP \
--name $WALLAROO_CLUSTER \
--node-count 3 \
--generate-ssh-keys \
--vm-set-type VirtualMachineScaleSets \
--load-balancer-sku standard \
--node-vm-size $WALLAROO_VM_SIZE \
--nodepool-name mainpool \
--attach-acr $WALLAROO_CONTAINER_REGISTRY \
--kubernetes-version=1.23.15 \
--zones 1 \
--location $WALLAROO_GROUP_LOCATION

Wallaroo Enterprise Nodepools

Wallaroo Enterprise supports autoscaling and static nodepools. The following commands are used to create both to support the Wallaroo Enterprise cluster.

The following static nodepools are set up to support the Wallaroo cluster for postgres. Update the VM_SIZE based on your requirements.

az aks nodepool add \
--resource-group $WALLAROO_RESOURCE_GROUP \
--cluster-name $WALLAROO_CLUSTER \
--name postgres \
--node-count 1 \
--node-vm-size $POSTGRES_VM_SIZE \
--no-wait \
--node-taints wallaroo.ai/postgres=true:NoSchedule \
--zones 1

The following autoscaling nodepools are used for the engineLB and the engine nodepools. Adjust the settings based on your organizations requirements.

az aks nodepool add \
--resource-group $WALLAROO_RESOURCE_GROUP \
--cluster-name $WALLAROO_CLUSTER \
--name enginelb \
--node-count 1 \
--node-vm-size $ENGINELB_VM_SIZE \
--no-wait \
--enable-cluster-autoscaler \
--max-count 3 \
--min-count 1 \
--node-taints wallaroo.ai/enginelb=true:NoSchedule \
--labels wallaroo-node-type=enginelb \
--zones 1

az aks nodepool add \
--resource-group $WALLAROO_RESOURCE_GROUP \
--cluster-name $WALLAROO_CLUSTER \
--name engine \
--node-count 1 \
--node-vm-size $ENGINE_VM_SIZE \
--no-wait \
--enable-cluster-autoscaler \
--max-count 3 \
--min-count 1 \
--node-taints wallaroo.ai/engine=true:NoSchedule \
--labels wallaroo-node-type=engine \
--zones 1

Download Wallaroo Kubernetes Configuration

Once the Kubernetes environment is complete, associate it with the local Kubernetes configuration by importing the credentials through the following template command:

az aks get-credentials --resource-group $WALLAROO_RESOURCE_GROUP --name $WALLAROO_CLUSTER

Verify the cluster is available through the kubectl get nodes command.

kubectl get nodes

NAME                               STATUS   ROLES   AGE   VERSION
aks-engine-99896855-vmss000000     Ready    agent   40m   v1.23.8
aks-enginelb-54433467-vmss000000   Ready    agent   48m   v1.23.8
aks-mainpool-37402055-vmss000000   Ready    agent   81m   v1.23.8
aks-mainpool-37402055-vmss000001   Ready    agent   81m   v1.23.8
aks-mainpool-37402055-vmss000002   Ready    agent   81m   v1.23.8
aks-postgres-40215394-vmss000000   Ready    agent   52m   v1.23.8

Quick Setup Script: Download a bash script to automatically set up the GCP environment through the Google Cloud Platform command line interface gcloud.
Manual Setup Guide: A list of the gcloud commands used to create the environment through manual commands.
GCP Prerequisites

Organizations that wish to run Wallaroo in their Google Cloud Platform environment must complete the following prerequisites:

Register a Google Cloud Account: https://cloud.google.com/
Create a Google Cloud project: https://cloud.google.com/resource-manager/docs/creating-managing-projects
Install gcloud and run gcloud init or gcloud init –console on the local system used to set up your environment: https://cloud.google.com/sdk/docs/install
Enable the Google Compute Engine(GCE): https://cloud.google.com/endpoints/docs/openapi/enable-api
Enable the Google Kubernetes Engine(GKE) on your project: https://console.cloud.google.com/apis/enableflow?apiid=container.googleapis.com
Select a default Computer Engine region and zone: https://cloud.google.com/compute/docs/regions-zones.

IMPORTANT NOTE
Organizations that intend to stop and restart their Kubernetes environment on an intentional or regular basis are recommended to use a single availability zone for their nodes. This minimizes issues such as persistent volumes in different availability zones, etc.
Organizations that intend to use Wallaroo Enterprise in a high availability cluster are encouraged to follow best practices including using separate availability zones for redundancy, etc.

Standard Setup Variables

The following variables are used in the Quick Setup Script and the Manual Setup Guide. Modify them as best fits your organization.

Variable Name	Default Value	Description
WALLAROO_GCP_PROJECT	wallaroo	The name of the Google Project used for the Wallaroo instance.
WALLAROO_CLUSTER	wallaroo	The name of the Kubernetes cluster for the Wallaroo instance.
WALLAROO_GCP_REGION	us-central1	The region the Kubernetes environment is installed to. Update this to your GCP Computer Engine region.
WALLAROO_NODE_LOCATION	us-central1-f	The location the Kubernetes nodes are installed to. Update this to your GCP Compute Engine Zone.
WALLAROO_GCP_NETWORK_NAME	wallaroo-network	The Google network used with the Kubernetes environment.
WALLAROO_GCP_SUBNETWORK_NAME	wallaroo-subnet-1	The Google network subnet used with the Kubernets environment.
DEFAULT_VM_SIZE	e2-standard-8	The VM type used for the default nodepool.
POSTGRES_VM_SIZE	n2-standard-8	The VM type used for the postgres nodepool.
ENGINELB_VM_SIZE	c2-standard-8	The VM type used for the engine-lb nodepool.
ENGINE_VM_SIZE	c2-standard-8	The VM type used for the engine nodepool.

Quick Setup Script

The following script is available for download: wallaroo_enterprise_gcp_expandable.bash

The following steps are geared towards a standard Linux or macOS system that supports the prerequisites listed above. Modify these steps based on your local environment.

Download the script above.
In a terminal window set the script status as execute with the command chmod +x bash wallaroo_enterprise_gcp_expandable.bash.
Modify the script variables listed above based on your requirements.
Run the script with either bash wallaroo_enterprise_gcp_expandable.bash or ./wallaroo_enterprise_gcp_expandable.bash from the same directory as the script.

Set Variables

The following are the variables used in the environment setup process. Modify them as best fits your organization’s needs.

WALLAROO_GCP_PROJECT=wallaroo
WALLAROO_CLUSTER=wallaroo
WALLAROO_GCP_REGION=us-central1
WALLAROO_NODE_LOCATION=us-central1-f
WALLAROO_GCP_NETWORK_NAME=wallaroo-network
WALLAROO_GCP_SUBNETWORK_NAME=wallaroo-subnet-1
DEFAULT_VM_SIZE=n2-standard-8
POSTGRES_VM_SIZE=n2-standard-8
ENGINELB_VM_SIZE=c2-standard-8
ENGINE_VM_SIZE=c2-standard-8

Manual Setup Guide

See the Google Cloud SDK for full details on commands and settings.

Create a GCP Network

First create a GCP network that is used to connect to the cluster with the gcloud compute networks create command. For more information, see the gcloud compute networks create page.

gcloud compute networks \
create $WALLAROO_GCP_NETWORK_NAME \
--bgp-routing-mode regional \
--subnet-mode custom

Verify it’s creation by listing the GCP networks:

gcloud compute networks list

Create the GCP Wallaroo Cluster

Once the network is created, the gcloud container clusters create command is used to create a cluster. For more information see the gcloud container clusters create page.

The following is a recommended format, replacing the {} listed variables based on your setup. For Google GKE containerd is enabled by default.

gcloud container clusters \
create $WALLAROO_CLUSTER \
--region $WALLAROO_GCP_REGION \
--node-locations $WALLAROO_NODE_LOCATION \
--machine-type $DEFAULT_VM_SIZE \
--network $WALLAROO_GCP_NETWORK_NAME \
--create-subnetwork name=$WALLAROO_GCP_SUBNETWORK_NAME \
--enable-ip-alias \
--cluster-version=1.23

The command can take several minutes to complete based on the size and complexity of the clusters. Verify the process is complete with the clusters list command:

gcloud container clusters list

Wallaroo Enterprise Nodepools

The following static nodepools can be set based on your organizations requirements. Adjust the settings or names based on your requirements.

gcloud container node-pools create postgres \
--cluster=$WALLAROO_CLUSTER \
--machine-type=$POSTGRES_VM_SIZE \
--num-nodes=1 \
--region $WALLAROO_GCP_REGION \
--node-taints wallaroo.ai/postgres=true:NoSchedule

The following autoscaling nodepools are used for the engine load balancers and Wallaroo engine. Again, replace names and virtual machine types based on your organizations requirements.

gcloud container node-pools create engine-lb \
--cluster=$WALLAROO_CLUSTER \
--machine-type=$ENGINELB_VM_SIZE \
--enable-autoscaling \
--num-nodes=1 \
--min-nodes=0 \
--max-nodes=3 \
--region $WALLAROO_GCP_REGION \
--node-taints wallaroo-engine-lb=true:NoSchedule,wallaroo.ai/enginelb=true:NoSchedule \
--node-labels wallaroo-node-type=engine-lb

gcloud container node-pools create engine \
--cluster=$WALLAROO_CLUSTER \
--machine-type=$ENGINE_VM_SIZE \
--enable-autoscaling \
--num-nodes=1 \
--min-nodes=0 \
--max-nodes=3 \
--region $WALLAROO_GCP_REGION \
--node-taints wallaroo.ai/engine=true:NoSchedule \
--node-labels=wallaroo-node-type=engine

Retrieving Kubernetes Credentials

gcloud container clusters \
get-credentials $WALLAROO_CLUSTER \
--region $WALLAROO_GCP_REGION

To verify the Kubernetes credentials for your cluster have been installed locally, use the kubectl get nodes command. This will display the nodes in the cluster as demonstrated below:

kubectl get nodes

NAME                                         STATUS   ROLES    AGE   VERSION
gke-wallaroo-default-pool-863f02db-7xd4   Ready    <none>   39m   v1.21.6-gke.1503
gke-wallaroo-default-pool-863f02db-8j2d   Ready    <none>   39m   v1.21.6-gke.1503
gke-wallaroo-default-pool-863f02db-hn06   Ready    <none>   39m   v1.21.6-gke.1503
gke-wallaroo-engine-3946eaca-4l3s         Ready    <none>   89s   v1.21.6-gke.1503
gke-wallaroo-engine-lb-2e33a27f-64wb      Ready    <none>   26m   v1.21.6-gke.1503
gke-wallaroo-postgres-d22d73d3-5qp5       Ready    <none>   28m   v1.21.6-gke.1503

Troubleshooting
- What does the error ‘Insufficient project quota to satisfy request: resource “CPUS_ALL_REGIONS”’ mean?
  Make sure that the Compute Engine Zone and Region are properly set based on your organization’s requirements. The instructions above default to us-central1, so change that zone to install your Wallaroo instance in the correct location.

Single Node Linux

Organizations can run Wallaroo within a single node Linux environment that meet the prerequisites.

Prerequisites

Before starting the bare Linux installation, the following conditions must be met:

Have a Wallaroo Enterprise license file. For more information, you can request a demonstration.
A Linux bare-metal system or virtual machine with at least 32 cores and 64 GB RAM with Ubuntu 20.04 installed.
- See the Install Wallaroo with Minimum Services for installing Wallaroo with reduced services.
650 GB allocated for the root partition, plus 50 GB allocated per node and another 50 GB for the JupyterHub service. Enterprise users who deploy additional pipelines will require an additional 50 GB of storage per lab node deployed.
Ensure memory swapping is disabled by removing it from /etc/fstab if needed.
DNS services for integrating your Wallaroo Enterprise instance. See the DNS Integration Guide for the instructions on configuring Wallaroo Enterprise with your DNS services.
IMPORTANT NOTE
- Wallaroo requires out-bound network connections to download the required container images and other tasks. For situations that require limiting out-bound access, refer to the air-gap installation instructions or contact your Wallaroo support representative. Also note that if Wallaroo is being installed into a cloud environment such as Google Cloud Platform, Microsoft Azure, Amazon Web Services, etc, then additional considerations such as networking, DNS, certificates, and other considerations must be accounted for. For IP address restricted environments, see the Air Gap Installation Guide.
- The steps below are based on minimum requirements for install Wallaroo in a single node environment.
- For situations that require limiting external IP access or other questions, refer to your Wallaroo support representative.
Template Single Node Scripts

The following template scripts are provided as examples on how to create single node virtual machines that meet the requirements listed above in AWS, GCP, and Microsoft Azure environments.

AWS VM Template Script
Dependencies
- AWS CLI
- IAM permissions to create resources. See IAM policies for Amazon EC2.

Download template script here: aws-single-node-vm.bash

# Variables

# The name of the virtual machine
NAME=$USER-demo-vm                     # eg bob-demo-vm

# The image used : ubuntu/images/2023.2.1/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230208
IMAGE_ID=ami-0557a15b87f6559cf

# Instance type meeting the Wallaroo requirements.
INSTANCE_TYPE=c6i.8xlarge # c6a.8xlarge is also acceptable

# key name - generate keys using Amazon EC2 Key Pairs
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html
# Wallaroo people: https://us-east-1.console.aws.amazon.com/ec2/home?region=us-east-1#KeyPairs:v=3 - 
MYKEY=DocNode


# We will whitelist the our source IP for maximum security -- just use 0.0.0.0/0 if you don't care.
MY_IP=$(curl -s https://checkip.amazonaws.com)/32

# Create security group in the Default VPC
aws ec2 create-security-group --group-name $NAME --description "$USER demo" --no-cli-pager

# Open port 22 and 443
aws ec2 authorize-security-group-ingress --group-name $NAME --protocol tcp --port 22 --cidr $MY_IP --no-cli-pager
aws ec2 authorize-security-group-ingress --group-name $NAME --protocol tcp --port 443 --cidr $MY_IP --no-cli-pager

# increase Boot device size to 650 GB
# Change the location from `/tmp/device.json` as required.
# cat <<EOF > /tmp/device.json 
# [{
#   "DeviceName": "/dev/sda1",
#   "Ebs": { 
#     "VolumeSize": 650,
#     "VolumeType": "gp2"
#   }
# }]
# EOF

# Launch instance with a 650 GB Boot device.
aws ec2 run-instances --image-id $IMAGE_ID --count 1 --instance-type $INSTANCE_TYPE \
    --no-cli-pager \
    --key-name $MYKEY \
    --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":650,"VolumeType":"gp2"}}]'  \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$NAME}]" \
    --security-groups $NAME

# Sample output:
# {
#     "Instances": [
#         {
#             ...
#             "InstanceId": "i-0123456789abcdef",     # Keep this instance-id for later
#             ...
#         }
#     ]
# }

#INSTANCEID=YOURINSTANCE
      
# After several minutes, a public IP will be known. This command will retrieve it.
# aws ec2 describe-instances  --output text --instance-id $INSTANCEID \
#    --query 'Reservations[*].Instances[*].{ip:PublicIpAddress}'

# Sample Output
# 12.23.34.56

# KEYFILE=KEYFILELOCATION       #usually ~/.ssh/key.pem - verify this is the same as the key above.
# SSH to the VM - replace $INSTANCEIP
#ssh -i $KEYFILE ubuntu@$INSTANCEIP

# Stop the VM - replace the $INSTANCEID
#aws ec2 stop-instances --instance-id $INSTANCEID

# Restart the VM
#aws ec2 start-instances --instance-id $INSTANCEID

# Clean up - destroy VM
#aws ec2 terminate-instances --instance-id $INSTANCEID

Azure VM Template Script
Dependencies
- Azure CLI

Download template script here: azure-single-node-vm.bash

#!/bin/bash

# Variables list.  Update as per your organization's settings
NAME=$USER-demo-vm                          # eg bob-demo-vm
RESOURCEGROUP=YOURRESOURCEGROUP
LOCATION=eastus
IMAGE=Canonical:0001-com-ubuntu-server-jammy:22_04-lts:22.04.202301140

# Pick a location
az account list-locations  -o table |egrep 'US|----|Name'

# Create resource group
az group create -l $LOCATION --name $USER-demo-$(date +%y%m%d)

# Create VM. This will create ~/.ssh/id_rsa and id_rsa.pub - store these for later use.
az vm create --resource-group $RESOURCEGROUP --name $NAME --image $IMAGE  --generate-ssh-keys \
   --size Standard_D32s_v4 --os-disk-size-gb 500 --public-ip-sku Standard

# Sample output
# {
#   "location": "eastus",
#   "privateIpAddress": "10.0.0.4",
#   "publicIpAddress": "20.127.249.196",    <-- Write this down as MYPUBIP
#   "resourceGroup": "mnp-demo-230213",
#   ...
# }

# SSH port is open by default. This adds an application port.
az vm open-port --resource-group $RESOURCEGROUP --name $NAME --port 443

# SSH to the VM - assumes that ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub from above are availble.
# ssh $MYPUBIP

# Use this Stop the VM ("deallocate" frees resources and billing; "stop" does not)
# az vm deallocate --resource-group $RESOURCEGROUP --name $NAME

# Restart the VM
# az vm start --resource-group $RESOURCEGROUP --name $NAME

GCP VM Template Script

Dependencies:

Gcloud CLI
GCP Project ID

Download template script here: gcp-single-node-vm.bash

# Settings

NAME=$USER-demo-$(date +%y%m%d)      # eg bob-demo-230210
ZONE=us-west1-a                      # For a complete list, use `gcloud compute zones list | egrep ^us-`
PROJECT=wallaroo-dev-253816          # Insert the GCP Project ID here.  This is the one for Wallaroo.

# Create VM

IMAGE=projects/ubuntu-os-cloud/global/images/2023.2.1/ubuntu-2204-jammy-v20230114

# Port 22 and 443 open by default
gcloud compute instances create $NAME \
    --project=$PROJECT \
    --zone=$ZONE \
    --machine-type=e2-standard-32 \
    --network-interface=network-tier=STANDARD,subnet=default \
    --maintenance-policy=MIGRATE \
    --provisioning-model=STANDARD \
    --no-service-account \
    --no-scopes \
    --tags=https-server \
    --create-disk=boot=yes,image=${IMAGE},size=500,type=pd-standard \
    --no-shielded-secure-boot \
    --no-shielded-vtpm \
    --no-shielded-integrity-monitoring \
    --reservation-affinity=any


# Get the external IP address
gcloud compute instances describe $NAME --zone $ZONE --format='get(networkInterfaces[0].accessConfigs[0].natIP)'

# SSH to the VM
#gcloud compute ssh $NAME --zone $ZONE

# SCP file to the instance - replace $FILE with the file path.  Useful for copying up the license file up to the instance.

#gcloud compute scp --zone $ZONE $FILE $NAME:~/

# SSH port forward to the VM
#gcloud compute ssh $NAME --zone $ZONE -- -NL 8800:localhost:8800

# Suspend the VM
#gcloud compute instances stop $NAME --zone $ZONE

# Restart the VM
#gcloud compute instances start $NAME --zone $ZONE

Kubernetes Installation Steps

The following script and steps will install the Kubernetes version and requirements into the Linux node that supports a Wallaroo single node installation.

The process includes these major steps:

Install Kubernetes
Install Kots Version
Install Kubernetes

curl is installed in the default scripts provided above. Verify that it is installed if using some other platform.

Verify that the Ubuntu distribution is up to date, and reboot if necessary after updating.
```
sudo apt update
sudo apt upgrade
```
Start the Kubernetes installation with the following script, substituting the URL path as appropriate for your license.
For Wallaroo versions 2022.4 and below:
```
curl https://kurl.sh/9398a3a | sudo bash
```
For Wallaroo versions 2023.1 and later, the install is based on the license channel. For example, if your license uses the EE channel, then the path is /wallaroo-ee; that is, /wallaroo- plus the lower-case channel name. Note that the Kubernetes install channel must match the License version. Check with your Wallaroo support representative with any questions about your version.
```
curl https://kurl.sh/wallaroo-ee | sudo bash
```
1. If prompted with This application is incompatible with memory swapping enabled. Disable swap to continue? (Y/n), reply Y.

Set up the Kubernetes configuration with the following commands:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
chmod u+w $HOME/.kube/config
echo 'export KUBECONFIG=$HOME/.kube/config' >> ~/.bashrc

Log out, and log back in as the same user. Verify the installation was successful with the following:
```
kubectl get nodes
```
It should return results similar to the following:
```
NAME     STATUS   ROLES                  AGE     VERSION
wallux   Ready    control-plane,master   6m26s   v1.23.6
```

Install Kots

Install kots with the following process.

Run the following script and provide your password for the sudo based commands when prompted.
```
curl https://kots.io/install/1.91.3 | REPL_USE_SUDO=y bash
```
Verify kots was installed with the following command:
```
kubectl kots version
```
It should return results similar to the following:
```
Replicated KOTS 1.91.3
```

Connection Options

The easiest method is to create a SSH tunnel to the Linux node. Usually this will be in the format:

ssh $IP -L8800:localhost:8800

For example, in an AWS instance that may be as follows, replaying $KEYFILE with the link to the keyfile and $IP with the IP address of the Linux node.

ssh -i $KEYFILE ubuntu@$IP -L8800:localhost:8800

In a GCP instance, gcloud can be used as follows, replacing $NAME with the name of the GCP instance, $ZONE with the zone it was installed into.

gcloud compute ssh $NAME --zone $ZONE -- -NL 8800:localhost:8800

Port forwarding port 8800 is used for kots based installation to access the Wallaroo Administrative Dashboard.

1.3 - Wallaroo Community Install Guides

1.3.1 - Wallaroo Community Simple Install Guide

How to set up Wallaroo Community in a prepared environment.

The following guide details how to set up Wallaroo Community in a prepared environment. For a comprehensive guide for the entire process including sample scripts for setting up a cloud environment, see the Wallaroo Community Comprehensive Install Guide

Install Wallaroo Community

Wallaroo Community can be installed into a Kubernetes cloud environment, or into a Kubernetes environment that meets the Wallaroo Prerequisites Guide. Organizations that use the Wallaroo Community AWS EC2 Setup procedure do not have to set up a Kubernetes environment, as it is already configured for them.

This video demonstrates that procedure:

The procedure assumes at least a basic knowledge of Kubernetes and how to use the kubectl and kots version 1.91.3 applications.

The procedure involves the following major steps:

Install Wallaroo Community
- Prerequisites
  - Local Software Requirements
- Install Wallaroo
  - Initial Configuration and License Upload Procedure

Prerequisites

Local Software Requirements

Before starting, verify that all local system requirements are complete as detailed in the Wallaroo Community Local System Prerequisites guide:

kubectl: This interfaces with the Kubernetes server created in the Wallaroo environment.
For Kots based installs:
- kots Version 1.91.3

Cloud Kubernetes environment has been prepared.
You have downloaded your Wallaroo Community License file.

Install Wallaroo

The environment is ready, the tools are installed - let’s install Wallaroo! The following will use kubectl and kots through the following procedure:

Install the Wallaroo Community Edition using kots install wallaroo/ce, specifying the namespace to install. For example, if wallaroo is the namespace, then the command is:
```
kubectl kots install wallaroo/ce --namespace wallaroo
```

Wallaroo Community Edition will be downloaded and installed into your Kubernetes environment in the namespace specified. When prompted, set the default password for the Wallaroo environment. When complete, Wallaroo Community Edition will display the URL for the Admin Console, and how to end the Admin Console from running.

• Deploying Admin Console
• Creating namespace ✓
• Waiting for datastore to be ready ✓
    Enter a new password to be used for the Admin Console: •••••••••••••
  • Waiting for Admin Console to be ready ✓

• Press Ctrl+C to exit
• Go to http://localhost:8800 to access the Admin Console

Wallaroo Community edition will continue to run until terminated. To relaunch in the future, use the following command:

kubectl-kots admin-console --namespace wallaroo

Initial Configuration and License Upload Procedure

Once Wallaroo Community edition has been installed for the first time, we can perform initial configuration and load our Wallaroo Community license file through the following process:

If Wallaroo Community Edition has not started, launch it with the following command:

❯ kubectl-kots admin-console --namespace wallaroo
  • Press Ctrl+C to exit
  • Go to http://localhost:8800 to access the Admin Console

Enter the Wallaroo Community Admin Console address into a browser. You will be prompted for the default password as set in the step above. Enter it and select Log in.
Upload your license file.
The Configure Wallaroo Community page will be displayed which allows you to customize your Wallaroo environment. For now, scroll to the bottom and select Continue. These settings can be customized at a later date.
The Wallaroo Community Admin Console will run the preflight checks to verify that all of the minimum requirements are not met. This may take a few minutes. If there are any issues, Wallaroo can still be launched but may not function properly. When ready, select Continue.
The Wallaroo Community Dashboard will be displayed. There may be additional background processes that are completing their setup procedures, so there may be a few minute wait until those are complete. If everything is ready, then the Wallaroo Dashboard will show a green Ready.
Under the license information is the DNS entry for your Wallaroo instance. This is where you and other users of your Wallaroo instance can log in. In this example, the URL will be https://beautiful-horse-9537.wallaroo.community. Note that it may take a few minutes for the DNS entries to propagate and this URL to be available.
You will receive an email invitation for the email address connected to this URL with a temporary password and a link to this Wallaroo instance’s URL. Either enter the URL for your Wallaroo instance or use the link in the email.
To login to your new Wallaroo instance, enter the email address and temporary password associated with the license.

With that, Wallaroo Community edition is launched and ready for use! You can end the Admin Console from your terminal session above. From this point on you can just use the Wallaroo instance URL.

1.3.2 - Wallaroo Community Comprehensive Install Guide

How to set up Wallaroo Community in various environments.

This guide is targeted towards system administrators and data scientists who want to work with the easiest, fastest, and free method of running your own machine learning models.

A typical installation of Wallaroo Community follows this process:

Step	Description	Average Setup Time
Download License	Create an environment that meets the Wallaroo prerequisites	5 minutes
Set Up Environment	Set up the cloud environment hosting the Wallaroo instance.	30 minutes
Install Wallaroo	Install Wallaroo into a prepared environment	15 minutes

Register Your Wallaroo Community Account

Completion Time

General completion time: 5 minutes

The first step to installing Wallaroo CE is to set up your Wallaroo Community Account at the web site https://portal.wallaroo.community. This process typically takes about 5 minutes.

Once you’ve submitted your credentials, you’ll be sent an email with a link to your license file.

Follow the link and download your license file. Store it in a secure location.

Redownload License

If your license is misplaced or otherwise lost, it can be downloaded again later from the same link, or by following the registration steps again to be provided with a link to your license file.

Setup Environments

The following setup guides are used to set up the environment that will host the Wallaroo instance. Verify that the environment is prepared and meets the Wallaroo Prerequisites Guide.

Wallaroo Community AWS EC2 Setup Instructions

The following instructions are made to assist users set up their Amazon Web Services (AWS) environment for running Wallaroo using AWS virtual servers with EC2. This allows organizations to stand a single virtual machine and used a pre-made Amazon Machine Images (AMIs) to quickly stand up an environment that can be used to install Wallaroo.

AWS Prerequisites

To install Wallaroo in your AWS environment based on these instructions, the following prerequisites must be met:

Register an AWS account: https://aws.amazon.com/ and assign the proper permissions according to your organization’s needs. This must be a paid AWS account - Wallaroo will not operate on the free tier level of virtual machines.
Steps
- Create the EC2 VM

To create your Wallaroo instance using a pre-made AMI:

Log into AWS cloud console.
Set the region to N. Virginia. Other regions will be added over time.
Select Services -> EC2.
Select Instances, then from the upper right hand section Launch Instances->Launch Instances.
Set the Name and any additional tags.
In Application and OS Images, enter Wallaroo Install and press Enter.
From the search results, select Community AMIs and select Wallaroo Installer 3a.
Set the Instance Type as c6i.8xlarge or c6a.8xlarge as the minimum machine type. This provides 32 cores with 60 GB memory.
For Key pair (login) select one of the following:
1. Select an existing Key pair name
2. Select Create new key pair and set the following:
  1. Name: The name of the new key pair.
  2. Key pair type: Select either RSA or ED25519.
  3. Private key file format: Select either .pem or .ppk. These instructions are based on the .pem file.
  4. Select Create key pair when complete.
Set the following for Network settings:
1. Firewall: Select Create security group or select from an existing one that best fits your organization.
2. Allow SSH traffic from: Set to Enabled and Anywhere 0.0.0.0/0.
3. Allow HTTPs traffic from the internet: Set to Enabled.
Set the following for Configure Storage:
1. Set Root volume to at least 400 GiB, type standard.
Review the Summary and verify the following:
1. Number of instances: 1
2. Virtual server type: Matches the minimum requirement listed above.
3. Verify the other settings are accurate.
Select Launch Instance.

It is recommended to give the instance time to complete its setup process. This typically takes 20 minutes.

Verify the Setup

To verify the environment is setup for Wallaroo:

From the EC2 Dashboard, select the virtual machine created for your Wallaroo instance.
Note the Public IPv4 DNS address.
From a terminal, run ssh to connect to your virtual machine. The installation requires access to port 8800 and the private key selected or created in the instructions above.
The ssh command format for connecting to your virtual machine uses the following format, replacing the $keyfile, $VM_DNS with your private key file and the DNS address to your Amazon VM:
```
ssh -i "$keyfile" ubuntu@$VM_DNS -L8800:localhost:8800
```
For example, a $keyfile of Doc Sample Key.pem and $VM_DNS of ec2-54-160-227-100.compute-1.amazonaws.com would be as follows:
```
ssh -i "Doc Sample Key.pem" ubuntu@ec2-54-160-227-100.compute-1.amazonaws.com -L8800:localhost:8800
```
If the Kubernetes setup is still installing, wait until complete and when prompted select EXIT to complete the process. This process may take up to 20 to 30 minutes.

Cost Saving Tips The following tips can be used to save costs on your AWS EC2 instance.
- Stop Instances When Not In Use

One cost saving measure is to stop instances when not in use. If you intend to stop an instance, register it with static IP address so when it is turned back on your services will continue to function without interruption.

Reference: How do I associate a static public IP address with my EC2 Windows or Linux instance?.

Troubleshooting
- I keep seeing the errors such as connect failed. Is this a problem?
  - Sometimes you may see an error such as channel 3: open failed: connect failed: Connection refused. This is the ssh port forwarding attempting to connect to port 8800 during the installation, and can be ignored.
- When Launching JupyterHub, I get a Server 500 error.
  - If you shut down and restart a Wallaroo instance in a new environment or change the IP address, some settings may not be updated. Run the following command to restart the deployment process and update the settings to match the current environment:
```
kubectl rollout restart deployment hub
```

Setup AWS EKS Environment for Wallaroo

The following instructions are made to assist users set up their Amazon Web Services (AWS) environment for running Wallaroo using AWS Elastic Kubernetes Service (EKS).

These represent a recommended setup, but can be modified to fit your specific needs.

If the prerequisites are already met, skip ahead to Install Wallaroo.

The following video demonstrates this process:

AWS Prerequisites

To install Wallaroo in your AWS environment based on these instructions, the following prerequisites must be met:

Register an AWS account: https://aws.amazon.com/ and assign the proper permissions according to your organization’s needs.
The Kubernetes cluster must include the following minimum settings:
- Nodes must be OS type Linux with using the containerd driver.
- Role-based access control (RBAC) must be enabled.
- Minimum of 4 nodes, each node with a minimum of 8 CPU cores and 16 GB RAM. 50 GB will be allocated per node for a total of 625 GB for the entire cluster.
- RBAC is enabled.
- Recommended Aws Machine type: c5.4xlarge. For more information, see the AWS Instance Types.
Installed eksctl version 0.101.0 and above.
If the cluster will utilize autoscaling, install the Cluster Autoscaler on AWS.

AWS Cluster Recommendations

The following recommendations will assist in reducing the cost of a cloud based Kubernetes Wallaroo cluster.

Turn off the cluster when not in use. An AWS EKS (Elastic Kubernetes Services) cluster can be turn off when not in use, then turned back on again when needed. If organizations adopt this process, be aware of the following issues:
- IP Address Reassignment: The load balancer public IP address may be reassigned when the cluster is restarted by the cloud service unless a static IP address is assigned. For more information in Amazon Web Services see the Associate Elastic IP addresses with resources in your VPC user guide.
Assign to a Single Availability Zone: Clusters that span multiple availability zones may have issues accessing persistent volumes that were provisioned in another availability zone from the node when the node is restarted. The simple solution is to assign the entire cluster into a single availability zone. For more information in Amazon Web Services see the Regions and Zones guide.
The scripts and configuration files are set up to create the AWS environment for a Wallaroo instance are based on a single availability zone. Modify the script as required for your organization.

Community Cluster Setup Instructions

The following is based on the requirements for Wallaroo Community. Note that Wallaroo Community does not use adaptive nodepools. Adapt the settings as required for your organization’s needs, as long as they meet the prerequisites listed above.

This sample YAML file can be downloaded from here:

wallaroo_community_aws_install.yaml

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  # replace with the name of your server
  name: wallarooAWS
  # replace with your location
  region: us-east-1
  version: "1.25"

addons:
  - name: aws-ebs-csi-driver

iam:
  withOIDC: true

nodeGroups:
  - name: mainpool
    instanceType: m5.2xlarge
    desiredCapacity: 4
    containerRuntime: containerd
    amiFamily: AmazonLinux2
    availabilityZones:
      - us-east-1a

Install AWS Command Line Tools

The following steps require the installation of the following Amazon Web Services command line tools:

AWS CLI: Complete both the AWS Command Line Installation Guide and the AWS CLI Quick Setup Guide to authenticate to your AWS account.
eksctl: A command line tool for managing Amazon EKS clusters from a configured YAML file. See the EKSCTL Install Guide for more details.
kubectl: This interfaces with the Kubernetes server created in the Wallaroo environment.
kots Version: Used to manage software installed in a Kubernetes environment.
Create the Cluster

Create the cluster with the following command, which creates the environment and sets the correct Kubernetes version.

eksctl create cluster -f aws.yaml

During the process the Kuberntes credentials will be copied into the local environment. To verify the setup is complete, use the kubectl get nodes command to display the available nodes as in the following example:

kubectl get nodes

NAME                                           STATUS   ROLES    AGE     VERSION
ip-192-168-21-253.us-east-2.compute.internal   Ready    <none>   13m     v1.23.8-eks-9017834
ip-192-168-30-36.us-east-2.compute.internal    Ready    <none>   13m     v1.23.8-eks-9017834
ip-192-168-55-123.us-east-2.compute.internal   Ready    <none>   12m     v1.23.8-eks-9017834
ip-192-168-79-70.us-east-2.compute.internal    Ready    <none>   13m     v1.23.8-eks-9017834

Setup Azure Environment for Wallaroo

The following instructions are made to assist users set up their Microsoft Azure Kubernetes environment for running Wallaroo Community. These represent a recommended setup, but can be modified to fit your specific needs.

If your prepared to install the environment now, skip to Setup Environment Steps.

There are two methods we’ve detailed here on how to setup your Kubernetes cloud environment in Azure:

Quick Setup Script: Download a bash script to automatically set up the Azure environment through the Microsoft Azure command line interface az.
Manual Setup Guide: A list of the az commands used to create the environment through manual commands.

The following video demonstrates the manual guide:

Azure Prerequisites

To install Wallaroo in your Microsoft Azure environment, the following prerequisites must be met:

Register a Microsoft Azure account: https://azure.microsoft.com/.
Install the Microsoft Azure CLI and complete the Azure CLI Get Started Guide to connect your az application to your Microsoft Azure account.
The Kubernetes cluster must include the following minimum settings:
- Nodes must be OS type Linux the containerd driver as the default.
- Role-based access control (RBAC) must be enabled.
- Minimum of 4 nodes, each node with a minimum of 8 CPU cores and 16 GB RAM. 50 GB will be allocated per node for a total of 625 GB for the entire cluster.
- RBAC is enabled.
- Minimum machine type is set to to Standard_D8s_v4.

Azure Cluster Recommendations

The following recommendations will assist in reducing the cost of a cloud based Kubernetes Wallaroo cluster.

Turn off the cluster when not in use. An Azure Kubernetes Service (AKS) cluster can be turn off when not in use, then turned back on again when needed to save on costs. For more information on starting and stopping an AKS cluster, see the Stop and Start an Azure Kubernetes Service (AKS) cluster guide.
If organizations adopt this process, be aware of the following issues:
- IP Address Reassignment: The load balancer public IP address may be reassigned when the cluster is restarted by the cloud service unless a static IP address is assigned. For more information in Microsoft Azure see the Use a static public IP address and DNS label with the Azure Kubernetes Service (AKS) load balancer user guide.
Assign to a Single Availability Zone: Clusters that span multiple availability zones may have issues accessing persistent volumes that were provisioned in another availability zone from the node when the cluster is restarted. The simple solution is to assign the entire cluster into a single availability zone. For more information in Microsoft Azure see the Create an Azure Kubernetes Service (AKS) cluster that uses availability zones guide.
The scripts and configuration files are set up to create the Azure environment for a Wallaroo instance are based on a single availability zone. Modify the script as required for your organization.

Setup Environment Steps
- Standard Setup Variables
  The following variables are used in the Quick Setup Script and the Manual Setup Guide. Modify them as best fits your organization.

Variable Name	Default Value	Description
WALLAROO_RESOURCE_GROUP	wallaroocegroup	The Azure Resource Group used for the Kubernetes environment.
WALLAROO_GROUP_LOCATION	eastus	The region that the Kubernetes environment will be installed to.
WALLAROO_CONTAINER_REGISTRY	wallarooceacr	The Azure Container Registry used for the Kubernetes environment.
WALLAROO_CLUSTER	wallarooceaks	The name of the Kubernetes cluster that Wallaroo is installed to.
WALLAROO_SKU_TYPE	Base	The Azure Kubernetes Service SKU type.
WALLAROO_NODEPOOL	wallaroocepool	The main nodepool for the Kubernetes cluster.
WALLAROO_VM_SIZE	Standard_D8s_v4	The VM type used for the standard Wallaroo cluster nodes.
WALLAROO_CLUSTER_SIZE	4	The number of nodes in the cluster.

Quick Setup Script

The following sample script creates an Azure Kubernetes environment ready for use with Wallaroo Community. This script requires the following prerequisites listed above.

Modify the installation file to fit for your organization. The only parts that require modification are the variables listed in the beginning as follows:

The following script is available for download: wallaroo_community_azure_install.bash

The following steps are geared towards a standard Linux or macOS system that supports the prerequisites listed above. Modify these steps based on your local environment.

Download the script above.
In a terminal window set the script status as execute with the command chmod +x wallaroo_community_azure_install.bash.
Modify the script variables listed above based on your requirements.
Run the script with either bash wallaroo_community_azure_install.bash or ./wallaroo_community_azure_install.bash from the same directory as the script.

Manual Setup Guide

The following steps are guidelines to assist new users in setting up their Azure environment for Wallaroo. Feel free to replace these with commands with ones that match your needs.

See the Azure Command-Line Interface for full details on commands and settings.

The following are used for the example commands below. Replace them with your specific environment settings:

Azure Resource Group: wallarooCEGroup
Azure Resource Group Location: eastus
Azure Container Registry: wallarooCEAcr
Azure Kubernetes Cluster: wallarooCEAKS
Azure Container SKU type: Base
Azure Nodepool Name: wallarooCEPool

Setting up an Azure AKS environment is based on the Azure Kubernetes Service tutorial, streamlined to show the minimum steps in setting up your own Wallaroo environment in Azure.

This follows these major steps:

Create an Azure Resource Group
Create an Azure Container Registry
Create the Azure Kubernetes Environment
Set Variables

The following are the variables used in the environment setup process. Modify them as best fits your organization’s needs.

WALLAROO_RESOURCE_GROUP=wallaroocegroupdocs
WALLAROO_GROUP_LOCATION=eastus
WALLAROO_CONTAINER_REGISTRY=wallarooceacrdocs
WALLAROO_CLUSTER=wallarooceaksdocs
WALLAROO_SKU_TYPE=Base
WALLAROO_NODEPOOL=wallaroocepool
WALLAROO_VM_SIZE=Standard_D8s_v4
WALLAROO_CLUSTER_SIZE=4

Create an Azure Resource Group

To create an Azure Resource Group for Wallaroo in Microsoft Azure, use the following template:

az group create --name $WALLAROO_RESOURCE_GROUP --location $WALLAROO_GROUP_LOCATION

(Optional): Set the default Resource Group to the one recently created. This allows other Azure commands to automatically select this group for commands such as az aks list, etc.

az configure --defaults group=$WALLAROO_RESOURCE_GROUP

Create an Azure Container Registry

An Azure Container Registry(ACR) manages the container images for services includes Kubernetes. The template for setting up an Azure ACR that supports Wallaroo is the following:

az acr create -n $WALLAROO_CONTAINER_REGISTRY -g $WALLAROO_RESOURCE_GROUP --sku $WALLAROO_SKU_TYPE --location $WALLAROO_GROUP_LOCATION

Create an Azure Kubernetes Services

And now we can create our Kubernetes service in Azure that will host our Wallaroo that meet the prerequisites. Modify the settings to meet your organization’s needs. This creates a 4 node cluster with a total of 32 cores.

az aks create \
--resource-group $WALLAROO_RESOURCE_GROUP \
--name $WALLAROO_CLUSTER \
--node-count $WALLAROO_CLUSTER_SIZE \
--generate-ssh-keys \
--vm-set-type VirtualMachineScaleSets \
--load-balancer-sku standard \
--node-vm-size $WALLAROO_VM_SIZE \
--nodepool-name $WALLAROO_NODEPOOL \
--nodepool-name mainpool \
--attach-acr $WALLAROO_CONTAINER_REGISTRY \
--kubernetes-version=1.23.8 \
--zones 1 \
--location $WALLAROO_GROUP_LOCATION

Download Wallaroo Kubernetes Configuration

Once the Kubernetes environment is complete, associate it with the local Kubernetes configuration by importing the credentials through the following template command:

az aks get-credentials --resource-group $WALLAROO_RESOURCE_GROUP --name $WALLAROO_CLUSTER

Verify the cluster is available through the kubectl get nodes command.

kubectl get nodes

NAME                               STATUS   ROLES   AGE   VERSION
aks-mainpool-37402055-vmss000000   Ready    agent   81m   v1.23.8
aks-mainpool-37402055-vmss000001   Ready    agent   81m   v1.23.8
aks-mainpool-37402055-vmss000002   Ready    agent   81m   v1.23.8
aks-mainpool-37402055-vmss000003   Ready    agent   81m   v1.23.8

Setup GCP Environment for Wallaroo

The following instructions are made to assist users set up their Google Cloud Platform (GCP) Kubernetes environment for running Wallaroo. These represent a recommended setup, but can be modified to fit your specific needs. In particular, these instructions will provision a GKE cluster with 32 CPUs in total. Please ensure that your project’s resource limits support that.

Quick Setup Guide: Download a bash script to automatically set up the GCP environment through the Google Cloud Platform command line interface gcloud.
Manual Setup Guide: A list of the gcloud commands used to create the environment through manual commands.

The following video demonstrates the manual guide:

GCP Prerequisites

Organizations that wish to run Wallaroo in their Google Cloud Platform environment must complete the following prerequisites:

Register a Google Cloud Account: https://cloud.google.com/
Create a Google Cloud project: https://cloud.google.com/resource-manager/docs/creating-managing-projects
Install gcloud and run gcloud init or gcloud init –console on the local system used to set up your environment: https://cloud.google.com/sdk/docs/install
Enable the Google Compute Engine(GCE): https://cloud.google.com/endpoints/docs/openapi/enable-api
Enable the Google Kubernetes Engine(GKE) on your project: https://console.cloud.google.com/apis/enableflow?apiid=container.googleapis.com
Select a default Computer Engine region and zone: https://cloud.google.com/compute/docs/regions-zones.

GCP Cluster Recommendations

The following recommendations will assist in reducing the cost of a cloud based Kubernetes Wallaroo cluster.

Turn off the cluster when not in use. A GCP Google Kubernetes Engine (GKE) cluster can be turn off when not in use, then turned back on again when needed. If organizations adopt this process, be aware of the following issues:
- IP Address Reassignment: The load balancer public IP address may be reassigned when the cluster is restarted by the cloud service unless a static IP address is assigned. For more information in Google Cloud Platform see the Configuring domain names with static IP addresses user guide.
Assign to a Single Availability Zone: Clusters that span multiple availability zones may have issues accessing persistent volumes that were provisioned in another availability zone from the node when the cluster is restarted. The simple solution is to assign the entire cluster into a single availability zone. For more information in Google Cloud Platform see the Regions and zones guide.
The scripts and configuration files are set up to create the GCP environment for a Wallaroo instance are based on a single availability zone. Modify the script as required for your organization.

Standard Setup Variables

The following variables are used in the Quick Setup Script and the Manual Setup Guide. Modify them as best fits your organization.

Variable Name	Default Value	Description
WALLAROO_GCP_PROJECT	wallaroo-ce	The name of the Google Project used for the Wallaroo instance.
WALLAROO_CLUSTER	wallaroo-ce	The name of the Kubernetes cluster for the Wallaroo instance.
WALLAROO_GCP_REGION	us-central1	The region the Kubernetes environment is installed to. Update this to your GCP Computer Engine region.
WALLAROO_NODE_LOCATION	us-central1-f	The location the Kubernetes nodes are installed to. Update this to your GCP Compute Engine Zone.
WALLAROO_GCP_NETWORK_NAME	wallaroo-network	The Google network used with the Kubernetes environment.
WALLAROO_GCP_SUBNETWORK_NAME	wallaroo-subnet-1	The Google network subnet used with the Kubernetes environment.
WALLAROO_GCP_MACHINE_TYPE	e2-standard-8	Recommended VM size per GCP node.
WALLAROO_CLUSTER_SIZE	4	Number of nodes installed into the cluster. 4 nodes will create a 32 core cluster.

Quick Setup Script

A sample script is available here, and creates a Google Kubernetes Engine cluster ready for use with Wallaroo Community. This script requires the prerequisites listed above, and uses the variables as listed in Standard Setup Variables.

The following script is available for download: wallaroo_community_gcp_install.bash

The following steps are geared towards a standard Linux or macOS system that supports the prerequisites listed above. Modify these steps based on your local environment.

Download the script above.
In a terminal window set the script status as execute with the command chmod +x bash wallaroo_community_gcp_install.bash.
Modify the script variables listed above based on your requirements.
Run the script with either bash wallaroo_community_gcp_install.bash or ./wallaroo_community_gcp_install.bash from the same directory as the script.

Manual Setup Guide

The following steps are guidelines to assist new users in setting up their GCP environment for Wallaroo. Feel free to replace these with commands with ones that match your needs.

See the Google Cloud SDK for full details on commands and settings.

The commands below are set to meet the prerequisites listed above, and uses the variables as listed in Standard Setup Variables. Modify them as best fits your organization’s needs.

Set Variables

The following are the variables used in the environment setup process. Modify them as best fits your organization’s needs.

WALLAROO_GCP_PROJECT=wallaroo-ce
WALLAROO_CLUSTER=wallaroo-ce
WALLAROO_GCP_REGION=us-central1
WALLAROO_NODE_LOCATION=us-central1-f
WALLAROO_GCP_NETWORK_NAME=wallaroo-network
WALLAROO_GCP_SUBNETWORK_NAME=wallaroo-subnet-1
WALLAROO_GCP_MACHINE_TYPE=e2-standard-8
WALLAROO_CLUSTER_SIZE=4

Create a GCP Network

First create a GCP network that is used to connect to the cluster with the gcloud compute networks create command. For more information, see the gcloud compute networks create page.

gcloud compute networks \
create $WALLAROO_GCP_NETWORK_NAME \
--bgp-routing-mode regional \
--subnet-mode custom

Verify it’s creation by listing the GCP networks:

gcloud compute networks list

Create the GCP Wallaroo Cluster

Once the network is created, the gcloud container clusters create command is used to create a cluster. For more information see the gcloud container clusters create page.

Note that three nodes are created by default, so one more is added with the --num-nodes setting to meet the Wallaroo prerequisites. For Google GKE, containerd is enabled by default and so does not need to be specified during the setup procedure: (https://cloud.google.com/kubernetes-engine/docs/concepts/using-containerd)[https://cloud.google.com/kubernetes-engine/docs/concepts/using-containerd].

gcloud container clusters \
create $WALLAROO_CLUSTER \
--region $WALLAROO_GCP_REGION \
--node-locations $WALLAROO_NODE_LOCATION \
--machine-type $WALLAROO_GCP_MACHINE_TYPE \
--num-nodes $WALLAROO_CLUSTER_SIZE \
--network $WALLAROO_GCP_NETWORK_NAME \
--create-subnetwork name=$WALLAROO_GCP_SUBNETWORK_NAME \
--enable-ip-alias \
--cluster-version=1.23

The command can take several minutes to complete based on the size and complexity of the clusters. Verify the process is complete with the clusters list command:

gcloud container clusters list

Retrieving Kubernetes Credentials

Once the GCP cluster is complete, the Kubernetes credentials can be installed into the local administrative system with the the gcloud container clusters get-credentials (https://cloud.google.com/sdk/gcloud/reference/container/clusters/get-credentials) command:

gcloud container clusters \
get-credentials $WALLAROO_CLUSTER \
--region $WALLAROO_GCP_REGION

To verify the Kubernetes credentials for your cluster have been installed locally, use the kubectl get nodes command. This will display the nodes in the cluster as demonstrated below:

kubectl get nodes

NAME                                         STATUS   ROLES    AGE   VERSION
gke-wallaroo-ce-default-pool-863f02db-7xd4   Ready    <none>   39m   v1.21.6-gke.1503
gke-wallaroo-ce-default-pool-863f02db-8j2d   Ready    <none>   39m   v1.21.6-gke.1503
gke-wallaroo-ce-default-pool-863f02db-hn06   Ready    <none>   39m   v1.21.6-gke.1503
gke-wallaroo-ce-default-pool-3946eaca-4l3s   Ready    <none>   39m   v1.21.6-gke.1503

Troubleshooting
- What does the error ‘Insufficient project quota to satisfy request: resource “CPUS_ALL_REGIONS”’ mean?

Make sure that the Compute Engine Zone and Region are properly set based on your organization’s requirements. The instructions above default to us-central1, so change that zone to install your Wallaroo instance in the correct location.

In the case of the script, this would mean changing the region and location from:

WALLAROO_GCP_REGION=us-central1
WALLAROO_NODE_LOCATION=us-central1-f

WALLAROO_GCP_REGION={Your Region}
WALLAROO_NODE_LOCATION={Your Location}

Install Wallaroo Community

If the prerequisites are already configured, jump to Install Wallaroo to start installing.

This video demonstrates that procedure:

The procedure assumes at least a basic knowledge of Kubernetes and how to use the kubectl and kots version 1.91.3 applications.

The procedure involves the following major steps:

Prerequisites
Download License File: This is detailed above in the process Register Your Wallaroo Community Account above.
Install Wallaroo

Prerequisites

Local Software Requirements

Before starting, verify that all local system requirements are complete as detailed in the Wallaroo Community Local System Prerequisites guide:

kubectl: This interfaces with the Kubernetes server created in the Wallaroo environment.
For Kots based installs:
- kots Version 1.91.3

Cloud Kubernetes environment has been prepared.
You have downloaded your Wallaroo Community License file.

Install Wallaroo

The environment is ready, the tools are installed - let’s install Wallaroo! The following will use kubectl and kots through the following procedure:

Install the Wallaroo Community Edition using kots install wallaroo/ce, specifying the namespace to install. For example, if wallaroo is the namespace, then the command is:
```
kubectl kots install wallaroo/ce --namespace wallaroo
```

• Deploying Admin Console
• Creating namespace ✓
• Waiting for datastore to be ready ✓
    Enter a new password to be used for the Admin Console: •••••••••••••
  • Waiting for Admin Console to be ready ✓

• Press Ctrl+C to exit
• Go to http://localhost:8800 to access the Admin Console

Wallaroo Community edition will continue to run until terminated. To relaunch in the future, use the following command:

kubectl-kots admin-console --namespace wallaroo

Initial Configuration and License Upload Procedure

Once Wallaroo Community edition has been installed for the first time, we can perform initial configuration and load our Wallaroo Community license file through the following process:

If Wallaroo Community Edition has not started, launch it with the following command:

❯ kubectl-kots admin-console --namespace wallaroo
  • Press Ctrl+C to exit
  • Go to http://localhost:8800 to access the Admin Console

Enter the Wallaroo Community Admin Console address into a browser. You will be prompted for the default password as set in the step above. Enter it and select Log in.
Upload your license file.
The Configure Wallaroo Community page will be displayed which allows you to customize your Wallaroo environment. For now, scroll to the bottom and select Continue. These settings can be customized at a later date.
The Wallaroo Community Admin Console will run the preflight checks to verify that all of the minimum requirements are not met. This may take a few minutes. If there are any issues, Wallaroo can still be launched but may not function properly. When ready, select Continue.
The Wallaroo Community Dashboard will be displayed. There may be additional background processes that are completing their setup procedures, so there may be a few minute wait until those are complete. If everything is ready, then the Wallaroo Dashboard will show a green Ready.
Under the license information is the DNS entry for your Wallaroo instance. This is where you and other users of your Wallaroo instance can log in. In this example, the URL will be https://beautiful-horse-9537.wallaroo.community. Note that it may take a few minutes for the DNS entries to propagate and this URL to be available.
You will receive an email invitation for the email address connected to this URL with a temporary password and a link to this Wallaroo instance’s URL. Either enter the URL for your Wallaroo instance or use the link in the email.
To login to your new Wallaroo instance, enter the email address and temporary password associated with the license.

With that, Wallaroo Community edition is launched and ready for use! You can end the Admin Console from your terminal session above. From this point on you can just use the Wallaroo instance URL.

Now that your Wallaroo Community edition has been installed, let’s work with some sample models to show off what you can do. Check out either the Wallaroo 101 if this is your first time using Wallaroo, or for more examples of how to use Wallaroo see the Wallaroo Tutorials.

Troubleshooting

Issue

If you see an error similar to failed to deploy admin console: failed to wait for {some node}: timeout waiting for {some node}, it may be because the connection between your local system and the cloud service is slow, or related issues.

If this occurs, adding the option --wait-duration 5m give enough time for nodes to finish starting.

1.4 - Installation Configurations

Guides for different install options for Wallaroo

The following guides demonstrate how to install Wallaroo with different options to best fit your organizations needs, and are meant to supplement the standard install guides.

1.4.1 - Wallaroo Enterprise Azure integration Overview

An overview of the Wallaroo Enterprise for Azure Cloud

Wallaroo is proud to announce Wallaroo Enterprise for the Microsoft Azure Marketplace. This brings Wallaroo to even more organizations who want to use Wallaroo with their other Microsoft Azure services.

The following diagram displays the architecture for this service.

Users and application integrations connect to Jupyter Lab and the Wallaroo ML Ops APIs hosted in an AKS cluster.
Wallaroo cluster services are hosted in a Kubernetes namespace and manage deployments of ML models.
ML models are deployed in AKS to scale across as many VMs as needed to handle the load.
ML inference services are provided via a web API to allow integration with data storage systems or other services.

1.4.2 - Create GPU Nodepools for Kubernetes Clusters

How to create GPU nodepools for Kubernetes clusters.

Wallaroo provides support for ML models that use GPUs. The following templates demonstrate how to create a nodepool in different cloud providers, then assign that nodepool to an existing cluster. These steps can be used in conjunction with Wallaroo Enterprise Install Guides.

Note that deploying pipelines with GPU support is only available for Wallaroo Enterprise.

The following script creates a nodepool with NVidia Tesla K80 gpu using the Standard_NC6 machine type and autoscales from 0-3 nodes. Each node has one GPU in this example so the max .gpu() that can be requested by a pipeline step is 1.

For detailed steps on adding GPU to a cluster, see Microsoft Azure Use GPUs for compute-intensive workloads on Azure Kubernetes Service (AKS) guide.

Note that the labels are required as part of the Wallaroo pipeline deployment with GPU support

RESOURCE_GROUP="YOUR RESOURCE GROUP"
CLUSTER_NAME="YOUR CLUSTER NAME"
GPU_NODEPOOL_NAME="YOUR GPU NODEPOOL NAME"

az extension add --name aks-preview

az extension update --name aks-preview

az feature register --namespace "Microsoft.ContainerService" --name "GPUDedicatedVHDPreview"

az provider register -n Microsoft.ContainerService

az aks nodepool add \                
    --resource-group $RESOURCE_GROUP \
    --cluster-name $CLUSTER_NAME \
    --name $GPU_NODEPOOL_NAME \
    --node-count 0 \
    --node-vm-size Standard_NC6 \
    --node-taints sku=gpu:NoSchedule \
    --aks-custom-headers UseGPUDedicatedVHD=true \
    --enable-cluster-autoscaler \
    --min-count 0 \
    --max-count 3 \
    --labels doc-gpu-label=true

The following script creates a nodepool uses NVidia T4 GPUs and autoscales from 0-3 nodes. Each node has one GPU in this example so the max .gpu() that can be requested by a pipeline step is 1.

Google GKE automatically adds the following taint to the created nodepool.

NO_SCHEDULE nvidia.com/gpu present

Note that the labels are required as part of the Wallaroo pipeline deployment with GPU support

GCP_PROJECT="YOUR GCP PROJECT"
GCP_CLUSTER="YOUR CLUSTER NAME"
GPU_NODEPOOL_NAME="YOUR GPU NODEPOOL NAME"
REGION="YOUR REGION"

gcloud beta container \
    --project $GCP_PROJECT \
    node-pools create $GPU_NODEPOOL_NAME \
    --cluster $GCP_CLUSTER \
    --region $REGION \
    --node-version "1.25.8-gke.500" \
    --machine-type "n1-standard-1" \
    --accelerator "type=nvidia-tesla-t4,count=1" \
    --image-type "COS_CONTAINERD" \
    --disk-type "pd-balanced" \
    --disk-size "100" \
    --node-labels doc-gpu-label=true \
    --metadata disable-legacy-endpoints=true \
    --scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
    --num-nodes "3" \
    --enable-autoscaling \
    --min-nodes "0" \
    --max-nodes "3" \
    --location-policy "BALANCED" \
    --enable-autoupgrade \
    --enable-autorepair \
    --max-surge-upgrade 1 \
    --max-unavailable-upgrade 0

The following steps are used to create a AWS EKS Nodepool with GPU nodes.

Prerequisites: An existing AWS (Amazon Web Service) EKS (Elastic Kubernetes Service) cluster. See Wallaroo Enterprise Comprehensive Install Guide: Environment Setup Guides for a sample creation of an AWS EKS cluster for hosting a Wallaroo Enterprise instance.
eksctl: Command line tool for installating and updating EKS clusters.
Administrator access to the EKS cluster and capabilty of running kubectl commands.

Create the nodepool with the following configuration file. Note that the labels are required as part of the Wallaroo pipeline deployment with GPU support. The sample configuration file below uses the AWS instance type g5.2xlarge. Modify as required.
```
eksctl create nodegroup --config-file=<path>
```
Sample config file:

# aws-gpu-nodepool.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: YOUR CLUSTER NAME HERE # This must match the name of the existing cluster
  region: YOUR REGION HERE

managedNodeGroups:
- name: YOUR NODEPOOL NAME HERE
  instanceType: g5.2xlarge
  minSize: 1
  maxSize: 3
  labels:
    wallaroo.ai/gpu: "true"
    doc-gpu-label: "true"
  taints:
    - key: wallaroo.ai/engine
      value: "true"
      effect: NoSchedule
  tags:
    k8s.io/cluster-autoscaler/node-template/label/k8s.dask.org/node-purpose: engine
    k8s.io/cluster-autoscaler/node-template/taint/k8s.dask.org/dedicated: "true:NoSchedule"
  iam:
    withAddonPolicies:
      autoScaler: true
  containerRuntime: containerd
  amiFamily: AmazonLinux2
  availabilityZones:
    - INSERT YOUR ZONE HERE
  volumeSize: 100

References:
- eksctl Managing nodegroups

1.4.3 - Install Wallaroo with Minimum Services

How to install Wallaroo with disabled services for lower core environments

The following system requirements are required for the minimum settings for running Wallaroo in a Kubernetes cloud cluster.

Minimum number of nodes: 4
Minimum Number of CPU Cores: 8
Minimum RAM per node: 16 GB
Minimum Storage: A total of 625 GB of storage will be allocated for the entire cluster based on 5 users with up to four pipelines with five steps per pipeline, with 50 GB allocated per node, including 50 GB specifically for the Jupyter Hub service. Enterprise users who deploy additional pipelines will require an additional 50 GB of storage per lab node deployed.

Wallaroo recommends at least 16 cores total to enable all services. At less than 16 cores, services will have to be disabled to allow basic functionality as detailed in this table.


Cluster Size		8 core	16 core	32 core	Description
Inference		✔	✔	✔	The Wallaroo inference engine that performs inference requests from deployed pipelines.
Dashboard		✔	✔	✔	The graphics user interface for configuring workspaces, deploying pipelines, tracking metrics, and other uses.
Jupyter HUB/Lab					The JupyterHub service for running Python scripts, JupyterNotebooks, and other related tasks within the Wallaroo instance.
	Single Lab	✔	✔	✔
	Multiple Labs	✘	✔	✔
Prometheus		✔	✔	✔	Used for collecting and reporting on metrics. Typical metrics are values such as CPU utilization and memory usage.
	Alerting	✘	✔	✔
	Model Validation	✘	✔	✔
	Dashboard Graphs	✔	✔	✔
Plateau		✘	✔	✔	A Wallaroo developed service for storing inference logs at high speed. This is not a long term service; organizations are encouraged to store logs in long term solutions if required.
	Model Insights	✘	✔	✔
Python API
	Model Conversion	✔	✔	✔	Converts models into a native runtime for use with the Wallaroo inference engine.

Wallaroo Installation with less than 16 Cores

To install Wallaroo with less than 16 cores and 8 cores or greater, the following services must be disabled:

Model Conversion
Model Insights
Plateau

The following configuration settings can be used at the installation procedure to disable these services.

Download

A sample file wallaroo-install-8-cores.yaml is available from the following link:

wallaroo-install-8-cores.yaml

apiVersion: kots.io/v1beta1
kind: ConfigValues
metadata:
  name: wallaroo
spec:
  values:
    dashboard_enabled:
      value: "1"
    enable_model_insights:
      value: "0"
    model_conversion_enabled:
      value: "1"
    plateau_enabled:
      value: "0"

The configuration file can be applied via the --config-values={CONFIG YAML FILE} option. For example:

kubectl kots install "wallaroo/ce" \
-n wallaroo \ 
--config-values=wallaroo-install-8-cores.yaml

1.4.4 - Install Wallaroo to Specific Nodes

How to install Wallaroo to specific nodes

Organizations that share their Kubernetes environment with other applications may want to install Wallaroo to specific nodes in their cluster. The following guide demonstrates how to install a Wallaroo instance into specific nodes in the Kubernetes cluster.

This example uses Wallaroo Community as the example. For other guides for installing a Wallaroo instance, see the Wallaroo Community Setup Guides and the Wallaroo Enterprise Setup Guides.

Users who are familiar with Kubernetes clusters can skip ahead directly to the Install Steps.

Description

When installed into a Kubernetes cluster, Wallaroo will use available nodes to maximize its performance. Some organizations may use specific nodepools or nodes for specific applications.

One option is to use Kubernetes metadata to assign Node labels to nodes, then specify that Wallaroo can be installed into specific nodes that match that label. This can be done with specifying configuration options during the install process using the kots option --config-values={CONFIG YAML FILE}. For more information, see the kots set config documentation.

Install Steps

In this example, an instance of Wallaroo Community will be installed into a Kubernetes cluster that has four nodes assigned with the label wallaroo.ai/node=true;

kubectl get nodes
NAME                                 STATUS   ROLES   AGE   VERSION
aks-mainpool-18670167-vmss000000     Ready    agent   84m   v1.23.8
aks-wallarooai-12293194-vmss000000   Ready    agent   75m   v1.23.8
aks-wallarooai-12293194-vmss000001   Ready    agent   75m   v1.23.8
aks-wallarooai-12293194-vmss000002   Ready    agent   75m   v1.23.8
aks-wallarooai-12293194-vmss000003   Ready    agent   75m   v1.23.8

kubectl get nodes -l wallaroo.ai/node=true
NAME                                 STATUS   ROLES   AGE   VERSION
aks-wallarooai-12293194-vmss000000   Ready    agent   75m   v1.23.8
aks-wallarooai-12293194-vmss000001   Ready    agent   75m   v1.23.8
aks-wallarooai-12293194-vmss000002   Ready    agent   75m   v1.23.8
aks-wallarooai-12293194-vmss000003   Ready    agent   75m   v1.23.8

Create a kots configuration file and specify the label to use for installing nodes. For this example, we will use wallaroo.ai/node: "true" as the label. Any nodes with that label will be used by Wallaroo during the installation. For this example, this configuration is saved to the file test-node.yaml.
```
apiVersion: kots.io/v1beta1
kind: ConfigValues
metadata:
    creationTimestamp: null
    name: wallaroo
spec:
    values:
        wallaroo_node_selector: 
            value: 'wallaroo.ai/node: "true"'
status: {}
```

During installation, specify the configuration file to be used with the --config-values option.

IMPORTANT NOTE

`kots` typically will not alert the user if the file passed in `--config-values` does not exist.  Verify the file name before proceeding.

kubectl kots install "wallaroo/ce" \
-n wallaroo \ 
--config-values=test-node.yaml

• Deploying Admin Console
• Creating namespace ✓
• Waiting for datastore to be ready ✓
  • Waiting for Admin Console to be ready ✓
• Press Ctrl+C to exit
• Go to http://localhost:8800 to access the Admin Console

Proceed with the installation as normal, including uploading the required license file, etc.

Once complete, verify Wallaroo was installed to specific nodes with the kubectl get pods command. The following shows the pods in the wallaroo namespace where the Wallaroo Community instance was installed, and the pods used for the deployed pipeline ccfraudpipeline.

kubectl get pods --all-namespaces -o=custom-columns=NAME:.metadata.name,Namespace:.metadata.namespace,Node:.spec.nodeName

NAME                                       Namespace           Node
engine-6469d85b5c-5pz75                    ccfraudpipeline-1   aks-wallarooai-12293194-vmss000003
engine-lb-db4f647fb-m9bkl                  ccfraudpipeline-1   aks-wallarooai-12293194-vmss000001
helm-runner-xz4vn                          ccfraudpipeline-1   aks-wallarooai-12293194-vmss000003
azure-ip-masq-agent-26cnd                  kube-system         aks-wallarooai-12293194-vmss000002
azure-ip-masq-agent-745hs                  kube-system         aks-wallarooai-12293194-vmss000000
azure-ip-masq-agent-f2nl2                  kube-system         aks-mainpool-18670167-vmss000000
azure-ip-masq-agent-hjxbr                  kube-system         aks-wallarooai-12293194-vmss000003
azure-ip-masq-agent-nktlq                  kube-system         aks-wallarooai-12293194-vmss000001
cloud-node-manager-6twk7                   kube-system         aks-mainpool-18670167-vmss000000
cloud-node-manager-g2bql                   kube-system         aks-wallarooai-12293194-vmss000003
cloud-node-manager-j4xdq                   kube-system         aks-wallarooai-12293194-vmss000001
cloud-node-manager-q6b2k                   kube-system         aks-wallarooai-12293194-vmss000000
cloud-node-manager-rsrsg                   kube-system         aks-wallarooai-12293194-vmss000002
coredns-autoscaler-7d56cd888-t28v5         kube-system         aks-mainpool-18670167-vmss000000
coredns-dc97c5f55-8v7lh                    kube-system         aks-mainpool-18670167-vmss000000
coredns-dc97c5f55-p2dc2                    kube-system         aks-mainpool-18670167-vmss000000
csi-azuredisk-node-5hlxc                   kube-system         aks-mainpool-18670167-vmss000000
csi-azuredisk-node-6bp8l                   kube-system         aks-wallarooai-12293194-vmss000003
csi-azuredisk-node-mthtd                   kube-system         aks-wallarooai-12293194-vmss000000
csi-azuredisk-node-p6w8w                   kube-system         aks-wallarooai-12293194-vmss000002
csi-azuredisk-node-sqznw                   kube-system         aks-wallarooai-12293194-vmss000001
csi-azurefile-node-7kw5p                   kube-system         aks-wallarooai-12293194-vmss000002
csi-azurefile-node-9zb6l                   kube-system         aks-wallarooai-12293194-vmss000001
csi-azurefile-node-grs6g                   kube-system         aks-wallarooai-12293194-vmss000000
csi-azurefile-node-z84nz                   kube-system         aks-mainpool-18670167-vmss000000
csi-azurefile-node-zzqdf                   kube-system         aks-wallarooai-12293194-vmss000003
konnectivity-agent-6c57d77bcd-5tvbh        kube-system         aks-mainpool-18670167-vmss000000
konnectivity-agent-6c57d77bcd-z5q48        kube-system         aks-mainpool-18670167-vmss000000
kube-proxy-4nz25                           kube-system         aks-wallarooai-12293194-vmss000000
kube-proxy-8fv76                           kube-system         aks-wallarooai-12293194-vmss000002
kube-proxy-c5nvs                           kube-system         aks-wallarooai-12293194-vmss000001
kube-proxy-lvlwc                           kube-system         aks-wallarooai-12293194-vmss000003
kube-proxy-vbvfr                           kube-system         aks-mainpool-18670167-vmss000000
metrics-server-64b66fbbc8-tvxpj            kube-system         aks-mainpool-18670167-vmss000000
api-lb-bbc98488d-24qxb                     wallaroo           aks-wallarooai-12293194-vmss000002
continuous-image-puller-8sfw9              wallaroo           aks-wallarooai-12293194-vmss000003
continuous-image-puller-bbt7c              wallaroo           aks-wallarooai-12293194-vmss000000
continuous-image-puller-ngr75              wallaroo           aks-wallarooai-12293194-vmss000002
continuous-image-puller-stxpq              wallaroo           aks-wallarooai-12293194-vmss000001
dashboard-677df986d9-8c5mz                 wallaroo           aks-wallarooai-12293194-vmss000000
deploymentmanager-69b4c6d449-j8jct         wallaroo           aks-wallarooai-12293194-vmss000002
graphql-api-9c664ddf-t7cnr                 wallaroo           aks-wallarooai-12293194-vmss000000
hub-668d49b7b4-jspqj                       wallaroo           aks-wallarooai-12293194-vmss000001
jupyter-john-2ehansarick-40wallaroo-2eai   wallaroo           aks-wallarooai-12293194-vmss000002
keycloak-85cf99c7bf-8vvb5                  wallaroo           aks-wallarooai-12293194-vmss000002
kotsadm-cbf8d8ccb-qgx2w                    wallaroo           aks-wallarooai-12293194-vmss000002
kotsadm-minio-0                            wallaroo           aks-wallarooai-12293194-vmss000002
kotsadm-postgres-0                         wallaroo           aks-wallarooai-12293194-vmss000000
minio-68bc498d6d-xm5ht                     wallaroo           aks-wallarooai-12293194-vmss000000
model-insights-7dcccb976-ttz64             wallaroo           aks-wallarooai-12293194-vmss000000
plateau-5b777686dd-8c69s                   wallaroo           aks-wallarooai-12293194-vmss000000
postgres-6c5fff5c57-9hr4n                  wallaroo           aks-wallarooai-12293194-vmss000002
prometheus-deployment-7dcb484c56-7jwq4     wallaroo           aks-wallarooai-12293194-vmss000000
proxy-755778dccd-mwhwz                     wallaroo           aks-wallarooai-12293194-vmss000002
python-api-787bcb7764-nvdb4                wallaroo           aks-wallarooai-12293194-vmss000002
rest-api-677dc6bdcf-q9b62                  wallaroo           aks-wallarooai-12293194-vmss000002
wallaroo-fluent-bit-standard-h7mrl         wallaroo           aks-wallarooai-12293194-vmss000002
wallaroo-fluent-bit-standard-jss2d         wallaroo           aks-wallarooai-12293194-vmss000000
wallaroo-fluent-bit-standard-l75cj         wallaroo           aks-wallarooai-12293194-vmss000001
wallaroo-fluent-bit-standard-m55tk         wallaroo           aks-wallarooai-12293194-vmss000003
wallaroo-telemetry-27687782-g6mhm          wallaroo           aks-wallarooai-12293194-vmss000001
wallaroo-telemetry-27687783-xgqpm          wallaroo           aks-wallarooai-12293194-vmss000001
wallaroo-telemetry-27687784-9b85g          wallaroo           aks-wallarooai-12293194-vmss000001

For other instructions on how to deploy or configure a Wallaroo instance, see the Wallaroo Operations Guides.

1.4.5 - Taints and Tolerations Guide

Configure custom taints and toleration for a cluster for Wallaroo

Organizations can customize the taints and tolerances for their Kubernetes cluster running Wallaroo. Nodes in a Kubernetes cluster can have a taint applied to them. Any pod that does not have a toleration matching the taint can be rejected and will not be applied to that node.

This allows organizations to determine which pods can be accepted or rejected into specific nodes, reserving their Kubernetes resources for other services. Combined with the Install Wallaroo to Specific Nodes guide this ensures that Wallaroo pods are contained to specific cluster nodes, and prevents non-Wallaroo pods from being scheduled into the same nodes to reserve those resources for the Wallaroo instance.

In this example, the node Postgres has the taint wallaroo.ai/postgres=true:NoSchedule. The pod postgres has the tolerance wallaroo.ai/postgres:NoSchedule op=Exists, so it is scheduled into the node Postgres. The pod nginx has no tolerations, so it is not scheduled into the node Postgres.

Node: Postgres Taints:wallaroo.ai/postgres=true:NoSchedule	Scheduled
Postgres Tolerations: wallaroo.ai/postgres:NoSchedule op=Exists	√
nginx Tolerations: None	🚫

See the Kubernetes Taints and Tolerations documentation for more information.

Setting Tolerations and Taints

The Wallaroo Enterprise Install Guides specify default taints applied to nodepools. These can be used to contain pod scheduling only to specific nodes where the pod tolerations match the nodes taints. By default, the following nodepools and their associated taints are created

After Wallaroo release September 2022 (Codename Cobra):

Nodepool	Taints
postgres	`wallaroo.ai/postgres=true:NoSchedule`
enginelb	`wallaroo.ai/enginelb=true:NoSchedule`
engine	`wallaroo.ai/engine=true:NoSchedule`
mainpool	N/A

Before Wallaroo release September 2022 (Code name Mustang and before)

Nodepool	Taints
postgres	`wallaroo-postgres=true:NoSchedule`
enginelb	`wallaroo-enginelb=true:NoSchedule`
engine	`wallaroo-engine=true:NoSchedule`
mainpool	N/A

The nodepool mainpool is not assigned any taints to allow other Kubernetes services to run as part of the cluster.

The taint wallaroo.ai/reserved=true:NoSchedule can be applied to other nodepools. This allows additional Wallaroo resources to be scheduled in those nodes while rejecting other pods that do not have a matching toleration.

Default Tolerations

By default, the following tolerations are applied for Wallaroo pods. Organizations can add a corresponding Any pod that does not contain a taint to match these tolerances will have the condition effect:NoSchedule for the specified node.

Toleration key for all Wallaroo pods
- wallaroo.ai/reserved
Engine toleration key
- wallaroo.ai/engine
Engine LB toleration key
- wallaroo.ai/enginelb
Postgres toleration key
- wallaroo.ai/postgres

Note that these taint values are applied to the nodepools as part of the Wallaroo Enterprise Setup guides. They are not typically set up or required for Wallaroo Community instances.

Custom Tolerations

To customize the tolerations applied to Wallaroo nodes, the following prerequisites must be met:

Access to the Kubernetes environment running the Wallaroo instances.
Have kubectl and kots installed and connected to the Kubernetes environment.

For full details on installing Wallaroo and the prerequisite software, see the Wallaroo Prerequisites Guide.

Access the Wallaroo Administrative Dashboard.
1. From a terminal with kubectl and kots installed and connected to the Kubernetes environment, run:
```
kubectl kots admin-console --namespace wallaroo
```
  This will provide access to the Wallaroo Administrative Dashboard through http://localhost:8800:
```
  • Press Ctrl+C to exit
  • Go to http://localhost:8800 to access the Admin Console
```
2. Launch a browser and connect to http://localhost:8800.
3. Enter the password created during the Wallaroo Install process. The Wallaroo Administrative Dashboard will now be available.
From the Wallaroo Administrative Dashboard, select Config -> Taints and Tolerations.
Set the custom tolerations as required by your organization. The following nodes and tolerations can be changed:

Toleration key for all Wallaroo pods
- Default value: wallaroo.ai/reserved
Engine toleration key
- Default value: wallaroo.ai/engine
Engine LB toleration key
- Default value: wallaroo.ai/enginelb
Postgres toleration key
Default value: wallaroo.ai/postgres

1.5 - Installation Troubleshooting Guide

Troubleshooting

I’m Getting a Timeout Error

Depending on the connection and resources, the installation process may time out. If that occurs, use the --wait-duration flag to provide additional time. The time must be provided in Go duration format (for example: 60s, 1m, etc). The following example extends the wait duration to 10 minutes:

kubectl kots install wallaroo/ea -n wallaroo --license-file myfile.yaml --shared-password wallaroo --wait-duration 600s

Preflight Checks are Failing at the Command Line

If your system does not meet all of the preflight requirements, the installation process may fail when performing an automated installation. It is highly recommended to install Wallaroo on a system that meets all requirements or else performance will be degraded.

Before continuing, use the following command and note down any and all pre-flight checks that are listed as a failure. The license will be installed in later steps through the browser.

install wallaroo/ea -n wallaroo

To ignore preflight checks, use the --skip-preflights flag, as in the following example (Note: This is not recommended, only provided as an example.):

kubectl kots install wallaroo/ea -n wallaroo --license-file myfile.yaml --shared-password wallaroo --skip-preflights

When Launching JupyterHb, I get a Server 500 error

If you shut down and restart a Wallaroo instance in a new environment or change the IP address, some settings may not be updated. Run the following command to restart the deployment process and update the settings to match the current environment. Note that the namespace wallaroo is used - modify this to match the environment where Wallaroo is installed.

kubectl rollout restart deployment hub -n wallaroo

How do I Send Logs and Configurations to Wallaroo?

See the Wallaroo Support Bundle Generation Guide for instructions on how to create a support bundle used to troubleshoot installation and configuration issues.

1.6 - How to Uninstall Wallaroo

How to Uninstall Wallaroo

If the install procedure for Wallaroo goes awry, one option is to uninstall the incomplete Wallaroo installation and start again. The following procedure will remove Wallaroo from a Kubernetes cluster.

WARNING

This procedure will delete all Wallaroo data from the Kubernetes environment. Make sure that all data is backed up before proceeding with the uninstall process.

Remove all Kubernetes namespaces that correlate to a Wallaroo pipeline with the kubectl delete namespaces {list of namespaces}command except the following : default, kube* (any namespaces with kube before it), and wallaroo. wallaroo will be removed in the next step.

For example, in the following environment model1 and model2 would be deleted with the following:

  -> kubectl get namespaces
    NAME			    STATUS        AGE
    default		        Active        7d4h
    kube-node-lease	    Active		    7d4h
    kube-public		    Active		    7d4h
    model1               Active         4h23m
    model2               Active         4h23m
    wallaroo             Active         3d6h

    kubectl delete namespaces model1 model2

Use the following bash script or run the commands individually. Warning: If the selector is incorrect or missing from the kubectl command, the cluster could be damaged beyond repair.

#!/bin/bash
kubectl delete ns wallaroo && kubectl delete all,secret,configmap,clusterroles,clusterrolebindings,storageclass,crd --selector app.kubernetes.io/part-of=wallaroo --selector kots.io/app-slug=wallaroo

Once complete, the kubectl get namespaces will return only the default namespaces:

❯ kubectl get namespaces
NAME              STATUS   AGE
default           Active   3h47m
kube-node-lease   Active   3h47m
kube-public       Active   3h47m
kube-system       Active   3h47m

Wallaroo can now be reinstalled into this environment.

2 - Wallaroo General Guide

How to perform the most common tasks in Wallaroo

The Wallaroo General Guide details how to perform the most common tasks in Wallaroo.

Install Wallaroo

For devops or system administrators who will be installing Wallaroo, see the following guides:

Wallaroo Install Guides: How to prepare your environment for Wallaroo and install Wallaroo into your Kubernetes cloud environment, along with tips on other local software requirements.

Before logging into to Wallaroo for the first time, you’ll have received an email inviting you to your new Wallaroo instance and a temporary email to use.

To login to Wallaroo Community for the first time:

The Wallaroo Invitation email will contain:
1. Your temporary password.
2. A link to your Wallaroo instance.
Select the link to your Wallaroo instance.
Select Sign In.
Login using your email address and the temporary password from the Invitation email.
Once you have successfully authenticated, you will be prompted to create your own password.
When finished, you will be able to login to your Wallaroo instance in the future using your registered email address and new password.

Wallaroo Enterprise users are created either as local users, or through other authentication services such as GitHub or Google Cloud Platform. For more information, see the Wallaroo Enterprise User Management Guide. Local users will be provided a temporary password by their devops team.

To log into Wallaroo Enterprise for the first time as a local user:

Access the Wallaroo Dashboard provided by your devops team.
Select Sign In.
Login using your email address and the temporary password provided by your devops team.
Once you have successfully authenticated, you will be prompted to create your own password.
When finished, you will be able to login to your Wallaroo instance in the future using your registered email address and new password.

To login to your Wallaroo instance with an existing username and password:

From your browser, enter the URL for your Wallaroo instance. This is provided either during the installation process or when you are invited to a Wallaroo environment.
Select Sign In.
Enter your username and password.

Exploring the Wallaroo Dashboard

Once you have logged into your Wallaroo instance, you will be presented with the Wallaroo dashboard. From here, you can perform the following actions:

A Change Current Workspace and Workspace Management: Select the workspace to work in. Each workspace has it’s own Models and Pipelines. For more information, see Workspace Management.
B Pipeline Management: View information on this workspace’s pipelines. For more information, see Pipeline Management.
C Model Management: View information on this workspace’s models. For more information, see Model Management.
D User Management: Add users to your instance, or to your current workspace. For more information, see either User Management or Workspace Management.
E Access Jupyter Hub: Access the Jupyter Hub to run Jupyter Notebooks and shell access to your Wallaroo Instance. For more information, see either the Quick Start Guides for sample Jupyter Notebooks, data and models to learn how to use Wallaroo, or the Wallaroo Developer Guides for developers.
F View Collaborators: Displays a list of users who have been granted to this workspace.

How to Connect to Jupyter Hub

Jupyter Hub has been integrated as a service for Wallaroo. To access your Wallaroo instance’s Jupyter Hub service:

Login to your Wallaroo instance.
From the right navigation panel, select View Jupyter Hub.

A new tab will connect you to your Jupyter Hub service.

How to Log Out of Wallaroo

To log out of your Wallaroo instance:

Select the Wallaroo icon in the upper right hand corner.
Select Logout.

3 - Wallaroo User Management

How to manage new and existing users in your Wallaroo environment.

The following shows Wallaroo users in adding other participants to their Wallaroo environment. Some user management tasks are allocated to the Workspace Management, such as adding or removing users from a specific workspace.

This guide is split into two segments: Wallaroo Community Edition and Wallaroo Enterprise Edition. The critical differences in user management between the Wallaroo Community and Wallaroo Enterprise edition are:

Wallaroo Community allows up to 5 users to be active in a single Wallaroo instance, while Enterprise has no such restrictions.
Wallaroo Community users are administrated locally while Wallaroo Enterprise allows for other administrative services including GitHub, Google Cloud Platform, and other services.

3.1 - Wallaroo Community User Management

How to manage new and existing users in your Wallaroo Community environment.

Wallaroo Community User Management

How to Invite a User to a Wallaroo Instance

Note: Up to two users can work together in the same Wallaroo Community instance, while the Wallaroo Enterprise version has no user restrictions.

To invite another user to your Wallaroo instance:

Login to your Wallaroo instance.
Select Invite Users from the upper right hand corner of the Wallaroo Dashboard.
Under the Invite Users module, enter the email address for each user to invite.
When finished, select Send Invitations.

Each user will be sent a link to login to your Wallaroo instance. See the General Guide for more information on the initial login process.

3.2 - Wallaroo Enterprise User Management

How to manage new and existing users in your Wallaroo Enterprise environment.

Wallaroo Enterprise User Management

Wallaroo uses Keycloak for user authentication, authorization, and management. Enterprise customers can manage their users in Keycloak through its web-based UI, or programmatically through Keycloak’s REST API.

In enterprise deployments customers store their Wallaroo user accounts either directly in Keycloak or utilize its User Federation feature. Integration with external/public Identity Providers (such as popular social networks) is not expected at this time.

See the Keycloak User Guide for more details: https://www.keycloak.org/documentation.html

The Keycloak instance deployed in the wallaroo Kubernetes namespace comes pre-configured with a single administrator user in the Master realm. All users must be managed within that realm.

Accessing The Wallaroo Keycloak Dashboard

Enterprise customers may access their Wallaroo Keycloak dashboard by navigating to https://<prefix>.keycloak.<suffix>, depending on their choice domain prefix and suffix supplied during installation.

Obtaining Administrator Credentials

The standard Wallaroo installation creates the user admin by default and assigns them a randomly generated password. The admin user credentials are obtained which may be obtained directly from Kubernetes with the following commands, assuming the Wallaroo instance namespace is wallaroo.

Username


    kubectl -n wallaroo \
    get secret keycloak-admin-secret \
    -o go-template='{{.data.KEYCLOAK_ADMIN_USER | base64decode }}{{"\n"}}'

Password


    kubectl -n wallaroo \
    get secret keycloak-admin-secret \
    -o go-template='{{.data.KEYCLOAK_ADMIN_PASSWORD | base64decode }}{{"\n"}}'

Accessing the User Management Panel

In the Keycloak Administration Console, click Manage -> Users in the left-hand side menu. Click the View all users button to see existing users.

Adding Users

To add a user through the Keycloak interface:

Click the Add user button in the top-right corner.
Enter the following:
1. A unique username and email address.
2. Ensure that the Email Verified checkbox is checked - Wallaroo does not perform email verification.
3. Under Required User Actions, set Update Password so the user will update their password the next time they log in.
Click Save.
Once saved, select Credentials tab, then the Set Password section, enter the new user’s desired initial password in the Password and Password Confirmation fields.
Click Set Password. Confirm the action when prompted. This will force the user to set their own password when they log in to Wallaroo.

Managing Users Programmatically

It is possible to manage users through Keycloak’s Admin REST API. See https://www.keycloak.org/documentation.html for details.

Wallaroo simplifies this task with a small Python script, which can be utilized in a Jupyter notebook running in the wallaroo namespace through the following process:

Create a new Python file: In your JupyterHub workspace, create a new Python file named keycloak.py and populate it with the following:

Import the following libraries:

import json
import requests

class Keycloak:
    def __init__(self, host, port, admin_username, admin_password):
        self.host = host
        self.port = port
        self.admin_username = admin_username
        self.admin_password = admin_password

    def get_token(self):
        """Using a hardcoded admin password, obtain a session token from keycloak"""
        url = f"http://{self.host}:{self.port}/auth/realms/master/protocol/openid-connect/token"
        headers = {
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        }
        data = {
            "username": self.admin_username,
            "password": self.admin_password,
            "grant_type": "password",
            "client_id": "admin-cli",
        }
        resp = requests.post(url, headers=headers, data=data)
        assert resp.status_code == 200
        token = resp.json()["access_token"]
        assert len(token) > 800
        self.token = token

    def list_users(self):
        url = f"http://{self.host}:{self.port}/auth/admin/realms/master/users"
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"bearer {self.token}",
        }
        data={}
        resp = requests.get(url, headers=headers, data=data)
        return resp

    def create_user(self, username, password, email):
        """Create a keycloak test user. Returns ID."""
        url = f"http://{self.host}:{self.port}/auth/admin/realms/master/users"
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"bearer {self.token}",
        }
        payload = {
            "username": username,
            "enabled": "true",
            "emailVerified": "true",
            "email": email,
            "credentials": [
                {
                    "type": "password",
                    "value": password,
                    "temporary": "false",
                }
            ],
        }
        resp = requests.post(url, headers=headers, data=json.dumps(payload))
        assert resp.status_code == 201
        return resp.headers["Location"].split("/")[-1]

    def delete_user(self, userid):
        """Remove a keycloak user"""
        url = f"http://{self.host}:{self.port}/auth/admin/realms/master/users/{userid}"
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"bearer {self.token}",
        }
        resp = requests.delete(url, headers=headers)

Create a Keycloak admin client: In JupyterHub environment, create a new Jupyter notebook in the same directory as your keycloak.py file.
Import the new Python module and instantiate your Keycloak client, supplying your administrator user credentials (3rd and 4th arguments).
For more information on retrieving your KeyClock username and password, see Obtaining Administrator Credentials.
```
from keycloak import Keycloak
kc = Keycloak('keycloak', 8080, 'admin', 'admin')
```
Obtain an authentication token: Before invoking any methods, you must obtain a fresh authentication token by calling get_token() method. This will obtain a new token, which is valid for 60 seconds, and cache it in the client.
```
kc.get_token()
```

Listing existing users

To list existing users, use the Keycloak list_users method:

resp = kc.list_users()
resp.json()

Creating new users

To create a new user, use the Keycloak create_user() and supply their unique username, password, as well as a unique email address:

kc.create_user('testuser1', 'abc123', 'testuser1@example.com')

If successful, the return value will be the new user’s unique identifier generated by Keycloak.

3.3 - Wallaroo Enterprise User Management Troubleshooting

How to manage correct common user issues.

When a new user logs in for the first time, they get an error when uploading a model or issues when they attempt to log in. How do I correct that?

When a new registered user attempts to upload a model, they may see the following error:

TransportQueryError: 
{'extensions': 
    {'path': 
        '$.selectionSet.insert_workspace_one.args.object[0]', 'code': 'not-supported'
    }, 
    'message': 
        'cannot proceed to insert array relations since insert to table "workspace" affects zero rows'

Or if they log into the Wallaroo Dashboard, they may see a Page not found error.

This is caused when a user has been registered without an appropriate email address. See the user guides here on inviting a user, or the Wallaroo Enterprise User Management on how to log into the Keycloak service and update users. Verify that the username and email address are both the same, and they are valid confirmed email addresses for the user.

3.4 - Wallaroo Authentication Configuration Guides

Enable SSO authentication to Wallaroo.

Wallaroo supports Single Sign-On (SSO) authentication through multiple providers. The following guides demonstrate how to enable SSO for different services.

3.4.1 - Wallaroo SSO for Amazon Web Services

Enable SSO authentication to Wallaroo from AWS

Organizations can use Amazon Web Services (AWS) as an identity provider for single sign-on (SSO) logins for users with Wallaroo Enterprise.

IMPORTANT NOTE

These instructions are for Wallaroo Enterprise edition only.

To enable AWS as an authentication provider to a Wallaroo instance:

Create the Wallaroo AWS SAML Identity Provider
Create the AWS Credentials
Add the AWS Credentials to Wallaroo
Verify the Login

Prerequisites

Administrative access to the Wallaroo instance
Permissions in an AWS account to the IAM Identity Center. For more information, see AWS IAM Identity Center Prerequisites and considerations for specific environments

Create the Wallaroo AWS SAML Identity Provider

Using AWS as a single sign-on identity provider within Wallaroo requires access to the Wallaroo instance’s Keycloak service. This process will require both the IAM Identity Center and Wallaroo Keycloak service be available at the same time to copy information between the two. When starting this process, do not close the Wallaroo Keycloak browser window or the AWS IAM Identity Center without completing all of the steps until Verify the Login.

1. From the Wallaroo instance, login to the Keycloak service. This will commonly be $PREFIX.keycloak.$SUFFIX. For example, playful-wombat-5555.keycloak.wallaroo.example.
Select Administration Console.
From the left navigation panel, select Identity Providers.
Select Add provider and select SAML v2.0.
Enter the following:
1. Alias ID: This will be the internal ID of the identity provider. It also sets the Redirect URI used in later steps.
2. Display Name: The name displayed for users to use in authenticating.
Save the following information:
1. Redirect URI: This is determined by the Wallaroo DNS Prefix, Wallaroo DNS Suffix, and the Alias ID in the format $PREFIX.keycloak.$SUFFIX/auth/realms/master/broker/$ALIASID/endpoint. For example, playful-wombat-5555.keycloak.wallaroo.example/auth/realms/master/broker/aws/endpoint.
2. Service Provider Entry ID: This is in the format $PREFIX.keycloak.$SUFFIX/auth/realms/master. For example: playful-wombat-5555.keycloak.wallaroo.example/auth/realms/master.

Create the AWS Credentials

The next step is creating the AWS credentials, and requires access to the organization’s Amazon IAM Identity Center.

From the AWS console, select the IAM Identity Center.
From the IAM Identity Center Dashboard, select Applications then Add application.
Select Custom application->Add custom SAML 2.0 application, then select Next.
Enter the following:
1. Display name: AWS or something similar depending on your organization’s requirements.
2. Application metadata:
  1. Application ACS URL: Enter the Redirect URI from [Create the Wallaroo AWS SAML Identity Provider].(#create-the-wallaroo-aws-saml-identity-provider).
  2. Application SAML audience: Enter the Service Provider Entry ID from [Create the Wallaroo AWS SAML Identity Provider].
Select the IAM Identity Center SAML metadata file and copy the URL. Store this for the step [Add AWS Credentials to Wallaroo](#add-aws-credentials-to-wallaroo(#add-aws-credentials-to-wallaroo).
Select Submit.
From the new application, select Actions->Edit attribute mappings.
Enter the following:
1. Subject (default entry): Set to ${user:email}, with the Format emailAddress.
2. Select Add new attribute mapping and set it to email, mapped to ${user:email}, with the Format emailAddress.
Select Save Changes to complete mapping the attributes.
From the IAM Identity Center Dashboard, select Users. From here, add or select the users or groups that will have access to the Wallaroo instance then select Assign Users.

Add AWS Credentials to Wallaroo

Return to the Wallaroo Keycloak service and the new Identity Provider from Create the Wallaroo AWS SAML Identity Provider.

In Import External IDP Config->Import from URL, enter the IAM Identity Center SAML metadata file saved from Create the AWS Credentials in the field Service Provider Entity ID.
Select Import.
Once the AWS SAMl settings are imported, select Save to store the identity provider.

Once complete, log out of the Wallaroo instance and go back into the login screen. With the usual username and password screen should also be a AWS link at the bottom or whatever name was set for the identity provider. Select that link to login.

Login to the IAM Application created in Create the AWS Credentials. The first time a user logs in they will be required to add their first and last name. After this, logins will happen as long as the user is logged into the AWS IAM application without submitting any further information.

3.4.2 - Wallaroo SSO for Microsoft Azure

Enable SSO authentication to Wallaroo from Microsoft Azure

Organizations can use Microsoft Azure as an identity provider for single sign-on (SSO) logins for users with Wallaroo Enterprise.

IMPORTANT NOTE

These instructions are for Wallaroo Enterprise edition only.

To enable Microsoft Azure as an authentication provider to a Wallaroo Enterprise instance:

Create the Azure Credentials
Add Azure Credentials to Wallaroo
Verify the Login

Create the Azure Credentials

The first step is to create the Azure credentials in Microsoft Azure.

By the end, the following information must be saved for use in the step Add Azure Credentials to Wallaroo:

Application (client) ID
Client secret Value
OpenID Connect metadata document

Create the New App

Login into the Microsoft Azure account with an account with permissions to create application registrations.
Select App registrations from the Azure Services menu, or search for App Registrations from the search bar.
From the App registrations screen, select either an existing application, or select + New registration. This example will show creating a new registration.
From the Register an application screen, set the following:
1. Name: The name of the application.
2. Supported account types: To restrict only to accounts in the organization directory, select Accounts in this organizational directory only.
3. Redirect URI: Set the type to Web and the URI. The URI will be based on the Wallaroo instance and the name of the Keycloak Identity Provider set in the step Add Azure Credentials to Wallaroo. This will be a link back to the Keycloak endpoint URL in your Wallaroo instance in the format https://$PREFIX.keycloak.$SUFFIX/auth/realms/master/broker/$IDENTITYNAME/endpoint.
  For example, if the Wallaroo prefix is silky-lions-3657, the name of the Wallaroo Keycloak Identity Provider is azure, and the suffix is wallaroo.ai, then the Keycloak endpoint URL would be silky-lions-3657.keycloak.wallaroo.ai/auth/realms/master/broker/azure/endpoint. For more information see the DNS Integration Guide.
  Once complete, select Register.

Store the Application ID

From the Overview screen, store the following in a secure location:
1. Application (client) ID: This will be used in the Add Azure Credentials to Wallaroo step.
From the Overview screen, select Redirect URIs. Set the following:
1. Verify the Redirect URI matches the Wallaroo instance endpoint.
2. Under Implicit grant and hybrid flows, set the following:
  1. Access tokens: Enabled
  2. ID tokens: Enabled
From the Overview screen, from the left sidebar select API permissions. Select +Add a permission.
1. Select Microsoft Graph, then Delegated Permissions.
2. Set email, openid, profile to Enabled then select Add permissions.

Create Client Secret

From the Overview screen, select Add a certificate or secret.
Select Client secrets, then +New client secret.
1. Set the following, then select Add.
  1. Description: Set the description of the client secret.
  2. Expires: Set the expiration for the client secret. Defaults to 6 months from creation.
2. Store the following in a secure location:
  1. Client secret Value: This will be used in the Add Azure Credentials to Wallaroo step.

Store Metadata Document

From the left navigation panel, select Overview, then Endpoints.
1. Store the following in a secure location:
  1. OpenID Connect metadata document: This will be used in the Add Azure Credentials to Wallaroo step.

Add Azure Credentials to Wallaroo

With the Azure credentials saved from the Create the Azure Credentials step, they can now be added into the Wallaroo Keycloak service.

Login to the Wallaroo Keycloak service with a Wallaroo admin account from the URL in the format https://$PREFIX.keycloak.$SUFFIX.
For example, if the Wallaroo prefix is silky-lions-3657, the name of the Wallaroo Keycloak Identity Provider is azure, and the suffix is wallaroo.ai, then the Keycloak endpoint URL would be silky-lions-3657.keycloak.wallaroo.ai. For more information see the DNS Integration Guide.
Select Administration Console, then from the left navigation panel select Identity Providers.
From the right Add provider… drop down menu select OpenID Connect v1.0.
From the Add identity provider screen, add the following:
1. alias: The name of the the Identity Provider. IMPORTANT NOTE: This will determine the Redirect URI value that is used in the Create the Azure Credentials step. Verify that the Redirect URI in both steps are the same.
2. Display Name: The name that will be shown on the Wallaroo instance login screen.
3. Client Authentication: Set to Client secret sent as post.
4. Client Authentication: Set with the Application (client) ID created in the Create the Azure Credentials step.
5. Client Secret: Set with the Client secret Value created in the Create the Azure Credentials step.
6. Default Scopes: Set to openid email profile - one space between each word.
7. Scroll to the bottom of the page and in Import from URL, add the OpenID Connect metadata document created in the Create the Azure Credentials step. Select Import to set the Identity Provider settings.
Once complete, select Save to store the identity provider settings.

Once the Azure Identity Provider settings are complete, log out of the Keycloak service.

After completing Add Azure Credentials to Wallaroo, the login can be verified through the following steps. This process will need to be completed the first time a user logs into the Wallaroo instance after the Azure Identity Provider settings are added.

Go to the Wallaroo instance login page. The Azure Identity Provider will be displayed under the username and password request based on the Displey Name set in the Add Azure Credentials to Wallaroo step.
Select the Azure Identity Provider to login.
For the first login, grant permission to the application. You may be required to select which Microsoft Azure account is being used to authenticate.

Once complete, the new user will be added to the Wallaroo instance.

3.4.3 - Wallaroo SSO for Google Cloud Platform

Enable SSO authentication to Wallaroo from Google Cloud Platform (GCP)

Organizations can use Google Cloud Platform (GCP) as an identity provider for single sign-on (SSO) logins for users with Wallaroo Enterprise.

IMPORTANT NOTE

These instructions are for Wallaroo Enterprise edition only.

To enable Google Cloud Platform (GCP) as an authentication provider to a Wallaroo instance:

Create the GCP Credentials
Add GCP Credentials to Wallaroo
Verify the Login

Create GCP Credentials

To create the GCP credentials a Wallaroo instance uses to authenticate users:

Log into Google Cloud Platform (GCP) console.
From the left side menu, select APIs and Services -> Credentials.
Select + CREATE CREDENTIALS->Oauth client ID.
Set Application type to Web application.
Set the following options:
1. Name: The name for this OAuth Client ID.
2. Authorized redirect URIs: This will be a link back to the Keycloak endpoint URL in your Wallaroo instance in the format https://$PREFIX.keycloak.$SUFFIX/auth/realms/master/broker/google/endpoint.
  For example, if the Wallaroo prefix is silky-lions-3657 and the suffix is wallaroo.ai, then the Keycloak endpoint URL would be silky-lions-3657.keycloak.wallaroo.ai/auth/realms/master/broker/google/endpoint. For more information see the DNS Integration Guide.
When the Oauth client is created, the Client ID and the Client Secret will be displayed. Store these for the next steps.

Add GCP Credentials to Wallaroo

With the Client ID and Client Secret from Google, we can now add this to the Wallaroo instance Keycloak service.

IMPORTANT NOTE

Leaving the Hosted Domain value unset will allow any valid Google user to access the system. Set the Hosted Domain to restrict access to the desired Google domain such as wallaroo.ai. This must be a domain that is managed by Google. For more information, see the Keycloak Social Identity Providers documentation.

From the Wallaroo instance, login to the Keycloak service. This will commonly be $PREFIX.keycloak.$SUFFIX. For example, playful-wombat-5555.keycloak.wallaroo.ai.
Select Administration Console.
From the left navigation panel, select Identity Providers.
Select Add provider and select Google.
Enter the following:
1. Redirect URI: Verify this is the same endpoint defined in Create GCP Credentials.
2. Client ID: Use the Client id from Get GCP Credentials.
3. Client Secret: Use the Client secret from Get GCP Credentials.
4. Hosted Domain: The domain that the user’s will be logging in from. For example: wallaroo.ai.
5. Enabled: On
6. For the other settings, see the Keycloak Social Identity Providers documentation.

Once complete, log out of the Wallaroo instance and go back into the login screen. With the usual username and password screen should also be a google link at the bottom or whatever name was set for the identity provider.

Select it, then select which Google user account to use. As long the domain matches the one listed in Add Google Credentials to Keycloak, the login will succeed. The first time a user logs in through Google, Keycloak will create a new local user account based on the Google credentials.

Troubleshooting

I get the error “This app’s request is invalid”

Double check the Google credentials from Get GCP Credentials and verify that the Authorized redirect URIs matches the one in Keycloak. This can be verified from logging into Keycloak, selecting Identity Providers, selecting the Google identity provider and Redirect URI from the top line.

3.4.4 - Wallaroo SSO Configuration for Seamless Redirect

Instructions on updating the Wallaroo SSO configuration for a seamless redirect experience.

By default, when organizations add identity providers to Wallaroo users have to select which identity provider or at least provide their username and passwords to login through the default Keycloak service.

The following instructions show how to set an identity provider as the default and configure Wallaroo so users who are already authenticated through a identity provider can seamlessly login to their Wallaroo instance without having to select any other options.

This process has two major steps:

Set an Identity Provider as Default
Set Update Profile on First Login to Off

Prerequisites

These instructions assume that an identity provider has been created for the Wallaroo instance.

Set an Identity Provider as Default

To set a default identity provider for a Wallaroo instance for seamless access:

Access the Wallaroo Keycloak service through a browser as an administrator. The Keycloak service URL will be in the format $WALLAROOPREFIX.keycloak.$WALLAROOSUFFIX. For example, if the Wallaroo prefix is wallaroo and the suffix example.com, then the Keycloak service URL would be wallaroo.keycloak.example.com. See the DNS Integration Guide for more details on Wallaroo services with DNS.
Select Administration Console, then log in with an administrator account. See the Wallaroo User Management guides for more information.
From the left navigation panel, select Authentication.
For the Auth Type Identity Provider Redirector row, select Actions -> Config.
Enter the following:
1. Alias: The name for this configuration.
2. Default Identity Provider: The identity provider to use by default. A list is available from Configure->Identity Providers. For this example, it is google. Verify that the name matches the name of the existing Identity Provider.
Select Save.
Save the ID! Save the Identity Provider Redirectory generated by Keycloak. This step is important in disabling the seamless redirect.

This optional step prevents the Keycloak service from forcing the user to update an existing profile the first time they log in through a new identity provider. For more information, see the Keycloak Identity Broker First Login documentation.

To set the Identity Broker First Login to Off:

Access the Wallaroo Keycloak service through a browser as an administrator. The Keycloak service URL will be in the format $WALLAROOPREFIX.keycloak.$WALLAROOSUFFIX. For example, if the Wallaroo prefix is wallaroo and the suffix example.com, then the Keycloak service URL would be wallaroo.keycloak.example.com. See the DNS Integration Guide for more details on Wallaroo services with DNS.
Select Administration Console, then log in with an administrator account. See the Wallaroo User Management guides for more information.
From the left navigation panel, select Authentication.
From the top drop-down list, select First Broker Login, then for the row labeled Review Profile(review profile config), select Actions->Config.
Set Update Profile on First Login to Off.
Select Save.

Disable Automatic Redirects

Disable Through Keycloak UI

To disable automatic redirects through the Keycloak UI:

1. Access the Wallaroo Keycloak service through a browser as an administrator. The Keycloak service URL will be in the format $WALLAROOPREFIX.keycloak.$WALLAROOSUFFIX. For example, if the Wallaroo prefix is wallaroo and the suffix example.com, then the Keycloak service URL would be wallaroo.keycloak.example.com. See the DNS Integration Guide for more details on Wallaroo services with DNS.
Select Administration Console, then log in with an administrator account. See the Wallaroo User Management guides for more information.
From the left navigation panel, select Authentication.
For the Auth Type Identity Provider Redirector row, set the Requirement to Disabled.

Seamless redirect is now disabled. Users will be able to either enter their username/password, or select the identity provider to use.

Disable through Kubernetes

This process allows users to disable the seamless redirect through through the Kubernetes administrative node. This process requires the following:

The Identity Provider Redirector was saved from the step Set an Identity Provider as Default.
kubectl is installed on the node administrating the Kubernetes environment hosting the Wallaroo instance.
curl is installed.

These steps assume the Wallaroo instance was installed into the namespace wallaroo.

The following code will retrieve the Wallaroo Keycloak admin password,then makes a connection to the Wallaroo Keycloak service through curl, then delete the identity provider set as the Identity Provider Redirector.

The Keycloak service URL will be in the format $WALLAROOPREFIX.keycloak.$WALLAROOSUFFIX. For example, if the Wallaroo prefix is wallaroo and the suffix example.com, then the Keycloak service URL would be wallaroo.keycloak.example.com. See the DNS Integration Guide for more details on Wallaroo services with DNS.

The variable IDENTITYUUID is the Identity Provider Redirector UUID.

Replace WALLAROOPREFIX, WALLAROOSUFFIX and IDENTITYUUID with the appropriate values for your Wallaroo instance.

WALLAROOPREFIX="wallaroo"
WALLAROOSUFFIX="example.com"
IDENTITYUUID="1234"
KEYCLOAK_PASSWORD=$(kubectl -n wallaroo get secret keycloak-admin-secret -o go-template='{{.data.KEYCLOAK_ADMIN_PASSWORD | base64decode }}')
TOKEN=$(curl -s "https://$WALLAROOPREFIX.keycloak.$WALLAROOSUFFIX/auth/realms/master/protocol/openid-connect/token" -d "username=admin" -d "password=$KEYCLOAK_PASSWORD" -d 'grant_type=password' -d 'client_id=admin-cli' | jq -r .access_token)
curl -H "Authorization: Bearer $TOKEN" "https://$WALLAROOPREFIX.keycloak.$WALLAROOSUFFIX/auth/admin/realms/master/authentication/config/$IDENTITYUUID" -X DELETE

Seamless redirect is now disabled. Users will be able to either enter their username/password, or select the identity provider to use.

4 - Wallaroo Workspace Management Guide

How to manage your Wallaroo Workspaces

This following guide is created to help users manage their Wallaroo Community Edition (CE) workspaces. Workspaces are used to manage Machine Learning (ML) models and pipelines.

Workspace Naming Requirements

Workspace names map onto Kubernetes objects, and must be DNS compliant. Workspace names must be ASCII alpha-numeric characters or dash (-) only. . and _ are not allowed.

How to Set the Current Workspace

The Current Workspace shows the selected workspace and its pipelines and models that are contained in the workspace.

To set the current workspace in your Wallaroo session:

From the top left navigation panel, select the workspace. By default, this is My Workspace.
Select the workspace to set as the current workspace.

How to View All Workspaces

To view all workspaces:

From the top left navigation panel, select the Workspace icon (resembles an office desk with monitor).
Select View Workspaces from the bottom of the list.
A list of current workspaces and their owners will be displayed.

How to Create a New Workspace

Workspaces can be created either through the Wallaroo Dashboard or through the Wallaroo SDK.

NOTICE

Workspace names are not forced to be unique. You can have 50 workspaces all named my-amazing-workspace, which can cause confusion in determining which workspace to use.

It is recommended that organizations agree on a naming convention and select the workspace to use rather than creating a new one each time.

To create a new workspace from the Wallaroo interface:

From the top left navigation panel, select the Workspace icon (resembles an office desk with monitor).
Select View Workspaces from the bottom of the list.
Select the text box marked Workspace name and enter the name of the new workspace. It is recommended to make workspace names unique.
Select Create Workspace.

Manage Collaborators

Workspace collaborators are other users in your Wallaroo instance that can access the workspace either as a collaborator or as a co-owner.

How to Add a Workspace Collaborator

To add a collaborator to the workspace:

From the top left navigation panel, select the workspace. By default, this is My Workspace.
Select the workspace to set as the current workspace.
Select Invite Users from the Collaborators list.
Select from the users listed. Note that only users in your Wallaroo instance can be invited.
1. To add the user as a co-owner, select the checkbox “Add as Co-Owner?” next to their name.
Select Send Invitations.

Each invited collaborator will receive an email inviting them to use the workspace, and a link to the Wallaroo instance and the workspace in question.

How to Promote or Demote a Collaborator

To promote or demote a collaborator in a workspace:

From the top left navigation panel, select the workspace. By default, this is My Workspace.
Select the workspace to set as the current workspace.
From the Collaborators list, select the user ... to the right of the user to promote or demote.
1. If the user is a co-owner, select Demote to Collaborator.
2. If the user is a collaborator, select Promote to Owner.

How to Remove a Collaborator from a Workspace

To promote or demote a collaborator:

From the top left navigation panel, select the workspace. By default, this is My Workspace.
Select the workspace to set as the current workspace.
From the Collaborators list, select the user ... to the right of the user remove.
Select Remove from Workspace.
Confirm the removal of the user from the workspace.

5 - Wallaroo Model Management

How to manage your Wallaroo models

Models are the Machine Learning (ML) models that are uploaded to your Wallaroo workspace and used to solve problems based on data submitted to them in a pipeline.

Model Naming Requirements

Model names map onto Kubernetes objects, and must be DNS compliant. The strings for model names must be ASCII alpha-numeric characters or dash (-) only. . and _ are not allowed.

Supported Models and Libraries

Supported Models

The following frameworks are supported. Frameworks fall under either Native or Containerized runtimes in the Wallaroo engine. For more details, see the specific framework what runtime a specific model framework runs in.

Runtime Display	Model Runtime Space	Pipeline Configuration
`tensorflow`	Native	Native Runtime Configuration Methods
`onnx`	Native	Native Runtime Configuration Methods
`python`	Native	Native Runtime Configuration Methods
`mlflow`	Containerized	Containerized Runtime Deployment

Please note the following.

IMPORTANT NOTICE: FRAMEWORK VERSIONS

The supported frameworks include the specific version of the model framework supported by Wallaroo. It is highly recommended to verify that models uploaded to Wallaroo meet the library and version requirements to ensure proper functioning.

Wallaroo natively supports Open Neural Network Exchange (ONNX) models into the Wallaroo engine.

Parameter	Description
Web Site	https://onnx.ai/
Supported Libraries	See table below.
Framework	`Framework.ONNX` aka `onnx`
Runtime	Native aka `onnx`

The following ONNX versions models are supported:

Wallaroo Version	ONNX Version	ONNX IR Version	ONNX OPset Version	ONNX ML Opset Version
2023.2.1 (July 2023)	1.12.1	8	17	3
2023.2 (May 2023)	1.12.1	8	17	3
2023.1 (March 2023)	1.12.1	8	17	3
2022.4 (December 2022)	1.12.1	8	17	3
After April 2022 until release 2022.4 (December 2022)	1.10.*	7	15	2
Before April 2022	1.6.*	7	13	2

For the most recent release of Wallaroo 2023.2.1, the following native runtimes are supported:

If converting another ML Model to ONNX (PyTorch, XGBoost, etc) using the onnxconverter-common library, the supported DEFAULT_OPSET_NUMBER is 17.

Using different versions or settings outside of these specifications may result in inference issues and other unexpected behavior.

ONNX models always run in the native runtime space.

Data Schemas

ONNX models deployed to Wallaroo have the following data requirements.

Equal rows constraint: The number of input rows and output rows must match.
All inputs are tensors: The inputs are tensor arrays with the same shape.
Data Type Consistency: Data types within each tensor are of the same type.

Equal Rows Constraint

Inference performed through ONNX models are assumed to be in batch format, where each input row corresponds to an output row. This is reflected in the in fields returned for an inference. In the following example, each input row for an inference is related directly to the inference output.

df = pd.read_json('./data/cc_data_1k.df.json')
display(df.head())
result = ccfraud_pipeline.infer(df.head())

display(result)

INPUT

	tensor
0	[-1.0603297501, 2.3544967095000002, -3.5638788326, 5.1387348926, -1.2308457019, -0.7687824608, -3.5881228109, 1.8880837663, -3.2789674274, -3.9563254554, 4.0993439118, -5.6539176395, -0.8775733373, -9.131571192000001, -0.6093537873, -3.7480276773, -5.0309125017, -0.8748149526000001, 1.9870535692, 0.7005485718000001, 0.9204422758, -0.1041491809, 0.3229564351, -0.7418141657, 0.0384120159, 1.0993439146, 1.2603409756, -0.1466244739, -1.4463212439]
1	[-1.0603297501, 2.3544967095000002, -3.5638788326, 5.1387348926, -1.2308457019, -0.7687824608, -3.5881228109, 1.8880837663, -3.2789674274, -3.9563254554, 4.0993439118, -5.6539176395, -0.8775733373, -9.131571192000001, -0.6093537873, -3.7480276773, -5.0309125017, -0.8748149526000001, 1.9870535692, 0.7005485718000001, 0.9204422758, -0.1041491809, 0.3229564351, -0.7418141657, 0.0384120159, 1.0993439146, 1.2603409756, -0.1466244739, -1.4463212439]
2	[-1.0603297501, 2.3544967095000002, -3.5638788326, 5.1387348926, -1.2308457019, -0.7687824608, -3.5881228109, 1.8880837663, -3.2789674274, -3.9563254554, 4.0993439118, -5.6539176395, -0.8775733373, -9.131571192000001, -0.6093537873, -3.7480276773, -5.0309125017, -0.8748149526000001, 1.9870535692, 0.7005485718000001, 0.9204422758, -0.1041491809, 0.3229564351, -0.7418141657, 0.0384120159, 1.0993439146, 1.2603409756, -0.1466244739, -1.4463212439]
3	[-1.0603297501, 2.3544967095000002, -3.5638788326, 5.1387348926, -1.2308457019, -0.7687824608, -3.5881228109, 1.8880837663, -3.2789674274, -3.9563254554, 4.0993439118, -5.6539176395, -0.8775733373, -9.131571192000001, -0.6093537873, -3.7480276773, -5.0309125017, -0.8748149526000001, 1.9870535692, 0.7005485718000001, 0.9204422758, -0.1041491809, 0.3229564351, -0.7418141657, 0.0384120159, 1.0993439146, 1.2603409756, -0.1466244739, -1.4463212439]
4	[0.5817662108, 0.09788155100000001, 0.1546819424, 0.4754101949, -0.19788623060000002, -0.45043448540000003, 0.016654044700000002, -0.0256070551, 0.0920561602, -0.2783917153, 0.059329944100000004, -0.0196585416, -0.4225083157, -0.12175388770000001, 1.5473094894000001, 0.2391622864, 0.3553974881, -0.7685165301, -0.7000849355000001, -0.1190043285, -0.3450517133, -1.1065114108, 0.2523411195, 0.0209441826, 0.2199267436, 0.2540689265, -0.0450225094, 0.10867738980000001, 0.2547179311]

OUTPUT

	time	in.tensor	out.dense_1
0	2023-11-17 20:34:17.005	[-1.0603297501, 2.3544967095, -3.5638788326, 5.1387348926, -1.2308457019, -0.7687824608, -3.5881228109, 1.8880837663, -3.2789674274, -3.9563254554, 4.0993439118, -5.6539176395, -0.8775733373, -9.131571192, -0.6093537873, -3.7480276773, -5.0309125017, -0.8748149526, 1.9870535692, 0.7005485718, 0.9204422758, -0.1041491809, 0.3229564351, -0.7418141657, 0.0384120159, 1.0993439146, 1.2603409756, -0.1466244739, -1.4463212439]	[0.99300325]
1	2023-11-17 20:34:17.005	[-1.0603297501, 2.3544967095, -3.5638788326, 5.1387348926, -1.2308457019, -0.7687824608, -3.5881228109, 1.8880837663, -3.2789674274, -3.9563254554, 4.0993439118, -5.6539176395, -0.8775733373, -9.131571192, -0.6093537873, -3.7480276773, -5.0309125017, -0.8748149526, 1.9870535692, 0.7005485718, 0.9204422758, -0.1041491809, 0.3229564351, -0.7418141657, 0.0384120159, 1.0993439146, 1.2603409756, -0.1466244739, -1.4463212439]	[0.99300325]
2	2023-11-17 20:34:17.005	[-1.0603297501, 2.3544967095, -3.5638788326, 5.1387348926, -1.2308457019, -0.7687824608, -3.5881228109, 1.8880837663, -3.2789674274, -3.9563254554, 4.0993439118, -5.6539176395, -0.8775733373, -9.131571192, -0.6093537873, -3.7480276773, -5.0309125017, -0.8748149526, 1.9870535692, 0.7005485718, 0.9204422758, -0.1041491809, 0.3229564351, -0.7418141657, 0.0384120159, 1.0993439146, 1.2603409756, -0.1466244739, -1.4463212439]	[0.99300325]
3	2023-11-17 20:34:17.005	[-1.0603297501, 2.3544967095, -3.5638788326, 5.1387348926, -1.2308457019, -0.7687824608, -3.5881228109, 1.8880837663, -3.2789674274, -3.9563254554, 4.0993439118, -5.6539176395, -0.8775733373, -9.131571192, -0.6093537873, -3.7480276773, -5.0309125017, -0.8748149526, 1.9870535692, 0.7005485718, 0.9204422758, -0.1041491809, 0.3229564351, -0.7418141657, 0.0384120159, 1.0993439146, 1.2603409756, -0.1466244739, -1.4463212439]	[0.99300325]
4	2023-11-17 20:34:17.005	[0.5817662108, 0.097881551, 0.1546819424, 0.4754101949, -0.1978862306, -0.4504344854, 0.0166540447, -0.0256070551, 0.0920561602, -0.2783917153, 0.0593299441, -0.0196585416, -0.4225083157, -0.1217538877, 1.5473094894, 0.2391622864, 0.3553974881, -0.7685165301, -0.7000849355, -0.1190043285, -0.3450517133, -1.1065114108, 0.2523411195, 0.0209441826, 0.2199267436, 0.2540689265, -0.0450225094, 0.1086773898, 0.2547179311]	[0.0010916889]

All Inputs Are Tensors

All inputs into an ONNX model must be tensors. This requires that the shape of each element is the same. For example, the following is a proper input:

t [
    [2.35, 5.75],
    [3.72, 8.55],
    [5.55, 97.2]
]

Another example is a 2,2,3 tensor, where the shape of each element is (3,), and each element has 2 rows.

t = [
        [2.35, 5.75, 19.2],
        [3.72, 8.55, 10.5]
    ],
    [
        [5.55, 7.2, 15.7],
        [9.6, 8.2, 2.3]
    ]

In this example each element has a shape of (2,). Tensors with elements of different shapes, known as ragged tensors, are not supported. For example:

t = [
    [2.35, 5.75],
    [3.72, 8.55, 10.5],
    [5.55, 97.2]
])

**INVALID SHAPE**

For models that require ragged tensor or other shapes, see other data formatting options such as Bring Your Own Predict models.

Data Type Consistency

All inputs into an ONNX model must have the same internal data type. For example, the following is valid because all of the data types within each element are float32.

t = [
    [2.35, 5.75],
    [3.72, 8.55],
    [5.55, 97.2]
]

The following is invalid, as it mixes floats and strings in each element:

t = [
    [2.35, "Bob"],
    [3.72, "Nancy"],
    [5.55, "Wani"]
]

The following inputs are valid, as each data type is consistent within the elements.

df = pd.DataFrame({
    "t": [
        [2.35, 5.75, 19.2],
        [5.55, 7.2, 15.7],
    ],
    "s": [
        ["Bob", "Nancy", "Wani"],
        ["Jason", "Rita", "Phoebe"]
    ]
})
df

	t	s
0	[2.35, 5.75, 19.2]	[Bob, Nancy, Wani]
1	[5.55, 7.2, 15.7]	[Jason, Rita, Phoebe]

Parameter	Description
Web Site	https://www.tensorflow.org/
Supported Libraries	`tensorflow==2.9.1`
Framework	`Framework.TENSORFLOW` aka `tensorflow`
Runtime	Native aka `tensorflow`
Supported File Types	SavedModel format as .zip file

IMPORTANT NOTE

These requirements are <strong>not</strong> for Tensorflow Keras models, only for non-Keras Tensorflow models in the SavedModel format.  For Tensorflow Keras deployment in Wallaroo, see the Tensorflow Keras requirements.

TensorFlow File Format

TensorFlow models are .zip file of the SavedModel format. For example, the Aloha sample TensorFlow model is stored in the directory alohacnnlstm:

├── saved_model.pb
└── variables
    ├── variables.data-00000-of-00002
    ├── variables.data-00001-of-00002
    └── variables.index

This is compressed into the .zip file alohacnnlstm.zip with the following command:

zip -r alohacnnlstm.zip alohacnnlstm/

ML models that meet the Tensorflow and SavedModel format will run as Wallaroo Native runtimes by default.

See the SavedModel guide for full details.

Parameter	Description
Web Site	https://www.python.org/
Supported Libraries	`python==3.8`
Framework	`Framework.PYTHON` aka `python`
Runtime	Native aka `python`

Python models uploaded to Wallaroo are executed as a native runtime.

Note that Python models - aka “Python steps” - are standalone python scripts that use the python libraries natively supported by the Wallaroo platform. These are used for either simple model deployment (such as ARIMA Statsmodels), or data formatting such as the postprocessing steps. A Wallaroo Python model will be composed of one Python script that matches the Wallaroo requirements.

This is contrasted with Arbitrary Python models, also known as Bring Your Own Predict (BYOP) allow for custom model deployments with supporting scripts and artifacts. These are used with pre-trained models (PyTorch, Tensorflow, etc) along with whatever supporting artifacts they require. Supporting artifacts can include other Python modules, model files, etc. These are zipped with all scripts, artifacts, and a requirements.txt file that indicates what other Python models need to be imported that are outside of the typical Wallaroo platform.

Python Models Requirements

Python models uploaded to Wallaroo are Python scripts that must include the wallaroo_json method as the entry point for the Wallaroo engine to use it as a Pipeline step.

This method receives the results of the previous Pipeline step, and its return value will be used in the next Pipeline step.

If the Python model is the first step in the pipeline, then it will be receiving the inference request data (for example: a preprocessing step). If it is the last step in the pipeline, then it will be the data returned from the inference request.

In the example below, the Python model is used as a post processing step for another ML model. The Python model expects to receive data from a ML Model who’s output is a DataFrame with the column dense_2. It then extracts the values of that column as a list, selects the first element, and returns a DataFrame with that element as the value of the column output.

def wallaroo_json(data: pd.DataFrame):
    print(data)
    return [{"output": [data["dense_2"].to_list()[0][0]]}]

In line with other Wallaroo inference results, the outputs of a Python step that returns a pandas DataFrame or Arrow Table will be listed in the out. metadata, with all inference outputs listed as out.{variable 1}, out.{variable 2}, etc. In the example above, this results the output field as the out.output field in the Wallaroo inference result.

	time	in.tensor	out.output	check_failures
0	2023-06-20 20:23:28.395	[0.6878518042, 0.1760734021, -0.869514083, 0.3..	[12.886651039123535]	0

Parameter	Description
Web Site	https://huggingface.co/models
Supported Libraries	`transformers==4.27.0` `diffusers==0.14.0` `accelerate==0.18.0` `torchvision==0.14.1` `torch==1.13.1`
Frameworks	The following Hugging Face pipelines are supported by Wallaroo. `Framework.HUGGING_FACE_FEATURE_EXTRACTION` aka `hugging-face-feature-extraction` `Framework.HUGGING_FACE_IMAGE_CLASSIFICATION` aka `hugging-face-image-classification` `Framework.HUGGING_FACE_IMAGE_SEGMENTATION` aka `hugging-face-image-segmentation` `Framework.HUGGING_FACE_IMAGE_TO_TEXT` aka `hugging-face-image-to-text` `Framework.HUGGING_FACE_OBJECT_DETECTION` aka `hugging-face-object-detection` `Framework.HUGGING_FACE_QUESTION_ANSWERING` aka `hugging-face-question-answering` `Framework.HUGGING_FACE_STABLE_DIFFUSION_TEXT_2_IMG` aka `hugging-face-stable-diffusion-text-2-img` `Framework.HUGGING_FACE_SUMMARIZATION` aka `hugging-face-summarization` `Framework.HUGGING_FACE_TEXT_CLASSIFICATION` aka `hugging-face-text-classification` `Framework.HUGGING_FACE_TRANSLATION` aka `hugging-face-translation` `Framework.HUGGING_FACE_ZERO_SHOT_CLASSIFICATION` aka `hugging-face-zero-shot-classification` `Framework.HUGGING_FACE_ZERO_SHOT_IMAGE_CLASSIFICATION` aka `hugging-face-zero-shot-image-classification` `Framework.HUGGING_FACE_ZERO_SHOT_OBJECT_DETECTION` aka `hugging-face-zero-shot-object-detection` `Framework.HUGGING_FACE_SENTIMENT_ANALYSIS` aka `hugging-face-sentiment-analysis` `Framework.HUGGING_FACE_TEXT_GENERATION` aka `hugging-face-text-generation`
Runtime	Containerized aka `tensorflow` / `mlflow`

Hugging Face Schemas

Input and output schemas for each Hugging Face pipeline are defined below. Note that adding additional inputs not specified below will raise errors, except for the following:

Framework.HUGGING-FACE-IMAGE-TO-TEXT
Framework.HUGGING-FACE-TEXT-CLASSIFICATION
Framework.HUGGING-FACE-SUMMARIZATION
Framework.HUGGING-FACE-TRANSLATION

Additional inputs added to these Hugging Face pipelines will be added as key/pair value arguments to the model’s generate method. If the argument is not required, then the model will default to the values coded in the original Hugging Face model’s source code.

See the Hugging Face Pipeline documentation for more details on each pipeline and framework.

Wallaroo Framework	Reference
`Framework.HUGGING-FACE-FEATURE-EXTRACTION`	Feature Extraction Pipeline Feature Extraction Source Code

Schemas:

input_schema = pa.schema([
    pa.field('inputs', pa.string())
])
output_schema = pa.schema([
    pa.field('output', pa.list_(
        pa.list_(
            pa.float64(),
            list_size=128
        ),
    ))
])

Wallaroo Framework	Reference
`Framework.HUGGING-FACE-IMAGE-CLASSIFICATION`	Image Classification Documentation Image Classification Source Code

Schemas:

input_schema = pa.schema([
    pa.field('inputs', pa.list_(
        pa.list_(
            pa.list_(
                pa.int64(),
                list_size=3
            ),
            list_size=100
        ),
        list_size=100
    )),
    pa.field('top_k', pa.int64()),
])

output_schema = pa.schema([
    pa.field('score', pa.list_(pa.float64(), list_size=2)),
    pa.field('label', pa.list_(pa.string(), list_size=2)),
])

Wallaroo Framework	Reference
`Framework.HUGGING-FACE-IMAGE-SEGMENTATION`	Image Segmentation Documentation Image Segmentation Source Code

Schemas:

input_schema = pa.schema([
    pa.field('inputs', 
        pa.list_(
            pa.list_(
                pa.list_(
                    pa.int64(),
                    list_size=3
                ),
                list_size=100
            ),
        list_size=100
    )),
    pa.field('threshold', pa.float64()),
    pa.field('mask_threshold', pa.float64()),
    pa.field('overlap_mask_area_threshold', pa.float64()),
])

output_schema = pa.schema([
    pa.field('score', pa.list_(pa.float64())),
    pa.field('label', pa.list_(pa.string())),
    pa.field('mask', 
        pa.list_(
            pa.list_(
                pa.list_(
                    pa.int64(),
                    list_size=100
                ),
                list_size=100
            ),
    )),
])

Wallaroo Framework	Reference
`Framework.HUGGING-FACE-IMAGE-TO-TEXT`	Image to Text Documentation Image to Text Source Code

Any parameter that is not part of the required inputs list will be forwarded to the model as a key/pair value to the underlying models generate method. If the additional input is not supported by the model, an error will be returned.

Schemas:

input_schema = pa.schema([
    pa.field('inputs', pa.list_( #required
        pa.list_(
            pa.list_(
                pa.int64(),
                list_size=3
            ),
            list_size=100
        ),
        list_size=100
    )),
    # pa.field('max_new_tokens', pa.int64()),  # optional
])

output_schema = pa.schema([
    pa.field('generated_text', pa.list_(pa.string())),
])

Wallaroo Framework	Reference
`Framework.HUGGING-FACE-OBJECT-DETECTION`	Object Detection Documentation Object Detection Source Code

Schemas:

input_schema = pa.schema([
    pa.field('inputs', 
        pa.list_(
            pa.list_(
                pa.list_(
                    pa.int64(),
                    list_size=3
                ),
                list_size=100
            ),
        list_size=100
    )),
    pa.field('threshold', pa.float64()),
])

output_schema = pa.schema([
    pa.field('score', pa.list_(pa.float64())),
    pa.field('label', pa.list_(pa.string())),
    pa.field('box', 
        pa.list_( # dynamic output, i.e. dynamic number of boxes per input image, each sublist contains the 4 box coordinates 
            pa.list_(
                    pa.int64(),
                    list_size=4
                ),
            ),
    ),
])

Wallaroo Framework	Reference
`Framework.HUGGING-FACE-QUESTION-ANSWERING`	Question Answering Documentation Question Answering Source Code

Schemas:

input_schema = pa.schema([
    pa.field('question', pa.string()),
    pa.field('context', pa.string()),
    pa.field('top_k', pa.int64()),
    pa.field('doc_stride', pa.int64()),
    pa.field('max_answer_len', pa.int64()),
    pa.field('max_seq_len', pa.int64()),
    pa.field('max_question_len', pa.int64()),
    pa.field('handle_impossible_answer', pa.bool_()),
    pa.field('align_to_words', pa.bool_()),
])

output_schema = pa.schema([
    pa.field('score', pa.float64()),
    pa.field('start', pa.int64()),
    pa.field('end', pa.int64()),
    pa.field('answer', pa.string()),
])

Wallaroo Framework	Reference
`Framework.HUGGING-FACE-STABLE-DIFFUSION-TEXT-2-IMG`	Stable Diffusion Text to Image Documentation Stable Diffusion Text to Image Source Code

Schemas:

input_schema = pa.schema([
    pa.field('prompt', pa.string()),
    pa.field('height', pa.int64()),
    pa.field('width', pa.int64()),
    pa.field('num_inference_steps', pa.int64()), # optional
    pa.field('guidance_scale', pa.float64()), # optional
    pa.field('negative_prompt', pa.string()), # optional
    pa.field('num_images_per_prompt', pa.string()), # optional
    pa.field('eta', pa.float64()) # optional
])

output_schema = pa.schema([
    pa.field('images', pa.list_(
        pa.list_(
            pa.list_(
                pa.int64(),
                list_size=3
            ),
            list_size=128
        ),
        list_size=128
    )),
])

Wallaroo Framework	Reference
`Framework.HUGGING-FACE-SUMMARIZATION`	Summarization Documentation Text2Text Generation Source Code.

Schemas:

input_schema = pa.schema([
    pa.field('inputs', pa.string()),
    pa.field('return_text', pa.bool_()),
    pa.field('return_tensors', pa.bool_()),
    pa.field('clean_up_tokenization_spaces', pa.bool_()),
    # pa.field('extra_field', pa.int64()), # every extra field you specify will be forwarded as a key/value pair
])

output_schema = pa.schema([
    pa.field('summary_text', pa.string()),
])

Wallaroo Framework	Reference
`Framework.HUGGING-FACE-TEXT-CLASSIFICATION`	Text Classification Documentation Text Classification Source Code

Schemas

input_schema = pa.schema([
    pa.field('inputs', pa.string()), # required
    pa.field('top_k', pa.int64()), # optional
    pa.field('function_to_apply', pa.string()), # optional
])

output_schema = pa.schema([
    pa.field('label', pa.list_(pa.string(), list_size=2)), # list with a number of items same as top_k, list_size can be skipped but may lead in worse performance
    pa.field('score', pa.list_(pa.float64(), list_size=2)), # list with a number of items same as top_k, list_size can be skipped but may lead in worse performance
])

Wallaroo Framework	Reference
`Framework.HUGGING-FACE-TRANSLATION`	Translation Documentation Translation Generation Source Code

Schemas:

input_schema = pa.schema([
    pa.field('inputs', pa.string()), # required
    pa.field('return_tensors', pa.bool_()), # optional
    pa.field('return_text', pa.bool_()), # optional
    pa.field('clean_up_tokenization_spaces', pa.bool_()), # optional
    pa.field('src_lang', pa.string()), # optional
    pa.field('tgt_lang', pa.string()), # optional
    # pa.field('extra_field', pa.int64()), # every extra field you specify will be forwarded as a key/value pair
])

output_schema = pa.schema([
    pa.field('translation_text', pa.string()),
])

Wallaroo Framework	Reference
`Framework.HUGGING-FACE-ZERO-SHOT-CLASSIFICATION`	Zero Shot Classification Documentation Zero Shot Classification Source Code

Schemas:

input_schema = pa.schema([
    pa.field('inputs', pa.string()), # required
    pa.field('candidate_labels', pa.list_(pa.string(), list_size=2)), # required
    pa.field('hypothesis_template', pa.string()), # optional
    pa.field('multi_label', pa.bool_()), # optional
])

output_schema = pa.schema([
    pa.field('sequence', pa.string()),
    pa.field('scores', pa.list_(pa.float64(), list_size=2)), # same as number of candidate labels, list_size can be skipped by may result in slightly worse performance
    pa.field('labels', pa.list_(pa.string(), list_size=2)), # same as number of candidate labels, list_size can be skipped by may result in slightly worse performance
])

Wallaroo Framework	Reference
`Framework.HUGGING-FACE-ZERO-SHOT-IMAGE-CLASSIFICATION`	Zero Shot Image Classification Zero Shot Image Classification Source Code

Schemas:

input_schema = pa.schema([
    pa.field('inputs', # required
        pa.list_(
            pa.list_(
                pa.list_(
                    pa.int64(),
                    list_size=3
                ),
                list_size=100
            ),
        list_size=100
    )),
    pa.field('candidate_labels', pa.list_(pa.string(), list_size=2)), # required
    pa.field('hypothesis_template', pa.string()), # optional
]) 

output_schema = pa.schema([
    pa.field('score', pa.list_(pa.float64(), list_size=2)), # same as number of candidate labels
    pa.field('label', pa.list_(pa.string(), list_size=2)), # same as number of candidate labels
])

Wallaroo Framework	Reference
`Framework.HUGGING-FACE-ZERO-SHOT-OBJECT-DETECTION`	Zero Shot Object Detection Documentation Zero Shot Object Detection Source Code

Schemas:

input_schema = pa.schema([
    pa.field('images', 
        pa.list_(
            pa.list_(
                pa.list_(
                    pa.int64(),
                    list_size=3
                ),
                list_size=640
            ),
        list_size=480
    )),
    pa.field('candidate_labels', pa.list_(pa.string(), list_size=3)),
    pa.field('threshold', pa.float64()),
    # pa.field('top_k', pa.int64()), # we want the model to return exactly the number of predictions, we shouldn't specify this
])

output_schema = pa.schema([
    pa.field('score', pa.list_(pa.float64())), # variable output, depending on detected objects
    pa.field('label', pa.list_(pa.string())), # variable output, depending on detected objects
    pa.field('box', 
        pa.list_( # dynamic output, i.e. dynamic number of boxes per input image, each sublist contains the 4 box coordinates 
            pa.list_(
                    pa.int64(),
                    list_size=4
                ),
            ),
    ),
])

Wallaroo Framework	Reference
`Framework.HUGGING-FACE-SENTIMENT-ANALYSIS`	Hugging Face Sentiment Analysis

Wallaroo Framework	Reference
`Framework.HUGGING-FACE-TEXT-GENERATION`	Text Generation Documentation Text Generation Source Code

input_schema = pa.schema([
    pa.field('inputs', pa.string()),
    pa.field('return_tensors', pa.bool_()), # optional
    pa.field('return_text', pa.bool_()), # optional
    pa.field('return_full_text', pa.bool_()), # optional
    pa.field('clean_up_tokenization_spaces', pa.bool_()), # optional
    pa.field('prefix', pa.string()), # optional
    pa.field('handle_long_generation', pa.string()), # optional
    # pa.field('extra_field', pa.int64()), # every extra field you specify will be forwarded as a key/value pair
])

output_schema = pa.schema([
    pa.field('generated_text', pa.list_(pa.string(), list_size=1))
])

Parameter	Description
Web Site	https://pytorch.org/
Supported Libraries	`torch==1.13.1` `torchvision==0.14.1`
Framework	`Framework.PYTORCH` aka `pytorch`
Supported File Types	`pt` ot `pth` in TorchScript format
Runtime	Containerized aka `mlflow`

Sci-kit Learn aka SKLearn.

Parameter	Description
Web Site	https://scikit-learn.org/stable/index.html
Supported Libraries	`scikit-learn==1.2.2`
Framework	`Framework.SKLEARN` aka `sklearn`
Runtime	Containerized aka `tensorflow` / `mlflow`

SKLearn Schema Inputs

SKLearn schema follows a different format than other models. To prevent inputs from being out of order, the inputs should be submitted in a single row in the order the model is trained to accept, with all of the data types being the same. For example, the following DataFrame has 4 columns, each column a float.

	sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)
0	5.1	3.5	1.4	0.2
1	4.9	3.0	1.4	0.2

For submission to an SKLearn model, the data input schema will be a single array with 4 float values.

input_schema = pa.schema([
    pa.field('inputs', pa.list_(pa.float64(), list_size=4))
])

When submitting as an inference, the DataFrame is converted to rows with the column data expressed as a single array. The data must be in the same order as the model expects, which is why the data is submitted as a single array rather than JSON labeled columns: this insures that the data is submitted in the exact order as the model is trained to accept.

Original DataFrame:

	sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)
0	5.1	3.5	1.4	0.2
1	4.9	3.0	1.4	0.2

Converted DataFrame:

	inputs
0	[5.1, 3.5, 1.4, 0.2]
1	[4.9, 3.0, 1.4, 0.2]

SKLearn Schema Outputs

Outputs for SKLearn that are meant to be predictions or probabilities when output by the model are labeled in the output schema for the model when uploaded to Wallaroo. For example, a model that outputs either 1 or 0 as its output would have the output schema as follows:

output_schema = pa.schema([
    pa.field('predictions', pa.int32())
])

When used in Wallaroo, the inference result is contained in the out metadata as out.predictions.

pipeline.infer(dataframe)

	time	in.inputs	out.predictions	check_failures
0	2023-07-05 15:11:29.776	[5.1, 3.5, 1.4, 0.2]	0	0
1	2023-07-05 15:11:29.776	[4.9, 3.0, 1.4, 0.2]	0	0

Parameter	Description
Web Site	https://www.tensorflow.org/api_docs/python/tf/keras/Model
Supported Libraries	`tensorflow==2.8.0` `keras==1.1.0`
Framework	`Framework.KERAS` aka `keras`
Supported File Types	SavedModel format as .zip file and HDF5 format
Runtime	Containerized aka `mlflow`

TensorFlow Keras SavedModel Format

TensorFlow Keras SavedModel models are .zip file of the SavedModel format. For example, the Aloha sample TensorFlow model is stored in the directory alohacnnlstm:

├── saved_model.pb
└── variables
    ├── variables.data-00000-of-00002
    ├── variables.data-00001-of-00002
    └── variables.index

This is compressed into the .zip file alohacnnlstm.zip with the following command:

zip -r alohacnnlstm.zip alohacnnlstm/

See the SavedModel guide for full details.

TensorFlow Keras H5 Format

Wallaroo supports the H5 for Tensorflow Keras models.

Parameter	Description
Web Site	https://xgboost.ai/
Supported Libraries	`xgboost==1.7.4`
Framework	`Framework.XGBOOST` aka `xgboost`
Supported File Types	`pickle` (XGB files are not supported.)
Runtime	Containerized aka `tensorflow` / `mlflow`

XGBoost Schema Inputs

XGBoost schema follows a different format than other models. To prevent inputs from being out of order, the inputs should be submitted in a single row in the order the model is trained to accept, with all of the data types being the same. If a model is originally trained to accept inputs of different data types, it will need to be retrained to only accept one data type for each column - typically pa.float64() is a good choice.

For example, the following DataFrame has 4 columns, each column a float.

	sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)
0	5.1	3.5	1.4	0.2
1	4.9	3.0	1.4	0.2

For submission to an XGBoost model, the data input schema will be a single array with 4 float values.

input_schema = pa.schema([
    pa.field('inputs', pa.list_(pa.float64(), list_size=4))
])

Original DataFrame:

	sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)
0	5.1	3.5	1.4	0.2
1	4.9	3.0	1.4	0.2

Converted DataFrame:

	inputs
0	[5.1, 3.5, 1.4, 0.2]
1	[4.9, 3.0, 1.4, 0.2]

XGBoost Schema Outputs

Outputs for XGBoost are labeled based on the trained model outputs. For this example, the output is simply a single output listed as output. In the Wallaroo inference result, it is grouped with the metadata out as out.output.

output_schema = pa.schema([
    pa.field('output', pa.int32())
])

pipeline.infer(dataframe)

	time	in.inputs	out.output	check_failures
0	2023-07-05 15:11:29.776	[5.1, 3.5, 1.4, 0.2]	0	0
1	2023-07-05 15:11:29.776	[4.9, 3.0, 1.4, 0.2]	0	0

Parameter	Description
Web Site	https://www.python.org/
Supported Libraries	`python==3.8`
Framework	`Framework.CUSTOM` aka `custom`
Runtime	Containerized aka `mlflow`

Arbitrary Python models, also known as Bring Your Own Predict (BYOP) allow for custom model deployments with supporting scripts and artifacts. These are used with pre-trained models (PyTorch, Tensorflow, etc) along with whatever supporting artifacts they require. Supporting artifacts can include other Python modules, model files, etc. These are zipped with all scripts, artifacts, and a requirements.txt file that indicates what other Python models need to be imported that are outside of the typical Wallaroo platform.

Contrast this with Wallaroo Python models - aka “Python steps”. These are standalone python scripts that use the python libraries natively supported by the Wallaroo platform. These are used for either simple model deployment (such as ARIMA Statsmodels), or data formatting such as the postprocessing steps. A Wallaroo Python model will be composed of one Python script that matches the Wallaroo requirements.

Arbitrary Python File Requirements

Arbitrary Python (BYOP) models are uploaded to Wallaroo via a ZIP file with the following components:

Artifact	Type	Description
Python scripts aka `.py` files with classes that extend `mac.inference.Inference` and `mac.inference.creation.InferenceBuilder`	Python Script	Extend the classes `mac.inference.Inference` and `mac.inference.creation.InferenceBuilder`. These are included with the Wallaroo SDK. Further details are in Arbitrary Python Script Requirements. Note that there is no specified naming requirements for the classes that extend `mac.inference.Inference` and `mac.inference.creation.InferenceBuilder` - any qualified class name is sufficient as long as these two classes are extended as defined below.
`requirements.txt`	Python requirements file	This sets the Python libraries used for the arbitrary python model. These libraries should be targeted for Python 3.8 compliance. These requirements and the versions of libraries should be exactly the same between creating the model and deploying it in Wallaroo. This insures that the script and methods will function exactly the same as during the model creation process.
Other artifacts	Files	Other models, files, and other artifacts used in support of this model.

For example, the if the arbitrary python model will be known as vgg_clustering, the contents may be in the following structure, with vgg_clustering as the storage directory:

vgg_clustering\
    feature_extractor.h5
    kmeans.pkl
    custom_inference.py
    requirements.txt

Note the inclusion of the custom_inference.py file. This file name is not required - any Python script or scripts that extend the classes listed above are sufficient. This Python script could have been named vgg_custom_model.py or any other name as long as it includes the extension of the classes listed above.

The sample arbitrary python model file is created with the command zip -r vgg_clustering.zip vgg_clustering/.

Wallaroo Arbitrary Python uses the Wallaroo SDK mac module, included in the Wallaroo SDK 2023.2.1 and above. See the Wallaroo SDK Install Guides for instructions on installing the Wallaroo SDK.

Arbitrary Python Script Requirements

The entry point of the arbitrary python model is any python script that extends the following classes. These are included with the Wallaroo SDK. The required methods that must be overridden are specified in each section below.

mac.inference.Inference interface serves model inferences based on submitted input some input. Its purpose is to serve inferences for any supported arbitrary model framework (e.g. scikit, keras etc.).

classDiagram
    class Inference {
        <<Abstract>>
        +model Optional[Any]
        +expected_model_types()* Set
        +predict(input_data: InferenceData)*  InferenceData
        -raise_error_if_model_is_not_assigned() None
        -raise_error_if_model_is_wrong_type() None
    }

mac.inference.creation.InferenceBuilder builds a concrete Inference, i.e. instantiates an Inference object, loads the appropriate model and assigns the model to to the Inference object.
```
classDiagram
    class InferenceBuilder {
        +create(config InferenceConfig) * Inference
        -inference()* Any
    }
```

mac.inference.Inference

mac.inference.Inference Objects

Object	Type	Description
`model Optional[Any]`	An optional list of models that match the supported frameworks from `wallaroo.framework.Framework` included in the arbitrary python script. Note that this is optional - no models are actually required. A BYOP can refer to a specific model(s) used, be used for data processing and reshaping for later pipeline steps, or other needs.

mac.inference.Inference Methods

Method	Returns	Description
`expected_model_types` (Required)	`Set`	Returns a Set of models expected for the inference as defined by the developer. Typically this is a set of one. Wallaroo checks the expected model types to verify that the model submitted through the `InferenceBuilder` method matches what this `Inference` class expects.
`_predict (input_data: mac.types.InferenceData)` (Required)	`mac.types.InferenceData`	The entry point for the Wallaroo inference with the following input and output parameters that are defined when the model is updated. `mac.types.InferenceData`: The input `InferenceData` is a dictionary of numpy arrays derived from the `input_schema` detailed when the model is uploaded, defined in PyArrow.Schema format. `mac.types.InferenceData`: The output is a dictionary of numpy arrays as defined by the output parameters defined in PyArrow.Schema format. The `InferenceDataValidationError` exception is raised when the input data does not match `mac.types.InferenceData`.
`raise_error_if_model_is_not_assigned`	N/A	Error when `expected_model_types` is not set.
`raise_error_if_model_is_wrong_type`	N/A	Error when the model does not match the `expected_model_types`.

mac.inference.creation.InferenceBuilder

InferenceBuilder builds a concrete Inference, i.e. instantiates an Inference object, loads the appropriate model and assigns the model to the Inference.

classDiagram
    class InferenceBuilder {
        +create(config InferenceConfig) * Inference
        -inference()* Any
    }

Each model that is included requires its own InferenceBuilder. InferenceBuilder loads one model, then submits it to the Inference class when created. The Inference class checks this class against its expected_model_types() Set.

mac.inference.creation.InferenceBuilder Methods

Method	Returns	Description
`create(config mac.config.inference.CustomInferenceConfig)` (Required)	The custom `Inference` instance.	Creates an Inference subclass, then assigns a model and attributes. The `CustomInferenceConfig` is used to retrieve the `config.model_path`, which is a `pathlib.Path object` pointing to the folder where the model artifacts are saved. Every artifact loaded must be relative to `config.model_path`. This is set when the arbitrary python .zip file is uploaded and the environment for running it in Wallaroo is set. For example: loading the artifact `vgg_clustering\feature_extractor.h5` would be set with `config.model_path \ feature_extractor.h5`. The model loaded must match an existing module. For our example, this is `from sklearn.cluster import KMeans`, and this must match the `Inference` `expected_model_types`.
`inference`	custom `Inference` instance.	Returns the instantiated custom Inference object created from the `create` method.

Arbitrary Python Runtime

Arbitrary Python always run in the containerized model runtime.

Parameter	Description
Web Site	https://mlflow.org
Supported Libraries	`mlflow==1.30.0`
Runtime	Containerized aka `mlflow`

For models that do not fall under the supported model frameworks, organizations can use containerized MLFlow ML Models.

This guide details how to add ML Models from a model registry service into Wallaroo.

Wallaroo supports both public and private containerized model registries. See the Wallaroo Private Containerized Model Container Registry Guide for details on how to configure a Wallaroo instance with a private model registry.

Wallaroo users can register their trained MLFlow ML Models from a containerized model container registry into their Wallaroo instance and perform inferences with it through a Wallaroo pipeline.

As of this time, Wallaroo only supports MLFlow 1.30.0 containerized models. For information on how to containerize an MLFlow model, see the MLFlow Documentation.

List Wallaroo Frameworks

Wallaroo frameworks are listed from the Wallaroo.Framework class. The following demonstrates listing all available supported frameworks.

from wallaroo.framework import Framework

[e.value for e in Framework]

    ['onnx',
    'tensorflow',
    'python',
    'keras',
    'sklearn',
    'pytorch',
    'xgboost',
    'hugging-face-feature-extraction',
    'hugging-face-image-classification',
    'hugging-face-image-segmentation',
    'hugging-face-image-to-text',
    'hugging-face-object-detection',
    'hugging-face-question-answering',
    'hugging-face-stable-diffusion-text-2-img',
    'hugging-face-summarization',
    'hugging-face-text-classification',
    'hugging-face-translation',
    'hugging-face-zero-shot-classification',
    'hugging-face-zero-shot-image-classification',
    'hugging-face-zero-shot-object-detection',
    'hugging-face-sentiment-analysis',
    'hugging-face-text-generation']

How to Upload Models to a Workspace

IMPORTANT NOTICE

Uploading models is managed through the Wallaroo SDK and Wallaroo MLOps API. As of this time, models can not be uploaded through the Wallaroo Dashboard

To upload a model to Wallaroo, see the following guides:

How to View Uploaded Models

Models uploaded to the current workspace can be seen through the following process:

From the Wallaroo Dashboard, select the workspace to set as the current workspace from the navigation panel above. The number of models for the workspace will be displayed.
Select View Models. A list of the models in the workspace will be displayed.
To view details on the model, select the model name from the list.

Model Details

From the Model Details page the following is displayed:

The name of the model.
The unique ID of the model represented as a UUID.
The file name of the model
The version history of the model.

5.1 - Wallaroo Model Tag Management

How to manage tags and models.

Tags can be used to label, search, and track models across different versions. The following guide will demonstrate how to:

Create a tag for a specific model version.
Remove a tag for a specific model version.

The example shown uses the model ccfraudmodel.

Steps

Add a New Tag to a Model Version

To set a tag for a specific version of a model uploaded to Wallaroo using the Wallaroo Dashboard:

Log into your Wallaroo instance.
Select the workspace the models were uploaded into.
Select View Models.
From the Model Select Dashboard page, select the model to update.
From the Model Dashboard page, select the version of the model. By default, the latest version will be selected.
Select the + icon under the name of the model and it’s hash value.
Enter the name of the new tag. When complete, select Enter. The tag will be set to this version of the model selected.

Remove a Tag from a Model Version

To remove a tag from a version of an uploaded model:

IMPORTANT NOTE

Once a tag is deleted from a model version, it can not be undeleted.

Log into your Wallaroo instance.
Select the workspace the models were uploaded into.
Select View Models.
From the Model Select Dashboard page, select the model to update.
From the Model Dashboard page, select the version of the model. By default, the latest version will be selected.
Select the X for the tag to delete. The tag will be removed from the model version.

Wallaroo SDK Tag Management

Tags are applied to either model versions or pipelines. This allows organizations to track different versions of models, and search for what pipelines have been used for specific purposes such as testing versus production use.

Create Tag

Tags are created with the Wallaroo client command create_tag(String tagname). This creates the tag and makes it available for use.

The tag will be saved to the variable currentTag to be used in the rest of these examples.

# Now we create our tag
currentTag = wl.create_tag("My Great Tag")

Tags are listed with the Wallaroo client command list_tags(), which shows all tags and what models and pipelines they have been assigned to.

# List all tags

wl.list_tags()

id	tag	models	pipelines
1	My Great Tag	[('tagtestmodel', ['70169e97-fb7e-4922-82ba-4f5d37e75253'])]	[]

Wallaroo Model Tag Management

Tags are used with models to track differences in model versions.

Assign Tag to a Model

Tags are assigned to a model through the Wallaroo Tag add_to_model(model_id) command, where model_id is the model’s numerical ID number. The tag is applied to the most current version of the model.

For this example, the currentTag will be applied to the tagtest_model. All tags will then be listed to show it has been assigned to this model.

# add tag to model

currentTag.add_to_model(tagtest_model.id())

{'model_id': 1, 'tag_id': 1}

Search Models by Tag

Model versions can be searched via tags using the Wallaroo Client method search_models(search_term), where search_term is a string value. All models versions containing the tag will be displayed. In this example, we will be using the text from our tag to list all models that have the text from currentTag in them.

# Search models by tag

wl.search_models('My Great Tag')

name	version	file_name	image_path	last_update_time
tagtestmodel	70169e97-fb7e-4922-82ba-4f5d37e75253	ccfraud.onnx	None	2022-11-29 17:15:21.703465+00:00

Remove Tag from Model

Tags are removed from models using the Wallaroo Tag remove_from_model(model_id) command.

In this example, the currentTag will be removed from tagtest_model. A list of all tags will be shown with the list_tags command, followed by searching the models for the tag to verify it has been removed.

### remove tag from model

currentTag.remove_from_model(tagtest_model.id())

{'model_id': 1, 'tag_id': 1}

6 - Wallaroo Pipeline Management

How to manage your Wallaroo pipelines

Pipelines represent how data is submitted to your uploaded Machine Learning (ML) models. Pipelines allow you to:

Submit information through an uploaded file or through the Pipeline’s Deployment URL.
Have the Pipeline submit the information to one or more models in sequence.
Once complete, output the result from the model(s).
Pipeline Naming Requirements

Pipeline names map onto Kubernetes objects, and must be DNS compliant. Pipeline names must be ASCII alpha-numeric characters or dash (-) only. . and _ are not allowed.

How to Create a Pipeline and Use a Pipeline

Pipelines can be created through the Wallaroo Dashboard and the Wallaroo SDK. For specifics on using the SDK, see the Wallaroo SDK Guide. For more detailed instructions and step-by-step examples with real models and data, see the Wallaroo Tutorials.

The following instructions are focused on how to use the Wallaroo Dashboard for creating, deploying, and undeploying pipelines.

How to Create a Pipeline using the Wallaroo Dashboard

Prerequisites

Before creating a pipeline through the Wallaroo Dashboard, a model must be uploaded into the workspace through the SDK. For more information, see the Wallaroo SDK Essentials Guide.

IMPORTANT NOTICE

Pipeline names are not forced to be unique. You can have 50 pipelines all named my-pipeline, which can cause confusion in determining which pipeline to use.

It is recommended that organizations agree on a naming convention and select pipeline to use rather than creating a new one each time. See the SDK guides for more information on how to select an existing pipeline.

To create a pipeline:

From the Wallaroo Dashboard, set the current workspace from the top left dropdown list.
Select View Pipelines from the pipeline’s row.
From the upper right hand corner, select Create Pipeline.
Enter the following:
1. Pipeline Name: The name of the new pipeline. Pipeline names should be unique across the Wallaroo instance.
2. Add Pipeline Step: Select the models to be used as the pipeline steps.
When finished, select Next.
Review the name of the pipeline and the steps. If any adjustments need to be made, select either Back to rename the pipeline or Add Step(s) to change the pipeline’s steps.
When finished, select Build to create the pipeline in this workspace. The pipeline will be built and be ready for deployment within a minute.

How to Deploy and Undeploy a Pipeline using the Wallaroo Dashboard

Deployed pipelines create new namespaces in the Kubernetes environment where the Wallaroo instance is deployed, and allocate resources from the Kubernetes environment to run the pipeline and its steps.

To deploy a pipeline:

From the Wallaroo Dashboard, set the current workspace from the top left dropdown list.
Select View Pipelines from the pipeline’s row.
Select the pipeline to deploy.
From the right navigation panel, select Deploy.
A popup module will request verification to deploy the pipeline. Select Deploy again to deploy the pipeline.

Undeploying a pipeline returns resources back to the Kubernetes environment and removes the namespaces created when the pipeline was deployed.

To undeploy a pipeline:

From the Wallaroo Dashboard, set the current workspace from the top left dropdown list.
Select View Pipelines from the pipeline’s row.
Select the pipeline to deploy.
From the right navigation panel, select Undeploy.
A popup module will request verification to undeploy the pipeline. Select Undeploy again to undeploy the pipeline.

How to View a Pipeline Details and Metrics

To view a pipeline’s details:

From the Wallaroo Dashboard, set the current workspace from the top left dropdown list.
Select View Pipelines from the pipeline’s row.
To view details on the pipeline, select the name of the pipeline.
A list of the pipeline’s details will be displayed.

To view a pipeline’s metrics:

From the Wallaroo Dashboard, set the current workspace from the top left dropdown list.
Select View Pipelines from the pipeline’s row.
To view details on the pipeline, select the name of the pipeline.
A list of the pipeline’s details will be displayed.
Select Metrics to view the following information. From here you can select the time period to display metrics from through the drop down to display the following:
1. Requests per second
2. Cluster inference rate
3. Inference latency
The Audit Log and Anomaly Log are available to view further details of the pipeline’s activities.

Pipeline Details

The following is available from the Pipeline Details page:

The name of the pipeline.
The pipeline ID: This is in UUID format.
Pipeline steps: The steps and the models in each pipeline step.
Version History: how the pipeline has been updated over time.

6.1 - Wallaroo Pipeline Tag Management

How to manage tags and pipelines.

Tags can be used to label, search, and track pipelines across a Wallaroo instance. The following guide will demonstrate how to:

Create a tag for a specific pipeline.
Remove a tag for a specific pipeline.

The example shown uses the pipeline ccfraudpipeline.

Steps

Add a New Tag to a Pipeline

To set a tag to pipeline using the Wallaroo Dashboard:

Log into your Wallaroo instance.
Select the workspace the pipelines are associated with.
Select View Pipelines.
From the Pipeline Select Dashboard page, select the pipeline to update.
From the Pipeline Dashboard page, select the + icon under the name of the pipeline and it’s hash value.
Enter the name of the new tag. When complete, select Enter. The tag will be set for this pipeline.

Remove a Tag from a Pipeline

To remove a tag from a pipeline:

IMPORTANT NOTE

Once a tag is deleted from a pipeline, it can not be undeleted.

Log into your Wallaroo instance.
Select the workspace the pipelines are associated with.
Select View Pipelines.
From the Pipeline Select Dashboard page, select the pipeline to update.
From the Pipeline Dashboard page, select the select the X for the tag to delete. The tag will be removed from the pipeline.

Wallaroo SDK Tag Management

Create Tag

Tags are created with the Wallaroo client command create_tag(String tagname). This creates the tag and makes it available for use.

The tag will be saved to the variable currentTag to be used in the rest of these examples.

# Now we create our tag
currentTag = wl.create_tag("My Great Tag")

Tags are listed with the Wallaroo client command list_tags(), which shows all tags and what models and pipelines they have been assigned to.

# List all tags

wl.list_tags()

id	tag	models	pipelines
1	My Great Tag	[('tagtestmodel', ['70169e97-fb7e-4922-82ba-4f5d37e75253'])]	[]

Wallaroo Pipeline Tag Management

Tags are used with pipelines to track different pipelines that are built or deployed with different features or functions.

Add Tag to Pipeline

Tags are added to a pipeline through the Wallaroo Tag add_to_pipeline(pipeline_id) method, where pipeline_id is the pipeline’s integer id.

For this example, we will add currentTag to testtest_pipeline, then verify it has been added through the list_tags command and list_pipelines command.

# add this tag to the pipeline
currentTag.add_to_pipeline(tagtest_pipeline.id())

{'pipeline_pk_id': 1, 'tag_pk_id': 1}

Search Pipelines by Tag

Pipelines can be searched through the Wallaroo Client search_pipelines(search_term) method, where search_term is a string value for tags assigned to the pipelines.

In this example, the text “My Great Tag” that corresponds to currentTag will be searched for and displayed.

wl.search_pipelines('My Great Tag')

name	version	creation_time	last_updated_time	deployed	tags	steps
tagtestpipeline	5a4ff3c7-1a2d-4b0a-ad9f-78941e6f5677	2022-29-Nov 17:15:21	2022-29-Nov 17:15:21	(unknown)	My Great Tag

Remove Tag from Pipeline

Tags are removed from a pipeline with the Wallaroo Tag remove_from_pipeline(pipeline_id) command, where pipeline_id is the integer value of the pipeline’s id.

For this example, currentTag will be removed from tagtest_pipeline. This will be verified through the list_tags and search_pipelines command.

## remove from pipeline
currentTag.remove_from_pipeline(tagtest_pipeline.id())

{'pipeline_pk_id': 1, 'tag_pk_id': 1}

6.2 - Wallaroo Assays Management

How to create and use assays to monitor model inputs and outputs.

Model Insights and Interactive Analysis Introduction

Wallaroo provides the ability to perform interactive analysis so organizations can explore the data from a pipeline and learn how the data is behaving. With this information and the knowledge of your particular business use case you can then choose appropriate thresholds for persistent automatic assays as desired.

IMPORTANT NOTE
Model insights operates over time and is difficult to demo in a notebook without pre-canned data. We assume you have an active pipeline that has been running and making predictions over time and show you the code you may use to analyze your pipeline.

Monitoring tasks called assays monitors a model’s predictions or the data coming into the model against an established baseline. Changes in the distribution of this data can be an indication of model drift, or of a change in the environment that the model trained for. This can provide tips on whether a model needs to be retrained or the environment data analyzed for accuracy or other needs.

Assay Details

Assays contain the following attributes:

Attribute	Default	Description
Name		The name of the assay. Assay names must be unique.
Baseline Data		Data that is known to be “typical” (typically distributed) and can be used to determine whether the distribution of new data has changed.
Schedule	Every 24 hours at 1 AM	New assays are configured to run a new analysis for every 24 hours starting at the end of the baseline period. This period can be configured through the SDK.
Group Results	Daily	Groups assay results into groups based on either Daily (the default), Weekly, or Monthly.
Metric	PSI	Population Stability Index (PSI) is an entropy-based measure of the difference between distributions. Maximum Difference of Bins measures the maximum difference between the baseline and current distributions (as estimated using the bins). Sum of the difference of bins sums up the difference of occurrences in each bin between the baseline and current distributions.
Threshold	0.1	The threshold for deciding whether the difference between distributions, as evaluated by the above metric, is large (the distributions are different) or small (the distributions are similar). The default of 0.1 is generally a good threshold when using PSI as the metric.
Number of Bins	5	Sets the number of bins that will be used to partition the baseline data for comparison against how future data falls into these bins. By default, the binning scheme is percentile (quantile) based. The binning scheme can be configured (see Bin Mode, below). Note that the total number of bins will include the set number plus the `left_outlier` and the `right_outlier`, so the total number of bins will be the total set + 2.
Bin Mode	Quantile	Set the binning scheme. Quantile binning defines the bins using percentile ranges (each bin holds the same percentage of the baseline data). Equal binning defines the bins using equally spaced data value ranges, like a histogram. Custom allows users to set the range of values for each bin, with the Left Outlier always starting at Min (below the minimum values detected from the baseline) and the Right Outlier always ending at Max (above the maximum values detected from the baseline).
Bin Weight	Equally Weighted	The bin weights can be either set to Equally Weighted (the default) where each bin is weighted equally, or Custom where the bin weights can be adjusted depending on which are considered more important for detecting model drift.

Manage Assays via the Wallaroo Dashboard

Assays can be created and used via the Wallaroo Dashboard.

Accessing Assays Through the Pipeline Dashboard

Assays created through the Wallaroo Dashboard are accessed through the Pipeline Dashboard through the following process.

Log into the Wallaroo Dashboard.
Select the workspace containing the pipeline with the models being monitored from the Change Current Workspace and Workspace Management drop down.
Select View Pipelines.
Select the pipeline containing the models being monitored.
Select Insights.

The Wallaroo Assay Dashboard contains the following elements. For more details of each configuration type, see the Model Insights and Assays Introduction.

(A) Filter Assays: Filter assays by the following:
- Name
- Status:
  - Active: The assay is currently running.
  - Paused: The assay is paused until restarted.
  - Drift Detected: One or more drifts have been detected.
- Sort By
  - Sort by Creation Date: Sort by the most recent Assays first.
  - Last Assay Run: Sort by the most recent Assay Last Run date.
(B) Create Assay: Create a new assay.
(C) Assay Controls:
- Pause/Start Assay: Pause a running assay, or start one that was paused.
- Show Assay Details: View assay details. See Assay Details View for more details.
(D) Collapse Assay: Collapse or Expand the assay for view.
(E) Time Period for Assay Data: Set the time period for data to be used in displaying the assay results.
(F) Assay Events: Select an individual assay event to see more details. See View Assay Alert Details for more information.

Assay Details View

The following details are visible by selecting the Assay View Details icon:

(A) Assay Name: The name of the assay displayed.
(B) Input / Output: The input or output and the index of the element being monitored.
(C) Baseline: The time period used to generate the baseline.
(D) Last Run: The date and time the assay was last run.
(E) Next Run: The future date and time the assay will be run again. NOTE: If the assay is paused, then it will not run at the scheduled time. When unpaused, the date will be updated to the next date and time that the assay will be run.
(F) Aggregation Type: The aggregation type used with the assay.
(G) Threshold: The threshold value used for the assay.
(H) Metric: The metric type used for the assay.
(I) Number of Bins: The number of bins used for the assay.
(J) Bin Weight: The weight applied to each bin.
(K) Bin Mode: The type of bin node applied to each bin.

View Assay Alert Details

To view details on an assay alert:

Select the data with available alert data.
Mouse hover of a specific Assay Event Alert to view the data and time of the event and the alert value.
Select the Assay Event Alert to view the Baseline and Window details of the alert including the left_outlier and right_outlier.

Hover over a bar chart graph to view additional details.

Select the ⊗ symbol to exit the Assay Event Alert details and return to the Assay View.

Build an Assay Through the Pipeline Dashboard

To create a new assay through the Wallaroo Pipeline Dashboard:

Log into the Wallaroo Dashboard.
Select the workspace containing the pipeline with the models being monitored from the Change Current Workspace and Workspace Management drop down.
Select View Pipelines.
Select the pipeline containing the models being monitored.
Select Insights.
Select +Create Assay.
On the Assay Name module, enter the following:
1. Assay Name: The name of the new assay.
2. Monitor output data or Monitor input data: Select whether to monitor input or output data.
3. Select an output/input to monitor: Select the input or output to monitor.
  1. Named Field: The name of the field to monitor.
  2. Index: The index of the monitored field.
4. Select Next to continue.
On the Specify Baseline Module:
1. (A) Select the data to use for the baseline. This can either be set with a preset recent time period (last 30 seconds, last 60 seconds, etc) or with a custom date range.
Once selected, a preview graph of the baseline values will be displayed (B). Note that this may take a few seconds to generate.
1. Select Next to continue.
On the Settings Module:
1. Set the date and time range to view values generated by the assay. This can either be set with a preset recent time period (last 30 seconds, last 60 seconds, etc) or with a custom date range.
  New assays are configured to run a new analysis for every 24 hours starting at the end of the baseline period. For information on how to adjust the scheduling period and other settings for the assay scheduling window, see the SDK section on how to Schedule Assay.
2. Set the following Advanced Settings.
  1. (A) Preview Date Range: The date and times to for the preview chart.
  2. (B) Preview: A preview of the assay results will be displayed based on the settings below.
  3. (C) Scheduling: Set the Frequency (Daily, Every Minute, Hourly, Weekly, Default: Daily) and the Time (increments of one hour Default: 1:00 AM).
  4. (D) Group Results: How the results are grouped: Daily, Weekly, or Monthly.
  5. (E) Aggregation Type: Density or Cumulative.
  6. (F) Threshold:
    1. Default: 0.1
  7. (G) Metric:
    1. Default: Population Stability Index
    2. Maximum Difference of Bins
    3. Sum of the Difference of Bins
  8. (H) Number of Bins: From 5 to 14. Default: 5
  9. (F) Bin Mode:
    1. Equally Spaced
    2. Default: Quantile
  10. (I) Bin Weights: The bin weights:
    1. Equally Weighted (Default)
    2. Custom: Users can assign their own bin weights as required.
3. Review the preview chart to verify the settings are correct.
4. Select Build to complete the process and build the new assay.

Once created, it may take a few minutes for the assay to complete compiling data. If needed, reload the Pipeline Dashboard to view changes.

Manage Assays via the Wallaroo SDK

List Assays

Assays are listed through the Wallaroo Client list_assays method.

wl.list_assays()

name	active	status	warning_threshold	alert_threshold	pipeline_name
api_assay	True	created	0.0	0.1	housepricepipe

Interactive Baseline Runs

We can do an interactive run of just the baseline part to see how the baseline data will be put into bins. This assay uses quintiles so all 5 bins (not counting the outlier bins) have 20% of the predictions. We can see the bin boundaries along the x-axis.

baseline_run.chart()

baseline mean = 12.940910643273655
baseline median = 12.884286880493164
bin_mode = Quantile
aggregation = Density
metric = PSI
weighted = False

We can also get a dataframe with the bin/edge information.

baseline_run.baseline_bins()

	b_edges	b_edge_names	b_aggregated_values	b_aggregation
0	12.00	left_outlier	0.00	Density
1	12.55	q_20	0.20	Density
2	12.81	q_40	0.20	Density
3	12.98	q_60	0.20	Density
4	13.33	q_80	0.20	Density
5	14.97	q_100	0.20	Density
6	inf	right_outlier	0.00	Density

The previous assay used quintiles so all of the bins had the same percentage/count of samples. To get bins that are divided equally along the range of values we can use BinMode.EQUAL.

equal_bin_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end)
equal_bin_builder.summarizer_builder.add_bin_mode(BinMode.EQUAL)
equal_baseline = equal_bin_builder.build().interactive_baseline_run()
equal_baseline.chart()

baseline mean = 12.940910643273655
baseline median = 12.884286880493164
bin_mode = Equal
aggregation = Density
metric = PSI
weighted = False

We now see very different bin edges and sample percentages per bin.

equal_baseline.baseline_bins()

	b_edges	b_edge_names	b_aggregated_values	b_aggregation
0	12.00	left_outlier	0.00	Density
1	12.60	p_1.26e1	0.24	Density
2	13.19	p_1.32e1	0.49	Density
3	13.78	p_1.38e1	0.22	Density
4	14.38	p_1.44e1	0.04	Density
5	14.97	p_1.50e1	0.01	Density
6	inf	right_outlier	0.00	Density

Interactive Assay Runs

By default the assay builder creates an assay with some good starting parameters. In particular the assay is configured to run a new analysis for every 24 hours starting at the end of the baseline period. Additionally, it sets the number of bins to 5 so creates quintiles, and sets the target iopath to "outputs 0 0" which means we want to monitor the first column of the first output/prediction.

We then run it with interactive_run and convert it to a dataframe for easy analysis with to_dataframe.

Now lets do an interactive run of the first assay as it is configured. Interactive runs don’t save the assay to the database (so they won’t be scheduled in the future) nor do they save the assay results. Instead the results are returned after a short while for further analysis.

assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end)
assay_config = assay_builder.add_run_until(last_day).build()
assay_results = assay_config.interactive_run()

assay_df = assay_results.to_dataframe()
assay_df.loc[:, ~assay_df.columns.isin(['assay_id', 'iopath', 'name', 'warning_threshold'])]

	score	start	min	max	mean	median	std	alert_threshold	status
0	0.00	2023-01-02T00:00:00+00:00	12.05	14.71	12.97	12.90	0.48	0.25	Ok
1	0.09	2023-01-03T00:00:00+00:00	12.04	14.65	12.96	12.93	0.41	0.25	Ok
2	0.04	2023-01-04T00:00:00+00:00	11.87	14.02	12.98	12.95	0.46	0.25	Ok
3	0.06	2023-01-05T00:00:00+00:00	11.92	14.46	12.93	12.87	0.46	0.25	Ok
4	0.02	2023-01-06T00:00:00+00:00	12.02	14.15	12.95	12.90	0.43	0.25	Ok
5	0.03	2023-01-07T00:00:00+00:00	12.18	14.58	12.96	12.93	0.44	0.25	Ok
6	0.02	2023-01-08T00:00:00+00:00	12.01	14.60	12.92	12.90	0.46	0.25	Ok
7	0.04	2023-01-09T00:00:00+00:00	12.01	14.40	13.00	12.97	0.45	0.25	Ok
8	0.06	2023-01-10T00:00:00+00:00	11.99	14.79	12.94	12.91	0.46	0.25	Ok
9	0.02	2023-01-11T00:00:00+00:00	11.90	14.66	12.91	12.88	0.45	0.25	Ok
10	0.02	2023-01-12T00:00:00+00:00	11.96	14.82	12.94	12.90	0.46	0.25	Ok
11	0.03	2023-01-13T00:00:00+00:00	12.07	14.61	12.96	12.93	0.47	0.25	Ok
12	0.15	2023-01-14T00:00:00+00:00	12.00	14.20	13.06	13.03	0.43	0.25	Ok
13	2.92	2023-01-15T00:00:00+00:00	12.74	15.62	14.00	14.01	0.57	0.25	Alert
14	7.89	2023-01-16T00:00:00+00:00	14.64	17.19	15.91	15.87	0.63	0.25	Alert
15	8.87	2023-01-17T00:00:00+00:00	16.60	19.23	17.94	17.94	0.63	0.25	Alert
16	8.87	2023-01-18T00:00:00+00:00	18.67	21.29	20.01	20.04	0.64	0.25	Alert
17	8.87	2023-01-19T00:00:00+00:00	20.72	23.57	22.17	22.18	0.65	0.25	Alert
18	8.87	2023-01-20T00:00:00+00:00	23.04	25.72	24.32	24.33	0.66	0.25	Alert
19	8.87	2023-01-21T00:00:00+00:00	25.06	27.67	26.48	26.49	0.63	0.25	Alert
20	8.87	2023-01-22T00:00:00+00:00	27.21	29.89	28.63	28.58	0.65	0.25	Alert
21	8.87	2023-01-23T00:00:00+00:00	29.36	32.18	30.82	30.80	0.67	0.25	Alert
22	8.87	2023-01-24T00:00:00+00:00	31.56	34.35	32.98	32.98	0.65	0.25	Alert
23	8.87	2023-01-25T00:00:00+00:00	33.68	36.44	35.14	35.14	0.66	0.25	Alert
24	8.87	2023-01-26T00:00:00+00:00	35.93	38.51	37.31	37.33	0.65	0.25	Alert
25	3.69	2023-01-27T00:00:00+00:00	12.06	39.91	29.29	38.65	12.66	0.25	Alert
26	0.05	2023-01-28T00:00:00+00:00	11.87	13.88	12.92	12.90	0.38	0.25	Ok
27	0.10	2023-01-29T00:00:00+00:00	12.02	14.36	12.98	12.96	0.38	0.25	Ok
28	0.11	2023-01-30T00:00:00+00:00	11.99	14.44	12.89	12.88	0.37	0.25	Ok
29	0.01	2023-01-31T00:00:00+00:00	12.00	14.64	12.92	12.89	0.40	0.25	Ok

Basic functionality for creating quick charts is included.

assay_results.chart_scores()

We see that the difference scores are low for a while and then jump up to indicate there is an issue. We can examine that particular window to help us decide if that threshold is set correctly or not.

We can generate a quick chart of the results. This chart shows the 5 quantile bins (quintiles) derived from the baseline data plus one for left outliers and one for right outliers. We also see that the data from the window falls within the baseline quintiles but in a different proportion and is skewing higher. Whether this is an issue or not is specific to your use case.

First lets examine a day that is only slightly different than the baseline. We see that we do see some values that fall outside of the range from the baseline values, the left and right outliers, and that the bin values are different but similar.

assay_results[0].chart()

baseline mean = 12.940910643273655
window mean = 12.969964654406132
baseline median = 12.884286880493164
window median = 12.899214744567873
bin_mode = Quantile
aggregation = Density
metric = PSI
weighted = False
score = 0.0029273068646199748
scores = [0.0, 0.000514261205558409, 0.0002139202456922972, 0.0012617897456473992, 0.0002139202456922972, 0.0007234154220295724, 0.0]
index = None

Other days, however are significantly different.

assay_results[12].chart()

baseline mean = 12.940910643273655
window mean = 13.06380216891949
baseline median = 12.884286880493164
window median = 13.027600288391112
bin_mode = Quantile
aggregation = Density
metric = PSI
weighted = False
score = 0.15060511096978788
scores = [4.6637149189075455e-05, 0.05969428191167242, 0.00806617426854112, 0.008316273402678306, 0.07090885609902021, 0.003572888138686759, 0.0]
index = None

assay_results[13].chart()

baseline mean = 12.940910643273655
window mean = 14.004728427908038
baseline median = 12.884286880493164
window median = 14.009637832641602
bin_mode = Quantile
aggregation = Density
metric = PSI
weighted = False
score = 2.9220486095961196
scores = [0.0, 0.7090936334784107, 0.7130482300184766, 0.33500731896676245, 0.12171058214520876, 0.9038825518183468, 0.1393062931689142]
index = None

If we want to investigate further, we can run interactive assays on each of the inputs to see if any of them show anything abnormal. In this example we’ll provide the feature labels to create more understandable titles.

The current assay expects continuous data. Sometimes categorical data is encoded as 1 or 0 in a feature and sometimes in a limited number of values such as 1, 2, 3. If one value has high a percentage the analysis emits a warning so that we know the scores for that feature may not behave as we expect.

labels = ['bedrooms', 'bathrooms', 'lat', 'long', 'waterfront', 'sqft_living', 'sqft_lot', 'floors', 'view', 'condition', 'grade', 'sqft_above', 'sqft_basement', 'yr_built', 'yr_renovated', 'sqft_living15', 'sqft_lot15']

topic = wl.get_topic_name(pipeline.id())

all_inferences = wl.get_raw_pipeline_inference_logs(topic, baseline_start, last_day, model_name, limit=1_000_000)

assay_builder = wl.build_assay("Input Assay", pipeline, model_name, baseline_start, baseline_end).add_run_until(last_day)
assay_builder.window_builder().add_width(hours=4)
assay_config = assay_builder.build()
assay_results = assay_config.interactive_input_run(all_inferences, labels)
iadf = assay_results.to_dataframe()
display(iadf.loc[:, ~iadf.columns.isin(['assay_id', 'iopath', 'name', 'warning_threshold'])])

column distinct_vals label           largest_pct
     0            17 bedrooms        0.4244 
     1            44 bathrooms       0.2398 
     2          3281 lat             0.0014 
     3           959 long            0.0066 
     4             4 waterfront      0.9156 *** May not be continuous feature
     5          3901 sqft_living     0.0032 
     6          3487 sqft_lot        0.0173 
     7            11 floors          0.4567 
     8            10 view            0.8337 
     9             9 condition       0.5915 
    10            19 grade           0.3943 
    11           745 sqft_above      0.0096 
    12           309 sqft_basement   0.5582 
    13           224 yr_built        0.0239 
    14            77 yr_renovated    0.8889 
    15           649 sqft_living15   0.0093 
    16          3280 sqft_lot15      0.0199

	score	start	min	max	mean	median	std	alert_threshold	status
0	0.19	2023-01-02T00:00:00+00:00	-2.54	1.75	0.21	0.68	0.99	0.25	Ok
1	0.03	2023-01-02T04:00:00+00:00	-1.47	2.82	0.21	-0.40	0.95	0.25	Ok
2	0.09	2023-01-02T08:00:00+00:00	-2.54	3.89	-0.04	-0.40	1.22	0.25	Ok
3	0.05	2023-01-02T12:00:00+00:00	-1.47	2.82	-0.12	-0.40	0.94	0.25	Ok
4	0.08	2023-01-02T16:00:00+00:00	-1.47	1.75	-0.00	-0.40	0.76	0.25	Ok
...	...	...	...	...	...	...	...	...	...
3055	0.08	2023-01-31T04:00:00+00:00	-0.42	4.87	0.25	-0.17	1.13	0.25	Ok
3056	0.58	2023-01-31T08:00:00+00:00	-0.43	2.01	-0.04	-0.21	0.48	0.25	Alert
3057	0.13	2023-01-31T12:00:00+00:00	-0.32	7.75	0.30	-0.20	1.57	0.25	Ok
3058	0.26	2023-01-31T16:00:00+00:00	-0.43	5.88	0.19	-0.18	1.17	0.25	Alert
3059	0.84	2023-01-31T20:00:00+00:00	-0.40	0.52	-0.17	-0.25	0.18	0.25	Alert

3060 rows × 9 columns

We can chart each of the iopaths and do a visual inspection. From the charts we see that if any of the input features had significant differences in the first two days which we can choose to inspect further. Here we choose to show 3 charts just to save space in this notebook.

assay_results.chart_iopaths(labels=labels, selected_labels=['bedrooms', 'lat', 'sqft_living'])

When we are comfortable with what alert threshold should be for our specific purposes we can create and save an assay that will be automatically run on a daily basis.

In this example we’re create an assay that runs everyday against the baseline and has an alert threshold of 0.5.

Once we upload it it will be saved and scheduled for future data as well as run against past data.

alert_threshold = 0.5
import string
import random

prefix= ''.join(random.choice(string.ascii_lowercase) for i in range(4))

assay_name = f"{prefix}example assay"
assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end).add_alert_threshold(alert_threshold)
assay_id = assay_builder.upload()

After a short while, we can get the assay results for further analysis.

When we get the assay results, we see that the assays analysis is similar to the interactive run we started with though the analysis for the third day does not exceed the new alert threshold we set. And since we called upload instead of interactive_run the assay was saved to the system and will continue to run automatically on schedule from now on.

Scheduling Assays

By default assays are scheduled to run every 24 hours starting immediately after the baseline period ends.

However, you can control the start time by setting start and the frequency by setting interval on the window.

So to recap:

The window width is the size of the window. The default is 24 hours.
The interval is how often the analysis is run, how far the window is slid into the future based on the last run. The default is the window width.
The window start is when the analysis should start. The default is the end of the baseline period.

For example to run an analysis every 12 hours on the previous 24 hours of data you’d set the window width to 24 (the default) and the interval to 12.

assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end)
assay_builder = assay_builder.add_run_until(last_day)

assay_builder.window_builder().add_width(hours=24).add_interval(hours=12)

assay_config = assay_builder.build()

assay_results = assay_config.interactive_run()
print(f"Generated {len(assay_results)} analyses")

Generated 59 analyses

assay_results.chart_scores()

To start a weekly analysis of the previous week on a specific day, set the start date (taking care to specify the desired timezone), and the width and interval to 1 week and of course an analysis won’t be generated till a window is complete.

report_start = datetime.datetime.fromisoformat('2022-01-03T00:00:00+00:00')

assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end)
assay_builder = assay_builder.add_run_until(last_day)

assay_builder.window_builder().add_width(weeks=1).add_interval(weeks=1).add_start(report_start)

assay_config = assay_builder.build()

assay_results = assay_config.interactive_run()
print(f"Generated {len(assay_results)} analyses")

Generated 5 analyses

assay_results.chart_scores()

Advanced Configuration

The assay can be configured in a variety of ways to help customize it to your particular needs. Specifically you can:

change the BinMode to evenly spaced, quantile or user provided
change the number of bins to use
provide weights to use when scoring the bins
calculate the score using the sum of differences, maximum difference or population stability index
change the value aggregation for the bins to density, cumulative or edges

Lets take a look at these in turn.

Default configuration

First lets look at the default configuration. This is a lot of information but much of it is useful to know where it is available.

We see that the assay is broken up into 4 sections. A top level meta data section, a section for the baseline specification, a section for the window specification and a section that specifies the summarization configuration.

In the meta section we see the name of the assay, that it runs on the first column of the first output "outputs 0 0" and that there is a default threshold of 0.25.

The summarizer section shows us the defaults of Quantile, Density and PSI on 5 bins.

The baseline section shows us that it is configured as a fixed baseline with the specified start and end date times.

And the window tells us what model in the pipeline we are analyzing and how often.

assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end).add_run_until(last_day)
print(assay_builder.build().to_json())

{
    "name": "onmyexample assay",
    "pipeline_id": 1,
    "pipeline_name": "housepricepipe",
    "active": true,
    "status": "created",
    "iopath": "output dense_2 0",
    "baseline": {
        "Fixed": {
            "pipeline": "housepricepipe",
            "model": "housepricemodel",
            "start_at": "2023-01-01T00:00:00+00:00",
            "end_at": "2023-01-02T00:00:00+00:00"
        }
    },
    "window": {
        "pipeline": "housepricepipe",
        "model": "housepricemodel",
        "width": "24 hours",
        "start": null,
        "interval": null
    },
    "summarizer": {
        "type": "UnivariateContinuous",
        "bin_mode": "Quantile",
        "aggregation": "Density",
        "metric": "PSI",
        "num_bins": 5,
        "bin_weights": null,
        "bin_width": null,
        "provided_edges": null,
        "add_outlier_edges": true
    },
    "warning_threshold": null,
    "alert_threshold": 0.25,
    "run_until": "2023-02-01T00:00:00+00:00",
    "workspace_id": 5
}

Defaults

We can run the assay interactively and review the first analysis. The method compare_basic_stats gives us a dataframe with basic stats for the baseline and window data.

assay_results = assay_builder.build().interactive_run()
ar = assay_results[0]

ar.compare_basic_stats()

	Baseline	Window	diff	pct_diff
count	182.00	181.00	-1.00	-0.55
min	12.00	12.05	0.04	0.36
max	14.97	14.71	-0.26	-1.71
mean	12.94	12.97	0.03	0.22
median	12.88	12.90	0.01	0.12
std	0.45	0.48	0.03	5.68
start	2023-01-01T00:00:00+00:00	2023-01-02T00:00:00+00:00	NaN	NaN
end	2023-01-02T00:00:00+00:00	2023-01-03T00:00:00+00:00	NaN	NaN

The method compare_bins gives us a dataframe with the bin information. Such as the number of bins, the right edges, suggested bin/edge names and the values for each bin in the baseline and the window.

assay_bins = ar.compare_bins()
display(assay_bins.loc[:, assay_bins.columns!='w_aggregation'])

	b_edges	b_edge_names	b_aggregated_values	b_aggregation	w_edges	w_edge_names	w_aggregated_values	diff_in_pcts
0	12.00	left_outlier	0.00	Density	12.00	left_outlier	0.00	0.00
1	12.55	q_20	0.20	Density	12.55	e_1.26e1	0.19	-0.01
2	12.81	q_40	0.20	Density	12.81	e_1.28e1	0.21	0.01
3	12.98	q_60	0.20	Density	12.98	e_1.30e1	0.18	-0.02
4	13.33	q_80	0.20	Density	13.33	e_1.33e1	0.21	0.01
5	14.97	q_100	0.20	Density	14.97	e_1.50e1	0.21	0.01
6	NaN	right_outlier	0.00	Density	NaN	right_outlier	0.00	0.00

We can also plot the chart to visualize the values of the bins.

ar.chart()

baseline mean = 12.940910643273655
window mean = 12.969964654406132
baseline median = 12.884286880493164
window median = 12.899214744567873
bin_mode = Quantile
aggregation = Density
metric = PSI
weighted = False
score = 0.0029273068646199748
scores = [0.0, 0.000514261205558409, 0.0002139202456922972, 0.0012617897456473992, 0.0002139202456922972, 0.0007234154220295724, 0.0]
index = None

Binning Mode

We can change the bin mode algorithm to equal and see that the bins/edges are partitioned at different points and the bins have different values.

prefix= ''.join(random.choice(string.ascii_lowercase) for i in range(4))

assay_name = f"{prefix}example assay"

assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end).add_run_until(last_day)
assay_builder.summarizer_builder.add_bin_mode(BinMode.EQUAL)
assay_results = assay_builder.build().interactive_run()
assay_results_df = assay_results[0].compare_bins()
display(assay_results_df.loc[:, ~assay_results_df.columns.isin(['b_aggregation', 'w_aggregation'])])
assay_results[0].chart()

	b_edges	b_edge_names	b_aggregated_values	w_edges	w_edge_names	w_aggregated_values	diff_in_pcts
0	12.00	left_outlier	0.00	12.00	left_outlier	0.00	0.00
1	12.60	p_1.26e1	0.24	12.60	e_1.26e1	0.24	0.00
2	13.19	p_1.32e1	0.49	13.19	e_1.32e1	0.48	-0.02
3	13.78	p_1.38e1	0.22	13.78	e_1.38e1	0.22	-0.00
4	14.38	p_1.44e1	0.04	14.38	e_1.44e1	0.06	0.02
5	14.97	p_1.50e1	0.01	14.97	e_1.50e1	0.01	0.00
6	NaN	right_outlier	0.00	NaN	right_outlier	0.00	0.00

baseline mean = 12.940910643273655
window mean = 12.969964654406132
baseline median = 12.884286880493164
window median = 12.899214744567873
bin_mode = Equal
aggregation = Density
metric = PSI
weighted = False
score = 0.011074287819376092
scores = [0.0, 7.3591419975306595e-06, 0.000773779195360713, 8.538514991838585e-05, 0.010207597078872246, 1.6725322721660374e-07, 0.0]
index = None

User Provided Bin Edges

The values in this dataset run from ~11.6 to ~15.81. And lets say we had a business reason to use specific bin edges. We can specify them with the BinMode.PROVIDED and specifying a list of floats with the right hand / upper edge of each bin and optionally the lower edge of the smallest bin. If the lowest edge is not specified the threshold for left outliers is taken from the smallest value in the baseline dataset.

edges = [11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end).add_run_until(last_day)
assay_builder.summarizer_builder.add_bin_mode(BinMode.PROVIDED, edges)
assay_results = assay_builder.build().interactive_run()
assay_results_df = assay_results[0].compare_bins()
display(assay_results_df.loc[:, ~assay_results_df.columns.isin(['b_aggregation', 'w_aggregation'])])
assay_results[0].chart()

	b_edges	b_edge_names	b_aggregated_values	w_edges	w_edge_names	w_aggregated_values	diff_in_pcts
0	11.00	left_outlier	0.00	11.00	left_outlier	0.00	0.00
1	12.00	e_1.20e1	0.00	12.00	e_1.20e1	0.00	0.00
2	13.00	e_1.30e1	0.62	13.00	e_1.30e1	0.59	-0.03
3	14.00	e_1.40e1	0.36	14.00	e_1.40e1	0.35	-0.00
4	15.00	e_1.50e1	0.02	15.00	e_1.50e1	0.06	0.03
5	16.00	e_1.60e1	0.00	16.00	e_1.60e1	0.00	0.00
6	NaN	right_outlier	0.00	NaN	right_outlier	0.00	0.00

baseline mean = 12.940910643273655
window mean = 12.969964654406132
baseline median = 12.884286880493164
window median = 12.899214744567873
bin_mode = Provided
aggregation = Density
metric = PSI
weighted = False
score = 0.0321620386600679
scores = [0.0, 0.0, 0.0014576920813015586, 3.549754401142936e-05, 0.030668849034754912, 0.0, 0.0]
index = None

Number of Bins

We could also choose to a different number of bins, lets say 10, which can be evenly spaced or based on the quantiles (deciles).

assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end).add_run_until(last_day)
assay_builder.summarizer_builder.add_bin_mode(BinMode.QUANTILE).add_num_bins(10)
assay_results = assay_builder.build().interactive_run()
assay_results_df = assay_results[1].compare_bins()
display(assay_results_df.loc[:, ~assay_results_df.columns.isin(['b_aggregation', 'w_aggregation'])])
assay_results[1].chart()

	b_edges	b_edge_names	b_aggregated_values	w_edges	w_edge_names	w_aggregated_values	diff_in_pcts
0	12.00	left_outlier	0.00	12.00	left_outlier	0.00	0.00
1	12.41	q_10	0.10	12.41	e_1.24e1	0.09	-0.00
2	12.55	q_20	0.10	12.55	e_1.26e1	0.04	-0.05
3	12.72	q_30	0.10	12.72	e_1.27e1	0.14	0.03
4	12.81	q_40	0.10	12.81	e_1.28e1	0.05	-0.05
5	12.88	q_50	0.10	12.88	e_1.29e1	0.12	0.02
6	12.98	q_60	0.10	12.98	e_1.30e1	0.09	-0.01
7	13.15	q_70	0.10	13.15	e_1.32e1	0.18	0.08
8	13.33	q_80	0.10	13.33	e_1.33e1	0.14	0.03
9	13.47	q_90	0.10	13.47	e_1.35e1	0.07	-0.03
10	14.97	q_100	0.10	14.97	e_1.50e1	0.08	-0.02
11	NaN	right_outlier	0.00	NaN	right_outlier	0.00	0.00

baseline mean = 12.940910643273655
window mean = 12.956829186961135
baseline median = 12.884286880493164
window median = 12.929338455200195
bin_mode = Quantile
aggregation = Density
metric = PSI
weighted = False
score = 0.16591076620684958
scores = [0.0, 0.0002571306027792045, 0.044058279699182114, 0.009441459631493015, 0.03381618572319047, 0.0027335446937028877, 0.0011792419836838435, 0.051023062424253904, 0.009441459631493015, 0.008662563542113508, 0.0052978382749576496, 0.0]
index = None

Bin Weights

Now lets say we only care about differences at the higher end of the range. We can use weights to specify that difference in the lower bins should not be counted in the score.

If we stick with 10 bins we can provide 10 a vector of 12 weights. One weight each for the original bins plus one at the front for the left outlier bin and one at the end for the right outlier bin.

Note we still show the values for the bins but the scores for the lower 5 and left outlier are 0 and only the right half is counted and reflected in the score.

weights = [0] * 6
weights.extend([1] * 6)
print("Using weights: ", weights)
assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end).add_run_until(last_day)
assay_builder.summarizer_builder.add_bin_mode(BinMode.QUANTILE).add_num_bins(10).add_bin_weights(weights)
assay_results = assay_builder.build().interactive_run()
assay_results_df = assay_results[1].compare_bins()
display(assay_results_df.loc[:, ~assay_results_df.columns.isin(['b_aggregation', 'w_aggregation'])])
assay_results[1].chart()

Using weights:  [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

	b_edges	b_edge_names	b_aggregated_values	w_edges	w_edge_names	w_aggregated_values	diff_in_pcts
0	12.00	left_outlier	0.00	12.00	left_outlier	0.00	0.00
1	12.41	q_10	0.10	12.41	e_1.24e1	0.09	-0.00
2	12.55	q_20	0.10	12.55	e_1.26e1	0.04	-0.05
3	12.72	q_30	0.10	12.72	e_1.27e1	0.14	0.03
4	12.81	q_40	0.10	12.81	e_1.28e1	0.05	-0.05
5	12.88	q_50	0.10	12.88	e_1.29e1	0.12	0.02
6	12.98	q_60	0.10	12.98	e_1.30e1	0.09	-0.01
7	13.15	q_70	0.10	13.15	e_1.32e1	0.18	0.08
8	13.33	q_80	0.10	13.33	e_1.33e1	0.14	0.03
9	13.47	q_90	0.10	13.47	e_1.35e1	0.07	-0.03
10	14.97	q_100	0.10	14.97	e_1.50e1	0.08	-0.02
11	NaN	right_outlier	0.00	NaN	right_outlier	0.00	0.00

baseline mean = 12.940910643273655
window mean = 12.956829186961135
baseline median = 12.884286880493164
window median = 12.929338455200195
bin_mode = Quantile
aggregation = Density
metric = PSI
weighted = True
score = 0.012600694309416988
scores = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.00019654033061397393, 0.00850384373737565, 0.0015735766052488358, 0.0014437605903522511, 0.000882973045826275, 0.0]
index = None

Metrics

The score is a distance or dis-similarity measure. The larger it is the less similar the two distributions are. We currently support
summing the differences of each individual bin, taking the maximum difference and a modified Population Stability Index (PSI).

The following three charts use each of the metrics. Note how the scores change. The best one will depend on your particular use case.

assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end).add_run_until(last_day)
assay_results = assay_builder.build().interactive_run()
assay_results[0].chart()

baseline mean = 12.940910643273655
window mean = 12.969964654406132
baseline median = 12.884286880493164
window median = 12.899214744567873
bin_mode = Quantile
aggregation = Density
metric = PSI
weighted = False
score = 0.0029273068646199748
scores = [0.0, 0.000514261205558409, 0.0002139202456922972, 0.0012617897456473992, 0.0002139202456922972, 0.0007234154220295724, 0.0]
index = None

assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end).add_run_until(last_day)
assay_builder.summarizer_builder.add_metric(Metric.SUMDIFF)
assay_results = assay_builder.build().interactive_run()
assay_results[0].chart()

baseline mean = 12.940910643273655
window mean = 12.969964654406132
baseline median = 12.884286880493164
window median = 12.899214744567873
bin_mode = Quantile
aggregation = Density
metric = SumDiff
weighted = False
score = 0.025438649748041997
scores = [0.0, 0.009956893934794486, 0.006648048084512165, 0.01548175581324751, 0.006648048084512165, 0.012142553579017668, 0.0]
index = None

assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end).add_run_until(last_day)
assay_builder.summarizer_builder.add_metric(Metric.MAXDIFF)
assay_results = assay_builder.build().interactive_run()
assay_results[0].chart()

baseline mean = 12.940910643273655
window mean = 12.969964654406132
baseline median = 12.884286880493164
window median = 12.899214744567873
bin_mode = Quantile
aggregation = Density
metric = MaxDiff
weighted = False
score = 0.01548175581324751
scores = [0.0, 0.009956893934794486, 0.006648048084512165, 0.01548175581324751, 0.006648048084512165, 0.012142553579017668, 0.0]
index = 3

Aggregation Options

Also, bin aggregation can be done in histogram Aggregation.DENSITY style (the default) where we count the number/percentage of values that fall in each bin or Empirical Cumulative Density Function style Aggregation.CUMULATIVE where we keep a cumulative count of the values/percentages that fall in each bin.

assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end).add_run_until(last_day)
assay_builder.summarizer_builder.add_aggregation(Aggregation.DENSITY)
assay_results = assay_builder.build().interactive_run()
assay_results[0].chart()

baseline mean = 12.940910643273655
window mean = 12.969964654406132
baseline median = 12.884286880493164
window median = 12.899214744567873
bin_mode = Quantile
aggregation = Density
metric = PSI
weighted = False
score = 0.0029273068646199748
scores = [0.0, 0.000514261205558409, 0.0002139202456922972, 0.0012617897456473992, 0.0002139202456922972, 0.0007234154220295724, 0.0]
index = None

assay_builder = wl.build_assay(assay_name, pipeline, model_name, baseline_start, baseline_end).add_run_until(last_day)
assay_builder.summarizer_builder.add_aggregation(Aggregation.CUMULATIVE)
assay_results = assay_builder.build().interactive_run()
assay_results[0].chart()

baseline mean = 12.940910643273655
window mean = 12.969964654406132
baseline median = 12.884286880493164
window median = 12.899214744567873
bin_mode = Quantile
aggregation = Cumulative
metric = PSI
weighted = False
score = 0.04419889502762442
scores = [0.0, 0.009956893934794486, 0.0033088458502823492, 0.01879060166352986, 0.012142553579017725, 0.0, 0.0]
index = None

7 - Wallaroo Monitoring Management

How to manage your Wallaroo performance.

The following guides instruct users on how to monitor Wallaroo’s performance, retrieve logs, and other monitoring tasks.

7.1 - Integrate Azure Kubernetes Wallaroo Cluster with Azure Managed Grafana

How to integrate Azure Grafana to an Azure Kubernetes based installation of Wallaroo

Organizations that have installed Wallaroo using Microsoft Azure can integrate Azure Managed Grafana. This allows reports to be created tracking the performance of Wallaroo pipelines, overall cluster health, and other vital performance data benchmarks.

Create Azure Managed Grafana Workspace

To create a new Azure Managed Grafana Workspace:

Log into Microsoft Azure. From the Azure Services list, either select Azure Managed Grafana or search for Azure Managed Grafana in the search bar.
From the Azure Managed Grafana dashboard, select +Create.
Set the following minimum settings. Any other settings are up to the organization’s requirements.
1. Subscription: The subscription used for billing the Grafana workspace.
2. Resource Group Name: Select from an existing or use Create new to create a new Azure Resource Group for managing permissions to the Grafana workspace.
3. Instance Details
  1. Location: Where the Grafana workspace is hosted. It is recommended it be in the same location as the Kubernetes cluster hosting the Wallaroo instance.
  2. Name: The name of the Grafana workspace.
Select Review + create when finished. Review the settings, then select Create to complete the process.

Add Azure Managed Grafana Workspace to Microsoft Azure Kubernetes Cluster

To integrate an Azure Managed Grafana Workspace to a Microsoft Azure Kubernetes cluster for monitoring:

Log into Microsoft Azure. From the Azure Services list, either select Kubernetes Services or search for Kubernetes Services in the search bar.
From the Kubernetes services dashboard, select the cluster to monitor.
From the cluster dashboard, from the left navigation panel select Monitoring->Insights.
If Insights have not been configured before, select Configure.
Set the following:
1. Enable Prometheus metrics: Enable.
2. Azure Monitor workspace: Either select an existing Azure Monitor workspace, or create a new one.
3. Azure Managed Grafana: Select the Grafana workspace to use with this cluster.
When complete, select Configure.

The onboarding process will take approximately 10-15 minutes.

Run Wallaroo Performance Results in Grafana

The following are two methods for accessing an Azure Kubernetes Cluster insights with Grafana.

Access Via the Azure Kubernetes Cluster

To access the Azure managed Grafana insights from a Kubernetes cluster:

Log into Microsoft Azure. From the Azure Services list, either select Kubernetes Services or search for Kubernetes Services in the search bar.
Select the cluster.
From the left navigation panel, select Insights.
Select View Grafana.
Select the Grafana instance.
From the Grafana instance, select Overview->Endpoint.

Access Via the Azure Managed Grafana Dashboard

To access the Azure managed Grafana insights for a cluster from the Azure Managed Grafana Dashboard:

Log into Microsoft Azure. From the Azure Services list, either select Azure Managed Grafana or search for Azure Managed Grafana in the search bar.
From the Azure Managed Grafana dashboard, select the Grafana instance.
From the Grafana instance, select Overview->Endpoint.

Load Dashboards

Azure managed Grafana comes pre-packaged with several Dashboards. To view the available Dashboards, from the left navigation panel select Dashboards->Browser.

Recommended Dashboards

The following dashboards are recommended for checking on the performance of the overall Kubernetes cluster hosting the Wallaroo instance, and the performance of deployed Wallaroo pipelines. Each of the following are available in the Managed Prometheus folder.

Kubernetes Compute Resources Cluster

Displays the total load of the Kubernetes cluster. Select the Data Source, then the Cluster to monitor. From here, the CPU Usage, Memory Usage, Bandwidth, and other metrics can be viewed.

Kubernetes Compute Resources Namespace (Pods)

This dashboard breaks down the compute resources by Namespace. Deployed Wallaroo pipelines are associated with the Kubernetes namespace matching the format {WallarooPipelineName-WallarooPipelineID} the Wallaroo pipeline name. For example, the pipeline demandcurvepipeline with the the id 3 is associated with the namespace demandcurvepipeline-3.

Select the Data Source, Cluster, then the namespace to monitor. This dashboard can be useful to check if a pipeline requires more resources, or can be configured to use more or fewer resources to allocate more resources to other pipelines.

To drill down even further, select a pod. engine-lb pods are LoadBalancer pods, while engine pods represent the deployed model.

Manage Grafana Permissions

To allow other Azure users or groups access to the managed Grafana instance:

Log into Microsoft Azure. From the Azure Services list, either select Azure Managed Grafana or search for Azure Managed Grafana in the search bar.
From the Azure Managed Grafana dashboard, select the Grafana instance.
From the Grafana instance, select Overview->Access control (IAM).
To add a new user or group access, select + Add->Add role assignment.
Select Job function roles, then select Next.
Select the role, then select Next.
Under Members, select +Select members and select from the user or group to assign to the Grafana role. Select Review + assign. Review the settings, then select Review + assign to save the settings.

8 - Wallaroo Configuration Guide

How to configure Wallaroo

Wallaroo comes with a plethora of options to enable different services, set performance options, and everything you need to run Wallaroo in the most efficient way.

The following guides are made to help organizations configure Wallaroo and provides integrate it into other services.

8.1 - DNS Integration Guide

Integrate Wallaroo Enterprise Into an Organization’s DNS.

The following guide demonstrates how to integrate a Wallaroo Enterprise instance with an organization’s DNS. DNS services integration is required for Wallaroo Enterprise edition. It is not required for Wallaroo Community. This guide is indented to assist organizations complete their Wallaroo Enterprise installation, and can be used as a reference if changes to the DNS services are modified and updates to the Wallaroo Enterprise instance are required.

IMPORTANT NOTE

Changing either the DNS name or the security certificates will require the Wallaroo instance be reinstalled.

DNS Services Integration Introduction

DNS services integration is required for Wallaroo Enterprise to provide access to the various supporting services that are part of the Wallaroo instance. These include:

Simplified user authentication and management.
Centralized services for accessing the Wallaroo Dashboard, Wallaroo SDK and Authentication.
Collaboration features allowing teams to work together.
Managed security, auditing and traceability.

This guide is not intended for Wallaroo Community, as those DNS entries are managed by Wallaroo during the installation. For more information on installing Wallaroo Community, see the Wallaroo Community Install Guides.

Once integrated, users can access the following services directly from a URL starting with the suffix domain - this is the domain name where other DNS entries are appended to. For example, if the suffix domain is sales.example.com, then the other services would be referenced by https://api.sales.sample.com, etc.

Note that even when accessing specific Wallaroo services directly that the user must still be authenticated through Wallaroo.

Service	DNS Entry	Description
Wallaroo Dashboard	`suffix domain`	Provides access to a user interface for updating workspaces, pipelines, and models. Also provides access to the integrated JupyterHub service.
JupyterHub	jupyterhub	Allows the use of Jupyter Notebooks and access to the Wallaroo SDK.
API	api	Provides access to the Wallaroo API.
Keycloak	keycloak	Keycloak provides user management to the Wallaroo instance.

Connections to Wallaroo services are provided as https://service.{suffix domain}. For example, if the domain suffix is wallaroo.example.com then the URLs to access the various Wallaroo services would be:

https://wallaroo.example.com
https://jupyter.wallaroo.example.com
https://api.wallaroo.example.com
https://keycloak.wallaroo.example.com

Prerequisites

Install Wallaroo Enterprise into a qualified environment. For more details, see the Wallaroo Install Guides and the Wallaroo Enterprise Install Guides.
Determine whether your organization will use a prefix or not as detailed above.
Have access to the Wallaroo Administrative Dashboard - this requires access to the Kubernetes environment that the Wallaroo instance is installed into.
Have access to internal corporate DNS configurations that can be updated. A subdomain for the Wallaroo instance will be created through this process.
Have the IP address for the Wallaroo instance.
Install kubectl into the Kubernetes cluster administrative node.

Wallaroo IP Address Retrieval Methods

Retrieve LoadBalancer IP with kubectl

For most organizations that install Wallaroo into a cloud based Kubernetes cluster such as Micosoft Azure, AWS, etc the external IP address is tied to Wallaroo Loadbalancer service. This can be retrieved with the kubectl command as follows:

Retrieve the external IP address for your Wallaroo instance LoadBalancer. For example, this can be performed through the following kubectl command:

kubectl get svc  -A

Example Result:

NAMESPACE     NAME                           TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)                                     AGE
default       kubernetes                     ClusterIP      10.64.16.1     <none>         443/TCP                                     3d19h
wallaroo      alertmanager                   ClusterIP      10.64.16.48    <none>         9093/TCP                                    2d22h
wallaroo      api-lb                         LoadBalancer   10.64.30.169   34.173.211.9   80:32176/TCP,443:32332/TCP,8080:30971/TCP   2d22h

In this example, the External-IP of the wallaroo LoadBalancer is 34.173.211.9. A more specific command to retrieve just the LoadBalancer address would be:

kubectl get svc api-lb -n wallaroo -o jsonpath='{.status.loadBalancer.ingress[0].ip}'

34.173.211.9

This procedure is appropriate for both clusters that are in external or internal mode.

Other Methods

For organizations install Wallaroo other methods, such as Air Gap or Single Node Linux may find the kubectl get svc api-lb command only returns the internal IP address.

Depending on the instance, there are different methods of acquiring that IP address. The links below reference difference sources.

Refer to your Wallaroo support representative if further assistance is needed.

DNS Integration Steps

To integrate the Wallaroo instance IP address with a DNS service:

Create a CA-signed TLS certificate for your Wallaroo domain with the following settings:
1. Certificate Authority Options:
  1. Use a public Certificate Authority such as Let’s Encrypt or Verisign. In general, you would send a Certificate Signing Request to your CA and they would respond with your certificates.
  2. Use a private Certificate Authority (CA) to provide the certificates. Your organization will have procedures for clients to verify the certificates from the private CA.
  3. Use a Wallaroo certificate and public name server. Contact our CSS team for details.
2. Subject Domain:
  1. Set the certificate’s Subject CN to your Wallaroo domain.
    1. With Wildcards: To use wildcards, use the wildcard *.{suffix domain}. For example, if the domain suffix is wallaroo.example.com, then the Subject CNs would be:
      1. wallaroo.example.com
      2. *.wallaroo.example.com
    2. If wildcard domains are not desired, use a combination of Subject and Subject Alternative Names to set names as follows:
      1. wallaroo.example.com
      2. api.wallaroo.example.com
      3. jupyter.wallaroo.example.com
      4. keycloak.wallaroo.example.com
3. Save your certificates.
  1. You should have two files: the TLS Certificate (.crt) and TLS private key (.key). Store these in a secure location - these will be installed into Wallaroo at a later step.
Create DNS the following entries based on the list above for the Wallaroo instance’s IP address, updating the domain name depending on whether there is a prefix or not:
1. api: A (address) record
2. jupyter: A (address) record
3. keycloak: A (address) record
4. Suffix domain: A record, NS (Name Server) record, SOA (Start Of Authority) record.
For example:
Access the Wallaroo Administrative Dashboard in your browser. This can be done either after installation, or through the following command (assuming your Wallaroo instance was installed into the namespace wallaroo). By default this provides the Wallaroo Administrative Dashboard through the URL https://localhost:8080.
```
kubectl-kots admin-console --namespace wallaroo
```
From the Wallaroo Dashboard, select Config and set the following:
1. Networking Configuration
  1. Ingress Mode for Wallaroo Endpoints:
    1. None: Port forwarding or other methods are used for access.
    2. Internal: For environments where only nodes within the same Kubernetes environment and no external connections are required.
    3. External: Connections from outside the Kubernetes environment is allowed.
      1. Enable external URL inference endpoints: Creates pipeline inference endpoints. For more information, see Model Endpoints Guide.
2. DNS
  1. DNS Suffix (Mandatory): The domain name for your Wallaroo instance.
3. TLS Certificates
  1. Use custom TLS Certs: Checked
  2. TLS Certificate: Enter your TLS Certificate (.crt file).
  3. TLS Private Key: Enter your TLS private key (.key file).
4. Other settings as desired.
Once complete, scroll to the bottom of the Config page and select Save config.
A pop-up window will display The config for Wallaroo Enterprise has been updated.. Select Go to updated version to continue.
From the Version History page, select Deploy. Once the new deployment is finished, you will be able to access your Wallaroo services via their DNS addresses.

To verify the configuration is complete, access the Wallaroo Dashboard through the suffix domain. For example if the suffix domain is wallaroo.example.com then access https://wallaroo.example.com in a browser and verify the connection and certificates.

8.2 - Model Endpoints Guide

Enable external deployment URLs to perform inferences through API calls.

Wallaroo provides the ability to perform inferences through deployed pipelines via both internal and external inferences URLs. These URLs allow inferences to be performed by submitting data to the internal or external inferences URL, with the inference results returned in the same format as the InferenceResult Object.

Internal URLs are available only through the internal Kubernetes environment hosting the Wallaroo instance.
External URLs are available outside of the Kubernetes environment, such as the public internet. Authentication will be required to connect to these external deployment URLs.

The following process will enable external inference URLs

Enable external URl inference endpoints through the Wallaroo Administrative Dashboard or through helm setup. This can be accessed through the kots or helm as detailed in the Wallaroo Install Guides and the How to Install Wallaroo Enterprise via Helm guides.
helm users can update the configuration and enable endpoints by setting the apilb\external_inference_endpoints_enabled to true as follows:
```
apilb:
    # Required to perform remote inferences either through the SDK or the API
    external_inference_endpoints_enabled: true
```
For kots users: To access the Wallaroo Administrative Dashboard:
1. From a terminal shell connected to the Kubernetes environment hosting the Wallaroo instance, run the following kots command:
```
kubectl kots admin-console --namespace wallaroo
```
This provides the following standard output:
```
  • Press Ctrl+C to exit
  • Go to http://localhost:8800 to access the Admin Console
```
This will host a http connection to the Wallaroo Administrative Dashboard, by default at http://localhost:8800.
1. Open a browser at the URL detailed in the step above and authenticate using the console password set as described in the as detailed in the Wallaroo Install Guides.
2. From the top menu, select Config then verify that Networking Configuration -> Ingress Mode for Wallaroo interactive services -> Enable external URL inference endpoints is enabled.
3. Save the updated configuration, then deploy it. Once complete, the external URL inference endpoints will be enabled.

8.3 - Manage Minio Storage for Models Storage

How to manage model storage in Wallaroo

Targeted Role
Dev Ops

Organizations can manage their ML Model storage in their Wallaroo instances through the MinIO interface included in the standard Wallaroo installation.

The following details how to access and the Wallaroo MinIO service. For full details on using the MinIO service, see the MinIO Documentation site.

All of the steps below require administrative access to the Kubernetes service hosting the Wallaroo instance.

IMPORTANT NOTE

Deleting a model from MinIO storage frees up storage by deleting the model artifacts, but does not remove the model objects from Wallaroo workspaces. Please ensure that pipeline deployments are not dependent on the model artifacts being deleted.

Wallaroo MinIO Model Storage

Wallaroo ML Models are stored in the MinIO bucket model-bucket.

Retrieving the Wallaroo MinIO Password

Access to the Wallaroo MinIO service is password protected. DevOps with administrative access to the Kubernetes cluster hosting the Wallaroo instance can retrieve this password with the following:

The kubectl command.
The namespace the Wallaroo instance is installed to.

This command takes the following format:

kubectl -n {Wallaroo Namespace} get secret minio -o 'jsonpath={.data.rootPassword}' | base64 -d

For example, if the Wallaroo instance is installed into the namespace wallaroo this command would be:

kubectl -n wallaroo get secret minio -o 'jsonpath={.data.rootPassword}' | base64 -d

Accessing the MinIO Service

Access to the MinIO service in a Wallaroo instance is performed either with the Command Line Interface (CLI), or through a browser based User Interface (UI).

Accessing the MinIO Service Through CLI

Access to the MinIO service included with the Wallaroo instance can be performed with the command line tool mc. For more details, see the MinIO Client documentation.

Installing Minio CLI

The following demonstrates installing the mc command for MacOS and Linux.

Installing for MacOS with Brew

MacOS users who have installed Homebrew can install mc with the following:

brew install minio/stable/mc

Installing for Linux

Linux users can install the MinIO CLI tool mc with the following:

wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
sudo mv mc /usr/local/bin/mc

Port Forward for MinIO CLI Access

To access the Wallaroo MinIO service, use Kubernetes port-forward to connect. By default this is on port 9000. This command requires the following:

The kubectl command.
The namespace the Wallaroo instance is installed to.

This command is in the following format:

kubectl port-forward services/minio 9000:9000 -n {Wallaroo Namespace}

For example, if the Wallaroo instance is installed to the default namespace wallaroo, this command is:

kubectl port-forward services/minio 9000:9000 -n wallaroo

Show MinIO Disk Usage through CLI

To view the Wallaroo MinIO service through the CLI, the following is required:

The kubectl command.
The namespace the Wallaroo instance is installed to.
The MinIO CLI tool mc.
The MinIO password.

Here’s an example showing MinIO disk usage via the CLI for the namespace wallaroo. When prompted for the Secret Key, press Enter for none.

The following script displays the space used for a default installation of Wallaroo in the namespace wallaroo.

#!/bin/bash
if kubectl -n wallaroo get secret minio >& /dev/null; then
    u="$(kubectl -n wallaroo get secret minio -o jsonpath='{ .data.rootUser }' | base64 -d)"
    p="$(kubectl -n wallaroo get secret minio -o jsonpath='{ .data.rootPassword }' | base64 -d)"
    creds="$u $p"
fi

mc alias set --insecure wallaroo http://localhost:9000 $creds; mc du --recursive wallaroo

The output:

Added `wallaroo` successfully.
2.2GiB	22 objects	model-bucket
2.2GiB	22 objects

Accessing the MinIO Service Through the UI

The MinIO service included with the Wallaroo instance can be access through the MinIO user interface. By default this is port 9001.

Port Forward for MinIO UI Access

The MinIO UI port 9001 can be access through the kubectl port-forward command.

This command requires the following:

The kubectl command.
The namespace the Wallaroo instance is installed to.

This command is in the following format:

kubectl port-forward services/minio-console 9001:9001 -n {Wallaroo Namespace}

For example, to port forward through the default installation namespace wallaroo:

kubectl port-forward services/minio-console 9001:9001 -n wallaroo

Accessing the Wallaroo MinIO UI

Once the port forward command is running, the MinIO UI is access through a browser on port 9001 with the user minio and the MinIO administrative password retrieved through the step Retrieving the Minio Administrative Password.

Viewing General Storage

General disk usage is displayed through Monitoring->Metrics.

Viewing ML Model Storage

ML Models stored for Wallaroo are accessed through the bucket model-bucket.

Select Browse to view the contents of the model-bucket. To determine the specific file name, access the Name of the object and view the Tags. The file name is access via the Tag file-name.

Objects can be deleted from this bucket with the Delete option.

8.4 - Private Containerized Model Container Registry Guide

How to enable private Containerized Model Container Registry with Wallaroo.

Configure Wallaroo with a Private Containerized Model Container Registry

Organizations can configure Wallaroo to use private Containerized Model Container Registry. This allows for the use of containerized models such as MLFlow.

The following steps will provide sample instructions on setting up different private registry services from different providers. In each case, refer to the official documentation for the providers for any updates or more complex use cases.

The following process is used with a GitHub Container Registry to create the authentication tokens for use with a Wallaroo instance’s Private Model Registry configuration.

See the GitHub Working with the Container registry for full details.

The following process is used register a GitHub Container Registry with Wallaroo.

Create a new token as per the instructions from the Creating a personal access token (classic) guide. Note that a classic token is recommended for this process. Store this token in a secure location as it will not be able to be retrieved later from GitHub. Verify the following permissions are set:
1. Select the write:packages scope to download and upload container images and read and write their metadata.
2. Select the read:packages scope to download container images and read their metadata (selected when write:packages is selected by default).
3. Select the delete:packages scope to delete container images.

Configure Wallaroo Via Kots

If Wallaroo was installed via kots, use the following procedure to add the private model registry information.

Launch the Wallaroo Administrative Dashboard through a terminal linked to the Kubernetes cluster. Replace the namespace with the one used in your installation.
```
kubectl kots admin-console --namespace wallaroo
```
Launch the dashboard, by default at http://localhost:8800.
From the admin dashboard, select Config -> Private Model Container Registry.
Enable Provide private container registry credentials for model images.
Provide the following:
1. Registry URL: The URL of the Containerized Model Container Registry. Typically in the format host:port. In this example, the registry for GitHub is used.
2. email: The email address of the Github user generating the token.
3. username: The username of the Github user authentication to the registry service.
4. password: The GitHub token generated in the previous steps.
Scroll down and select Save config.
Deploy the new version.

Once complete, the Wallaroo instance will be able to authenticate to the Containerized Model Container Registry and retrieve the images.

Configure Wallaroo Via Helm

During either the installation process or updates, set the following in the local-values.yaml file:
1. privateModelRegistry:
  1. enabled: true
  2. secretName: model-registry-secret
  3. registry: The URL of the Containerized Model Container Registry. Typically in the format host:port.
  4. email: The email address of the Github user generating the token.
  5. username: The username of the Github user authentication to the registry service.
  6. password: The GitHub token generated in the previous steps.
    For example:
```
# Other settings - DNS entries, etc.

# The private registry settings
privateModelRegistry:
   enabled: true
   secretName: model-registry-secret
   registry: "ghcr.io/johnhansarickwallaroo"
   email: "sample.user@wallaroo.ai"
   username: "johnhansarickwallaroo"
   password: "abcdefg"
```
Install or update the Wallaroo instance via Helm as per the Wallaroo Helm Install instructions.

Once complete, the Wallaroo instance will be able to authenticate to the registry service and retrieve the images.

The following process is an example of setting up an Artifact Registry Service with Google Cloud Platform (GCP) that is used to store containerized model images and retrieve them for use with Wallaroo.

Uploading and downloading containerized models to a Google Cloud Platform Registry follows these general steps.

Create the GCP registry.
Create a Service Account that will manage the registry service requests.
Assign appropriate Artifact Registry role to the Service Account
Retrieve the Service Account credentials.
Using either a specific user, or the Service Account credentials, upload the containerized model to the registry service.
Add the service account credentials to the Wallaroo instance’s containerized model private registry configuration.
Prerequisites

The commands below use the Google gcloud command line tool, and expect that a Google Cloud Platform account is created and the gcloud application is associated with the GCP Project for the organization.

For full details on the process and other methods, see the Google GCP documentation.

Create the Registry

The following is based on the Create a repository using the Google Cloud CLI.

The following information is needed up front:

$REPOSITORY_NAME: What to call the registry.
$LOCATION: Where the repository will be located. GCP locations are derived through the gcloud artifacts locations list command.
$DESCRIPTION: Any details to be displayed. Sensitive data should not be included.

The follow example script will create a GCP registry with the minimum requirements.

REPOSITORY_NAME="YOUR NAME"
LOCATION="us-west1"
DESCRIPTION="My amazing registry."

gcloud artifacts repositories create REPOSITORY \
    --repository-format=docker \
    --location=LOCATION \
    --description="$DESCRIPTION" \
    --async

Create a GCP Registry Service Account

The GCP Registry Service Account is used to manage the GCP registry service. The steps are details from the Google Create a service account guide.

The gcloud process for these steps are:

Connect the gcloud application to the organization’s project.

$PROJECT_ID="YOUR PROJECT ID"
gcloud config set project $PROJECT_ID

Create the service account with the following:

The name of the service account.
A description of its purpose.
The name to show when displayed.

SA_NAME="YOUR SERVICE ACCOUNT NAME"
DESCRIPTION="Wallaroo container registry SA"
DISPLAY_NAME="Wallaroo the Roo"

gcloud iam service-accounts create $SA_NAME \
--description=$DESCRIPTION \
--display-name=$DISPLAY_NAME

Assign Artifact Registry Role

Assign one or more of the following accounts to the new registry role based on the following criteria, as detailed in the Google GCP Repository Roles and Permissions Guide.

For pkg.dev domains.

Role	Description
Artifact Registry Reader (roles/artifactregistry.reader)	View and get artifacts, view repository metadata.
Artifact Registry Writer (roles/artifactregistry.writer)	Read and write artifacts.
Artifact Registry Repository Administrator (roles/artifactregistry.repoAdmin)	Read, write, and delete artifacts.
Artifact Registry Administrator (roles/artifactregistry.admin)	Create and manage repositories and artifacts.

For gcr.io repositories.

Role	Description
Artifact Registry Create-on-push Writer (roles/artifactregistry.createOnPushWriter)	Read and write artifacts. Create gcr.io repositories.
Artifact Registry Create-on-push Repository Administrator (roles/artifactregistry.createOnPushRepoAdmin)	Read, write, and delete artifacts. Create gcr.io repositories.

For this example, we will add the Artifact Registry Create-on-push Writer to the created Service Account from the previous step.

Add the role to the service account, specifying the member as the new service account, and the role as the selected role. For this example, a pkg.dev is assumed for the Artifact Registry type.


# for pkg.dev
ROLE="roles/artifactregistry.writer"

# for gcr.io 
#ROLE="roles/artifactregistry.createOnPushWriter

gcloud projects add-iam-policy-binding \
    $PROJECT_ID \
    --member="serviceAccount:$SA_NAME@$PROJECT_ID.iam.gserviceaccount.com" \
    --role=$ROLE

Authenticate to Repository

To push and pull image from the new registry, we’ll use our new service account and authenticate through the local Docker application. See the GCP Push and pull images for details on using Docker and other methods to add artifacts to the GCP artifact registry.

Set up Service Account Key

To set up the Service Account key, we’ll use the Google Console IAM & ADMIN dashboard based on the Set up authentication for Docker, using the JSON key approach.

From GCP console, search for IAM & Admin.
Select Service Accounts.
Select the service account to generate keys for.
Select the Email address listed and store this for later steps with the key generated through this process.
Select Keys, then Add Key, then Create new key.
Select JSON, then Create.
Store the key in a safe location.

Convert SA Key to Base64

The key file downloaded in Set up Service Account Key needs to be converted to base64 with the following command, replacing the locations of KEY_FILE and KEYFILEBASE64:

KEY_FILE = ~/.gcp-sa-registry-keyfile.json
KEYFILEBASE64 = ~/.gcp-sa-registry-keyfile-b64.json
base64 -i $KEY_FILE -o $KEYFILEBASE64

Authenticate with Docker

Launch Docker.
Run the following command using the base64 version of the key file.
```
cat $KEYFILEBASE64 | docker login -u _json_key_base64 --password-stdin https://$LOCATION-docker.pkg.dev
```
If successful, the following will be returned.
Login Succeeded.

Tag the statsmodel and postprocess containers based on the repository used for the container registry. The GCP Registry format is $LOCATION-docker.pkg.dev/$PROJECT_ID/$REPOSITORY/$IMAGE. In this example, it is us-west1-docker.pkg.dev/wallaroo-dev-253816/doc-test-registry/, with the images mlflow-postprocess-example and mlflow-statsmodels-example.

Get the internal name through docker images.

docker image list

REPOSITORY                                                           TAG       IMAGE ID       CREATED        SIZE
mlflow-postprocess-example                                           2023.1    ff121d335e24   5 months ago   3.28GB
mlflow-statsmodels-example                                           2023.1    4c23cac0a7b1   5 months ago   3.34GB

Tag the images with the repository address. Note the repository must be in lowercase.

docker tag mlflow-postprocess-example:2023.1 us-west1-docker.pkg.dev/wallaroo-dev-253816/doc-test-registry/mlflow-postprocess-example:2023.1

docker tag mlflow-statsmodels-example:2023.1 us-west1-docker.pkg.dev/wallaroo-dev-253816/doc-test-registry/mlflow-statsmodels-example:2023.1

Verify with docker images. Note that the new tags match the same Image ID as the original tags.

docker images

REPOSITORY                                                                                 TAG       IMAGE ID       CREATED        SIZE
mlflow-postprocess-example                                                                 2023.1    ff121d335e24   5 months ago   3.28GB
us-west1-docker.pkg.dev/wallaroo-dev-253816/doc-test-registry/mlflow-postprocess-example   2023.1    ff121d335e24   5 months ago   3.28GB
mlflow-statsmodels-example                                                                 2023.1    4c23cac0a7b1   5 months ago   3.34GB
us-west1-docker.pkg.dev/wallaroo-dev-253816/doc-test-registry/mlflow-statsmodels-example   2023.1    4c23cac0a7b1   5 months ago   3.34GB

Push the containers to the registry. This may take some time depending on the speed of your connection. Wait until both are complete.

docker push us-west1-docker.pkg.dev/wallaroo-dev-253816/doc-test-registry/mlflow-postprocess-example:2023.1

docker push us-west1-docker.pkg.dev/wallaroo-dev-253816/doc-test-registry/mlflow-statsmodels-example:2023.1

Verify the uploaded containers are available. From the GCP console, search for “Artifact Registry”, then select the registry.

With the packages published, they will be available to a Wallaroo instance.

Configure Wallaroo Private Registry for GCP Registry

The following process demonstrates how to configure a Wallaroo Instance to use the GCP Registry created and used in previous steps.

Prerequisites

Before starting, the following will be needed:

The GCP Registry service full URL as created from the Create the Registry process. This can be retrieved through the gcloud command:
```
gcloud artifacts repositories describe $REPOSITORY_NAME --location=$LOCATION
```
The private registry service account email address. This is retrieved as described in the process Set up Service Account Key.
The private registry service account credentials in base64. These were created in the step Convert SA Key to Base64. These can be displayed directly from the base Service Account credentials file with:
```
cat $KEYFILEBASE64
```
Configure for Kots Installations

For kots based installations of Wallaroo, use the following procedure. These are based on the Wallaroo Install Guides.

Log into the Wallaroo Administrative Dashboard from a Kubernetes terminal with administrative access to the Wallaroo instance with the following command, replacing the namespace wallaroo with the one where the Wallaroo instance is installed.
```
kubectl kots admin-console --namespace wallaroo
```
Select Config, then Private Model Container Registry.
Enable Provide private container registry credentials for model images.
Update the following fields:
1. Registry URL: Insert the full path of your registry. The GCP Registry format is $LOCATION-docker.pkg.dev/$PROJECT_ID/$REPOSITORY/$IMAGE. In this example, it is us-west1-docker.pkg.dev/wallaroo-dev-253816/doc-test-registry/.
2. Email: The email address of the service account used with the registry service.
3. User: Set to _json_key_base64.
4. Password: Set to the private registry service account credentials in base64.
Scroll to the bottom and select Save Config.
When the update module appears, select Go to updated version.
Wait for the preflight checks to completed, then select Deploy.

Configure Wallaroo Via Helm

During either the installation process or updates, set the following in the local-values.yaml file:
1. privateModelRegistry:
  1. enabled: true
  2. secretName: model-registry-secret
  3. registry: Insert the full path of your registry. The GCP Registry format is $LOCATION-docker.pkg.dev/$PROJECT_ID/$REPOSITORY/$IMAGE. In this example, it is us-west1-docker.pkg.dev/wallaroo-dev-253816/doc-test-registry/.
  4. email: The email address of the service account used with the registry service.
  5. username: Set to _json_key_base64.
  6. password: Set to the private registry service account credentials in base64.
    For example:
```
# Other settings - DNS entries, etc.

# The private registry settings
privateModelRegistry:
enabled: true
secretName: model-registry-secret
registry: "YOUR REGISTRY URL:YOUR REGISTRY PORT"
email: "serviceAccount:doc-test@wallaroo-dev.iam.gserviceaccount.com"
username: "_json_key_base64_"
password: "abcde"
```
Install or update the Wallaroo instance via Helm as per the Wallaroo Helm Install instructions.

Once complete, the Wallaroo instance will be able to authenticate to the registry service and retrieve the images.

Setting Private Registry Configuration in Wallaroo

Configure Via Kots

If Wallaroo was installed via kots, use the following procedure to add the private model registry information.

Launch the Wallaroo Administrative Dashboard through a terminal linked to the Kubernetes cluster. Replace the namespace with the one used in your installation.
```
kubectl kots admin-console --namespace wallaroo
```
Launch the dashboard, by default at http://localhost:8800.
From the admin dashboard, select Config -> Private Model Container Registry.
Enable Provide private container registry credentials for model images.
Provide the following:
1. Registry URL: The URL of the Containerized Model Container Registry. Typically in the format host:port. In this example, the registry for GitHub is used. NOTE: When setting the URL for the Containerized Model Container Registry, only the actual service address is needed. For example: with the full URL of the model as ghcr.io/wallaroolabs/wallaroo_tutorials/mlflow-statsmodels-example:2022.4, the URL would be ghcr.io/wallaroolabs.
2. email: The email address of the user authenticating to the registry service.
3. username: The username of the user authentication to the registry service.
4. password: The password of the user authentication or token to the registry service.
Scroll down and select Save config.
Deploy the new version.

Once complete, the Wallaroo instance will be able to authenticate to the Containerized Model Container Registry and retrieve the images.

Configure via Helm

During either the installation process or updates, set the following in the local-values.yaml file:
1. privateModelRegistry:
  1. enabled: true
  2. secretName: model-registry-secret
  3. registry: The URL of the private registry.
  4. email: The email address of the user authenticating to the registry service.
  5. username: The username of the user authentication to the registry service.
  6. password: The password of the user authentication to the registry service.
  For example:
```
# Other settings - DNS entries, etc.

# The private registry settings
privateModelRegistry:
  enabled: true
  secretName: model-registry-secret
  registry: "YOUR REGISTRY URL:YOUR REGISTRY PORT"
  email: "YOUR EMAIL ADDRESS"
  username: "YOUR USERNAME"
  password: "Your Password here"
```
Install or update the Wallaroo instance via Helm as per the Wallaroo Helm Install instructions.

Once complete, the Wallaroo instance will be able to authenticate to the registry service and retrieve the images.

8.5 - Wallaroo Support Bundle Generation Guide

How to generate support bundles to troubleshoot Wallaroo

To track potential issues, Wallaroo provides a method to create a support bundle: a collection of logs, configurations, and other information that is submitted to Wallaroo support staff to determine where an issue may be and offer a correction.

Support bundles are generated depending on the method of installation:

kots: If Wallaroo was installed via kots, the support bundle is generated through the Wallaroo Administrative Dashboard.
helm: If Wallaroo was installed via helm, the support bundle is generated through a command line process.

Generating via Kots

To manage support bundles:

Log into the administration console.
Select the Troubleshoot tab.
Select Analyze Wallaroo.
Select Download bundle to save the bundle file as a compressed archive. Depending on your browser settings the file download location can be specified.
Send the file to Wallaroo technical support.

At any time, any existing bundle can be examined and downloaded from the Troubleshoot tab.

Generating via Helm

If issues are detected in the Wallaroo instance, a support bundle file is generated using the support-bundle.yaml file provided by the Wallaroo support representative.

This support bundle is generated through the following command:

kubectl support-bundle support-bundle.yaml --interactive=false

9 - Wallaroo Backup and Restore Guides

How to backup Wallaroo data and restore it.

The following guides are made to help organizations configure backup Wallaroo data and restore it when needed.

9.1 - Wallaroo Instance Backup and Restore with Velero

How to backup a Wallaroo instance and restore it using Velero

One method of Wallaroo backup and restores is through the Velero application. This application provides a method of storing snapshots of the Wallaroo installation, including deployed pipelines, user settings, log files, etc., which can be retrieved and restored at a later date.

For full details and setup procedures, see the Velero Documentation. The installation steps below are intended as short guides.

The following procedures are for Wallaroo Enterprise installed via kots or helm in the cloud services listed below. These procedures are not tested for other environments.

Prerequisites

A Wallaroo Enterprise instance
A client connected to the Kubernetes environment hosting the Wallaroo instance running the velero client.
Kubernetes cloud storage, such as:
- Azure Storage Container
- Google Cloud Storage (GCS) Bucket
- AWS S3 Bucket

Velero contains both a client and a Kubernetes service that is used to manage backups and restores.

Client Install

The Velero client supports MacOS and Linux. Windows support is available but not officially supported. The following steps are based on the Velero CLI installation procedure.

MacOS Install

Velero is available on MacOS through the Homebrew project. With Homebrew installed, Velero is installed with the following command:

brew install velero

Linux Install

Velero is available through a tarball installation through the Velero releases page. Once downloaded, expand the tar.gz file and place the velero executable into an executable path directory.

Velero Kubernetes Install

The Velero service runs in the same Kubernetes environment where the Wallaroo instance is installed. Before installation, storage known as a bucket must be made available for the Velero service to place the backup files.

The following shows basic steps on creating the storage containers used for each major cloud service. Organizations are encourage to use these steps with the official Velero instructions, available from the links within each cloud provider section below.

9.1.1 - Velero AWS Cluster Installation

How to set up Velero with a AWS Kubernetes cluster

The following instructions are based on the Velero Plugin for AWS instructions.

These steps assume the user has installed the AWS Command-Line Interface (CLI) and has the necessary permissions to perform the steps below.

The following items are required to create the Velero bucket via a AWS S3 Storage:

S3 Bucket Name: The name of the S3 bucket used to store Wallaroo backups.
Amazon Web Services Region: The region where the Velero bucket is stored. This should be in the same region as the Wallaroo Kubernetes cluster.
Authentication Method: A method of authenticating to AWS for the Velero service either with an IAM user or kube2iam as defined in the Velero plugins for AWS Set permissions for Velero.

If these steps are complete, jump to the Install the Velero Service into the AWS Wallaroo Cluster.

Create AWS Bucket for Velero

Create the S3 bucket used for Velero based backups and restores with the following command, replacing the variables AWS_BUCKET_NAME and AWS_REGION based on your organization’s requirements. In the command below, if the region is us-east-1, remove the --create-bucket-configuration option.

AWS_BUCKET_NAME=<YOUR_BUCKET>
AWS_REGION=<YOUR_REGION>
aws s3api create-bucket \
    --bucket $AWS_BUCKET_NAME \
    --region $AWS_REGION \
    --create-bucket-configuration LocationConstraint=$AWS_REGION

Set Permissions for AWS Velero

There are multiple options for setting permissions for the Velero service in an AWS Kubernetes cluster as detailed in the Velero plugins for AWS Set permissions for Velero. The following examples assume the IAM user method as follows.

Create the IAM user. In this example, the name is velero.
```
aws iam create-user --user-name velero
```

Attach the following AWS policies to the new velero AWS user.

cat > velero-policy.json <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeVolumes",
                "ec2:DescribeSnapshots",
                "ec2:CreateTags",
                "ec2:CreateVolume",
                "ec2:CreateSnapshot",
                "ec2:DeleteSnapshot"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:PutObject",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::${BUCKET}/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::${BUCKET}"
            ]
        }
    ]
}
EOF

Create an access key for the velero user:

aws iam create-access-key --user-name velero

This creates the following sample output:

{
"AccessKey": {
        "UserName": "velero",
        "Status": "Active",
        "CreateDate": "2017-07-31T22:24:41.576Z",
        "SecretAccessKey": <AWS_SECRET_ACCESS_KEY>,
        "AccessKeyId": <AWS_ACCESS_KEY_ID>
}
}

Store the SecretAccessKey and AccessKeyID for the next step. In this case, the file ~/.credentials-velero-aws:
```
[default]
aws_access_key_id=<AWS_ACCESS_KEY_ID>
aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>
```

Install the Velero Service into the AWS Wallaroo Cluster

The following procedure will install the Velero service into the AWS Kubernetes cluster hosting the Wallaroo instance.

Verify the connection to the GCP Kubernetes cluster hosting the Wallaroo instance.

kubectl get nodes
NAME                                             STATUS   ROLES    AGE   VERSION
aws-ce-default-pool-5dd3c344-fxs3   Ready    <none>   31s   v1.23.14-gke.1800
aws-ce-default-pool-5dd3c344-q95a   Ready    <none>   25d   v1.23.14-gke.1800
aws-ce-default-pool-5dd3c344-scmc   Ready    <none>   31s   v1.23.14-gke.1800
aws-ce-default-pool-5dd3c344-wnkn   Ready    <none>   31s   v1.23.14-gke.1800

Install Velero into the AWS Kubernetes cluster. This assumes the $BUCKET_NAME and $REGION variables from earlier, and the AWS velero user credentials are stored in ~/.credentials-velero-aws

velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.6.0 \
--bucket $BUCKET_NAME \
--backup-location-config region=$REGION \
--secret-file ./credentials-velero-aws \
--use-volume-snapshots=false \
--use-node-agent --wait

Once complete, verify the installation is complete by checking for the velero namespace in the Kubernetes cluster:

kubectl get namespaces
NAME              STATUS   AGE
default           Active   222d
kube-node-lease   Active   222d
kube-public       Active   222d
kube-system       Active   222d
velero            Active   5m32s
wallaroo          Active   7d23h

If using Kubernetes taints and tolerations for the Wallaroo installation, update the velero namespace to accept all pods:

kubectl -n velero patch ds node-agent -p='{"spec": {"template": {"spec": {"tolerations":[{"operator": "Exists"}]}}}}'

9.1.2 - Velero Azure Cluster Installation

How to set up Velero with a Azure Kubernetes cluster

The following instructions are based on the Velero Plugin for Microsoft Azure instructions.

These steps assume the user has installed the Azure Command-Line Interface (CLI) and has the necessary permissions to perform the steps below.

The following items are required to create the Velero bucket via a Microsoft Azure Storage Container:

Resource Group: The resource group that the storage container belongs to. It is recommended to either use the same Resource Group as the Azure Kubernetes cluster hosting the Wallaroo instance, or create a Resource Group in the same Azure location.
- Resource Group Location: The Azure location for the resource group.
Azure Storage Account ID: Used to manage the storage container settings.
Azure Storage Container Name: The name of the container being used.
Azure Kubernetes Cluster Name: The name of the Azure Kubernetes Cluster hosting the Wallaroo instance.
Create Azure Storage Account Access Key: This step sets a method for the Velero service to authenticate with Azure to create the backup and restore jobs. Velero recommends different options in its Velero Plugin for Microsoft Azure Set permissions for Velero documentation. The steps below will cover using a storage account access key.

If these elements are available, then skip straight to the Install Velero In the Wallaroo Azure Kubernetes Cluster step.

Get Azure Subscription ID

To retrieve the Azure Subscription ID:

Login to Microsoft Azure.
From the search bar, search for Subscription.
From the Subscriptions Dashboard, select the Subscription ID to be used and store it for later use.

Create Azure Resource Group

To create the Azure Resource Group, use the following command, replacing the variables $AZURE_VELERO_RESOURCE_GROUP and $AZURE_LOCATION with your organization’s requirements.

IMPORTANT NOTE: This resource group must be in the same Azure Subscription ID as in the Get Azure Subscription ID above.

az group create -n $AZURE_VELERO_RESOURCE_GROUP --location $AZURE_LOCATION

Create Azure Storage Account

To create the Azure Storage Account, the Azure Storage Account ID must be composed of only lower case alphanumeric characters and - and ., with the ID beginning or ending in an alphanumeric character. So velero-backup-account is appropriate, while VELERO_BACKUP will not. Update the variables $AZURE_VELERO_RESOURCE_GROUP and $AZURE_STORAGE_ACCOUNT_ID with your organization’s requirements.

AZURE_STORAGE_ACCOUNT_ID="wallaroo_velero_storage"
az storage account create \
    --name $AZURE_STORAGE_ACCOUNT_ID \
    --resource-group $AZURE_VELERO_RESOURCE_GROUP \
    --sku Standard_GRS \
    --encryption-services blob \
    --https-only true \
    --min-tls-version TLS1_2 \
    --kind BlobStorage \
    --access-tier Hot

Create Azure Storage Container

Use the following command to create the Azure Storage Container for use by the Velero service. Replace the BLOB_CONTAINER variable with your organization’s requirements. Note that this new container should have a unique name.

BLOB_CONTAINER=velero
az storage container create -n $BLOB_CONTAINER --public-access off --account-name $AZURE_STORAGE_ACCOUNT_ID

Create Azure Storage Account Access Key

This step sets a method for the Velero service to authenticate with Azure to create the backup and restore jobs. Velero recommends different options in its Velero Plugin for Microsoft Azure Set permissions for Velero documentation. Organizations are encouraged to use the method that aligns with their security requirements.

The steps below will cover using a storage account access key.

Set the default resource group to the same one used for the Valero Resource Group in the step Create Azure Resource Group.
```
az configure --defaults group=$AZURE_VELERO_RESOURCE_GROUP
```
Retrieve the Azure Storage Account Access Key using the $AZURE_STORAGE_ACCOUNT_ID created in the step Create Azure Storage Account. Store this key in a secure location.
```
AZURE_STORAGE_ACCOUNT_ACCESS_KEY=`az storage account keys list --account-name $AZURE_STORAGE_ACCOUNT_ID --query "[?keyName == 'key1'].value" -o tsv`
```
Store the name of the Azure Kubernetes cluster hosting the Wallaroo instance as $AZURE_CLOUD_NAME and the $AZURE_STORAGE_ACCOUNT_ACCESS_KEY into a secret key file. The following command will store it in the location ~/.credentials-velero-azure:
```
cat << EOF  > ~/.credentials-velero-azure
AZURE_STORAGE_ACCOUNT_ACCESS_KEY=${AZURE_STORAGE_ACCOUNT_ACCESS_KEY}
AZURE_CLOUD_NAME=AzurePublicCloud
EOF
```

Install Velero In the Wallaroo Azure Kubernetes Cluster

This step will install the Velero service into the Azure Kubernetes Cluster hosting the Wallaroo instance using the variables from the steps above.

Install the Velero service into the cluster with the following command:

velero install \
    --provider azure \
    --plugins velero/velero-plugin-for-microsoft-azure:v1.6.0 \
    --bucket $BLOB_CONTAINER \
    --secret-file ~/.credentials-velero-azure \
    --backup-location-config storageAccount=$AZURE_STORAGE_ACCOUNT_ID,storageAccountKeyEnvVar=AZURE_STORAGE_ACCOUNT_ACCESS_KEY \
    --use-volume-snapshots=false \
    --use-node-agent --wait

Once complete, verify the installation is complete by checking for the velero namespace in the Kubernetes cluster:

kubectl get namespaces
NAME              STATUS   AGE
default           Active   222d
kube-node-lease   Active   222d
kube-public       Active   222d
kube-system       Active   222d
velero            Active   5m32s
wallaroo          Active   7d23h

To view the logs for the Velero service installation, use the command kubectl logs deployment/velero -n velero.

If using Kubernetes taints and tolerations for the Wallaroo installation, update the velero namespace to accept all pods:

kubectl -n velero patch ds node-agent -p='{"spec": {"template": {"spec": {"tolerations":[{"operator": "Exists"}]}}}}'

9.1.3 - Velero GCP Cluster Installation

How to set up Velero with a GCP Kubernetes cluster

The following instructions are based on the Velero Plugin for Google Cloud Platform (GCP) instructions.

These steps assume the user has installed the gcloud Command-Line Interface (CLI) and gsutil tool and has the necessary permissions to perform the steps below.

The following items are required to create the Velero bucket via a GCP Bucket:

Google Cloud Platform (GCP) Project ID: The project ID for where commands are performed from.
Google Cloud Storage (GCS) Bucket: The object storage bucket where backups are stored.
Google Service Account (GSA): A Velero specific Google Service Account to backup and restore the Wallaroo instance when required.
Either a Google Service Account Key or Workload Identity: Either of these methods are used by the Velero service to authenticate to GCP for its backup and restore tasks.

If these items are already complete, jump to the step Install Velero In the Wallaroo GCP Kubernetes Cluster.

Create GCS Bucket

Create the GCS bucket for storing the Wallaroo backup and restores with the following command. Replace the variable $BUCKET_NAME based on your organization’s requirements.

BUCKET_NAME=<YOUR_BUCKET>

gsutil mb gs://$BUCKET_NAME/

Create Google Service Account for Velero

Create the Google Service Account for the Velero service using the following commands:

Retrieve your organization’s GCP Project ID and store it in the PROJECT_ID variable. Note that this will retrieve the default project ID for the gcloud configuration. Replace with the actual GCP Project ID as required.
```
PROJECT_ID=$(gcloud config get-value project)
```

Create the service account. Update the $GSA_NAME variable based on the organization’s requirements.

GSA_NAME=velero
gcloud iam service-accounts create $GSA_NAME \
    --display-name "Velero service account"

Use gcloud iam service-accounts list to list out the services.

gcloud iam service-accounts list
DISPLAY NAME                            EMAIL                                                                       DISABLED
Velero service account                  veleroexample.iam.gserviceaccount.com                  False

Select the email address for the new Velero service account and set the variable SERVICE_ACCOUNT_EMAIL equal to the accounts email address:
```
SERVICE_ACCOUNT_EMAIL=veleroexample.iam.gserviceaccount.com
```

Create a Custom Role with the following minimum positions, and bind it to the new Velero service account. The ROLE needs to be unique and DNS compliant.

ROLE="velero.server"
TITLE="Velero Server"

ROLE_PERMISSIONS=(
    compute.disks.get
    compute.disks.create
    compute.disks.createSnapshot
    compute.snapshots.get
    compute.snapshots.create
    compute.snapshots.useReadOnly
    compute.snapshots.delete
    compute.zones.get
    storage.objects.create
    storage.objects.delete
    storage.objects.get
    storage.objects.list
    iam.serviceAccounts.signBlob
)

gcloud iam roles create $ROLE \
    --project $PROJECT_ID \
    --title $TITLE \
    --permissions "$(IFS=","; echo "${ROLE_PERMISSIONS[*]}")"

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member serviceAccount:$SERVICE_ACCOUNT_EMAIL \
    --role projects/$PROJECT_ID/roles/$ROLE

Bind the bucket to the new Service Account:

gsutil iam ch serviceAccount:$SERVICE_ACCOUNT_EMAIL:objectAdmin gs://${BUCKET_NAME}

Grant Velero Service GCP Access

There are multiple methods of granting the Velero service GCP access as detailed in the Plugins for Google Cloud Platform (GCP) Grant access to Velero steps. The following examples will use the Service Account Key method.

Create the Google Service Account Key, and store it in a secure location. In this example, it is stored in ~/.credentials-velero-gcp:

gcloud iam service-accounts keys create ~/.credentials-velero-gcp \
    --iam-account $SERVICE_ACCOUNT_EMAIL

Install Velero In the Wallaroo GCP Kubernetes Cluster

The following steps assume that the Google Service Account Key method was used in the Grant Velero Service GCP Access. See the Plugins for Google Cloud Platform (GCP) Grant access to Velero for other methods.

To install the Velero service into the Kubernetes cluster hosting the Wallaroo service:

Verify the connection to the GCP Kubernetes cluster hosting the Wallaroo instance.

kubectl get nodes
NAME                                             STATUS   ROLES    AGE   VERSION
gke-wallaroodocs-ce-default-pool-5dd3c344-fxs3   Ready    <none>   31s   v1.23.14-gke.1800
gke-wallaroodocs-ce-default-pool-5dd3c344-q95a   Ready    <none>   25d   v1.23.14-gke.1800
gke-wallaroodocs-ce-default-pool-5dd3c344-scmc   Ready    <none>   31s   v1.23.14-gke.1800
gke-wallaroodocs-ce-default-pool-5dd3c344-wnkn   Ready    <none>   31s   v1.23.14-gke.1800

Install Velero into the GCP Kubernetes cluster. This assumes the $BUCKET_NAME variable from earlier, and the Google Service Account Key are stored in ~/.credentials-velero-gcp

velero install \
--provider gcp \
--plugins velero/velero-plugin-for-gcp:v1.6.0 \
--bucket $BUCKET_NAME \
--secret-file ~/.credentials-velero-gcp \
--use-volume-snapshots=false \
--use-node-agent --wait

Once complete, verify the installation is complete by checking for the velero namespace in the Kubernetes cluster:

kubectl get namespaces
NAME              STATUS   AGE
default           Active   222d
kube-node-lease   Active   222d
kube-public       Active   222d
kube-system       Active   222d
velero            Active   5m32s
wallaroo          Active   7d23h

If using Kubernetes taints and tolerations for the Wallaroo installation, update the velero namespace to accept all pods:

kubectl -n velero patch ds node-agent -p='{"spec": {"template": {"spec": {"tolerations":[{"operator": "Exists"}]}}}}'

9.1.4 - Wallaroo Backup and Restore with Velero Guide

How to use Velero in an installed Kubernetes cluster to back up and restore a Wallaroo instance

Once the Velero Installation Procedure and the Velero Kubernetes Install are complete, Wallaroo instance backups are performed through the following process:

Before starting the backup, force the Plateau service to complete writing logs so they can be captured by the backup.
```
kubectl -n wallaroo scale --replicas=0 deploy/plateau
kubectl -n wallaroo scale --replicas=1 deploy/plateau
```
Set the $BACKUP_NAME. This must be all lowercase characters or numbers or -/. and must end in alphanumeric characters.
```
BACKUP_NAME={give it your own name}
```
Issue the following backup command. The --exclude-namespaces is used to exclude namespaces that are not required for the Wallaroo backup and restore. By default, these are the namespaces velero, default, kube-node-lease, kube-public, and kube-system.
This process will back up all namespaces that are not excluded, including deployed Wallaroo pipelines. Add any other namespaces that should not be part of the backup to the --exclude-namespaces option as per your organization’s requirements.
```
velero backup create $BACKUP_NAME --default-volumes-to-fs-backup --include-cluster-resources=true --exclude-namespaces velero,default,kube-node-lease,kube-public,kube-system
```

To view the status of the backup, use the following command. Once the Completed field shows a date and time, the backup is complete.

velero backup describe $BACKUP_NAME
Name:         doctest-20230315a
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.23.15
            velero.io/source-cluster-k8s-major-version=1
            velero.io/source-cluster-k8s-minor-version=23

Phase:  Completed

Errors:    0
Warnings:  0

Namespaces:
Included:  *
Excluded:  velero, default, kube-node-lease, kube-public, kube-system

Resources:
Included:        *
Excluded:        <none>
Cluster-scoped:  included

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto

TTL:  720h0m0s

CSISnapshotTimeout:  10m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2023-03-15 10:52:27 -0600 MDT
Completed:  2023-03-15 10:52:49 -0600 MDT

Expiration:  2023-04-14 10:52:27 -0600 MDT

Total items to be backed up:  397
Items backed up:              397

Velero-Native Snapshots: <none included>

restic Backups (specify --details for more information):
Completed:  5

List Previous Wallaroo Backups

To list previous Wallaroo backups and their logs, use the following commands below:

List backups with velero backup get to list all backups and their progress:

velero backup get
NAME                                    STATUS            ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
doctest-20230315a                       Completed         0        0          2023-03-15 10:52:27 -0600 MDT   28d       default            <none>
doctest-magicalbear-20230315            Completed         0        1          2023-03-15 11:52:17 -0600 MDT

Retrieve backup logs with velero backup logs $BACKUP_NAME:
```
velero backup logs $BACKUP_NAME
```

Wallaroo Restore Procedure

To restore a from a Wallaroo backup:

Set the backup name as the variable $BACKUP_NAME. Use the command velero backup get for a list of previous backups.

 velero backup get
 NAME                                    STATUS            ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
 doctest-20230315a                       Completed         0        0          2023-03-15 10:52:27 -0600 MDT   28d       default            <none>
 doctest-magicalbear-20230315            Completed         0        1          2023-03-15 11:52:17 -0600 MDT   

 BACKUP_NAME={give it your own name}

Use the velero restore create command to create the restore job, using the $BACKUP_NAME variable set in the step above.

velero restore create --from-backup $BACKUP_NAME
Restore request "doctest-20230315a-20230315105647" submitted successfully.
Run `velero restore describe doctest-20230315a-20230315105647` or `velero restore logs doctest-20230315a-20230315105647` for more details.

To check the restore status, use the velero restore describe command. The optional flag –details provides more information.
```
velero restore describe doctest-20230315a-20230315105647 --details
```
If the Kubernetes cluster does not have a static IP address assigned to the Wallaroo loadBalancer service, the DNS information may need to be updated if the IP address has changed. Check with the DNS Integration Guide for more information.

10 - Wallaroo ML Workload Orchestration Management

How to use Wallaroo ML Workload Orchestrations

Wallaroo provides ML Workload Orchestrations and Tasks to automate processes in a Wallaroo instance. For example:

Deploy a pipeline, retrieve data through a data connector, submit the data for inferences, undeploy the pipeline
Replace a model with a new version
Retrieve shadow deployed inference results and submit them to a database

Orchestration Flow

ML Workload Orchestration flow works within 3 tiers:

Tier	Description
ML Workload Orchestration	User created custom instructions that provide automated processes that follow the same steps every time without error. Orchestrations contain the instructions to be performed, uploaded as a .ZIP file with the instructions, requirements, and artifacts.
Task	Instructions on when to run an Orchestration as a scheduled Task. Tasks can be Run Once, where is creates a single Task Run, or Run Scheduled, where a Task Run is created on a regular schedule based on the Kubernetes cronjob specifications. If a Task is Run Scheduled, it will create a new Task Run every time the schedule parameters are met until the Task is killed.
Task Run	The execution of an task. These validate business operations are successful identify any unsuccessful task runs. If the Task is Run Once, then only one Task Run is generated. If the Task is a Run Scheduled task, then a new Task Run will be created each time the schedule parameters are met, with each Task Run having its own results and logs.

One example may be of making donuts.

The ML Workload Orchestration is the recipe.
The Task is the order to make the donuts. It might be Run Once, so only one set of donuts are made, or Run Scheduled, so donuts are made every 2nd Sunday at 6 AM. If Run Scheduled, the donuts are made every time the schedule hits until the order is cancelled (aka killed).
The Task Run are the donuts with their own receipt of creation (logs, etc).

For details on uploading ML Workload Orchestrations, see the Wallaroo SDK Essentials Guide: ML Workload Orchestration and the Wallaroo MLOps API Essentials Guide: ML Workload Orchestration Management.

10.1 - ML Workload Orchestration Configuration Guide

How to enable or disable Wallaroo ML Workload Orchestration

Wallaroo ML Workload Orchestration allows organizations to automate processes and run them either on demand or on a reoccurring schedule.

The following guide shows how to enable Wallaroo ML Workload Orchestration in a Wallaroo instance.

Enable Wallaroo ML Workload Orchestration for Kots Installations

Organizations that install Wallaroo using kots can enable or disable Wallaroo ML Workload Orchestration from the Wallaroo Administrative Dashboard through the following process.

From a terminal shell with access to the Kubernetes environment hosting the Wallaroo instance, run the following. Replace wallaroo with the namespace used for the Wallaroo installation.
```
kubectl kots admin-console --namespace wallaroo
```
Access the Wallaroo Administrative Dashboard through a browser. By default, http://localhost:8080. Enter the administrative password when requested.
From the top navigation menu, select Config.
From Pipeline Orchestration, either enable or disable Enable Pipeline Orchestration service.
Scroll to the bottom and select Save config.
From the The config for Wallaroo has been updated. module, select Go to updated version.
Select the most recent version with the updated configuration and select Deploy.

The update process will take 10-15 minutes depending on your Wallaroo instance and other changes made.

Enable Wallaroo ML Workload Orchestration for Helm Installations

To enable Wallaroo ML Workload Orchestration for Helm based installations of Wallaroo:

Set the local values YAML file - by default local-values.yaml with the following:
```
orchestration:
    enabled: true
```
If installing Wallaroo, then format the helm install command as follows, setting the $RELEASE, $REGISTRYURL, $VERSION, and $LOCALVALUES.yaml as required for the installation settings.
```
helm install $RELEASE $REGISTRYURL --version $VERSION --values $LOCALVALUES.yaml
```
If performing an update to the Wallaroo instance configuration, then use the helm upgrade, , setting the $RELEASE, $REGISTRYURL, $VERSION, and $LOCALVALUES.yaml as required for the installation settings.

For example, to upgrade the registration wallaroo from the EE channel the command would be:

helm upgrade wallaroo oci://registry.replicated.com/wallaroo/uat/wallaroo --version 2023.1.0-2662 --values local-values.yaml

The update process will take 10-15 minutes depending on your Wallaroo instance and other changes made.

References

Wallaroo SDK ML Workload Orchestration Guide for details on creating and using orchestrations through the Wallaroo SDK.
Wallaroo Install Guides.

10.2 - Wallaroo ML Workload Orchestrations User Interface

How to view uploaded ML Workload Orchestrations and their associates Tasks

Orchestrations and their Tasks and Task Runs are visible through the Wallaroo Dashboard through the following process.

IMPORTANT NOTE

If you do not see the Workloads link at the top of the Wallaroo Dashboard, check with your Wallaroo administrator to verify that ML Workloads are enabled. See the ML Workload Orchestration Configuration Guide for full details.

From the Wallaroo Dashboard, select the workspace where the workloads were uploaded to.
From the upper right corner, select Workloads.
The list of uploaded ML Workload Orchestrations are displayed with the following:
- Search Bar (A): Filter the list by workload name.
- Status Filter (B): Filter the list by:
  - Only Active: Only show Active workloads.
  - Only Inactive: Only show Inactive workloads.
  - Only Error: Only show workloads flagged with an Error.
- Workload (C): The assigned name of the workload.
- Status (D): Whether the workload orchestration status is Active, Inactive, or has an Error.
- Created At (E): The date the Orchestration was uploaded.
- ID (F): The unique workload orchestration ID in UUID format.
Select a workload orchestration to view Tasks generated from this workload. The Orchestration Page has the following details:
- Orchestration Details:
  - Orchestration Name (A): The name assigned when the workload was created from the workload orchestration.
  - Orchestration ID (B): The ID of the workload in UUID format.
  - Created (C): The date the workload was uploaded.
  - File (D): The file the workload was uploaded from.
  - File Hash (E): The hash of the workload file.
- Workload Log: Each Task generated from the Orchestration displays the following:
  - Type (F): Tasks are shown as either Run Once (a lightning bolt icon) or Run Scheduled (circular arrow icon).
  - Task Name (G): The name of the task.
  - Task ID (H): The ID of the task in UUID format.
  - Last Run Status (I): The status of the task’s last Task Run as either Success or Failure.
  - Run At (J): For Run Once tasks, shows the date and time the Task Run was started. For Run Scheduled tasks, shows the date and time the last Task Run was started, and the next scheduled run for the next Task Run. The actual Task Run start time may vary due to multiple factors.

10.3 - Wallaroo ML Workload Orchestration Requirements

Requirements for uploading a Wallaroo ML Workload Orchestration

Orchestration Requirements

Orchestrations are uploaded to the Wallaroo instance as a ZIP file with the following requirements:

Parameter	Type	Description
User Code	(Required) Python script as `.py` files	If `main.py` exists, then that will be used as the task entrypoint. Otherwise, the first `main.py` found in any subdirectory will be used as the entrypoint. If no `main.py` is found, the orchestration will not be accepted.
Python Library Requirements	(Optional) `requirements.txt` file in the requirements file format.	A standard Python requirements.txt for any dependencies to be provided in the task environment. The Wallaroo SDK will already be present and should not be included in the requirements.txt. Multiple requirements.txt files are not allowed.
Other artifacts		Other artifacts such as files, data, or code to support the orchestration.

Zip Instructions

In a terminal with the zip command, assemble artifacts as above and then create the archive. The zip command is included by default with the Wallaroo JupyterHub service.

zip commands take the following format, with {zipfilename}.zip as the zip file to save the artifacts to, and each file thereafter as the files to add to the archive.

zip {zipfilename}.zip file1, file2, file3....

For example, the following command will add the files main.py and requirements.txt into the file hello.zip.

$ zip hello.zip main.py requirements.txt 
  adding: main.py (deflated 47%)
  adding: requirements.txt (deflated 52%)

Example requirements.txt file

dbt-bigquery==1.4.3
dbt-core==1.4.5
dbt-extractor==0.4.1
dbt-postgres==1.4.5
google-api-core==2.8.2
google-auth==2.11.0
google-auth-oauthlib==0.4.6
google-cloud-bigquery==3.3.2
google-cloud-bigquery-storage==2.15.0
google-cloud-core==2.3.2
google-cloud-storage==2.5.0
google-crc32c==1.5.0
google-pasta==0.2.0
google-resumable-media==2.3.3
googleapis-common-protos==1.56.4

Orchestration Recommendations

The following recommendations will make using Wallaroo orchestrations.

The version of Python used should match the same version as in the Wallaroo JupyterHub service.
The same version of the Wallaroo SDK should match the server. For a 2023.2.1 Wallaroo instance, use the Wallaroo SDK version 2023.2.1.
Specify the version of pip dependencies.
The wallaroo.Client constructor auth_type argument is ignored. Using wallaroo.Client() is sufficient.
The following methods will assist with orchestrations:
- wallaroo.in_task() : Returns True if the code is running within an orchestration task.
- wallaroo.task_args(): Returns a Dict of invocation-specific arguments passed to the run_ calls.
Orchestrations will be run in the same way as running within the Wallaroo JupyterHub service, from the version of Python libraries (unless specifically overridden by the requirements.txt setting, which is not recommended), and running in the virtualized directory /home/jovyan/.

Orchestration Code Samples

The following demonstres using the wallaroo.in_task() and wallaroo.task_args() methods within an Orchestration. This sample code uses wallaroo.in_task() to verify whether or not the script is running as a Wallaroo Task. If true, it will gather the wallaroo.task_args() and use them to set the workspace and pipeline. If False, then it sets the pipeline and workspace manually.

# get the arguments
wl = wallaroo.Client()

# if true, get the arguments passed to the task
if wl.in_task():
  arguments = wl.task_args()
  
  # arguments is a key/value pair, set the workspace and pipeline name
  workspace_name = arguments['workspace_name']
  pipeline_name = arguments['pipeline_name']
  
# False:  We're not in a Task, so set the pipeline manually
else:
  workspace_name="bigqueryworkspace"
  pipeline_name="bigquerypipeline"

Wallaroo Operations Guide

1 - Wallaroo Install Guides

1.1 - Wallaroo Prerequisites Guide

Environment Requirements

Environment Hardware Requirements

Enterprise Network Requirements

Environment Software Requirements

Node Selectors

Cost Calculators

Kubernetes Admin Requirements

kubectl Quick Install Guide

kubectl Install For Deb Package based Linux Systems

kubectl Install For macOS Using Homebrew

kots Quick Install Guide

Manual Kots Install

1.2 - Wallaroo Enterprise Install Guides

1.2.1 - Wallaroo Enterprise Comprehensive Install Guide

Custom Configurations

Environment Setup Guides

Uninstall Guides

Environment Setup Guides

Install Wallaroo

IMPORTANT NOTE

Automated Install

Interactive Install

Configure Wallaroo

DNS Services

User Management

1.2.2 - Wallaroo Enterprise Simple Install Guide

Install Wallaroo

Configure Wallaroo

DNS Services

User Management

1.2.3 - Wallaroo Enterprise Air Gap Install Guide

Average Install Time

Environment Requirements

Environment Hardware Requirements

Enterprise Network Requirements

Environment Software Requirements

Node Selectors

Install Instructions

Download Assets

IMPORTANT NOTE

Install Kots

Install the Kots Admin Console

Install Wallaroo Airgap

Preflight Checks

Wallaroo Admin Console

Status Checks

Troubleshooting

Example Registry Service Install

Private Container Registry Service Install Process

1.2.4 - Wallaroo Enterprise Helm Setup and Install Guides

1.2.4.1 - Wallaroo Helm Standard Cloud Install Procedures

Environment Requirements

Environment Hardware Requirements

Enterprise Network Requirements

Environment Software Requirements

Node Selectors

Kubernetes Installation Instructions

Install Kubernetes

Install Helm

Install Krew

Install Support Tools

Install Wallaroo via Helm

Wallaroo Provided Data

Registration Login

Preflight Verification

IMPORTANT NOTE

IMPORTANT NOTE

Install Wallaroo

Troubleshoot Wallaroo

Uninstall

IMPORTANT NOTE

1.2.4.2 - Wallaroo Helm Reference Guides

1.2.4.2.1 - Wallaroo Helm Reference Table

Configuration

1.2.4.2.2 - Wallaroo Helm Reference Details

post_delete_hook

imageRegistry