Wallaroo Enterprise Comprehensive Install Guide
This guide is targeted towards system administrators and data scientists who want the easiest, fastest, and most comprehensive method of running their own machine learning models.
A typical installation of Wallaroo follows this process:
Step | Description | Average Setup Time |
---|---|---|
Setup Environment | Create an environment that meets the Wallaroo prerequisites | 30 minutes |
Install Wallaroo | Install Wallaroo into a prepared environment | 15 minutes |
Configure Wallaroo | Update Wallaroo with required post-install configurations. | Variable |
Some knowledge of the following will be useful in working with this guide:
- Working knowledge of Linux distributions, particularly Ubuntu.
- Experience with a cloud provider such as Google Cloud Platform (GCP), Amazon Web Services (AWS), or Microsoft Azure.
- Working knowledge of Kubernetes, mainly `kubectl` and `kots` or `helm`.
For more information, contact us.
The following software or runtimes are required for Wallaroo 2023.2.1. Most are automatically available through the supported cloud providers.
Software or Runtime | Description | Minimum Supported Version | Preferred Version(s) |
---|---|---|---|
Kubernetes | Cluster deployment management | 1.23 | 1.25 |
containerd | Container Management | 1.7.0 | 1.7.0 |
kubectl | Kubernetes administrative console application | 1.26 | 1.26 |
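One way to compare an installed version against the minimums in the table is a small shell helper. The following is a minimal sketch and not part of the Wallaroo tooling: the function name `version_ge` is hypothetical, and it assumes a `sort` that supports the `-V` (version sort) option, as GNU coreutils does.

```shell
#!/bin/bash
# version_ge INSTALLED MINIMUM
# Succeeds (exit 0) when INSTALLED is greater than or equal to MINIMUM.
# Hypothetical helper; assumes "sort -V" (version sort) is available.
version_ge() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

# Example: compare a version string read from your own cluster against
# the minimum supported Kubernetes version listed in the table.
if version_ge "1.25.4" "1.23"; then
  echo "Kubernetes version meets the minimum"
fi
```

The same helper can be reused for the containerd and kubectl rows by passing the relevant version strings.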
Custom Configurations
Wallaroo can be configured with custom installations depending on your organization’s needs. The following options are available:
- Install Wallaroo to Specific Nodes: How to specify what nodes in a Kubernetes cluster to install Wallaroo to.
- Install Wallaroo with Minimum Services: How to install Wallaroo with reduced services to fit into a lower-resource environment.
- Taints and Tolerances Guide: How to configure Wallaroo for specific taints and tolerances to ensure that only Wallaroo services are running in specific nodes.
Environment Setup Guides
The following setup guides are used to set up the environment that will host the Wallaroo instance. Verify that the environment is prepared and meets the Wallaroo Prerequisites Guide.
Uninstall Guides
The following is a short version of the uninstallation procedure to remove a previously installed version of Wallaroo. For full details, see the How to Uninstall Wallaroo. These instructions assume administrative use of the Kubernetes command `kubectl`.
To uninstall a previously installed Wallaroo instance:
Delete any Wallaroo pipelines still deployed with the command `kubectl delete namespace {namespace}`. Typically these are the pipeline name with some numerical ID. For example, in the following list of namespaces, the namespace `ccfraud-pipeline-21` corresponds to the Wallaroo pipeline `ccfraud-pipeline`. Verify these are Wallaroo pipelines before deleting.
-> kubectl get namespaces
NAME STATUS AGE
default Active 7d4h
kube-node-lease Active 7d4h
kube-public Active 7d4h
ccfraud-pipeline-21 Active 4h23m
wallaroo Active 3d6h
-> kubectl delete namespaces ccfraud-pipeline-21
Use the following bash script or run the commands individually. Warning: If the selector is incorrect or missing from the kubectl command, the cluster could be damaged beyond repair. For a default installation, the selector and namespace will be `wallaroo`.
#!/bin/bash
kubectl delete ns wallaroo && \
kubectl delete all,secret,configmap,clusterroles,clusterrolebindings,storageclass,crd \
--selector app.kubernetes.io/part-of=wallaroo --selector kots.io/app-slug=wallaroo
Wallaroo can now be reinstalled into this environment.
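To help with the first uninstall step, the namespace listing can be filtered for names that end in a numerical ID. This is only a sketch: the function name `list_pipeline_namespaces` is hypothetical, and the "name followed by a dash and a number" pattern is an assumption taken from the `ccfraud-pipeline-21` example. Other workloads could match the same pattern, so each result still needs to be verified before deleting it.

```shell
#!/bin/bash
# list_pipeline_namespaces
# Reads "kubectl get namespaces" output on stdin and prints the names that
# end in "-<number>". Hypothetical helper; the naming pattern is an
# assumption based on the example in this guide.
list_pipeline_namespaces() {
  awk 'NR > 1 && $1 ~ /-[0-9]+$/ { print $1 }'
}

# Usage (requires kubectl access to the cluster):
#   kubectl get namespaces | list_pipeline_namespaces
```

The function only prints candidates. Deleting a namespace remains a separate, deliberate `kubectl delete namespace` call.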
Environment Setup Guides
- AWS Cluster for Wallaroo Enterprise Instructions
The following instructions are designed to help users set up their Amazon Web Services (AWS) environment for running Wallaroo Enterprise using AWS Elastic Kubernetes Service (EKS).
These represent a recommended setup, but can be modified to fit your specific needs.
- AWS Prerequisites
To install Wallaroo in your AWS environment based on these instructions, the following prerequisites must be met:
- Register an AWS account: https://aws.amazon.com/ and assign the proper permissions according to your organization’s needs.
- The Kubernetes cluster must include the following minimum settings:
  - Nodes must be OS type Linux using the `containerd` driver.
  - Role-based access control (RBAC) must be enabled.
  - Minimum of 4 nodes, each node with a minimum of 8 CPU cores and 16 GB RAM. 50 GB will be allocated per node for a total of 625 GB for the entire cluster.
  - Recommended AWS machine type: `c5.4xlarge`. For more information, see the AWS Instance Types.
- Install eksctl version 0.101.0 or above.
- If the cluster will utilize autoscaling, install the Cluster Autoscaler on AWS.
IMPORTANT NOTE
Organizations that intend to stop and restart their Kubernetes environment on an intentional or regular basis are recommended to use a single availability zone for their nodes. This minimizes issues such as persistent volumes in different availability zones, etc.
Organizations that intend to use Wallaroo Enterprise in a high availability cluster are encouraged to follow best practices including using separate availability zones for redundancy, etc.
- AWS Environment Setup Steps
The following steps are guidelines to assist new users in setting up their AWS environment for Wallaroo. Feel free to replace these commands with ones that match your needs.
These commands make use of the command line tool eksctl, which streamlines the process of creating Amazon Elastic Kubernetes Service clusters for the Wallaroo environment.
The following are used for the example commands below. Replace them with your specific environment settings:
- AWS Cluster Name: `wallarooAWS`
Create an AWS EKS Cluster
The following `eksctl` configuration file is an example of setting up the AWS environment for a Wallaroo cluster, including the static and adaptive nodepools. Adjust these names and settings based on your organization’s requirements.
This sample YAML file can be downloaded from here: wallaroo_enterprise_aws_install.yaml
Or copied from here:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: wallarooAWS
  region: us-east-1
  version: "1.25"

addons:
  - name: aws-ebs-csi-driver

iam:
  withOIDC: true
  serviceAccounts:
    - metadata:
        name: cluster-autoscaler
        namespace: kube-system
        labels: {aws-usage: "cluster-ops"}
      wellKnownPolicies:
        autoScaler: true
      roleName: eksctl-cluster-autoscaler-role

nodeGroups:
  - name: mainpool
    instanceType: m5.2xlarge
    desiredCapacity: 3
    containerRuntime: containerd
    amiFamily: AmazonLinux2
    availabilityZones:
      - us-east-1a
  - name: postgres
    instanceType: m5.2xlarge
    desiredCapacity: 1
    taints:
      - key: wallaroo.ai/postgres
        value: "true"
        effect: NoSchedule
    containerRuntime: containerd
    amiFamily: AmazonLinux2
    availabilityZones:
      - us-east-1a
  - name: engine-lb
    instanceType: c5.4xlarge
    minSize: 1
    maxSize: 3
    taints:
      - key: wallaroo.ai/enginelb
        value: "true"
        effect: NoSchedule
    tags:
      k8s.io/cluster-autoscaler/node-template/label/k8s.dask.org/node-purpose: engine-lb
      k8s.io/cluster-autoscaler/node-template/taint/k8s.dask.org/dedicated: "true:NoSchedule"
    iam:
      withAddonPolicies:
        autoScaler: true
    containerRuntime: containerd
    amiFamily: AmazonLinux2
    availabilityZones:
      - us-east-1a
  - name: engine
    instanceType: c5.2xlarge
    minSize: 1
    maxSize: 3
    taints:
      - key: wallaroo.ai/engine
        value: "true"
        effect: NoSchedule
    tags:
      k8s.io/cluster-autoscaler/node-template/label/k8s.dask.org/node-purpose: engine
      k8s.io/cluster-autoscaler/node-template/taint/k8s.dask.org/dedicated: "true:NoSchedule"
    iam:
      withAddonPolicies:
        autoScaler: true
    containerRuntime: containerd
    amiFamily: AmazonLinux2
    availabilityZones:
      - us-east-1a
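Before creating the cluster, it can be useful to review which nodepools and instance types the configuration file defines. The sketch below is a simple text scan, not a YAML parser: the function name `list_nodegroups` is hypothetical, and it assumes each nodepool entry has a `- name:` line followed by an `instanceType:` line, as in the sample file above.

```shell
#!/bin/bash
# list_nodegroups CONFIG_FILE
# Prints "<nodepool name> <instance type>" for each entry in an eksctl
# configuration file. Hypothetical helper; a plain text scan that assumes
# "- name:" is followed by "instanceType:" within each nodepool entry.
list_nodegroups() {
  awk '
    $1 == "-" && $2 == "name:"          { name = $3 }
    $1 == "instanceType:" && name != "" { print name, $2; name = "" }
  ' "$1"
}

# Usage:
#   list_nodegroups wallaroo_enterprise_aws_install.yaml
```

Entries that have a name but no instance type, such as the `addons` entry, are skipped.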
- Create the Cluster
Create the cluster with the following command, which creates the environment and sets the correct Kubernetes version.
eksctl create cluster -f wallaroo_enterprise_aws_install.yaml
During the process the Kubernetes credentials will be copied into the local environment. To verify the setup is complete, use the `kubectl get nodes` command to display the available nodes as in the following example:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-192-168-21-253.us-east-2.compute.internal Ready <none> 13m v1.23.8-eks-9017834
ip-192-168-30-36.us-east-2.compute.internal Ready <none> 13m v1.23.8-eks-9017834
ip-192-168-38-31.us-east-2.compute.internal Ready <none> 9m46s v1.23.8-eks-9017834
ip-192-168-55-123.us-east-2.compute.internal Ready <none> 12m v1.23.8-eks-9017834
ip-192-168-79-70.us-east-2.compute.internal Ready <none> 13m v1.23.8-eks-9017834
ip-192-168-37-222.us-east-2.compute.internal Ready <none> 13m v1.23.8-eks-9017834
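The prerequisites call for a minimum of 4 nodes, so a quick count of nodes in the `Ready` state can confirm the cluster came up as expected. This is a minimal sketch: the function name `count_ready_nodes` is hypothetical, and it assumes the default `kubectl get nodes` column layout shown above, with the status in the second column.

```shell
#!/bin/bash
# count_ready_nodes
# Reads "kubectl get nodes" output on stdin and prints how many nodes
# report a STATUS of exactly "Ready". Hypothetical helper; assumes the
# default column layout (NAME STATUS ROLES AGE VERSION).
count_ready_nodes() {
  awk 'NR > 1 && $2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage (requires kubectl access to the cluster):
#   ready=$(kubectl get nodes | count_ready_nodes)
#   [ "$ready" -ge 4 ] || echo "Fewer than 4 nodes are Ready" >&2
```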
- Azure Cluster for Wallaroo Enterprise Instructions
The following instructions are designed to help users set up their Microsoft Azure Kubernetes environment for running Wallaroo Enterprise. These represent a recommended setup, but can be modified to fit your specific needs.
If you’re prepared to install the environment now, skip to Setup Environment Steps.
There are two methods detailed here for setting up your Kubernetes cloud environment in Azure:
- Quick Setup Script: Download a bash script to automatically set up the Azure environment through the Microsoft Azure command line interface `az`.
- Manual Setup Guide: A list of the `az` commands used to create the environment through manual commands.
To install Wallaroo in your Microsoft Azure environment, the following prerequisites must be met:
- Register a Microsoft Azure account: https://azure.microsoft.com/.
- Install the Microsoft Azure CLI and complete the Azure CLI Get Started Guide to connect your `az` application to your Microsoft Azure account.
- The Kubernetes cluster must include the following minimum settings:
  - Nodes must be OS type Linux with the `containerd` driver as the default.
  - Role-based access control (RBAC) must be enabled.
  - Minimum of 4 nodes, each node with a minimum of 8 CPU cores and 16 GB RAM. 50 GB will be allocated per node for a total of 625 GB for the entire cluster.
  - Minimum machine type is set to `Standard_D8s_v4`.
IMPORTANT NOTE
Organizations that intend to stop and restart their Kubernetes environment on an intentional or regular basis are recommended to use a single availability zone for their nodes. This minimizes issues such as persistent volumes in different availability zones, etc.
Organizations that intend to use Wallaroo Enterprise in a high availability cluster are encouraged to follow best practices including using separate availability zones for redundancy, etc.
- Standard Setup Variables
The following variables are used in the Quick Setup Script and the Manual Setup Guide detailed below. Modify them as best fits your organization.
Variable Name | Default Value | Description |
---|---|---|
WALLAROO_RESOURCE_GROUP | wallaroogroup | The Azure Resource Group used for the Kubernetes environment. |
WALLAROO_GROUP_LOCATION | eastus | The region that the Kubernetes environment will be installed to. |
WALLAROO_CONTAINER_REGISTRY | wallarooacr | The Azure Container Registry used for the Kubernetes environment. |
WALLAROO_CLUSTER | wallarooaks | The name of the Kubernetes cluster that Wallaroo is installed to. |
WALLAROO_SKU_TYPE | Base | The Azure Kubernetes Service SKU type. |
WALLAROO_VM_SIZE | Standard_D8s_v4 | The VM type used for the standard Wallaroo cluster nodes. |
POSTGRES_VM_SIZE | Standard_D8s_v4 | The VM type used for the postgres nodepool. |
ENGINELB_VM_SIZE | Standard_D8s_v4 | The VM type used for the engine-lb nodepool. |
ENGINE_VM_SIZE | Standard_F8s_v2 | The VM type used for the engine nodepool. |
Setup Environment Steps
Quick Setup Script
A sample script is available here that creates an Azure Kubernetes environment ready for use with Wallaroo Enterprise. This script requires the prerequisites listed above and uses the variables listed in Standard Setup Variables. Modify them as best fits your organization’s needs.
The following script is available for download: wallaroo_enterprise_azure_expandable.bash
The following steps are geared towards a standard Linux or macOS system that supports the prerequisites listed above. Modify these steps based on your local environment.
- Download the script above.
- In a terminal window, set the script status as `execute` with the command `chmod +x wallaroo_enterprise_install_azure_expandable.bash`.
- Modify the script variables listed above based on your requirements.
- Run the script with either `bash wallaroo_enterprise_install_azure_expandable.bash` or `./wallaroo_enterprise_install_azure_expandable.bash` from the same directory as the script.
- Manual Setup Guide
The following steps are guidelines to assist new users in setting up their Azure environment for Wallaroo.
The process uses the variables listed in Standard Setup Variables. Modify them as best fits your organization’s needs.
See the Azure Command-Line Interface for full details on commands and settings.
Setting up an Azure AKS environment is based on the Azure Kubernetes Service tutorial, streamlined to show the minimum steps in setting up your own Wallaroo environment in Azure.
This follows these major steps:
- Set Variables
The following are the variables used for the rest of the commands. Modify them as fits your organization’s needs.
WALLAROO_RESOURCE_GROUP=wallaroogroup
WALLAROO_GROUP_LOCATION=eastus
WALLAROO_CONTAINER_REGISTRY=wallarooacr
WALLAROO_CLUSTER=wallarooaks
WALLAROO_SKU_TYPE=Base
WALLAROO_VM_SIZE=Standard_D8s_v4
POSTGRES_VM_SIZE=Standard_D8s_v4
ENGINELB_VM_SIZE=Standard_D8s_v4
ENGINE_VM_SIZE=Standard_F8s_v2
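Because every later command depends on these variables, it can help to confirm they are all set before running anything against Azure. The following is a minimal sketch with a hypothetical function name, `require_vars`. It only checks that each named variable is non-empty, not that its value is valid for Azure.

```shell
#!/bin/bash
# require_vars NAME...
# Succeeds only when every named shell variable is set and non-empty.
# Hypothetical helper; it does not validate the values themselves.
require_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, after setting the variables listed above:
#   require_vars WALLAROO_RESOURCE_GROUP WALLAROO_GROUP_LOCATION \
#     WALLAROO_CONTAINER_REGISTRY WALLAROO_CLUSTER WALLAROO_SKU_TYPE \
#     WALLAROO_VM_SIZE POSTGRES_VM_SIZE ENGINELB_VM_SIZE ENGINE_VM_SIZE \
#     || exit 1
```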
- Create an Azure Resource Group
To create an Azure Resource Group for Wallaroo in Microsoft Azure, use the following template:
az group create --name $WALLAROO_RESOURCE_GROUP --location $WALLAROO_GROUP_LOCATION
(Optional): Set the default Resource Group to the one recently created. This allows other Azure commands to automatically select this group for commands such as `az aks list`, etc.
az configure --defaults group={Resource Group Name}
For example:
az configure --defaults group=wallarooGroup
- Create an Azure Container Registry
An Azure Container Registry (ACR) manages the container images for services including Kubernetes. The template for setting up an Azure ACR that supports Wallaroo is the following:
az acr create -n $WALLAROO_CONTAINER_REGISTRY \
-g $WALLAROO_RESOURCE_GROUP \
--sku $WALLAROO_SKU_TYPE \
--location $WALLAROO_GROUP_LOCATION
- Create an Azure Kubernetes Service
Now we can create the Kubernetes service in Azure that will host the Wallaroo instance with the `az aks create` command.
az aks create \
--resource-group $WALLAROO_RESOURCE_GROUP \
--name $WALLAROO_CLUSTER \
--node-count 3 \
--generate-ssh-keys \
--vm-set-type VirtualMachineScaleSets \
--load-balancer-sku standard \
--node-vm-size $WALLAROO_VM_SIZE \
--nodepool-name mainpool \
--attach-acr $WALLAROO_CONTAINER_REGISTRY \
--kubernetes-version=1.23.15 \
--zones 1 \
--location $WALLAROO_GROUP_LOCATION
- Wallaroo Enterprise Nodepools
Wallaroo Enterprise supports autoscaling and static nodepools. The following commands are used to create both to support the Wallaroo Enterprise cluster.
The following static nodepool is set up to support the Wallaroo cluster for `postgres`. Update the `VM_SIZE` based on your requirements.
az aks nodepool add \
--resource-group $WALLAROO_RESOURCE_GROUP \
--cluster-name $WALLAROO_CLUSTER \
--name postgres \
--node-count 1 \
--node-vm-size $POSTGRES_VM_SIZE \
--no-wait \
--node-taints wallaroo.ai/postgres=true:NoSchedule \
--zones 1
The following autoscaling nodepools are used for the `engineLB` and the `engine` nodepools. Adjust the settings based on your organization’s requirements.
az aks nodepool add \
--resource-group $WALLAROO_RESOURCE_GROUP \
--cluster-name $WALLAROO_CLUSTER \
--name enginelb \
--node-count 1 \
--node-vm-size $ENGINELB_VM_SIZE \
--no-wait \
--enable-cluster-autoscaler \
--max-count 3 \
--min-count 1 \
--node-taints wallaroo.ai/enginelb=true:NoSchedule \
--labels wallaroo-node-type=enginelb \
--zones 1
az aks nodepool add \
--resource-group $WALLAROO_RESOURCE_GROUP \
--cluster-name $WALLAROO_CLUSTER \
--name engine \
--node-count 1 \
--node-vm-size $ENGINE_VM_SIZE \
--no-wait \
--enable-cluster-autoscaler \
--max-count 3 \
--min-count 1 \
--node-taints wallaroo.ai/engine=true:NoSchedule \
--labels wallaroo-node-type=engine \
--zones 1
For additional settings, such as customizing the type of virtual machines used for the node pools in your Wallaroo Kubernetes cluster, see the Microsoft Azure documentation on using system node pools.
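The two autoscaling nodepool commands above differ only in the pool name and VM size, so they can be generated from a template. The sketch below prints the command instead of running it, so it can be reviewed first. The function name `print_autoscale_nodepool_cmd` is hypothetical, it uses only the flags already shown in this guide, and it assumes `WALLAROO_RESOURCE_GROUP` and `WALLAROO_CLUSTER` are set as in Set Variables.

```shell
#!/bin/bash
# print_autoscale_nodepool_cmd NAME VM_SIZE MIN_COUNT MAX_COUNT
# Prints (does not run) an "az aks nodepool add" command for an autoscaling
# nodepool, using the same flags as the commands in this guide.
# Hypothetical helper; expects WALLAROO_RESOURCE_GROUP and WALLAROO_CLUSTER
# to be set in the environment.
print_autoscale_nodepool_cmd() {
  printf 'az aks nodepool add'
  printf ' --resource-group %s' "$WALLAROO_RESOURCE_GROUP"
  printf ' --cluster-name %s' "$WALLAROO_CLUSTER"
  printf ' --name %s --node-count 1 --node-vm-size %s' "$1" "$2"
  printf ' --no-wait --enable-cluster-autoscaler'
  printf ' --min-count %s --max-count %s' "$3" "$4"
  printf ' --node-taints wallaroo.ai/%s=true:NoSchedule' "$1"
  printf ' --labels wallaroo-node-type=%s --zones 1\n' "$1"
}

# Usage: review the printed command, then run it yourself.
#   print_autoscale_nodepool_cmd engine "$ENGINE_VM_SIZE" 1 3
```

Printing rather than executing keeps the helper safe to run without Azure credentials and makes the generated flags easy to compare against the commands above.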
- Download Wallaroo Kubernetes Configuration
Once the Kubernetes environment is complete, associate it with the local Kubernetes configuration by importing the credentials through the following template command:
az aks get-credentials --resource-group $WALLAROO_RESOURCE_GROUP --name $WALLAROO_CLUSTER
Verify the cluster is available through the `kubectl get nodes` command.
kubectl get nodes
NAME STATUS ROLES AGE VERSION
aks-engine-99896855-vmss000000 Ready agent 40m v1.23.8
aks-enginelb-54433467-vmss000000 Ready agent 48m v1.23.8
aks-mainpool-37402055-vmss000000 Ready agent 81m v1.23.8
aks-mainpool-37402055-vmss000001 Ready agent 81m v1.23.8
aks-mainpool-37402055-vmss000002 Ready agent 81m v1.23.8
aks-postgres-40215394-vmss000000 Ready agent 52m v1.23.8
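Since the nodepools are created with `--no-wait`, some may still be provisioning when the node list is first checked. The sketch below reports which expected nodepools have no node yet. The function name `missing_nodepools` is hypothetical, and the `aks-<pool>-` node name prefix is an assumption taken from the sample output above.

```shell
#!/bin/bash
# missing_nodepools POOL...
# Reads "kubectl get nodes" output on stdin and prints each named nodepool
# that has no node in the listing. Hypothetical helper; assumes node names
# start with "aks-<pool>-" as in the sample output in this guide.
missing_nodepools() {
  nodes=$(awk 'NR > 1 { print $1 }')
  for pool in "$@"; do
    printf '%s\n' "$nodes" | grep -q "^aks-$pool-" || echo "$pool"
  done
}

# Usage (requires kubectl access to the cluster):
#   kubectl get nodes | missing_nodepools mainpool postgres enginelb engine
```

An empty result means every listed nodepool has at least one node in the output.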
The following instructions are designed to help users set up their Google Cloud Platform (GCP) Kubernetes environment for running Wallaroo. These represent a recommended setup, but can be modified to fit your specific needs. In particular, these instructions will provision a GKE cluster with 56 CPUs in total. Please ensure that your project’s resource limits support that.
- Quick Setup Script: Download a bash script to automatically set up the GCP environment through the Google Cloud Platform command line interface `gcloud`.
- Manual Setup Guide: A list of the `gcloud` commands used to create the environment through manual commands.
Organizations that wish to run Wallaroo in their Google Cloud Platform environment must complete the following prerequisites:
- Register a Google Cloud Account: https://cloud.google.com/
- Create a Google Cloud project: https://cloud.google.com/resource-manager/docs/creating-managing-projects
- Install `gcloud` and run `gcloud init` or `gcloud init --console-only` on the local system used to set up your environment: https://cloud.google.com/sdk/docs/install
- Enable the Google Compute Engine (GCE): https://cloud.google.com/endpoints/docs/openapi/enable-api
- Enable the Google Kubernetes Engine (GKE) on your project: https://console.cloud.google.com/apis/enableflow?apiid=container.googleapis.com
- Select a default Compute Engine region and zone: https://cloud.google.com/compute/docs/regions-zones.
IMPORTANT NOTE
Organizations that intend to stop and restart their Kubernetes environment on an intentional or regular basis are recommended to use a single availability zone for their nodes. This minimizes issues such as persistent volumes in different availability zones, etc.
Organizations that intend to use Wallaroo Enterprise in a high availability cluster are encouraged to follow best practices including using separate availability zones for redundancy, etc.
- Standard Setup Variables
The following variables are used in the Quick Setup Script and the Manual Setup Guide. Modify them as best fits your organization.
Variable Name | Default Value | Description |
---|---|---|
WALLAROO_GCP_PROJECT | wallaroo | The name of the Google Project used for the Wallaroo instance. |
WALLAROO_CLUSTER | wallaroo | The name of the Kubernetes cluster for the Wallaroo instance. |
WALLAROO_GCP_REGION | us-central1 | The region the Kubernetes environment is installed to. Update this to your GCP Compute Engine region. |
WALLAROO_NODE_LOCATION | us-central1-f | The location the Kubernetes nodes are installed to. Update this to your GCP Compute Engine Zone. |
WALLAROO_GCP_NETWORK_NAME | wallaroo-network | The Google network used with the Kubernetes environment. |
WALLAROO_GCP_SUBNETWORK_NAME | wallaroo-subnet-1 | The Google network subnet used with the Kubernetes environment. |
DEFAULT_VM_SIZE | e2-standard-8 | The VM type used for the default nodepool. |
POSTGRES_VM_SIZE | n2-standard-8 | The VM type used for the postgres nodepool. |
ENGINELB_VM_SIZE | c2-standard-8 | The VM type used for the engine-lb nodepool. |
ENGINE_VM_SIZE | c2-standard-8 | The VM type used for the engine nodepool. |
- Quick Setup Script
A sample script is available here that creates a Google Kubernetes Engine cluster ready for use with Wallaroo Enterprise. This script requires the prerequisites listed above and uses the variables listed in Standard Setup Variables.
The following script is available for download: wallaroo_enterprise_gcp_expandable.bash
The following steps are geared towards a standard Linux or macOS system that supports the prerequisites listed above. Modify these steps based on your local environment.
- Download the script above.
- In a terminal window, set the script status as `execute` with the command `chmod +x wallaroo_enterprise_gcp_expandable.bash`.
- Modify the script variables listed above based on your requirements.
- Run the script with either `bash wallaroo_enterprise_gcp_expandable.bash` or `./wallaroo_enterprise_gcp_expandable.bash` from the same directory as the script.
- Set Variables
The following are the variables used in the environment setup process. Modify them as best fits your organization’s needs.
WALLAROO_GCP_PROJECT=wallaroo
WALLAROO_CLUSTER=wallaroo
WALLAROO_GCP_REGION=us-central1
WALLAROO_NODE_LOCATION=us-central1-f
WALLAROO_GCP_NETWORK_NAME=wallaroo-network
WALLAROO_GCP_SUBNETWORK_NAME=wallaroo-subnet-1
DEFAULT_VM_SIZE=n2-standard-8
POSTGRES_VM_SIZE=n2-standard-8
ENGINELB_VM_SIZE=c2-standard-8
ENGINE_VM_SIZE=c2-standard-8
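The node location must be a zone inside the chosen region, and a mismatch is easy to introduce when only one of the two variables is edited. GCP zone names are the region name followed by a dash and a letter, so the check can be done with plain shell string handling. This is a minimal sketch with a hypothetical function name, `zone_in_region`.

```shell
#!/bin/bash
# zone_in_region ZONE REGION
# Succeeds when ZONE (for example us-central1-f) belongs to REGION
# (for example us-central1). Hypothetical helper; it strips the final
# "-<letter>" suffix from the zone and compares the rest to the region.
zone_in_region() {
  [ "${1%-*}" = "$2" ]
}

# Usage, after setting the variables listed above:
#   zone_in_region "$WALLAROO_NODE_LOCATION" "$WALLAROO_GCP_REGION" \
#     || echo "WALLAROO_NODE_LOCATION is not in WALLAROO_GCP_REGION" >&2
```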
- Manual Setup Guide
The following steps are guidelines to assist new users in setting up their GCP environment for Wallaroo. The variables used in the commands are listed in Standard Setup Variables above. Feel free to replace these with ones that match your needs.
See the Google Cloud SDK for full details on commands and settings.
- Create a GCP Network
First create a GCP network that is used to connect to the cluster with the `gcloud compute networks create` command. For more information, see the gcloud compute networks create page.
gcloud compute networks \