Wallaroo Enterprise Comprehensive Install Guide

How to set up Wallaroo Enterprise, environments, and other configurations.


This guide is intended for system administrators and data scientists who want the easiest, fastest, and most comprehensive method of running their own machine learning models.

A typical installation of Wallaroo follows this process:

Step                 Description                                                   Average Setup Time
Setup Environment    Create an environment that meets the Wallaroo prerequisites   30 minutes
Install Wallaroo     Install Wallaroo into a prepared environment                  15 minutes
Configure Wallaroo   Update Wallaroo with required post-install configurations     Variable

Some knowledge of the following will be useful in working with this guide:

  • Working knowledge of Linux distributions, particularly Ubuntu.
  • Experience with a cloud provider such as Google Cloud Platform (GCP), Amazon Web Services (AWS), or Microsoft Azure.
  • Working knowledge of Kubernetes, mainly kubectl and kots or helm.

For additional details, Contact Us.

The following software or runtimes are required for Wallaroo 2023.2.1. Most are automatically available through the supported cloud providers.

Software or Runtime   Description                                      Minimum Supported Version   Preferred Version(s)
Kubernetes            Cluster deployment management                    1.23                        1.25
containerd            Container management                             1.7.0                       1.7.0
kubectl               Kubernetes administrative console application    1.26                        1.26
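As a quick pre-install check, installed versions can be compared with the minimums in the table above. The following is a minimal shell sketch, not a Wallaroo tool: the function names version_ge and check_min_version are hypothetical, and the sketch assumes a sort command that supports version ordering with -V (GNU coreutils does).

```shell
#!/bin/bash
# Hypothetical helpers (not part of Wallaroo) for comparing tool versions
# against the minimums in the prerequisites table. Assumes `sort -V`.

# version_ge ACTUAL MINIMUM: succeeds when ACTUAL >= MINIMUM.
# A leading "v", as in kubectl's "v1.26.3", is ignored.
version_ge() {
  local actual="${1#v}" minimum="${2#v}" lowest
  # sort -V orders version strings numerically; if MINIMUM sorts first
  # (or both are equal), ACTUAL meets the requirement.
  lowest=$(printf '%s\n%s\n' "$minimum" "$actual" | sort -V | head -n 1)
  [ "$lowest" = "$minimum" ]
}

# check_min_version NAME ACTUAL MINIMUM: prints OK or FAIL and returns a
# non-zero status when the requirement is not met.
check_min_version() {
  if version_ge "$2" "$3"; then
    echo "OK: $1 $2 (minimum $3)"
  else
    echo "FAIL: $1 $2 is older than the minimum $3"
    return 1
  fi
}

# Example usage (substitute the versions reported by your own tools):
#   check_min_version kubectl v1.26.3 1.26
#   check_min_version containerd 1.7.0 1.7.0
```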

Custom Configurations

Wallaroo can be configured with custom installations depending on your organization’s needs. The following options are available:


Environment Setup Guides

The following setup guides are used to set up the environment that will host the Wallaroo instance. Verify that the environment is prepared and meets the requirements in the Wallaroo Prerequisites Guide.

Uninstall Guides

The following is a short version of the procedure to remove a previously installed version of Wallaroo. For full details, see How to Uninstall Wallaroo. These instructions assume administrative access to the cluster through the Kubernetes command kubectl.

To uninstall a previously installed Wallaroo instance:

  1. Delete any Wallaroo pipelines that are still deployed with the command kubectl delete namespace {namespace}. Pipeline namespaces are typically the pipeline name followed by a numerical ID. For example, in the following list of namespaces, the namespace ccfraud-pipeline-21 corresponds to the Wallaroo pipeline ccfraud-pipeline. Verify these are Wallaroo pipelines before deleting them.

      -> kubectl get namespaces
        NAME                  STATUS   AGE
        default               Active   7d4h
        kube-node-lease       Active   7d4h
        kube-public           Active   7d4h
        ccfraud-pipeline-21   Active   4h23m
        wallaroo              Active   3d6h
    
      -> kubectl delete namespaces ccfraud-pipeline-21
    
  2. Use the following bash script or run the commands individually. Warning: If the selector is incorrect or missing from the kubectl command, the cluster could be damaged beyond repair. For a default installation, the selector and namespace will be wallaroo.

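    #!/bin/bash
    # kubectl uses only the last --selector if the flag is repeated,
    # so each Wallaroo label selector gets its own delete command.
    kubectl delete ns wallaroo && \
    kubectl delete all,secret,configmap,clusterroles,clusterrolebindings,storageclass,crd \
      --selector app.kubernetes.io/part-of=wallaroo && \
    kubectl delete all,secret,configmap,clusterroles,clusterrolebindings,storageclass,crd \
      --selector kots.io/app-slug=wallaroo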
    

Wallaroo can now be reinstalled into this environment.
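For step 1 above, the following sketch narrows kubectl get namespaces output to likely pipeline namespaces. It assumes, based only on the ccfraud-pipeline-21 example, that pipeline namespaces end in a hyphen and a numeric ID; the function name list_pipeline_namespaces and the protected-namespace list are hypothetical. It only prints candidates and deletes nothing, so each name still needs to be verified before running kubectl delete namespace.

```shell
#!/bin/bash
# Hypothetical helper: list candidate pipeline namespaces.
# ASSUMPTION: pipeline namespaces end in "-<number>" (e.g. ccfraud-pipeline-21).
# This is a heuristic only; confirm every name before deleting it.

# Namespaces that are never treated as pipeline namespaces.
PROTECTED_NAMESPACES="default kube-system kube-public kube-node-lease wallaroo"

# Reads `kubectl get namespaces` output on stdin and prints candidate names.
list_pipeline_namespaces() {
  awk -v protected="$PROTECTED_NAMESPACES" '
    BEGIN { n = split(protected, p, " "); for (i = 1; i <= n; i++) skip[p[i]] = 1 }
    NR == 1 && $1 == "NAME" { next }              # skip the header row
    $1 ~ /-[0-9]+$/ && !($1 in skip) { print $1 }  # name ends in -<digits>
  '
}

# Example usage (review the output before deleting anything):
#   kubectl get namespaces | list_pipeline_namespaces
```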

Environment Setup Guides

  • AWS Cluster for Wallaroo Enterprise Instructions

The following instructions are intended to help users set up their Amazon Web Services (AWS) environment for running Wallaroo Enterprise using AWS Elastic Kubernetes Service (EKS).

These represent a recommended setup, but can be modified to fit your specific needs.

  • AWS Prerequisites

To install Wallaroo in your AWS environment based on these instructions, the following prerequisites must be met:

  • Register an AWS account: https://aws.amazon.com/ and assign the proper permissions according to your organization’s needs.
  • The Kubernetes cluster must include the following minimum settings:
    • Nodes must be OS type Linux, using the containerd driver.
    • Role-based access control (RBAC) must be enabled.
    • Minimum of 4 nodes, each node with a minimum of 8 CPU cores and 16 GB RAM. 50 GB will be allocated per node for a total of 625 GB for the entire cluster.
    • Recommended AWS machine type: c5.4xlarge. For more information, see the AWS Instance Types.
  • eksctl version 0.101.0 or later installed.
  • If the cluster will utilize autoscaling, install the Cluster Autoscaler on AWS.
  • IMPORTANT NOTE

    Organizations that intend to stop and restart their Kubernetes environment on an intentional or regular basis are recommended to use a single availability zone for their nodes. This minimizes issues such as persistent volumes in different availability zones, etc.

    Organizations that intend to use Wallaroo Enterprise in a high availability cluster are encouraged to follow best practices including using separate availability zones for redundancy, etc.
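Before starting the AWS steps, a short preflight can confirm that the required command line tools are installed and that eksctl meets the 0.101.0 minimum. This is a sketch: require_tools and eksctl_version_ok are hypothetical helper names, and the version check assumes that eksctl version prints a bare version string.

```shell
#!/bin/bash
# Hypothetical preflight for the AWS steps: confirm eksctl and kubectl are
# on the PATH and that eksctl meets the 0.101.0 minimum listed above.

# require_tools TOOL...: prints each missing tool; fails if any are missing.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# eksctl_version_ok VERSION: succeeds when VERSION >= 0.101.0.
# Uses `sort -V`; if 0.101.0 sorts first (or is equal), VERSION is new enough.
eksctl_version_ok() {
  local lowest
  lowest=$(printf '%s\n%s\n' "0.101.0" "${1#v}" | sort -V | head -n 1)
  [ "$lowest" = "0.101.0" ]
}

# Example usage (assumes `eksctl version` prints only the version):
#   require_tools eksctl kubectl && eksctl_version_ok "$(eksctl version)"
```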

  • AWS Environment Setup Steps

The following steps are guidelines to assist new users in setting up their AWS environment for Wallaroo. Feel free to replace these commands with ones that match your needs.

These commands use the command line tool eksctl, which streamlines creating Amazon Elastic Kubernetes Service clusters for the Wallaroo environment.

The following are used for the example commands below. Replace them with your specific environment settings:

  • AWS Cluster Name: wallarooAWS

  • Create an AWS EKS Cluster

The following eksctl configuration file is an example of setting up the AWS environment for a Wallaroo cluster, including the static and adaptive nodepools. Adjust these names and settings based on your organization's requirements.

This sample YAML file can be downloaded from here: wallaroo_enterprise_aws_install.yaml

Or copied from here:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: wallarooAWS
  region: us-east-1
  version: "1.25"

addons:
  - name: aws-ebs-csi-driver

iam:
  withOIDC: true
  serviceAccounts:
  - metadata:
      name: cluster-autoscaler
      namespace: kube-system
      labels: {aws-usage: "cluster-ops"}
    wellKnownPolicies:
      autoScaler: true
    roleName: eksctl-cluster-autoscaler-role

nodeGroups:
  - name: mainpool
    instanceType: m5.2xlarge
    desiredCapacity: 3
    containerRuntime: containerd
    amiFamily: AmazonLinux2
    availabilityZones:
      - us-east-1a
  - name: postgres
    instanceType: m5.2xlarge
    desiredCapacity: 1
    taints:
      - key: wallaroo.ai/postgres
        value: "true"
        effect: NoSchedule
    containerRuntime: containerd
    amiFamily: AmazonLinux2
    availabilityZones:
      - us-east-1a
  - name: engine-lb
    instanceType: c5.4xlarge
    minSize: 1
    maxSize: 3
    taints:
      - key: wallaroo.ai/enginelb
        value: "true"
        effect: NoSchedule
    tags:
      k8s.io/cluster-autoscaler/node-template/label/k8s.dask.org/node-purpose: engine-lb
      k8s.io/cluster-autoscaler/node-template/taint/k8s.dask.org/dedicated: "true:NoSchedule"
    iam:
      withAddonPolicies:
        autoScaler: true
    containerRuntime: containerd
    amiFamily: AmazonLinux2
    availabilityZones:
      - us-east-1a
  - name: engine
    instanceType: c5.2xlarge
    minSize: 1
    maxSize: 3
    taints:
      - key: wallaroo.ai/engine
        value: "true"
        effect: NoSchedule
    tags:
      k8s.io/cluster-autoscaler/node-template/label/k8s.dask.org/node-purpose: engine
      k8s.io/cluster-autoscaler/node-template/taint/k8s.dask.org/dedicated: "true:NoSchedule"
    iam:
      withAddonPolicies:
        autoScaler: true
    containerRuntime: containerd
    amiFamily: AmazonLinux2
    availabilityZones:
      - us-east-1a
  • Create the Cluster

Create the cluster with the following command, which creates the environment and sets the correct Kubernetes version.

eksctl create cluster -f wallaroo_enterprise_aws_install.yaml

During this process, the Kubernetes credentials are copied into the local environment. To verify the setup is complete, use the kubectl get nodes command to display the available nodes. Output similar to the following is displayed; node names, region, and versions will vary:

kubectl get nodes
NAME                                           STATUS   ROLES    AGE     VERSION
ip-192-168-21-253.us-east-2.compute.internal   Ready    <none>   13m     v1.23.8-eks-9017834
ip-192-168-30-36.us-east-2.compute.internal    Ready    <none>   13m     v1.23.8-eks-9017834
ip-192-168-38-31.us-east-2.compute.internal    Ready    <none>   9m46s   v1.23.8-eks-9017834
ip-192-168-55-123.us-east-2.compute.internal   Ready    <none>   12m     v1.23.8-eks-9017834
ip-192-168-79-70.us-east-2.compute.internal    Ready    <none>   13m     v1.23.8-eks-9017834
ip-192-168-37-222.us-east-2.compute.internal   Ready    <none>   13m     v1.23.8-eks-9017834
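Rather than reading the node list by eye, the following sketch counts Ready nodes and checks that the Wallaroo taints from the eksctl configuration are present. With that configuration, six nodes would be expected initially (3 mainpool, 1 postgres, and the minimum of 1 each for engine-lb and engine), matching the sample output above; autoscaling can change this. The helper names count_ready_nodes and missing_taints are hypothetical, and both functions read kubectl output from standard input so they can be tried on saved output first.

```shell
#!/bin/bash
# Hypothetical post-create checks; both read kubectl output on stdin.

# count_ready_nodes: counts rows of `kubectl get nodes` whose STATUS is
# exactly "Ready". NotReady and cordoned ("Ready,SchedulingDisabled")
# nodes are not counted.
count_ready_nodes() {
  awk 'NR > 1 && $2 == "Ready" { n++ } END { print n + 0 }'
}

# missing_taints: reads NAME/TAINTS columns (taint keys comma-separated)
# and prints every expected Wallaroo taint key that no node carries.
missing_taints() {
  awk -v expected="wallaroo.ai/postgres wallaroo.ai/enginelb wallaroo.ai/engine" '
    NR > 1 {
      n = split($2, keys, ",")
      for (i = 1; i <= n; i++) seen[keys[i]] = 1
    }
    END {
      m = split(expected, want, " ")
      for (i = 1; i <= m; i++) if (!(want[i] in seen)) print want[i]
    }'
}

# Example usage against a live cluster:
#   kubectl get nodes | count_ready_nodes
#   kubectl get nodes \
#     -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key' \
#     | missing_taints
```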
  • Azure Cluster for Wallaroo Enterprise Instructions

The following instructions are intended to help users set up their Microsoft Azure Kubernetes environment for running Wallaroo Enterprise. These represent a recommended setup, but can be modified to fit your specific needs.

If you're ready to set up the environment now, skip to Setup Environment Steps.

There are two methods detailed here for setting up your Kubernetes cloud environment in Azure:

  • Quick Setup Script: Download a bash script to automatically set up the Azure environment through the Microsoft Azure command line interface az.

  • Manual Setup Guide: A list of the az commands used to create the environment through manual commands.

    • Azure Prerequisites

    To install Wallaroo in your Microsoft Azure environment, the following prerequisites must be met:

    • Register a Microsoft Azure account: https://azure.microsoft.com/.
    • Install the Microsoft Azure CLI and complete the Azure CLI Get Started Guide to connect your az application to your Microsoft Azure account.
    • The Kubernetes cluster must include the following minimum settings:
      • Nodes must be OS type Linux, with containerd as the default driver.
      • Role-based access control (RBAC) must be enabled.
      • Minimum of 4 nodes, each node with a minimum of 8 CPU cores and 16 GB RAM. 50 GB will be allocated per node for a total of 625 GB for the entire cluster.
      • Minimum machine type is Standard_D8s_v4.
    • IMPORTANT NOTE

      Organizations that intend to stop and restart their Kubernetes environment on an intentional or regular basis are recommended to use a single availability zone for their nodes. This minimizes issues such as persistent volumes in different availability zones, etc.

      Organizations that intend to use Wallaroo Enterprise in a high availability cluster are encouraged to follow best practices including using separate availability zones for redundancy, etc.

    • Standard Setup Variables

    The following variables are used in the Quick Setup Script and the Manual Setup Guide detailed below. Modify them as best fits your organization.

    Variable Name                 Default Value      Description
    WALLAROO_RESOURCE_GROUP       wallaroogroup      The Azure Resource Group used for the Kubernetes environment.
    WALLAROO_GROUP_LOCATION       eastus             The region that the Kubernetes environment will be installed to.
    WALLAROO_CONTAINER_REGISTRY   wallarooacr        The Azure Container Registry used for the Kubernetes environment.
    WALLAROO_CLUSTER              wallarooaks        The name of the Kubernetes cluster that Wallaroo is installed to.
    WALLAROO_SKU_TYPE             Base               The Azure Container Registry SKU type.
    WALLAROO_VM_SIZE              Standard_D8s_v4    The VM type used for the standard Wallaroo cluster nodes.
    POSTGRES_VM_SIZE              Standard_D8s_v4    The VM type used for the postgres nodepool.
    ENGINELB_VM_SIZE              Standard_D8s_v4    The VM type used for the engine-lb nodepool.
    ENGINE_VM_SIZE                Standard_F8s_v2    The VM type used for the engine nodepool.
    • Setup Environment Steps

    • Quick Setup Script

    A sample script is available that creates an Azure Kubernetes environment ready for use with Wallaroo Enterprise. This script requires the prerequisites listed above and uses the variables listed in Standard Setup Variables. Modify them as best fits your organization's needs.

    The following script is available for download: wallaroo_enterprise_azure_expandable.bash

    The following steps are geared towards a standard Linux or macOS system that supports the prerequisites listed above. Modify these steps based on your local environment.

    1. Download the script above.
    2. In a terminal window, make the script executable with the command chmod +x wallaroo_enterprise_azure_expandable.bash.
    3. Modify the script variables listed above based on your requirements.
    4. Run the script with either bash wallaroo_enterprise_azure_expandable.bash or ./wallaroo_enterprise_azure_expandable.bash from the same directory as the script.
    • Manual Setup Guide

    The following steps are guidelines to assist new users in setting up their Azure environment for Wallaroo.
    The process uses the variables listed in Standard Setup Variables. Modify them as best fits your organization’s needs.

    See the Azure Command-Line Interface for full details on commands and settings.

    Setting up an Azure AKS environment is based on the Azure Kubernetes Service tutorial, streamlined to show the minimum steps in setting up your own Wallaroo environment in Azure.

    This follows these major steps:

    • Set Variables

    The following are the variables used for the rest of the commands. Modify them as fits your organization’s needs.

    WALLAROO_RESOURCE_GROUP=wallaroogroup
    WALLAROO_GROUP_LOCATION=eastus
    WALLAROO_CONTAINER_REGISTRY=wallarooacr
    WALLAROO_CLUSTER=wallarooaks
    WALLAROO_SKU_TYPE=Base
    WALLAROO_VM_SIZE=Standard_D8s_v4
    POSTGRES_VM_SIZE=Standard_D8s_v4
    ENGINELB_VM_SIZE=Standard_D8s_v4
    ENGINE_VM_SIZE=Standard_F8s_v2
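Because every later az command depends on these variables, an empty value can produce a confusing error further along. A minimal guard, using the hypothetical helper name require_vars, could look like this:

```shell
#!/bin/bash
# Hypothetical guard: report any setup variable that is unset or empty, so a
# later az command does not run with a blank --name or --resource-group.

# require_vars NAME...: prints each unset/empty variable; fails if any.
require_vars() {
  local name value missing=0
  for name in "$@"; do
    # Reject anything that is not a valid shell variable name before eval.
    case "$name" in
      ""|[0-9]*|*[![:alnum:]_]*) echo "invalid name: $name"; missing=1; continue ;;
    esac
    eval "value=\${$name-}"   # indirect lookup of the variable's value
    if [ -z "$value" ]; then
      echo "unset: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage after setting the variables above:
#   require_vars WALLAROO_RESOURCE_GROUP WALLAROO_GROUP_LOCATION \
#     WALLAROO_CONTAINER_REGISTRY WALLAROO_CLUSTER WALLAROO_SKU_TYPE \
#     WALLAROO_VM_SIZE POSTGRES_VM_SIZE ENGINELB_VM_SIZE ENGINE_VM_SIZE || exit 1
```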
    
    • Create an Azure Resource Group

    To create an Azure Resource Group for Wallaroo in Microsoft Azure, use the following template:

    az group create --name $WALLAROO_RESOURCE_GROUP --location $WALLAROO_GROUP_LOCATION
    

    (Optional): Set the default Resource Group to the one recently created. This allows other Azure commands to automatically select this group for commands such as az aks list, etc.

    az configure --defaults group={Resource Group Name}
    

    For example:

    az configure --defaults group=wallaroogroup
    
    • Create an Azure Container Registry

    An Azure Container Registry (ACR) manages the container images for services, including Kubernetes. The template for setting up an Azure ACR that supports Wallaroo is the following:

    az acr create -n $WALLAROO_CONTAINER_REGISTRY \
    -g $WALLAROO_RESOURCE_GROUP \
    --sku $WALLAROO_SKU_TYPE \
    --location $WALLAROO_GROUP_LOCATION
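Azure requires container registry names to be 5 to 50 alphanumeric characters and globally unique, so a value such as wallaroo-acr would be rejected. The following sketch checks only the format locally (the function name valid_acr_name is hypothetical); uniqueness can then be checked with az acr check-name --name $WALLAROO_CONTAINER_REGISTRY.

```shell
#!/bin/bash
# Hypothetical local format check for an Azure Container Registry name:
# alphanumeric characters only, 5 to 50 characters long.
# Global uniqueness still has to be confirmed with Azure itself.

# valid_acr_name NAME: succeeds when NAME has an acceptable ACR format.
valid_acr_name() {
  case "$1" in
    *[![:alnum:]]*) return 1 ;;   # hyphens, underscores, dots are rejected
  esac
  [ "${#1}" -ge 5 ] && [ "${#1}" -le 50 ]
}

# Example usage:
#   valid_acr_name "$WALLAROO_CONTAINER_REGISTRY" || echo "invalid ACR name"
```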
    
    • Create an Azure Kubernetes Service Cluster

    Now create the Kubernetes cluster in Azure that will host Wallaroo with the az aks create command.

    az aks create \
    --resource-group $WALLAROO_RESOURCE_GROUP \
    --name $WALLAROO_CLUSTER \
    --node-count 3 \
    --generate-ssh-keys \
    --vm-set-type VirtualMachineScaleSets \
    --load-balancer-sku standard \
    --node-vm-size $WALLAROO_VM_SIZE \
    --nodepool-name mainpool \
    --attach-acr $WALLAROO_CONTAINER_REGISTRY \
    --kubernetes-version=1.23.15 \
    --zones 1 \
    --location $WALLAROO_GROUP_LOCATION
    
    • Wallaroo Enterprise Nodepools

    Wallaroo Enterprise supports autoscaling and static nodepools. The following commands are used to create both to support the Wallaroo Enterprise cluster.

    The following static nodepool is set up to support the Wallaroo cluster for postgres. Update POSTGRES_VM_SIZE based on your requirements.

    az aks nodepool add \
    --resource-group $WALLAROO_RESOURCE_GROUP \
    --cluster-name $WALLAROO_CLUSTER \
    --name postgres \
    --node-count 1 \
    --node-vm-size $POSTGRES_VM_SIZE \
    --no-wait \
    --node-taints wallaroo.ai/postgres=true:NoSchedule \
    --zones 1
    

    The following commands create the autoscaling enginelb and engine nodepools. Adjust the settings based on your organization's requirements.

    az aks nodepool add \
    --resource-group $WALLAROO_RESOURCE_GROUP \
    --cluster-name $WALLAROO_CLUSTER \
    --name enginelb \
    --node-count 1 \
    --node-vm-size $ENGINELB_VM_SIZE \
    --no-wait \
    --enable-cluster-autoscaler \
    --max-count 3 \
    --min-count 1 \
    --node-taints wallaroo.ai/enginelb=true:NoSchedule \
    --labels wallaroo-node-type=enginelb \
    --zones 1
    
    az aks nodepool add \
    --resource-group $WALLAROO_RESOURCE_GROUP \
    --cluster-name $WALLAROO_CLUSTER \
    --name engine \
    --node-count 1 \
    --node-vm-size $ENGINE_VM_SIZE \
    --no-wait \
    --enable-cluster-autoscaler \
    --max-count 3 \
    --min-count 1 \
    --node-taints wallaroo.ai/engine=true:NoSchedule \
    --labels wallaroo-node-type=engine \
    --zones 1
    

    For additional settings, such as customizing the virtual machine types used by the node pools in your Wallaroo Kubernetes cluster, see the Microsoft Azure documentation on using system node pools.
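AKS requires Linux node pool names to start with a lowercase letter, contain only lowercase letters and digits, and be at most 12 characters long. A name such as engine-lb, used for the AWS node group, would therefore be rejected, while enginelb is accepted. The following hypothetical helper, valid_aks_nodepool_name, checks a name before running az aks nodepool add:

```shell
#!/bin/bash
# Hypothetical check of an AKS Linux node pool name: lowercase letters and
# digits only, starting with a lowercase letter, at most 12 characters.

# valid_aks_nodepool_name NAME: succeeds when NAME follows those rules.
valid_aks_nodepool_name() {
  case "$1" in
    ""|*[![:lower:][:digit:]]*) return 1 ;;  # lowercase letters and digits only
    [![:lower:]]*) return 1 ;;               # must start with a lowercase letter
  esac
  [ "${#1}" -le 12 ]                         # Linux node pool length limit
}

# Example usage:
#   valid_aks_nodepool_name enginelb && echo "ok"
```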

    • Download Wallaroo Kubernetes Configuration

    Once the Kubernetes environment is complete, associate it with the local Kubernetes configuration by importing the credentials through the following template command:

    az aks get-credentials --resource-group $WALLAROO_RESOURCE_GROUP --name $WALLAROO_CLUSTER
    

    Verify the cluster is available through the kubectl get nodes command.

    kubectl get nodes
    
    NAME                               STATUS   ROLES   AGE   VERSION
    aks-engine-99896855-vmss000000     Ready    agent   40m   v1.23.8
    aks-enginelb-54433467-vmss000000   Ready    agent   48m   v1.23.8
    aks-mainpool-37402055-vmss000000   Ready    agent   81m   v1.23.8
    aks-mainpool-37402055-vmss000001   Ready    agent   81m   v1.23.8
    aks-mainpool-37402055-vmss000002   Ready    agent   81m   v1.23.8
    aks-postgres-40215394-vmss000000   Ready    agent   52m   v1.23.8
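To confirm that each node pool came up with the expected number of nodes (3 mainpool and 1 each for postgres, enginelb, and engine with the commands above), the node names can be grouped by pool. This sketch assumes AKS node names follow the aks-<pool>-<id>-vmss<index> pattern shown in the sample output; because AKS node pool names cannot contain hyphens, the second hyphen-separated field is the pool name. The function name nodes_per_pool is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count nodes per AKS node pool from `kubectl get nodes`
# output on stdin. Counts every listed node regardless of its STATUS.
# ASSUMPTION: node names look like aks-<pool>-<id>-vmss<index>.
nodes_per_pool() {
  awk 'NR > 1 && $1 ~ /^aks-/ {
    split($1, part, "-")   # part[1]="aks", part[2]=<pool>
    count[part[2]]++
  }
  END { for (p in count) print p, count[p] }' | LC_ALL=C sort
}

# Example usage:
#   kubectl get nodes | nodes_per_pool
```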
    

    The following instructions are intended to help users set up their Google Cloud Platform (GCP) Kubernetes environment for running Wallaroo. These represent a recommended setup, but can be modified to fit your specific needs. In particular, these instructions provision a GKE cluster with 56 CPUs in total, so ensure that your project's resource quotas allow for that.

    • Quick Setup Script: Download a bash script to automatically set up the GCP environment through the Google Cloud Platform command line interface gcloud.

    • Manual Setup Guide: A list of the gcloud commands used to create the environment through manual commands.

      • GCP Prerequisites

      Organizations that wish to run Wallaroo in their Google Cloud Platform environment must complete the following prerequisites:

      • IMPORTANT NOTE

        Organizations that intend to stop and restart their Kubernetes environment on an intentional or regular basis are recommended to use a single availability zone for their nodes. This minimizes issues such as persistent volumes in different availability zones, etc.

        Organizations that intend to use Wallaroo Enterprise in a high availability cluster are encouraged to follow best practices including using separate availability zones for redundancy, etc.

      • Standard Setup Variables

      The following variables are used in the Quick Setup Script and the Manual Setup Guide. Modify them as best fits your organization.

      Variable Name                  Default Value       Description
      WALLAROO_GCP_PROJECT           wallaroo            The name of the Google Project used for the Wallaroo instance.
      WALLAROO_CLUSTER               wallaroo            The name of the Kubernetes cluster for the Wallaroo instance.
      WALLAROO_GCP_REGION            us-central1         The region the Kubernetes environment is installed to. Update this to your GCP Compute Engine region.
      WALLAROO_NODE_LOCATION         us-central1-f       The location the Kubernetes nodes are installed to. Update this to your GCP Compute Engine zone.
      WALLAROO_GCP_NETWORK_NAME      wallaroo-network    The Google network used with the Kubernetes environment.
      WALLAROO_GCP_SUBNETWORK_NAME   wallaroo-subnet-1   The Google network subnet used with the Kubernetes environment.
      DEFAULT_VM_SIZE                e2-standard-8       The VM type used for the default nodepool.
      POSTGRES_VM_SIZE               n2-standard-8       The VM type used for the postgres nodepool.
      ENGINELB_VM_SIZE               c2-standard-8       The VM type used for the engine-lb nodepool.
      ENGINE_VM_SIZE                 c2-standard-8       The VM type used for the engine nodepool.
      • Quick Setup Script

      A sample script is available that creates a Google Kubernetes Engine cluster ready for use with Wallaroo Enterprise. This script requires the prerequisites listed above and uses the variables as listed in Standard Setup Variables.

      The following script is available for download: wallaroo_enterprise_gcp_expandable.bash

      The following steps are geared towards a standard Linux or macOS system that supports the prerequisites listed above. Modify these steps based on your local environment.

      1. Download the script above.
      2. In a terminal window, make the script executable with the command chmod +x wallaroo_enterprise_gcp_expandable.bash.
      3. Modify the script variables listed above based on your requirements.
      4. Run the script with either bash wallaroo_enterprise_gcp_expandable.bash or ./wallaroo_enterprise_gcp_expandable.bash from the same directory as the script.
      • Set Variables

      The following are the variables used in the environment setup process. Modify them as best fits your organization’s needs.

      WALLAROO_GCP_PROJECT=wallaroo
      WALLAROO_CLUSTER=wallaroo
      WALLAROO_GCP_REGION=us-central1
      WALLAROO_NODE_LOCATION=us-central1-f
      WALLAROO_GCP_NETWORK_NAME=wallaroo-network
      WALLAROO_GCP_SUBNETWORK_NAME=wallaroo-subnet-1
      DEFAULT_VM_SIZE=n2-standard-8
      POSTGRES_VM_SIZE=n2-standard-8
      ENGINELB_VM_SIZE=c2-standard-8
      ENGINE_VM_SIZE=c2-standard-8
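Because WALLAROO_NODE_LOCATION must be a zone inside WALLAROO_GCP_REGION, a mismatch between the two is an easy mistake when changing the defaults. In Compute Engine, a zone name is the region name followed by a hyphen and a zone letter, so a minimal consistency check (using the hypothetical helper zone_in_region) can compare the prefix:

```shell
#!/bin/bash
# Hypothetical consistency check for the GCP variables: a Compute Engine
# zone name is its region name plus "-<zone letter>", e.g. us-central1-f
# belongs to us-central1.

# zone_in_region ZONE REGION: succeeds when ZONE belongs to REGION.
zone_in_region() {
  # ${1%-*} strips the final "-<suffix>" from the zone name.
  [ -n "$1" ] && [ "${1%-*}" = "$2" ]
}

# Example usage with the variables above:
#   zone_in_region "$WALLAROO_NODE_LOCATION" "$WALLAROO_GCP_REGION" \
#     || echo "WALLAROO_NODE_LOCATION is not in WALLAROO_GCP_REGION"
```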
      
      • Manual Setup Guide

      The following steps are guidelines to assist new users in setting up their GCP environment for Wallaroo. The variables used in the commands are as listed in Standard Setup Variables listed above. Feel free to replace these with ones that match your needs.

      See the Google Cloud SDK for full details on commands and settings.

      • Create a GCP Network

      First create a GCP network that is used to connect to the cluster with the gcloud compute networks create command. For more information, see the gcloud compute networks create page.

      gcloud compute networks \