Wallaroo Enterprise Comprehensive Install Guide

How to set up Wallaroo Enterprise, environments, and other configurations.

This guide is targeted at system administrators and data scientists who want the easiest, fastest, and most comprehensive method of running their own machine learning models.

A typical installation of Wallaroo follows this process:

| Step | Description | Average Setup Time |
| --- | --- | --- |
| Setup Environment | Create an environment that meets the Wallaroo prerequisites. | 30 minutes |
| Install Wallaroo | Install Wallaroo into a prepared environment. | 15 minutes |
| Configure Wallaroo | Update Wallaroo with required post-install configurations. | Variable |

Some knowledge of the following will be useful in working with this guide:

  • Working knowledge of Linux distributions, particularly Ubuntu.
  • Experience with a cloud provider such as Google Cloud Platform (GCP), Amazon Web Services (AWS), or Microsoft Azure.
  • Working knowledge of Kubernetes, mainly kubectl and kots or helm.

For additional details, Contact Us.

The following software or runtimes are required for Wallaroo 2023.2.1. Most are automatically available through the supported cloud providers.

| Software or Runtime | Description | Minimum Supported Version | Preferred Version(s) |
| --- | --- | --- | --- |
| Kubernetes | Cluster deployment management | 1.23 | 1.25 |
| containerd | Container management | 1.7.0 | 1.7.0 |
| kubectl | Kubernetes administrative console application | 1.26 | 1.26 |
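
To confirm an existing cluster meets these versions before installing, a quick check from an administrative workstation (a sketch; output formats vary slightly between kubectl releases):

    kubectl version --output=yaml      # client and server versions
    kubectl get nodes -o wide          # the CONTAINER-RUNTIME column should report containerd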

Custom Configurations

Wallaroo can be configured with custom installation options depending on your organization’s needs.


Environment Setup Guides

The following setup guides are used to set up the environment that will host the Wallaroo instance. Verify that the environment is prepared and meets the requirements in the Wallaroo Prerequisites Guide.

Uninstall Guides

The following is a short version of the uninstallation procedure to remove a previously installed version of Wallaroo. For full details, see the How to Uninstall Wallaroo guide. These instructions assume administrative use of the Kubernetes command kubectl.

To uninstall a previously installed Wallaroo instance:

  1. Delete any Wallaroo pipelines still deployed with the command kubectl delete namespace {namespace}. Typically these namespaces are the pipeline name followed by a numerical ID. For example, in the following list of namespaces the namespace ccfraud-pipeline-21 corresponds to the Wallaroo pipeline ccfraud-pipeline. Verify these are Wallaroo pipelines before deleting.

      -> kubectl get namespaces
        NAME                  STATUS   AGE
        default               Active   7d4h
        kube-node-lease       Active   7d4h
        kube-public           Active   7d4h
        ccfraud-pipeline-21   Active   4h23m
        wallaroo              Active   3d6h

      -> kubectl delete namespaces ccfraud-pipeline-21
    
  2. Use the following bash script or run the commands individually. Warning: If the selector is incorrect or missing from the kubectl command, the cluster could be damaged beyond repair. For a default installation, the selector and namespace will be wallaroo.

    #!/bin/bash
    kubectl delete ns wallaroo && \
    kubectl delete all,secret,configmap,clusterroles,clusterrolebindings,storageclass,crd \
    --selector app.kubernetes.io/part-of=wallaroo --selector kots.io/app-slug=wallaroo
    

Wallaroo can now be reinstalled into this environment.
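
Before reinstalling, you can optionally confirm that no labeled cluster resources remain; a quick check, assuming the default kots.io/app-slug=wallaroo label used above:

    kubectl get namespaces
    kubectl get crd,clusterroles,clusterrolebindings --selector kots.io/app-slug=wallaroo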

Environment Setup Guides

  • AWS Cluster for Wallaroo Enterprise Instructions

The following instructions assist users in setting up their Amazon Web Services (AWS) environment for running Wallaroo Enterprise using AWS Elastic Kubernetes Service (EKS).

These represent a recommended setup, but can be modified to fit your specific needs.

  • AWS Prerequisites

To install Wallaroo in your AWS environment based on these instructions, the following prerequisites must be met:

  • Register an AWS account: https://aws.amazon.com/ and assign the proper permissions according to your organization’s needs.
  • The Kubernetes cluster must include the following minimum settings:
    • Nodes must be OS type Linux and use the containerd driver.
    • Role-based access control (RBAC) must be enabled.
    • Minimum of 4 nodes, each node with a minimum of 8 CPU cores and 16 GB RAM. 50 GB of storage will be allocated per node, with a total of 625 GB allocated for the entire cluster.
    • Recommended AWS machine type: c5.4xlarge. For more information, see AWS Instance Types.
  • Install eksctl version 0.101.0 or above.
  • If the cluster will utilize autoscaling, install the Cluster Autoscaler on AWS. A minimal Helm-based sketch follows this list.
  • IMPORTANT NOTE

    Organizations that intend to stop and restart their Kubernetes environment on an intentional or regular basis are recommended to use a single availability zone for their nodes. This minimizes issues such as persistent volumes in different availability zones, etc.

    Organizations that intend to use Wallaroo Enterprise in a high availability cluster are encouraged to follow best practices including using separate availability zones for redundancy, etc.
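
If autoscaling will be used, one common approach is the community Cluster Autoscaler Helm chart. The following is a minimal sketch only, not the official Wallaroo procedure; it assumes the cluster name wallarooAWS, the us-east-1 region, and the cluster-autoscaler IAM service account created by the example eksctl configuration below. Adjust or replace it with your organization's preferred autoscaler deployment.

    # Add the Kubernetes autoscaler Helm repository and install the chart into kube-system.
    helm repo add autoscaler https://kubernetes.github.io/autoscaler
    helm repo update

    # The service account is created by eksctl (see the iam section of the example
    # ClusterConfig), so the chart is told not to create its own.
    helm install cluster-autoscaler autoscaler/cluster-autoscaler \
      --namespace kube-system \
      --set autoDiscovery.clusterName=wallarooAWS \
      --set awsRegion=us-east-1 \
      --set rbac.serviceAccount.create=false \
      --set rbac.serviceAccount.name=cluster-autoscaler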

  • AWS Environment Setup Steps

The following steps are guidelines to assist new users in setting up their AWS environment for Wallaroo. Feel free to replace these commands with ones that match your needs.

These commands make use of the command line tool eksctl, which streamlines the process of creating Amazon Elastic Kubernetes Service (EKS) clusters for the Wallaroo environment.

The following are used for the example commands below. Replace them with your specific environment settings:

  • AWS Cluster Name: wallarooAWS

  • Create an AWS EKS Cluster

The following eksctl configuration file is an example of setting up the AWS environment for a Wallaroo cluster, including the static and adaptive nodepools. Adjust these names and settings based on your organization’s requirements.

This sample YAML file can be downloaded from here: wallaroo_enterprise_aws_install.yaml

Or copied from here:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: wallarooAWS
  region: us-east-1
  version: "1.25"

addons:
  - name: aws-ebs-csi-driver

iam:
  withOIDC: true
  serviceAccounts:
  - metadata:
      name: cluster-autoscaler
      namespace: kube-system
      labels: {aws-usage: "cluster-ops"}
    wellKnownPolicies:
      autoScaler: true
    roleName: eksctl-cluster-autoscaler-role

nodeGroups:
  - name: mainpool
    instanceType: m5.2xlarge
    desiredCapacity: 3
    containerRuntime: containerd
    amiFamily: AmazonLinux2
    availabilityZones:
      - us-east-1a
  - name: postgres
    instanceType: m5.2xlarge
    desiredCapacity: 1
    taints:
      - key: wallaroo.ai/postgres
        value: "true"
        effect: NoSchedule
    containerRuntime: containerd
    amiFamily: AmazonLinux2
    availabilityZones:
      - us-east-1a
  - name: engine-lb
    instanceType: c5.4xlarge
    minSize: 1
    maxSize: 3
    taints:
      - key: wallaroo.ai/enginelb
        value: "true"
        effect: NoSchedule
    tags:
      k8s.io/cluster-autoscaler/node-template/label/k8s.dask.org/node-purpose: engine-lb
      k8s.io/cluster-autoscaler/node-template/taint/k8s.dask.org/dedicated: "true:NoSchedule"
    iam:
      withAddonPolicies:
        autoScaler: true
    containerRuntime: containerd
    amiFamily: AmazonLinux2
    availabilityZones:
      - us-east-1a
  - name: engine
    instanceType: c5.2xlarge
    minSize: 1
    maxSize: 3
    taints:
      - key: wallaroo.ai/engine
        value: "true"
        effect: NoSchedule
    tags:
      k8s.io/cluster-autoscaler/node-template/label/k8s.dask.org/node-purpose: engine
      k8s.io/cluster-autoscaler/node-template/taint/k8s.dask.org/dedicated: "true:NoSchedule"
    iam:
      withAddonPolicies:
        autoScaler: true
    containerRuntime: containerd
    amiFamily: AmazonLinux2
    availabilityZones:
      - us-east-1a
  • Create the Cluster

Create the cluster with the following command, which creates the environment and sets the correct Kubernetes version.

eksctl create cluster -f wallaroo_enterprise_aws_install.yaml

During the process the Kubernetes credentials will be copied into the local environment. To verify the setup is complete, use the kubectl get nodes command to display the available nodes as in the following example:

kubectl get nodes
NAME                                           STATUS   ROLES    AGE     VERSION
ip-192-168-21-253.us-east-2.compute.internal   Ready    <none>   13m     v1.23.8-eks-9017834
ip-192-168-30-36.us-east-2.compute.internal    Ready    <none>   13m     v1.23.8-eks-9017834
ip-192-168-38-31.us-east-2.compute.internal    Ready    <none>   9m46s   v1.23.8-eks-9017834
ip-192-168-55-123.us-east-2.compute.internal   Ready    <none>   12m     v1.23.8-eks-9017834
ip-192-168-79-70.us-east-2.compute.internal    Ready    <none>   13m     v1.23.8-eks-9017834
ip-192-168-37-222.us-east-2.compute.internal   Ready    <none>   13m     v1.23.8-eks-9017834
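
If the local kubeconfig was not updated automatically, it can be written manually with the AWS CLI; a sketch, assuming the cluster name and region from the example configuration above:

    aws eks update-kubeconfig --region us-east-1 --name wallarooAWS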
  • Azure Cluster for Wallaroo Enterprise Instructions

The following instructions assist users in setting up their Microsoft Azure Kubernetes environment for running Wallaroo Enterprise. These represent a recommended setup, but can be modified to fit your specific needs.

If you're prepared to install the environment now, skip to Setup Environment Steps.

There are two methods detailed here to set up your Kubernetes cloud environment in Azure:

  • Quick Setup Script: Download a bash script to automatically set up the Azure environment through the Microsoft Azure command line interface az.

  • Manual Setup Guide: A list of the az commands used to create the environment through manual commands.

    • Azure Prerequisites

    To install Wallaroo in your Microsoft Azure environment, the following prerequisites must be met:

    • Register a Microsoft Azure account: https://azure.microsoft.com/.
    • Install the Microsoft Azure CLI and complete the Azure CLI Get Started Guide to connect your az application to your Microsoft Azure account.
    • The Kubernetes cluster must include the following minimum settings:
      • Nodes must be OS type Linux with the containerd driver as the default.
      • Role-based access control (RBAC) must be enabled.
      • Minimum of 4 nodes, each node with a minimum of 8 CPU cores and 16 GB RAM. 50 GB of storage will be allocated per node, with a total of 625 GB allocated for the entire cluster.
      • Minimum machine type is set to Standard_D8s_v4.
    • IMPORTANT NOTE

      Organizations that intend to stop and restart their Kubernetes environment on an intentional or regular basis are recommended to use a single availability zone for their nodes. This minimizes issues such as persistent volumes in different availability zones, etc.

      Organizations that intend to use Wallaroo Enterprise in a high availability cluster are encouraged to follow best practices including using separate availability zones for redundancy, etc.

    • Standard Setup Variables

    The following variables are used in the Quick Setup Script and the Manual Setup Guide detailed below. Modify them as best fits your organization.

    | Variable Name | Default Value | Description |
    | --- | --- | --- |
    | WALLAROO_RESOURCE_GROUP | wallaroogroup | The Azure Resource Group used for the Kubernetes environment. |
    | WALLAROO_GROUP_LOCATION | eastus | The region that the Kubernetes environment will be installed to. |
    | WALLAROO_CONTAINER_REGISTRY | wallarooacr | The Azure Container Registry used for the Kubernetes environment. |
    | WALLAROO_CLUSTER | wallarooaks | The name of the Kubernetes cluster that Wallaroo is installed to. |
    | WALLAROO_SKU_TYPE | Base | The Azure Kubernetes Service SKU type. |
    | WALLAROO_VM_SIZE | Standard_D8s_v4 | The VM type used for the standard Wallaroo cluster nodes. |
    | POSTGRES_VM_SIZE | Standard_D8s_v4 | The VM type used for the postgres nodepool. |
    | ENGINELB_VM_SIZE | Standard_D8s_v4 | The VM type used for the engine-lb nodepool. |
    | ENGINE_VM_SIZE | Standard_F8s_v2 | The VM type used for the engine nodepool. |
    • Setup Environment Steps

    • Quick Setup Script

    A sample script is available here, and creates an Azure Kubernetes environment ready for use with Wallaroo Enterprise. This script requires the prerequisites listed above and uses the variables listed in Standard Setup Variables. Modify them as best fits your organization’s needs.

    The following script is available for download: wallaroo_enterprise_azure_expandable.bash

    The following steps are geared towards a standard Linux or macOS system that supports the prerequisites listed above. Modify these steps based on your local environment.

    1. Download the script above.
    2. In a terminal window, set the script as executable with the command chmod +x wallaroo_enterprise_azure_expandable.bash.
    3. Modify the script variables listed above based on your requirements.
    4. Run the script with either bash wallaroo_enterprise_azure_expandable.bash or ./wallaroo_enterprise_azure_expandable.bash from the same directory as the script.
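
    Put together, the commands look like the following, assuming the script was downloaded to the current directory under the filename shown above:

    chmod +x wallaroo_enterprise_azure_expandable.bash
    ./wallaroo_enterprise_azure_expandable.bash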
    • Manual Setup Guide

    The following steps are guidelines to assist new users in setting up their Azure environment for Wallaroo.
    The process uses the variables listed in Standard Setup Variables. Modify them as best fits your organization’s needs.

    See the Azure Command-Line Interface for full details on commands and settings.

    Setting up an Azure AKS environment is based on the Azure Kubernetes Service tutorial, streamlined to show the minimum steps in setting up your own Wallaroo environment in Azure.

    This follows these major steps:

    • Set Variables

    The following are the variables used for the rest of the commands. Modify them as fits your organization’s needs.

    WALLAROO_RESOURCE_GROUP=wallaroogroup
    WALLAROO_GROUP_LOCATION=eastus
    WALLAROO_CONTAINER_REGISTRY=wallarooacr
    WALLAROO_CLUSTER=wallarooaks
    WALLAROO_SKU_TYPE=Base
    WALLAROO_VM_SIZE=Standard_D8s_v4
    POSTGRES_VM_SIZE=Standard_D8s_v4
    ENGINELB_VM_SIZE=Standard_D8s_v4
    ENGINE_VM_SIZE=Standard_F8s_v2
    
    • Create an Azure Resource Group

    To create an Azure Resource Group for Wallaroo in Microsoft Azure, use the following template:

    az group create --name $WALLAROO_RESOURCE_GROUP --location $WALLAROO_GROUP_LOCATION
    

    (Optional): Set the default Resource Group to the one recently created. This allows other Azure commands to automatically select this group for commands such as az aks list, etc.

    az configure --defaults group={Resource Group Name}
    

    For example:

    az configure --defaults group=wallaroogroup
    
    • Create an Azure Container Registry

    An Azure Container Registry (ACR) manages the container images for services including Kubernetes. The template for setting up an ACR that supports Wallaroo is the following:

    az acr create -n $WALLAROO_CONTAINER_REGISTRY \
    -g $WALLAROO_RESOURCE_GROUP \
    --sku $WALLAROO_SKU_TYPE \
    --location $WALLAROO_GROUP_LOCATION
    
    • Create an Azure Kubernetes Service

    Now we can create the Kubernetes service in Azure that will host Wallaroo with the az aks create command.

    az aks create \
    --resource-group $WALLAROO_RESOURCE_GROUP \
    --name $WALLAROO_CLUSTER \
    --node-count 3 \
    --generate-ssh-keys \
    --vm-set-type VirtualMachineScaleSets \
    --load-balancer-sku standard \
    --node-vm-size $WALLAROO_VM_SIZE \
    --nodepool-name mainpool \
    --attach-acr $WALLAROO_CONTAINER_REGISTRY \
    --kubernetes-version=1.23.15 \
    --zones 1 \
    --location $WALLAROO_GROUP_LOCATION
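
    Once the command returns, the cluster's provisioning state can be checked before adding the nodepools; a quick check using the same variables:

    az aks show --resource-group $WALLAROO_RESOURCE_GROUP --name $WALLAROO_CLUSTER --query provisioningState -o tsv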
    
    • Wallaroo Enterprise Nodepools

    Wallaroo Enterprise supports autoscaling and static nodepools. The following commands are used to create both types to support the Wallaroo Enterprise cluster.

    The following static nodepool supports the Wallaroo cluster's postgres service. Update the VM size based on your requirements.

    az aks nodepool add \
    --resource-group $WALLAROO_RESOURCE_GROUP \
    --cluster-name $WALLAROO_CLUSTER \
    --name postgres \
    --node-count 1 \
    --node-vm-size $POSTGRES_VM_SIZE \
    --no-wait \
    --node-taints wallaroo.ai/postgres=true:NoSchedule \
    --zones 1
    

    The following autoscaling nodepools are used for the enginelb and engine nodepools. Adjust the settings based on your organization’s requirements.

    az aks nodepool add \
    --resource-group $WALLAROO_RESOURCE_GROUP \
    --cluster-name $WALLAROO_CLUSTER \
    --name enginelb \
    --node-count 1 \
    --node-vm-size $ENGINELB_VM_SIZE \
    --no-wait \
    --enable-cluster-autoscaler \
    --max-count 3 \
    --min-count 1 \
    --node-taints wallaroo.ai/enginelb=true:NoSchedule \
    --labels wallaroo-node-type=enginelb \
    --zones 1
    
    az aks nodepool add \
    --resource-group $WALLAROO_RESOURCE_GROUP \
    --cluster-name $WALLAROO_CLUSTER \
    --name engine \
    --node-count 1 \
    --node-vm-size $ENGINE_VM_SIZE \
    --no-wait \
    --enable-cluster-autoscaler \
    --max-count 3 \
    --min-count 1 \
    --node-taints wallaroo.ai/engine=true:NoSchedule \
    --labels wallaroo-node-type=engine \
    --zones 1
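
    The nodepools can be listed to confirm that they were created and that autoscaling is enabled where expected:

    az aks nodepool list --resource-group $WALLAROO_RESOURCE_GROUP --cluster-name $WALLAROO_CLUSTER -o table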
    

    For additional settings, such as customizing the node pools of your Wallaroo Kubernetes cluster to change the virtual machine types used, see the Microsoft Azure documentation on using system node pools.

    • Download Wallaroo Kubernetes Configuration

    Once the Kubernetes environment is complete, associate it with the local Kubernetes configuration by importing the credentials through the following template command:

    az aks get-credentials --resource-group $WALLAROO_RESOURCE_GROUP --name $WALLAROO_CLUSTER
    

    Verify the cluster is available through the kubectl get nodes command.

    kubectl get nodes
    
    NAME                               STATUS   ROLES   AGE   VERSION
    aks-engine-99896855-vmss000000     Ready    agent   40m   v1.23.8
    aks-enginelb-54433467-vmss000000   Ready    agent   48m   v1.23.8
    aks-mainpool-37402055-vmss000000   Ready    agent   81m   v1.23.8
    aks-mainpool-37402055-vmss000001   Ready    agent   81m   v1.23.8
    aks-mainpool-37402055-vmss000002   Ready    agent   81m   v1.23.8
    aks-postgres-40215394-vmss000000   Ready    agent   52m   v1.23.8
    

  • GCP Cluster for Wallaroo Enterprise Instructions

    The following instructions assist users in setting up their Google Cloud Platform (GCP) Kubernetes environment for running Wallaroo. These represent a recommended setup, but can be modified to fit your specific needs. In particular, these instructions will provision a GKE cluster with 56 CPUs in total. Please ensure that your project’s resource limits support that.

    • Quick Setup Script: Download a bash script to automatically set up the GCP environment through the Google Cloud Platform command line interface gcloud.

    • Manual Setup Guide: A list of the gcloud commands used to create the environment through manual commands.

      • GCP Prerequisites

      Organizations that wish to run Wallaroo in their Google Cloud Platform environment must complete the following prerequisites:

      • IMPORTANT NOTE

        Organizations that intend to stop and restart their Kubernetes environment on an intentional or regular basis are recommended to use a single availability zone for their nodes. This minimizes issues such as persistent volumes in different availability zones, etc.

        Organizations that intend to use Wallaroo Enterprise in a high availability cluster are encouraged to follow best practices including using separate availability zones for redundancy, etc.

      • Standard Setup Variables

      The following variables are used in the Quick Setup Script and the Manual Setup Guide. Modify them as best fits your organization.

      | Variable Name | Default Value | Description |
      | --- | --- | --- |
      | WALLAROO_GCP_PROJECT | wallaroo | The name of the Google project used for the Wallaroo instance. |
      | WALLAROO_CLUSTER | wallaroo | The name of the Kubernetes cluster for the Wallaroo instance. |
      | WALLAROO_GCP_REGION | us-central1 | The region the Kubernetes environment is installed to. Update this to your GCP Compute Engine region. |
      | WALLAROO_NODE_LOCATION | us-central1-f | The location the Kubernetes nodes are installed to. Update this to your GCP Compute Engine zone. |
      | WALLAROO_GCP_NETWORK_NAME | wallaroo-network | The Google network used with the Kubernetes environment. |
      | WALLAROO_GCP_SUBNETWORK_NAME | wallaroo-subnet-1 | The Google network subnet used with the Kubernetes environment. |
      | DEFAULT_VM_SIZE | e2-standard-8 | The VM type used for the default nodepool. |
      | POSTGRES_VM_SIZE | n2-standard-8 | The VM type used for the postgres nodepool. |
      | ENGINELB_VM_SIZE | c2-standard-8 | The VM type used for the engine-lb nodepool. |
      | ENGINE_VM_SIZE | c2-standard-8 | The VM type used for the engine nodepool. |
      • Quick Setup Script

      A sample script is available here, and creates a Google Kubernetes Engine cluster ready for use with Wallaroo Enterprise. This script requires the prerequisites listed above and uses the variables listed in Standard Setup Variables.

      The following script is available for download: wallaroo_enterprise_gcp_expandable.bash

      The following steps are geared towards a standard Linux or macOS system that supports the prerequisites listed above. Modify these steps based on your local environment.

      1. Download the script above.
      2. In a terminal window, set the script as executable with the command chmod +x wallaroo_enterprise_gcp_expandable.bash.
      3. Modify the script variables listed above based on your requirements.
      4. Run the script with either bash wallaroo_enterprise_gcp_expandable.bash or ./wallaroo_enterprise_gcp_expandable.bash from the same directory as the script.
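
      Put together, the commands look like the following, assuming the script was downloaded to the current directory:

      chmod +x wallaroo_enterprise_gcp_expandable.bash
      ./wallaroo_enterprise_gcp_expandable.bash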
      • Set Variables

      The following are the variables used in the environment setup process. Modify them as best fits your organization’s needs.

      WALLAROO_GCP_PROJECT=wallaroo
      WALLAROO_CLUSTER=wallaroo
      WALLAROO_GCP_REGION=us-central1
      WALLAROO_NODE_LOCATION=us-central1-f
      WALLAROO_GCP_NETWORK_NAME=wallaroo-network
      WALLAROO_GCP_SUBNETWORK_NAME=wallaroo-subnet-1
      DEFAULT_VM_SIZE=n2-standard-8
      POSTGRES_VM_SIZE=n2-standard-8
      ENGINELB_VM_SIZE=c2-standard-8
      ENGINE_VM_SIZE=c2-standard-8
      
      • Manual Setup Guide

      The following steps are guidelines to assist new users in setting up their GCP environment for Wallaroo. The variables used in the commands are as listed in Standard Setup Variables listed above. Feel free to replace these with ones that match your needs.

      See the Google Cloud SDK for full details on commands and settings.

      • Create a GCP Network

      First create a GCP network that is used to connect to the cluster with the gcloud compute networks create command. For more information, see the gcloud compute networks create page.

      gcloud compute networks \
      create $WALLAROO_GCP_NETWORK_NAME \
      --bgp-routing-mode regional \
      --subnet-mode custom
      

      Verify its creation by listing the GCP networks:

      gcloud compute networks list
      
      • Create the GCP Wallaroo Cluster

      Once the network is created, the gcloud container clusters create command is used to create a cluster. For more information see the gcloud container clusters create page.

      The following is a recommended format; adjust the variables based on your setup. For Google GKE, containerd is enabled by default.

      gcloud container clusters \
      create $WALLAROO_CLUSTER \
      --region $WALLAROO_GCP_REGION \
      --node-locations $WALLAROO_NODE_LOCATION \
      --machine-type $DEFAULT_VM_SIZE \
      --network $WALLAROO_GCP_NETWORK_NAME \
      --create-subnetwork name=$WALLAROO_GCP_SUBNETWORK_NAME \
      --enable-ip-alias \
      --cluster-version=1.23
      

      The command can take several minutes to complete based on the size and complexity of the clusters. Verify the process is complete with the clusters list command:

      gcloud container clusters list
      
      • Wallaroo Enterprise Nodepools

      The following static nodepool can be set based on your organization’s requirements. Adjust the settings or names as needed.

      gcloud container node-pools create postgres \
      --cluster=$WALLAROO_CLUSTER \
      --machine-type=$POSTGRES_VM_SIZE \
      --num-nodes=1 \
      --region $WALLAROO_GCP_REGION \
      --node-taints wallaroo.ai/postgres=true:NoSchedule
      

      The following autoscaling nodepools are used for the engine load balancers and Wallaroo engine. Again, replace names and virtual machine types based on your organization’s requirements.

      gcloud container node-pools create engine-lb \
      --cluster=$WALLAROO_CLUSTER \
      --machine-type=$ENGINELB_VM_SIZE \
      --enable-autoscaling \
      --num-nodes=1 \
      --min-nodes=0 \
      --max-nodes=3 \
      --region $WALLAROO_GCP_REGION \
      --node-taints wallaroo-engine-lb=true:NoSchedule,wallaroo.ai/enginelb=true:NoSchedule \
      --node-labels wallaroo-node-type=engine-lb
      
      gcloud container node-pools create engine \
      --cluster=$WALLAROO_CLUSTER \
      --machine-type=$ENGINE_VM_SIZE \
      --enable-autoscaling \
      --num-nodes=1 \
      --min-nodes=0 \
      --max-nodes=3 \
      --region $WALLAROO_GCP_REGION \
      --node-taints wallaroo.ai/engine=true:NoSchedule \
      --node-labels=wallaroo-node-type=engine
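
      The nodepools can be listed to confirm they were created with the expected machine types and autoscaling settings:

      gcloud container node-pools list --cluster $WALLAROO_CLUSTER --region $WALLAROO_GCP_REGION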
      
      • Retrieving Kubernetes Credentials

      Once the GCP cluster is complete, the Kubernetes credentials can be installed into the local administrative system with the gcloud container clusters get-credentials (https://cloud.google.com/sdk/gcloud/reference/container/clusters/get-credentials) command:

      gcloud container clusters \
      get-credentials $WALLAROO_CLUSTER \
      --region $WALLAROO_GCP_REGION
      

      To verify the Kubernetes credentials for your cluster have been installed locally, use the kubectl get nodes command. This will display the nodes in the cluster as demonstrated below:

      kubectl get nodes
      
      NAME                                         STATUS   ROLES    AGE   VERSION
      gke-wallaroo-default-pool-863f02db-7xd4   Ready    <none>   39m   v1.21.6-gke.1503
      gke-wallaroo-default-pool-863f02db-8j2d   Ready    <none>   39m   v1.21.6-gke.1503
      gke-wallaroo-default-pool-863f02db-hn06   Ready    <none>   39m   v1.21.6-gke.1503
      gke-wallaroo-engine-3946eaca-4l3s         Ready    <none>   89s   v1.21.6-gke.1503
      gke-wallaroo-engine-lb-2e33a27f-64wb      Ready    <none>   26m   v1.21.6-gke.1503
      gke-wallaroo-postgres-d22d73d3-5qp5       Ready    <none>   28m   v1.21.6-gke.1503
      
      • Troubleshooting
        • What does the error ‘Insufficient project quota to satisfy request: resource “CPUS_ALL_REGIONS”’ mean?

          Make sure that the Compute Engine Zone and Region are properly set based on your organization’s requirements. The instructions above default to us-central1, so change that zone to install your Wallaroo instance in the correct location.
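
          To see the current CPU quota and usage for the target region before creating the cluster, one option is the following check (a sketch, assuming the default us-central1 region):

            gcloud compute regions describe us-central1 | grep -B1 -A1 "metric: CPUS"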

      • Single Node Linux

      Organizations can run Wallaroo within a single node Linux environment that meets the prerequisites.

      The following guide is based on installing Wallaroo Enterprise into virtual machines based on Ubuntu 22.04 hosted in Google Cloud Platform (GCP), Amazon Web Services (AWS) and Microsoft Azure. For other environments and configurations, consult your Wallaroo support representative.

      • Prerequisites

      Before starting the bare Linux installation, the following conditions must be met:

      • Have a Wallaroo Enterprise license file. For more information, you can request a demonstration.

      • A Linux bare-metal system or virtual machine with at least 32 cores and 64 GB RAM with Ubuntu 22.04 installed.

      • 650 GB allocated for the root partition, plus 50 GB allocated per node and another 50 GB for the JupyterHub service. Enterprise users who deploy additional pipelines will require an additional 50 GB of storage per lab node deployed.

      • Ensure memory swapping is disabled by removing it from /etc/fstab if needed (see the sketch after this list).

      • DNS services for integrating your Wallaroo Enterprise instance. See the DNS Integration Guide for the instructions on configuring Wallaroo Enterprise with your DNS services.

      • IMPORTANT NOTE

        • Wallaroo requires out-bound network connections to download the required container images and perform other tasks. For situations that require limiting out-bound access, refer to the air-gap installation instructions or contact your Wallaroo support representative. Note that if Wallaroo is being installed into a cloud environment such as Google Cloud Platform, Microsoft Azure, or Amazon Web Services, additional considerations such as networking, DNS, and certificates must be accounted for. For IP address restricted environments, see the Air Gap Installation Guide.
        • The steps below are based on the minimum requirements for installing Wallaroo in a single node environment.
        • For situations that require limiting external IP access or other questions, refer to your Wallaroo support representative.
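
      To disable swap, a minimal sketch on Ubuntu (turns swap off immediately and comments out any swap entries in /etc/fstab so it stays off after a reboot):

        # Turn off swap for the current session.
        sudo swapoff -a
        # Comment out swap entries so swap stays disabled after reboot (a backup is written to /etc/fstab.bak).
        sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab
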
      • Template Single Node Scripts

      The following template scripts are provided as examples on how to create single node virtual machines that meet the requirements listed above in AWS, GCP, and Microsoft Azure environments.

      • AWS VM Template Script

      Download template script here: aws-single-node-vm.bash

      # Variables
      
      # The name of the virtual machine
      NAME=$USER-demo-vm                     # eg bob-demo-vm
      
      # The image used : ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230208
      IMAGE_ID=ami-0557a15b87f6559cf
      
      # Instance type meeting the Wallaroo requirements.
      INSTANCE_TYPE=c6i.8xlarge # c6a.8xlarge is also acceptable
      
      # key name - generate keys using Amazon EC2 Key Pairs
      # https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html
      # Wallaroo people: https://us-east-1.console.aws.amazon.com/ec2/home?region=us-east-1#KeyPairs:v=3 - 
      MYKEY=DocNode
      
      
      # We will whitelist our source IP for maximum security -- just use 0.0.0.0/0 if you don't care.
      MY_IP=$(curl -s https://checkip.amazonaws.com)/32
      
      # Create security group in the Default VPC
      aws ec2 create-security-group --group-name $NAME --description "$USER demo" --no-cli-pager
      
      # Open port 22 and 443
      aws ec2 authorize-security-group-ingress --group-name $NAME --protocol tcp --port 22 --cidr $MY_IP --no-cli-pager
      aws ec2 authorize-security-group-ingress --group-name $NAME --protocol tcp --port 443 --cidr $MY_IP --no-cli-pager
      
      # increase Boot device size to 650 GB
      # Change the location from `/tmp/device.json` as required.
      # cat <<EOF > /tmp/device.json 
      # [{
      #   "DeviceName": "/dev/sda1",
      #   "Ebs": { 
      #     "VolumeSize": 650,
      #     "VolumeType": "gp2"
      #   }
      # }]
      # EOF
      
      # Launch instance with a 650 GB Boot device.
      aws ec2 run-instances --image-id $IMAGE_ID --count 1 --instance-type $INSTANCE_TYPE \
          --no-cli-pager \
          --key-name $MYKEY \
          --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":650,"VolumeType":"gp2"}}]'  \
          --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$NAME}]" \
          --security-groups $NAME
      
      # Sample output:
      # {
      #     "Instances": [
      #         {
      #             ...
      #             "InstanceId": "i-0123456789abcdef",     # Keep this instance-id for later
      #             ...
      #         }
      #     ]
      # }
      
      #INSTANCEID=YOURINSTANCE
            
      # After several minutes, a public IP will be known. This command will retrieve it.
      # aws ec2 describe-instances  --output text --instance-id $INSTANCEID \
      #    --query 'Reservations[*].Instances[*].{ip:PublicIpAddress}'
      
      # Sample Output
      # 12.23.34.56
      
      # KEYFILE=KEYFILELOCATION       #usually ~/.ssh/key.pem - verify this is the same as the key above.
      # SSH to the VM - replace $INSTANCEIP
      #ssh -i $KEYFILE ubuntu@$INSTANCEIP
      
      # Stop the VM - replace the $INSTANCEID
      #aws ec2 stop-instances --instance-id $INSTANCEID
      
      # Restart the VM
      #aws ec2 start-instances --instance-id $INSTANCEID
      
      # Clean up - destroy VM
      #aws ec2 terminate-instances --instance-id $INSTANCEID
      
      • Azure VM Template Script

      • Dependencies

      Download template script here: azure-single-node-vm.bash

      #!/bin/bash
      
      # Variables list.  Update as per your organization's settings
      NAME=$USER-demo-vm                          # eg bob-demo-vm
      RESOURCEGROUP=YOURRESOURCEGROUP             # Set this to the resource group the VM will be created in (eg the group created below)
      LOCATION=eastus
      IMAGE=Canonical:0001-com-ubuntu-server-jammy:22_04-lts:22.04.202301140
      
      # Pick a location
      az account list-locations  -o table |egrep 'US|----|Name'
      
      # Create resource group
      az group create -l $LOCATION --name $USER-demo-$(date +%y%m%d)
      
      # Create VM. This will create ~/.ssh/id_rsa and id_rsa.pub - store these for later use.
      az vm create --resource-group $RESOURCEGROUP --name $NAME --image $IMAGE  --generate-ssh-keys \
         --size Standard_D32s_v4 --os-disk-size-gb 500 --public-ip-sku Standard
      
      # Sample output
      # {
      #   "location": "eastus",
      #   "privateIpAddress": "10.0.0.4",
      #   "publicIpAddress": "20.127.249.196",    <-- Write this down as MYPUBIP
      #   "resourceGroup": "mnp-demo-230213",
      #   ...
      # }
      
      # SSH port is open by default. This adds an application port.
      az vm open-port --resource-group $RESOURCEGROUP --name $NAME --port 443
      
      # SSH to the VM - assumes that ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub from above are available.
      # ssh $MYPUBIP
      
      # Use this to stop the VM ("deallocate" frees resources and billing; "stop" does not)
      # az vm deallocate --resource-group $RESOURCEGROUP --name $NAME
      
      # Restart the VM
      # az vm start --resource-group $RESOURCEGROUP --name $NAME
      • GCP VM Template Script

      Dependencies:

      Download template script here: gcp-single-node-vm.bash

      # Settings
      
      NAME=$USER-demo-$(date +%y%m%d)      # eg bob-demo-230210
      ZONE=us-west1-a                      # For a complete list, use `gcloud compute zones list | egrep ^us-`
      PROJECT=wallaroo-dev-253816          # Insert the GCP Project ID here.  This is the one for Wallaroo.
      
      # Create VM
      
      IMAGE=projects/ubuntu-os-cloud/global/images/ubuntu-2204-jammy-v20230114
      
      # Port 22 and 443 open by default
      gcloud compute instances create $NAME \
          --project=$PROJECT \
          --zone=$ZONE \
          --machine-type=e2-standard-32 \
          --network-interface=network-tier=STANDARD,subnet=default \
          --maintenance-policy=MIGRATE \
          --provisioning-model=STANDARD \
          --no-service-account \
          --no-scopes \
          --tags=https-server \
          --create-disk=boot=yes,image=${IMAGE},size=500,type=pd-standard \
          --no-shielded-secure-boot \
          --no-shielded-vtpm \
          --no-shielded-integrity-monitoring \
          --reservation-affinity=any
      
      
      # Get the external IP address
      gcloud compute instances describe $NAME --zone $ZONE --format='get(networkInterfaces[0].accessConfigs[0].natIP)'
      
      # SSH to the VM
      #gcloud compute ssh $NAME --zone $ZONE
      
      # SCP file to the instance - replace $FILE with the file path.  Useful for copying up the license file up to the instance.
      
      #gcloud compute scp --zone $ZONE $FILE $NAME:~/
      
      # SSH port forward to the VM
      #gcloud compute ssh $NAME --zone $ZONE -- -NL 8800:localhost:8800
      
      # Suspend the VM
      #gcloud compute instances stop $NAME --zone $ZONE
      
      # Restart the VM
      #gcloud compute instances start $NAME --zone $ZONE
      
      • Kubernetes Installation Steps

      The following steps install the Kubernetes version and requirements onto the Linux node that supports a Wallaroo single node installation.

      The process includes these major steps:

      • Install Kubernetes

      • Install Kots

      • Install Kubernetes

      curl is installed in the default scripts provided above. Verify that it is installed if using some other platform.

      1. Verify that the Ubuntu distribution is up to date, and reboot if necessary after updating.

        sudo apt update
        sudo apt upgrade
        
      2. Start the Kubernetes installation with the following script, substituting the URL path as appropriate for your license.

        For Wallaroo versions 2022.4 and below:

        curl https://kurl.sh/9398a3a | sudo bash
        

        For Wallaroo versions 2023.1 and later, the install is based on the license channel. For example, if your license uses the EE channel, then the path is /wallaroo-ee; that is, /wallaroo- plus the lower-case channel name. Note that the Kubernetes install channel must match the License version. Check with your Wallaroo support representative with any questions about your version.

        curl https://kurl.sh/wallaroo-ee | sudo bash
        
        1. If prompted with This application is incompatible with memory swapping enabled. Disable swap to continue? (Y/n), reply Y.
      3. Set up the Kubernetes configuration with the following commands:

        mkdir -p $HOME/.kube
        sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
        sudo chown $(id -u):$(id -g) $HOME/.kube/config
        chmod u+w $HOME/.kube/config
        echo 'export KUBECONFIG=$HOME/.kube/config' >> ~/.bashrc
        
      4. Log out, and log back in as the same user. Verify the installation was successful with the following:

        kubectl get nodes
        

        It should return results similar to the following:

        NAME     STATUS   ROLES                  AGE     VERSION
        wallux   Ready    control-plane,master   6m26s   v1.23.6
        
      • Install Kots

      Install kots with the following process.

      1. Run the following script and provide your password for the sudo based commands when prompted.

        curl https://kots.io/install/1.91.3 | REPL_USE_SUDO=y bash
        
      2. Verify kots was installed with the following command:

        kubectl kots version
        

        It should return results similar to the following:

        Replicated KOTS 1.91.3
        
      • Connection Options

      Once Kubernetes has been set up on the Linux node, users can opt to copy the Kubernetes configuration to a local system, updating the IP address and other information as required. See Configure Access to Multiple Clusters for details.

      The easiest method is to create an SSH tunnel to the Linux node. Usually this will be in the format:

      ssh $IP -L8800:localhost:8800
      

      For example, in an AWS instance that may be as follows, replacing $KEYFILE with the path to the keyfile and $IP with the IP address of the Linux node.

      ssh -i $KEYFILE ubuntu@$IP -L8800:localhost:8800
      

      In a GCP instance, gcloud can be used as follows, replacing $NAME with the name of the GCP instance and $ZONE with the zone it was installed into.

      gcloud compute ssh $NAME --zone $ZONE -- -NL 8800:localhost:8800
      

      Port forwarding port 8800 is used for kots-based installations to access the Wallaroo Administrative Dashboard.

      Install Wallaroo

      Organizations that use cloud services such as Google Cloud Platform (GCP), Amazon Web Services (AWS), or Microsoft Azure can install Wallaroo Enterprise through the following process. These instructions also work with Single Node Linux based installations.

      Before installation, the following prerequisites must be met:

      • Have a Wallaroo Enterprise license file. For more information, you can request a demonstration.
      • Set up a cloud Kubernetes environment that meets the requirements. Clusters must meet the following minimum specifications:
        • Minimum number of nodes: 4
        • Minimum Number of CPU Cores: 8
        • Minimum RAM: 16 GB
        • A total of 625 GB of storage will be allocated for the entire cluster based on 5 users with up to four pipelines with five steps per pipeline, with 50 GB allocated per node, including 50 GB specifically for the Jupyter Hub service. Enterprise users who deploy additional pipelines will require an additional 50 GB of storage per lab node deployed.
        • Runtime: containerd is required.
      • DNS services for integrating your Wallaroo Enterprise instance. See the DNS Integration Guide for the instructions on configuring Wallaroo Enterprise with your DNS services.

      Wallaroo Enterprise can be installed either interactively or automatically through the kubectl and kots applications.

      Automated Install

      To automatically install Wallaroo into the namespace wallaroo, specify the administrative password and the license file during the installation, using the following format and variables:

      • NAMESPACE: The namespace for the Wallaroo Enterprise install, typically wallaroo.
      • LICENSEFILE: The location of the Wallaroo Enterprise license file.
      • SHAREDPASSWORD: The password for the Wallaroo Administrative Dashboard.
      kubectl kots install wallaroo/ee -n $NAMESPACE --license-file $LICENSEFILE --shared-password $SHAREDPASSWORD
      

      For example, the following settings translate to the following install command:

      • NAMESPACE: wallaroo.
      • LICENSEFILE: myWallaroolicense.yaml
      • SHAREDPASSWORD: snugglebunnies

      kubectl kots install wallaroo/ee -n wallaroo --license-file myWallaroolicense.yaml --shared-password snugglebunnies
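
      After the installer completes, the deployment can be verified by checking that the pods in the namespace come up, for example:

      kubectl get pods --namespace wallaroo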

      Interactive Install

      The Interactive Install process allows users to adjust the configuration settings before Wallaroo is deployed. It requires users be able to access the Wallaroo Administrative Dashboard through a browser, typically on port 8800.

      • IMPORTANT NOTE: Users who install Wallaroo through another node, such as in the single node installation, can use SSH port forwarding to access the Wallaroo Administrative Dashboard. For example:

        ssh IP -L8800:localhost:8800
        
      1. Install the Wallaroo Enterprise Edition using kots install wallaroo/ee, specifying the namespace to install Wallaroo into. For example, if wallaroo is the namespace, then the command is:

        kubectl kots install wallaroo/ee --namespace wallaroo
        
      2. Wallaroo Enterprise Edition will be downloaded and installed into your Kubernetes environment in the namespace specified. When prompted, set the default password for the Wallaroo environment. When complete, Wallaroo Enterprise Edition will display the URL for the Admin Console, and how to stop the Admin Console from running.

        • Deploying Admin Console
        • Creating namespace ✓
        • Waiting for datastore to be ready ✓
            Enter a new password to be used for the Admin Console: •••••••••••••
          • Waiting for Admin Console to be ready ✓
        
        • Press Ctrl+C to exit
        • Go to http://localhost:8800 to access the Admin Console
        

      Configure Wallaroo

      Once installed, Wallaroo will continue to run until terminated.

      To relaunch the Wallaroo Administrative Dashboard and make changes or updates, use the following command:

      kubectl kots admin-console --namespace wallaroo
      

      DNS Services

      Wallaroo Enterprise requires integration into your organization’s DNS services.

      The DNS Integration Guide details adding the Wallaroo instance to an organization’s DNS services.

      User Management

      User management is handled through the Wallaroo instance’s Keycloak service. See the Wallaroo User Management guide for full details on setting up users, identity providers, and other user configuration options.