How to Upgrade Your Wallaroo Ops

Organizations can upgrade Wallaroo to new versions while preserving their workspaces, model uploads, ML workload orchestrations, and other artifacts. The process is quick and simple, and users can return to Wallaroo Ops with everything as they left it.

Depending on the size and number of workspaces and artifacts, a typical upgrade can take 30-60 minutes. During the upgrade procedure, services to the Wallaroo Ops instance are interrupted.

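Upgrading Wallaroo follows this process: complete the Pre-Upgrade Checklist, then perform the Upgrade Procedure that matches the installation type (kots or helm).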

Pre-Upgrade Checklist

Before upgrading Wallaroo, complete the following steps to ensure a smooth transition from the previous version of Wallaroo to the new one.

| Complete (Y/N) | Action                   |
|----------------|--------------------------|
|                | Create Support Bundle    |
|                | Notify Users of Downtime |
|                | Backup Wallaroo          |

Create Support Bundle

A support bundle is a collection of logs, configurations, and other information for Wallaroo support staff. Generate one before the upgrade procedure starts to preserve the current settings and information useful for tracking any potential issues during the upgrade process.

Support bundles are generated with one of the following two methods.

At any time, the administration console can create troubleshooting bundles for Wallaroo technical support to assess product health and help with problems. Support bundles contain logs and configuration files which can be examined before downloading and transmitting to Wallaroo. The console also has a configurable redaction mechanism for cases where sensitive information such as passwords, tokens, or PII (Personally Identifiable Information) needs to be removed from logs in the bundle.


Create Support Bundles via the Wallaroo Administrator Dashboard

This process is for kots-based installations of Wallaroo.

This assumes that kubectl and kots have been installed in a terminal with administrative access to the Kubernetes cluster hosting the Wallaroo installation.

  1. Launch the Kots Administrative Dashboard with kubectl kots admin-console --namespace $WALLAROO_NAMESPACE, replacing $WALLAROO_NAMESPACE with the namespace the Wallaroo instance is installed in. For example: kubectl kots admin-console --namespace wallaroo.
  2. Log into the administration console with the Administrative Dashboard password set during the installation process.
  3. Select the Troubleshoot tab.
  4. Select Analyze Wallaroo.
  5. Select Download bundle to save the bundle file as a compressed archive. Depending on your browser settings the file download location can be specified.
  6. Send the file to Wallaroo technical support.

At any time, any existing bundle can be examined and downloaded from the Troubleshoot tab.

Create Support Bundles via the Command Line

To generate a support bundle via the command line for either kots or helm based installations of Wallaroo, the following applications are used.

  • kubectl
  • kubectl plugins:
    • krew: Install Krew
    • krew support-bundle: Install with kubectl krew install support-bundle.

The following command collects log files, configuration files, and other details into a .tar.gz file in the directory the command is run from. The file name has the format support-bundle-YYYY-MM-DDTHH-MM-SS.tar.gz. Submit this file to the Wallaroo support team for review.

kubectl support-bundle --load-cluster-specs --interactive=false
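Because the bundle name carries a timestamp, the most recent bundle in a directory can be picked out by name alone. The following is a minimal sketch rather than part of the Wallaroo tooling: the function name latest_support_bundle is hypothetical, and it assumes bundles keep the support-bundle-YYYY-MM-DDTHH-MM-SS.tar.gz naming described above, so that sorting by name matches sorting by time.

```shell
# Hypothetical helper: print the newest support bundle in a directory
# (default: the current directory). Returns non-zero if none is found.
latest_support_bundle() {
  dir="${1:-.}"
  latest=""
  # The glob expands in sorted order, so the last existing match is the newest.
  for f in "$dir"/support-bundle-*.tar.gz; do
    if [ -e "$f" ]; then
      latest="$f"
    fi
  done
  if [ -n "$latest" ]; then
    printf '%s\n' "$latest"
  else
    return 1
  fi
}
```

After running the support-bundle command, latest_support_bundle . prints the path of the file to send to the Wallaroo support team.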

Notify Users of Downtime

All users who interact with Wallaroo Ops should be informed before the Wallaroo Ops downtime. The following list helps DevOps engineers identify which stakeholders to notify.

| Stakeholder | Description | What Users will Experience |
|---|---|---|
| Wallaroo Dashboard Users | Users that interact via the Wallaroo Dashboard. | If users are active in the dashboard: |
| | | If a user attempts to go to the dashboard in their browser: |
| API users | These users interact with Wallaroo via the Wallaroo MLOps API or related API services. | MLOps API and other API services to the Wallaroo Ops instance will not be available during the upgrade, and any requests will return a 503 (service unavailable) with the message “Wallaroo upgrade in progress.” |
| External SDK users | Users who perform actions via the Wallaroo SDK that do not use the Wallaroo JupyterHub service. | Wallaroo SDK connections will not be available during the upgrade, and any attempts to use the SDK will return an error. |
| Wallaroo JupyterHub users | Users who use the Wallaroo JupyterHub Service to run Jupyter Notebooks with the Wallaroo Ops instance. | If users are in JupyterHub at the time of the upgrade: |
| | | If users attempt to go to JupyterHub during the upgrade: |
| Deployed pipeline | Users who perform inferences through deployed pipelines. | During the upgrade, deployed pipelines are undeployed. Once the upgrade is complete, any previously deployed pipelines are automatically redeployed. Any inference requests to deployed pipelines will return a 503 (service unavailable) with the message “Wallaroo upgrade in progress.” |
| Edge and Multicloud Deployment Users | Users and services that perform inference requests and other services via models deployed to multicloud and edge locations. | Edge and multicloud deployments of Wallaroo are not interrupted. While an upgrade is in progress, no logs can be received by the Wallaroo Ops instance. Once connection is restored, edge locations upload their inference logs. |
| ML Workload Orchestration Users | Scheduled ML Workload Orchestrations. | Scheduled ML Workload Orchestration tasks are interrupted during the upgrade. Scheduled task runs are missed while the upgrade is in progress and run at their next scheduled time after the upgrade completes. |

Backup Wallaroo

Before starting the upgrade procedure, back up the Wallaroo Ops instance. The following procedure summary is based on the Wallaroo Backup and Restore Guides.

Wallaroo Backup Procedure

  1. Before starting the backup, force the Plateau service to complete writing logs so they can be captured by the backup. This assumes that Wallaroo was installed in the namespace wallaroo.

    kubectl -n wallaroo scale --replicas=0 deploy/plateau
    kubectl -n wallaroo scale --replicas=1 deploy/plateau
    
  2. Set the $BACKUP_NAME variable. The name must consist only of lowercase letters, numbers, hyphens (-), or periods (.), and must end with an alphanumeric character.

    BACKUP_NAME={give it your own name}
    
  3. Issue the following backup command. The --exclude-namespaces is used to exclude namespaces that are not required for the Wallaroo backup and restore. By default, these are the namespaces velero, default, kube-node-lease, kube-public, and kube-system.

    This process will back up all namespaces that are not excluded, including deployed Wallaroo pipelines. Add any other namespaces that should not be part of the backup to the --exclude-namespaces option as per your organization’s requirements.

    velero backup create $BACKUP_NAME --default-volumes-to-fs-backup --include-cluster-resources=true --exclude-namespaces velero,default,kube-node-lease,kube-public,kube-system
    
  4. To view the status of the backup, run velero backup describe --details $BACKUP_NAME. Once the Completed field shows a date and time, the backup is complete.

    In progress backup.

    velero backup describe --details $BACKUP_NAME
    Name:         sample-doctest-backup-20240502
    Namespace:    velero
    Labels:       velero.io/storage-location=default
    Annotations:  velero.io/resource-timeout=10m0s
                  velero.io/source-cluster-k8s-gitversion=v1.28.7-gke.1026000
                  velero.io/source-cluster-k8s-major-version=1
                  velero.io/source-cluster-k8s-minor-version=28
    
    Phase:  InProgress
    
    
    Namespaces:
      Included:  *
      Excluded:  velero, default, kube-node-lease, kube-public, kube-system
    
    Resources:
      Included:        *
      Excluded:        <none>
      Cluster-scoped:  included
    
    Label selector:  <none>
    
    Or label selector:  <none>
    
    Storage Location:  default
    
    Velero-Native Snapshot PVs:  auto
    Snapshot Move Data:          false
    Data Mover:                  velero
    
    TTL:  720h0m0s
    
    CSISnapshotTimeout:    10m0s
    ItemOperationTimeout:  4h0m0s
    
    Hooks:  <none>
    
    Backup Format Version:  1.1.0
    
    Started:    2024-05-14 16:26:43 -0600 MDT
    Completed:  <n/a>
    
    Expiration:  2024-06-13 16:26:43 -0600 MDT
    
    Estimated total items to be backed up:  1073
    Items backed up so far:                 28
    
    Resource List:  <backup resource list not found>
    
    Backup Volumes:
      Velero-Native Snapshots: <none included>
    
      CSI Snapshots: <none included or not detectable>
    
      Pod Volume Backups - kopia:
        Completed:
          gmp-system/alertmanager-0: alertmanager-config, alertmanager-data
          gmp-system/collector-cdsm4: config-out, storage
          gmp-system/collector-fslhc: config-out, storage
          gmp-system/collector-p6f85: config-out, storage
          gmp-system/collector-q4djj: config-out, storage
          gmp-system/rule-evaluator-7874c6f478-672vs: config-out
          wallaroo/hub-65c45d4c7-nb9lp: pvc
          wallaroo/kotsadm-b4f68468d-dzj5c: backup, tmp
          wallaroo/kotsadm-minio-0: kotsadm-minio, minio-cert-dir, minio-config-dir
          wallaroo/kotsadm-rqlite-0: kotsadm-rqlite, tmp
        In Progress:
          wallaroo/minio-cf97d78cb-pv82x: export
    

    Completed backup.

    
    velero backup describe --details $BACKUP_NAME
    Name:         sample-doctest-backup-20240502
    Namespace:    velero
    Labels:       velero.io/storage-location=default
    Annotations:  velero.io/resource-timeout=10m0s
                  velero.io/source-cluster-k8s-gitversion=v1.28.7-gke.1026000
                  velero.io/source-cluster-k8s-major-version=1
                  velero.io/source-cluster-k8s-minor-version=28
    

    Phase:  Completed

    Warnings:
      Velero:   <none>
      Cluster:  <none>
      Namespaces:
        wallaroo:  resource: /pods name: /kotsadm-b4f68468d-dzj5c message: /volume migrations is declared in pod wallaroo/kotsadm-b4f68468d-dzj5c but not mounted by any container, skipping

    Namespaces:
      Included:  *
      Excluded:  velero, default, kube-node-lease, kube-public, kube-system

    Resources:
      Included:        *
      Excluded:        <none>
      Cluster-scoped:  included

    Label selector:  <none>

    Or label selector:  <none>

    Storage Location:  default

    Velero-Native Snapshot PVs:  auto
    Snapshot Move Data:          false
    Data Mover:                  velero

    TTL:  720h0m0s

    CSISnapshotTimeout:    10m0s
    ItemOperationTimeout:  4h0m0s

    Hooks:  <none>

    Backup Format Version:  1.1.0

    Started:    2024-05-14 16:26:43 -0600 MDT
    Completed:  2024-05-14 16:32:19 -0600 MDT

    Expiration:  2024-06-13 16:26:43 -0600 MDT

    Total items to be backed up:  719
    Items backed up:              719

    Resource List:
      admissionregistration.k8s.io/v1/MutatingWebhookConfiguration:
        - gmp-operator.gmp-system.monitoring.googleapis.com
        - neg-annotation.config.common-webhooks.networking.gke.io
        - pod-ready.config.common-webhooks.networking.gke.io
        - warden-mutating.config.common-webhooks.networking.gke.io

      …Other backed up resources

      warden.gke.io/v1/Audit:
        - autogke-default-linux-capabilities
        - autogke-disallow-hostnamespaces
        - autogke-disallow-privilege
        - autogke-no-host-port
        - autogke-no-write-mode-hostpath
        - autogke-node-affinity-selector-limitation
        - autogke-pod-affinity-limitation
        - autopilot-admission-webhook-config-limitation
        - autopilot-capacity-request-limitation
        - autopilot-external-ip-limitation
        - autopilot-no-ephemeral-containers
        - autopilot-persistent-volume-limitation
        - autopilot-volume-type-limitation

    Backup Volumes:
      Velero-Native Snapshots: <none included>

      CSI Snapshots: <none included>

      Pod Volume Backups - kopia:
        Completed:
          gmp-system/alertmanager-0: alertmanager-config, alertmanager-data
          gmp-system/collector-cdsm4: config-out, storage
          gmp-system/collector-fslhc: config-out, storage
          gmp-system/collector-p6f85: config-out, storage
          gmp-system/collector-q4djj: config-out, storage
          gmp-system/rule-evaluator-7874c6f478-672vs: config-out
          wallaroo/hub-65c45d4c7-nb9lp: pvc
          wallaroo/kotsadm-b4f68468d-dzj5c: backup, tmp
          wallaroo/kotsadm-minio-0: kotsadm-minio, minio-cert-dir, minio-config-dir
          wallaroo/kotsadm-rqlite-0: kotsadm-rqlite, tmp
          wallaroo/minio-cf97d78cb-pv82x: export
          wallaroo/nats-0: nats-js, pid
          wallaroo/plateau-7dfbd89655-9xz6v: plateau-storage
          wallaroo/postgres-74d6948c48-mjmb5: postgres-storage
          wallaroo/prometheus-deployment-666d968bfd-cxp46: alert-config-volume, metrics-storage-volume
          wallaroo/wallsvc-0: socket-volume, spire-data

    HooksAttempted:  1
    HooksFailed:     0
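The naming rule and the status check from the backup procedure above can be scripted. The following is a minimal sketch under stated assumptions, not a Wallaroo-provided script: the function names valid_backup_name, backup_phase, and wait_for_backup are hypothetical, the name check encodes only the rule stated for $BACKUP_NAME, and the 10-second polling interval is an arbitrary choice. It reads the Phase line of the describe output shown above.

```shell
# Succeeds when the name uses only lowercase letters, digits, "-" or "."
# and ends with a letter or digit (the rule stated for $BACKUP_NAME).
valid_backup_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9.-]*[a-z0-9]$'
}

# Read `velero backup describe` output on stdin and print the Phase value.
backup_phase() {
  awk '$1 == "Phase:" { print $2; exit }'
}

# Poll until the backup reaches a final phase. Requires velero on PATH.
wait_for_backup() {
  while :; do
    phase="$(velero backup describe "$1" | backup_phase)"
    case "$phase" in
      Completed) return 0 ;;
      Failed|PartiallyFailed|FailedValidation)
        echo "backup $1 ended in phase $phase" >&2
        return 1 ;;
    esac
    sleep 10   # assumption: arbitrary interval for this sketch
  done
}
```

A possible use is valid_backup_name "$BACKUP_NAME" before creating the backup, then wait_for_backup "$BACKUP_NAME" after the velero backup create command.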

Upgrade Procedure

Depending on the size and number of workspaces and artifacts, a typical upgrade can take 30-60 minutes. Select one of the following options based on the Wallaroo Install Process:

Upgrade via Kots

The following procedure is used to upgrade a Wallaroo Ops instance via kots.

Kubernetes and Kots Client Software Prerequisites

Before installing or upgrading Wallaroo, the administrative node managing the Kubernetes cluster will require these tools.

  • kubectl
    • For Kots based installs:
      • kots Version 1.107.2

    • For Helm installs:
      • helm: Install Helm
      • krew: Install Krew
      • krew preflight and krew support-bundle. Install with the following commands:
        • kubectl krew install support-bundle
        • kubectl krew install preflight

The following are quick guides for installing kubectl for macOS.

To install kubectl on a macOS system using Homebrew:

  1. Issue the brew install command:

    brew install kubectl
    
  2. Verify the installation:

    kubectl version --client
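Before starting, it can help to confirm that the client tools listed above are on the PATH. The following is a minimal sketch, assuming a POSIX shell; the function name missing_tools is hypothetical and the check only looks for executables by name, not for specific versions.

```shell
# Hypothetical helper: print each named tool that is not found on PATH.
# Returns 0 only when every tool is present.
missing_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For a Helm based install this could be run as missing_tools kubectl helm. The installed kubectl plugins can be reviewed separately with kubectl plugin list.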
    

Upgrade via Kots Procedure

To upgrade a kots based installation of Wallaroo:

  1. From a terminal shell with administrative access to the Kubernetes cluster hosting Wallaroo, launch the Kots Administrative Dashboard via the following command:

    kubectl kots admin-console --namespace $NAMESPACE
    

    Replace $NAMESPACE with the name of the namespace the Wallaroo Ops instance is installed in, which is wallaroo by default. For example:

    kubectl kots admin-console --namespace wallaroo
    • Press Ctrl+C to exit
    • Go to http://localhost:8800 to access the Admin Console
    
  2. Access the Kots Administrative Dashboard via the domain name and port as provided in the previous step.

  3. From the Kots Administrative Dashboard:

    1. If there is a new version of Wallaroo to install based on your Wallaroo license type, it will be displayed under the Version (B) display as New Version Available. Select Check for updates to check for updated versions.

    2. Select the version to upgrade to.

    3. To perform a preflight check, select the preflight icon and verify the cluster meets the requirements.

    4. If ready to upgrade, select Deploy (C).

    5. Confirm the upgrade by selecting Yes, Deploy.

  4. During the upgrade process, the status indicator (A) changes from Ready to Unavailable. Selecting Details will show which services are available or are still being upgraded.

  5. When the upgrade process is complete, the status indicator will change to Ready. At this point, users can resume their normal operations.
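In addition to the status indicator in the dashboard, pod readiness can be reviewed from the command line. The following is a minimal sketch and not a Wallaroo-documented check: the function name pods_not_ready is hypothetical, and it assumes the default column layout of kubectl get pods (NAME, READY, STATUS as the first three columns).

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print the names of pods whose READY count is incomplete.
# Pods with STATUS Completed (finished jobs) are skipped.
pods_not_ready() {
  awk '$3 != "Completed" {
    split($2, ready, "/")            # READY column looks like "1/1"
    if (ready[1] != ready[2]) print $1
  }'
}
```

With Wallaroo installed in the namespace wallaroo, this could be run as kubectl get pods --namespace wallaroo --no-headers | pods_not_ready. Empty output means every listed pod reports all of its containers ready.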

Upgrade via Helm

The following procedure is used to upgrade a Wallaroo Ops instance via helm.

Helm Client Software Prerequisites

  • For Helm installs:
    • helm: Install Helm
    • krew: Install Krew
    • krew preflight and krew support-bundle. Install with the following commands:
      • kubectl krew install support-bundle
      • kubectl krew install preflight

Upgrade via Helm Procedure

To upgrade a helm based installation of Wallaroo:

  1. From a Wallaroo Support representative, retrieve the following:
    1. The license channel. This will be in the form of oci://registry.replicated.com/wallaroo/$CHANNEL/wallaroo, where $CHANNEL represents the channel type. For example, oci://registry.replicated.com/wallaroo/2024-1/wallaroo.
    2. The version to upgrade to. For example: 2024.1.0-5097.
    3. OCI Registry login. This will be in the format: helm registry login registry.replicated.com --username $YOURUSERNAME --password $YOURPASSWORD
    4. Helm Release Name: This was determined during the Wallaroo Install process.
  2. Prepare the local-values.yaml file that will store the essential configuration options. It is highly recommended to use the same local-values.yaml file used during the Wallaroo installation to minimize changes. The following is an example of the local-values.yaml file settings. See Wallaroo Helm Reference Guides for additional settings.

    domainPrefix: "" # optional if using a DNS Prefix
    domainSuffix: "wallaroo.example.com"

    custTlsSecretName: cust-cert-secret

    apilb:
      serviceType: LoadBalancer
      external_inference_endpoints_enabled: true
      ingress_mode: internal # internal (Default), external, or none

    dashboard:
      clientName: "Wallaroo Helm Example" # Insert the name displayed in the Wallaroo Dashboard

    kubernetes_distribution: ""   # Required. One of: aks, eks, gke, oke, or kurl.
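
Since kubernetes_distribution is marked as required in the example above, a quick check of the file before upgrading can catch an empty value. The following is a minimal sketch: the function names distribution_value and check_distribution are hypothetical, and it assumes the key sits at the top level of local-values.yaml on a single line, as in the example. It uses plain text matching and is not a YAML parser.

```shell
# Hypothetical helper: print the kubernetes_distribution value from a values
# file, with any trailing comment and surrounding quotes removed.
distribution_value() {
  sed -n 's/^kubernetes_distribution:[[:space:]]*//p' "$1" \
    | head -n 1 \
    | sed 's/[[:space:]]*#.*$//' \
    | tr -d "\"'"
}

# Succeed only when the value is one of the distributions listed in the example.
check_distribution() {
  case "$(distribution_value "$1")" in
    aks|eks|gke|oke|kurl) return 0 ;;
    *)
      echo "kubernetes_distribution must be one of: aks, eks, gke, oke, kurl" >&2
      return 1 ;;
  esac
}
```

Running check_distribution local-values.yaml returns non-zero for the unedited example, because the value there is still empty.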

From a terminal with helm and administrative access to the Kubernetes cluster Wallaroo Ops is installed to, perform the following:

  1. Log in to the OCI registry, replacing $YOURUSERNAME and $YOURPASSWORD with the ones provided by Wallaroo:

    helm registry login registry.replicated.com --username $YOURUSERNAME --password $YOURPASSWORD
    
  2. Set the default Kubernetes namespace to the one used for the Wallaroo installation. By default, the namespace wallaroo is used. For example:

    kubectl config set-context --current --namespace wallaroo
    
  3. Perform the preflight check with the following command format. The variables $LICENSE_CHANNEL and $VERSION are supplied by your Wallaroo support representative.

    helm template --is-upgrade \
    oci://registry.replicated.com/wallaroo/$LICENSE_CHANNEL/wallaroo --version $VERSION \
    | kubectl preflight -
    

    For example, with $LICENSE_CHANNEL=2024-1 and $VERSION=2024.1.0-5097:

    helm template --is-upgrade \
    oci://registry.replicated.com/wallaroo/2024-1/wallaroo --version 2024.1.0-5097 \
    | kubectl preflight -
    

    This displays the Preflight Checks report. Verify that all checks are completed successfully before proceeding.

  4. Perform the upgrade with the following command, replacing the following:

    • $LICENSE_CHANNEL: The channel for the installation upgrade.
    • $VERSION: The version to upgrade to.
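    • $RELEASE: The Helm release name of the Wallaroo installation, wallaroo by default. To upgrade the existing installation, this should match the release name chosen during the Wallaroo install process, since helm upgrade --install treats an unknown release name as a new installation.
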
    helm upgrade --install --version $VERSION \
    --wait --timeout 500s \
    --values local-values.yaml \
    --debug \
    $RELEASE \
    oci://registry.replicated.com/wallaroo/$LICENSE_CHANNEL/wallaroo
    

    For example, with $LICENSE_CHANNEL=2024-1, $VERSION=2024.1.0-5097, and $RELEASE=wallaroo, the command is:

    helm upgrade --install --version 2024.1.0-5097 \
    --wait --timeout 500s \
    --values local-values.yaml \
    --debug \
    wallaroo \
    oci://registry.replicated.com/wallaroo/2024-1/wallaroo
    
  5. Once the upgrade is complete, verify it with the helm test $RELEASE command. With the settings above, this would be:

    helm test wallaroo
    

    A successful upgrade will resemble the following:

    NAME: wallaroo
    LAST DEPLOYED: Thu Apr 11 09:56:17 2024
    NAMESPACE: default
    STATUS: pending-upgrade
    REVISION: 2
    TEST SUITE:     wallaroo-fluent-bit-test-connection
    Last Started:   Thu Apr 11 10:03:52 2024
    Last Completed: Thu Apr 11 10:03:56 2024
    Phase:          Succeeded
    TEST SUITE:     nats-test-request-reply
    Last Started:   Thu Apr 11 10:03:44 2024
    Last Completed: Thu Apr 11 10:03:52 2024
    Phase:          Succeeded
    TEST SUITE:     wallaroo-test-connections-hook
    Last Started:   Thu Apr 11 10:03:56 2024
    Last Completed: Thu Apr 11 10:06:11 2024
    Phase:          Succeeded
    TEST SUITE:     wallaroo-test-objects-hook
    Last Started:   Thu Apr 11 10:06:12 2024
    Last Completed: Thu Apr 11 10:06:21 2024
    Phase:          Succeeded
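
The chart reference and the test result check from the steps above can be scripted. The following is a minimal sketch under stated assumptions: the function names chart_url and all_tests_succeeded are hypothetical, and the second function assumes the helm test output format shown above, with one Phase: line per test suite.

```shell
# Hypothetical helper: build the chart reference from a license channel
# such as 2024-1.
chart_url() {
  printf 'oci://registry.replicated.com/wallaroo/%s/wallaroo\n' "$1"
}

# Hypothetical helper: read `helm test` output on stdin. Succeeds only if at
# least one test suite ran and every "Phase:" line reports Succeeded.
all_tests_succeeded() {
  awk '$1 == "Phase:" { total++; if ($2 == "Succeeded") ok++ }
       END { exit !(total > 0 && ok == total) }'
}
```

With the release name wallaroo, this could be run as helm test wallaroo | all_tests_succeeded && echo "all test suites succeeded".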