How to Upgrade Your Wallaroo Ops
Organizations have the ability to upgrade Wallaroo to newer versions while keeping their workspaces, model uploads, ML workload orchestrations, and other artifacts intact. This process is fast and easy, allowing users to seamlessly resume their work with Wallaroo Ops without any changes.
The duration of a standard upgrade may vary from 30 to 60 minutes, depending on the size and quantity of workspaces and artifacts. Please note that services to the Wallaroo Ops instance will be temporarily interrupted during the upgrade process.
Upgrading Wallaroo follows this process:
- Pre-upgrade Checklist: Actions that should be performed before the upgrade process is initiated.
- Upgrade Procedure: Steps for upgrading Wallaroo to a specified version.
Pre-Upgrade Checklist
Before starting an upgrade of Wallaroo, the following steps should be performed to provide a smooth transition from the previous version of Wallaroo to the new one.
Action | Complete (Y/N) |
---|---|
Create Support Bundle | |
Notify Users of Downtime | |
Backup Wallaroo | |
Create Support Bundle
A support bundle creates a collection of logs, configurations, and other information for Wallaroo support staff. This should be generated before the upgrade procedure starts to preserve a set of current settings and information useful to track any potential issues during the upgrade process.
Support bundles are generated with one of the following two methods.
At any time, the administration console can create troubleshooting bundles for Wallaroo technical support to assess product health and help with problems. Support bundles contain logs and configuration files which can be examined before downloading and transmitting to Wallaroo. The console also has a configurable redaction mechanism for cases where sensitive information such as passwords, tokens, or PII (Personally Identifiable Information) needs to be removed from logs in the bundle.
Create Support Bundles via the Wallaroo Administrator Dashboard
This process is for kots based installations of Wallaroo.
This assumes that kubectl and kots have been installed in a terminal with administrative access to the Kubernetes cluster hosting the Wallaroo installation.
- Launch the Kots Administrative Dashboard with kubectl kots admin-console --namespace $WALLAROO_NAMESPACE, replacing $WALLAROO_NAMESPACE with the namespace the Wallaroo instance is installed in. For example: kubectl kots admin-console --namespace wallaroo.
- Log into the administration console with the Administrative Dashboard password set during the installation process.
- Select the Troubleshoot tab.
- Select Analyze Wallaroo.
- Select Download bundle to save the bundle file as a compressed archive. Depending on your browser settings the file download location can be specified.
- Send the file to Wallaroo technical support.
At any time, any existing bundle can be examined and downloaded from the Troubleshoot tab.
Create Support Bundles via the Command Line
To generate a support bundle via the command line for either kots or helm based installations of Wallaroo, the following applications are used.
- kubectl
- kubectl plugins:
  - krew: Install Krew
  - krew support-bundle: Install with kubectl krew install support-bundle.
The following command collects log files, configuration files, and other details into a .tar.gz file in the directory the command is run from, named in the format support-bundle-YYYY-MM-DDTHH-MM-SS.tar.gz. This file is submitted to the Wallaroo support team for review.
kubectl support-bundle --load-cluster-specs --interactive=false
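Since each run writes a new timestamped archive, it can help to pick out the most recent one before sending it to support. The following is a minimal sketch, not part of the Wallaroo tooling: the function name latest_support_bundle is hypothetical, and it assumes the support-bundle-YYYY-MM-DDTHH-MM-SS.tar.gz naming described above, so that sorting by name also sorts by time.

```shell
# Print the newest support bundle archive in a directory (default: current one).
# Assumption: file names follow support-bundle-YYYY-MM-DDTHH-MM-SS.tar.gz, so the
# last name in sorted glob order is also the most recent bundle.
latest_support_bundle() {
  dir="${1:-.}"
  newest=""
  for f in "$dir"/support-bundle-*.tar.gz; do
    [ -e "$f" ] || continue   # the glob matched nothing
    newest="$f"               # later names sort after earlier ones
  done
  [ -n "$newest" ] || return 1
  printf '%s\n' "$newest"
}

# Example usage after generating a bundle:
#   kubectl support-bundle --load-cluster-specs --interactive=false
#   latest_support_bundle .
```

The function returns a non-zero status when no bundle is found, so it can guard a later upload or copy step.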
Notify Users of Downtime
The following is a short list of users to notify before the Wallaroo Ops downtime. ALL users that interact with Wallaroo Ops should be informed; the following list is provided to help DevOps engineers to know what stakeholders to notify.
Stakeholder | Description | What Users will Experience |
---|---|---|
Wallaroo Dashboard Users | Users that interact via the Wallaroo Dashboard. | If users are active in the dashboard: If a user attempts to go to the dashboard in their browser: |
API users | These users interact with Wallaroo via the Wallaroo MLOps API or related API services. | MLOps API and other API services to the Wallaroo Ops instance will not be available during the upgrade and any requests will return a 503 (service unavailable) with the message “Wallaroo upgrade in progress.” |
External SDK users | Users who perform actions via the Wallaroo SDK that do not use the Wallaroo JupyterHub service. | Wallaroo SDK connections will not be available during the upgrade and any attempts to use the SDK will return an error. |
Wallaroo JupyterHub users | Users who use the Wallaroo JupyterHub Service to run Jupyter Notebooks with the Wallaroo Ops instance. | If users are in JupyterHub at the time of the upgrade: If users attempt to go to JupyterHub during the upgrade: |
Deployed Pipeline Users | Users who perform inferences through deployed pipelines. | Deployed pipelines are undeployed for the duration of the upgrade. Once the upgrade process is complete, any previously deployed pipelines are automatically redeployed. During the upgrade, any inference requests to deployed pipelines will return a 503 (service unavailable) with the message “Wallaroo upgrade in progress.” |
Edge and Multicloud Deployment Users | Users and services that perform inference requests and other services via models deployed to multicloud and edge locations. | Edge and multicloud deployments of Wallaroo are not interrupted. While an upgrade is in progress, no logs can be received by the Wallaroo Ops instance. Once connection is restored, edge locations upload their inference logs. |
ML Workload Orchestration Users | Scheduled ML Workload Orchestrations. | ML Workload Orchestrations scheduled tasks will be interrupted during the upgrade process. Scheduled task runs will be missed while the upgrade is in progress and will run at their next scheduled time after the upgrade completes. |
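API clients and automation that must ride out the downtime can wait for the 503 responses described in the table to stop. The following is a rough sketch under stated assumptions: the function names and the probe URL are hypothetical, and the probe is any command that prints an HTTP status code. The sketch also treats 000 as unavailable, which is what curl reports for http_code when it receives no response at all.

```shell
# Poll a probe command until it stops reporting "service unavailable".
#   $1: name of a command or function that prints an HTTP status code
#   $2: maximum number of attempts (default 120)
#   $3: seconds to sleep between attempts (default 30)
wait_until_available() {
  probe_cmd="$1"
  max_attempts="${2:-120}"
  delay="${3:-30}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    code="$("$probe_cmd")"
    case "$code" in
      503|000) ;;                              # upgrade still in progress
      *) printf '%s\n' "$code"; return 0 ;;    # service answered with another status
    esac
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  return 1                                     # gave up after max_attempts
}

# Example probe using curl; the URL is a placeholder for your own endpoint:
#   probe() { curl -s -o /dev/null -w '%{http_code}' "https://wallaroo.example.com/"; }
#   wait_until_available probe 120 30
```

The defaults of 120 attempts at 30 second intervals cover roughly the 60 minute upper bound quoted for a standard upgrade.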
Backup Wallaroo
Before starting the upgrade procedure, back up the Wallaroo Ops instance. The following procedure summary is based on the provided Wallaroo Backup and Restore Guides.
Wallaroo Backup Procedure
Before starting the backup, force the Inference Log Storage service to complete writing logs so they can be captured by the backup. This assumes that Wallaroo was installed in the namespace wallaroo.

kubectl -n wallaroo scale --replicas=0 deploy/plateau
kubectl -n wallaroo scale --replicas=1 deploy/plateau
Set the $BACKUP_NAME variable. The name must consist of lowercase letters, numbers, or the characters - and . and must end in an alphanumeric character.

BACKUP_NAME={give it your own name}
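A quick check of the name before creating the backup avoids a rejected request later. This is a minimal sketch, with a hypothetical helper name. It assumes the permitted punctuation is the hyphen and the period, and additionally that the name must start with a letter or digit (the usual Kubernetes object-name rule); adjust the pattern if your environment differs.

```shell
# Return 0 when the candidate name uses only lowercase letters, digits, '-' and '.',
# and both starts and ends with a letter or digit.
# Assumption: the start-of-name rule follows Kubernetes object naming conventions.
is_valid_backup_name() {
  printf '%s' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$'
}

# Example: build a dated name and validate it before use.
BACKUP_NAME="wallaroo-pre-upgrade-$(date +%Y%m%d)"
if is_valid_backup_name "$BACKUP_NAME"; then
  echo "Using backup name: $BACKUP_NAME"
else
  echo "Invalid backup name: $BACKUP_NAME" >&2
fi
```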
Issue the following backup command. The --exclude-namespaces option is used to exclude namespaces that are not required for the Wallaroo backup and restore. By default, these are the namespaces velero, default, kube-node-lease, kube-public, and kube-system.

This process will back up all namespaces that are not excluded, including deployed Wallaroo pipelines. Add any other namespaces that should not be part of the backup to the --exclude-namespaces option as per your organization’s requirements.

velero backup create $BACKUP_NAME --default-volumes-to-fs-backup --include-cluster-resources=true --exclude-namespaces velero,default,kube-node-lease,kube-public,kube-system
To view the status of the backup, run velero backup describe --details $BACKUP_NAME. Once the Completed field shows a date and time, the backup is complete.

In progress backup.
velero backup describe --details $BACKUP_NAME
Name: sample-doctest-backup-20240502
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: velero.io/resource-timeout=10m0s
velero.io/source-cluster-k8s-gitversion=v1.28.7-gke.1026000
velero.io/source-cluster-k8s-major-version=1
velero.io/source-cluster-k8s-minor-version=28
Phase: InProgress
Namespaces:
Included: *
Excluded: velero, default, kube-node-lease, kube-public, kube-system
Resources:
Included: *
Excluded: <none>
Cluster-scoped: included
Label selector: <none>
Or label selector: <none>
Storage Location: default
Velero-Native Snapshot PVs: auto
Snapshot Move Data: false
Data Mover: velero
TTL: 720h0m0s
CSISnapshotTimeout: 10m0s
ItemOperationTimeout: 4h0m0s
Hooks: <none>
Backup Format Version: 1.1.0
Started: 2024-05-14 16:26:43 -0600 MDT
Completed: <n/a>
Expiration: 2024-06-13 16:26:43 -0600 MDT
Estimated total items to be backed up: 1073
Items backed up so far: 28
Resource List: <backup resource list not found>
Backup Volumes:
Velero-Native Snapshots: <none included>
CSI Snapshots: <none included or not detectable>
Pod Volume Backups - kopia:
Completed:
gmp-system/alertmanager-0: alertmanager-config, alertmanager-data
gmp-system/collector-cdsm4: config-out, storage
gmp-system/collector-fslhc: config-out, storage
gmp-system/collector-p6f85: config-out, storage
gmp-system/collector-q4djj: config-out, storage
gmp-system/rule-evaluator-7874c6f478-672vs: config-out
wallaroo/hub-65c45d4c7-nb9lp: pvc
wallaroo/kotsadm-b4f68468d-dzj5c: backup, tmp
wallaroo/kotsadm-minio-0: kotsadm-minio, minio-cert-dir, minio-config-dir
wallaroo/kotsadm-rqlite-0: kotsadm-rqlite, tmp
In Progress:
wallaroo/minio-cf97d78cb-pv82x: export
Completed backup.
velero backup describe --details $BACKUP_NAME
Name: sample-doctest-backup-20240502
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: velero.io/resource-timeout=10m0s
velero.io/source-cluster-k8s-gitversion=v1.28.7-gke.1026000
velero.io/source-cluster-k8s-major-version=1
velero.io/source-cluster-k8s-minor-version=28
Phase: Completed
Warnings:
Velero: <none>
Cluster: <none>
Namespaces:
wallaroo: resource: /pods name: /kotsadm-b4f68468d-dzj5c message: /volume migrations is declared in pod wallaroo/kotsadm-b4f68468d-dzj5c but not mounted by any container, skipping
Namespaces:
Included: *
Excluded: velero, default, kube-node-lease, kube-public, kube-system
Resources:
Included: *
Excluded: <none>
Cluster-scoped: included
Label selector: <none>
Or label selector: <none>
Storage Location: default
Velero-Native Snapshot PVs: auto
Snapshot Move Data: false
Data Mover: velero
TTL: 720h0m0s
CSISnapshotTimeout: 10m0s
ItemOperationTimeout: 4h0m0s
Hooks: <none>
Backup Format Version: 1.1.0
Started: 2024-05-14 16:26:43 -0600 MDT
Completed: 2024-05-14 16:32:19 -0600 MDT
Expiration: 2024-06-13 16:26:43 -0600 MDT
Total items to be backed up: 719
Items backed up: 719
Resource List:
admissionregistration.k8s.io/v1/MutatingWebhookConfiguration:
- gmp-operator.gmp-system.monitoring.googleapis.com
- neg-annotation.config.common-webhooks.networking.gke.io
- pod-ready.config.common-webhooks.networking.gke.io
- warden-mutating.config.common-webhooks.networking.gke.io
…Other backed up resources
warden.gke.io/v1/Audit:
- autogke-default-linux-capabilities
- autogke-disallow-hostnamespaces
- autogke-disallow-privilege
- autogke-no-host-port
- autogke-no-write-mode-hostpath
- autogke-node-affinity-selector-limitation
- autogke-pod-affinity-limitation
- autopilot-admission-webhook-config-limitation
- autopilot-capacity-request-limitation
- autopilot-external-ip-limitation
- autopilot-no-ephemeral-containers
- autopilot-persistent-volume-limitation
- autopilot-volume-type-limitation
Backup Volumes:
Velero-Native Snapshots: <none included>
CSI Snapshots: <none included>
Pod Volume Backups - kopia:
Completed:
gmp-system/alertmanager-0: alertmanager-config, alertmanager-data
gmp-system/collector-cdsm4: config-out, storage
gmp-system/collector-fslhc: config-out, storage
gmp-system/collector-p6f85: config-out, storage
gmp-system/collector-q4djj: config-out, storage
gmp-system/rule-evaluator-7874c6f478-672vs: config-out
wallaroo/hub-65c45d4c7-nb9lp: pvc
wallaroo/kotsadm-b4f68468d-dzj5c: backup, tmp
wallaroo/kotsadm-minio-0: kotsadm-minio, minio-cert-dir, minio-config-dir
wallaroo/kotsadm-rqlite-0: kotsadm-rqlite, tmp
wallaroo/minio-cf97d78cb-pv82x: export
wallaroo/nats-0: nats-js, pid
wallaroo/plateau-7dfbd89655-9xz6v: plateau-storage
wallaroo/postgres-74d6948c48-mjmb5: postgres-storage
wallaroo/prometheus-deployment-666d968bfd-cxp46: alert-config-volume, metrics-storage-volume
wallaroo/wallsvc-0: socket-volume, spire-data
HooksAttempted: 1
HooksFailed: 0
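Rather than re-reading the full report, the backup status can be reduced to its Phase line. The following is a minimal sketch with hypothetical function names (backup_phase, wait_for_backup). It assumes the describe output contains a line beginning with Phase: as in the samples above, and it treats Failed, PartiallyFailed, and FailedValidation as terminal failure phases.

```shell
# Read `velero backup describe` output on stdin and print the first Phase value.
backup_phase() {
  awk '/^Phase:/ && !seen { print $2; seen = 1 }'
}

# Poll a named backup until it reaches a terminal phase.
# Assumption: velero is installed and configured for the cluster.
#   $1: backup name, $2: seconds between checks (default 30)
wait_for_backup() {
  name="$1"
  interval="${2:-30}"
  while :; do
    phase="$(velero backup describe --details "$name" | backup_phase)"
    case "$phase" in
      Completed) echo "Backup $name completed"; return 0 ;;
      Failed|PartiallyFailed|FailedValidation)
        echo "Backup $name ended in phase $phase" >&2; return 1 ;;
      *) sleep "$interval" ;;   # InProgress or another non-terminal phase
    esac
  done
}

# Example usage:
#   wait_for_backup "$BACKUP_NAME" 30
```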
Upgrade Procedure
Depending on the size and number of workspaces and artifacts, a typical upgrade can take 30-60 minutes. Select one of the following options based on the Wallaroo Install Process:
- Install via kots: Select Upgrade via Kots
- Install via helm: Select Upgrade via Helm
Upgrade via Kots
The following procedure is used to upgrade a Wallaroo Ops instance via kots.
Kubernetes and Kots Client Software Prerequisites
Before installing or upgrading Wallaroo, the administrative node managing the Kubernetes cluster will require these tools.
- kubectl
- For Kots based installs:
  - kots Version 1.107.2
- For Helm installs:
  - helm: Install Helm
    - Minimum supported version: Helm 3.11.2
  - krew: Install Krew
  - krew preflight and krew support-bundle. Install with the following commands:
    kubectl krew install support-bundle
    kubectl krew install preflight
The following is a quick guide for installing kubectl on macOS.
To install kubectl on a macOS system using Homebrew:

Issue the brew install command:

brew install kubectl

Verify the installation:

kubectl version --client
Upgrade via Kots Procedure
To upgrade a kots based installation of Wallaroo:

From a terminal shell with administrative access to the Kubernetes cluster hosting Wallaroo, launch the Kots Administrative Dashboard via the following command:

kubectl kots admin-console --namespace $NAMESPACE

Replace $NAMESPACE with the name of the namespace the Wallaroo Ops center is installed in, which is wallaroo by default. For example:

kubectl kots admin-console --namespace wallaroo
• Press Ctrl+C to exit
• Go to http://localhost:8800 to access the Admin Console
Access the Kots Administrative Dashboard via the domain name and port as provided in the previous step.
From the Kots Administrative Dashboard:
If there is a new version of Wallaroo to install based on your Wallaroo license type, it will be displayed under the Version (B) display as New Version Available. Select Check for updates to check for updated versions.
Select the version to upgrade to.
To perform a preflight check, select the preflight icon and verify the cluster meets the requirements.
If ready to upgrade, select Deploy (C).
Confirm the upgrade by selecting Yes, Deploy.
During the upgrade process, the status indicator (A) changes from Ready to Unavailable. Selecting Details will show which services are available or are still being upgraded.
When the upgrade process is complete, the status indicator will change to Ready. At this point, users can resume their normal operations.
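The dashboard status indicator is the documented signal that the upgrade has finished. As an optional command-line cross-check, which is not part of the documented procedure, the pods in the Wallaroo namespace can be listed and filtered for any that are not yet ready. This sketch uses a hypothetical helper name and assumes the standard kubectl get pods column layout (NAME, READY, STATUS, RESTARTS, AGE) and the default wallaroo namespace.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the names of
# pods whose READY count is incomplete, ignoring pods that ran to completion.
pods_not_ready() {
  awk '{
    split($2, ready, "/")                     # READY column, for example 1/1
    if ($3 != "Completed" && ready[1] != ready[2]) print $1
  }'
}

# Example usage against a live cluster:
#   kubectl get pods --namespace wallaroo --no-headers | pods_not_ready
```

An empty result suggests every pod reports all of its containers as ready; any names printed are worth checking against the Details view in the dashboard.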
Upgrade via Helm
The following procedure is used to upgrade a Wallaroo Ops instance via helm.
Helm Client Software Prerequisites
- For Helm installs:
  - helm: Install Helm
    - Minimum supported version: Helm 3.11.2
  - krew: Install Krew
  - krew preflight and krew support-bundle. Install with the following commands:
    kubectl krew install support-bundle
    kubectl krew install preflight
Upgrade via Helm Procedure
To upgrade a helm based installation of Wallaroo:
- From a Wallaroo Support representative, retrieve the following:
  - The license channel. This will be in the form of oci://registry.replicated.com/wallaroo/$CHANNEL/wallaroo, where $CHANNEL represents the channel type. For example, oci://registry.replicated.com/wallaroo/2024-2/wallaroo.
  - The version to be upgraded to. For example: 2024.2.1.
  - OCI Registry login. This will be in the format: helm registry login registry.replicated.com --username $YOURUSERNAME --password $YOURPASSWORD
  - Helm Release Name: This was determined during the Wallaroo Install process.
- Prepare the local-values.yaml file that will store the essential configuration options. It is highly recommended to use the same local-values.yaml file used during the Wallaroo installation to minimize changes. The following is an example of the local-values.yaml file settings. See Wallaroo Helm Reference Guides for additional settings.
wallarooDomain: "wallaroo.example.com" # change to match the domain name
custTlsSecretName: cust-cert-secret
apilb:
  serviceType: LoadBalancer
  external_inference_endpoints_enabled: true
ingress_mode: internal # internal (Default), external, or none
dashboard:
  clientName: "Wallaroo Helm Example" # Insert the name displayed in the Wallaroo Dashboard
kubernetes_distribution: "" # Required. One of: aks, eks, gke, oke, or kurl.
From a terminal with helm and administrative access to the Kubernetes cluster Wallaroo Ops is installed to, perform the following:

Log in to the OCI registry, replacing $YOURUSERNAME and $YOURPASSWORD with the ones provided by Wallaroo:

helm registry login registry.replicated.com --username $YOURUSERNAME --password $YOURPASSWORD

Set the default Kubernetes namespace to the one used for the Wallaroo installation. By default, the namespace wallaroo is used. For example:

kubectl config set-context --current --namespace wallaroo
Perform the preflight check. Preflight verification is performed with the following command format. The variables $LICENSE_CHANNEL and $VERSION are supplied by your Wallaroo support representative.

helm template --is-upgrade \
  oci://registry.replicated.com/wallaroo/$LICENSE_CHANNEL/wallaroo --version $VERSION \
  | kubectl preflight -

For example, with $LICENSE_CHANNEL=2024-2 and $VERSION=2024.2.1:

helm template --is-upgrade \
  oci://registry.replicated.com/wallaroo/2024-2/wallaroo --version 2024.2.1 \
  | kubectl preflight -
This displays the Preflight Checks report. Verify that all checks are completed successfully before proceeding.
Perform the upgrade with the following command, replacing the following:
- $LICENSE_CHANNEL: The channel for the installation upgrade.
- $VERSION: The version to upgrade to.
- $RELEASE: The helm name of the installation release. To upgrade the existing installation, use the release name chosen during the Wallaroo install process, since helm upgrade --install creates a new release when the name does not match an existing one. By default, wallaroo.

helm upgrade --install --version $VERSION \
  --wait --timeout 10m \
  --values local-values.yaml \
  --debug \
  $RELEASE \
  oci://registry.replicated.com/wallaroo/$LICENSE_CHANNEL/wallaroo
For example, with $LICENSE_CHANNEL=2024-2, $VERSION=2024.2.1, and $RELEASE=wallaroo, the command would be:

helm upgrade --install --version 2024.2.1 \
  --wait --timeout 10m \
  --values local-values.yaml \
  --debug \
  wallaroo \
  oci://registry.replicated.com/wallaroo/2024-2/wallaroo

Once the upgrade is complete, verify it with the helm test $RELEASE command. With the settings above, this would be:

helm test wallaroo
A successful upgrade will resemble the following:
NAME: wallaroo
LAST DEPLOYED: Thu Apr 11 09:56:17 2024
NAMESPACE: default
STATUS: pending-upgrade
REVISION: 2
TEST SUITE: wallaroo-fluent-bit-test-connection
Last Started: Thu Apr 11 10:03:52 2024
Last Completed: Thu Apr 11 10:03:56 2024
Phase: Succeeded
TEST SUITE: nats-test-request-reply
Last Started: Thu Apr 11 10:03:44 2024
Last Completed: Thu Apr 11 10:03:52 2024
Phase: Succeeded
TEST SUITE: wallaroo-test-connections-hook
Last Started: Thu Apr 11 10:03:56 2024
Last Completed: Thu Apr 11 10:06:11 2024
Phase: Succeeded
TEST SUITE: wallaroo-test-objects-hook
Last Started: Thu Apr 11 10:06:12 2024
Last Completed: Thu Apr 11 10:06:21 2024
Phase: Succeeded
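When the upgrade is scripted, the test report can be checked automatically instead of by eye. The following is a minimal sketch with a hypothetical function name. It assumes each test suite in the helm test output reports its result on a line beginning with Phase:, as in the report above, and it treats a report with no Phase lines as a failure.

```shell
# Read `helm test` output on stdin. Succeed only when at least one test suite
# reported a Phase and every reported Phase is Succeeded.
all_tests_succeeded() {
  awk '
    /^Phase:/ { total++; if ($2 != "Succeeded") failed++ }
    END { exit ((total > 0 && failed == 0) ? 0 : 1) }
  '
}

# Example usage with the release name from the steps above:
#   if helm test wallaroo | all_tests_succeeded; then
#     echo "Upgrade verified"
#   else
#     echo "One or more helm tests did not succeed" >&2
#   fi
```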