Upgrade Prerequisites
Pre-Upgrade Checklist
Before upgrading Wallaroo, complete the following steps to provide a smooth transition from the previous version of Wallaroo to the new one.
- Create Support Bundle
- Notify Users of Downtime
- Backup Wallaroo
Create Support Bundle
A support bundle is a collection of logs, configurations, and other information for Wallaroo support staff. Generate one before the upgrade procedure starts to preserve the current settings and any information useful for tracking potential issues during the upgrade process.
Support bundles are generated using one of the following two methods.
At any time, the administration console can create troubleshooting bundles for Wallaroo technical support to assess product health and help with problems. Support bundles contain logs and configuration files which can be examined before downloading and transmitting to Wallaroo. The console also has a configurable redaction mechanism for cases where sensitive information such as passwords, tokens, or PII (Personally Identifiable Information) needs to be removed from logs in the bundle.

Create Support Bundles via the Wallaroo Administrator Dashboard
This process is for `kots` based installations of Wallaroo.
This assumes that `kubectl` and `kots` have been installed in a terminal with administrative access to the Kubernetes cluster hosting the Wallaroo installation.
- Launch the Kots Administrative Dashboard with `kubectl kots admin-console --namespace $WALLAROO_NAMESPACE`, replacing `$WALLAROO_NAMESPACE` with the namespace the Wallaroo instance is installed in. For example: `kubectl kots admin-console --namespace wallaroo`.
- Log into the administration console with the Administrative Dashboard password set during the installation process.
- Select the Troubleshoot tab.
- Select Analyze Wallaroo.
- Select `Download bundle` to save the bundle file as a compressed archive. Depending on your browser settings the file download location can be specified.
- Send the file to Wallaroo technical support.
At any time, any existing bundle can be examined and downloaded from the `Troubleshoot` tab.
Create Support Bundles via the Command Line
To generate a support bundle via the command line for either `kots` or `helm` based installations of Wallaroo, the following applications are used.
- `kubectl`
- `kubectl` plugins:
  - `krew`: Install Krew
  - `krew support-bundle`: Install with `kubectl krew install support-bundle`.
The following command collects log files, configuration files, and other details into a `.tar.gz` file in the directory the command is run from, named in the format `support-bundle-YYYY-MM-DDTHH-MM-SS.tar.gz`. This file is submitted to the Wallaroo support team for review.
kubectl support-bundle --load-cluster-specs --interactive=false
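Because the bundle name embeds a sortable timestamp, the most recent bundle in a directory can be found by name alone. The following is a minimal sketch rather than part of the Wallaroo tooling; the function name `latest_support_bundle` is hypothetical, and it assumes the bundles follow the `support-bundle-YYYY-MM-DDTHH-MM-SS.tar.gz` format described above.

```shell
# Print the newest support bundle in a directory (default: current directory).
# Assumption: bundle names follow support-bundle-YYYY-MM-DDTHH-MM-SS.tar.gz,
# so the lexicographically last name is also the most recent one.
latest_support_bundle() {
  dir="${1:-.}"
  latest=""
  for f in "$dir"/support-bundle-*.tar.gz; do
    [ -e "$f" ] || continue   # the glob matched nothing
    latest="$f"               # glob results arrive in sorted order
  done
  [ -n "$latest" ] || return 1
  printf '%s\n' "$latest"
}
```

Running `latest_support_bundle .` after the support bundle command finishes prints the path of the file to send to Wallaroo support, or returns a non-zero status if no bundle is present.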
Notify Users of Downtime
The following is a short list of users to notify before the Wallaroo Ops downtime. ALL users that interact with Wallaroo Ops should be informed; the list is provided to help DevOps engineers identify which stakeholders to notify.
Stakeholder | Description | What Users will Experience |
---|---|---|
Wallaroo Dashboard Users | Users that interact via the Wallaroo Dashboard. | If users are active in the dashboard: [screenshot]. If a user attempts to go to the dashboard in their browser: [screenshot]. |
API users | These users interact with Wallaroo via the Wallaroo MLOps API or related API services. | MLOps API and other API services to the Wallaroo Ops instance will not be available during the upgrade and any requests will return a 503 (service unavailable) with the message “Wallaroo upgrade in progress.” |
External SDK users | Users who perform actions via the Wallaroo SDK that do not use the Wallaroo JupyterHub service. | Wallaroo SDK connections will not be available during the upgrade and any attempts to use the SDK will return an error. |
Wallaroo JupyterHub users | Users who use the Wallaroo JupyterHub Service to run Jupyter Notebooks with the Wallaroo Ops instance. | If users are in JupyterHub at the time of the upgrade: [screenshot]. If users attempt to go to JupyterHub during the upgrade: [screenshot]. |
Deployed pipeline users | Users who perform inferences through deployed pipelines. | Deployed pipelines are undeployed during the upgrade process. Once the upgrade process is complete, any previously deployed pipelines are automatically redeployed. Any inference requests to deployed pipelines during the upgrade will return a 503 (service unavailable) with the message “Wallaroo upgrade in progress.” |
Edge and Multi-cloud Deployment Users | Users and services that perform inference requests and other services via models deployed to multicloud and edge locations. | Edge and multicloud deployments of Wallaroo are not interrupted. While an upgrade is in progress, no logs can be received by the Wallaroo Ops instance. Once connection is restored, edge locations upload their inference logs. |
Wallaroo Orchestration Users | Scheduled Wallaroo orchestrations. | Scheduled Wallaroo orchestration tasks will be interrupted during the upgrade process. Scheduled task runs will be missed while the upgrade is in progress and will run at their next scheduled time after the upgrade completes. |
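API and inference clients can distinguish the upgrade window from other failures by checking for the 503 status and message described in the table. The following is a minimal sketch under assumptions: the function name `is_upgrade_response` is hypothetical, and because the exact response body format is not specified here, it only looks for the message text as a substring.

```shell
# Succeed when a response matches the upgrade notice described in the table:
# HTTP 503 with a body containing "Wallaroo upgrade in progress".
# Assumption: the body format is unknown, so only a substring match is done.
is_upgrade_response() {
  status="$1"
  body="$2"
  [ "$status" = "503" ] || return 1
  case "$body" in
    *"Wallaroo upgrade in progress"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A client script could capture the status code and body of a request with a tool such as `curl` and call `is_upgrade_response "$status" "$body"` to decide whether to pause and retry later rather than report a failure.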
Backup Wallaroo
Before starting the upgrade procedure, back up the Wallaroo Ops instance. The following procedure summary is based on the Wallaroo Backup and Restore Guides.
Wallaroo Backup Procedure
Before starting the backup, force the Plateau service to complete writing logs so they can be captured by the backup. This assumes that Wallaroo was installed in the namespace `wallaroo`.
kubectl -n wallaroo scale --replicas=0 deploy/plateau
kubectl -n wallaroo scale --replicas=1 deploy/plateau
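The two scale commands can be wrapped in a small helper that also waits for Plateau to come back before the backup starts. This is a sketch, not part of the documented procedure: the function name `restart_plateau`, the `PLATEAU_TIMEOUT` variable, and the added `kubectl rollout status` wait are assumptions introduced for illustration.

```shell
# Scale Plateau down and back up, then wait for the deployment to be ready.
# Assumptions: the namespace defaults to "wallaroo"; the rollout wait and the
# PLATEAU_TIMEOUT variable are illustrative additions to the documented steps.
restart_plateau() {
  ns="${1:-wallaroo}"
  kubectl -n "$ns" scale --replicas=0 deploy/plateau &&
    kubectl -n "$ns" scale --replicas=1 deploy/plateau &&
    kubectl -n "$ns" rollout status deploy/plateau --timeout="${PLATEAU_TIMEOUT:-120s}"
}
```

Calling `restart_plateau wallaroo` stops at the first failing command, so a non-zero exit status indicates the backup should not proceed yet.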
Set the `$BACKUP_NAME`. This must be all lowercase characters or numbers or `-/.` and must end in alphanumeric characters.
BACKUP_NAME={give it your own name}
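A name can be checked against this rule before it is passed to the backup command. The sketch below is illustrative only: the function name `is_valid_backup_name` is hypothetical, and it assumes that `-/.` means "hyphen or period", so a literal `/` is rejected. That reading matches Kubernetes object naming, but it is an interpretation rather than something stated above.

```shell
# Succeed when the name uses only lowercase letters, digits, "-" or "."
# and ends in a lowercase letter or digit.
# Assumption: "-/." is read as "hyphen or period"; "/" is treated as invalid.
is_valid_backup_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9.-]*[a-z0-9]$'
}
```

For example, `is_valid_backup_name "$BACKUP_NAME" || echo "invalid backup name"` flags names such as `Backup1` (uppercase) or `nightly-` (ends in a hyphen) before any backup is attempted.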
Issue the following backup command. The `--exclude-namespaces` option is used to exclude namespaces that are not required for the Wallaroo backup and restore. By default, these are the namespaces `velero`, `default`, `kube-node-lease`, `kube-public`, and `kube-system`.
This process will back up all namespaces that are not excluded, including deployed Wallaroo pipelines. Add any other namespaces that should not be part of the backup to the `--exclude-namespaces` option as per your organization’s requirements.
velero backup create $BACKUP_NAME --default-volumes-to-fs-backup --include-cluster-resources=true --exclude-namespaces velero,default,kube-node-lease,kube-public,kube-system
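When an organization needs to exclude more namespaces, the comma-separated value can be built from the default list plus any extras. This is a minimal sketch; the names `DEFAULT_EXCLUDES` and `exclude_namespaces` are hypothetical, and the extra namespaces used with it are examples only.

```shell
# Default namespaces excluded from the Wallaroo backup, as listed above.
DEFAULT_EXCLUDES="velero,default,kube-node-lease,kube-public,kube-system"

# Print the default exclusions followed by any extra namespaces given as
# arguments, joined with commas for use with --exclude-namespaces.
exclude_namespaces() {
  list="$DEFAULT_EXCLUDES"
  for ns in "$@"; do
    list="$list,$ns"
  done
  printf '%s\n' "$list"
}
```

The result can be passed to the backup command in place of the literal list, for example `--exclude-namespaces "$(exclude_namespaces gmp-system)"`, where `gmp-system` stands in for whichever namespaces your organization chooses to leave out.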
To view the status of the backup, run `velero backup describe --details $BACKUP_NAME`. Once the `Completed` field shows a date and time, the backup is complete.
The following shows an `InProgress` backup:
velero backup describe --details $BACKUP_NAME
Name:         sample-doctest-backup-20240502
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/resource-timeout=10m0s
              velero.io/source-cluster-k8s-gitversion=v1.28.7-gke.1026000
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=28
Phase:  InProgress
Namespaces:
  Included:  *
  Excluded:  velero, default, kube-node-lease, kube-public, kube-system
Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  included
Label selector:  <none>
Or label selector:  <none>
Storage Location:  default
Velero-Native Snapshot PVs:  auto
Snapshot Move Data:          false
Data Mover:                  velero
TTL:  720h0m0s
CSISnapshotTimeout:    10m0s
ItemOperationTimeout:  4h0m0s
Hooks:  <none>
Backup Format Version:  1.1.0
Started:    2024-05-14 16:26:43 -0600 MDT
Completed:  <n/a>
Expiration:  2024-06-13 16:26:43 -0600 MDT
Estimated total items to be backed up:  1073
Items backed up so far:                 28
Resource List:  <backup resource list not found>
Backup Volumes:
  Velero-Native Snapshots: <none included>
  CSI Snapshots: <none included or not detectable>
  Pod Volume Backups - kopia:
    Completed:
      gmp-system/alertmanager-0: alertmanager-config, alertmanager-data
      gmp-system/collector-cdsm4: config-out, storage
      gmp-system/collector-fslhc: config-out, storage
      gmp-system/collector-p6f85: config-out, storage
      gmp-system/collector-q4djj: config-out, storage
      gmp-system/rule-evaluator-7874c6f478-672vs: config-out
      wallaroo/hub-65c45d4c7-nb9lp: pvc
      wallaroo/kotsadm-b4f68468d-dzj5c: backup, tmp
      wallaroo/kotsadm-minio-0: kotsadm-minio, minio-cert-dir, minio-config-dir
      wallaroo/kotsadm-rqlite-0: kotsadm-rqlite, tmp
    In Progress:
      wallaroo/minio-cf97d78cb-pv82x: export
The following shows a `Completed` backup.
velero backup describe --details $BACKUP_NAME
Name:         sample-doctest-backup-20240502
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/resource-timeout=10m0s
              velero.io/source-cluster-k8s-gitversion=v1.28.7-gke.1026000
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=28
Phase:  Completed
Warnings:
  Velero:     <none>
  Cluster:    <none>
  Namespaces:
    wallaroo:   resource: /pods name: /kotsadm-b4f68468d-dzj5c message: /volume migrations is declared in pod wallaroo/kotsadm-b4f68468d-dzj5c but not mounted by any container, skipping
Namespaces:
  Included:  *
  Excluded:  velero, default, kube-node-lease, kube-public, kube-system
Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  included
Label selector:  <none>
Or label selector:  <none>
Storage Location:  default
Velero-Native Snapshot PVs:  auto
Snapshot Move Data:          false
Data Mover:                  velero
TTL:  720h0m0s
CSISnapshotTimeout:    10m0s
ItemOperationTimeout:  4h0m0s
Hooks:  <none>
Backup Format Version:  1.1.0
Started:    2024-05-14 16:26:43 -0600 MDT
Completed:  2024-05-14 16:32:19 -0600 MDT
Expiration:  2024-06-13 16:26:43 -0600 MDT
Total items to be backed up:  719
Items backed up:              719
Resource List:
  admissionregistration.k8s.io/v1/MutatingWebhookConfiguration:
    - gmp-operator.gmp-system.monitoring.googleapis.com
    - neg-annotation.config.common-webhooks.networking.gke.io
    - pod-ready.config.common-webhooks.networking.gke.io
    - warden-mutating.config.common-webhooks.networking.gke.io
  ...Other backed up resources
  warden.gke.io/v1/Audit:
    - autogke-default-linux-capabilities
    - autogke-disallow-hostnamespaces
    - autogke-disallow-privilege
    - autogke-no-host-port
    - autogke-no-write-mode-hostpath
    - autogke-node-affinity-selector-limitation
    - autogke-pod-affinity-limitation
    - autopilot-admission-webhook-config-limitation
    - autopilot-capacity-request-limitation
    - autopilot-external-ip-limitation
    - autopilot-no-ephemeral-containers
    - autopilot-persistent-volume-limitation
    - autopilot-volume-type-limitation
Backup Volumes:
  Velero-Native Snapshots: <none included>
  CSI Snapshots: <none included>
  Pod Volume Backups - kopia:
    Completed:
      gmp-system/alertmanager-0: alertmanager-config, alertmanager-data
      gmp-system/collector-cdsm4: config-out, storage
      gmp-system/collector-fslhc: config-out, storage
      gmp-system/collector-p6f85: config-out, storage
      gmp-system/collector-q4djj: config-out, storage
      gmp-system/rule-evaluator-7874c6f478-672vs: config-out
      wallaroo/hub-65c45d4c7-nb9lp: pvc
      wallaroo/kotsadm-b4f68468d-dzj5c: backup, tmp
      wallaroo/kotsadm-minio-0: kotsadm-minio, minio-cert-dir, minio-config-dir
      wallaroo/kotsadm-rqlite-0: kotsadm-rqlite, tmp
      wallaroo/minio-cf97d78cb-pv82x: export
      wallaroo/nats-0: nats-js, pid
      wallaroo/plateau-7dfbd89655-9xz6v: plateau-storage
      wallaroo/postgres-74d6948c48-mjmb5: postgres-storage
      wallaroo/prometheus-deployment-666d968bfd-cxp46: alert-config-volume, metrics-storage-volume
      wallaroo/wallsvc-0: socket-volume, spire-data
HooksAttempted:  1
HooksFailed:     0
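Rather than re-running the describe command by hand, the `Phase` field shown in the output above can be polled until the backup finishes. The following is a sketch under assumptions, not a supported Wallaroo utility: the names `backup_phase`, `wait_for_backup`, `MAX_ATTEMPTS`, and `POLL_SECONDS` are hypothetical, and only `Completed`, `Failed`, `PartiallyFailed`, and `FailedValidation` are treated as final phases, with every other phase treated as still in progress.

```shell
# Read `velero backup describe` output on stdin and print the Phase value.
backup_phase() {
  awk '$1 == "Phase:" { print $2; exit }'
}

# Poll a backup until it completes, fails, or the attempt limit is reached.
# Returns 0 on Completed, 1 on a failed phase, 2 on timeout.
wait_for_backup() {
  name="$1"
  attempts="${MAX_ATTEMPTS:-120}"
  while [ "$attempts" -gt 0 ]; do
    phase=$(velero backup describe "$name" | backup_phase)
    case "$phase" in
      Completed) return 0 ;;
      Failed|PartiallyFailed|FailedValidation)
        echo "backup $name ended in phase $phase" >&2
        return 1 ;;
    esac
    attempts=$((attempts - 1))
    sleep "${POLL_SECONDS:-15}"
  done
  echo "timed out waiting for backup $name" >&2
  return 2
}
```

With the defaults assumed here, `wait_for_backup "$BACKUP_NAME"` checks every 15 seconds for up to 30 minutes; adjust `MAX_ATTEMPTS` and `POLL_SECONDS` to suit the size of the cluster being backed up.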