Upgrade Prerequisites


Pre-Upgrade Checklist

Before upgrading Wallaroo, complete the following steps to ensure a smooth transition from the previous version of Wallaroo to the new one.

  • Create Support Bundle
  • Notify Users of Downtime
  • Backup Wallaroo

Create Support Bundle

A support bundle is a collection of logs, configurations, and other information for Wallaroo support staff. Generate one before the upgrade procedure starts to preserve the current settings and other information useful for tracking any issues that arise during the upgrade process.

Support bundles are generated with one of two methods: the Wallaroo Administrator Dashboard or the command line.

At any time, the administration console can create troubleshooting bundles for Wallaroo technical support to assess product health and help with problems. Support bundles contain logs and configuration files which can be examined before downloading and transmitting to Wallaroo. The console also has a configurable redaction mechanism for cases where sensitive information such as passwords, tokens, or PII (Personally Identifiable Information) needs to be removed from logs in the bundle.

Create Support Bundles via the Wallaroo Administrator Dashboard

This process is for kots-based installations of Wallaroo.

These steps assume that kubectl and kots are installed and available in a terminal with administrative access to the Kubernetes cluster hosting the Wallaroo installation.

  1. Launch the Kots Administrative Dashboard with kubectl kots admin-console --namespace $WALLAROO_NAMESPACE, replacing $WALLAROO_NAMESPACE with the namespace the Wallaroo instance is installed in. For example: kubectl kots admin-console --namespace wallaroo.
  2. Log into the administration console with the Administrative Dashboard password set during the installation process.
  3. Select the Troubleshoot tab.
  4. Select Analyze Wallaroo.
  5. Select Download bundle to save the bundle file as a compressed archive. Depending on your browser settings, you may be able to choose the download location.
  6. Send the file to Wallaroo technical support.

At any time, any existing bundle can be examined and downloaded from the Troubleshoot tab.

Create Support Bundles via the Command Line

To generate a support bundle via the command line for either kots-based or helm-based installations of Wallaroo, the following applications are required.

  • kubectl
  • kubectl plugins:
    • krew: Install Krew
    • krew support-bundle: Install with kubectl krew install support-bundle.

The following command collects log files, configuration files, and other details into a .tar.gz file in the directory the command is run from. The file is named in the format support-bundle-YYYY-MM-DDTHH-MM-SS.tar.gz. Submit this file to the Wallaroo support team for review.

    kubectl support-bundle --load-cluster-specs --interactive=false
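When several bundles have accumulated in the working directory, it helps to pick out the most recent one before sending it. The following is a minimal sketch, not part of kubectl or kots: latest_bundle is a hypothetical helper name, and it relies only on the support-bundle-YYYY-MM-DDTHH-MM-SS.tar.gz naming format described above.

```shell
# Sketch: print the path of the newest support bundle in a directory.
# latest_bundle is a hypothetical helper, not a kubectl or kots command.
latest_bundle() {
    local dir="${1:-.}"
    local newest="" f
    # The YYYY-MM-DDTHH-MM-SS timestamp is fixed width, so alphabetical
    # order matches chronological order. The shell sorts glob matches
    # alphabetically, so the last match is the newest bundle.
    for f in "$dir"/support-bundle-*.tar.gz; do
        [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
        newest="$f"
    done
    [ -n "$newest" ] || return 1  # no bundle found
    printf '%s\n' "$newest"
}
```

For example, tar -tzf "$(latest_bundle .)" lists the contents of the newest bundle so it can be reviewed before it is sent to Wallaroo support.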

Notify Users of Downtime

The following is a short list of users to notify before the Wallaroo Ops downtime. All users that interact with Wallaroo Ops should be informed; this list helps DevOps engineers identify which stakeholders to notify.

  • Wallaroo Dashboard Users
    • Description: Users that interact via the Wallaroo Dashboard.
    • What users will experience:
      • If users are active in the dashboard:
      • If a user attempts to go to the dashboard in their browser:
  • API users
    • Description: These users interact with Wallaroo via the Wallaroo MLOps API or related API services.
    • What users will experience: MLOps API and other API services to the Wallaroo Ops instance will not be available during the upgrade, and any requests will return a 503 (service unavailable) with the message “Wallaroo upgrade in progress.”
  • External SDK users
    • Description: Users who perform actions via the Wallaroo SDK that do not use the Wallaroo JupyterHub service.
    • What users will experience: Wallaroo SDK connections will not be available during the upgrade, and any attempts to use the SDK will return an error.
  • Wallaroo JupyterHub users
    • Description: Users who use the Wallaroo JupyterHub Service to run Jupyter Notebooks with the Wallaroo Ops instance.
    • What users will experience:
      • If users are in JupyterHub at the time of the upgrade:
      • If users attempt to go to JupyterHub during the upgrade:
  • Deployed pipeline
    • Description: Users who perform inferences through deployed pipelines.
    • What users will experience: Deployed pipelines are undeployed for the duration of the upgrade. Once the upgrade process is complete, any previously deployed pipelines are automatically redeployed. Any inference requests to deployed pipelines during the upgrade will return a 503 (service unavailable) with the message “Wallaroo upgrade in progress.”
  • Edge and Multi-cloud Deployment Users
    • Description: Users and services that perform inference requests and other services via models deployed to multicloud and edge locations.
    • What users will experience: Edge and multicloud deployments of Wallaroo are not interrupted. While an upgrade is in progress, no logs can be received by the Wallaroo Ops instance. Once connection is restored, edge locations upload their inference logs.
  • Wallaroo Orchestration Users
    • Description: Scheduled Wallaroo orchestrations.
    • What users will experience: Scheduled Wallaroo orchestration tasks will be interrupted during the upgrade process. Scheduled task runs will be missed while the upgrade is in progress and will run at their next scheduled time after the upgrade completes.
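Stakeholders who call the API or deployed pipelines from scripts may want their clients to wait out the 503 responses described above rather than fail. The following is a minimal sketch under stated assumptions, not an official Wallaroo tool: wait_for_service and probe_http are hypothetical helper names, and WALLAROO_URL is a placeholder for whichever endpoint the client normally calls.

```shell
# Sketch: retry while a service answers 503 (service unavailable).
# wait_for_service PROBE [MAX_ATTEMPTS] [DELAY_SECONDS]
# PROBE is any command or function that prints an HTTP status code.
# Both helper names are hypothetical.
wait_for_service() {
    local probe="$1" max="${2:-30}" delay="${3:-10}"
    local attempt=1 code
    while [ "$attempt" -le "$max" ]; do
        code="$($probe)"
        # Any status other than 503 (including curl's 000 for a failed
        # connection) is printed for the caller to interpret.
        if [ "$code" != "503" ]; then
            printf '%s\n' "$code"
            return 0
        fi
        sleep "$delay"
        attempt=$((attempt + 1))
    done
    return 1   # still 503 after MAX_ATTEMPTS probes
}

# Example probe: prints only the HTTP status code of a GET request.
# WALLAROO_URL is an assumed placeholder and must be set by the caller.
probe_http() {
    curl -s -o /dev/null -w '%{http_code}' "${WALLAROO_URL:?set WALLAROO_URL first}"
}
```

For example, wait_for_service probe_http 60 30 probes every 30 seconds for up to 60 attempts and prints the first status code that is not 503.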

Backup Wallaroo

Before starting the upgrade procedure, back up the Wallaroo Ops instance. The following procedure summary is based on the provided Wallaroo Backup and Restore Guides.

Wallaroo Backup Procedure

  1. Before starting the backup, force the Plateau service to complete writing logs so they can be captured by the backup. This assumes that Wallaroo was installed in the namespace wallaroo.

    kubectl -n wallaroo scale --replicas=0 deploy/plateau
    kubectl -n wallaroo scale --replicas=1 deploy/plateau
    
  2. Set the $BACKUP_NAME variable. The name must contain only lowercase letters, numbers, or -/. and must end with an alphanumeric character.

    BACKUP_NAME=sample-doctest-backup-20240502  # replace with your own backup name
    
  3. Issue the following backup command. The --exclude-namespaces option excludes namespaces that are not required for the Wallaroo backup and restore. By default, these are the namespaces velero, default, kube-node-lease, kube-public, and kube-system.

    This process will back up all namespaces that are not excluded, including deployed Wallaroo pipelines. Add any other namespaces that should not be part of the backup to the --exclude-namespaces option as per your organization’s requirements.

    velero backup create $BACKUP_NAME --default-volumes-to-fs-backup --include-cluster-resources=true --exclude-namespaces velero,default,kube-node-lease,kube-public,kube-system
    
  4. To view the status of the backup, run velero backup describe --details $BACKUP_NAME. Once the Completed field shows a date and time, the backup is complete.

    The following shows an InProgress backup:

    velero backup describe --details $BACKUP_NAME
    
    Name:         sample-doctest-backup-20240502
    Namespace:    velero
    Labels:       velero.io/storage-location=default
    Annotations:  velero.io/resource-timeout=10m0s
                  velero.io/source-cluster-k8s-gitversion=v1.28.7-gke.1026000
                  velero.io/source-cluster-k8s-major-version=1
                  velero.io/source-cluster-k8s-minor-version=28
    
    Phase:  InProgress
    
    
    Namespaces:
      Included:  *
      Excluded:  velero, default, kube-node-lease, kube-public, kube-system
    
    Resources:
      Included:        *
      Excluded:        <none>
      Cluster-scoped:  included
    
    Label selector:  <none>
    
    Or label selector:  <none>
    
    Storage Location:  default
    
    Velero-Native Snapshot PVs:  auto
    Snapshot Move Data:          false
    Data Mover:                  velero
    
    TTL:  720h0m0s
    
    CSISnapshotTimeout:    10m0s
    ItemOperationTimeout:  4h0m0s
    
    Hooks:  <none>
    
    Backup Format Version:  1.1.0
    
    Started:    2024-05-14 16:26:43 -0600 MDT
    Completed:  <n/a>
    
    Expiration:  2024-06-13 16:26:43 -0600 MDT
    
    Estimated total items to be backed up:  1073
    Items backed up so far:                 28
    
    Resource List:  <backup resource list not found>
    
    Backup Volumes:
      Velero-Native Snapshots: <none included>
    
      CSI Snapshots: <none included or not detectable>
    
      Pod Volume Backups - kopia:
        Completed:
          gmp-system/alertmanager-0: alertmanager-config, alertmanager-data
          gmp-system/collector-cdsm4: config-out, storage
          gmp-system/collector-fslhc: config-out, storage
          gmp-system/collector-p6f85: config-out, storage
          gmp-system/collector-q4djj: config-out, storage
          gmp-system/rule-evaluator-7874c6f478-672vs: config-out
          wallaroo/hub-65c45d4c7-nb9lp: pvc
          wallaroo/kotsadm-b4f68468d-dzj5c: backup, tmp
          wallaroo/kotsadm-minio-0: kotsadm-minio, minio-cert-dir, minio-config-dir
          wallaroo/kotsadm-rqlite-0: kotsadm-rqlite, tmp
        In Progress:
          wallaroo/minio-cf97d78cb-pv82x: export
    

    The following shows a Completed backup:

    velero backup describe --details $BACKUP_NAME
    
    Name:         sample-doctest-backup-20240502
    Namespace:    velero
    Labels:       velero.io/storage-location=default
    Annotations:  velero.io/resource-timeout=10m0s
                  velero.io/source-cluster-k8s-gitversion=v1.28.7-gke.1026000
                  velero.io/source-cluster-k8s-major-version=1
                  velero.io/source-cluster-k8s-minor-version=28
    
    Phase:  Completed
    
    
    Warnings:
      Velero:     <none>
      Cluster:    <none>
      Namespaces:
        wallaroo:   resource: /pods name: /kotsadm-b4f68468d-dzj5c message: /volume migrations is declared in pod wallaroo/kotsadm-b4f68468d-dzj5c but not mounted by any container, skipping
    
    Namespaces:
      Included:  *
      Excluded:  velero, default, kube-node-lease, kube-public, kube-system
    
    Resources:
      Included:        *
      Excluded:        <none>
      Cluster-scoped:  included
    
    Label selector:  <none>
    
    Or label selector:  <none>
    
    Storage Location:  default
    
    Velero-Native Snapshot PVs:  auto
    Snapshot Move Data:          false
    Data Mover:                  velero
    
    TTL:  720h0m0s
    
    CSISnapshotTimeout:    10m0s
    ItemOperationTimeout:  4h0m0s
    
    Hooks:  <none>
    
    Backup Format Version:  1.1.0
    
    Started:    2024-05-14 16:26:43 -0600 MDT
    Completed:  2024-05-14 16:32:19 -0600 MDT
    
    Expiration:  2024-06-13 16:26:43 -0600 MDT
    
    Total items to be backed up:  719
    Items backed up:              719
    
    Resource List:
      admissionregistration.k8s.io/v1/MutatingWebhookConfiguration:
        - gmp-operator.gmp-system.monitoring.googleapis.com
        - neg-annotation.config.common-webhooks.networking.gke.io
        - pod-ready.config.common-webhooks.networking.gke.io
        - warden-mutating.config.common-webhooks.networking.gke.io
    
      ...Other backed up resources
    
      warden.gke.io/v1/Audit:
        - autogke-default-linux-capabilities
        - autogke-disallow-hostnamespaces
        - autogke-disallow-privilege
        - autogke-no-host-port
        - autogke-no-write-mode-hostpath
        - autogke-node-affinity-selector-limitation
        - autogke-pod-affinity-limitation
        - autopilot-admission-webhook-config-limitation
        - autopilot-capacity-request-limitation
        - autopilot-external-ip-limitation
        - autopilot-no-ephemeral-containers
        - autopilot-persistent-volume-limitation
        - autopilot-volume-type-limitation
    
    Backup Volumes:
      Velero-Native Snapshots: <none included>
    
      CSI Snapshots: <none included>
    
      Pod Volume Backups - kopia:
        Completed:
          gmp-system/alertmanager-0: alertmanager-config, alertmanager-data
          gmp-system/collector-cdsm4: config-out, storage
          gmp-system/collector-fslhc: config-out, storage
          gmp-system/collector-p6f85: config-out, storage
          gmp-system/collector-q4djj: config-out, storage
          gmp-system/rule-evaluator-7874c6f478-672vs: config-out
          wallaroo/hub-65c45d4c7-nb9lp: pvc
          wallaroo/kotsadm-b4f68468d-dzj5c: backup, tmp
          wallaroo/kotsadm-minio-0: kotsadm-minio, minio-cert-dir, minio-config-dir
          wallaroo/kotsadm-rqlite-0: kotsadm-rqlite, tmp
          wallaroo/minio-cf97d78cb-pv82x: export
          wallaroo/nats-0: nats-js, pid
          wallaroo/plateau-7dfbd89655-9xz6v: plateau-storage
          wallaroo/postgres-74d6948c48-mjmb5: postgres-storage
          wallaroo/prometheus-deployment-666d968bfd-cxp46: alert-config-volume, metrics-storage-volume
          wallaroo/wallsvc-0: socket-volume, spire-data
    
    HooksAttempted:  1
    HooksFailed:     0
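Steps 2 and 4 of the backup procedure can be scripted so that the upgrade does not start before the backup finishes. The following is a minimal sketch under stated assumptions: valid_backup_name, backup_phase, and wait_for_backup are hypothetical helper names. The name check reads the "-/." in step 2 as the two characters - and ., so it rejects names containing /, which is the stricter reading. Treating any phase that contains "Failed" as a failure is also an assumption; the outputs above only show InProgress and Completed.

```shell
# Sketch: validate the backup name and wait for the backup to finish.
# All three helper names are hypothetical.

# valid_backup_name NAME
# Succeeds when NAME has only lowercase letters, digits, - or . and
# ends with a lowercase letter or digit (assumed reading of step 2).
valid_backup_name() {
    printf '%s' "$1" | LC_ALL=C grep -Eq '^[a-z0-9.-]*[a-z0-9]$'
}

# backup_phase
# Reads `velero backup describe` output on stdin and prints the value
# of the first "Phase:" line, for example InProgress or Completed.
backup_phase() {
    awk '$1 == "Phase:" { print $2; exit }'
}

# wait_for_backup NAME [DELAY_SECONDS]
# Polls the backup until its phase is Completed (returns 0), or until
# the phase contains "Failed" or cannot be read (returns 1).
wait_for_backup() {
    local name="$1" delay="${2:-15}" phase
    while :; do
        phase="$(velero backup describe --details "$name" | backup_phase)"
        case "$phase" in
            Completed) return 0 ;;
            *Failed*|"")
                printf 'backup %s: phase %s\n' "$name" "${phase:-unknown}" >&2
                return 1 ;;
        esac
        sleep "$delay"
    done
}
```

For example, valid_backup_name "$BACKUP_NAME" && wait_for_backup "$BACKUP_NAME" checks the name first and then blocks until the backup reports Completed.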