Manage JupyterHub Storage

How to manage JupyterHub Storage in Wallaroo

Targeted Role
Dev Ops

Organizations can manage the storage space for their JupyterHub service through the Wallaroo Administrative Dashboard, combined with administrative access to the Kubernetes cluster hosting the Wallaroo instance through the kubectl command line tool.

This process increases the memory and storage space available to each Wallaroo user who accesses the JupyterHub service included with Wallaroo. This helps when working with very large models of 4 GB or more.

Prerequisites

  • A Wallaroo instance installed via kots. See the Wallaroo Installation Guides for more details.
  • Administrative access to the Kubernetes cluster through the kubectl command with the kots plugin.

Increase Memory For All Users

To increase the amount of memory available for each user in the Wallaroo JupyterHub Service:

Disable JupyterHub Service

  1. Through kubectl, launch the Wallaroo Administrative Dashboard with the following:

    kubectl kots admin-console --namespace {Wallaroo Installed Namespace}
    

    Replace {Wallaroo Installed Namespace} with the Kubernetes namespace the Wallaroo instance is installed in. By default, this is wallaroo. For example:

    kubectl kots admin-console --namespace wallaroo
    

    By default this launches the Wallaroo Administrative Dashboard at http://localhost:8800 and prints the following:

    • Press Ctrl+C to exit
    • Go to http://localhost:8800 to access the Admin Console
    

    Launch a browser and access the Wallaroo Administrative Dashboard at the URL shown.

  2. From the top navigation panel, select Config and scroll to Data Science Workspaces. Set Choose Environment to None.

  3. Scroll to the bottom of the Config page and select Save Config.

  4. Once the configuration is saved, select Go to updated version. The new configuration is at the top; select Deploy.

Delete Existing Labs

With the JupyterHub service disabled, the next step is to remove any existing labs so they can be recreated with the new memory specifications later. This is done through the kubectl tool.

  1. List all current labs with kubectl -n {Wallaroo Installed Namespace} get pods | grep jup. For example, if Wallaroo is installed in the default namespace wallaroo:

    kubectl -n wallaroo get pods | grep jup
    jupyter-ankush-2egarg-40wallaroo-2eai    1/1     Running     0               3d20h
    jupyter-john-2ehummel-40wallaroo-2eai    1/1     Running     0               4d20h
    
  2. For each lab, delete it with kubectl -n {Wallaroo Installed Namespace} delete pod/{Pod Name}. For example, if Wallaroo is installed in the default namespace wallaroo:

    kubectl -n wallaroo delete pod/jupyter-ankush-2egarg-40wallaroo-2eai
    
  3. When all pods are deleted, proceed to the next step.
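Deleting labs one at a time can be tedious when many users have labs. The following is a minimal bash sketch, assuming that every lab pod name starts with jupyter- as in the example listing above; delete_all_labs is a hypothetical helper name, not part of Wallaroo. By default it only prints the pods it would delete, so the list can be reviewed before anything is removed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete all JupyterHub lab pods in a namespace.
# Assumption: lab pods are the pods whose names start with "jupyter-".
set -euo pipefail

delete_all_labs() {
  local ns="${1:-wallaroo}"   # namespace Wallaroo is installed in
  local mode="${2:-}"         # pass --yes to actually delete
  local all pods pod
  # "-o name" prints one "pod/<name>" entry per line.
  all=$(kubectl -n "$ns" get pods -o name) || return 1
  pods=$(printf '%s\n' "$all" | grep '^pod/jupyter-' || true)
  if [[ -z "$pods" ]]; then
    echo "No lab pods found in namespace $ns"
    return 0
  fi
  for pod in $pods; do
    if [[ "$mode" == "--yes" ]]; then
      kubectl -n "$ns" delete "$pod"
    else
      echo "Would delete $pod (re-run with --yes to delete)"
    fi
  done
}
```

For example, delete_all_labs wallaroo lists the candidate lab pods, and delete_all_labs wallaroo --yes deletes them.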

Update Lab Memory Storage

With the labs deleted, update the lab memory settings. Reopen the Wallaroo Administrative Dashboard and complete the following steps.

  1. From the top navigation panel, select Config and scroll to Data Science Workspaces. Set Choose Environment to Workgroup Jupyter Hub.

  2. Set any of the following:

    1. Each Lab - Memory Limit in GB: Sets the amount of memory available to each lab. Typically this is the only one that needs updating.
    2. Each Lab - Memory guarantee in GB: Sets the minimum amount of memory reserved for each lab, whether or not the lab uses it.
  3. Scroll to the bottom of the Config page and select Save Config.

  4. Once the configuration is saved, select Go to updated version. The new configuration is at the top; select Deploy.

Once the deployment is complete, the memory limit for each lab is increased.
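Once a user's lab has been recreated with the new settings, the values applied to the lab pod can be inspected with kubectl. The sketch below assumes the Each Lab settings surface as the container's memory request (guarantee) and memory limit, which is how JupyterHub's KubeSpawner applies its mem_guarantee and mem_limit options; show_lab_memory is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each container's memory request and limit for a lab pod.
# Assumption: the "Each Lab" settings appear as resources.requests.memory
# (guarantee) and resources.limits.memory (limit) on the lab container.
show_lab_memory() {
  local ns="$1" pod="$2"
  kubectl -n "$ns" get pod "$pod" -o \
    jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.requests.memory}{"\t"}{.resources.limits.memory}{"\n"}{end}'
}
```

For example, show_lab_memory wallaroo jupyter-john-2ehummel-40wallaroo-2eai prints one line per container with its name, memory request, and memory limit.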

Increase Storage Space Per Lab

To increase the storage capacity of a specific lab, update the PersistentVolumeClaim (PVC) for the lab through the following steps. This requires administrative access to the Kubernetes cluster hosting the Wallaroo instance.

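  1. Verify that AllowVolumeExpansion is enabled for the storage class used by the lab PVCs. The storage class for each PVC is listed in step 2. For example, to check the wallaroo-standard storage class: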

    kubectl describe sc wallaroo-standard
    

    This returns a result like the following.

    Name:                  wallaroo-standard
    IsDefaultClass:        No
    Annotations:           kots.io/app-slug=wallaroo,meta.helm.sh/release-name=wallaroo,meta.helm.sh/release-namespace=wallaroo
    Provisioner:           pd.csi.storage.gke.io
    Parameters:            type=pd-balanced
    AllowVolumeExpansion:  True
    MountOptions:          <none>
    ReclaimPolicy:         Delete
    VolumeBindingMode:     WaitForFirstConsumer
    Events:                <none>
    
    1. If it is not enabled, enable it with the following command. NOTE: Not all cloud providers allow volume expansion; check with your cloud provider to verify. The following command opens the storage class in a text editor; the steps below assume the default vi editor.

      kubectl edit sc wallaroo-standard
      

      Press i to enter insert mode.

      Add the following line at the top, below any comment lines:

      allowVolumeExpansion: true
      

      It will resemble the following:

      # Please edit the object below. Lines beginning with a '#' will be ignored,
      # and an empty file will abort the edit. If an error occurs while saving this file will be
      # reopened with the relevant failures.
      #
      allowVolumeExpansion: true
      apiVersion: storage.k8s.io/v1
      kind: StorageClass
      

      Press Esc to exit insert mode.

      Type : to open the command prompt, then enter wq to write and quit.

  2. Find the PVC for the specific lab with the command kubectl get pvc --namespace {Wallaroo Installed Namespace} | grep claim. For example, if Wallaroo is installed in the default namespace wallaroo:

    kubectl get pvc --namespace wallaroo | grep claim
    
    claim-ankush-2egarg-40wallaroo-2eai       Bound    pvc-bf62479c-3b19-46e3-aaed-33d6f644e394   47Gi       RWO            standard-rwo        3d23h
    claim-john-2ehummel-40wallaroo-2eai       Bound    pvc-41238b6c-6941-4941-91be-0f7eaba71ea2   47Gi       RWO            standard-rwo        4d23h
    prometheus-alert-config-volume-pv-claim   Bound    pvc-9e80ac4c-20e1-4ba6-aa66-df5d2b13fe73   1Gi        RWO            standard-rwo        4d23h
    

    The PVCs that contain the user names are the labs. Note that -2e corresponds to the . character, while -40 corresponds to the @ character. So claim-john-2ehummel-40wallaroo-2eai is the lab claim for the user john.hummel@wallaroo.ai.

  3. Edit the PVC with the command kubectl edit pvc --namespace {Wallaroo Installed Namespace} {PVC Name}, replacing {Wallaroo Installed Namespace} with the Kubernetes namespace the Wallaroo instance is installed in, and {PVC Name} with the PVC to edit. For example, if Wallaroo is installed in the default namespace wallaroo:

    kubectl edit pvc --namespace wallaroo claim-john-2ehummel-40wallaroo-2eai
    
  4. Press i to enter insert mode. Update spec.resources.requests.storage with the new size. For example, the default setting is:

    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 50G
      storageClassName: standard-rwo
      volumeMode: Filesystem
    

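    To expand to 100G of storage, update storage accordingly. Note that G (gigabytes) and Gi (gibibytes) are different units: 50G is about 46.6Gi, which is consistent with the 47Gi capacity shown for the lab claims in step 2.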

    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 100G
      storageClassName: standard-rwo
      volumeMode: Filesystem
    
  5. Press Esc to exit insert mode. Type : to open the command prompt, then enter wq to write and quit. The PVC is updated.

  6. Verify the update with kubectl describe pvc --namespace {Wallaroo Installed Namespace}. For example, the default install namespace is wallaroo, so the command would be kubectl describe pvc --namespace wallaroo. If Status is Bound, the process is complete.
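The interactive vi edits in this procedure can also be scripted with kubectl patch. The following is a minimal sketch rather than official Wallaroo tooling: the helper names pvc_storage_class, enable_volume_expansion, expand_lab_pvc, and decode_lab_name are hypothetical, and decode_lab_name assumes the escaping seen in the example names, where a hyphen followed by two hexadecimal digits encodes one character (-2e for ., -40 for @).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for expanding a lab PVC without an interactive editor.
set -euo pipefail

# Print the storage class used by a lab PVC (the class to check in step 1).
pvc_storage_class() {
  local ns="$1" pvc="$2"
  kubectl -n "$ns" get pvc "$pvc" -o jsonpath='{.spec.storageClassName}'
}

# Alternative to editing the storage class in vi: set allowVolumeExpansion.
enable_volume_expansion() {
  local storage_class="$1"
  kubectl patch storageclass "$storage_class" -p '{"allowVolumeExpansion": true}'
}

# Alternative to steps 3 to 5: set spec.resources.requests.storage directly.
expand_lab_pvc() {
  local ns="$1" pvc="$2" size="$3"   # size such as 100G
  kubectl -n "$ns" patch pvc "$pvc" \
    -p "{\"spec\":{\"resources\":{\"requests\":{\"storage\":\"$size\"}}}}"
}

# Decode an escaped lab pod or claim name back to the user's email address.
# Assumption: "-" followed by two hex digits encodes a single character.
decode_lab_name() {
  local name="$1" out="" re='^-([0-9a-fA-F]{2})(.*)$'
  name="${name#claim-}"     # strip the PVC prefix, if present
  name="${name#jupyter-}"   # strip the pod prefix, if present
  while [[ -n "$name" ]]; do
    if [[ "$name" =~ $re ]]; then
      out+=$(printf '%b' "\\x${BASH_REMATCH[1]}")   # for example 2e becomes "."
      name="${BASH_REMATCH[2]}"
    else
      out+="${name:0:1}"
      name="${name:1}"
    fi
  done
  printf '%s\n' "$out"
}
```

For example, decode_lab_name claim-john-2ehummel-40wallaroo-2eai prints john.hummel@wallaroo.ai, pvc_storage_class wallaroo claim-john-2ehummel-40wallaroo-2eai prints the storage class to check, and expand_lab_pvc wallaroo claim-john-2ehummel-40wallaroo-2eai 100G requests the same change as the manual edit.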

Troubleshooting

If the PersistentVolumeClaim (PVC) status still shows Waiting, check the PersistentVolume (PV) settings and verify that its storage settings match the PVC.
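As a starting point for that check, the following sketch prints what a lab PVC requests next to the capacity of the PV bound to it. compare_pvc_and_pv is a hypothetical helper name; it only reads values, and the Events section of kubectl describe pvc remains the place to look for resize errors.

```shell
#!/usr/bin/env bash
# Hypothetical troubleshooting helper: show a PVC's request beside its PV's capacity.
set -euo pipefail

compare_pvc_and_pv() {
  local ns="$1" pvc="$2" phase requested pv capacity
  phase=$(kubectl -n "$ns" get pvc "$pvc" -o jsonpath='{.status.phase}')
  requested=$(kubectl -n "$ns" get pvc "$pvc" -o jsonpath='{.spec.resources.requests.storage}')
  pv=$(kubectl -n "$ns" get pvc "$pvc" -o jsonpath='{.spec.volumeName}')
  # PVs are cluster-scoped, so this lookup does not take a namespace.
  capacity=$(kubectl get pv "$pv" -o jsonpath='{.spec.capacity.storage}')
  echo "PVC $pvc: phase $phase, requests $requested"
  echo "PV $pv: capacity $capacity"
}
```

For example, compare_pvc_and_pv wallaroo claim-john-2ehummel-40wallaroo-2eai. The request and capacity may be reported in different units (G versus Gi), so compare the quantities rather than the strings.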