Wallaroo MLOps API Essentials Guide: Wallaroo Dashboard Metrics Retrieval

How to use the Wallaroo MLOps API for metrics retrieval.

Wallaroo Dashboard Metrics Retrieval Tutorial

The following tutorial demonstrates using the Wallaroo MLOps API to retrieve Wallaroo metrics data. These requests are compliant with Prometheus API endpoints.

This tutorial is split into two sections:

  • Inference Data Generation: This section creates Wallaroo pipeline and inference requests to generate the log files and other data.
  • Wallaroo Dashboard Metrics Retrieval via the Wallaroo MLOps API: Details the Wallaroo MLOps API metrics retrieval endpoints and provides a demonstration of retrieving metrics data.

Prerequisites

This tutorial assumes the following:

  • A Wallaroo Ops environment is installed.
  • The Wallaroo SDK is installed. These examples use the Wallaroo SDK to generate the initial inferences information for the metrics requests.

Wallaroo Dashboard Metrics Retrieval via the Wallaroo MLOps API

The Wallaroo MLOps API provides metrics retrieval endpoints. These metrics are used to track:

  • Inference result performance.
  • Deployed replicas.
  • Inference latency.

These metrics endpoints are compliant with the Prometheus HTTP API.

Query Metric Request Endpoints

  • Endpoints:
    • /v1/api/metrics/query (GET)
    • /v1/api/metrics/query (POST)

For full details, see the Wallaroo MLOps API Reference Guide

Query Metric Request Parameters

  • query (String): The Prometheus expression query string.
  • time (String): The evaluation timestamp in either RFC3339 format or Unix timestamp.
  • timeout (String): The evaluation timeout in duration format (5m for 5 minutes, etc).

Query Metric Request Returns

  • status (String): The status of the request: either success or error.
  • data (Dict): The response data.
    • data.resultType (String): The type of query result.
    • data.result (String): The query result data.
  • errorType (String): The error type if status is error.
  • error (String): The error messages if status is error.
  • warnings (Array[String]): An array of warning messages.
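
For orientation, the following is a minimal sketch of an instant metrics query using the endpoint and parameters listed above. It assumes an authenticated Wallaroo client named wl (created in the Inference Data Generation section below); the query expression is illustrative only.

import datetime

import requests
import wallaroo

# connect to the Wallaroo instance; see the SDK steps later in this tutorial
wl = wallaroo.Client()

# instant query endpoint listed above
query_url = f"{wl.api_endpoint}/v1/api/metrics/query"

# retrieve the authentication header
headers = wl.auth.auth_header()

params = {
    # the Prometheus expression query string (illustrative)
    'query': 'sum(rate(latency_histogram_ns_count[1m]))',
    # the evaluation timestamp in RFC3339 format
    'time': datetime.datetime.now(datetime.timezone.utc).isoformat(),
    # the evaluation timeout in duration format
    'timeout': '30s'
}

response = requests.get(query_url, headers=headers, params=params)
print(response.json())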

Query Range Metric Endpoints

  • Endpoints
    • /v1/api/metrics/query_range (GET)
    • /v1/api/metrics/query_range (POST)

Returns metric data for the given query evaluated over a range of time.

Query Range Metric Request Parameters

  • query (String): The Prometheus expression query string.
  • start (String): The starting timestamp in either RFC3339 format or Unix timestamp, inclusive.
  • end (String): The ending timestamp in either RFC3339 format or Unix timestamp.
  • step (String): Query resolution step width in either duration format or as a float number of seconds.
  • timeout (String): The evaluation timeout in duration format (5m for 5 minutes, etc).

Query Range Metric Request Returns

  • status (String): The status of the request: either success or error.
  • data (Dict): The response data.
    • data.resultType (String): The type of query result. For query range, always matrix.
    • data.result (String): The query result data.
  • errorType (String): The error type if status is error.
  • error (String): The error messages if status is error.
  • warnings (Array[String]): An array of warning messages.
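
The POST form of the endpoint accepts the same parameters. The following is a minimal sketch that assumes the parameters are sent as URL-encoded form data, following the Prometheus HTTP API convention; an authenticated Wallaroo client wl is assumed, and the query expression is illustrative only.

import datetime

import requests

# range query endpoint listed above
query_url = f"{wl.api_endpoint}/v1/api/metrics/query_range"
headers = wl.auth.auth_header()

# query the last hour of data at a 60 second resolution
end = datetime.datetime.now(datetime.timezone.utc)
start = end - datetime.timedelta(hours=1)

data = {
    'query': 'sum(rate(tensor_throughput_batch_count[1m]))',  # illustrative
    'start': int(start.timestamp()),  # inclusive starting Unix timestamp
    'end': int(end.timestamp()),      # ending Unix timestamp
    'step': '60s',                    # query resolution step width
    'timeout': '30s'
}

response = requests.post(query_url, headers=headers, data=data)
print(response.json())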

Inference Data Generation

This part of the tutorial generates the inference results used for the rest of the tutorial.

Import libraries

The first step is to import the libraries required.

import json
import numpy as np
import pandas as pd

import pytz
import datetime

import requests
from requests.auth import HTTPBasicAuth

import wallaroo

Connect to the Wallaroo Instance

A connection to Wallaroo is established via the Wallaroo client. The Python library is included in the Wallaroo install and available through the Jupyter Hub interface provided with your Wallaroo environment.

This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.

If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). For more information on Wallaroo Client settings, see the Client Connection guide.

wl = wallaroo.Client()

Create Workspace

Next create the Wallaroo workspace and set it as the default workspace for this session - from this point on, model uploads and other commands will default to this workspace.

The workspace id is stored for further use.

workspace = wl.get_workspace(name="metric-retrieval-tutorial", create_if_not_exist=True)
wl.set_current_workspace(workspace)
{'name': 'metric-retrieval-tutorial', 'id': 1713, 'archived': False, 'created_by': '7d603858-88e0-472e-8f71-e41094afd7ec', 'created_at': '2025-08-05T18:41:42.646046+00:00', 'models': [], 'pipelines': []}
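
If the workspace id is needed for later requests, it can be read from the returned workspace object. A minimal sketch, assuming the Wallaroo SDK Workspace object's id() accessor:

# store the workspace id for later use (assumes the Workspace id() accessor)
workspace_id = workspace.id()
workspace_id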

Upload Model

For this example, the model ccfraud.onnx is used. This credit card fraud model is trained to detect credit card fraud and returns a score from 0 to 1: the closer to 0, the less likely the transaction indicates fraud; the closer to 1, the more likely the transaction indicates fraud.

This model is included in the Wallaroo Native Runtimes, so it requires no additional settings at model upload. For more details on supported models, see Wallaroo Supported Models.

model_name = "ccfraud-model"
model_file_name = "./models/ccfraud.onnx"
ccfraud_model = (wl.upload_model(name=model_name, 
                                 path=model_file_name, 
                                 framework=wallaroo.framework.Framework.ONNX)
                                 .configure(tensor_fields=["tensor"])
                )

Deploy Model

Models are deployed through the following process:

  • Create a Wallaroo pipeline
  • Add the model as a pipeline step
  • Define a deployment configuration. This defines what resources are allocated for the pipeline’s exclusive use.

For more details of this process, see ML Operations: Inference

# create the pipeline
pipeline_name = "metrics-retrieval-tutorial-pipeline"
pipeline = wl.build_pipeline(pipeline_name)

# add the model as a pipeline step

pipeline.add_model_step(ccfraud_model)

# set the deployment configuration for 0.5 cpu, 1 replica, 1 Gi RAM
deploy_config = wallaroo.DeploymentConfigBuilder().replica_count(1).cpus(0.5).memory("1Gi").build()

# deploy the pipeline
pipeline.deploy(deployment_config=deploy_config, wait_for_status=False)
# saved for later steps
deploy = pipeline._deployment
# wait until deployment is complete before continuing
import time
time.sleep(15)

while pipeline.status()['status'] != 'Running':
    time.sleep(15)
    print("Waiting for deployment.")
pipeline.status()['status']
Waiting for deployment.
'Running'

Sample Inferences

The following sample inferences are used to generate inference log records. Metric retrieval works best with a longer history of inference results; rerun this section as needed to create additional records for further testing.

The following will run for one minute.

import time
timeout = time.time() + 60   # 1 minute from now
while True:
    if time.time() > timeout:
        break
    pipeline.infer_from_file("./data/cc_data_10k.arrow")

Retrieve Pipeline Logs

The following retrieves the inference log results for the pipeline.

pipeline.logs()
Warning: There are more logs available. Please set a larger limit or request a file using export_logs.
     time                      in.tensor                                            out.dense_1       anomaly.count
0    2025-08-05 18:44:22.769   [-0.12405868, 0.73698884, 1.0311689, 0.5991753...    [0.0010648072]    0
1    2025-08-05 18:44:22.769   [-2.1694233, -3.1647356, 1.2038506, -0.2649221...    [0.00024175644]   0
2    2025-08-05 18:44:22.769   [-0.24798988, 0.40499672, 0.49408177, -0.37252...    [0.00150159]      0
3    2025-08-05 18:44:22.769   [-0.2260837, 0.12802614, -0.8732004, -2.089788...    [0.00037947297]   0
4    2025-08-05 18:44:22.769   [-0.90164274, -0.50116056, 1.2045985, 0.407885...    [0.0001988411]    0
...  ...                       ...                                                   ...               ...
95   2025-08-05 18:44:22.769   [-0.1093998, -0.031678658, 0.9885652, -0.68602...    [0.00020942092]   0
96   2025-08-05 18:44:22.769   [0.44973943, -0.35288164, 0.5224735, 0.910402,...    [0.00031492114]   0
97   2025-08-05 18:44:22.769   [0.82174337, -0.50793207, -1.358988, 0.3713617...    [0.00081187487]   0
98   2025-08-05 18:44:22.769   [1.0252348, 0.37717652, -1.4182774, 0.7057443,...    [0.001860708]     0
99   2025-08-05 18:44:22.769   [-0.36498702, 0.11005125, 0.7734325, 1.0163404...    [0.00064843893]   0

100 rows × 4 columns
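
The warning above indicates that more log entries exist than the default display limit. A minimal sketch of requesting additional records, assuming the logs method accepts a limit parameter as the warning suggests:

# request a larger number of log records than the default display limit
# (assumes pipeline.logs accepts a limit parameter, per the warning above)
logs_df = pipeline.logs(limit=1000)
len(logs_df)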

Wallaroo Dashboard Metrics Retrieval via the Wallaroo MLOps API

The following queries are supported through the Metrics endpoints. The following references are used here:

  • pipelineID: The pipeline’s string identifier, retrieved from the Wallaroo SDK with wallaroo.pipeline.Pipeline.name(). For example:

    pipeline.name()
    
    sample-pipeline-name
    
  • deployment_id: The Kubernetes namespace for the deployment.
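
In the example queries that follow, the deployment's Kubernetes namespace is composed from the pipeline name and the numeric deployment id. A minimal sketch of that construction (the values shown are illustrative):

# compose the deployment's Kubernetes namespace from the pipeline name and
# deployment id, as used in the example queries below (illustrative values)
pipeline_name = "sample-pipeline-name"
deploy_id = 210
deployment_namespace = f"{pipeline_name}-{deploy_id}"
print(deployment_namespace)  # sample-pipeline-name-210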

Supported Queries

Note that each of these queries uses the /v1/metrics/api/v1/query_range endpoint.

  • Requests per second: Number of processed requests per second to a pipeline.
    • Parameterized query: sum by (pipeline_name) (rate(latency_histogram_ns_count{pipeline_name="{pipelineID}"}[{step}s]))
    • Example query: sum by (deploy_id) (rate(latency_histogram_ns_count{deploy_id="deployment_id"}[10s]))
  • Cluster inference rate: Number of inferences processed per second. This notably differs from requests per second when batch inference requests are made.
    • Parameterized query: sum by (pipeline_name) (rate(tensor_throughput_batch_count{pipeline_name="{pipelineID}"}[{step}s]))
    • Example query: sum by (deploy_id) (rate(tensor_throughput_batch_count{deploy_id="deployment_id"}[10s]))
  • P50 inference latency: Histogram for P50 total inference time spent per message in an engine; includes transport to and from the sidekick when there is one.
    • Parameterized query: histogram_quantile(0.50, sum(rate(latency_histogram_ns_bucket{{deploy_id="{deploy_id}"}}[{step_interval}])) by (le)) / 1e6
    • Example query: histogram_quantile(0.50, sum(rate(latency_histogram_ns_bucket{deploy_id="deployment_id"}[10s])) by (le)) / 1e6
  • P95 inference latency: Histogram for P95 total inference time spent per message in an engine; includes transport to and from the sidekick when there is one.
    • Parameterized query: histogram_quantile(0.95, sum(rate(latency_histogram_ns_bucket{{deploy_id="{deploy_id}"}}[{step_interval}])) by (le)) / 1e6
    • Example query: histogram_quantile(0.95, sum(rate(latency_histogram_ns_bucket{deploy_id="deployment_id"}[10s])) by (le)) / 1e6
  • P99 inference latency: Histogram for P99 total inference time spent per message in an engine; includes transport to and from the sidekick when there is one.
    • Parameterized query: histogram_quantile(0.99, sum(rate(latency_histogram_ns_bucket{{deploy_id="{deploy_id}"}}[{step_interval}])) by (le)) / 1e6
    • Example query: histogram_quantile(0.99, sum(rate(latency_histogram_ns_bucket{deploy_id="deployment_id"}[10s])) by (le)) / 1e6
  • Engine replica count: Number of engine replicas currently running in a pipeline.
    • Parameterized query: count(container_memory_usage_bytes{namespace="{pipeline_namespace}", container="engine"}) or vector(0)
    • Example query: count(container_memory_usage_bytes{namespace="deployment_id", container="engine"}) or vector(0)
  • Sidekick replica count: Number of sidekick replicas currently running in a pipeline.
    • Parameterized query: count(container_memory_usage_bytes{namespace="{pipeline_namespace}", container=~"engine-sidekick-.*"}) or vector(0)
    • Example query: count(container_memory_usage_bytes{namespace="deployment_id", container=~"engine-sidekick-.*"}) or vector(0)
  • Output tokens per second (TPS): LLM output tokens per second; the number of tokens generated per second for an LLM deployed in Wallaroo with vLLM.
    • Parameterized query: sum by (kubernetes_namespace) (rate(vllm:generation_tokens_total{kubernetes_namespace="{pipeline_namespace}"}[{step_interval}]))
    • Example query: sum by (kubernetes_namespace) (rate(vllm:generation_tokens_total{kubernetes_namespace="deployment_id"}[10s]))
  • P99 Time to first token (TTFT): P99 time to generate the first token for LLMs deployed in Wallaroo with vLLM.
    • Parameterized query: histogram_quantile(0.99, sum(rate(vllm:time_to_first_token_seconds_bucket{kubernetes_namespace="{pipeline_namespace}"}[{step_interval}])) by (le)) * 1000
    • Example query: histogram_quantile(0.99, sum(rate(vllm:time_to_first_token_seconds_bucket{kubernetes_namespace="deployment_id"}[10s])) by (le)) * 1000
  • P95 Time to first token (TTFT): P95 time to generate the first token for LLMs deployed in Wallaroo with vLLM.
    • Parameterized query: histogram_quantile(0.95, sum(rate(vllm:time_to_first_token_seconds_bucket{kubernetes_namespace="{pipeline_namespace}"}[{step_interval}])) by (le)) * 1000
    • Example query: histogram_quantile(0.95, sum(rate(vllm:time_to_first_token_seconds_bucket{kubernetes_namespace="deployment_id"}[10s])) by (le)) * 1000
  • P50 Time to first token (TTFT): P50 time to generate the first token for LLMs deployed in Wallaroo with vLLM.
    • Parameterized query: histogram_quantile(0.50, sum(rate(vllm:time_to_first_token_seconds_bucket{kubernetes_namespace="{pipeline_namespace}"}[{step_interval}])) by (le)) * 1000
    • Example query: histogram_quantile(0.50, sum(rate(vllm:time_to_first_token_seconds_bucket{kubernetes_namespace="deployment_id"}[10s])) by (le)) * 1000

TTFT Query Example

The following example uses the P99 Time to first token (TTFT) query for an LLM deployed with OpenAI Compatibility in Wallaroo.


# this will also format the timezone in the parsing section
timezone = "US/Central"

selected_timezone = pytz.timezone(timezone)

# Define the start and end times of 10:00 to 10:15
data_start = selected_timezone.localize(datetime.datetime(2025, 7, 14, 10, 0, 0))
data_end = selected_timezone.localize(datetime.datetime(2025, 7, 14, 10, 15, 00))

# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query_range"

# Retrieve the token 
headers = wl.auth.auth_header()

# Convert to UTC and get the Unix timestamps
start_timestamp = int(data_start.astimezone(pytz.UTC).timestamp())
end_timestamp = int(data_end.astimezone(pytz.UTC).timestamp())    

pipeline_name = "llama-3-1-8b-pipeline" # the name of the pipeline
deploy_id = 210 # the deployment id
step = "5m" # the step of the calculation

query_ttft = f'histogram_quantile(0.99, sum(rate(vllm:time_to_first_token_seconds_bucket{{kubernetes_namespace="{pipeline_name}-{deploy_id}"}}[{step}])) by (le)) * 1000'
print(query_ttft)

#request parameters
params_ttft = {
    'query': query_ttft,
    'start': start_timestamp,
    'end': end_timestamp,
    'step': step
}

response_ttft = requests.get(query_url, headers=headers, params=params_ttft)

if response_ttft.status_code == 200:
    result = response_ttft.json()
    print(result)
else:
    print("Failed to fetch TTFT data:", response_ttft.status_code, response_ttft.text)
histogram_quantile(0.99, sum(rate(vllm:time_to_first_token_seconds_bucket{kubernetes_namespace="llama-3-1-8b-pipeline-210"}[5m])) by (le)) * 1000
{'status': 'success', 'data': {'resultType': 'matrix', 'result': [{'metric': {}, 'values': [[1752505500, '48.45656000000012'], [1752505800, '39.800000000000004'], [1752506100, 'NaN']]}]}}
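
Requests Per Second Query Example

The following example uses the Requests per second query from the table above to retrieve the request rate for the pipeline created earlier in this tutorial.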
# set prometheus requirements
pipeline_id = pipeline.name() # the name of the pipeline created earlier in this tutorial
step = "1m" # the step of the calculation

# this will also format the timezone in the parsing section
timezone = "US/Central"

selected_timezone = pytz.timezone(timezone)

# Define the start and end times
data_start = selected_timezone.localize(datetime.datetime(2025, 8, 4, 9, 0, 0))
data_end = selected_timezone.localize(datetime.datetime(2025, 8, 6, 9, 59, 59))

# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query_range"
# Retrieve the token 
headers = wl.auth.auth_header()

# Convert to UTC and get the Unix timestamps
start_timestamp = int(data_start.astimezone(pytz.UTC).timestamp())
end_timestamp = int(data_end.astimezone(pytz.UTC).timestamp())    

query_rps = f'sum by (pipeline_name) (rate(latency_histogram_ns_count{{pipeline_name="{pipeline_id}"}}[{step}]))'
#request parameters
params_rps = {
    'query': query_rps,
    'start': start_timestamp,
    'end': end_timestamp,
    'step': step
}

response_rps = requests.get(query_url, headers=headers, params=params_rps)

if response_rps.status_code == 200:
    print("Requests Per Second Data:")
    display(response_rps.json())
else:
    print("Failed to fetch RPS data:", response_rps.status_code, response_rps.text)
Requests Per Second Data:
{'status': 'success', 'data': {'resultType': 'matrix', 'result': []}}

The following shows the query inference rate.

query_inference_rate = f'sum by (pipeline_name) (rate(tensor_throughput_batch_count{{pipeline_name="{pipeline_id}"}}[{step}]))'

# inference rate request parameters
params_inference_rate = {
    'query': query_inference_rate,
    'start': start_timestamp,
    'end': end_timestamp,
    'step': step
}

response_inference_rate = requests.get(query_url, headers=headers, params=params_inference_rate)

if response_inference_rate.status_code == 200:
    print("Cluster Inference Rate Data:")
    display(response_inference_rate.json())
else:
    print("Failed to fetch Inference Rate data:", response_inference_rate.status_code, response_inference_rate.text)
Cluster Inference Rate Data:
{'status': 'success',
 'data': {'resultType': 'matrix',
  'result': [{'metric': {'pipeline_name': 'metrics-retrieval-tutorial-pipeline'},
    'values': [[1754419440, '6274.9353'],
     [1754419500, '4474.472727272727'],
     ...]}]}}
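
The comments above reference formatting the timezone in the parsing section. The following is a minimal sketch of parsing the query range matrix result into a pandas DataFrame, localizing the Unix timestamps to the timezone selected earlier; the field layout follows the query range return fields described above.

# parse the query_range matrix result into a pandas DataFrame, converting the
# Unix timestamps into the selected timezone
result = response_inference_rate.json()

rows = []
for series in result['data']['result']:
    pipeline_label = series['metric'].get('pipeline_name')
    for timestamp, value in series['values']:
        rows.append({
            'time': datetime.datetime.fromtimestamp(timestamp, tz=pytz.UTC).astimezone(selected_timezone),
            'pipeline_name': pipeline_label,
            'inferences_per_second': float(value)
        })

inference_rate_df = pd.DataFrame(rows)
display(inference_rate_df)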


Wallaroo Admin Dashboard Metrics Retrieval via the Wallaroo MLOps API

The following queries report resource consumption and are available through the Admin Dashboard. Note whether each request uses the query endpoint or the query_range endpoint. For examples of these queries, see the Wallaroo Repository.

Supported Queries

Note that each of these queries uses the /v1/metrics/api/v1/query endpoint.

  • Total CPU Requested: Number of CPUs requested in the Wallaroo cluster.
    • Example query: sum(wallaroo_kube_pod_resource_requests{resource="cpu"})
  • Total CPU Allocated: Total number of available CPUs in the Wallaroo cluster.
    • Example query: sum(kube_node_status_capacity{resource="cpu"})
  • Total GPU Requested: Number of GPUs requested in the Wallaroo cluster.
    • Example query: sum(wallaroo_kube_pod_resource_requests{resource=~"nvidia.com/gpu|qualcomm.com/qaic"})
  • Total GPU Allocated: Total number of available GPUs in the Wallaroo cluster.
    • Example query: sum(kube_node_status_capacity{resource=~"nvidia_com_gpu|qualcomm_com_qaic"})
  • Total Memory Requested: Amount of memory requested in the Wallaroo cluster.
    • Example query: sum(wallaroo_kube_pod_resource_requests{resource="memory"})
  • Total Memory Allocated: Total amount of memory available in the Wallaroo cluster.
    • Example query: sum(kube_node_status_capacity{resource="memory"})
  • Total Inference Log Storage used: Amount of inference log storage used.
    • Example query: kubelet_volume_stats_used_bytes{persistentvolumeclaim="plateau-managed-disk"}
  • Total Inference Log Storage allocated: Total amount of inference log storage available.
    • Example query: kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="plateau-managed-disk"}
  • Total Artifact Storage used: Amount of model and orchestration artifact storage used.
    • Example query: kubelet_volume_stats_used_bytes{persistentvolumeclaim="minio"}
  • Total Artifact Storage allocated: Total amount of model and orchestration artifact storage available.
    • Example query: kubelet_volume_stats_capacity_bytes{persistentvolumeclaim="minio"}
  • Average GPU usage over time: Average GPU usage over the defined time range in the Wallaroo cluster.
    • Example query: avg_over_time(sum(wallaroo_kube_pod_resource_requests{resource=~"nvidia.com/gpu|qualcomm.com/qaic"})[{duration}] {offset})
  • Average GPU requested over time: Average number of GPUs requested over the defined time range in the Wallaroo cluster.
    • Example query: avg_over_time(sum(wallaroo_kube_pod_resource_requests{resource=~"nvidia.com/gpu|qualcomm.com/qaic"})[{duration}] {offset})
  • Average CPU usage over time: Average CPU usage over the defined time range in the Wallaroo cluster.
    • Example query: avg_over_time(sum(wallaroo_kube_pod_resource_usage{resource="cpu"})[{duration}] {offset})
  • Average CPU requested over time: Average CPU requests over the defined time range in the Wallaroo cluster.
    • Example query: avg_over_time(sum(wallaroo_kube_pod_resource_requests{resource="cpu"})[{duration}] {offset})
  • Average Memory usage over time: Average memory usage over the defined time range in the Wallaroo cluster.
    • Example query: avg_over_time(sum(wallaroo_kube_pod_resource_usage{resource="memory"})[{duration}] {offset})
  • Average Memory requested over time: Average memory requests over the defined time range in the Wallaroo cluster.
    • Example query: avg_over_time(sum(wallaroo_kube_pod_resource_requests{resource="memory"})[{duration}] {offset})
  • Average pipelines CPU usage over time: Average CPU usage over the defined time range for an individual Wallaroo pipeline.
    • Example query: avg_over_time(sum by(namespace)(wallaroo_kube_pod_resource_usage{resource="cpu"})[{duration}] {offset})
  • Average pipelines CPU requested over time: Average number of CPUs requested over the defined time range for an individual Wallaroo pipeline.
    • Example query: avg_over_time(sum by(namespace)(wallaroo_kube_pod_resource_requests{resource="cpu"})[{duration}] {offset})
  • Average pipelines GPU usage over time: Average GPU usage over the defined time range for an individual Wallaroo pipeline.
    • Example query: avg_over_time(sum by(namespace)(wallaroo_kube_pod_resource_requests{resource=~"nvidia.com/gpu|qualcomm.com/qaic"})[{duration}] {offset})
  • Average pipelines GPU requested over time: Average number of GPUs requested over the defined time range for an individual Wallaroo pipeline.
    • Example query: avg_over_time(sum by(namespace)(wallaroo_kube_pod_resource_requests{resource=~"nvidia.com/gpu|qualcomm.com/qaic"})[{duration}] {offset})
  • Average pipelines Memory usage over time: Average memory usage over the defined time range for an individual Wallaroo pipeline.
    • Example query: avg_over_time(sum by(namespace) (wallaroo_kube_pod_resource_usage{resource="memory"})[{duration}] {offset})
  • Average pipelines Memory requested over time: Average amount of memory requested over the defined time range for an individual Wallaroo pipeline.
    • Example query: avg_over_time(sum by (namespace)(wallaroo_kube_pod_resource_requests{resource="memory"})[{duration}] {offset})
  • Pipeline inference log storage: Inference log storage used at the end of the defined time range for an individual Wallaroo pipeline.
    • Example query: sum by(topic) (topic_bytes@{timestamp})
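
The Pipeline inference log storage query evaluates topic_bytes at a specific point in time with the @ modifier and a Unix timestamp. A minimal sketch of building that query string in Python (the timestamp value is illustrative):

# build the pipeline inference log storage query for a specific point in time;
# the @ modifier pins the evaluation to the given Unix timestamp (illustrative)
timestamp = int(datetime.datetime(2025, 12, 3, tzinfo=datetime.timezone.utc).timestamp())
query_log_storage = f'sum by(topic) (topic_bytes @ {timestamp})'
print(query_log_storage)  # sum by(topic) (topic_bytes @ 1764720000)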

Metrics for a Specified Time Range in the Past

For queries that retrieve metric data between a range of dates in the past, the following example demonstrates how to use start date, end date, and the offset period.

This example parameterizes three values and inserts them into the query using Python f-string substitution:

  • date_start: The date starting the metric analysis period.
  • date_end: The end date of the metric analysis period.
  • current_time: The current time.

These values are then converted into the following:

  • duration: The amount of time in seconds between date_start and date_end.
  • offset: The amount of time in seconds between date_end and current_time.

For example, if the period to measure is 12/1/2025 12:00 AM to 12/3/2025 12:00 AM, and the current time is December 15, 2025:

  • Duration is the period from December 1 12:00 AM to December 3 12:00 AM (2 days, or 48 hours, or 172,800 seconds).
  • Offset is the period from December 3, 2025 to December 15, 2025 (12 days).

The following example shows retrieving the average CPU usage over a period of time for the dates 12/1/2025 to 12/3/2025.

# this is the URL to get this metric
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"
# Retrieve the token 
headers = wl.auth.auth_header()

data_start = selected_timezone.localize(datetime.datetime(2025, 12, 1, 13, 0, 0))
data_end = selected_timezone.localize(datetime.datetime(2025, 12, 3, 15, 59, 59))
current_time = datetime.datetime.now(selected_timezone)

duration = int((data_end-data_start).total_seconds())
offset = int((current_time-data_end).total_seconds())

query_avg_cpu_usage = f'avg_over_time(sum(wallaroo_kube_pod_resource_usage{{resource="cpu"}})[{duration}s:] offset {offset}s)'

#request parameters
params_avg_cpu_usage = {
    'query': query_avg_cpu_usage
}

response_avg_cpu_usage = requests.get(query_url, headers=headers, params=params_avg_cpu_usage)

if response_avg_cpu_usage.status_code == 200:
    print("Average CPU usage over time:")
    display(response_avg_cpu_usage.json())
else:
    print("Failed to fetch Avg CPU usage data:", response_avg_cpu_usage.status_code, response_avg_cpu_usage.text)

Total CPU Requested Example

The following example demonstrates using the Metric Query endpoint with the following attributes:

  • Query name: Total CPU Requested
  • Endpoint: query
  • Query: sum(wallaroo_kube_pod_resource_requests{resource="cpu"})
  • Description: Number of CPUs requested in the Wallaroo cluster

# this is the URL to get prometheus metrics
query_url = f"{wl.api_endpoint}/v1/metrics/api/v1/query"

# Retrieve the token 
headers = wl.auth.auth_header()

query = 'sum(wallaroo_kube_pod_resource_requests{resource="cpu"})'

#request parameters
params_rps = {
    'query': query,
}

response = requests.get(query_url, headers=headers, params=params_rps)

if response.status_code == 200:
    print("Query Response:")
    display(response.json())
else:
    print("Failed to fetch query response:", response.status_code, response.text)

Results:

Query Response:

{'status': 'success',
 'data': {'resultType': 'vector',
  'result': [{'metric': {}, 'value': [1764100020.302, '9.306000000000001']}]}}
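
The numeric value can be read from the vector result; for example:

# extract the numeric value from the instant query's vector result;
# each value pair is [evaluation timestamp, value as a string]
result = response.json()
timestamp, value = result['data']['result'][0]['value']
total_cpu_requested = float(value)
print(total_cpu_requested)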
