Production Features

Model inference and results management


Inference Logging

How to retrieve inference logs.
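
A minimal sketch of retrieving recent logs with the Wallaroo Python SDK, assuming a `wallaroo.Client()` connection; the workspace and pipeline names are placeholders, and the exact method signatures may vary by SDK version:

```python
import wallaroo

# Connect to the Wallaroo Ops instance. The workspace and pipeline
# names below are placeholders for illustration.
wl = wallaroo.Client()
wl.set_current_workspace(wl.get_workspace("my-workspace"))

# Retrieve the pipeline's most recent inference results as a
# pandas DataFrame and inspect the first few rows.
pipeline = wl.get_pipeline("my-pipeline")
logs = pipeline.logs(limit=100)
print(logs.head())
```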

Observability with Inference Performance Metrics

How to use Wallaroo’s metrics to track inference performance and resource requirements.
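
Wallaroo records these performance metrics server side. As a complementary sketch only, client-side timing of a single inference round trip might look like the following; the pipeline name and input schema are assumptions, not part of Wallaroo's metrics API:

```python
import time

import pandas as pd
import wallaroo

# Connect and fetch the deployed pipeline (names are illustrative).
wl = wallaroo.Client()
pipeline = wl.get_pipeline("my-pipeline")

# Hypothetical input; replace with data matching your model's schema.
input_df = pd.DataFrame({"tensor": [[1.0, 2.0, 3.0, 4.0]]})

# Time one inference call from the client side. This complements,
# rather than replaces, the server-side metrics Wallaroo collects.
start = time.perf_counter()
result = pipeline.infer(input_df)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"Inference round-trip: {elapsed_ms:.1f} ms")
```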

Autoscaling

Autoscale triggers reduce latency for LLM inference requests by adding resources when demand rises and releasing them when it falls, governed by the scale-up and scale-down settings.
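A hedged sketch of such a deployment configuration, assuming the SDK's `DeploymentConfigBuilder` with replica autoscaling bounds and a CPU utilization target; the specific values and pipeline name are illustrative:

```python
import wallaroo
from wallaroo.deployment_config import DeploymentConfigBuilder

wl = wallaroo.Client()
pipeline = wl.get_pipeline("my-pipeline")

# Illustrative autoscaling bounds: between 1 and 5 replicas,
# scaling on a 75% CPU utilization target.
deployment_config = (
    DeploymentConfigBuilder()
    .replica_autoscale_min_max(minimum=1, maximum=5)
    .autoscale_cpu_utilization(75)
    .build()
)

pipeline.deploy(deployment_config=deployment_config)
```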

Inference Request Security

How to authenticate to Wallaroo Ops for inference requests.
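
A hedged sketch of an authenticated API inference request, assuming the SDK's `auth_header()` helper for obtaining a bearer token; the endpoint URL and payload below are placeholders:

```python
import requests
import wallaroo

# Connect and build an authentication header for API inference requests.
wl = wallaroo.Client()
headers = wl.auth.auth_header()
headers["Content-Type"] = "application/json"

# Placeholder inference endpoint and payload; substitute your pipeline's
# deployment URL and data matching your model's input schema.
deploy_url = "https://example.wallaroo.ai/v1/api/pipelines/infer/my-pipeline-1"
payload = '[{"tensor": [1.0, 2.0, 3.0, 4.0]}]'

response = requests.post(deploy_url, headers=headers, data=payload)
print(response.status_code, response.json())
```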