Production Features

Model inference and results management


Inference Logging

How to retrieve inference logs.
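
A minimal sketch of retrieving recent logs with the Wallaroo Python SDK, assuming a `wallaroo.Client()` connection; the workspace and pipeline names are placeholders, and the exact method signatures may vary by SDK version:

```python
import wallaroo

# Connect to the Wallaroo Ops instance. The workspace and pipeline
# names below are placeholders for illustration.
wl = wallaroo.Client()
wl.set_current_workspace(wl.get_workspace("my-workspace"))

# Retrieve the pipeline's most recent inference results as a
# pandas DataFrame and inspect the first few rows.
pipeline = wl.get_pipeline("my-pipeline")
logs = pipeline.logs(limit=100)
print(logs.head())
```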

Observability with Inference Performance Metrics

How to use Wallaroo’s metrics to track inference performance and resource requirements.
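
Wallaroo records these performance metrics server side. As a complementary sketch only, client-side timing of a single inference round trip might look like the following; the pipeline name and input schema are assumptions, not part of Wallaroo's metrics API:

```python
import time

import pandas as pd
import wallaroo

# Connect and fetch the deployed pipeline (names are illustrative).
wl = wallaroo.Client()
pipeline = wl.get_pipeline("my-pipeline")

# Hypothetical input; replace with data matching your model's schema.
input_df = pd.DataFrame({"tensor": [[1.0, 2.0, 3.0, 4.0]]})

# Time one inference call from the client side. This complements,
# rather than replaces, the server-side metrics Wallaroo collects.
start = time.perf_counter()
result = pipeline.infer(input_df)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"Inference round-trip: {elapsed_ms:.1f} ms")
```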

Autoscaling

Autoscale triggers reduce latency for LLM inference requests by adding resources when demand rises and releasing them when it falls, governed by the scale-up and scale-down settings.
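A hedged sketch of such a deployment configuration, assuming the SDK's `DeploymentConfigBuilder` with replica autoscaling bounds and a CPU utilization target; the specific values and pipeline name are illustrative:

```python
import wallaroo
from wallaroo.deployment_config import DeploymentConfigBuilder

wl = wallaroo.Client()
pipeline = wl.get_pipeline("my-pipeline")

# Illustrative autoscaling bounds: between 1 and 5 replicas,
# scaling on a 75% CPU utilization target.
deployment_config = (
    DeploymentConfigBuilder()
    .replica_autoscale_min_max(minimum=1, maximum=5)
    .autoscale_cpu_utilization(75)
    .build()
)

pipeline.deploy(deployment_config=deployment_config)
```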

Inference Request Security

How to authenticate to Wallaroo Ops for inference requests.
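
A hedged sketch of an authenticated API inference request, assuming the SDK's `auth_header()` helper for obtaining a bearer token; the endpoint URL and payload below are placeholders:

```python
import requests
import wallaroo

# Connect and build an authentication header for API inference requests.
wl = wallaroo.Client()
headers = wl.auth.auth_header()
headers["Content-Type"] = "application/json"

# Placeholder inference endpoint and payload; substitute your pipeline's
# deployment URL and data matching your model's input schema.
deploy_url = "https://example.wallaroo.ai/v1/api/pipelines/infer/my-pipeline-1"
payload = '[{"tensor": [1.0, 2.0, 3.0, 4.0]}]'

response = requests.post(deploy_url, headers=headers, data=payload)
print(response.status_code, response.json())
```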