Inference Logging
How to retrieve inference logs.
How to use Wallaroo’s metrics to track inference performance and resource requirements.
Autoscale triggers reduce latency for LLM inference requests by adding resources when demand rises and releasing them when it falls, according to the configured scale-up and scale-down settings.
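Conceptually, an autoscale trigger compares the current load against scale-up and scale-down thresholds and adjusts the replica count accordingly. A minimal sketch of that decision logic (the names below are illustrative only, not the Wallaroo SDK):

```python
# Hypothetical autoscale-trigger decision: scale up when queued inference
# requests exceed the scale-up threshold, scale down when they fall below
# the scale-down threshold, bounded by min/max replicas.
def desired_replicas(current: int, queue_depth: int,
                     scale_up_at: int = 10, scale_down_at: int = 2,
                     min_replicas: int = 1, max_replicas: int = 5) -> int:
    if queue_depth > scale_up_at and current < max_replicas:
        return current + 1
    if queue_depth < scale_down_at and current > min_replicas:
        return current - 1
    return current

# With 20 requests queued, one replica scales up to two; with an empty
# queue, three replicas scale down to two.
print(desired_replicas(1, 20))  # 2
print(desired_replicas(3, 0))   # 2
```

The actual thresholds and scaling behavior are set through the deployment's scale-up and scale-down settings described in the linked section.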
For access to these sample models and a demonstration of using LLMs with Wallaroo:
How to authenticate to Wallaroo Ops for inference requests.