Managed LLM Inference Endpoints (MaaS) in Wallaroo

Wallaroo provides the ability to leverage LLMs deployed in external cloud services through Wallaroo's Arbitrary Python (BYOP) models. As a result, Wallaroo users are able to:

  • Submit inference requests to Wallaroo that are executed against Managed Inference Endpoints: LLMs deployed in external services such as:
    • Google Vertex
    • OpenAI
    • Azure ML Studio
  • Monitor LLM inference endpoints.

This allows organizations to use Wallaroo as a centralized location for inference requests, edge and multi-cloud deployments, and real-time and scheduled monitoring.
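As a sketch of the pattern, the BYOP model below forwards each prompt to a managed LLM endpoint and returns the generated text through Wallaroo. The `mac.*` imports follow Wallaroo's BYOP framework; the OpenAI chat completions endpoint, model name, and the `prompt`/`generated_text` field names are illustrative assumptions, not fixed requirements.

```python
# Sketch of a BYOP model that proxies inference to a managed LLM endpoint.
# The OpenAI endpoint, model name, and field names below are assumptions
# for illustration; any managed endpoint can be substituted.
import os

import numpy as np
import requests

from mac.config.inference import CustomInferenceConfig
from mac.inference import Inference
from mac.inference.creation import InferenceBuilder
from mac.types import InferenceData


class ManagedEndpointInference(Inference):
    @property
    def expected_model_types(self) -> set:
        # No local model artifact is loaded; the "model" is the remote endpoint.
        return {dict}

    @Inference.model.setter
    def model(self, model) -> None:
        self._model = model

    def _predict(self, input_data: InferenceData) -> InferenceData:
        outputs = []
        for prompt in input_data["prompt"].tolist():
            # Forward the prompt to the managed endpoint.
            response = requests.post(
                "https://api.openai.com/v1/chat/completions",
                headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
                json={
                    "model": "gpt-4o-mini",  # assumed model name
                    "messages": [{"role": "user", "content": prompt}],
                },
                timeout=60,
            )
            response.raise_for_status()
            outputs.append(response.json()["choices"][0]["message"]["content"])
        return {"generated_text": np.array(outputs)}


class ManagedEndpointInferenceBuilder(InferenceBuilder):
    @property
    def inference(self) -> ManagedEndpointInference:
        return ManagedEndpointInference()

    def create(self, config: CustomInferenceConfig) -> ManagedEndpointInference:
        inference = self.inference
        inference.model = {}  # no local artifact to load
        return inference
```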

The following provides examples of how to:

  • Deploy BYOP models with Managed Inference Endpoints (see the deployment sketch after this list).
  • Perform inferences through the Wallaroo-deployed BYOP models.
  • Monitor the inference results through in-line or offline Wallaroo LLM listeners (see the monitoring sketch after this list).
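For the first two steps, a minimal sketch of the upload, deploy, and inference flow with the Wallaroo SDK follows. The model name, file path, pipeline name, and schema field names are assumptions for illustration.

```python
# Sketch: upload the packaged BYOP model, deploy it, and run an inference.
# Names (model, path, pipeline, schema fields) are illustrative assumptions.
import pandas as pd
import pyarrow as pa
import wallaroo
from wallaroo.framework import Framework

wl = wallaroo.Client()

# Schemas describe the BYOP model's input and output fields.
input_schema = pa.schema([pa.field("prompt", pa.string())])
output_schema = pa.schema([pa.field("generated_text", pa.string())])

model = wl.upload_model(
    "managed-endpoint-byop",        # assumed model name
    "./models/byop_openai.zip",     # assumed path to the packaged BYOP model
    framework=Framework.CUSTOM,
    input_schema=input_schema,
    output_schema=output_schema,
)

pipeline = wl.build_pipeline("managed-endpoint-pipeline")
pipeline.add_model_step(model)
pipeline.deploy()

# Inference through the deployed BYOP model: the request is forwarded to
# the managed endpoint and the response is returned through Wallaroo.
result = pipeline.infer(pd.DataFrame({"prompt": ["Summarize MaaS in one sentence."]}))
print(result["out.generated_text"])
```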
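For monitoring, a sketch of an offline listener pass over the pipeline's recent inference logs follows. The scoring function is a stand-in for a real LLM listener (for example, a toxicity or relevance model), and the `out.generated_text` column name is an assumption.

```python
# Sketch: an offline listener pass over recent inference logs.
# The column name and scoring rule below are illustrative assumptions.
logs = pipeline.logs(limit=100)  # DataFrame of recent inference results

def score(text: str) -> float:
    # Stand-in for a real LLM listener: flag empty or very short generations.
    return 0.0 if len(text.strip()) < 10 else 1.0

logs["quality_score"] = logs["out.generated_text"].apply(score)
flagged = logs[logs["quality_score"] < 1.0]
print(f"{len(flagged)} of {len(logs)} inferences flagged for review")
```

An in-line listener follows the same idea, but is packaged as its own BYOP model and added as an additional pipeline step so each inference is scored as it happens.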

For access to these sample models and a demonstration of using LLMs with Wallaroo: