Managed LLM Inference Endpoints (MaaS) in Wallaroo
Wallaroo provides the ability to leverage LLMs deployed in external cloud services through Wallaroo's Arbitrary Python, aka Bring Your Own Predict (BYOP), models. As a result, Wallaroo users are able to:
- Perform inference requests through Wallaroo against Managed Inference Endpoints: LLMs deployed in external services such as:
  - Google Vertex
  - OpenAI
  - Azure ML Studio
- Monitor LLM inference endpoints:
  - With inference results stored in Wallaroo logs
  - In real time via LLM Validation Listeners
  - On demand or on a set schedule with LLM Monitoring Listeners
  - Using Wallaroo Assays that automatically score LLMs for drift against a known baseline
This allows organizations to use Wallaroo as a centralized location for inference requests, edge and multi-cloud deployments, and real-time and scheduled monitoring.
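As a reference point, the following is a minimal sketch of what a BYOP model backed by a Managed Inference Endpoint can look like. It assumes the OpenAI Python client as the external service and follows the Wallaroo BYOP `Inference`/`InferenceBuilder` template; the class names, the `gpt-4o-mini` model name, and the `text`/`generated_text` field names are illustrative assumptions, not fixed requirements.

```python
# byop_openai.py: a minimal BYOP (Arbitrary Python) model sketch that forwards
# inference requests to an external Managed Inference Endpoint (here: OpenAI).
# Class, field, and model names are illustrative; adapt them to your own schema.
import os

import numpy as np
from openai import OpenAI

from mac.config.inference import CustomInferenceConfig
from mac.inference import Inference
from mac.inference.creation import InferenceBuilder
from mac.types import InferenceData


class ManagedEndpointInference(Inference):
    @property
    def expected_model_types(self) -> set:
        # The "model" carried by this BYOP model is just the external client handle.
        return {OpenAI}

    @Inference.model.setter
    def model(self, model) -> None:
        self._raise_error_if_model_is_wrong_type(model)
        self._model = model

    def _predict(self, input_data: InferenceData) -> InferenceData:
        # Forward each incoming prompt to the external LLM endpoint and
        # collect the generated text for the Wallaroo inference result.
        outputs = []
        for prompt in input_data["text"].tolist():
            response = self._model.chat.completions.create(
                model="gpt-4o-mini",  # illustrative endpoint model name
                messages=[{"role": "user", "content": prompt}],
            )
            outputs.append(response.choices[0].message.content)
        return {"generated_text": np.array(outputs)}


class ManagedEndpointInferenceBuilder(InferenceBuilder):
    @property
    def inference(self) -> ManagedEndpointInference:
        return ManagedEndpointInference()

    def create(self, config: CustomInferenceConfig) -> ManagedEndpointInference:
        # Build the external client at deployment time; the API key is read
        # from the environment rather than baked into the artifact.
        inference = self.inference
        inference.model = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
        return inference
```

Because the generation itself runs on the external service, the Wallaroo-side model stays lightweight: its job is to translate between Wallaroo's `InferenceData` and the endpoint's request/response format.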
The following provides examples of how to:
- Deploy BYOP models with Managed Inference Endpoints (a deployment sketch follows this list).
- Perform inference requests through the Wallaroo-deployed BYOP models.
- Monitor the inference results through in-line or offline Wallaroo LLM listeners.
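As a rough sketch of those steps with the Wallaroo SDK, the example below uploads the BYOP artifact from the earlier sketch, deploys it in a pipeline, submits an inference request, and retrieves the stored results from the pipeline logs. The model and pipeline names, schemas, artifact path, and resource sizes are assumptions for illustration.

```python
# A sketch of uploading, deploying, and inferencing a BYOP model with the
# Wallaroo SDK. Names, schemas, and resource sizes are illustrative.
import pandas as pd
import pyarrow as pa
import wallaroo
from wallaroo.deployment_config import DeploymentConfigBuilder
from wallaroo.framework import Framework

wl = wallaroo.Client()

# The input/output schemas must match the InferenceData fields the BYOP model uses.
input_schema = pa.schema([pa.field("text", pa.string())])
output_schema = pa.schema([pa.field("generated_text", pa.string())])

# Upload the BYOP artifact (a zip containing byop_openai.py and its requirements).
model = wl.upload_model(
    "managed-endpoint-byop",     # illustrative model name
    "./byop_openai.zip",         # illustrative artifact path
    framework=Framework.CUSTOM,  # BYOP models use the Custom framework
    input_schema=input_schema,
    output_schema=output_schema,
)

# Deploy the model behind a pipeline. Since the LLM itself runs on the
# external service, the local resource allocation can stay small.
deployment_config = (
    DeploymentConfigBuilder()
    .replica_count(1)
    .cpus(1)
    .memory("2Gi")
    .build()
)
pipeline = wl.build_pipeline("managed-endpoint-pipeline")
pipeline.add_model_step(model)
pipeline.deploy(deployment_config=deployment_config)

# Submit an inference request through the Wallaroo-deployed BYOP model.
result = pipeline.infer(pd.DataFrame({"text": ["Summarize Wallaroo in one sentence."]}))
print(result["out.generated_text"])

# Inference results are also stored in the Wallaroo pipeline logs, which feed
# the offline monitoring paths described above.
logs = pipeline.logs()
```

From there, the stored logs support the monitoring options listed earlier: LLM Monitoring Listeners on demand or on a schedule, and Wallaroo Assays scoring for drift against a known baseline.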
For access to these sample models and a demonstration of using LLMs with Wallaroo:
- Contact your Wallaroo Support Representative, or
- Schedule Your Wallaroo.AI Demo Today