Managed LLM Inference Endpoints (MaaS) in Wallaroo

Wallaroo provides the ability to leverage LLMs deployed in external cloud services through Wallaroo's Arbitrary Python (BYOP) models. As a result, Wallaroo users are able to:

  • Submit inference requests to Wallaroo that are executed against Managed Inference Endpoints: LLMs deployed in external services such as:
    • Google Vertex
    • OpenAI
    • Azure ML Studio
  • Monitor LLM inference endpoints.

This allows organizations to use Wallaroo as a centralized location for inference requests, edge and multi-cloud deployments, and real-time and scheduled monitoring.
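As a sketch of the pattern, the BYOP model below forwards each prompt to a managed LLM endpoint and returns the generated text through Wallaroo. The `mac.*` imports follow Wallaroo's BYOP framework; the OpenAI chat completions endpoint, model name, and the `prompt`/`generated_text` field names are illustrative assumptions, not fixed requirements.

```python
# Sketch of a BYOP model that proxies inference to a managed LLM endpoint.
# The OpenAI endpoint, model name, and field names below are assumptions
# for illustration; any managed endpoint can be substituted.
import os

import numpy as np
import requests

from mac.config.inference import CustomInferenceConfig
from mac.inference import Inference
from mac.inference.creation import InferenceBuilder
from mac.types import InferenceData


class ManagedEndpointInference(Inference):
    @property
    def expected_model_types(self) -> set:
        # No local model artifact is loaded; the "model" is the remote endpoint.
        return {dict}

    @Inference.model.setter
    def model(self, model) -> None:
        self._model = model

    def _predict(self, input_data: InferenceData) -> InferenceData:
        outputs = []
        for prompt in input_data["prompt"].tolist():
            # Forward the prompt to the managed endpoint.
            response = requests.post(
                "https://api.openai.com/v1/chat/completions",
                headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
                json={
                    "model": "gpt-4o-mini",  # assumed model name
                    "messages": [{"role": "user", "content": prompt}],
                },
                timeout=60,
            )
            response.raise_for_status()
            outputs.append(response.json()["choices"][0]["message"]["content"])
        return {"generated_text": np.array(outputs)}


class ManagedEndpointInferenceBuilder(InferenceBuilder):
    @property
    def inference(self) -> ManagedEndpointInference:
        return ManagedEndpointInference()

    def create(self, config: CustomInferenceConfig) -> ManagedEndpointInference:
        inference = self.inference
        inference.model = {}  # no local artifact to load
        return inference
```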

The following provides examples of how to:

  • Deploy BYOP models with Managed Inference Endpoints (see the deployment sketch after this list).
  • Perform inferences through the Wallaroo-deployed BYOP models.
  • Monitor the inference results through in-line or offline Wallaroo LLM listeners (see the monitoring sketch after this list).
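For the first two steps, a minimal sketch of the upload, deploy, and inference flow with the Wallaroo SDK follows. The model name, file path, pipeline name, and schema field names are assumptions for illustration.

```python
# Sketch: upload the packaged BYOP model, deploy it, and run an inference.
# Names (model, path, pipeline, schema fields) are illustrative assumptions.
import pandas as pd
import pyarrow as pa
import wallaroo
from wallaroo.framework import Framework

wl = wallaroo.Client()

# Schemas describe the BYOP model's input and output fields.
input_schema = pa.schema([pa.field("prompt", pa.string())])
output_schema = pa.schema([pa.field("generated_text", pa.string())])

model = wl.upload_model(
    "managed-endpoint-byop",        # assumed model name
    "./models/byop_openai.zip",     # assumed path to the packaged BYOP model
    framework=Framework.CUSTOM,
    input_schema=input_schema,
    output_schema=output_schema,
)

pipeline = wl.build_pipeline("managed-endpoint-pipeline")
pipeline.add_model_step(model)
pipeline.deploy()

# Inference through the deployed BYOP model: the request is forwarded to
# the managed endpoint and the response is returned through Wallaroo.
result = pipeline.infer(pd.DataFrame({"prompt": ["Summarize MaaS in one sentence."]}))
print(result["out.generated_text"])
```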
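For monitoring, a sketch of an offline listener pass over the pipeline's recent inference logs follows. The scoring function is a stand-in for a real LLM listener (for example, a toxicity or relevance model), and the `out.generated_text` column name is an assumption.

```python
# Sketch: an offline listener pass over recent inference logs.
# The column name and scoring rule below are illustrative assumptions.
logs = pipeline.logs(limit=100)  # DataFrame of recent inference results

def score(text: str) -> float:
    # Stand-in for a real LLM listener: flag empty or very short generations.
    return 0.0 if len(text.strip()) < 10 else 1.0

logs["quality_score"] = logs["out.generated_text"].apply(score)
flagged = logs[logs["quality_score"] < 1.0]
print(f"{len(flagged)} of {len(logs)} inferences flagged for review")
```

An in-line listener follows the same idea, but is packaged as its own BYOP model and added as an additional pipeline step so each inference is scored as it happens.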

For access to these sample models and a demonstration of using LLMs with Wallaroo: