How to Upload and Deploy LLM Models in Wallaroo


Deploying an LLM to Wallaroo takes two steps:

  • Upload the LLM: The first step is to upload the LLM to Wallaroo. The provided guides help organizations upload LLMs which vary in size from a few hundred megabytes to several gigabytes.
  • Deploy the LLM: LLMs typically require a large amount of resources to perform inference; the following deployment guides help organizations deploy LLMs across a variety of resource environments and requirements.
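The two steps above can be sketched with the Wallaroo Python SDK. This is a minimal, hedged sketch, not a complete recipe: the model name, file path, and pipeline name are placeholders, and the right `framework` value and deployment configuration for a given LLM are covered in the guides below.

```python
def upload_and_deploy_llm(client, model_name, model_path, framework, pipeline_name):
    """Step 1: upload the LLM artifact; step 2: deploy it behind a pipeline.

    `client` is expected to behave like a `wallaroo.Client()` instance.
    """
    # Upload: Wallaroo registers the model artifact under the given name.
    model = client.upload_model(model_name, model_path, framework=framework)

    # Deploy: attach the model as a pipeline step and allocate resources.
    pipeline = client.build_pipeline(pipeline_name)
    pipeline.add_model_step(model)
    pipeline.deploy()
    return pipeline
```

In practice, `client` would be the result of `wallaroo.Client()` and `framework` a `wallaroo.framework.Framework` value (for example, a BYOP-packaged LLM uses the custom framework option); large LLM deployments typically also pass a deployment configuration sizing CPUs, GPUs, and memory, as described in the deployment guides.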

The following LLM models have been tested with Wallaroo. The majority are Hugging Face LLMs packaged as Wallaroo BYOP framework models; these models leverage the llamacpp library.

For access to these sample models and a demonstration of using LLMs with Wallaroo, see the following guides:


Deploy on Intel x86

Deploy on GPU

Deploy on ARM

Deploy on IBM Power10

Managed LLM Inference Endpoints (MaaS) in Wallaroo

Deploy LLMs with OpenAI Compatibility

Wallaroo provides OpenAI compatibility for improved interactive user experiences with LLM-based applications while taking advantage of Wallaroo’s ability to maximize throughput and minimize latency. AI developers can seamlessly migrate their applications from OpenAI endpoints to Wallaroo on-prem endpoints, in both connected and air-gapped environments, without losing any functionality.
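Because the endpoints are OpenAI-compatible, migrating a client typically amounts to pointing the existing OpenAI-style request at the Wallaroo endpoint URL. The sketch below builds a standard chat-completions request using only the Python standard library; the base URL, token, and model name are placeholders, not actual Wallaroo values.

```python
import json
import urllib.request

# Placeholders -- substitute your Wallaroo deployment's
# OpenAI-compatible endpoint URL and credentials.
BASE_URL = "https://example.wallaroo.ai/v1"
API_TOKEN = "YOUR_TOKEN"

def build_chat_request(model, messages):
    """Build an OpenAI-style POST to /chat/completions.

    The payload shape follows the OpenAI chat completions API,
    which is what an OpenAI-compatible endpoint accepts.
    """
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "my-llm",  # hypothetical deployed model name
    [{"role": "user", "content": "Hello"}],
)
# Sending the request is identical to calling the OpenAI API itself:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```

An application already using an OpenAI client library can usually switch by changing only the base URL and credentials, which is what makes connected and air-gapped migrations possible without code changes to the request logic.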