How to Upload and Deploy LLM Models in Wallaroo
Deploying a LLM to Wallaroo takes two steps:
- Upload the LLM: The first step is to upload the LLM to Wallaroo. The provided guides help organizations upload LLMs which vary in size from a few hundred megabytes to several gigabytes.
- Deploy the LLM: LLMs typically require substantial resources to perform inference; the following deployment guides help organizations deploy LLMs across a variety of resource environments and requirements.
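As a rough sketch of the resource-planning side of the deploy step, the helper below estimates the CPU, memory, and GPU values an organization might request for an LLM deployment. The sizing heuristics (bytes per parameter for quantized vs. full-precision weights, the 1.5x runtime overhead factor) are illustrative assumptions for this sketch, not Wallaroo defaults; in practice, values like these are supplied to the deployment configuration when the pipeline is deployed.

```python
def deployment_resources(param_count_b: float, quantized: bool, gpus: int = 0) -> dict:
    """Illustrative sizing helper for an LLM deployment.

    param_count_b: model size in billions of parameters.
    quantized: True for 4-bit quantized weights, False for fp16.
    gpus: number of GPUs to request (0 = CPU-only deployment).

    The heuristics here are assumptions for this sketch, not Wallaroo defaults.
    """
    # Approximate weight storage: ~0.5 bytes/param at 4-bit, ~2 bytes/param at fp16.
    bytes_per_param = 0.5 if quantized else 2.0
    # Add ~50% runtime overhead for activations/KV cache, plus 2 GiB headroom.
    mem_gib = int(param_count_b * bytes_per_param * 1.5) + 2
    return {
        "cpus": 4 if gpus == 0 else 1,  # CPU-only inference needs more cores
        "memory": f"{mem_gib}Gi",
        "gpus": gpus,
    }

# Example: a 4-bit quantized Llama 3 8B model deployed CPU-only.
cfg = deployment_resources(8, quantized=True)
```

In the Wallaroo SDK, resource requests of this shape are expressed through the deployment configuration builder when deploying a pipeline; consult the deployment guides referenced above for the exact API and the recommended values for each tested model.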
The following LLM models have been tested with Wallaroo. The majority are Hugging Face LLMs packaged as Wallaroo BYOP framework models; the quantized models leverage the llamacpp library.
- Llama
- Llama v2 7B Chat Quantized with llamacpp aka Llama 2 7B Chat - GGML on ARM and X86
- Llama v3 8B Instruct Quantized with llamacpp aka 4-bit Quantized Llama 3 Model on ARM and X86
- Llama v2 7B standard on 1 GPU
- Llama v2 7B chat on 1 GPU
- Llama v2 7B instruct on 1 GPU
- Llama v2 70B quantized on 1 GPU
- Llama v3 8B standard on 1 GPU
- Llama v3 8B instruct on 1 GPU
- IBM-Granite
For access to these sample models and a demonstration on using LLMs with Wallaroo:
- Contact your Wallaroo Support Representative OR
- Schedule Your Wallaroo.AI Demo Today
Deploy LLMs with OpenAI Compatibility
Wallaroo provides OpenAI compatibility for improved interactive user experiences with LLM-based applications while taking advantage of Wallaroo's ability to maximize throughput and minimize latency. AI developers can seamlessly migrate their applications from OpenAI endpoints to Wallaroo on-prem endpoints, in both connected and air-gapped environments, without losing any functionality.
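Because the endpoint is OpenAI-compatible, an application can keep building requests in the standard OpenAI chat-completions shape and only change where it sends them. The sketch below constructs such a payload with the standard library; the endpoint URL and model name are hypothetical placeholders, since the real values come from your Wallaroo deployment.

```python
import json

# Hypothetical placeholder -- the real URL comes from your Wallaroo deployment.
WALLAROO_ENDPOINT = "https://example.wallaroo.ai/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "llama-v3-8b-instruct") -> str:
    """Build an OpenAI-style chat-completions payload as a JSON string."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 256,
        "temperature": 0.7,
    }
    return json.dumps(payload)

# The resulting JSON body would be POSTed to WALLAROO_ENDPOINT exactly as it
# would be to an OpenAI endpoint.
body = build_chat_request("Summarize Wallaroo's LLM deployment steps.")
```

Existing OpenAI client code (for example, an OpenAI SDK client pointed at a different base URL) can migrate the same way: the request and response schemas stay the same, and only the endpoint changes.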