How to Upload and Deploy LLM Models in Wallaroo
Deploying an LLM to Wallaroo takes two steps:
- Upload the LLM: The first step is to upload the LLM to Wallaroo. The provided guides help organizations upload LLMs ranging in size from a few hundred megabytes to several gigabytes.
- Deploy the LLM: LLMs typically require significant resources to perform inference; the following deployment guides help organizations deploy LLMs across a variety of resource environments and requirements.
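The two steps above can be sketched with the Wallaroo SDK. This is a minimal illustration, not a complete recipe: it assumes a running Wallaroo instance, and the model name, file path, pipeline name, and resource values shown are placeholders.

```python
import wallaroo
from wallaroo.framework import Framework
from wallaroo.deployment_config import DeploymentConfigBuilder

# Connect to the Wallaroo instance (authentication is environment-specific).
wl = wallaroo.Client()

# Step 1: Upload the LLM. BYOP-packaged LLMs are uploaded with the Custom
# framework; the name and path below are hypothetical.
model = wl.upload_model(
    "llama-byop",               # hypothetical model name
    "./models/byop_llama.zip",  # hypothetical BYOP package path
    framework=Framework.CUSTOM,
)

# Step 2: Deploy the LLM through a pipeline, with an explicit resource
# request sized for the model (values here are illustrative).
deployment_config = (
    DeploymentConfigBuilder()
    .replica_count(1)
    .cpus(4)
    .memory("8Gi")
    .build()
)
pipeline = wl.build_pipeline("llama-pipeline")
pipeline.add_model_step(model)
pipeline.deploy(deployment_config=deployment_config)
```

Once deployed, the pipeline serves inference requests until it is undeployed, releasing the reserved resources back to the cluster.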
The following LLM models have been tested with Wallaroo. The majority are Hugging Face LLMs packaged as Wallaroo BYOP framework models. These models leverage the llamacpp library.
- Llama
- Llama v2 7B Chat, quantized with llamacpp (aka Llama 2 7B Chat - GGML), on ARM and x86
- Llama v3 8B Instruct, quantized with llamacpp (aka 4-bit Quantized Llama 3 Model), on ARM and x86
- Llama v2 7B standard on 1 GPU
- Llama v2 7B chat on 1 GPU
- Llama v2 7B instruct on 1 GPU
- Llama v2 70B quantized on 1 GPU
- Llama v3 8B standard on 1 GPU
- Llama v3 8B instruct on 1 GPU
- IBM-Granite
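For the single-GPU deployments listed above, the deployment configuration must request a GPU and target a GPU-enabled nodepool. A minimal sketch is below; the deployment label value is an assumption and must match the GPU nodepool label configured in your own Wallaroo cluster.

```python
from wallaroo.deployment_config import DeploymentConfigBuilder

# Illustrative resource request for a single-GPU LLM deployment.
deployment_config = (
    DeploymentConfigBuilder()
    .replica_count(1)
    .cpus(2)
    .memory("8Gi")
    .gpus(1)  # request one GPU for the LLM
    .deployment_label("doc.gpu.nvidia.com/class:a100")  # hypothetical nodepool label
    .build()
)
```

The CPU and memory values are placeholders; size them to the model being deployed.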
For access to these sample models and a demonstration of using LLMs with Wallaroo:
- Contact your Wallaroo Support Representative OR
- Schedule Your Wallaroo.AI Demo Today