Wallaroo.AI (Version 2025.2)
High Performance Tutorials
The following tutorials demonstrate how to optimize LLM inference performance in Wallaroo.
Continuous Batching for Llama 3.1 8B with vLLM
Dynamic Batching with Llama 3 8B Instruct vLLM Tutorial
Dynamic Batching with Llama 3 8B with Llama.cpp CPUs Tutorial
Llama 3 8B Instruct with vLLM
Quantized Llava 34B with Llama.cpp
Continuous Batching for Custom Llama with vLLM