Wallaroo.AI (Version 2024.3)

LLM Performance Optimizations


The following tutorials demonstrate how to optimize LLM performance in Wallaroo.


Autoscaling with Llama 3 8B and Llama.cpp

Dynamic Batching with Llama 3 8B Instruct vLLM Tutorial

Dynamic Batching with Llama 3 8B with Llama.cpp CPUs Tutorial

Llama 3 8B Instruct with vLLM

Quantized Llava 34B with Llama.cpp

© 2025 Wallaroo Labs, Inc.