Wallaroo.AI (Version 2024.4)

LLM Performance Optimizations


The following tutorials demonstrate optimizing LLM performance through Wallaroo.


Autoscaling with Llama 3 8B and Llama.cpp

Dynamic Batching with Llama 3 8B Instruct vLLM Tutorial

Dynamic Batching with Llama 3 8B with Llama.cpp CPUs Tutorial

Llama 3 8B Instruct with vLLM

Quantized Llava 34B with Llama.cpp

© 2025 Wallaroo Labs, Inc.