Wallaroo.AI (Version 2025.2)

Qualcomm

The following tutorials demonstrate deploying different LLMs with Qualcomm QAIC AI acceleration.


Deploy Llama with Continuous Batching Using Native vLLM Framework and QAIC AI Acceleration

Deploy RAG Llama with QAIC

Deploy Llama with Continuous Batching Using Native vLLM Framework with QAIC using OpenAI Inference

Deploy RAG Llama with OpenAI compatibility on QAIC

© 2026 Wallaroo Labs, Inc.