Edge and Multi-cloud

How to run the Wallaroo Inference Server in any cloud, any architecture, any platform.

Edge and Multi-cloud Inference Anywhere provides the ability to deploy models and perform inferences on them in any environment (edge or multicloud), on any hardware, with or without GPUs. Deployments in these environments are observed for drift detection, and the deployed models are updated when new model versions or entirely new sets of models become available.

The following hardware architectures and AI accelerators are supported.

✓ = supported; ✗ = not supported; N/A = not applicable.

| Accelerator | ARM Support | X64/X86 Support | Intel GPU | Nvidia GPU | Description |
|---|---|---|---|---|---|
| None | N/A | N/A | N/A | N/A | The default acceleration, used for all scenarios and architectures. |
| AIO | ✓ | ✗ | ✗ | ✗ | AIO acceleration for Ampere Optimized trained models; only available with ARM processors. |
| Jetson | ✓ | ✗ | ✗ | ✓ | Nvidia Jetson acceleration used with edge deployments with ARM processors. |
| CUDA | ✓ | ✓ | ✗ | ✓ | Nvidia CUDA acceleration supported by both ARM and X64/X86 processors. Intended for deployment with Nvidia GPUs. |
| OpenVINO | ✗ | ✓ | ✓ | ✗ | Intel OpenVINO acceleration, an AI accelerator from Intel compatible with x86/64 architectures. Aimed at edge and multi-cloud deployments with or without Intel GPUs. |
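
As an illustration, the accelerator is typically selected when the model is uploaded or deployed. The following Python sketch assumes a Wallaroo SDK client, an `Acceleration` setting, and an `accel` upload parameter; these import paths, enum values, and parameter names are assumptions for illustration, so confirm the exact signatures against the SDK reference for your version.

```python
import wallaroo
# Assumed import paths; confirm against your installed SDK version.
from wallaroo.engine_config import Acceleration
from wallaroo.framework import Framework

# Connect to the Wallaroo Ops center (assumes a prior SDK login/configuration).
wl = wallaroo.Client()

# Upload an ONNX model and request CUDA acceleration for Nvidia GPU targets.
# The `accel` parameter and `Acceleration.CUDA` value are assumptions here;
# substitute the accelerator from the table above (AIO, Jetson, CUDA, or
# OpenVINO) that matches your target hardware.
model = wl.upload_model(
    "resnet50-edge",            # example model name
    "./models/resnet50.onnx",   # example local model file
    framework=Framework.ONNX,
    accel=Acceleration.CUDA,
)
```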

The following guides describe how to:

  • Publish a model for deployment on edge and multicloud environments (a publish sketch follows this list).
  • Deploy and perform inferences on edge and multicloud environments.
  • Use observability features to track model drift and model performance from the Wallaroo Ops and through the edge deployments.
  • Use inference acceleration to deploy models on different architectures and AI accelerators.
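
For instance, the publish step centers on packaging a deployed pipeline as an OCI image that edge and multicloud locations can pull. The sketch below assumes a Wallaroo SDK client and follows the SDK's general pattern (`upload_model`, `build_pipeline`, `add_model_step`, `publish`); treat the exact signatures as assumptions and see the publish guide for the supported workflow.

```python
import wallaroo
from wallaroo.framework import Framework  # assumed import path

wl = wallaroo.Client()

# Upload a model and wrap it in a pipeline with a single model step.
model = wl.upload_model("house-price-model", "./models/rf_model.onnx",
                        framework=Framework.ONNX)
pipeline = wl.build_pipeline("edge-house-price")
pipeline.add_model_step(model)

# Publish the pipeline to the OCI registry configured for the Ops center.
# The returned publish details include the image reference that edge and
# multicloud locations pull in order to run the Wallaroo Inference Server.
publish = pipeline.publish()
print(publish)
```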

Inference

How to perform inferences on deployed models in edge and multicloud environments.
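
As a concrete example, an edge-deployed Wallaroo Inference Server accepts inference requests over HTTP. The sketch below assumes the server is reachable at `localhost:8080` under an `/infer` path and accepts pandas-record JSON; the host, port, path, and input field names are placeholders, so substitute the values reported by your deployment.

```python
import requests

# Hypothetical edge inference endpoint; substitute the host, port, and path
# reported by your deployment.
url = "http://localhost:8080/infer"

# Example pandas-records payload; the field names must match the input
# schema of the deployed model.
payload = [{"tensor": [1.0, 2.0, 3.0, 4.0]}]

response = requests.post(
    url,
    json=payload,
    headers={"Content-Type": "application/json; format=pandas-records"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```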

Observability

How to observe edge and multicloud deployed models for performance, model drift, and related issues.
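
To make the drift idea concrete, the snippet below computes a population stability index (PSI) between a baseline window and a current window of a numeric model output. This is an illustrative standalone calculation, not the Wallaroo assay API; the window sizes, bin count, and threshold are arbitrary example choices.

```python
import numpy as np

def psi(baseline: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population stability index between two samples of a numeric field."""
    # Bin edges come from the baseline distribution.
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_counts, _ = np.histogram(baseline, bins=edges)
    curr_counts, _ = np.histogram(current, bins=edges)

    # Convert to proportions, guarding against empty bins with a small epsilon.
    eps = 1e-6
    base_pct = np.clip(base_counts / base_counts.sum(), eps, None)
    curr_pct = np.clip(curr_counts / curr_counts.sum(), eps, None)

    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))

# Example: baseline predictions versus a drifted current window.
rng = np.random.default_rng(0)
baseline = rng.normal(200_000, 25_000, 5_000)   # e.g. predicted house prices
current = rng.normal(230_000, 25_000, 5_000)    # shifted distribution
score = psi(baseline, current)
print(f"PSI = {score:.3f} (values above ~0.2 often flag meaningful drift)")
```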

Model Management

How to update and manage edge and multicloud models.
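
As one illustration of the update flow, a retrained model is uploaded as a new version, the pipeline is republished, and edge locations then pull the refreshed image. The sketch below reuses the hypothetical SDK names from the publish example above (`upload_model`, `build_pipeline`, `publish`) plus an assumed `clear` method; treat the exact signatures as assumptions and see the model management guide for the supported workflow.

```python
import wallaroo
from wallaroo.framework import Framework  # assumed import path

wl = wallaroo.Client()

# Upload the retrained model under the same model name; this registers a
# new version rather than a new model.
new_version = wl.upload_model("house-price-model", "./models/rf_model_v2.onnx",
                              framework=Framework.ONNX)

# Point the pipeline at the new version and republish. Republishing produces
# an updated OCI image; edge locations pick up the change when they pull the
# refreshed image and restart the inference server.
pipeline = wl.build_pipeline("edge-house-price")
pipeline.clear()                      # remove the previous model step (method name assumed)
pipeline.add_model_step(new_version)
publish = pipeline.publish()
print(publish)
```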

Inference on Any Hardware

How to run the Wallaroo Inference Server on diverse hardware architectures and their associated acceleration libraries.
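
For example, hardware targeting is typically expressed when the model is uploaded and when the deployment configuration is built. The sketch below assumes an `Architecture` option on upload and a `DeploymentConfigBuilder` with CPU, memory, and GPU settings; those import paths, parameter names, and builder methods are assumptions for illustration, so confirm them against the hardware acceleration guide for your SDK version.

```python
import wallaroo
# Assumed import paths and option names; confirm against the hardware
# acceleration guide for your SDK version.
from wallaroo.deployment_config import DeploymentConfigBuilder
from wallaroo.engine_config import Architecture
from wallaroo.framework import Framework

wl = wallaroo.Client()

# Upload a model targeted at ARM processors (for example, a Jetson device).
model = wl.upload_model("resnet50-edge", "./models/resnet50.onnx",
                        framework=Framework.ONNX,
                        arch=Architecture.ARM)

# Build a deployment configuration that requests a GPU alongside CPU and
# memory resources (builder method names are assumptions for illustration).
deployment_config = (
    DeploymentConfigBuilder()
    .cpus(1)
    .memory("2Gi")
    .gpus(1)
    .build()
)

pipeline = wl.build_pipeline("edge-resnet50")
pipeline.add_model_step(model)
pipeline.deploy(deployment_config=deployment_config)
```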