Inference on Any Hardware

How to run the Wallaroo Inference Server on diverse hardware architectures and their associated acceleration libraries.

Wallaroo provides the ability to deploy models and perform inferences on them in any environment (edge or multicloud), on any hardware. Inferences in these environments are observed for drift detection, deployed models are updated when new versions or entirely new sets of models are created, and deployments run with or without GPUs.

The following hardware architectures and AI accelerators are supported.

| Accelerator | ARM Support | X64/X86 Support | Description |
|-------------|-------------|-----------------|-------------|
| None | X | X | The default acceleration, used for all scenarios and architectures. |
| AIO | X | | AIO acceleration for Ampere Optimized trained models; only available with ARM processors. |
| Jetson | X | | Nvidia Jetson acceleration, used with edge deployments with ARM processors. |
| CUDA | X | X | Nvidia CUDA acceleration, supported by both ARM and X64/X86 processors. Intended for deployments with GPUs. |
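
In the Wallaroo SDK, these options correspond to the `Acceleration` enum, set when a model is uploaded. A minimal sketch of the mapping, assuming the enum lives in `wallaroo.engine_config` as in recent SDK versions:

```python
from wallaroo.engine_config import Acceleration

# Members of the Acceleration enum and the table rows they correspond to
# (Acceleration._None is the member name, since `None` is reserved in Python):
Acceleration._None   # default acceleration, any architecture
Acceleration.AIO     # Ampere Optimized trained models, ARM processors only
Acceleration.Jetson  # Nvidia Jetson, edge deployments with ARM processors
Acceleration.CUDA    # Nvidia CUDA, ARM and X64/X86, deployments with GPUs
```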

The following guides describe how to:

  • Publish a model for deployment on edge and multicloud environments (a publish sketch follows this list).
  • Deploy and perform inferences on edge and multicloud environments.
  • Use observability features to track model drift and model performance from Wallaroo Ops and through the edge deployments.
  • Use inference acceleration to deploy models on different architectures and AI accelerators.
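
As a minimal sketch of the publish step, assuming a connected Wallaroo SDK client and placeholder model and pipeline names:

```python
import wallaroo
from wallaroo.framework import Framework

# Connect to the Wallaroo Ops instance.
wl = wallaroo.Client()

# Upload the model; the name and ONNX file path are placeholders.
model = wl.upload_model(
    "edge-demo-model",
    "./models/model.onnx",
    framework=Framework.ONNX,
)

# Build a one-step pipeline around the model.
pipeline = wl.build_pipeline("edge-demo-pipeline")
pipeline.add_model_step(model)

# Publish the pipeline to the OCI registry configured for Wallaroo Ops.
# The returned publish object contains the image and chart references
# used to deploy the pipeline on edge and multicloud environments.
pub = pipeline.publish()
print(pub)
```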

Inference on ARM Architecture

How to deploy ML models with ARM processors and infrastructure.
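
A minimal sketch of targeting ARM at model upload, assuming the `Architecture` enum from `wallaroo.engine_config` and placeholder names:

```python
import wallaroo
from wallaroo.framework import Framework
from wallaroo.engine_config import Architecture

wl = wallaroo.Client()

# arch=Architecture.ARM packages the model for ARM processors; the
# model name and file path are placeholders for this sketch.
model = wl.upload_model(
    "arm-demo-model",
    "./models/model.onnx",
    framework=Framework.ONNX,
    arch=Architecture.ARM,
)
```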

Inference with GPUs

How to package models to run on GPUs.
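
A minimal sketch of a GPU deployment configuration, assuming the SDK's `DeploymentConfigBuilder`; the names and the deployment label (which must match a GPU-labeled node pool in the cluster) are placeholders:

```python
import wallaroo
from wallaroo.framework import Framework

wl = wallaroo.Client()

# Placeholder model and pipeline names for this sketch.
model = wl.upload_model("gpu-demo-model", "./models/model.onnx",
                        framework=Framework.ONNX)
pipeline = wl.build_pipeline("gpu-demo-pipeline")
pipeline.add_model_step(model)

# Request one GPU per replica. Deployments that request GPUs also require
# a deployment label that routes replicas to GPU nodes; the value below
# is a placeholder for this sketch.
deployment_config = (
    wallaroo.DeploymentConfigBuilder()
    .replica_count(1)
    .cpus(1)
    .memory("1Gi")
    .gpus(1)
    .deployment_label("wallaroo.ai/accelerator: a100")
    .build()
)

pipeline.deploy(deployment_config=deployment_config)
```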

Inference with Acceleration Libraries

How to package models to run with hardware accelerators.
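
A minimal sketch of selecting an accelerator at model upload, assuming the `Acceleration` enum from `wallaroo.engine_config` and placeholder names:

```python
import wallaroo
from wallaroo.framework import Framework
from wallaroo.engine_config import Acceleration

wl = wallaroo.Client()

# accel selects one of the accelerators from the table above (CUDA here);
# the model name and file path are placeholders for this sketch.
model = wl.upload_model(
    "accel-demo-model",
    "./models/model.onnx",
    framework=Framework.ONNX,
    accel=Acceleration.CUDA,
)
```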