Wallaroo SDK Essentials Guide: Model Uploads and Registrations: ONNX
How to upload and use ONNX ML Models with Wallaroo
ML models are uploaded to Wallaroo Ops through the wallaroo.client.upload_model method, which has the following parameters.
Parameter | Type | Description |
---|---|---|
name | string (Required) | The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model. |
path | string (Required) | The path to the model file being uploaded. |
framework | string (Required) | The framework of the model from wallaroo.framework. |
input_schema | pyarrow.lib.Schema | The input schema in Apache Arrow schema format. Required for models deployed in the Wallaroo Containerized Runtime. |
output_schema | pyarrow.lib.Schema | The output schema in Apache Arrow schema format. Required for models deployed in the Wallaroo Containerized Runtime. |
convert_wait | bool (Optional) | When True (the default), the upload waits for the model conversion to complete before proceeding; when False, the script continues while the conversion runs in the background. |
arch | wallaroo.engine_config.Architecture (Optional) | The architecture the model is deployed to. If a model is intended for deployment to an ARM architecture, it must be specified during this step. Values include X86 (the default) and ARM. |
accel | wallaroo.engine_config.Acceleration (Optional) | The AI hardware accelerator used. If a model is intended for use with a hardware accelerator, it should be assigned at this step. |
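The following is a minimal sketch of an upload with the optional parameters set. The model name, file path, and schema fields here are placeholders for illustration, not values from a specific example.
import wallaroo
import pyarrow as pa
from wallaroo.framework import Framework

wl = wallaroo.Client()

# hypothetical input and output schemas for illustration
input_schema = pa.schema([pa.field('input', pa.list_(pa.float32(), list_size=10))])
output_schema = pa.schema([pa.field('output', pa.list_(pa.float32(), list_size=1))])

model = wl.upload_model(
    name='sample-model',                 # placeholder name
    path='./models/sample_model.onnx',   # placeholder path
    framework=Framework.ONNX,
    input_schema=input_schema,
    output_schema=output_schema,
    convert_wait=True
)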
Deployment configurations and pipeline publishes inherit the model's architecture setting. This is set during model upload by specifying the arch parameter. By default, models uploaded to Wallaroo use the x86 architecture.
The following example shows uploading a model with the architecture set to ARM, and how the deployment inherits that architecture without additional deployment configuration changes. For this example, an ONNX model is uploaded.
import wallaroo
from wallaroo.framework import Framework

wl = wallaroo.Client()
model_name_arm = 'house-price-estimator-arm'
model_file_name = './models/rf_model.onnx'

# upload the ONNX model with the ARM architecture target
housing_model_control_arm = (wl.upload_model(model_name_arm,
                                             model_file_name,
                                             framework=Framework.ONNX,
                                             arch=wallaroo.engine_config.Architecture.ARM)
                             .configure(tensor_fields=["tensor"])
                             )
display(housing_model_control_arm)
Name | house-price-estimator-arm |
---|---|
Version | 163ff0a9-0f1a-4229-bbf2-a19e4385f10f |
File Name | rf_model.onnx |
SHA | e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6 |
Status | ready |
Image Path | None |
Architecture | arm |
Acceleration | None |
Updated At | 2024-04-Mar 20:34:00 |
Note that in the deployment configuration settings, no architecture is specified. When pipeline_arm is displayed, we see that its arch setting inherited the model's arch setting.
# create the pipeline and set the model step with the ARM targeted model
arm_pipeline_name = 'architecture-demonstration-arm'
pipeline_arm = wl.build_pipeline(arm_pipeline_name)
pipeline_arm.add_model_step(housing_model_control_arm)

# minimum deployment configuration for this model - no architecture specified
deploy_config = wallaroo.DeploymentConfigBuilder().replica_count(1).cpus(1).memory("1Gi").build()
pipeline_arm.deploy(deployment_config=deploy_config)
Waiting for deployment - this will take up to 45s .......... ok
display(pipeline_arm)
name | architecture-demonstration-arm |
---|---|
created | 2024-03-04 20:34:08.895396+00:00 |
last_updated | 2024-03-04 21:52:01.894671+00:00 |
deployed | True |
arch | arm |
accel | None |
tags | |
versions | 55d834b4-92c8-4a93-b78b-6a224f17f9c1, 98821b85-401a-4ab5-af8e-1b3126727069, 74571863-9eb0-47aa-8b5a-3bdaa7aa9f03, b72fb0db-e4b4-4936-a7cb-3d0fb7827a6f, 3ae70818-10f3-4f61-a998-dee5e2f00daf |
steps | house-price-estimator-arm |
published | True |
Models deployed to Wallaroo and edge deployments include AI hardware accelerator support. The type of accelerator is set at upload through the accel parameter of wallaroo.client.upload_model (accel: wallaroo.engine_config.Acceleration | None).
Once uploaded, model deployment configurations for deployments and publishes inherit the model’s accelerator.
The following accelerators are supported.
Accelerator | ARM Support | X64/X86 Support | Intel GPU | Nvidia GPU | Description |
---|---|---|---|---|---|
None | N/A | N/A | N/A | N/A | The default acceleration, used for all scenarios and architectures. |
AIO | √ | X | X | X | AIO acceleration for Ampere Optimized trained models, only available with ARM processors. |
Jetson | √ | X | X | √ | Nvidia Jetson acceleration used with edge deployments with ARM processors. |
CUDA | √ | √ | X | √ | Nvidia Cuda acceleration supported by both ARM and X64/X86 processors. Intended for deployment with Nvidia GPUs. |
OpenVINO | X | √ | √ | X | Intel OpenVINO acceleration. AI accelerator from Intel compatible with x86/64 architectures. Aimed at edge and multi-cloud deployments either with or without Intel GPUs. |
Model deployments and pipeline publishes inherit the model's accelerator setting.
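The following is a minimal sketch of assigning an accelerator at upload, assuming an ONNX model intended for Nvidia GPUs; the model name and path are placeholders.
import wallaroo
from wallaroo.framework import Framework

wl = wallaroo.Client()

# hypothetical upload targeting Nvidia CUDA acceleration; deployments
# and publishes inherit this setting without further configuration
model_cuda = wl.upload_model(
    name='sample-model-cuda',            # placeholder name
    path='./models/sample_model.onnx',   # placeholder path
    framework=Framework.ONNX,
    accel=wallaroo.engine_config.Acceleration.CUDA
)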
wallaroo.client.upload_model returns the model version. The model version refers to the version of the model object in Wallaroo. In Wallaroo, a model version update happens when we upload a new model file (artifact) against the same model object name.
Field | Type | Description |
---|---|---|
id | Integer | The numerical identifier of the model version. |
name | String | The name of the model. |
version | String | The model version as a unique UUID. |
file_name | String | The file name of the model as stored in Wallaroo. |
image_path | String | The image used to deploy the model in the Wallaroo engine. |
last_update_time | DateTime | When the model was last updated. |
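As a brief sketch, these fields are typically read through the returned model version's accessor methods; the model variable here assumes a prior upload_model call.
# assuming `model` holds the ModelVersion returned by upload_model
print(model.name())     # the model name
print(model.version())  # the version UUID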
The following examples demonstrate uploading different model types. Models uploaded to Wallaroo fall under two runtimes:

- Wallaroo Native Runtimes: Model frameworks that are always deployed in the Wallaroo Native Runtime. When these model frameworks are uploaded to Wallaroo, the model name, file path, and model framework are required.
- Wallaroo Containerized Runtimes: Model frameworks that may be deployed in either the Wallaroo Native Runtime or the Wallaroo Containerized Runtime. When these models are uploaded to Wallaroo, the model name, file path, model framework, and input and output schemas are required.

When uploaded, Wallaroo attempts to convert these models to a Wallaroo Native Runtime. If a model cannot be converted, it is packaged into a Wallaroo Containerized Runtime.
The following demonstrates uploading an ONNX model to a Wallaroo Ops instance. See Wallaroo SDK Essentials Guide: Model Uploads and Registrations: ONNX for full details on uploading ONNX models and model configurations.
ONNX models are deployed in the Wallaroo Native Runtime and require the following fields when uploaded via the wallaroo.client.Client.upload_model
method:
Parameter | Type | Description |
---|---|---|
name | string (Required) | The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model. |
path | string (Required) | The path to the model file being uploaded. |
framework | string (Required) | The framework of the model from wallaroo.framework. |
The following example demonstrates uploading the ONNX file via the Wallaroo SDK.
import wallaroo

wl = wallaroo.Client()

model = wl.upload_model(
    name = 'sample-model',
    path = './models/sample_model.onnx',
    framework = wallaroo.framework.Framework.ONNX
)

# add the model as a pipeline step, then deploy with a minimal
# configuration (the pipeline name is a placeholder)
pipeline = wl.build_pipeline('sample-pipeline')
pipeline.add_model_step(model)
deploy_config = wallaroo.DeploymentConfigBuilder()\
    .replica_count(1)\
    .cpus(0.5)\
    .memory("1Gi")\
    .build()
pipeline.deploy(deployment_config=deploy_config)
import pandas as pd

# single-row smoke test to verify the deployment
smoke_test = pd.DataFrame.from_records([
{
"dense_input":[
1.0678324729,
0.2177810266,
-1.7115145262,
0.682285721,
1.0138553067,
-0.4335000013,
0.7395859437,
-0.2882839595,
-0.447262688,
0.5146124988,
0.3791316964,
0.5190619748,
-0.4904593222,
1.1656456469,
-0.9776307444,
-0.6322198963,
-0.6891477694,
0.1783317857,
0.1397992467,
-0.3554220649,
0.4394217877,
1.4588397512,
-0.3886829615,
0.4353492889,
1.7420053483,
-0.4434654615,
-0.1515747891,
-0.2668451725,
-1.4549617756
]
}
])
result = pipeline.infer(smoke_test)
display(result)
 | time | in.dense_input | out.dense_1 | anomaly.count |
---|---|---|---|---|
0 | 2023-10-17 16:13:56.169 | [1.0678324729, 0.2177810266, -1.7115145262, 0.682285721, 1.0138553067, -0.4335000013, 0.7395859437, -0.2882839595, -0.447262688, 0.5146124988, 0.3791316964, 0.5190619748, -0.4904593222, 1.1656456469, -0.9776307444, -0.6322198963, -0.6891477694, 0.1783317857, 0.1397992467, -0.3554220649, 0.4394217877, 1.4588397512, -0.3886829615, 0.4353492889, 1.7420053483, -0.4434654615, -0.1515747891, -0.2668451725, -1.4549617756] | [0.0014974177] | 0 |
Models that may require containerization before deployment require the following parameters when uploaded via the Wallaroo SDK method wallaroo.client.Client.upload_model.
Parameter | Type | Description |
---|---|---|
name | string (Required) | The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model. |
path | string (Required) | The path to the model file being uploaded. |
framework | string (Required) | The framework of the model from wallaroo.framework. |
input_schema | pyarrow.lib.Schema (Required) | The input schema in Apache Arrow schema format. |
output_schema | pyarrow.lib.Schema (Required) | The output schema in Apache Arrow schema format. |
convert_wait | bool (Optional) | When True (the default), the upload waits for the model conversion to complete before proceeding; when False, the script continues while the conversion runs in the background. |
The following demonstrates uploading a PyTorch model to a Wallaroo Ops instance. In this example, the ML model is converted to the Wallaroo Native Runtime.
import pyarrow as pa
from wallaroo.framework import Framework

input_schema = pa.schema(
[
pa.field('input', pa.list_(pa.float32(), list_size=10))
]
)
output_schema = pa.schema(
[
pa.field('output', pa.list_(pa.float32(), list_size=1))
]
)
model = wl.upload_model('pt-single-io-model',
"./models/model-auto-conversion_pytorch_single_io_model.pt",
framework=Framework.PYTORCH,
input_schema=input_schema,
output_schema=output_schema
)
display(model)
Waiting for model loading - this will take up to 10.0min.
Model is pending loading to a native runtime..
Ready
model.config().runtime()
'onnx'
The following example demonstrates uploading a BYOP (Bring Your Own Predict) model. After it is uploaded, it is converted to a Wallaroo Containerized Runtime.
input_schema = pa.schema([
pa.field('images', pa.list_(
pa.list_(
pa.list_(
pa.int64(),
list_size=3
),
list_size=32
),
list_size=32
)),
])
output_schema = pa.schema([
pa.field('predictions', pa.int64()),
])
model = wl.upload_model('vgg16-clustering',
'./models/model-auto-conversion-BYOP-vgg16-clustering.zip',
framework=Framework.CUSTOM,
input_schema=input_schema,
output_schema=output_schema,
convert_wait=True)
Waiting for model loading - this will take up to 10.0min.
Model is pending loading to a container runtime..
Model is attempting loading to a container runtime..........................successful
Ready
model.config().runtime()
'flight'
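Note the runtime values in the two examples above: onnx indicates the model runs in the Wallaroo Native Runtime, while flight indicates the model was packaged into the Wallaroo Containerized Runtime.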
For models that do not fall under the supported model frameworks, organizations can use containerized MLFlow ML models.
Parameter | Description |
---|---|
Web Site | https://mlflow.org |
Supported Libraries | mlflow==1.30.0 |
This guide details how to add ML models from a model registry service into Wallaroo.
Wallaroo supports both public and private containerized model registries. See the Wallaroo Private Containerized Model Container Registry Guide for details on how to configure a Wallaroo instance with a private model registry.
Wallaroo users can register their trained MLFlow ML models from a containerized model registry into their Wallaroo instance and perform inferences with them through a Wallaroo pipeline.
As of this time, Wallaroo only supports MLFlow 1.30.0 containerized models. For information on how to containerize an MLFlow model, see the MLFlow Documentation.
Containerized MLFlow models are not uploaded, but registered from a container registry service. This is performed through the wallaroo.client.register_model_image(options) and wallaroo.model_version.configure(options) methods.
The following parameters must be set for wallaroo.client.register_model_image(options) and wallaroo.model_version.configure(options) for a containerized MLFlow model to be registered in Wallaroo.
Parameter | Type | Description |
---|---|---|
model_name | string (Required) | The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model. |
image | string (Required) | The URL to the containerized MLFlow model in the MLFlow Registry. |
Model version configurations are updated with the wallaroo.model_version.configure method and include the following parameters.
Parameter | Type | Description |
---|---|---|
tensor_fields | (List[string]) (Optional) | A list of alternate input fields. For example, if the model accepts the input fields ['variable1', 'variable2'] , tensor_fields allows those inputs to be overridden to ['square_feet', 'house_age'] , or other values as required. These only apply to ONNX models. |
batch_config | (List[string]) (Optional) | Batch config is either None for multiple-input inferences, or single to accept an inference request with only one row of data. |
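As an illustrative sketch, a model version configuration update with these parameters might look like the following. The field names are hypothetical, and tensor_fields applies only to ONNX models.
# override the default input field names for an ONNX model version
model = model.configure(tensor_fields=["square_feet", "house_age"])

# restrict inference requests to a single row of data
model = model.configure(batch_config="single")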
For the model version configuration of MLFlow models, the following must be defined:

- runtime: Set as mlflow.
- input_schema: The input schema in the Apache Arrow pyarrow.lib.Schema format.
- output_schema: The output schema in the Apache Arrow pyarrow.lib.Schema format.

wallaroo.client.register_model_image(options) returns the model version. The model version refers to the version of the model object in Wallaroo. In Wallaroo, a model version update happens when we upload a new model file (artifact) against the same model object name.
Field | Type | Description |
---|---|---|
id | Integer | The numerical identifier of the model version. |
name | String | The name of the model. |
version | String | The model version as a unique UUID. |
file_name | String | The file name of the model as stored in Wallaroo. |
image_path | String | The image used to deploy the model in the Wallaroo engine. |
last_update_time | DateTime | When the model was last updated. |
The following example demonstrates registering a Statsmodels model stored in an MLFlow container with a Wallaroo instance.
import pyarrow as pa

sm_input_schema = pa.schema([
pa.field('temp', pa.float32()),
pa.field('holiday', pa.uint8()),
pa.field('workingday', pa.uint8()),
pa.field('windspeed', pa.float32())
])
sm_output_schema = pa.schema([
pa.field('predicted_mean', pa.float32())
])
sm_model = wl.register_model_image(
name="mlflow-statmodels",
image="ghcr.io/wallaroolabs/wallaroo_tutorials/mlflow-statsmodels-example:2023.1"
).configure("mlflow",
input_schema=sm_input_schema,
output_schema=sm_output_schema
)
sm_model
Name | mlflowstatmodels |
---|---|
Version | eb1bcec8-63fe-4a82-98ea-fc4945786973 |
File Name | none |
SHA | 3afd13d9c5070679e284050cd099e84aa2e5cb7c08a788b21d6cb2397615d018 |
Status | ready |
Image Path | ghcr.io/wallaroolabs/wallaroo_tutorials/mlflow-statsmodels-example:2023.1 |
Architecture | None |
Updated At | 2024-30-Jan 16:11:55 |
When using containerized MLFlow models with Wallaroo, the inputs and outputs must be named. For example, the following output:
[-12.045839810372835]
would need to be wrapped with the data values named:
[{"prediction": -12.045839810372835}]
A short sample for wrapping the data, shown here as a hypothetical helper function, may be:
import pandas as pd

def wrap_prediction(prediction):
    # name the output column so Wallaroo can map the values
    output_df = pd.DataFrame(prediction, columns=["prediction"])
    return output_df
The model version configuration defines how the model is used in the Wallaroo Inference Engine. Settings include the model runtime, tensor field overrides, and batch configuration.
The model version configuration is retrieved with the method wallaroo.model_version.ModelVersion.config(). The method takes no parameters and returns a wallaroo.model_config.ModelConfig object. The following methods are part of the model config object.
Method | Return Type | Description |
---|---|---|
id() | Integer | The id of the model version the configuration is assigned to. |
to_yaml() | String | A YAML output of the model configuration options that are not None. |
tensor_fields() | List[String] | A list of tensor field names that override the default model fields. Only applies to onnx models. |
model_version() | wallaroo.model_version.ModelVersion | The model version the model configuration is assigned to. |
runtime() | String | The model runtime as defined by wallaroo.framework.Framework. |
The following example retrieves the model runtime from a model version.
import wallaroo
# get the most recent model version
model_config = sample_model.versions()[-1].config()
print(model_config.runtime())
onnx
The method wallaroo.client.Client.generate_upload_model_api_command generates a curl script for uploading models to Wallaroo via the Wallaroo MLOps API. The generated curl script is based on the Wallaroo SDK user's current workspace. This is useful for environments that do not have the Wallaroo SDK installed, or for uploading very large models (10 gigabytes or more).
The command assumes that other upload parameters are set to default. For details on uploading models via the Wallaroo MLOps API, see Wallaroo MLOps API Essentials Guide: Model Upload and Registrations.
This method takes the following parameters:
Parameter | Type | Description |
---|---|---|
base_url | String (Required) | The Wallaroo domain name. For example: wallaroo.example.com . |
name | String (Required) | The name to assign the model at upload. This must match DNS naming conventions. |
path | String (Required) | Path to the ML or LLM model file. |
framework | String (Required) | The framework from wallaroo.framework.Framework. For a complete list, see Wallaroo Supported Models. |
input_schema | String (Required) | The model’s input schema in PyArrow.Schema format. |
output_schema | String (Required) | The model’s output schema in PyArrow.Schema format. |
This outputs a curl command in the following format (indentation added for readability). The sections marked with {} represent the variable names that are injected into the script from the above parameters or from the current SDK session:

- {Current Workspace['id']}: The value of the id for the current workspace.
- {Bearer Token}: The bearer token used to authenticate to the Wallaroo MLOps API.
: The bearer token used to authentication to the Wallaroo MLOps API.curl --progress-bar -X POST \
-H "Content-Type: multipart/form-data" \
-H "Authorization: Bearer {Bearer Token}"
-F "metadata={"name": {name}, "visibility": "private", "workspace_id": {Current Workspace['id']}, "conversion": {"arch": "x86", "accel": "none", "framework": "custom", "python_version": "3.8", "requirements": []}, \
"input_schema": "{base64 version of input_schema}", \
"output_schema": "base64 version of the output_schema"};type=application/json" \
-F "file=@{path};type=application/octet-stream" \
https://{base_url}/v1/api/models/upload_and_convert
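The base64 schema values in the metadata are the Apache Arrow schemas serialized and then base64-encoded. A minimal sketch of producing one, assuming pyarrow's schema serialization:
import base64
import pyarrow as pa

# hypothetical schema matching the example that follows
input_schema = pa.schema([pa.field("text", pa.string())])

# serialize the Arrow schema to bytes, then base64-encode it
encoded_input_schema = base64.b64encode(input_schema.serialize().to_pybytes()).decode("utf8")
print(encoded_input_schema)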
Once generated, users can use the script to upload the model via the Wallaroo MLOps API.
The following example shows setting the parameters above and generating the model upload API command.
import wallaroo
import pyarrow as pa
from wallaroo.framework import Framework

wl = wallaroo.Client()
# set the input and output schemas
input_schema = pa.schema([
pa.field("text", pa.string())
])
output_schema = pa.schema([
pa.field("generated_text", pa.string())
])
# use the generate model upload api command
wl.generate_upload_model_api_command(
base_url='https://example.wallaroo.ai/',
name='sample_model_name',
path='llama_byop.zip',
framework=Framework.CUSTOM,
input_schema=input_schema,
output_schema=output_schema)
The output of this command is:
curl --progress-bar -X POST -H "Content-Type: multipart/form-data" -H "Authorization: Bearer abc123" -F "metadata={"name": "sample_model_name", "visibility": "private", "workspace_id": 20, "conversion": {"arch": "x86", "accel": "none", "framework": "custom", "python_version": "3.8", "requirements": []}, "input_schema": "/////3AAAAAQAAAAAAAKAAwABgAFAAgACgAAAAABBAAMAAAACAAIAAAABAAIAAAABAAAAAEAAAAUAAAAEAAUAAgABgAHAAwAAAAQABAAAAAAAAEFEAAAABwAAAAEAAAAAAAAAAQAAAB0ZXh0AAAAAAQABAAEAAAA", "output_schema": "/////3gAAAAQAAAAAAAKAAwABgAFAAgACgAAAAABBAAMAAAACAAIAAAABAAIAAAABAAAAAEAAAAUAAAAEAAUAAgABgAHAAwAAAAQABAAAAAAAAEFEAAAACQAAAAEAAAAAAAAAA4AAABnZW5lcmF0ZWRfdGV4dAAABAAEAAQAAAA="};type=application/json" -F "file=@llama_byop.zip;type=application/octet-stream" https://example.wallaroo.ai/v1/api/models/upload_and_convert