Wallaroo SDK Essentials Guide: Model Uploads and Registrations

How to create and manage Wallaroo Models Uploads through the Wallaroo SDK

Table of Contents

Upload Model

ML Models are uploaded to Wallaroo Ops through the wallaroo.client.upload_model method.

Upload Model Parameters

wallaroo.client.upload_model has the following parameters.

Parameter | Type | Description
name | string (Required) | The name of the model. Model names are unique per workspace. Models uploaded with the same name are assigned as a new version of the model.
path | string (Required) | The path to the model file being uploaded.
framework | string (Required) | The framework of the model from wallaroo.framework.
input_schema | pyarrow.lib.Schema (Optional for Native Wallaroo Runtimes; Required for Containerized Wallaroo Runtimes) | The input schema in Apache Arrow schema format.
output_schema | pyarrow.lib.Schema (Optional for Native Wallaroo Runtimes; Required for Containerized Wallaroo Runtimes) | The output schema in Apache Arrow schema format.
convert_wait | bool (Optional) | True: waits in the script for the model conversion to complete. False: proceeds with the script without waiting for the model conversion process to complete.
arch | wallaroo.engine_config.Architecture (Optional) | The architecture the model is deployed to. If a model is intended for deployment to an ARM architecture, it must be specified during upload. Values: X86 (default) for x86-based architectures; ARM for ARM-based architectures.
accel | wallaroo.engine_config.Acceleration (Optional) | The AI hardware accelerator used. If a model is intended for use with a hardware accelerator, it should be assigned at this step. Values: _None (default): no accelerator assigned, works for all infrastructures; AIO: AIO acceleration for Ampere Optimized trained models, only available with ARM processors; Jetson: Nvidia Jetson acceleration used with edge deployments on ARM processors; CUDA: Nvidia CUDA acceleration supported by both ARM and X64/X86 processors, intended for deployment with Nvidia GPUs; OpenVINO: Intel OpenVINO acceleration, compatible with x86/64 architectures, aimed at edge and multi-cloud deployments either with or without Intel GPUs.
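
The following is a sketch that exercises every parameter above, assuming a hypothetical containerized-runtime model file; the model name, path, and schemas are illustrative placeholders:

import wallaroo
import pyarrow as pa
from wallaroo.framework import Framework

wl = wallaroo.Client()

# illustrative schemas; required when the framework targets a containerized runtime
input_schema = pa.schema([pa.field('input', pa.list_(pa.float32(), list_size=10))])
output_schema = pa.schema([pa.field('output', pa.list_(pa.float32(), list_size=1))])

model = wl.upload_model(
    name='sample-model',                            # unique per workspace
    path='./models/sample_model.zip',               # hypothetical model file
    framework=Framework.CUSTOM,
    input_schema=input_schema,
    output_schema=output_schema,
    convert_wait=True,                              # block until conversion completes
    arch=wallaroo.engine_config.Architecture.X86,
    accel=wallaroo.engine_config.Acceleration._None
)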

Model Architecture Inheritance

Deployment configurations inherit the model’s architecture setting, which is set during model upload via the arch parameter. Models uploaded to Wallaroo default to the x86 architecture.

Model deployments and pipeline publishes inherit the model’s architecture setting.

The following example shows uploading a model with the architecture set to ARM, and how the deployment inherits that architecture without additional deployment configuration changes. For this example, an ONNX model is uploaded.

import wallaroo
from wallaroo.framework import Framework

housing_model_control_arm = (wl.upload_model(model_name_arm,
                                             model_file_name,
                                             framework=Framework.ONNX,
                                             arch=wallaroo.engine_config.Architecture.ARM)
                               .configure(tensor_fields=["tensor"])
                             )

display(housing_model_control_arm)
Name | house-price-estimator-arm
Version | 163ff0a9-0f1a-4229-bbf2-a19e4385f10f
File Name | rf_model.onnx
SHA | e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6
Status | ready
Image Path | None
Architecture | arm
Acceleration | None
Updated At | 2024-04-Mar 20:34:00

Note that no architecture is specified in the deployment configuration settings. When pipeline_arm is displayed, we see that the arch setting inherited the model’s arch setting.

pipeline_arm = wl.build_pipeline(arm_pipeline_name)

# set the model step with the ARM targeted model
pipeline_arm.add_model_step(housing_model_control_arm)

# minimum deployment config for this model
deploy_config = wallaroo.DeploymentConfigBuilder().replica_count(1).cpus(1).memory("1Gi").build()

pipeline_arm.deploy(deployment_config = deploy_config)

    Waiting for deployment - this will take up to 45s .......... ok

display(pipeline_arm)
name | architecture-demonstration-arm
created | 2024-03-04 20:34:08.895396+00:00
last_updated | 2024-03-04 21:52:01.894671+00:00
deployed | True
arch | arm
accel | None
tags |
versions | 55d834b4-92c8-4a93-b78b-6a224f17f9c1, 98821b85-401a-4ab5-af8e-1b3126727069, 74571863-9eb0-47aa-8b5a-3bdaa7aa9f03, b72fb0db-e4b4-4936-a7cb-3d0fb7827a6f, 3ae70818-10f3-4f61-a998-dee5e2f00daf
steps | house-price-estimator-arm
published | True
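
Pipeline publishes inherit the model’s architecture the same way. A minimal sketch, assuming the standard pipeline publish workflow is configured for this Wallaroo instance:

# publish the pipeline; the published image targets the model's ARM architecture
pub = pipeline_arm.publish(deployment_config=deploy_config)
display(pub)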

Model Accelerator Inheritance

Models deployed to Wallaroo and edge deployments include AI hardware accelerator support. The type of accelerator is set during upload via the wallaroo.client.upload_model(accel: wallaroo.engine_config.Acceleration) parameter.

Once uploaded, model deployment configurations for deployments and publishes inherit the model’s accelerator.

The following accelerators are supported.

Accelerator | ARM Support | X64/X86 Support | Intel GPU | Nvidia GPU | Description
None | N/A | N/A | N/A | N/A | The default acceleration, used for all scenarios and architectures.
AIO | ✓ | X | X | X | AIO acceleration for Ampere Optimized trained models, only available with ARM processors.
Jetson | ✓ | X | X | ✓ | Nvidia Jetson acceleration used with edge deployments on ARM processors.
CUDA | ✓ | ✓ | X | ✓ | Nvidia CUDA acceleration supported by both ARM and X64/X86 processors. Intended for deployment with Nvidia GPUs.
OpenVINO | X | ✓ | ✓ | X | Intel OpenVINO acceleration, compatible with x86/64 architectures. Aimed at edge and multi-cloud deployments either with or without Intel GPUs.

Model deployments and pipeline publishes inherit the model’s accelerator setting.
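
For example, a sketch of uploading a model targeted at Nvidia GPUs; the model name and path are illustrative:

import wallaroo
from wallaroo.framework import Framework

model_gpu = wl.upload_model('sample-cuda-model',           # hypothetical name
                            './models/sample_model.onnx',  # hypothetical path
                            framework=Framework.ONNX,
                            accel=wallaroo.engine_config.Acceleration.CUDA)

# deployments and publishes of this model now inherit CUDA acceleration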

Upload Model Returns

wallaroo.client.upload_model returns the model version. The model version refers to the version of the model object in Wallaroo. In Wallaroo, a model version update happens when we upload a new model file (artifact) against the same model object name.

  • Note that models are uploaded to the current workspace assigned in the SDK session. By default, this is the user’s Default Workspace.
Field | Type | Description
id | Integer | The numerical identifier of the model version.
name | String | The name of the model.
version | String | The model version as a unique UUID.
file_name | String | The file name of the model as stored in Wallaroo.
image_path | String | The image used to deploy the model in the Wallaroo engine.
last_update_time | DateTime | When the model was last updated.
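
A short sketch of reading these fields from the returned model version, assuming the fields above are exposed as accessor methods on the returned object, as in the display output elsewhere in this guide:

import wallaroo
from wallaroo.framework import Framework

wl = wallaroo.Client()

model = wl.upload_model(name='sample-model',               # hypothetical model name
                        path='./models/sample_model.onnx', # hypothetical path
                        framework=Framework.ONNX)

print(model.name())     # 'sample-model'
print(model.version())  # the version UUID assigned at upload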

Upload Model Examples

The following examples demonstrate uploading different model types. Models uploaded to Wallaroo fall under two runtimes:

  • Wallaroo Native Runtimes: Certain model frameworks (such as ONNX) are always deployed in the Wallaroo Native Runtime. When these model frameworks are uploaded to Wallaroo, the model name, file path, and model framework are required.

  • Wallaroo Containerized Runtimes: Other model frameworks may be deployed in either the Wallaroo Native Runtime or the Wallaroo Containerized Runtime. When these models are uploaded to Wallaroo, the model name, file path, model framework, and input and output schemas are required.

    When uploaded, Wallaroo will attempt to convert the model to a Wallaroo Native Runtime. If it cannot be converted, it is packaged into a Wallaroo Containerized Runtime.

Native Runtime Upload

The following demonstrates uploading an ONNX model to a Wallaroo Ops instance. See Wallaroo SDK Essentials Guide: Model Uploads and Registrations: ONNX for full details on uploading ONNX models and model configurations.

ONNX models are deployed in the Wallaroo Native Runtime and require the following fields when uploaded via the wallaroo.client.Client.upload_model method:

Parameter | Type | Description
name | string (Required) | The name of the model. Model names are unique per workspace. Models uploaded with the same name are assigned as a new version of the model.
path | string (Required) | The path to the model file being uploaded.
framework | string (Required) | The framework of the model from wallaroo.framework.

The following example demonstrates uploading the ONNX file via the Wallaroo SDK.


import wallaroo
import pandas as pd

model = wl.upload_model(
            name = 'sample-model',
            path = './models/sample_model.onnx',
            framework = wallaroo.framework.Framework.ONNX
        )

pipeline.add_model_step(model)

deploy_config = wallaroo.DeploymentConfigBuilder()\
                .replica_count(1)\
                .cpus(0.5)\
                .memory("1Gi")\
                .build()
pipeline.deploy(deployment_config=deploy_config)

smoke_test = pd.DataFrame.from_records([
    {
        "dense_input":[
            1.0678324729,
            0.2177810266,
            -1.7115145262,
            0.682285721,
            1.0138553067,
            -0.4335000013,
            0.7395859437,
            -0.2882839595,
            -0.447262688,
            0.5146124988,
            0.3791316964,
            0.5190619748,
            -0.4904593222,
            1.1656456469,
            -0.9776307444,
            -0.6322198963,
            -0.6891477694,
            0.1783317857,
            0.1397992467,
            -0.3554220649,
            0.4394217877,
            1.4588397512,
            -0.3886829615,
            0.4353492889,
            1.7420053483,
            -0.4434654615,
            -0.1515747891,
            -0.2668451725,
            -1.4549617756
        ]
    }
])
result = pipeline.infer(smoke_test)
display(result)
  | time | in.dense_input | out.dense_1 | anomaly.count
0 | 2023-10-17 16:13:56.169 | [1.0678324729, 0.2177810266, -1.7115145262, 0.682285721, 1.0138553067, -0.4335000013, 0.7395859437, -0.2882839595, -0.447262688, 0.5146124988, 0.3791316964, 0.5190619748, -0.4904593222, 1.1656456469, -0.9776307444, -0.6322198963, -0.6891477694, 0.1783317857, 0.1397992467, -0.3554220649, 0.4394217877, 1.4588397512, -0.3886829615, 0.4353492889, 1.7420053483, -0.4434654615, -0.1515747891, -0.2668451725, -1.4549617756] | [0.0014974177] | 0
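
Once inference testing is complete, the pipeline can be undeployed to return its resources to the cluster:

pipeline.undeploy()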
Wallaroo Containerized Upload

Models that may require containerization before deployment in Wallaroo require the following parameters when uploaded via the Wallaroo SDK method wallaroo.client.Client.upload_model.

Parameter | Type | Description
name | string (Required) | The name of the model. Model names are unique per workspace. Models uploaded with the same name are assigned as a new version of the model.
path | string (Required) | The path to the model file being uploaded.
framework | string (Required) | The framework of the model from wallaroo.framework.
input_schema | pyarrow.lib.Schema (Required) | The input schema in Apache Arrow schema format.
output_schema | pyarrow.lib.Schema (Required) | The output schema in Apache Arrow schema format.
convert_wait | bool (Optional) | True: waits in the script for the model conversion to complete. False: proceeds with the script without waiting for the model conversion process to complete.

The following demonstrates uploading a PyTorch model to a Wallaroo Ops instance. In this example, the ML model is converted to the Wallaroo Native Runtime.

import pyarrow as pa
from wallaroo.framework import Framework

input_schema = pa.schema(
    [
        pa.field('input', pa.list_(pa.float32(), list_size=10))
    ]
)

output_schema = pa.schema(
    [
        pa.field('output', pa.list_(pa.float32(), list_size=1))
    ]
)

model = wl.upload_model('pt-single-io-model', 
                        "./models/model-auto-conversion_pytorch_single_io_model.pt", 
                        framework=Framework.PYTORCH, 
                        input_schema=input_schema, 
                        output_schema=output_schema
                       )
display(model)

Waiting for model loading - this will take up to 10.0min.
Model is pending loading to a native runtime..
Ready

model.config().runtime()

'onnx'

The following example demonstrates uploading a BYOP (Bring Your Own Predict) model. After it is uploaded, it is converted to a Wallaroo Containerized Runtime.

input_schema = pa.schema([
    pa.field('images', pa.list_(
        pa.list_(
            pa.list_(
                pa.int64(),
                list_size=3
            ),
            list_size=32
        ),
        list_size=32
    )),
])

output_schema = pa.schema([
    pa.field('predictions', pa.int64()),
])

model = wl.upload_model('vgg16-clustering', 
                        './models/model-auto-conversion-BYOP-vgg16-clustering.zip', 
                        framework=Framework.CUSTOM, 
                        input_schema=input_schema, 
                        output_schema=output_schema, 
                        convert_wait=True)

Waiting for model loading - this will take up to 10.0min.
Model is pending loading to a container runtime..
Model is attempting loading to a container runtime..........................successful

Ready

model.config().runtime()

'flight'
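
Containerized runtime models run in a separate container alongside the Wallaroo engine, so deployment configurations for them typically assign resources to that container as well. A minimal sketch, assuming the sidekick resource options of wallaroo.DeploymentConfigBuilder:

import wallaroo

# engine resources plus resources for the model's container ("sidekick")
deploy_config = (wallaroo.DeploymentConfigBuilder()
                 .replica_count(1)
                 .cpus(0.25)
                 .memory('1Gi')
                 .sidekick_cpus(model, 1)        # CPUs for the containerized model
                 .sidekick_memory(model, '2Gi')  # memory for the containerized model
                 .build())

# pipeline assumed created earlier with wl.build_pipeline(...)
pipeline.add_model_step(model)
pipeline.deploy(deployment_config=deploy_config)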

Register a Containerized MLFlow Model

Parameter | Description
Web Site | https://mlflow.org
Supported Libraries | mlflow==1.30.0

For models that do not fall under the supported model frameworks, organizations can use containerized MLFlow ML Models.

This guide details how to add ML Models from a model registry service into Wallaroo.

Wallaroo supports both public and private containerized model registries. See the Wallaroo Private Containerized Model Container Registry Guide for details on how to configure a Wallaroo instance with a private model registry.

Wallaroo users can register their trained MLFlow ML Models from a containerized model container registry into their Wallaroo instance and perform inferences with it through a Wallaroo pipeline.

As of this time, Wallaroo only supports MLFlow 1.30.0 containerized models. For information on how to containerize an MLFlow model, see the MLFlow Documentation.

Containerized MLFlow models are not uploaded, but registered from a container registry service. This is performed through the wallaroo.client.register_model_image(options) and wallaroo.model_version.configure(options) methods.

Register a Containerized MLFlow Model Parameters

The following parameters must be set for wallaroo.client.register_model_image(options) and wallaroo.model_version.configure(options) for a Containerized MLFlow model to be registered in Wallaroo.

Register Model Image Parameters
Parameter | Type | Description
name | string (Required) | The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model.
image | string (Required) | The URL to the containerized MLFlow model in the MLFlow Registry.
Model Version Configuration Parameters

Model version configurations are updated with the wallaroo.model_version.configure method and include the following parameters.

Parameter | Type | Description
tensor_fields | List[string] (Optional) | A list of alternate input fields. For example, if the model accepts the input fields ['variable1', 'variable2'], tensor_fields allows those inputs to be overridden to ['square_feet', 'house_age'], or other values as required. These only apply to ONNX models.
batch_config | List[string] (Optional) | Batch config is either None for multiple-input inferences, or single to accept an inference request with only one row of data.
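
For example, a sketch of overriding an ONNX model’s input field names, mirroring the configure call shown earlier in this guide; the field names are illustrative:

# override the model's default input fields with the inference DataFrame column names
model = model.configure(tensor_fields=['square_feet', 'house_age'])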

For model version configuration for MLFlow models, the following must be defined:

  • runtime: Set as mlflow.
  • input_schema: The input schema in Apache Arrow pyarrow.lib.Schema format.
  • output_schema: The output schema in Apache Arrow pyarrow.lib.Schema format.

Register a Containerized MLFlow Model Returns

wallaroo.client.register_model_image(options) returns the model version. The model version refers to the version of the model object in Wallaroo. In Wallaroo, a model version update happens when we upload a new model file (artifact) against the same model object name.

  • Note that models are uploaded to the current workspace assigned in the SDK session. By default, this is the user’s Default Workspace.
Field | Type | Description
id | Integer | The numerical identifier of the model version.
name | String | The name of the model.
version | String | The model version as a unique UUID.
file_name | String | The file name of the model as stored in Wallaroo.
image_path | String | The image used to deploy the model in the Wallaroo engine.
last_update_time | DateTime | When the model was last updated.

Register a Containerized MLFlow Model Example

The following example demonstrates registering a Statsmodel model stored in an MLFlow container with a Wallaroo instance.

import pyarrow as pa

sm_input_schema = pa.schema([
    pa.field('temp', pa.float32()),
    pa.field('holiday', pa.uint8()),
    pa.field('workingday', pa.uint8()),
    pa.field('windspeed', pa.float32())
])

sm_output_schema = pa.schema([
    pa.field('predicted_mean', pa.float32())
])

sm_model = wl.register_model_image(
    name="mlflow-statmodels",
    image="ghcr.io/wallaroolabs/wallaroo_tutorials/mlflow-statsmodels-example:2023.1"
    ).configure("mlflow", 
            input_schema=sm_input_schema, 
            output_schema=sm_output_schema
    )

sm_model
Name | mlflowstatmodels
Version | eb1bcec8-63fe-4a82-98ea-fc4945786973
File Name | none
SHA | 3afd13d9c5070679e284050cd099e84aa2e5cb7c08a788b21d6cb2397615d018
Status | ready
Image Path | ghcr.io/wallaroolabs/wallaroo_tutorials/mlflow-statsmodels-example:2023.1
Architecture | None
Updated At | 2024-30-Jan 16:11:55

MLFlow Data Formats

When using containerized MLFlow models with Wallaroo, the inputs and outputs must be named. For example, the following output:

[-12.045839810372835]

Would need to be wrapped with the data values named:

[{"prediction": -12.045839810372835}]

A short sample code for wrapping data may be:

import pandas as pd

def wrap_prediction(prediction):
    # name the output column so the values are returned as named fields
    output_df = pd.DataFrame(prediction, columns=["prediction"])
    return output_df
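
Called with the unwrapped output above, the hypothetical wrap_prediction helper returns a single-column DataFrame whose values are named:

wrapped = wrap_prediction([-12.045839810372835])
print(wrapped.to_dict(orient='records'))  # [{'prediction': -12.045839810372835}]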

Get Model Config

The model version’s configuration defines how the model is used in the Wallaroo Inference Engine. Settings include:

  • The runtime
  • Input and output schemas

The model version configuration is retrieved with the method wallaroo.model_version.ModelVersion.config().

Get Model Config Parameters

N/A

Get Model Config Returns

The method wallaroo.model_version.ModelVersion.config() returns wallaroo.model_config.ModelConfig. The following fields are part of the model config object.

Method | Return Type | Description
id() | Integer | The id of the model version the configuration is assigned to.
to_yaml() | String | A YAML output of the model configuration options that are not None.
tensor_fields() | List[String] | A list of tensor field names that override the default model fields. Only applies to ONNX models.
model_version() | wallaroo.model_version.ModelVersion | The model version the model configuration is assigned to.
runtime() | String | The model runtime as defined by wallaroo.framework.Framework.
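
For instance, a sketch of inspecting a retrieved configuration; the output depends on the model’s settings:

# assuming model_config is a wallaroo.model_config.ModelConfig instance
print(model_config.to_yaml())        # YAML of the configuration options that are not None
print(model_config.tensor_fields())  # e.g. ['tensor'] for an ONNX model with overridden fields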

Get Model Config Example

The following example retrieves the model runtime from a model version.

import wallaroo

# get the most recent model version
model_config = sample_model.versions()[-1].config()

print(model_config.runtime())

onnx

Wallaroo MLOps Upload Generate Command

The method wallaroo.client.Client.generate_upload_model_api_command generates a curl script for uploading models to Wallaroo via the Wallaroo MLOps API. The generated curl script is based on the Wallaroo SDK user’s current workspace. This is useful for environments that do not have the Wallaroo SDK installed, or uploading very large models (10 gigabytes or more).

The command assumes that other upload parameters are set to default. For details on uploading models via the Wallaroo MLOps API, see Wallaroo MLOps API Essentials Guide: Model Upload and Registrations.

This method takes the following parameters:

Parameter | Type | Description
base_url | String (Required) | The Wallaroo domain name. For example: wallaroo.example.com.
name | String (Required) | The name to assign the model at upload. This must match DNS naming conventions.
path | String (Required) | Path to the ML or LLM model file.
framework | String (Required) | The framework from wallaroo.framework.Framework. For a complete list, see Wallaroo Supported Models.
input_schema | String (Required) | The model’s input schema in PyArrow.Schema format.
output_schema | String (Required) | The model’s output schema in PyArrow.Schema format.

This outputs a curl command in the following format (indentation added for emphasis). The sections marked with {} represent variables that are injected into the script from the above parameters or from the current SDK session:

  • {Current Workspace['id']}: The value of the id for the current workspace.
  • {Bearer Token}: The bearer token used to authenticate to the Wallaroo MLOps API.
curl --progress-bar -X POST \
      -H "Content-Type: multipart/form-data" \
      -H "Authorization: Bearer {Bearer Token}" \
      -F "metadata={"name": {name}, "visibility": "private", "workspace_id": {Current Workspace['id']}, "conversion": {"arch": "x86", "accel": "none", "framework": "custom", "python_version": "3.8", "requirements": []}, \
      "input_schema": "{base64 version of input_schema}", \
      "output_schema": "{base64 version of output_schema}"};type=application/json" \
      -F "file=@{path};type=application/octet-stream" \
      https://{base_url}/v1/api/models/upload_and_convert
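
The base64 schema values are the serialized Apache Arrow schemas, base64 encoded. A sketch of producing one, assuming this matches the encoding the generated command embeds:

import base64
import pyarrow as pa

input_schema = pa.schema([pa.field('text', pa.string())])

# serialize the Arrow schema to bytes, then base64 encode it
encoded = base64.b64encode(input_schema.serialize().to_pybytes()).decode('utf-8')
print(encoded)  # the string injected as {base64 version of input_schema}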

Once generated, users can use the script to upload the model via the Wallaroo MLOps API.

The following example shows setting the parameters above and generating the model upload API command.

import wallaroo
import pyarrow as pa
from wallaroo.framework import Framework

# set the input and output schemas

input_schema = pa.schema([
    pa.field("text", pa.string())
])

output_schema = pa.schema([
    pa.field("generated_text", pa.string())
])

# use the generate model upload api command

wl.generate_upload_model_api_command(
    base_url='https://example.wallaroo.ai/',
    name='sample_model_name', 
    path='llama_byop.zip',
    framework=Framework.CUSTOM,
    input_schema=input_schema,
    output_schema=output_schema)

The output of this command is:

curl --progress-bar -X POST -H "Content-Type: multipart/form-data" -H "Authorization: Bearer abc123" -F "metadata={"name": "sample_model_name", "visibility": "private", "workspace_id": 20, "conversion": {"arch": "x86", "accel": "none", "framework": "custom", "python_version": "3.8", "requirements": []}, "input_schema": "/////3AAAAAQAAAAAAAKAAwABgAFAAgACgAAAAABBAAMAAAACAAIAAAABAAIAAAABAAAAAEAAAAUAAAAEAAUAAgABgAHAAwAAAAQABAAAAAAAAEFEAAAABwAAAAEAAAAAAAAAAQAAAB0ZXh0AAAAAAQABAAEAAAA", "output_schema": "/////3gAAAAQAAAAAAAKAAwABgAFAAgACgAAAAABBAAMAAAACAAIAAAABAAIAAAABAAAAAEAAAAUAAAAEAAUAAgABgAHAAwAAAAQABAAAAAAAAEFEAAAACQAAAAEAAAAAAAAAA4AAABnZW5lcmF0ZWRfdGV4dAAABAAEAAQAAAA="};type=application/json" -F "file=@llama_byop.zip;type=application/octet-stream" https://example.wallaroo.ai/v1/api/models/upload_and_convert

Wallaroo SDK Essentials Guide: Model Uploads and Registrations: ONNX

How to upload and use ONNX ML Models with Wallaroo

Wallaroo SDK Essentials Guide: Model Uploads and Registrations: Arbitrary Python

How to upload and use Arbitrary Python (BYOP) Models with Wallaroo

Wallaroo SDK Essentials Guide: Model Uploads and Registrations: Containerized MLFlow

How to upload and use Containerized MLFlow with Wallaroo

Wallaroo SDK Essentials Guide: Model Uploads and Registrations: Model Registry Services

How to upload and use Registry ML Models with Wallaroo

Wallaroo SDK Essentials Guide: Model Uploads and Registrations: Python Models

How to upload and use Python Models as Wallaroo Pipeline Steps

Wallaroo SDK Essentials Guide: Model Uploads and Registrations: PyTorch

How to upload and use PyTorch ML Models with Wallaroo

Wallaroo SDK Essentials Guide: Model Uploads and Registrations: SKLearn

How to upload and use SKLearn ML Models with Wallaroo

Wallaroo SDK Essentials Guide: Model Uploads and Registrations: Hugging Face

How to upload and use Hugging Face ML Models with Wallaroo

Wallaroo SDK Essentials Guide: Model Uploads and Registrations: TensorFlow

How to upload and use TensorFlow ML Models with Wallaroo

Wallaroo SDK Essentials Guide: Model Uploads and Registrations: TensorFlow Keras

How to upload and use TensorFlow Keras ML Models with Wallaroo

Wallaroo SDK Essentials Guide: Model Uploads and Registrations: XGBoost

How to upload and use XGBoost ML Models with Wallaroo