Wallaroo SDK Essentials Guide: Model Uploads and Registrations: ONNX
How to upload and use ONNX ML Models with Wallaroo
ML models are uploaded to Wallaroo Ops through the wallaroo.client.upload_model method, which has the following parameters.
Parameter | Type | Description |
---|---|---|
name | string (Required) | The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model. |
path | string (Required) | The path to the model file being uploaded. |
framework | string (Required) | The framework of the model from wallaroo.framework. |
input_schema | pyarrow.lib.Schema | The input schema in Apache Arrow schema format. Required for models deployed in the Wallaroo Containerized Runtime. |
output_schema | pyarrow.lib.Schema | The output schema in Apache Arrow schema format. Required for models deployed in the Wallaroo Containerized Runtime. |
convert_wait | bool (Optional) | When True (the default), the upload waits for the model conversion to complete before proceeding; when False, the script continues while the conversion runs in the background. |
arch | wallaroo.engine_config.Architecture (Optional) | The architecture the model is deployed to. If a model is intended for deployment to an ARM architecture, it must be specified during this step. Values include X86 (the default) and ARM. |
accel | wallaroo.engine_config.Acceleration (Optional) | The AI hardware accelerator used. If a model is intended for use with a hardware accelerator, it should be assigned at this step. |
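The following is a minimal sketch of an upload with the optional parameters set. The model name, file path, and schema fields here are placeholders for illustration, not values from a specific example.
import wallaroo
import pyarrow as pa
from wallaroo.framework import Framework

wl = wallaroo.Client()

# hypothetical input and output schemas for illustration
input_schema = pa.schema([pa.field('input', pa.list_(pa.float32(), list_size=10))])
output_schema = pa.schema([pa.field('output', pa.list_(pa.float32(), list_size=1))])

model = wl.upload_model(
    name='sample-model',                 # placeholder name
    path='./models/sample_model.onnx',   # placeholder path
    framework=Framework.ONNX,
    input_schema=input_schema,
    output_schema=output_schema,
    convert_wait=True
)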
Deployment configurations and pipeline publishes inherit the model's architecture setting. This is set during model upload by specifying the arch parameter. By default, models uploaded to Wallaroo use the x86 architecture.
The following example shows uploading a model with the architecture set to ARM, and how the deployment inherits that architecture without additional deployment configuration changes. For this example, an ONNX model is uploaded.
import wallaroo
from wallaroo.framework import Framework

wl = wallaroo.Client()
model_name_arm = 'house-price-estimator-arm'
model_file_name = './models/rf_model.onnx'

# upload the ONNX model with the ARM architecture target
housing_model_control_arm = (wl.upload_model(model_name_arm,
                                             model_file_name,
                                             framework=Framework.ONNX,
                                             arch=wallaroo.engine_config.Architecture.ARM)
                             .configure(tensor_fields=["tensor"])
                             )
display(housing_model_control_arm)
Name | house-price-estimator-arm |
---|---|
Version | 163ff0a9-0f1a-4229-bbf2-a19e4385f10f |
File Name | rf_model.onnx |
SHA | e22a0831aafd9917f3cc87a15ed267797f80e2afa12ad7d8810ca58f173b8cc6 |
Status | ready |
Image Path | None |
Architecture | arm |
Acceleration | None |
Updated At | 2024-04-Mar 20:34:00 |
Note that in the deployment configuration settings, no architecture is specified. When pipeline_arm is displayed, we see that its arch setting inherited the model's arch setting.
# create the pipeline and set the model step with the ARM targeted model
arm_pipeline_name = 'architecture-demonstration-arm'
pipeline_arm = wl.build_pipeline(arm_pipeline_name)
pipeline_arm.add_model_step(housing_model_control_arm)

# minimum deployment configuration for this model - no architecture specified
deploy_config = wallaroo.DeploymentConfigBuilder().replica_count(1).cpus(1).memory("1Gi").build()
pipeline_arm.deploy(deployment_config=deploy_config)
Waiting for deployment - this will take up to 45s .......... ok
display(pipeline_arm)
name | architecture-demonstration-arm |
---|---|
created | 2024-03-04 20:34:08.895396+00:00 |
last_updated | 2024-03-04 21:52:01.894671+00:00 |
deployed | True |
arch | arm |
accel | None |
tags | |
versions | 55d834b4-92c8-4a93-b78b-6a224f17f9c1, 98821b85-401a-4ab5-af8e-1b3126727069, 74571863-9eb0-47aa-8b5a-3bdaa7aa9f03, b72fb0db-e4b4-4936-a7cb-3d0fb7827a6f, 3ae70818-10f3-4f61-a998-dee5e2f00daf |
steps | house-price-estimator-arm |
published | True |
Models deployed to Wallaroo and edge deployments include AI hardware accelerator support. The type of accelerator is set at upload through the accel parameter of wallaroo.client.upload_model (accel: wallaroo.engine_config.Acceleration | None).
Once uploaded, model deployment configurations for deployments and publishes inherit the model’s accelerator.
The following accelerators are supported.
Accelerator | ARM Support | X64/X86 Support | Intel GPU | Nvidia GPU | Description |
---|---|---|---|---|---|
None | N/A | N/A | N/A | N/A | The default acceleration, used for all scenarios and architectures. |
AIO | √ | X | X | X | AIO acceleration for Ampere Optimized trained models, only available with ARM processors. |
Jetson | √ | X | X | √ | Nvidia Jetson acceleration used with edge deployments with ARM processors. |
CUDA | √ | √ | X | √ | Nvidia Cuda acceleration supported by both ARM and X64/X86 processors. Intended for deployment with Nvidia GPUs. |
OpenVINO | X | √ | √ | X | Intel OpenVINO acceleration. AI accelerator from Intel compatible with x86/64 architectures. Aimed at edge and multi-cloud deployments either with or without Intel GPUs. |
Model deployments and pipeline publishes inherit the model's accelerator setting.
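The following is a minimal sketch of assigning an accelerator at upload, assuming an ONNX model intended for Nvidia GPUs; the model name and path are placeholders.
import wallaroo
from wallaroo.framework import Framework

wl = wallaroo.Client()

# hypothetical upload targeting Nvidia CUDA acceleration; deployments
# and publishes inherit this setting without further configuration
model_cuda = wl.upload_model(
    name='sample-model-cuda',            # placeholder name
    path='./models/sample_model.onnx',   # placeholder path
    framework=Framework.ONNX,
    accel=wallaroo.engine_config.Acceleration.CUDA
)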
wallaroo.client.upload_model returns the model version. The model version refers to the version of the model object in Wallaroo. In Wallaroo, a model version update happens when we upload a new model file (artifact) against the same model object name.
Field | Type | Description |
---|---|---|
id | Integer | The numerical identifier of the model version. |
name | String | The name of the model. |
version | String | The model version as a unique UUID. |
file_name | String | The file name of the model as stored in Wallaroo. |
image_path | String | The image used to deploy the model in the Wallaroo engine. |
last_update_time | DateTime | When the model was last updated. |
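As a brief sketch, these fields are typically read through the returned model version's accessor methods; the model variable here assumes a prior upload_model call.
# assuming `model` holds the ModelVersion returned by upload_model
print(model.name())     # the model name
print(model.version())  # the version UUID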
The following examples demonstrate uploading different model types. Models uploaded to Wallaroo fall under two runtimes:

- Wallaroo Native Runtimes: Model frameworks that are always deployed in the Wallaroo Native Runtime. When these model frameworks are uploaded to Wallaroo, the model name, file path, and model framework are required.
- Wallaroo Containerized Runtimes: Model frameworks that may be deployed in either the Wallaroo Native Runtime or the Wallaroo Containerized Runtime. When these models are uploaded to Wallaroo, the model name, file path, model framework, and input and output schemas are required.

When uploaded, Wallaroo attempts to convert these models to a Wallaroo Native Runtime. If a model cannot be converted, it is packaged into a Wallaroo Containerized Runtime.
The following demonstrates uploading an ONNX model to a Wallaroo Ops instance. See Wallaroo SDK Essentials Guide: Model Uploads and Registrations: ONNX for full details on uploading ONNX models and model configurations.
ONNX models are deployed in the Wallaroo Native Runtime and require the following fields when uploaded via the wallaroo.client.Client.upload_model
method:
Parameter | Type | Description |
---|---|---|
name | string (Required) | The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model. |
path | string (Required) | The path to the model file being uploaded. |
framework | string (Required) | The framework of the model from wallaroo.framework. |
The following example demonstrates uploading the ONNX file via the Wallaroo SDK.
import wallaroo

wl = wallaroo.Client()

model = wl.upload_model(
    name = 'sample-model',
    path = './models/sample_model.onnx',
    framework = wallaroo.framework.Framework.ONNX
)

# add the model as a pipeline step, then deploy with a minimal
# configuration (the pipeline name is a placeholder)
pipeline = wl.build_pipeline('sample-pipeline')
pipeline.add_model_step(model)
deploy_config = wallaroo.DeploymentConfigBuilder()\
    .replica_count(1)\
    .cpus(0.5)\
    .memory("1Gi")\
    .build()
pipeline.deploy(deployment_config=deploy_config)
import pandas as pd

# single-row smoke test to verify the deployment
smoke_test = pd.DataFrame.from_records([
{
"dense_input":[
1.0678324729,
0.2177810266,
-1.7115145262,
0.682285721,
1.0138553067,
-0.4335000013,
0.7395859437,
-0.2882839595,
-0.447262688,
0.5146124988,
0.3791316964,
0.5190619748,
-0.4904593222,
1.1656456469,
-0.9776307444,
-0.6322198963,
-0.6891477694,
0.1783317857,
0.1397992467,
-0.3554220649,
0.4394217877,
1.4588397512,
-0.3886829615,
0.4353492889,
1.7420053483,
-0.4434654615,
-0.1515747891,
-0.2668451725,
-1.4549617756
]
}
])
result = pipeline.infer(smoke_test)
display(result)
 | time | in.dense_input | out.dense_1 | anomaly.count |
---|---|---|---|---|
0 | 2023-10-17 16:13:56.169 | [1.0678324729, 0.2177810266, -1.7115145262, 0.682285721, 1.0138553067, -0.4335000013, 0.7395859437, -0.2882839595, -0.447262688, 0.5146124988, 0.3791316964, 0.5190619748, -0.4904593222, 1.1656456469, -0.9776307444, -0.6322198963, -0.6891477694, 0.1783317857, 0.1397992467, -0.3554220649, 0.4394217877, 1.4588397512, -0.3886829615, 0.4353492889, 1.7420053483, -0.4434654615, -0.1515747891, -0.2668451725, -1.4549617756] | [0.0014974177] | 0 |
Models that may require containerization before deployment require the following parameters when uploaded via the Wallaroo SDK method wallaroo.client.Client.upload_model.
Parameter | Type | Description |
---|---|---|
name | string (Required) | The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model. |
path | string (Required) | The path to the model file being uploaded. |
framework | string (Required) | The framework of the model from wallaroo.framework. |
input_schema | pyarrow.lib.Schema (Required) | The input schema in Apache Arrow schema format. |
output_schema | pyarrow.lib.Schema (Required) | The output schema in Apache Arrow schema format. |
convert_wait | bool (Optional) | When True (the default), the upload waits for the model conversion to complete before proceeding; when False, the script continues while the conversion runs in the background. |
The following demonstrates uploading a PyTorch model to a Wallaroo Ops instance. In this example, the ML model is converted to the Wallaroo Native Runtime.
import pyarrow as pa
from wallaroo.framework import Framework

input_schema = pa.schema(
[
pa.field('input', pa.list_(pa.float32(), list_size=10))
]
)
output_schema = pa.schema(
[
pa.field('output', pa.list_(pa.float32(), list_size=1))
]
)
model = wl.upload_model('pt-single-io-model',
"./models/model-auto-conversion_pytorch_single_io_model.pt",
framework=Framework.PYTORCH,
input_schema=input_schema,
output_schema=output_schema
)
display(model)
Waiting for model loading - this will take up to 10.0min.
Model is pending loading to a native runtime..
Ready
model.config().runtime()
'onnx'
The following example demonstrates uploading a BYOP (Bring Your Own Predict) model. After it is uploaded, it is converted to a Wallaroo Containerized Runtime.
input_schema = pa.schema([
pa.field('images', pa.list_(
pa.list_(
pa.list_(
pa.int64(),
list_size=3
),
list_size=32
),
list_size=32
)),
])
output_schema = pa.schema([
pa.field('predictions', pa.int64()),
])
model = wl.upload_model('vgg16-clustering',
'./models/model-auto-conversion-BYOP-vgg16-clustering.zip',
framework=Framework.CUSTOM,
input_schema=input_schema,
output_schema=output_schema,
convert_wait=True)
Waiting for model loading - this will take up to 10.0min.
Model is pending loading to a container runtime..
Model is attempting loading to a container runtime..........................successful
Ready
model.config().runtime()
'flight'
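Note the runtime values in the two examples above: onnx indicates the model runs in the Wallaroo Native Runtime, while flight indicates the model was packaged into the Wallaroo Containerized Runtime.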
For models that do not fall under the supported model frameworks, organizations can use containerized MLFlow ML models.
Parameter | Description |
---|---|
Web Site | https://mlflow.org |
Supported Libraries | mlflow==1.30.0 |
This guide details how to add ML models from a model registry service into Wallaroo.
Wallaroo supports both public and private containerized model registries. See the Wallaroo Private Containerized Model Container Registry Guide for details on how to configure a Wallaroo instance with a private model registry.
Wallaroo users can register their trained MLFlow ML models from a containerized model registry into their Wallaroo instance and perform inferences with them through a Wallaroo pipeline.
As of this time, Wallaroo only supports MLFlow 1.30.0 containerized models. For information on how to containerize an MLFlow model, see the MLFlow Documentation.
Containerized MLFlow models are not uploaded, but registered from a container registry service. This is performed through the wallaroo.client.register_model_image(options) and wallaroo.model_version.configure(options) methods.
The following parameters must be set for wallaroo.client.register_model_image(options) and wallaroo.model_version.configure(options) for a containerized MLFlow model to be registered in Wallaroo.
Parameter | Type | Description |
---|---|---|
model_name | string (Required) | The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model. |
image | string (Required) | The URL to the containerized MLFlow model in the MLFlow Registry. |
Model version configurations are updated with the wallaroo.model_version.configure method and include the following parameters.
Parameter | Type | Description |
---|---|---|
tensor_fields | (List[string]) (Optional) | A list of alternate input fields. For example, if the model accepts the input fields ['variable1', 'variable2'] , tensor_fields allows those inputs to be overridden to ['square_feet', 'house_age'] , or other values as required. These only apply to ONNX models. |
batch_config | (List[string]) (Optional) | Batch config is either None for multiple-input inferences, or single to accept an inference request with only one row of data. |
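As an illustrative sketch, a model version configuration update with these parameters might look like the following. The field names are hypothetical, and tensor_fields applies only to ONNX models.
# override the default input field names for an ONNX model version
model = model.configure(tensor_fields=["square_feet", "house_age"])

# restrict inference requests to a single row of data
model = model.configure(batch_config="single")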
For the model version configuration of MLFlow models, the following must be defined:

- runtime: Set as mlflow.
- input_schema: The input schema in the Apache Arrow pyarrow.lib.Schema format.
- output_schema: The output schema in the Apache Arrow pyarrow.lib.Schema format.

wallaroo.client.register_model_image(options) returns the model version. The model version refers to the version of the model object in Wallaroo. In Wallaroo, a model version update happens when we upload a new model file (artifact) against the same model object name.
Field | Type | Description |
---|---|---|
id | Integer | The numerical identifier of the model version. |
name | String | The name of the model. |
version | String | The model version as a unique UUID. |
file_name | String | The file name of the model as stored in Wallaroo. |
image_path | String | The image used to deploy the model in the Wallaroo engine. |
last_update_time | DateTime | When the model was last updated. |
The following example demonstrates registering a Statsmodels model stored in an MLFlow container with a Wallaroo instance.
import pyarrow as pa

sm_input_schema = pa.schema([
pa.field('temp', pa.float32()),
pa.field('holiday', pa.uint8()),
pa.field('workingday', pa.uint8()),
pa.field('windspeed', pa.float32())
])
sm_output_schema = pa.schema([
pa.field('predicted_mean', pa.float32())
])
sm_model = wl.register_model_image(
name="mlflow-statmodels",
image="ghcr.io/wallaroolabs/wallaroo_tutorials/mlflow-statsmodels-example:2023.1"
).configure("mlflow",
input_schema=sm_input_schema,
output_schema=sm_output_schema
)
sm_model
Name | mlflowstatmodels |
---|---|
Version | eb1bcec8-63fe-4a82-98ea-fc4945786973 |
File Name | none |
SHA | 3afd13d9c5070679e284050cd099e84aa2e5cb7c08a788b21d6cb2397615d018 |
Status | ready |
Image Path | ghcr.io/wallaroolabs/wallaroo_tutorials/mlflow-statsmodels-example:2023.1 |
Architecture | None |
Updated At | 2024-30-Jan 16:11:55 |
When using containerized MLFlow models with Wallaroo, the inputs and outputs must be named. For example, the following output:
[-12.045839810372835]
would need to be wrapped with the data values named:
[{"prediction": -12.045839810372835}]
A short sample for wrapping the data, shown here as a hypothetical helper function, may be:
import pandas as pd

def wrap_prediction(prediction):
    # name the output column so Wallaroo can map the values
    output_df = pd.DataFrame(prediction, columns=["prediction"])
    return output_df
The model version configuration defines how the model is used in the Wallaroo Inference Engine. Settings include the model runtime, tensor field overrides, and batch configuration.
The model version configuration is retrieved with the method wallaroo.model_version.ModelVersion.config(). The method takes no parameters and returns a wallaroo.model_config.ModelConfig object. The following methods are part of the model config object.
Method | Return Type | Description |
---|---|---|
id() | Integer | The id of the model version the configuration is assigned to. |
to_yaml() | String | A YAML output of the model configuration options that are not None. |
tensor_fields() | List[String] | A list of tensor field names that override the default model fields. Only applies to onnx models. |
model_version() | wallaroo.model_version.ModelVersion | The model version the model configuration is assigned to. |
runtime() | String | The model runtime as defined by wallaroo.framework.Framework. |
The following example retrieves the model runtime from a model version.
import wallaroo
# get the most recent model version
model_config = sample_model.versions()[-1].config()
print(model_config.runtime())
onnx
The method wallaroo.client.Client.generate_upload_model_api_command generates a curl script for uploading models to Wallaroo via the Wallaroo MLOps API. The generated curl script is based on the Wallaroo SDK user's current workspace. This is useful for environments that do not have the Wallaroo SDK installed, or for uploading very large models (10 gigabytes or more).
The command assumes that other upload parameters are set to default. For details on uploading models via the Wallaroo MLOps API, see Wallaroo MLOps API Essentials Guide: Model Upload and Registrations.
This method takes the following parameters:
Parameter | Type | Description |
---|---|---|
base_url | String (Required) | The Wallaroo domain name. For example: wallaroo.example.com . |
name | String (Required) | The name to assign the model at upload. This must match DNS naming conventions. |
path | String (Required) | Path to the ML or LLM model file. |
framework | String (Required) | The framework from wallaroo.framework.Framework. For a complete list, see Wallaroo Supported Models. |
input_schema | String (Required) | The model’s input schema in PyArrow.Schema format. |
output_schema | String (Required) | The model’s output schema in PyArrow.Schema format. |
This outputs a curl command in the following format (indentation added for readability). The sections marked with {} represent the variable names that are injected into the script from the above parameters or from the current SDK session:

- {Current Workspace['id']}: The value of the id for the current workspace.
- {Bearer Token}: The bearer token used to authenticate to the Wallaroo MLOps API.
: The bearer token used to authentication to the Wallaroo MLOps API.curl --progress-bar -X POST \
-H "Content-Type: multipart/form-data" \
-H "Authorization: Bearer {Bearer Token}"
-F "metadata={"name": {name}, "visibility": "private", "workspace_id": {Current Workspace['id']}, "conversion": {"arch": "x86", "accel": "none", "framework": "custom", "python_version": "3.8", "requirements": []}, \
"input_schema": "{base64 version of input_schema}", \
"output_schema": "base64 version of the output_schema"};type=application/json" \
-F "file=@{path};type=application/octet-stream" \
https://{base_url}/v1/api/models/upload_and_convert
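The base64 schema values in the metadata are the Apache Arrow schemas serialized and then base64-encoded. A minimal sketch of producing one, assuming pyarrow's schema serialization:
import base64
import pyarrow as pa

# hypothetical schema matching the example that follows
input_schema = pa.schema([pa.field("text", pa.string())])

# serialize the Arrow schema to bytes, then base64-encode it
encoded_input_schema = base64.b64encode(input_schema.serialize().to_pybytes()).decode("utf8")
print(encoded_input_schema)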
Once generated, users can use the script to upload the model via the Wallaroo MLOps API.
The following example shows setting the parameters above and generating the model upload API command.
import wallaroo
import pyarrow as pa
from wallaroo.framework import Framework

wl = wallaroo.Client()
# set the input and output schemas
input_schema = pa.schema([
pa.field("text", pa.string())
])
output_schema = pa.schema([
pa.field("generated_text", pa.string())
])
# use the generate model upload api command
wl.generate_upload_model_api_command(
base_url='https://example.wallaroo.ai/',
name='sample_model_name',
path='llama_byop.zip',
framework=Framework.CUSTOM,
input_schema=input_schema,
output_schema=output_schema)
The output of this command is:
curl --progress-bar -X POST -H "Content-Type: multipart/form-data" -H "Authorization: Bearer abc123" -F "metadata={"name": "sample_model_name", "visibility": "private", "workspace_id": 20, "conversion": {"arch": "x86", "accel": "none", "framework": "custom", "python_version": "3.8", "requirements": []}, "input_schema": "/////3AAAAAQAAAAAAAKAAwABgAFAAgACgAAAAABBAAMAAAACAAIAAAABAAIAAAABAAAAAEAAAAUAAAAEAAUAAgABgAHAAwAAAAQABAAAAAAAAEFEAAAABwAAAAEAAAAAAAAAAQAAAB0ZXh0AAAAAAQABAAEAAAA", "output_schema": "/////3gAAAAQAAAAAAAKAAwABgAFAAgACgAAAAABBAAMAAAACAAIAAAABAAIAAAABAAAAAEAAAAUAAAAEAAUAAgABgAHAAwAAAAQABAAAAAAAAEFEAAAACQAAAAEAAAAAAAAAA4AAABnZW5lcmF0ZWRfdGV4dAAABAAEAAQAAAA="};type=application/json" -F "file=@llama_byop.zip;type=application/octet-stream" https://example.wallaroo.ai/v1/api/models/upload_and_convert