Wallaroo SDK Essentials Guide: Model Uploads and Registrations: ONNX
Table of Contents
Model Naming Requirements
Model names map onto Kubernetes objects, and must be DNS compliant. The strings for model names must be ASCII alpha-numeric characters or dash (-) only. .
and _
are not allowed.
Wallaroo natively supports Open Neural Network Exchange (ONNX) models into the Wallaroo engine.
Parameter | Description |
---|---|
Web Site | https://onnx.ai/ |
Supported Libraries | See table below. |
Framework | Framework.ONNX aka onnx |
Runtime | Native aka onnx |
The following ONNX versions models are supported:
Wallaroo Version | ONNX Version | ONNX IR Version | ONNX OPset Version | ONNX ML Opset Version |
---|---|---|---|---|
2023.2.1 (July 2023) | 1.12.1 | 8 | 17 | 3 |
2023.2 (May 2023) | 1.12.1 | 8 | 17 | 3 |
2023.1 (March 2023) | 1.12.1 | 8 | 17 | 3 |
2022.4 (December 2022) | 1.12.1 | 8 | 17 | 3 |
After April 2022 until release 2022.4 (December 2022) | 1.10.* | 7 | 15 | 2 |
Before April 2022 | 1.6.* | 7 | 13 | 2 |
For the most recent release of Wallaroo 2023.2.1, the following native runtimes are supported:
- If converting another ML Model to ONNX (PyTorch, XGBoost, etc) using the onnxconverter-common library, the supported
DEFAULT_OPSET_NUMBER
is 17.
Using different versions or settings outside of these specifications may result in inference issues and other unexpected behavior.
ONNX models always run in the native runtime space.
Data Schemas
ONNX models deployed to Wallaroo have the following data requirements.
- Equal rows constraint: The number of input rows and output rows must match.
- All inputs are tensors: The inputs are tensor arrays with the same shape.
- Data Type Consistency: Data types within each tensor are of the same type.
Equal Rows Constraint
Inference performed through ONNX models are assumed to be in batch format, where each input row corresponds to an output row. This is reflected in the in
fields returned for an inference. In the following example, each input row for an inference is related directly to the inference output.
df = pd.read_json('./data/cc_data_1k.df.json')
display(df.head())
result = ccfraud_pipeline.infer(df.head())
display(result)
INPUT
tensor | |
---|---|
0 | [-1.0603297501, 2.3544967095000002, -3.5638788326, 5.1387348926, -1.2308457019, -0.7687824608, -3.5881228109, 1.8880837663, -3.2789674274, -3.9563254554, 4.0993439118, -5.6539176395, -0.8775733373, -9.131571192000001, -0.6093537873, -3.7480276773, -5.0309125017, -0.8748149526000001, 1.9870535692, 0.7005485718000001, 0.9204422758, -0.1041491809, 0.3229564351, -0.7418141657, 0.0384120159, 1.0993439146, 1.2603409756, -0.1466244739, -1.4463212439] |
1 | [-1.0603297501, 2.3544967095000002, -3.5638788326, 5.1387348926, -1.2308457019, -0.7687824608, -3.5881228109, 1.8880837663, -3.2789674274, -3.9563254554, 4.0993439118, -5.6539176395, -0.8775733373, -9.131571192000001, -0.6093537873, -3.7480276773, -5.0309125017, -0.8748149526000001, 1.9870535692, 0.7005485718000001, 0.9204422758, -0.1041491809, 0.3229564351, -0.7418141657, 0.0384120159, 1.0993439146, 1.2603409756, -0.1466244739, -1.4463212439] |
2 | [-1.0603297501, 2.3544967095000002, -3.5638788326, 5.1387348926, -1.2308457019, -0.7687824608, -3.5881228109, 1.8880837663, -3.2789674274, -3.9563254554, 4.0993439118, -5.6539176395, -0.8775733373, -9.131571192000001, -0.6093537873, -3.7480276773, -5.0309125017, -0.8748149526000001, 1.9870535692, 0.7005485718000001, 0.9204422758, -0.1041491809, 0.3229564351, -0.7418141657, 0.0384120159, 1.0993439146, 1.2603409756, -0.1466244739, -1.4463212439] |
3 | [-1.0603297501, 2.3544967095000002, -3.5638788326, 5.1387348926, -1.2308457019, -0.7687824608, -3.5881228109, 1.8880837663, -3.2789674274, -3.9563254554, 4.0993439118, -5.6539176395, -0.8775733373, -9.131571192000001, -0.6093537873, -3.7480276773, -5.0309125017, -0.8748149526000001, 1.9870535692, 0.7005485718000001, 0.9204422758, -0.1041491809, 0.3229564351, -0.7418141657, 0.0384120159, 1.0993439146, 1.2603409756, -0.1466244739, -1.4463212439] |
4 | [0.5817662108, 0.09788155100000001, 0.1546819424, 0.4754101949, -0.19788623060000002, -0.45043448540000003, 0.016654044700000002, -0.0256070551, 0.0920561602, -0.2783917153, 0.059329944100000004, -0.0196585416, -0.4225083157, -0.12175388770000001, 1.5473094894000001, 0.2391622864, 0.3553974881, -0.7685165301, -0.7000849355000001, -0.1190043285, -0.3450517133, -1.1065114108, 0.2523411195, 0.0209441826, 0.2199267436, 0.2540689265, -0.0450225094, 0.10867738980000001, 0.2547179311] |
OUTPUT
time | in.tensor | out.dense_1 | check_failures | |
---|---|---|---|---|
0 | 2023-11-17 20:34:17.005 | [-1.0603297501, 2.3544967095, -3.5638788326, 5.1387348926, -1.2308457019, -0.7687824608, -3.5881228109, 1.8880837663, -3.2789674274, -3.9563254554, 4.0993439118, -5.6539176395, -0.8775733373, -9.131571192, -0.6093537873, -3.7480276773, -5.0309125017, -0.8748149526, 1.9870535692, 0.7005485718, 0.9204422758, -0.1041491809, 0.3229564351, -0.7418141657, 0.0384120159, 1.0993439146, 1.2603409756, -0.1466244739, -1.4463212439] | [0.99300325] | 0 |
1 | 2023-11-17 20:34:17.005 | [-1.0603297501, 2.3544967095, -3.5638788326, 5.1387348926, -1.2308457019, -0.7687824608, -3.5881228109, 1.8880837663, -3.2789674274, -3.9563254554, 4.0993439118, -5.6539176395, -0.8775733373, -9.131571192, -0.6093537873, -3.7480276773, -5.0309125017, -0.8748149526, 1.9870535692, 0.7005485718, 0.9204422758, -0.1041491809, 0.3229564351, -0.7418141657, 0.0384120159, 1.0993439146, 1.2603409756, -0.1466244739, -1.4463212439] | [0.99300325] | 0 |
2 | 2023-11-17 20:34:17.005 | [-1.0603297501, 2.3544967095, -3.5638788326, 5.1387348926, -1.2308457019, -0.7687824608, -3.5881228109, 1.8880837663, -3.2789674274, -3.9563254554, 4.0993439118, -5.6539176395, -0.8775733373, -9.131571192, -0.6093537873, -3.7480276773, -5.0309125017, -0.8748149526, 1.9870535692, 0.7005485718, 0.9204422758, -0.1041491809, 0.3229564351, -0.7418141657, 0.0384120159, 1.0993439146, 1.2603409756, -0.1466244739, -1.4463212439] | [0.99300325] | 0 |
3 | 2023-11-17 20:34:17.005 | [-1.0603297501, 2.3544967095, -3.5638788326, 5.1387348926, -1.2308457019, -0.7687824608, -3.5881228109, 1.8880837663, -3.2789674274, -3.9563254554, 4.0993439118, -5.6539176395, -0.8775733373, -9.131571192, -0.6093537873, -3.7480276773, -5.0309125017, -0.8748149526, 1.9870535692, 0.7005485718, 0.9204422758, -0.1041491809, 0.3229564351, -0.7418141657, 0.0384120159, 1.0993439146, 1.2603409756, -0.1466244739, -1.4463212439] | [0.99300325] | 0 |
4 | 2023-11-17 20:34:17.005 | [0.5817662108, 0.097881551, 0.1546819424, 0.4754101949, -0.1978862306, -0.4504344854, 0.0166540447, -0.0256070551, 0.0920561602, -0.2783917153, 0.0593299441, -0.0196585416, -0.4225083157, -0.1217538877, 1.5473094894, 0.2391622864, 0.3553974881, -0.7685165301, -0.7000849355, -0.1190043285, -0.3450517133, -1.1065114108, 0.2523411195, 0.0209441826, 0.2199267436, 0.2540689265, -0.0450225094, 0.1086773898, 0.2547179311] | [0.0010916889] | 0 |
All Inputs Are Tensors
All inputs into an ONNX model must be tensors. This requires that the shape of each element is the same. For example, the following is a proper input:
t [
[2.35, 5.75],
[3.72, 8.55],
[5.55, 97.2]
]
Another example is a 2,2,3 tensor, where the shape of each element is (3,), and each element has 2 rows.
t = [
[2.35, 5.75, 19.2],
[3.72, 8.55, 10.5]
],
[
[5.55, 7.2, 15.7],
[9.6, 8.2, 2.3]
]
In this example each element has a shape of (2,)
. Tensors with elements of different shapes, known as ragged tensors, are not supported. For example:
t = [
[2.35, 5.75],
[3.72, 8.55, 10.5],
[5.55, 97.2]
])
**INVALID SHAPE**
For models that require ragged tensor or other shapes, see other data formatting options such as Bring Your Own Predict models.
Data Type Consistency
All inputs into an ONNX model must have the same internal data type. For example, the following is valid because all of the data types within each element are float32
.
t = [
[2.35, 5.75],
[3.72, 8.55],
[5.55, 97.2]
]
The following is invalid, as it mixes floats and strings in each element:
t = [
[2.35, "Bob"],
[3.72, "Nancy"],
[5.55, "Wani"]
]
The following inputs are valid, as each data type is consistent within the elements.
df = pd.DataFrame({
"t": [
[2.35, 5.75, 19.2],
[5.55, 7.2, 15.7],
],
"s": [
["Bob", "Nancy", "Wani"],
["Jason", "Rita", "Phoebe"]
]
})
df
t | s | |
---|---|---|
0 | [2.35, 5.75, 19.2] | [Bob, Nancy, Wani] |
1 | [5.55, 7.2, 15.7] | [Jason, Rita, Phoebe] |
Upload ONNX Model to Wallaroo
Open Neural Network eXchange(ONNX) is the default model runtime supported by Wallaroo. ONNX models are uploaded to the current workspace through the Wallaroo Client upload_model(name, path, framework, input_schema, output_schema).configure(options)
. When uploading a default ML Model that matches the default Wallaroo runtime, the configure(options)
can be left empty or the framework onnx
specified.
Uploading ONNX Models
ONNX models are uploaded to Wallaroo through the Wallaroo Client upload_model
method.
Upload ONNX Model Parameters
The following parameters are required for ONNX models. Note that while some fields are considered as optional for the upload_model
method, they are required for proper uploading of a ONNX model to Wallaroo.
For ONNX models, the input_schema
and output_schema
are not required so are not listed here.
Parameter | Type | Description |
---|---|---|
name | string (Required) | The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model. |
path | string (Required) | The path to the model file being uploaded. |
framework | string (Required) | Set as the Framework.ONNX . |
input_schema | pyarrow.lib.Schema (Optional) | The input schema in Apache Arrow schema format. |
output_schema | pyarrow.lib.Schema (Optional) | The output schema in Apache Arrow schema format. |
convert_wait | bool (Optional) (Default: True) | Not required for native runtimes.
|
Once the upload process starts, the model is containerized by the Wallaroo instance. This process may take up to 10 minutes.
Upload ONNX Model Return
The following is returned with a successful model upload and conversion.
Field | Type | Description |
---|---|---|
name | string | The name of the model. |
version | string | The model version as a unique UUID. |
file_name | string | The file name of the model as stored in Wallaroo. |
SHA | string | The hash value of the model file. |
Status | string | The status of the model. Values include: |
image_path | string | The image used to deploy the model in the Wallaroo engine. |
last_update_time | DateTime | When the model was last updated. |
For example:
model_name = "embedder-o"
model_path = "./embedder.onnx"
embedder = wl.upload_model(model_name, model_path, Framework=Framework.ONNX).configure("onnx")
ONNX Conversion Tips
When converting from one ML model type to an ONNX
ML model, the input
and output
fields should be specified so users anticipate the exact field names used in their code. This prevents conversion naming formats from creating unintended names, and sets consistent field names that can be relied upon in future code updates.
The following example shows naming the input
and output
names when converting from a PyTorch model to an ONNX model. Note that the input
fields are set to data
, and the output
fields are set to output_names = ["bounding-box", "classification","confidence"]
.
input_names = ["data"]
output_names = ["bounding-box", "classification","confidence"]
torch.onnx.export(model,
tensor,
pytorchModelPath+'.onnx',
input_names=input_names,
output_names=output_names,
opset_version=17,
)
See the documentation for the specific ML model being converting from to ONNX for complete details.
Pipeline Deployment Configurations
Pipeline configurations are dependent on whether the model is converted to the Native Runtime space, or Containerized Model Runtime space.
This model will always run in the native runtime space.
Native Runtime Pipeline Deployment Configuration Example
The following configuration allocates 0.25 CPU and 1 Gi RAM to the native runtime models for a pipeline.
deployment_config = DeploymentConfigBuilder()
.cpus(0.25)
.memory('1Gi')
.build()