Wallaroo SDK Essentials Guide: Model Uploads and Registrations: Python Models

How to upload and use Python Models as Wallaroo Pipeline Steps

Model Naming Requirements

Model names map onto Kubernetes objects and must be DNS compliant. Model names may contain only lowercase ASCII alphanumeric characters and dashes (-). Periods (.) and underscores (_) are not allowed.
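
The naming rule above can be checked before an upload is attempted. The following is a minimal sketch, not part of the Wallaroo SDK: the helper name is_valid_model_name is hypothetical, and it tests only the character rule stated here. Kubernetes may enforce further DNS constraints that this sketch does not cover.

```python
import re

# Assumption: only the rule stated in this guide is checked, i.e. lowercase
# ASCII letters, digits, and dashes. This is not an official Wallaroo check.
_MODEL_NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def is_valid_model_name(name: str) -> bool:
    """Hypothetical helper: True if name uses only the allowed characters."""
    return _MODEL_NAME_PATTERN.fullmatch(name) is not None


# Usage: "preprocess-step" passes, "preprocess_step" does not.
# is_valid_model_name("preprocess-step")
# is_valid_model_name("preprocess_step")
```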

Python scripts are uploaded to Wallaroo and treated like ML models in Pipeline steps. These are referred to as Python steps.

Python steps can include:

Preprocessing steps to prepare the data received to be handed to an ML model deployed as another Pipeline step.
Postprocessing steps to take data output by an ML model as part of a Pipeline step, and prepare the data to be received by some other data store or entity.
A model contained within a Python script.

In all of these cases, the requirements for uploading a Python step as an ML model in Wallaroo are the same.

Parameter	Description
Web Site	https://www.python.org/
Supported Libraries	`python==3.10`
Framework	`Framework.PYTHON` aka `python`

Python models uploaded to Wallaroo are executed in the Wallaroo Containerized Runtime.

Note that Python models, also called “Python steps”, are standalone Python scripts that use Python libraries. These are commonly used for data formatting such as the pre- and postprocessing steps, and are also appropriate for simple models (such as ARIMA Statsmodels). A Wallaroo Python model can be composed of one or more Python scripts that match the Wallaroo requirements.

This contrasts with Custom Models, also known as Bring Your Own Predict (BYOP), which allow for custom model inference methods with supporting scripts and artifacts. These are used with pre-trained models (PyTorch, TensorFlow, etc.) along with their supporting artifacts such as other Python modules, scripts, model files, etc.

Python Models Requirements

Python scripts packaged as Python models in Wallaroo have the following requirements.

At least one .py Python script file with the following:
- Must be compatible with Python version 3.10.
- Imports the mac.types.InferenceData included with the Wallaroo SDK. For example:
```
from mac.types import InferenceData
```
- Includes the following method as the entry point for Wallaroo model inferencing:
```
def process_data(input_data: InferenceData) -> InferenceData:
    # additional code block here
```
  - Only one implementation of process_data(input_data: InferenceData) -> InferenceData is allowed. There can be as many Python scripts included in the .zip file as needed, but only one can have this method as the entry point.
    - The process_data function must return a dictionary where the keys are strings and the values are NumPy arrays. Single values (scalars) must be returned as single-element arrays. For example:
      def process_data(input_data: InferenceData) -> InferenceData:
          # Return a dictionary with the field `output`, computed as 10 raised
          # to the power of the input field `variable`, rounded to the nearest integer.
          return {'output': np.rint(np.power(10, input_data["variable"]))}
  - InferenceData represents a dictionary of numpy arrays where the first dimension is always the batch size. The type annotations set in the input_schema and output_schema for the model when uploaded must be present and correct.
  - process_data accepts and returns InferenceData. Any other implementations will return an error.
- (Optional): A requirements.txt file that includes any additional Python libraries required by the Python script with the following requirements:
  - The Python libraries must match the targeted infrastructure. For details on uploading models to a specific infrastructure such as ARM, see Automated Model Packaging.
  - The Python libraries must be compatible with Python version 3.10.11.
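
The return-value rules above (string keys, NumPy array values, scalars wrapped as single-element arrays) can be sketched as follows. This is an illustration under stated assumptions rather than Wallaroo's implementation: InferenceData is stood in for by a plain dict so the snippet runs without the Wallaroo SDK, and the helper to_inference_array is hypothetical.

```python
import numpy as np

# Assumption: a plain dict stands in for InferenceData so this runs locally.
# In an actual Python step, use `from mac.types import InferenceData`.
InferenceData = dict


def to_inference_array(value) -> np.ndarray:
    """Hypothetical helper: return value as an array with at least one dimension."""
    return np.atleast_1d(np.asarray(value))


def process_data(input_data: InferenceData) -> InferenceData:
    # 10 raised to the power of each input value, rounded to the nearest integer.
    powered = np.rint(np.power(10.0, input_data["variable"]))
    # Wrapping guarantees that a scalar result becomes a single-element array.
    return {"output": to_inference_array(powered)}


# Usage: a batch of two values, then a single scalar value.
batch_result = process_data({"variable": np.array([1.0, 2.0])})
scalar_result = process_data({"variable": 2.0})
```

Both results are dictionaries whose values are arrays with at least one dimension, so the scalar case still satisfies the single-element array requirement.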

The Python script, optional requirements.txt file, and any artifacts are packaged in a .zip file, with the Python script and the optional requirements.txt file in the root folder. For example, the sample files stored in the folder preprocess_step:

/preprocess_step
    sample-script.py
    requirements.txt
    /artifacts
        datalist.csv

The files are packaged into a .zip file. For example, the following packages the contents of the folder preprocess_step into preprocess_step.zip.

zip -r preprocess_step.zip preprocess_step/*
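
As an alternative to the command line, the packaging step can be sketched with Python's standard zipfile module. This sketch assumes the goal stated above, that the script and requirements.txt sit in the root of the archive, so every entry is stored relative to the source folder. The function name package_python_step is hypothetical, and the output path should be outside the source folder so the archive does not include itself.

```python
import zipfile
from pathlib import Path


def package_python_step(source_dir: str, zip_path: str) -> list[str]:
    """Zip the contents of source_dir so its files sit at the archive root.

    Hypothetical helper, not part of the Wallaroo SDK. Returns the entry
    names stored in the archive.
    """
    source = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Store each file relative to source_dir, for example
                # "sample-script.py" or "artifacts/datalist.csv".
                archive.write(path, arcname=path.relative_to(source).as_posix())
        return archive.namelist()


# Usage (paths are illustrative):
# package_python_step("preprocess_step", "preprocess_step.zip")
```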

In the example below, the Python model is used as a preprocessing step for another ML model. It accepts as input the InferenceData submitted as part of an inference request. It then formats the data and outputs a dictionary of NumPy arrays with the field tensor. This data can then be passed to the next model in a pipeline step.

import datetime
import logging

import numpy as np
import pandas as pd

import wallaroo

from mac.types import InferenceData

logger = logging.getLogger(__name__)

_vars = [
    "bedrooms",
    "bathrooms",
    "sqft_living",
    "sqft_lot",
    "floors",
    "waterfront",
    "view",
    "condition",
    "grade",
    "sqft_above",
    "sqft_basement",
    "lat",
    "long",
    "sqft_living15",
    "sqft_lot15",
    "house_age",
    "renovated",
    "yrs_since_reno",
]


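def process_data(input_data: InferenceData) -> InferenceData:
    # Load the incoming batch (a dictionary of NumPy arrays) into a DataFrame.
    input_df = pd.DataFrame(input_data)

    # Derive the engineered features expected by the downstream model.
    thisyear = datetime.datetime.now().year
    input_df["house_age"] = thisyear - input_df["yr_built"]
    input_df["renovated"] = np.where((input_df["yr_renovated"] > 0), 1, 0)
    input_df["yrs_since_reno"] = np.where(
        input_df["renovated"],
        input_df["yr_renovated"] - input_df["yr_built"],
        0,
    )

    # Keep only the model's input columns, in the order listed in _vars.
    input_df = input_df.loc[:, _vars]

    # Return one float32 row of 18 features per input record in the field "tensor".
    return {"tensor": input_df.to_numpy(dtype=np.float32)}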

In line with other Wallaroo inference results, the outputs of a Python step that returns a pandas DataFrame or Arrow Table will be listed in the out. metadata, with all inference outputs listed as out.{variable 1}, out.{variable 2}, etc. For example, a postprocessing Python step that is the final model step in a pipeline with the output field output is included in the out dataset as the field out.output in the Wallaroo inference result.

	time	in.tensor	out.output	anomaly.count
0	2023-06-20 20:23:28.395	[0.6878518042, 0.1760734021, -0.869514083, 0.3..	[12.886651039123535]	0

Python Libraries

Python libraries required by the included Python script are specified in the requirements.txt file included in the .zip file. These requirements and the versions of libraries should be exactly the same between creating the model and deploying it in Wallaroo.

The requirements.txt file specifies:

Python Libraries available through PyPi.org and the specific version. For example:
```
requests == 2.32.2
```
Python Wheels as Python model artifacts: Python Wheels included in the .zip file are referred to in the Python model’s requirements.txt file by their relative path within the .zip file.
For example, if the Python model’s .zip file includes the Python Wheel libraries/custom_wheel.whl:
```
├── libraries
│   └── custom_wheel.whl
├── main.py
└── requirements.txt
```
Then the requirements.txt file included with the Python model’s .zip file refers to this Python Wheel as:
```
libraries/custom_wheel.whl
```
External Python Wheels: Python Wheels that are available from external sources (that is, not included as Python model artifacts):
- Must be referred to by the full URL.
- Must be available from the Wallaroo instance.
For example, to include the Python Wheel hosted at https://example.wallaroo.ai/libraries/custom_wheel.whl, the requirements.txt file included with the Python model’s .zip file refers to this Python Wheel as:
```
https://example.wallaroo.ai/libraries/custom_wheel.whl
```
Extra Index URL: For Python libraries that require the --extra-index-url flag:
- Set the --extra-index-url flag with the full URL to the extra index. This must be available from the Wallaroo instance.
- In the next line, specify the Python library and version.
- Repeat the steps above for each Python library with an extra index URL.
For example, to include the extra index URL https://download.pytorch.org/whl/cu117 for the torchvision Python library, the requirements.txt file included with the Python model’s .zip file specifies it as:
```
--extra-index-url https://download.pytorch.org/whl/cu117
torchvision==0.15.0
```
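
The four kinds of requirements.txt entries described above can be told apart with a short check before packaging. The following is a rough sketch only: the function name classify_requirement and its labels are hypothetical, and the matching rules are simplifications of the cases listed in this section, not the way Wallaroo or pip parses the file.

```python
def classify_requirement(line: str) -> str:
    """Hypothetical helper: label one requirements.txt line by entry kind."""
    entry = line.strip()
    if not entry or entry.startswith("#"):
        return "blank-or-comment"
    if entry.startswith("--extra-index-url"):
        # The library that needs this index goes on the following line.
        return "extra-index-url"
    if entry.startswith(("http://", "https://")):
        # External wheel: referred to by its full URL.
        return "external-wheel"
    if entry.endswith(".whl"):
        # Bundled wheel: a path relative to the root of the .zip file.
        return "bundled-wheel"
    # Anything else is treated as a library installed from PyPI.
    return "pypi-package"


# Usage: label each entry from the examples in this section.
entries = [
    "requests == 2.32.2",
    "libraries/custom_wheel.whl",
    "https://example.wallaroo.ai/libraries/custom_wheel.whl",
    "--extra-index-url https://download.pytorch.org/whl/cu117",
    "torchvision==0.15.0",
]
labels = [classify_requirement(entry) for entry in entries]
```

A check like this makes it easier to confirm that every bundled wheel path is actually present in the .zip file and that every URL is reachable from the Wallaroo instance.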

Upload Python Models via the Wallaroo SDK

Python step models are uploaded to Wallaroo through the wallaroo.client.upload_model() method.

Upload Python Model Parameters

Parameter	Type	Description
`name`	`string` (Required)	The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model.
`path`	`string` (Required)	The path to the model file being uploaded. This must be a `.zip` file as defined in Python Models Requirements.
`framework`	`string` (Required)	Set as `Framework.PYTHON`.
`input_schema`	`pyarrow.lib.Schema` (Required)	The input schema in Apache Arrow schema format.
`output_schema`	`pyarrow.lib.Schema` (Required)	The output schema in Apache Arrow schema format.
`convert_wait`	`bool` (Optional) (Default: True)	True: The script waits until the model conversion completes. False: The script proceeds without waiting for the model conversion to complete.

Upload Python Model Returns

upload_model returns a wallaroo.model_version.ModelVersion object with the following fields.

Field	Type	Description
`name`	String	The name of the model.
`version`	String	The model version as a unique UUID.
`file_name`	String	The file name of the model as stored in Wallaroo.
`SHA`	String	The hash value of the model file.
`Status`	String	The status of the model.
`image_path`	String	The image used to deploy the model in the Wallaroo engine.
`last_update_time`	DateTime	When the model was last updated.

Upload Python Models Example

The following example uploads a Python step ML model to a Wallaroo instance.

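import pyarrow as pa
import wallaroo

# Connect to the Wallaroo instance; wl is the client used for the upload below.
wl = wallaroo.Client()

input_schema = pa.schema([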
    pa.field('id', pa.int64()),
    pa.field('date', pa.string()),
    pa.field('list_price', pa.float64()),
    pa.field('bedrooms', pa.int64()),
    pa.field('bathrooms', pa.float64()),
    pa.field('sqft_living', pa.int64()),
    pa.field('sqft_lot', pa.int64()),
    pa.field('floors', pa.float64()),
    pa.field('waterfront', pa.int64()),
    pa.field('view', pa.int64()),
    pa.field('condition', pa.int64()),
    pa.field('grade', pa.int64()),
    pa.field('sqft_above', pa.int64()),
    pa.field('sqft_basement', pa.int64()),
    pa.field('yr_built', pa.int64()),
    pa.field('yr_renovated', pa.int64()),
    pa.field('zipcode', pa.int64()),
    pa.field('lat', pa.float64()),
    pa.field('long', pa.float64()),
    pa.field('sqft_living15', pa.int64()),
    pa.field('sqft_lot15', pa.int64()),
    pa.field('sale_price', pa.float64())
])

output_schema = pa.schema([
    pa.field('tensor', pa.list_(pa.float32(), list_size=18))
])

preprocess_model = wl.upload_model(
    "preprocess-step",
    "./models/preprocess_step.zip",
    framework=wallaroo.framework.Framework.PYTHON,
    input_schema=input_schema,
    output_schema=output_schema,
)
display(preprocess_model)


Name	preprocess-step
Version	d0cb7d27-5c83-45c6-a231-e16c2c5818b9
File Name	preprocess_step.zip
SHA	c09bbca6748ff23d83f48f57446c3ad6b5758c403936157ab731b3c269c0afb9
Status	ready
Image Path	None
Architecture	x86
Acceleration	none
Updated At	2024-03-Apr 18:11:34