Python Model Shape Upload to Wallaroo with the Wallaroo SDK Tutorial
How to upload a Python Step Shape model to Wallaroo via the SDK
The following tutorials cover how to upload sample Python Step Shape models into a Wallaroo instance.
Python scripts are uploaded to Wallaroo and treated like ML models in pipeline steps. These are referred to as Python steps.
Python steps can include pre- and post-processing steps as well as simple models. In all of these, the requirements for uploading a Python script as an ML model in Wallaroo are the same.
Parameter | Description |
---|---|
Web Site | https://www.python.org/ |
Supported Libraries | python==3.8 |
Framework | Framework.PYTHON aka python |
Python models uploaded to Wallaroo are executed in the Wallaroo Containerized Runtime.
Note that Python models, also called “Python steps”, are standalone Python scripts that use Python libraries. They are commonly used for data formatting, such as pre- and post-processing steps. They are also appropriate for simple models (such as ARIMA Statsmodels). A Wallaroo Python model can be composed of one or more Python scripts that match the Wallaroo requirements.
This contrasts with Custom Models, also known as Bring Your Own Predict (BYOP), which allow custom model inference methods with supporting scripts and artifacts. These are used with pre-trained models (PyTorch, TensorFlow, etc.) along with their supporting artifacts, such as other Python modules, scripts and model files.
Python scripts packaged as Python models in Wallaroo have the following requirements.
- A .py Python script file that:
  - Is compatible with Python version 3.8.
  - Imports mac.types.InferenceData, which is included with the Wallaroo SDK. For example:

        from mac.types import InferenceData

  - Includes the following method as the entry point for Wallaroo model inferencing:

        def process_data(input_data: InferenceData) -> InferenceData:
            # additional code block here
            ...

  - Only one implementation of process_data(input_data: InferenceData) -> InferenceData is allowed. The .zip file can include as many Python scripts as needed, but only one can have this method as the entry point.
  - The process_data function must return a dictionary where the keys are strings and the values are NumPy arrays. Single values (scalars) must be single-element arrays. For example:

        import numpy as np
        from mac.types import InferenceData

        def process_data(input_data: InferenceData) -> InferenceData:
            # return a dictionary with the field `output`: 10 raised to the power of
            # the input field `variable`, rounded to the nearest integer
            return {
                "output": np.rint(np.power(10, input_data["variable"]))
            }
  - InferenceData represents a dictionary of NumPy arrays where the first dimension is always the batch size. The type annotations set in the input_schema and output_schema for the model when it is uploaded must be present and correct.
  - process_data accepts and returns InferenceData. Any other implementation returns an error.
- (Optional) A requirements.txt file that lists any additional Python libraries required by the Python script. For ARM, see Automated Model Packaging.

The Python script, the optional requirements.txt file and any artifacts are packaged in a .zip file. The Python script and the optional requirements.txt file go in the root folder. For example, consider the sample files stored in the folder preprocess_step:
preprocess_step/
├── sample-script.py
├── requirements.txt
└── artifacts/
    └── datalist.csv
The files are packaged into a .zip file with the Python script and requirements.txt file at the root of the archive. For example, the following packages the contents of the folder preprocess_step into preprocess_step.zip:

cd preprocess_step && zip -r ../preprocess_step.zip *
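The same packaging can also be scripted. The sketch below is one way to do it using only Python's standard zipfile module. The function name package_python_step is hypothetical and not part of the Wallaroo SDK. It stores every file relative to the source folder, so sample-script.py and requirements.txt land at the root of the archive and artifacts/ keeps its subfolder.

```python
import zipfile
from pathlib import Path
from typing import List


def package_python_step(source_dir: str, zip_path: str) -> List[str]:
    """Zip everything under source_dir so the files sit at the root of the archive.

    Hypothetical helper, not part of the Wallaroo SDK. Returns the archive
    member names so the layout can be checked.
    """
    src = Path(source_dir)
    target = Path(zip_path).resolve()
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # rglob also walks subfolders such as artifacts/
        for file in sorted(src.rglob("*")):
            # skip folders, and skip the archive itself if it is written inside source_dir
            if not file.is_file() or file.resolve() == target:
                continue
            # paths relative to source_dir keep the script and requirements.txt at the root
            zf.write(file, arcname=file.relative_to(src).as_posix())
    with zipfile.ZipFile(target) as zf:
        return zf.namelist()
```

For example, package_python_step("preprocess_step", "preprocess_step.zip") places sample-script.py, requirements.txt and artifacts/datalist.csv at the archive root.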
In the example below, the Python model is used as a preprocessing step for another ML model. It accepts as input the InferenceData submitted as part of an inference request. It then formats the data and outputs a dictionary of NumPy arrays with the field tensor. This data can then be passed to the next model step in the pipeline.
import datetime
import logging

import numpy as np
import pandas as pd
import wallaroo
from mac.types import InferenceData

logger = logging.getLogger(__name__)

# Feature columns expected by the downstream model, in order
_vars = [
    "bedrooms",
    "bathrooms",
    "sqft_living",
    "sqft_lot",
    "floors",
    "waterfront",
    "view",
    "condition",
    "grade",
    "sqft_above",
    "sqft_basement",
    "lat",
    "long",
    "sqft_living15",
    "sqft_lot15",
    "house_age",
    "renovated",
    "yrs_since_reno",
]


def process_data(input_data: InferenceData) -> InferenceData:
    # InferenceData is a dictionary of NumPy arrays; load it as a DataFrame
    input_df = pd.DataFrame(input_data)
    thisyear = datetime.datetime.now().year

    # Derived features: age of the house and its renovation history
    input_df["house_age"] = thisyear - input_df["yr_built"]
    input_df["renovated"] = np.where((input_df["yr_renovated"] > 0), 1, 0)
    input_df["yrs_since_reno"] = np.where(
        input_df["renovated"],
        input_df["yr_renovated"] - input_df["yr_built"],
        0,
    )

    # Keep only the model features and return them as a float32 tensor
    input_df = input_df.loc[:, _vars]
    return {"tensor": input_df.to_numpy(dtype=np.float32)}
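Before uploading, the return requirements described earlier can be checked locally. The output must be a dictionary of string keys to NumPy arrays, with the batch size as the first dimension and scalars wrapped as single-element arrays. This is a minimal sketch. The helper check_step_output is hypothetical and not part of the Wallaroo SDK. When mac.types is not installed, the sketch assumes InferenceData can be treated as a plain dictionary of NumPy arrays for local testing. The sample process_data mirrors the earlier np.power example.

```python
from typing import Dict

import numpy as np

try:
    from mac.types import InferenceData  # included with the Wallaroo SDK
except ImportError:
    # Assumption for local testing only: treat InferenceData as a dict of NumPy arrays
    InferenceData = Dict[str, np.ndarray]


def check_step_output(result, batch_size: int) -> None:
    """Raise ValueError if result breaks the Python step return requirements."""
    if not isinstance(result, dict):
        raise ValueError("process_data must return a dictionary")
    for key, value in result.items():
        if not isinstance(key, str):
            raise ValueError(f"key {key!r} is not a string")
        if not isinstance(value, np.ndarray):
            raise ValueError(f"value for {key!r} is not a NumPy array")
        if value.ndim == 0:
            # scalars must be returned as single-element arrays, not 0-d arrays
            raise ValueError(f"value for {key!r} is a 0-d array; wrap it as a single-element array")
        if value.shape[0] != batch_size:
            raise ValueError(
                f"first dimension of {key!r} is {value.shape[0]}, expected batch size {batch_size}"
            )


def process_data(input_data: InferenceData) -> InferenceData:
    # 10 raised to the power of `variable`, rounded to the nearest integer
    return {"output": np.rint(np.power(10, input_data["variable"]))}


sample = {"variable": np.array([1.0, 2.0, 0.5])}
check_step_output(process_data(sample), batch_size=3)
```

The check raises on the first problem it finds rather than collecting them all. This keeps the sketch short. It is enough to catch a forgotten scalar wrap or a dropped batch dimension.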
In line with other Wallaroo inference results, when the results are returned as a pandas DataFrame or Arrow Table, the outputs of a Python step are listed in the out. metadata. All inference outputs appear as out.{variable 1}, out.{variable 2}, and so on. For example, consider a postprocessing Python step that is the final model step in a pipeline and has the output field output. It is included in the Wallaroo inference result as the field out.output.
| | time | in.tensor | out.output | anomaly.count |
|---|---|---|---|---|
| 0 | 2023-06-20 20:23:28.395 | [0.6878518042, 0.1760734021, -0.869514083, 0.3.. | [12.886651039123535] | 0 |
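The naming convention in the table can be illustrated with a small sketch. Each key in the dictionary returned by process_data becomes an out.{key} field, and row i of the result takes element i along the batch dimension. The helper to_result_rows is hypothetical and only mimics the naming. It is not how Wallaroo builds inference results, which also carry fields such as time, in.* and anomaly.count.

```python
from typing import Any, Dict, List

import numpy as np


def to_result_rows(outputs: Dict[str, np.ndarray]) -> List[Dict[str, Any]]:
    """Illustrative only: spread a batch of step outputs into per-row out.* fields.

    Assumes every array shares the same first (batch) dimension.
    """
    if not outputs:
        return []
    batch_size = len(next(iter(outputs.values())))
    rows = []
    for i in range(batch_size):
        # each output key becomes an out.{key} field; row i takes element i of the batch
        rows.append({f"out.{key}": np.asarray(value[i]).tolist() for key, value in outputs.items()})
    return rows
```

For a final step that returns {"output": np.array([[1.5], [2.0]])}, this yields two rows whose out.output values are [1.5] and [2.0]. This matches the single-element list shape shown in the out.output column above.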
Python libraries required by the included Python script are specified in the requirements.txt file in the .zip file. These libraries and their versions should be exactly the same when the model is created and when it is deployed in Wallaroo.
The requirements.txt file specifies:

- Python libraries available through PyPI.org, with a specific version. For example:

    requests==2.32.2
- Python Wheels as Python model artifacts: Python Wheels included in the .zip file as Python model artifacts are referred to in the Python model's requirements.txt file by their relative path within the .zip file. For example, suppose the Python model's .zip file includes the Python Wheel libraries/custom_wheel.whl:

    ├── libraries
    │   └── custom_wheel.whl
    ├── main.py
    └── requirements.txt

  The requirements.txt file included with the Python model's .zip file then refers to this Python Wheel as:

    libraries/custom_wheel.whl
- External Python Wheels: Python Wheels available from external sources (that is, not included as Python model artifacts) are referred to by their full URL. For example, to include the Python Wheel hosted at https://example.wallaroo.ai/libraries/custom_wheel.whl, the requirements.txt file included with the Python model's .zip file refers to this Python Wheel as:

    https://example.wallaroo.ai/libraries/custom_wheel.whl
- Extra Index URL: For Python libraries that require the --extra-index-url flag, include the flag with the full URL to the extra index. This URL must be reachable from the Wallaroo instance. For example, to use the extra index URL https://download.pytorch.org/whl/cu117 for the torchvision Python library, the requirements.txt file included with the Python model's .zip file contains:

    --extra-index-url https://download.pytorch.org/whl/cu117
    torchvision==0.15.0
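The requirements.txt rules above can be checked against a packaged .zip file before upload. This is a minimal sketch under simplifying assumptions. The helper check_requirements is hypothetical and not part of the Wallaroo SDK or pip. It only checks the following:

- PyPI libraries are pinned with ==.
- Local wheel paths exist inside the archive.
- External wheels and --extra-index-url entries use full http(s) URLs.

It does not verify that the versions match the development environment, that URLs are reachable, or more complex pip syntax such as environment markers.

```python
import re
import zipfile
from typing import List

# Pinned PyPI requirement such as "requests==2.32.2" (whitespace around == allowed)
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*\S+$")
URL_PREFIXES = ("http://", "https://")


def check_requirements(zip_path: str) -> List[str]:
    """Return the problems found in requirements.txt at the root of zip_path."""
    problems: List[str] = []
    with zipfile.ZipFile(zip_path) as zf:
        names = set(zf.namelist())
        if "requirements.txt" not in names:
            return problems  # requirements.txt is optional
        text = zf.read("requirements.txt").decode("utf-8")

    for line in text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comments
        if entry.startswith("--extra-index-url"):
            # accept both "--extra-index-url URL" and "--extra-index-url=URL"
            url = entry[len("--extra-index-url"):].lstrip(" =").strip()
            if not url.startswith(URL_PREFIXES) or " " in url:
                problems.append(f"extra index needs a full URL: {entry}")
        elif entry.startswith(URL_PREFIXES):
            continue  # external wheel referenced by its full URL
        elif entry.endswith(".whl"):
            # wheel shipped as a model artifact: path is relative to the .zip root
            if entry not in names:
                problems.append(f"wheel not found in the .zip file: {entry}")
        elif not PINNED.match(entry):
            problems.append(f"library is not pinned to an exact version: {entry}")
    return problems
```

For example, an archive whose requirements.txt contains the PyPI, wheel and extra index entries shown above returns an empty list. An unpinned requests line would be reported.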