Wallaroo SDK Essentials Guide: Model Uploads and Registrations: Python Models
Model Naming Requirements
Model names map onto Kubernetes objects and must be DNS compliant. Model names must contain only lowercase ASCII alphanumeric characters or dashes (-). Periods (.) and underscores (_) are not allowed.
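The naming rule above can be checked before an upload is attempted. The following is a minimal sketch that encodes only the rule as stated here (lowercase ASCII alphanumerics and dashes). The helper names `is_valid_model_name` and `suggest_model_name` are hypothetical and are not part of the Wallaroo SDK; Wallaroo or Kubernetes may apply further constraints, such as length limits, that this sketch does not cover.

```python
import re

# Only lowercase ASCII letters, digits, and dashes are accepted.
_MODEL_NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def is_valid_model_name(name: str) -> bool:
    """Return True if the name uses only lowercase ASCII alphanumerics or dashes."""
    return _MODEL_NAME_PATTERN.fullmatch(name) is not None


def suggest_model_name(name: str) -> str:
    """Hypothetical helper: lowercase the name and replace disallowed characters with dashes."""
    return re.sub(r"[^a-z0-9-]", "-", name.lower())


print(is_valid_model_name("preprocess-step"))    # True
print(is_valid_model_name("preprocess_step"))    # False: underscore is not allowed
print(suggest_model_name("Preprocess_Step.v2"))  # preprocess-step-v2
```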
Python scripts are uploaded to Wallaroo and treated like ML models in pipeline steps. These are referred to as Python steps.
Python steps can include:
- Preprocessing steps that prepare the received data to be passed to an ML model deployed as another pipeline step.
- Postprocessing steps that take the data output by an ML model in a pipeline step and prepare it to be received by another data store or entity.
- A model contained within a Python script.
In all of these cases, the requirements for uploading a Python step as an ML model in Wallaroo are the same.
Parameter | Description |
---|---|
Web Site | https://www.python.org/ |
Supported Libraries | python==3.8 |
Framework | Framework.PYTHON aka python |
Python models uploaded to Wallaroo are executed in the Wallaroo Containerized Runtime.
Note that Python models, also called "Python steps", are standalone Python scripts that use Python libraries. They are commonly used for data formatting, such as pre- and postprocessing steps, and are also appropriate for simple models (such as ARIMA Statsmodels). A Wallaroo Python model can be composed of one or more Python scripts that match the Wallaroo requirements.
This is in contrast to Arbitrary Python models, also known as Bring Your Own Predict (BYOP), which allow for custom model inference methods with supporting scripts and artifacts. These are used with pre-trained models (PyTorch, TensorFlow, etc.) along with their supporting artifacts such as other Python modules, scripts, model files, etc.
Python Models Requirements
Python scripts packaged as Python models in Wallaroo have the following requirements.
- At least one .py Python script file that meets the following requirements:
  - Must be compatible with Python version 3.8.
  - Imports mac.types.InferenceData, which is included with the Wallaroo SDK. For example:

        from mac.types import InferenceData

  - Includes the following method as the entry point for Wallaroo model inferencing:

        def process_data(input_data: InferenceData) -> InferenceData:
            # additional code block here

  - Only one implementation of process_data(input_data: InferenceData) -> InferenceData is allowed. There can be as many Python scripts included in the .zip file as needed, but only one can have this method as the entry point.
  - The process_data function must return a dictionary where the keys are strings and the values are NumPy arrays. Single values (scalars) must be returned as single-element arrays. For example:

        def process_data(input_data: InferenceData) -> InferenceData:
            # return a dictionary with the field `output`: 10 raised to the power
            # of the input field `variable`, rounded to the nearest integer.
            return {'output': np.rint(np.power(10, input_data["variable"]))}

  - InferenceData represents a dictionary of NumPy arrays where the first dimension is always the batch size. The type annotations set in the input_schema and output_schema for the model when uploaded must be present and correct.
  - process_data accepts and returns InferenceData. Any other implementation returns an error.
- (Optional) A requirements.txt file that includes any additional Python libraries required by the Python script, with the following requirements:
  - The Python libraries must match the targeted infrastructure. For details on uploading models to a specific infrastructure such as ARM, see Automated Model Packaging.
  - The Python libraries must be compatible with python==3.8.6.
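The process_data return requirements above can be sketched as follows. So that the sketch runs without the Wallaroo SDK installed, it uses a local stand-in for `InferenceData`, on the assumption that it behaves like a dictionary of NumPy arrays as described above; a real Python step would import it from `mac.types` instead. The `check_output` helper and the doubling logic are hypothetical and only illustrate the stated rules.

```python
from typing import Dict

import numpy as np

# Assumption: a local stand-in for mac.types.InferenceData, used only so this
# sketch runs without the Wallaroo SDK. In a real Python step use:
#     from mac.types import InferenceData
InferenceData = Dict[str, np.ndarray]


def process_data(input_data: InferenceData) -> InferenceData:
    """Hypothetical step: double every value in the input field `variable`."""
    doubled = np.asarray(input_data["variable"], dtype=np.float64) * 2.0
    # The batch dimension (first axis) is preserved in the output array.
    return {"output": doubled}


def check_output(output: dict) -> None:
    """Hypothetical check: keys are strings, values are non-scalar NumPy arrays."""
    for key, value in output.items():
        if not isinstance(key, str):
            raise TypeError(f"key {key!r} is not a string")
        if not isinstance(value, np.ndarray):
            raise TypeError(f"value for {key!r} is not a NumPy array")
        if value.ndim == 0:
            raise ValueError(f"{key!r} is a scalar; wrap it in a single-element array")


batch = {"variable": np.array([1.0, 2.0, 3.0])}  # batch of three rows
result = process_data(batch)
check_output(result)
print(result["output"])
```

Running a check like this locally before packaging can catch a scalar or list return value early, instead of at inference time.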
The Python script, the optional requirements.txt file, and any artifacts are packaged in a .zip file, with the Python script and optional requirements.txt file in the root folder. For example, the sample files stored in the folder preprocess_step:

/preprocess_step
    sample-script.py
    requirements.txt
    /artifacts
        datalist.csv
The files are packaged into a .zip file. For example, the following packages the contents of the folder preprocess_step into preprocess_step.zip.
zip -r preprocess_step.zip preprocess_step/*
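As an alternative to the command line, the packaging step can be sketched with Python's standard zipfile module. This sketch stores each file under a path relative to the source folder, so the script and requirements.txt sit at the top level of the archive as the requirement above describes. The function name `package_python_step` and the placeholder file contents are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def package_python_step(source_dir: Path, zip_path: Path) -> None:
    """Zip every file under source_dir, storing paths relative to source_dir."""
    source_dir = Path(source_dir)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            # Skip directories and the archive itself if it lives in source_dir.
            if not path.is_file() or path.resolve() == zip_path.resolve():
                continue
            archive.write(path, path.relative_to(source_dir).as_posix())


# Demonstration with placeholder files mirroring the sample folder layout.
with tempfile.TemporaryDirectory() as tmp:
    step_dir = Path(tmp) / "preprocess_step"
    (step_dir / "artifacts").mkdir(parents=True)
    (step_dir / "sample-script.py").write_text("# process_data goes here\n")
    (step_dir / "requirements.txt").write_text("")
    (step_dir / "artifacts" / "datalist.csv").write_text("id\n")

    zip_file = Path(tmp) / "preprocess_step.zip"
    package_python_step(step_dir, zip_file)

    with zipfile.ZipFile(zip_file) as archive:
        names = sorted(archive.namelist())

print(names)
```

Listing the archive contents afterwards, as done with `namelist()` here, is a quick way to confirm where the script ended up inside the .zip file.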
In the example below, the Python model is used as a preprocessing step for another ML model. It accepts as input the InferenceData submitted as part of an inference request. It then formats the data and outputs a dictionary of NumPy arrays with the field tensor. This data can then be passed to the next model in a pipeline step.
import datetime
import logging

import numpy as np
import pandas as pd

import wallaroo

from mac.types import InferenceData

logger = logging.getLogger(__name__)

_vars = [
    "bedrooms",
    "bathrooms",
    "sqft_living",
    "sqft_lot",
    "floors",
    "waterfront",
    "view",
    "condition",
    "grade",
    "sqft_above",
    "sqft_basement",
    "lat",
    "long",
    "sqft_living15",
    "sqft_lot15",
    "house_age",
    "renovated",
    "yrs_since_reno",
]


def process_data(input_data: InferenceData) -> InferenceData:
    input_df = pd.DataFrame(input_data)
    thisyear = datetime.datetime.now().year
    input_df["house_age"] = thisyear - input_df["yr_built"]
    input_df["renovated"] = np.where((input_df["yr_renovated"] > 0), 1, 0)
    input_df["yrs_since_reno"] = np.where(
        input_df["renovated"],
        input_df["yr_renovated"] - input_df["yr_built"],
        0,
    )
    input_df = input_df.loc[:, _vars]
    return {"tensor": input_df.to_numpy(dtype=np.float32)}
In line with other Wallaroo inference results, the outputs of a Python step that returns a pandas DataFrame or Arrow Table are listed in the out. metadata, with all inference outputs listed as out.{variable 1}, out.{variable 2}, etc. For example, a postprocessing Python step that is the final model step in a pipeline with the output field output is included in the out dataset as the field out.output in the Wallaroo inference result.
| time | in.tensor | out.output | anomaly.count |
---|---|---|---|---|
0 | 2023-06-20 20:23:28.395 | [0.6878518042, 0.1760734021, -0.869514083, 0.3.. | [12.886651039123535] | 0 |
Upload Python Models via the Wallaroo SDK
Python step models are uploaded to Wallaroo through the wallaroo.client.upload_model() method.
Upload Python Model Parameters
Parameter | Type | Description |
---|---|---|
name | string (Required) | The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model. |
path | string (Required) | The path to the model file being uploaded. This must be a .zip file as defined in Python Models Requirements. |
framework | string (Required) | Set as Framework.PYTHON. |
input_schema | pyarrow.lib.Schema (Required) | The input schema in Apache Arrow schema format. |
output_schema | pyarrow.lib.Schema (Required) | The output schema in Apache Arrow schema format. |
convert_wait | bool (Optional) (Default: True) | |
arch | wallaroo.engine_config.Architecture | The architecture the model is deployed to. If a model is intended for deployment to an ARM architecture, it must be specified during this step. Values include: X86 (Default): x86 based architectures. ARM : ARM based architectures. |
accel | wallaroo.engine_config.Acceleration (Optional) | The AI hardware accelerator used. If a model is intended for use with a hardware accelerator, it should be assigned at this step. |
Upload Python Model Returns
upload_model returns a wallaroo.model_version.ModelVersion object with the following fields.
Field | Type | Description |
---|---|---|
name | String | The name of the model. |
version | String | The model version as a unique UUID. |
file_name | String | The file name of the model as stored in Wallaroo. |
SHA | String | The hash value of the model file. |
Status | String | The status of the model. |
image_path | String | The image used to deploy the model in the Wallaroo engine. |
last_update_time | DateTime | When the model was last updated. |
Upload Python Models Example
The following example uploads a Python step model to a Wallaroo instance.
import pyarrow as pa
import wallaroo

# Connect to the Wallaroo instance (assumes the SDK is installed and configured).
wl = wallaroo.Client()

# Input fields submitted with the inference request.
input_schema = pa.schema([
    pa.field('id', pa.int64()),
    pa.field('date', pa.string()),
    pa.field('list_price', pa.float64()),
    pa.field('bedrooms', pa.int64()),
    pa.field('bathrooms', pa.float64()),
    pa.field('sqft_living', pa.int64()),
    pa.field('sqft_lot', pa.int64()),
    pa.field('floors', pa.float64()),
    pa.field('waterfront', pa.int64()),
    pa.field('view', pa.int64()),
    pa.field('condition', pa.int64()),
    pa.field('grade', pa.int64()),
    pa.field('sqft_above', pa.int64()),
    pa.field('sqft_basement', pa.int64()),
    pa.field('yr_built', pa.int64()),
    pa.field('yr_renovated', pa.int64()),
    pa.field('zipcode', pa.int64()),
    pa.field('lat', pa.float64()),
    pa.field('long', pa.float64()),
    pa.field('sqft_living15', pa.int64()),
    pa.field('sqft_lot15', pa.int64()),
    pa.field('sale_price', pa.float64())
])

# One fixed-size list of 18 float32 values per row, matching the 18 fields in _vars.
output_schema = pa.schema([
    pa.field('tensor', pa.list_(pa.float32(), list_size=18))
])

preprocess_model = wl.upload_model(
    "preprocess-step",
    "./models/preprocess_step.zip",
    framework=wallaroo.framework.Framework.PYTHON,
    input_schema=input_schema,
    output_schema=output_schema,
)
display(preprocess_model)
Name | preprocess-step |
Version | d0cb7d27-5c83-45c6-a231-e16c2c5818b9 |
File Name | preprocess_step.zip |
SHA | c09bbca6748ff23d83f48f57446c3ad6b5758c403936157ab731b3c269c0afb9 |
Status | ready |
Image Path | None |
Architecture | x86 |
Acceleration | none |
Updated At | 2024-03-Apr 18:11:34 |