Wallaroo SDK Upload Tutorials: Python Step Shape

How to upload different Python Step Shape models to Wallaroo.

The following tutorials cover how to upload sample Python Step Shape models into a Wallaroo instance.

Python scripts are uploaded to Wallaroo and treated like ML Models in Pipeline steps. These are referred to as Python steps.

Python steps can include:

  • Preprocessing steps that prepare received data to be handed to an ML Model deployed as another Pipeline step.
  • Postprocessing steps that take the data output by an ML Model in a Pipeline step and prepare it to be received by another data store or entity.
  • A model contained within a Python script.

In all of these cases, the requirements for uploading a Python script as an ML Model in Wallaroo are the same.

Parameter | Description
Web Site | https://www.python.org/
Supported Libraries | python==3.8
Framework | Framework.PYTHON aka python

Python models uploaded to Wallaroo are executed in the Wallaroo Containerized Runtime.

Note that Python models, also called “Python steps”, are standalone Python scripts that use Python libraries. They are commonly used for data formatting, such as preprocessing and postprocessing steps, and are also appropriate for simple models (such as ARIMA Statsmodels). A Wallaroo Python model can be composed of one or more Python scripts that match the Wallaroo requirements.

This is contrasted with Arbitrary Python models, also known as Bring Your Own Predict (BYOP), which allow for custom model inference methods with supporting scripts and artifacts. These are used with pre-trained models (PyTorch, TensorFlow, etc.) along with their supporting artifacts such as other Python modules, scripts, and model files.

Python Models Requirements

Python scripts packaged as Python models in Wallaroo have the following requirements.

  • At least one .py Python script file with the following:
    • Must be compatible with Python version 3.8.

    • Imports the mac.types.InferenceData included with the Wallaroo SDK. For example:

      from mac.types import InferenceData
      
    • Includes the following method as the entry point for Wallaroo model inferencing:

      def process_data(input_data: InferenceData) -> InferenceData:
          # additional code block here
      
      • Only one implementation of process_data(input_data: InferenceData) -> InferenceData is allowed. There can be as many Python scripts included in the .zip file as needed, but only one can have this method as the entry point.

        • The process_data function must return a dictionary where the keys are strings and the values are NumPy arrays. In the case of single values (scalars) these must be single-element arrays. For example:

          def process_data(input_data: InferenceData) -> InferenceData:
              # Return a dictionary whose field `output` is 10 raised to the power
              # of the input field `variable`, rounded to the nearest integer.
              return {
                  "output": np.rint(np.power(10, input_data["variable"]))
              }
          
      • InferenceData represents a dictionary of numpy arrays where the first dimension is always the batch size. The type annotations set in the input_schema and output_schema for the model when uploaded must be present and correct.

      • process_data accepts and returns InferenceData. Any other implementations will return an error.

    • (Optional): A requirements.txt file that includes any additional Python libraries required by the Python script with the following requirements:

      • The Python libraries must match the targeted infrastructure. For details on uploading models to specific infrastructures such as ARM, see Automated Model Packaging.
      • The Python libraries must be compatible with Python version 3.8.6 (python==3.8.6).
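
The return contract for process_data described above (string keys, NumPy array values, scalars wrapped in single-element arrays) can be checked locally before packaging. The following is a minimal sketch, not part of the Wallaroo SDK: the helper check_step_output, the field names variable and output, and the clipping logic are hypothetical, and the fallback alias for InferenceData is an assumption used only so the sketch runs where the SDK is not installed.

```python
from typing import Dict

import numpy as np

try:
    # Available where the Wallaroo SDK is installed.
    from mac.types import InferenceData
except ImportError:
    # Stand-in alias (an assumption) so the sketch also runs without the SDK.
    InferenceData = Dict[str, np.ndarray]


def process_data(input_data: InferenceData) -> InferenceData:
    # Hypothetical postprocessing: clip the field "variable" at zero.
    values = np.asarray(input_data["variable"], dtype=np.float64)
    return {"output": np.clip(values, 0.0, None)}


def check_step_output(result) -> None:
    """Hypothetical local check of the return contract; raises on violation."""
    if not isinstance(result, dict):
        raise TypeError("process_data must return a dictionary")
    for key, value in result.items():
        if not isinstance(key, str):
            raise TypeError(f"key {key!r} is not a string")
        if not isinstance(value, np.ndarray):
            raise TypeError(f"value for {key!r} is not a NumPy array")
        if value.ndim == 0:
            # Scalars must be wrapped, e.g. np.array([3.0]) instead of np.array(3.0).
            raise ValueError(f"value for {key!r} must have at least one dimension")


# A batch of three rows; the first dimension is the batch size.
result = process_data({"variable": np.array([-1.0, 2.0, 3.5])})
check_step_output(result)
```

A check like this only covers the dictionary shape; the input_schema and output_schema supplied at upload time still need to match the actual field types.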

The Python script, optional requirements.txt file, and artifacts are packaged in a .zip file with the Python script and optional requirements.txt file in the root folder. For example, the sample files stored in the folder preprocess_step:

/preprocess_step
    sample-script.py
    requirements.txt
    /artifacts
        datalist.csv

The files are packaged into a .zip file. For example, the following packages the contents of the folder preprocess_step into preprocess_step.zip.

zip -r preprocess_step.zip preprocess_step/*
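
Because the Python script and requirements.txt are expected in the root folder of the archive, it can help to control and inspect how entries are stored. The sketch below is an alternative to the zip command that uses Python's standard zipfile module; the function name package_step is hypothetical, and the approach assumes the output .zip is written outside the source folder.

```python
import zipfile
from pathlib import Path


def package_step(source_dir: str, zip_path: str) -> list:
    """Zip the contents of source_dir so its files sit at the archive root."""
    source = Path(source_dir)
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Store paths relative to source_dir, for example
                # "sample-script.py" rather than "preprocess_step/sample-script.py".
                arcname = path.relative_to(source).as_posix()
                archive.write(path, arcname)
                names.append(arcname)
    return names
```

Calling package_step("preprocess_step", "preprocess_step.zip") returns the stored entry names, which can be compared against zipfile.ZipFile("preprocess_step.zip").namelist() to confirm the layout.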

In the example below, the Python model is used as a preprocessing step for another ML model. It accepts as input the InferenceData submitted as part of an inference request. It then formats the data and outputs a dictionary of numpy arrays with the field tensor. This data can then be passed to the next model in a pipeline step.

import datetime
import logging

import numpy as np
import pandas as pd

import wallaroo

from mac.types import InferenceData

logger = logging.getLogger(__name__)

_vars = [
    "bedrooms",
    "bathrooms",
    "sqft_living",
    "sqft_lot",
    "floors",
    "waterfront",
    "view",
    "condition",
    "grade",
    "sqft_above",
    "sqft_basement",
    "lat",
    "long",
    "sqft_living15",
    "sqft_lot15",
    "house_age",
    "renovated",
    "yrs_since_reno",
]


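def process_data(input_data: InferenceData) -> InferenceData:
    # Load the incoming fields (a dictionary of arrays) into a DataFrame.
    input_df = pd.DataFrame(input_data)
    # Derive the house age from the current year and the year built.
    thisyear = datetime.datetime.now().year
    input_df["house_age"] = thisyear - input_df["yr_built"]
    # Flag houses that have a renovation year recorded.
    input_df["renovated"] = np.where((input_df["yr_renovated"] > 0), 1, 0)
    # Years between construction and renovation; 0 if never renovated.
    input_df["yrs_since_reno"] = np.where(
        input_df["renovated"],
        input_df["yr_renovated"] - input_df["yr_built"],
        0,
    )
    # Keep only the model's input columns, in the order listed in _vars.
    input_df = input_df.loc[:, _vars]

    # Return a single float32 array under the field "tensor".
    return {"tensor": input_df.to_numpy(dtype=np.float32)}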

In line with other Wallaroo inference results, the outputs of a Python step that returns a pandas DataFrame or Arrow Table will be listed in the out. metadata, with all inference outputs listed as out.{variable 1}, out.{variable 2}, etc. For example, a postprocessing Python step that is the final model step in a pipeline with the output field output is included in the out dataset as the field out.output in the Wallaroo inference result.

  | time | in.tensor | out.output | anomaly.count
0 | 2023-06-20 20:23:28.395 | [0.6878518042, 0.1760734021, -0.869514083, 0.3.. | [12.886651039123535] | 0

Python Libraries

Python libraries required by the included Python script are specified in the requirements.txt file included in the .zip file. These requirements and the versions of libraries should be exactly the same between creating the model and deploying it in Wallaroo.

The requirements.txt file specifies:

  • Python libraries available through PyPI (pypi.org), each with its specific version. For example:

    requests == 2.32.2
    
  • Python Wheels as Python model artifacts: Python Wheels included in the .zip file are referred to in the Python model’s requirements.txt file by their relative path within the .zip file.

    For example, if the Python model’s .zip file includes the Python Wheel libraries/custom_wheel.whl

    ├── libraries
    │   └── custom_wheel.whl
    ├── main.py
    └── requirements.txt
    

    Then the requirements.txt file included with the Python model’s .zip file refers to this Python Wheel as:

    libraries/custom_wheel.whl
    
  • External Python Wheels: Python Wheels that are available from external sources (that is, not included as Python model artifacts):

    • Must be referred to by the full URL.
    • Must be available from the Wallaroo instance.

    For example, to include the Python Wheel hosted at https://example.wallaroo.ai/libraries/custom_wheel.whl, the requirements.txt file included with the Python model’s .zip file refers to this Python Wheel as:

    https://example.wallaroo.ai/libraries/custom_wheel.whl
    
  • Extra Index URL: For Python libraries that require the --extra-index-url flag:

    • Set the --extra-index-url flag with the full URL to the extra index. This must be available from the Wallaroo instance.
    • In the next line, specify the Python library and version.
    • Repeat the steps above for each Python library with an extra index URL.

    For example, to include the extra index URL https://download.pytorch.org/whl/cu117 for the torchvision Python library, the requirements.txt file included with the Python model’s .zip file specifies the index and library as:

    --extra-index-url https://download.pytorch.org/whl/cu117
    torchvision==0.15.0
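
The four kinds of requirements.txt entries described above can be told apart by simple string rules, which is handy for reviewing a file before packaging. The following is a rough sketch only: classify_requirement and its labels are hypothetical and are not part of Wallaroo or pip, and the rules assume one entry per line as in the examples in this section.

```python
def classify_requirement(line: str) -> str:
    """Label one requirements.txt line with the entry type it appears to be."""
    entry = line.strip()
    if not entry or entry.startswith("#"):
        return "blank-or-comment"
    if entry.startswith("--extra-index-url"):
        # Applies to the library named on the following line.
        return "extra-index-url"
    if entry.startswith(("http://", "https://")):
        # Wheel hosted externally; must be reachable from the Wallaroo instance.
        return "external-wheel"
    if entry.endswith(".whl"):
        # Wheel bundled in the .zip file, given by its relative path.
        return "bundled-wheel"
    return "pypi-package"


SAMPLE = """\
requests == 2.32.2
libraries/custom_wheel.whl
https://example.wallaroo.ai/libraries/custom_wheel.whl
--extra-index-url https://download.pytorch.org/whl/cu117
torchvision==0.15.0
"""

for line in SAMPLE.splitlines():
    print(f"{classify_requirement(line):<16} {line}")
```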
    

Python Model Shape Upload to Wallaroo with the Wallaroo SDK Tutorial

How to upload a Python Step Shape model to Wallaroo via the SDK