Custom Model Best Practices

Best practices for developing, testing, and deploying Custom Models (BYOP) in Wallaroo.


The following guide details the best practices for Wallaroo’s Custom Model, also known as Bring Your Own Model (BYOP) framework. It outlines the recommended order of operations to test, validate, and deploy a BYOP model efficiently.

It is recommended to test scripts locally (or on a virtual machine) to verify the Custom Model works as expected before uploading to Wallaroo. This helps surface issues before the model is uploaded and deployed in Wallaroo.

Some of the troubleshooting checks require administrative access to the cluster hosting the Wallaroo installation and the kubectl command.

Custom Model Basics

Custom Model (BYOP) models are uploaded to Wallaroo via a ZIP file with the following components:

  • Python scripts (Required): .py files with classes that extend mac.inference.Inference and mac.inference.creation.InferenceBuilder. These base classes are included with the Wallaroo SDK. Further details are in Custom Model Script Requirements. Note that there are no naming requirements for the classes that extend mac.inference.Inference and mac.inference.creation.InferenceBuilder - any qualified class name is sufficient as long as these two classes are extended as required.
  • requirements.txt (Required): Sets the Python libraries used for the Custom Model. Best practice is to lock the requirements to a specific version; for example sample-library==1.2.1 instead of sample-library. These libraries should target Python 3.10 compliance. The libraries and their versions should be exactly the same between creating the model and deploying it in Wallaroo; this ensures the script and methods function exactly as they did during the model creation process. This file is required - even if the requirements.txt file is empty, the file must be present and must be the only requirements.txt file included in the ZIP file.
  • artifacts folder (Optional): The artifacts folder in the root directory of the ZIP file contains additional files, directories, etc. that are not part of the inference process but may be used for other purposes (other Python scripts, training artifacts, documentation, etc.). This directory is ignored by the Wallaroo validation process at model upload and is not required.
  • Additional files and folders (Optional): Other models, files, and artifacts used in support of this model. Note that any items outside of the root artifacts folder are subject to verification based on the rules above.

For full details on Custom Model requirements and use, see Custom Model.

Test the Custom Model

Before uploading the Custom Model to Wallaroo, follow these best practices to verify the Custom Model performs as expected.

Prepare the Testing Environment

This step prepares the testing environment, where the Custom Model is verified before uploading to Wallaroo. This environment is created via the following steps.

  1. Create a Virtual Environment. For this step, use conda or venv to create a new environment with the following conditions:
    1. Python Version: Set to Python 3.10 (e.g., 3.10.16).
    2. Wallaroo SDK: Install the wallaroo package via pip. Ensure the version matches your Wallaroo instance version.

For example, using conda, the command would be:

conda create -n "byop-test-environment" python=3.10

Then install Wallaroo with the command:

pip install wallaroo==2025.2.1

Prepare the Model Artifact

The BYOP framework handles most native Python code, but specific formatting is required for production stability. Use the following process to create the proper structure.

  1. Gather Packages: Create a requirements.txt file listing all necessary Python packages.
    • Note: Ensure these packages are compatible with Linux-based environments (packages with C-extensions must be Linux-compatible).
  2. Folder Structure: Create a clean working directory with the following structure:
    • byop/ (Contains your model script and artifacts)
      • requirements.txt: REQUIRED A list of the Python libraries this Custom Model requires that are not provided by default with the Wallaroo SDK. Best practice is to lock by version; for example sample-library==1.2.4 instead of sample-library (see the sample requirements.txt after this list).
      • artifacts/: OPTIONAL Additional files and folders that are not part of the inference process but are included for other purposes. Any files or folders in this directory are ignored by the Custom Model validation process.
    • data/ (Contains testing inputs/outputs)
    • schemas/ (Contains Arrow schema files)
    • deployment.ipynb (Or test.py for local orchestration)
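
The following is a minimal sample requirements.txt with pinned versions. The library names and versions here are illustrative assumptions; list only the packages the Custom Model actually imports, locked to the versions used during development.

# sample requirements.txt - pin each library to the exact version used during development
numpy==1.24.4
pandas==2.0.3
scikit-learn==1.3.2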

Prepare Test Data

For this step, it is highly recommended to use a Python script, for example test.py, that exercises the Custom Model's mac.inference.Inference and mac.inference.creation.InferenceBuilder extensions. For best results, define the inputs and expected outputs to ensure the “contract” is clear.

This script should include all steps for running the model locally, from loading and shaping data to performing the inference to dealing with outputs.

  • Input/Output Relationship: Ensure a strict 1:1 relationship between input rows and output rows. Wallaroo does not support many-to-one or one-to-many inference mapping in this context.
  • Data Format:
    • Format data as a dictionary of numpy arrays.
    • Each key represents a field name; the value is the numpy array of data.
    • This replicates the InferenceData format used by the Wallaroo backend (see the sketch after this list).
  • Action: Verify that the output data format from test.py agrees with all Wallaroo requirements before proceeding to the next step.
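
The following is a minimal sketch of that format and of the 1:1 shape check. The input_number and id field names and the sample values are illustrative assumptions, matching the sample test script later in this guide.

# the InferenceData-style format: a dictionary of numpy arrays, one entry per field
import numpy as np
import pandas as pd

input_df = pd.DataFrame({"input_number": [1, 2, 3], "id": [101, 102, 103]})
input_dictionary = {col: input_df[col].to_numpy() for col in input_df.columns}

# hypothetical results dictionary returned by the Custom Model's predict method
results = {"result": np.array([21, 22, 23]), "id": input_dictionary["id"]}

# verify the strict 1:1 relationship between input rows and output rows
assert all(len(value) == len(input_df) for value in results.values())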

Develop the Inference Builder Class

Create a script (e.g., custom_model.py) inside the byop folder. This will be the main entry point.

The InferenceBuilder class loads artifacts efficiently to prevent high latency during inference. To achieve this:

  1. Implement the create function.
    1. Inside create, call a helper function (e.g., load_artifacts()) to load heavy files: model weights, static lookup tables, explainers, or configuration files.
    2. Do not call the artifact-loading helper inside the predict function. Doing so causes the artifacts to reload on every single inference request, significantly degrading performance.
    3. Store these artifacts in a dictionary and assign them to inference.model, as illustrated in the sketch below.
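
The following is a minimal sketch of this pattern. The class names (SampleInferenceBuilder, SampleInference), the load_artifacts helper, and the artifact file names are illustrative assumptions rather than Wallaroo requirements; confirm the exact InferenceBuilder interface against Custom Model Script Requirements.

# a minimal, illustrative InferenceBuilder that loads artifacts once in create()
import json
from pathlib import Path

from mac.config.inference import CustomInferenceConfig
from mac.inference.creation import InferenceBuilder

# SampleInference is the Inference subclass sketched in the next section,
# assumed to be defined earlier in this same custom_model.py file.

class SampleInferenceBuilder(InferenceBuilder):
    @property
    def inference(self) -> "SampleInference":
        # return a new, empty Inference instance to be populated in create()
        return SampleInference()

    def create(self, config: CustomInferenceConfig) -> "SampleInference":
        inference = self.inference
        # load heavy artifacts once at model load time - never inside predict()
        inference.model = self.load_artifacts(config.model_path)
        return inference

    def load_artifacts(self, model_path: Path) -> dict:
        # illustrative helper: gather the model and a static lookup table into one dictionary
        import joblib  # hypothetical dependency pinned in requirements.txt

        artifacts = {}
        artifacts["model"] = joblib.load(model_path / "sample_model.pkl")  # hypothetical model file
        with open(model_path / "lookup_table.json") as f:  # hypothetical lookup table
            artifacts["lookup"] = json.load(f)
        return artifacts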

Develop the Inference Class

The Inference class manages the inference process, taking in the data, processing the inference request and returning the results.

The predict(input_data: InferenceData) -> InferenceData method is the entry point for execution and performs the following:

  • Retrieving artifacts from self.model['artifact_key'].
  • Accepting input data as a dictionary of numpy arrays.

The following are best practices for the Inference class (see the sketch after this list):

  • Code Organization:
    • Keep the predict method readable.
    • Extract complex logic into helper functions located in a separate script (e.g., utils.py) within the byop folder.
  • Error Handling & Logging:
    • Wrap model steps in try/except blocks.
    • Import the traceback module.
    • Use a logger (instantiated at the script level) to capture errors: logger.error(traceback.format_exc()).
    • Timing: Log the execution time of critical functions to assist with latency debugging.
  • Return Values:
    • Return a dictionary of numpy arrays.
    • Ensure data types match the expected schema.
    • Data cannot be nested dictionaries.
      • Nested JSON should be converted to a string and returned.
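
The following is a minimal sketch of an Inference class that follows these practices. The class name SampleInference, the "model" artifact key, the field names, and the model call are illustrative assumptions matching the sample data used later in this guide; any additional members required by the mac.inference.Inference base class (see Custom Model Script Requirements) are omitted for brevity.

# a minimal, illustrative Inference class following the practices above
import logging
import time
import traceback

import numpy as np
from mac.inference import Inference
from mac.types import InferenceData

# logger instantiated at the script level
logger = logging.getLogger(__name__)

class SampleInference(Inference):
    def predict(self, input_data: InferenceData) -> InferenceData:
        start = time.perf_counter()
        try:
            # retrieve artifacts loaded by the InferenceBuilder (self.model['artifact_key'])
            model = self.model["model"]

            # accept input data as a dictionary of numpy arrays
            inputs = input_data["input_number"]

            # hypothetical model call; complex logic belongs in helpers (e.g., utils.py)
            raw_results = model.predict(inputs.reshape(-1, 1))

            # log execution time to assist with latency debugging
            logger.info("prediction completed in %.3f seconds", time.perf_counter() - start)

            # return a dictionary of numpy arrays with a strict 1:1 row mapping;
            # nested structures must be converted to strings before being returned
            return {
                "result": np.asarray(raw_results).reshape(-1),
                "id": input_data["id"],
            }
        except Exception:
            logger.error(traceback.format_exc())
            raise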

Local Verification

Use the test.py script to verify the sample Custom Model script as follows:

  • Convert Results: In your test.py, convert the result dictionary to a Pandas DataFrame for easy inspection.
  • Shape Check: Verify the input row count matches the output row count exactly.
  • Value Check: Validate that the inference results match expected values.

Sample Test Script

The following example shows a test script that performs the following:

  • Loads sample data.
  • Loads any modules used by the Custom Model.
  • Executes an inference request and displays the results and the data shape.
# import libraries
from pathlib import Path
from mac.config.inference import CustomInferenceConfig
from byop.byop import BYOPInferenceBuilder
import pandas as pd
import pyarrow as pa

# create the input dataframe and convert to dictionary for testing
input_df = pd.DataFrame({"input_number": [1,2,3],
                             "id": [20000000004093819,20012684296980773,481562342]
                            })

input_dictionary = {
        col: input_df[col].to_numpy() for col in input_df.columns
    }

print(input_dictionary)


# prepare the BYOP and import any modules
builder = BYOPInferenceBuilder()
config = CustomInferenceConfig(
    framework="custom", 
    model_path=Path("./byop/"), modules_to_include={"*.py"}
)

# create the BYOP object
inference = builder.create(config)

# run a simulated inference
results = inference.predict(input_data=input_dictionary)
print(results)

# Schema Generation

# convert results into a dataframe
results_df = pd.DataFrame({
    key : value.tolist() for key, value in results.items()
    })
input_schema = pa.Schema.from_pandas(input_df).remove_metadata()
output_schema = pa.Schema.from_pandas(results_df).remove_metadata()
print(input_schema)
print(output_schema)

The following shows the results of this script:

python test.py
{'input_number': array([1, 2, 3]), 'id': array([20000000004093819, 20012684296980773,         481562342])}
INFO     byop.byop - INFO: Starting prediction process                                                                                byop.py:36
INFO     byop.byop - INFO: Gathering of input data features: 2                                                                        byop.py:38
INFO     byop.byop - INFO: Converting input data to DataFrame                                                                         byop.py:41
INFO     byop.byop - INFO: Running model prediction                                                                                   byop.py:48
INFO     byop.byop - INFO: Predictions completed.                                                                                     byop.py:61
{'result': array([21, 22, 23]), 'id': array([20000000004093819, 20012684296980773,         481562342])}
input_number: int64
id: int64
result: int64
id: int64

Schema Generation

Once the test script verifies the Custom Model works as defined:

  • Generate the Input and Output schemas using the verified data.
  • Save these as PyArrow schemas in your schemas folder (see the sketch after this list).
    • These schemas are mandatory during the Wallaroo model upload process.
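
The following is a minimal sketch of this step, assuming the field names from the sample script above; the explicit schema definitions and the .arrow file names are illustrative.

# define the input and output schemas matching the verified test data, then
# persist them to the schemas/ folder for reuse at upload time
import pyarrow as pa

input_schema = pa.schema([
    pa.field("input_number", pa.int64()),
    pa.field("id", pa.int64()),
])
output_schema = pa.schema([
    pa.field("result", pa.int64()),
    pa.field("id", pa.int64()),
])

with open("schemas/input_schema.arrow", "wb") as f:
    f.write(input_schema.serialize())
with open("schemas/output_schema.arrow", "wb") as f:
    f.write(output_schema.serialize())

# read a schema back before calling upload_model
with open("schemas/input_schema.arrow", "rb") as f:
    input_schema = pa.ipc.read_schema(pa.py_buffer(f.read()))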


Packaging and Deployment

With the Custom Model testing complete, it is ready for packaging and uploading to Wallaroo.

Zipping the Artifacts

Custom Model (BYOP) models are uploaded to Wallaroo via a ZIP file containing the components described in Custom Model Basics above: the Python script that extends mac.inference.Inference and mac.inference.creation.InferenceBuilder, the requirements.txt file, the optional artifacts folder, and any additional supporting files and folders.

The following shows a sample Custom Model file structure and its artifacts, and the zip command for packaging them.

sample_model/
   requirements.txt
   sample_model.h5
   custom_model.py
   artifacts/
      additional_file.txt
      another_file.txt
zip -r sample_model.zip sample_model/*

Once zipped, use the Model Upload guide to upload the model into Wallaroo. The following is a sample of that procedure using the Wallaroo SDK, the model name sample-model, and the model file sample_model.zip; it assumes the input_schema and output_schema variables are properly defined.

# import the Wallaroo SDK
import wallaroo

# set the Wallaroo client
wl = wallaroo.Client()

# upload the model
model = wl.upload_model("sample-model", 
                    path="sample_model.zip", 
                    framework=wallaroo.framework.Framework.CUSTOM, 
                    input_schema=input_schema,
                    output_schema=output_schema)

Model Upload Troubleshooting

Use the following guides to resolve issues that may arise. Some of these require use of the kubectl terminal application and administrative access to the cluster hosting the Wallaroo installation.

Namespace Check

If deployment stalls, run kubectl get ns. Look for a namespace named task-{number} (usually the most recent one). Use the kubectl command to get the pods and show the logs. For example:

Get the pods in the namespace:

kubectl get pods -n task-sample-namespace

Get the logs of a specific pod. The -f option continues following the log output:

kubectl logs -f sample-pod-name -n task-sample-namespace

Deployment and Inference Testing

With the model uploaded, the next step is testing the actual deployment of the model. Deploying a model follows the same procedure as Model Deploy:

  • Create a pipeline and assign the model as a pipeline step.
  • Define the deployment configuration: how many CPUs, how much memory, etc. will be allocated. Custom Models are assigned to the Wallaroo Containerized Runtime, and their resource allocations are set using the sidekick methods (sidekick_cpus, sidekick_memory, etc.) as defined in Deployment Configuration with the Wallaroo SDK.
  • Issue the deploy command with the deployment configuration.

The following is a simplified version of this process for a sample Custom Model previously uploaded, with the model version saved to the variable model. The model is assigned 4 CPUs and 2 Gi of memory through the sidekick_cpus and sidekick_memory options.

Note that this model is a sample - larger models may require more resources, GPUs, or other requirements.

# import the deployment configuration builder
from wallaroo.deployment_config import DeploymentConfigBuilder

# set the deployment configuration
deployment_config = DeploymentConfigBuilder() \
    .cpus(1).memory('2Gi') \
    .sidekick_cpus(model, 4) \
    .sidekick_memory(model, '2Gi') \
    .build()

# build the pipeline and assign the model as a pipeline step
pipeline = wl.build_pipeline("sample-pipeline")
pipeline.add_model_step(model)

# deploy the pipeline with the deployment configuration
pipeline.deploy(deployment_config=deployment_config)

Verify the pipeline is deployed via the pipeline.status() method. If there are no issues, the deployment status first shows Starting, and when deployment is complete the status is Running. If there are errors, the status is Error. The following is an example of a successful deployment. Note that sample-model appears under the model_statuses and sidekicks sections.

pipeline.status()

{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.240.5.6',
   'name': 'engine-7bd8d4664d-69qfx',
   'status': 'Running',
   'reason': None,
   'details': [],
   'pipeline_statuses': {'pipelines': [{'id': 'sample-pipeline',
      'status': 'Running',
      'version': 'c27736f6-0ee2-4ca0-9982-9845d2d5f756'}]},
   'model_statuses': {'models': [{'name': 'sample-model',
      'sha': 'ffa1a170b0e1628924c18a96c44f43c8afef1e535e378c2eb071a61dd282c669',
      'status': 'Running',
      'version': '4d3f402d-e242-409f-8678-29c18f59a4a8'}]}}],
 'engine_lbs': [{'ip': '10.240.5.7',
   'name': 'engine-lb-776bbf49b9-rb5mt',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': [{'ip': '10.240.5.8',
   'name': 'engine-sidekick-sample-model-99-55d95d96f5-gjml9',
   'status': 'Running',
   'reason': None,
   'details': [],
   'statuses': '\n'}]}

Troubleshooting Custom Model Deployment

If an error is returned, one way to check for errors is to review the Kubernetes logs through the following procedure:

  1. Identify the namespace for the pipeline in the format {pipeline-name}-{id}. For example, the pipeline sample-pipeline has the namespace sample-pipeline-48:

    kubectl get namespaces
    NAME                           STATUS        AGE
    default                        Active        153d
    edge-pipeline-6                Terminating   22m
    kube-node-lease                Active        153d
    kube-public                    Active        153d
    kube-system                    Active        153d
    sample-pipeline-48             Active        53s
    velero                         Active        140d
    wallaroo                       Active        4d2h
    
  2. Check the pods and identify the sidekick pod - this is the pod the Custom Model is run in.

    kubectl get pods -n sample-pipeline-48
    NAME                         READY   STATUS        RESTARTS   AGE
    engine-85cb7b6bd8-bqf6v                          1/1     Running       0          17s
    engine-d4cfb84c8-fs89h                           1/1     Terminating   0          2m55s
    engine-lb-584f54c899-qj55z                       1/1     Running       0          2m55s
    helm-runner-s5t5k                                0/1     Completed     0          2m56s
    engine-sidekick-sample-model-75dc59c578-sld6m    1/1     Running       0          2m56s
    
  3. Check the sidekick pod logs via the kubectl logs command. The -f option continues following the log output. For example:

    kubectl logs -f engine-sidekick-sample-model-75dc59c578-sld6m -n sample-pipeline-48
    

Additional logging options are described in The Wallaroo.AI Cheat Sheet: Retrieve Pipeline System Logs.

Inference Testing

After uploading to Wallaroo, perform sample inference tests using the same input data as the local tests, and verify the data returned has the same shape and format as the local test. Note that the local test inputs and outputs are a dictionary of numpy arrays, while Wallaroo inference inputs and outputs are either pandas DataFrames or Apache Arrow Tables. For example:

Local Inference Test:

# run a simulated inference
results = inference.predict(input_data=input_dictionary)
print(results)

{'result': array([21, 22, 23]),
 'id': array([20000000004093819, 20012684296980773,         481562342])}

Wallaroo Inference Test

print(pipeline.infer(input_df))
                      time              in.id  in.input_number             out.id  out.result  anomaly.count
0  2026-01-28 21:51:37.363  20000000004093819                1  20000000004093819          21              0
1  2026-01-28 21:51:37.363  20012684296980773                2  20012684296980773          22              0
2  2026-01-28 21:51:37.363          481562342                3          481562342          23              0

Inference Troubleshooting

The following tips can help resolve inference troubleshooting issues.

  1. Status Check: The pipeline status, checked via the SDK command pipeline.status() (replace pipeline with the name of the variable used in the SDK), may report Running before the sidekick is fully ready. If the first inference fails immediately, wait 60 seconds and retry.

  2. Failure Analysis: If an inference request returns an error, check the sidekick pod logs (see Troubleshooting Custom Model Deployment above) for errors raised by the Custom Model.

  3. Connectivity: If the sidekick shows no activity, check the engine-lb (Load Balancer) and engine pod logs for errors. This usually indicates a data formatting issue preventing the request from reaching the sidekick. For example:

    kubectl get pods -n sample-pipeline-48
    NAME                         READY   STATUS        RESTARTS   AGE
    engine-85cb7b6bd8-bqf6v                          1/1     Running       0          17s
    engine-lb-584f54c899-qj55z                       1/1     Running       0          2m55s
    helm-runner-s5t5k                                0/1     Completed     0          2m56s
    engine-sidekick-sample-model-75dc59c578-sld6m    1/1     Running       0          2m56s
    

    Get the engine logs:

    kubectl logs engine-85cb7b6bd8-bqf6v -n sample-pipeline-48
    

    Get the engine-lb logs:

    kubectl logs engine-lb-584f54c899-qj55z -n sample-pipeline-48
    
