Custom Model Best Practices Tutorial

How to use Wallaroo best practices to build a sample Custom Model.

This tutorial can be downloaded as part of the Wallaroo Tutorials repository.

The following tutorial demonstrates Wallaroo Custom Model (also known as Bring Your Own Predict, or BYOP) best practices. It is meant as a companion to the Custom Model Best Practices guide.

This tutorial demonstrates:

  • Creating a Wallaroo Custom Model environment.
  • Creating the Wallaroo Custom Model “byop-sample”.
  • Performing a test inference with the new Custom Model to verify the model works before uploading.
  • Uploading the Custom Model to Wallaroo, performing a sample inference, and verifying the results match the pre-upload inference test.

Tutorial Resources

The following resources are available from the Wallaroo GitHub repository with this tutorial:

  • byop-sample.zip - The sample Custom Model with all artifacts needed for this tutorial.
  • test.py - The sample test script for verifying the Custom Model.
  • upload.py - The sample upload script that uploads the Custom Model to Wallaroo and performs sample inferences.

Build the Custom Model

To build the custom model, use the following folder structure for the model onboarding, development, and test process, as detailed in Prepare the Testing Environment.

  • byop/ (Contains your model script and artifacts that are uploaded to Wallaroo.)
  • data/ (Contains testing inputs/outputs)
  • schemas/ (Contains Arrow schema files)

For example:

├── byop # the Custom Model 
│   ├── artifacts
│   │   ├── sample_file.txt
│   │   └── sample_file2.txt
│   ├── byop.py
│   ├── custom_packages
│   │   └── custom_script.py
│   └── requirements.txt
├── data # the sample data
│   ├── byop_results.csv
│   └── input_data.parquet
├── schemas # the schemas
│   ├── input_schema.pkl
│   └── output_schema.pkl
├── test.py # test script
└── upload.py # test upload script

Develop the Inference Builder Class

Create a script inside the byop folder (byop.py in this tutorial). This will be the main entry point.

The InferenceBuilder class loads artifacts efficiently to prevent high latency during inference. To achieve this:

  1. Implement the create function.
    1. Inside create, call a helper function (e.g., load_artifacts()) to load heavy files: model weights, static lookup tables, explainers, or configuration files.
    2. Do not call the load_artifacts method in the predict function. Doing so causes the artifacts to reload on every inference request, significantly degrading performance.
    3. Store these artifacts in a dictionary and assign them to inference.model.

The following is the example from the byop/byop.py file.

class BYOPInferenceBuilder(InferenceBuilder):
    @property
    def inference(self) -> BYOPInference:
        return BYOPInference

    def create(self, config: CustomInferenceConfig) -> BYOPInference:
        inference = self.inference()

        # load the model artifacts once, at creation time,
        # so they are not reloaded on every inference request
        model = self._load_models(config.model_path)
        inference.model = model

        return inference

    def _load_models(self, model_path: Path):
        # this sample returns a placeholder; a real model loads weights,
        # lookup tables, or configuration files from model_path here
        return {
            'dummy_model': "wallaroo",
        }
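
In a real model, the helper called from create reads heavy artifacts from model_path once so they are not reloaded per request. The following is a minimal sketch of such a load_artifacts helper, assuming hypothetical artifact files model.pkl and lookup.json inside the BYOP folder; substitute your own file names and loaders.

import json
import pickle
from pathlib import Path

def load_artifacts(model_path: Path) -> dict:
    # hypothetical artifact names for illustration only
    with open(model_path / "model.pkl", "rb") as f:
        model_weights = pickle.load(f)
    with open(model_path / "lookup.json", "r") as f:
        lookup_table = json.load(f)

    # store everything in one dictionary so the Inference class can
    # retrieve artifacts via self.model['artifact_key']
    return {
        "model": model_weights,
        "lookup": lookup_table,
    }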

Develop the Inference Class

The Inference class manages the inference process: taking in the data, processing the inference request, and returning the results.

The predict(input_data: InferenceData) -> InferenceData method (implemented as _predict in the example below) is the entry point for execution and performs the following:

  • Retrieves artifacts from self.model['artifact_key'].
  • Accepts input data as a dictionary of NumPy arrays.

The following are best practices for the Inference class (a sketch of the logging, timing, and JSON-flattening practices follows this list):

  • Code Organization:
    • Keep the predict method readable.
    • Extract complex logic into helper functions located in a separate script (e.g., utils.py) within the byop folder.
  • Error Handling & Logging:
    • Wrap model steps in try/except blocks.
    • Import the traceback module.
    • Use a logger (instantiated at the script level) to capture errors: logger.error(traceback.format_exc()).
    • Timing: Log the execution time of critical functions to assist with latency debugging.
  • Return Values:
    • Return a dictionary of numpy arrays.
    • Ensure data types match the expected schema.
    • Data cannot be nested dictionaries.
      • Nested JSON should be converted to a string and returned.
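
A minimal sketch of the logging, timing, and JSON-flattening practices above; the timed decorator and nested_result names are illustrative additions, not part of the sample model.

import json
import logging
import time
import traceback

# logger instantiated at the script level, per the practices above
logger = logging.getLogger(__name__)

def timed(fn):
    # log the execution time of critical functions to assist with latency debugging
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        except Exception:
            logger.error(traceback.format_exc())
            raise
        finally:
            logger.info(f"{fn.__name__} took {time.perf_counter() - start:.4f}s")
    return wrapper

# nested dictionaries cannot be returned directly;
# convert nested JSON to a string before returning it
nested_result = {"scores": {"class_a": 0.9, "class_b": 0.1}}
flattened_result = json.dumps(nested_result)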

The following is the example from the byop/byop.py file.

class BYOPInference(Inference):
    @property
    def expected_model_types(self) -> Set[Any]:
        return {Dict}

    @Inference.model.setter
    def model(self, model) -> None:
        self._model = model

    def _predict(self, input_data: InferenceData) -> InferenceData:
        logger.info("Starting prediction process")
        try:
            logger.info(f"Gathering of input data features: {len(input_data)}")
            results = []

            logger.info("Converting input data to DataFrame")
            df = pd.DataFrame({
                key: value.tolist() for key, value in input_data.items()
            })

            try:
                # --- Run model prediction ---
                logger.info("Running model prediction")
                for index, row in df.iterrows():
                    input_number = row['input_number']
                    result = complex_algorithm(input_number)
                    results.append(result)
            except Exception as e:
                logger.error(f"Error during model prediction: {e}")
                logger.error(traceback.format_exc())
                raise

            logger.info("Predictions completed.")

            # return a dictionary of numpy arrays matching the output schema
            output_dictionary = {
                "result": np.array(results, dtype=np.int64),
                "id": np.array(input_data["id"].tolist(), dtype=np.int64)
            }

            return output_dictionary

        except Exception as e:
            logger.error(f"Error during prediction: {e}")
            logger.error(traceback.format_exc())
            raise
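
The loop above calls complex_algorithm, the helper extracted into custom_packages/custom_script.py per the folder structure. The source of that script is not shown here; based on the sample results later in this tutorial (inputs 1, 2, 3 yield 21, 22, 23), a hypothetical stand-in might look like the following. The offset of 20 is inferred from those outputs, not confirmed by the source file.

# custom_packages/custom_script.py (hypothetical stand-in, not the actual source)

def complex_algorithm(input_number: int) -> int:
    # inferred from the tutorial's sample outputs (1 -> 21, 2 -> 22, 3 -> 23);
    # the real helper may implement entirely different logic
    return input_number + 20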

Local Verification

Before uploading to Wallaroo, we run test inferences locally to verify the logic works and that the inputs and outputs match the expected shapes, as detailed in Local Verification.

The scripts used below are contained in the sample test.py script; the following code sections show the segments of that script and their function.

We start by importing the Python libraries used for the Custom Model.

from pathlib import Path
from mac.config.inference import CustomInferenceConfig
from byop.byop import BYOPInferenceBuilder
import pandas as pd
import pyarrow as pa

For the next step, set the sample input data as a DataFrame and as a dictionary.

For our verification, we execute the Custom Model’s predict code with sample data and verify the results. This data is in two formats:

  • DataFrame: The format accepted by models deployed in Wallaroo.
  • Dictionary of NumPy arrays: Custom Models accept a dictionary of NumPy values.

Wallaroo accepts either pandas DataFrames or Apache Arrow tables, then converts them into a dictionary of NumPy values for fast inference results.

input_df = pd.DataFrame({
    "input_number": [1, 2, 3],
    "id": [20000000004093819, 20012684296980773, 481562342]
})

input_dictionary = {
    col: input_df[col].to_numpy() for col in input_df.columns
}

print(input_df)
print(input_dictionary)
   input_number                 id
0             1  20000000004093819
1             2  20012684296980773
2             3          481562342
{'input_number': array([1, 2, 3]), 'id': array([20000000004093819, 20012684296980773,         481562342])}

Test the Custom Model by supplying the sample data as a dictionary of numpy values and display the results.

# prepare the BYOP and import any modules
builder = BYOPInferenceBuilder()
config = CustomInferenceConfig(
    framework="custom", 
    model_path=Path("./byop/"), modules_to_include={"*.py"}
)

# create the BYOP object
inference = builder.create(config)

# run a simulated inference
results = inference.predict(input_data=input_dictionary)
display(results)

# convert results into a dataframe
results_df = pd.DataFrame({
    key : value.tolist() for key, value in results.items()
    })
INFO     byop.byop - INFO: Starting prediction process                                                   byop.py:36
INFO     byop.byop - INFO: Gathering of input data features: 2                                           byop.py:38
INFO     byop.byop - INFO: Converting input data to DataFrame                                            byop.py:41
INFO     byop.byop - INFO: Running model prediction                                                      byop.py:48
INFO     byop.byop - INFO: Predictions completed.                                                        byop.py:61
{'result': array([21, 22, 23]),
 'id': array([20000000004093819, 20012684296980773,         481562342])}

Schema Generation

The values match the expected data types and shapes. We use the test DataFrames to generate the input and output schemas used later during the model upload process.

input_schema = pa.Schema.from_pandas(input_df).remove_metadata()
output_schema = pa.Schema.from_pandas(results_df).remove_metadata()

print(input_schema)
print(output_schema)
input_number: int64
id: int64
result: int64
id: int64
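
Per the folder structure above, these schemas are stored as schemas/input_schema.pkl and schemas/output_schema.pkl. The following is a minimal sketch of persisting them with pickle (one approach; PyArrow schemas are picklable).

import pickle
from pathlib import Path

# persist the schemas so upload.py can reload them later,
# matching the schemas/*.pkl files in the folder structure above
schema_dir = Path("./schemas")
schema_dir.mkdir(exist_ok=True)

with open(schema_dir / "input_schema.pkl", "wb") as f:
    pickle.dump(input_schema, f)

with open(schema_dir / "output_schema.pkl", "wb") as f:
    pickle.dump(output_schema, f)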

Upload Custom Model and Test Inference

With the sample Custom Model executing without errors, it is packaged and uploaded to Wallaroo for testing.

Packaging and Deployment

Custom Models are packaged as ZIP files via the following procedure as detailed in Packaging and Deployment.

  • Include the following structure:
    • Main Python script (entry point). In this example, byop.py.
    • requirements.txt: Specifies the Python libraries required for the Custom Model’s inference. Only one is allowed, in the top level folder of the ZIP file. For this example the requirements.txt file is empty, but the file is required even if empty.
    • artifacts: An optional folder in the top level folder of the ZIP file. Any files in the artifacts folder are ignored by the Wallaroo Custom Model validation process (including other Python scripts, requirements.txt files, or other artifacts). The artifacts folder is for contents that are not required for the inference process but may be needed for other Custom Model functions.
    • Any additional helper scripts, modules, etc.

The following shows a sample Custom Model and its artifacts, and the zip command for packaging them.

byop
├── artifacts
│   ├── sample_file.txt
│   └── sample_file2.txt
├── byop.py
├── custom_packages
│   └── custom_script.py
└── requirements.txt

The zip command brings all of these together into a single ZIP file:

zip -r byop-sample.zip byop/*
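
If you prefer to package from Python instead of the shell, the standard library’s shutil can build an equivalent archive; a minimal sketch:

import shutil

# create byop-sample.zip with byop/ as the archive's top-level folder,
# mirroring the zip command above
shutil.make_archive("byop-sample", "zip", root_dir=".", base_dir="byop")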

Sample Script

The file upload.py contains the scripts below for easier porting to a test environment.

Import Wallaroo Libraries

The following libraries are used for interacting with Wallaroo.

# disable logging output from byop imports
import logging
logging.disable(logging.CRITICAL) 

import numpy as np
import pandas as pd
import pyarrow as pa
import wallaroo

from wallaroo.deployment_config import DeploymentConfigBuilder
from wallaroo.framework import Framework

Open a Connection to Wallaroo

The next step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and is available through the JupyterHub interface provided with your Wallaroo environment.

This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.

If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). For more details on logging in through Wallaroo, see the Wallaroo SDK Essentials Guide: Client Connection.

wl = wallaroo.Client()

Create Workspace

We’ll set the name of our workspace, then create the Wallaroo workspace to store our model and set it as the current workspace. Future commands will default to this workspace for pipeline creation, model uploads, etc.

workspace_name = 'sample-byop-best-practices'
workspace = wl.get_workspace(name=workspace_name, create_if_not_exist=True)
wl.set_current_workspace(workspace)
{'name': 'sample-byop-best-practices', 'id': 1848, 'archived': False, 'created_by': 'john.hummel@wallaroo.ai', 'created_at': '2026-01-28T21:36:07.305247+00:00', 'models': [{'name': 'sample-byop-model', 'versions': 6, 'owner_id': '""', 'last_update_time': datetime.datetime(2026, 2, 3, 19, 14, 30, 946549, tzinfo=tzutc()), 'created_at': datetime.datetime(2026, 1, 28, 21, 43, 36, 253023, tzinfo=tzutc())}], 'pipelines': [{'name': 'byop-sample-pipeline', 'create_time': datetime.datetime(2026, 1, 28, 21, 47, 51, 60242, tzinfo=tzutc()), 'definition': '[]'}]}

Upload Custom Model

Custom Models are uploaded to Wallaroo through the Wallaroo Client upload_model method. For more details, see the Model Upload guide.

The framework is Framework.CUSTOM for arbitrary Python models, and we’ll specify the input and output schemas for the upload.

model_name = "sample-byop-model"
model_file_name = "./byop-sample.zip"

model = wl.upload_model(model_name, 
                        model_file_name, 
                        framework=Framework.CUSTOM, 
                        input_schema=input_schema, 
                        output_schema=output_schema,
                        convert_wait=True)
model
Waiting for model loading - this will take up to 10min.
Model is pending loading to a container runtime.
Model is attempting loading to a container runtime....
Successful
Ready
Name            sample-byop-model
Version         3ca8a508-ae2e-46d1-8378-9840632f3210
File Name       byop-sample.zip
SHA             9e300129dd5a8d480a7f1b7dea98fac57f8819338568e471a7bd75e527c7564e
Status          ready
Error Summary   None
Image Path      proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs/mac-deploy:v2025.2.2-6527
Architecture    x86
Acceleration    none
Updated At      2026-28-Jan 21:44:10
Workspace id    1848
Workspace name  sample-byop-best-practices
print(model)
{'name': 'sample-byop-model', 'version': '3ca8a508-ae2e-46d1-8378-9840632f3210', 'file_name': 'byop-sample.zip', 'image_path': 'proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs/mac-deploy:v2025.2.2-6527', 'arch': 'x86', 'accel': 'none', 'last_update_time': datetime.datetime(2026, 1, 28, 21, 44, 10, 398110, tzinfo=tzutc())}

Deploy Pipeline

The model is uploaded and ready for use. We’ll add it as a step in our pipeline, then deploy the pipeline. For this example, we allocate 1 CPU and 1 Gi of RAM to the model through the pipeline’s deployment configuration.

pipeline = wl.build_pipeline("byop-sample-pipeline")

pipeline.clear()
pipeline.add_model_step(model)
name            byop-sample-pipeline
created         2026-01-28 21:47:51.060242+00:00
last_updated    2026-02-03 19:16:24.812407+00:00
deployed        False
workspace_id    1848
workspace_name  sample-byop-best-practices
arch            x86
accel           none
tags
versions        bf3877be-2066-4068-bfdb-c16493085273, 82bdc05e-0845-4d86-96ec-91c78b0c7788, d5db77b6-60aa-4b5a-b0ec-39a42dd4b6f2, 3aaa3a30-7d1b-4f5b-b953-cba8827b66d1, 1187fe3e-b6a3-47e4-a246-794cc9c65fb6, 308e5025-2979-444a-b7eb-fb702fb85095, 2a12a336-890e-4867-bd6f-cca5a03aa2f4, abd02b84-b9d1-401d-9499-8b23622c4153, c4b65d4d-eac6-4bc9-bcac-3eaf559468cd, 7efdf762-6ab1-4591-ad36-a470de7b8c67, 2922cc06-62f2-47cf-8012-2de7221a265c, 6a9a876b-d2f5-4117-8854-a7ad08e1f34e
steps           sample-byop-model
published       False
deployment_config = DeploymentConfigBuilder() \
    .cpus(0.25).memory('512Mi') \
    .sidekick_cpus(model, 1) \
    .sidekick_memory(model, '1Gi') \
    .build()

pipeline.deploy(deployment_config=deployment_config, wait_for_status=False)
Deployment initiated for byop-sample-pipeline. Please check pipeline status.
name            byop-sample-pipeline
created         2026-01-28 21:47:51.060242+00:00
last_updated    2026-01-28 21:48:00.033703+00:00
deployed        True
workspace_id    1848
workspace_name  sample-byop-best-practices
arch            x86
accel           none
tags
versions        2922cc06-62f2-47cf-8012-2de7221a265c, 6a9a876b-d2f5-4117-8854-a7ad08e1f34e
steps           sample-byop-model
published       False
pipeline.status()
{'status': 'Starting',
 'details': ['Scaling'],
 'engines': [{'ip': '10.4.3.78',
   'name': 'engine-6bdbbc5bfb-lvf6l',
   'status': 'Running',
   'reason': None,
   'details': ['containers with unready status: [engine]',
    'containers with unready status: [engine]'],
   'pipeline_statuses': None,
   'model_statuses': None}],
 'engine_lbs': [{'ip': '10.4.3.77',
   'name': 'engine-lb-d579789c7-wkd7f',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': [{'ip': '10.4.3.79',
   'name': 'engine-sidekick-sample-byop-model-1398-86c8dc6968-fbbv4',
   'status': 'Failed',
   'reason': None,
   'details': ['containers with unready status: [engine-sidekick-sample-byop-model-1398]',
    'containers with unready status: [engine-sidekick-sample-byop-model-1398]'],
   'statuses': None}]}
import time
time.sleep(15)

while pipeline.status()['status'] != 'Running':
    time.sleep(15)
    print("Waiting for deployment.")
    display(pipeline.status()['status'])
pipeline.status()['status']
'Running'

Run inference

Everything is in place, so we’ll now run a sample inference with the same data used earlier, now in a pandas DataFrame format.

print(input_df)
   input_number                 id
0             1  20000000004093819
1             2  20012684296980773
2             3          481562342

Print the inference results and verify the values match the local inference values.

print(pipeline.infer(input_df))
                     time              in.id  in.input_number             out.id  out.result  anomaly.count
0 2026-01-28 21:51:37.363  20000000004093819                1  20000000004093819          21              0
1 2026-01-28 21:51:37.363  20012684296980773                2  20012684296980773          22              0
2 2026-01-28 21:51:37.363          481562342                3          481562342          23              0

   result                 id
0      21  20000000004093819
1      22  20012684296980773
2      23          481562342
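
A quick programmatic check is also possible; the following is a minimal sketch, assuming results_df from the Local Verification step is available in the same session.

# confirm the deployed pipeline's outputs match the local pre-upload results
deployed_results = pipeline.infer(input_df)

assert deployed_results["out.result"].tolist() == results_df["result"].tolist()
assert deployed_results["out.id"].tolist() == results_df["id"].tolist()
print("Deployed inference matches the local verification results.")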

Undeploy Pipelines

The inference is successful, so we will undeploy the pipeline and return the resources to the cluster.

pipeline.undeploy()
name            byop-sample-pipeline
created         2026-01-28 21:47:51.060242+00:00
last_updated    2026-02-03 19:16:24.812407+00:00
deployed        False
workspace_id    1848
workspace_name  sample-byop-best-practices
arch            x86
accel           none
tags
versions        bf3877be-2066-4068-bfdb-c16493085273, 82bdc05e-0845-4d86-96ec-91c78b0c7788, d5db77b6-60aa-4b5a-b0ec-39a42dd4b6f2, 3aaa3a30-7d1b-4f5b-b953-cba8827b66d1, 1187fe3e-b6a3-47e4-a246-794cc9c65fb6, 308e5025-2979-444a-b7eb-fb702fb85095, 2a12a336-890e-4867-bd6f-cca5a03aa2f4, abd02b84-b9d1-401d-9499-8b23622c4153, c4b65d4d-eac6-4bc9-bcac-3eaf559468cd, 7efdf762-6ab1-4591-ad36-a470de7b8c67, 2922cc06-62f2-47cf-8012-2de7221a265c, 6a9a876b-d2f5-4117-8854-a7ad08e1f34e
steps           sample-byop-model
published       False