Custom Model Best Practices Tutorial
This tutorial can be downloaded as part of the Wallaroo Tutorials repository.
The following tutorial demonstrates Wallaroo Custom Model (also known as BYOP, Bring Your Own Predict) best practices. This is meant as a companion tutorial to the Custom Model Best Practices guide.
This tutorial demonstrates:
- Creating a Wallaroo Custom Model environment.
- Creating the Wallaroo Custom Model “byop-sample”.
- Performing a test inference on the new Custom Model to verify the model works before uploading.
- Uploading the Custom Model to Wallaroo, performing a sample inference, and verifying the results match the pre-upload inference test.
Tutorial Resources
The following resources are available from the Wallaroo GitHub repository with this tutorial:
- byop-sample.zip - The sample Custom Model with all artifacts needed for this tutorial.
- test.py - The sample test script for verifying the Custom Model.
- upload.py - The sample upload script that uploads the Custom Model to Wallaroo and performs sample inferences.
Build the Custom Model
To build the custom model, the following folder structure is used for the model onboarding development and test process as detailed in Prepare the Testing Environment.
- byop/ - Contains your model script and artifacts that are uploaded to Wallaroo.
- data/ - Contains testing inputs/outputs.
- schemas/ - Contains Arrow schema files.
For example:
├── byop                    # the Custom Model
│   ├── artifacts
│   │   ├── sample_file.txt
│   │   └── sample_file2.txt
│   ├── byop.py
│   ├── custom_packages
│   │   └── custom_script.py
│   └── requirements.txt
├── data                    # the sample data
│   ├── byop_results.csv
│   └── input_data.parquet
├── schemas                 # the schemas
│   ├── input_schema.pkl
│   └── output_schema.pkl
├── test.py                 # test script
└── upload.py               # test upload script
Develop the Inference Builder Class
Create a script inside the byop folder; in this tutorial, it is byop.py. This will be the main entry point.
The InferenceBuilder class loads artifacts efficiently to prevent high latency during inference. To achieve this:
- Implement the create function.
- Inside create, call a helper function (e.g., load_artifacts()) to load heavy files: model weights, static lookup tables, explainers, or configuration files.
- Do not call load_artifacts in the predict function. Doing so causes the artifacts to reload on every single inference request, significantly degrading performance.
- Store these artifacts in a dictionary and assign them to inference.model.
The following is the example from the byop/byop.py file.
class BYOPInferenceBuilder(InferenceBuilder):
    @property
    def inference(self) -> BYOPInference:
        return BYOPInference

    def create(self, config: CustomInferenceConfig) -> BYOPInference:
        inference = self.inference()

        # load the model artifacts once at creation time
        model = self._load_models(config.model_path)
        inference.model = model

        return inference

    def _load_models(self, model_path: Path):
        return {
            'dummy_model': "wallaroo",
        }
Develop the Inference Class
The Inference class manages the inference process: taking in the data, processing the inference request, and returning the results.
The predict(input_data: InferenceData) -> InferenceData method is the entry point for execution and performs the following:
- Retrieving artifacts from self.model['artifact_key'].
- Accepting input data as a dictionary of numpy arrays.
Best practices for the Inference class are:
- Code Organization:
  - Keep the predict method readable.
  - Extract complex logic into helper functions located in a separate script (e.g., utils.py) within the byop folder.
- Error Handling & Logging:
  - Wrap model steps in try/except blocks.
  - Import the traceback module.
  - Use a logger (instantiated at the script level) to capture errors: logger.error(traceback.format_exc()).
  - Timing: Log the execution time of critical functions to assist with latency debugging (illustrated in the sketch after the example below).
- Return Values:
  - Return a dictionary of numpy arrays.
  - Ensure data types match the expected schema.
  - Data cannot be nested dictionaries.
  - Nested JSON should be converted to a string and returned (see the sketch below).
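As an illustration of the nested-data rule above, a minimal sketch (the field names here are hypothetical, not from the sample model):

import json
import numpy as np

# nested structures are not valid output values; serialize them to a string
nested = {"scores": {"cat": 0.9, "dog": 0.1}}
output_dictionary = {"details": np.array([json.dumps(nested)])}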
The following is the example from the byop/byop.py file.
class BYOPInference(Inference):
    @property
    def expected_model_types(self) -> Set[Any]:
        return {Dict}

    @Inference.model.setter
    def model(self, model) -> None:
        self._model = model

    def _predict(self, input_data: InferenceData) -> InferenceData:
        logger.info("Starting prediction process")
        try:
            logger.info(f"Gathering of input data features: {len(input_data)}")
            results = []

            logger.info("Converting input data to DataFrame")
            df = pd.DataFrame({
                key: value.tolist() for key, value in input_data.items()
            })

            try:
                # --- Run model prediction ---
                logger.info("Running model prediction")
                for index, row in df.iterrows():
                    input_number = row['input_number']
                    result = complex_algorithm(input_number)
                    results.append(result)
            except Exception as e:
                logger.error(f"Error during model prediction: {e}")
                logger.error(traceback.format_exc())
                raise e

            logger.info("Predictions completed.")

            output_dictionary = {
                "result": np.array(results, dtype=np.int64),
                "id": np.array(input_data["id"].tolist(), dtype=np.int64)
            }

            return output_dictionary
        except Exception as e:
            logger.error(f"Error during prediction: {e}")
            logger.error(traceback.format_exc())
            raise e
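The example above calls complex_algorithm, which lives in custom_packages/custom_script.py and is not shown in the tutorial. The following is a hypothetical sketch of that helper, consistent with the sample outputs shown later (inputs 1, 2, 3 produce 21, 22, 23), that also illustrates the timing best practice:

import time
import logging

logger = logging.getLogger(__name__)

def complex_algorithm(input_number: int) -> int:
    """Stand-in for the model's business logic (assumed here to add 20)."""
    start = time.perf_counter()
    result = input_number + 20
    # timing best practice: log execution time of critical functions
    logger.info(f"complex_algorithm took {time.perf_counter() - start:.6f}s")
    return result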
Local Verification
Before uploading to Wallaroo, we will run test inferences locally to verify the logic works and the inputs and outputs match the shapes we need, as detailed in Local Verification.
The scripts used below are contained in the sample test.py script; the following code sections show the segments of that script and their function.
We start by importing the Python libraries used for the Custom Model.
from pathlib import Path
from mac.config.inference import CustomInferenceConfig
from byop.byop import BYOPInferenceBuilder
import pandas as pd
import pyarrow as pa
Next, set the sample input data as both a pandas DataFrame and a dictionary of numpy arrays.
For our verification, we will execute the Custom Model’s predict code with sample data and verify the results. This data will be in two formats:
- DataFrame: The format accepted by models deployed in Wallaroo.
- Dictionary of numpy arrays: Custom Models accept a dictionary of numpy values.
Wallaroo accepts either pandas DataFrames or Apache Arrow tables, then converts them into a dictionary of numpy values for fast inference results.
input_df = pd.DataFrame({
    "input_number": [1, 2, 3],
    "id": [20000000004093819, 20012684296980773, 481562342]
})

input_dictionary = {
    col: input_df[col].to_numpy() for col in input_df.columns
}
print(input_df)
print(input_dictionary)
input_number id
0 1 20000000004093819
1 2 20012684296980773
2 3 481562342
{'input_number': array([1, 2, 3]), 'id': array([20000000004093819, 20012684296980773, 481562342])}
Test the Custom Model by supplying the sample data as a dictionary of numpy values and display the results.
# prepare the BYOP and import any modules
builder = BYOPInferenceBuilder()

config = CustomInferenceConfig(
    framework="custom",
    model_path=Path("./byop/"),
    modules_to_include={"*.py"}
)

# create the BYOP object
inference = builder.create(config)

# run a simulated inference
results = inference.predict(input_data=input_dictionary)
display(results)

# convert results into a dataframe
results_df = pd.DataFrame({
    key: value.tolist() for key, value in results.items()
})
INFO byop.byop - INFO: Starting prediction process byop.py:36
INFO byop.byop - INFO: Gathering of input data features: 2 byop.py:38
INFO byop.byop - INFO: Converting input data to DataFrame byop.py:41
INFO byop.byop - INFO: Running model prediction byop.py:48
INFO byop.byop - INFO: Predictions completed. byop.py:61
{'result': array([21, 22, 23]),
'id': array([20000000004093819, 20012684296980773, 481562342])}
Schema Generation
The values match the expected data types and shapes. We will use these results to generate the input and output schemas used later during the model upload process.
input_schema = pa.Schema.from_pandas(input_df).remove_metadata()
output_schema = pa.Schema.from_pandas(results_df).remove_metadata()
# display the generated input and output schemas
print(input_schema)
print(output_schema)
input_number: int64
id: int64
result: int64
id: int64
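The folder structure above includes schemas/input_schema.pkl and schemas/output_schema.pkl. The tutorial does not show how the sample scripts serialize the schemas; one reasonable approach is to pickle them so upload.py can reload them later:

import pickle
from pathlib import Path

schema_dir = Path("./schemas")
schema_dir.mkdir(exist_ok=True)

# persist the pyarrow schemas for reuse by the upload script
with open(schema_dir / "input_schema.pkl", "wb") as f:
    pickle.dump(input_schema, f)
with open(schema_dir / "output_schema.pkl", "wb") as f:
    pickle.dump(output_schema, f)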
Upload Custom Model and Test Inference
With the sample Custom Model executing without errors, it is packaged and uploaded to Wallaroo for testing.
Packaging and Deployment
Custom Models are packaged as ZIP files via the following procedure as detailed in Packaging and Deployment.
- Include the following structure:
  - Main Python script (entry point). In this example, byop.py.
  - requirements.txt: Only one is allowed, in the top level folder of the ZIP file; it specifies the Python libraries required for the Custom Model’s inference. For this example, the requirements.txt file is empty, but the file is required even if empty.
  - artifacts: An optional folder in the top level of the ZIP file. Any files in the artifacts folder are ignored by the Wallaroo Custom Model validation process (including other Python scripts, requirements.txt files, or other artifacts). The artifacts folder is for contents that are not required for the inference process but may be needed for other Custom Model functions.
  - Any additional helper scripts, modules, etc.
The following shows a sample Custom Model’s file structure and the zip command for packaging it.
byop
├── artifacts
│ ├── sample_file.txt
│ └── sample_file2.txt
├── byop.py
├── custom_packages
│ └── custom_script.py
└── requirements.txt
The zip command brings all of these together into a single ZIP file:
zip -r byop-sample.zip byop/*
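Before uploading, the archive entries can be listed to confirm byop.py and requirements.txt sit at the expected paths; a quick check with Python’s zipfile module (a convenience sketch, not part of the sample scripts):

import zipfile

# list the archive entries to verify the packaged structure
with zipfile.ZipFile("byop-sample.zip") as zf:
    for name in zf.namelist():
        print(name)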
Sample Script
The file upload.py contains the scripts below, collected in one place for easier porting to a test environment.
Import Wallaroo Libraries
The following libraries are used for interacting with Wallaroo.
# disable logging output from byop imports
import logging
logging.disable(logging.CRITICAL)
import numpy as np
import pandas as pd
import pyarrow as pa
import wallaroo
from wallaroo.deployment_config import DeploymentConfigBuilder
from wallaroo.framework import Framework
Open a Connection to Wallaroo
The next step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and available through the JupyterHub interface provided with your Wallaroo environment.
This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.
If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). For more details on logging in through Wallaroo, see the Wallaroo SDK Essentials Guide: Client Connection.
wl = wallaroo.Client()
Create Workspace
We’ll set the name of our workspace, then create the Wallaroo workspace to store our model and set it as the current workspace. Future commands will default to this workspace for pipeline creation, model uploads, etc.
workspace_name = 'sample-byop-best-practices'
workspace = wl.get_workspace(name=workspace_name, create_if_not_exist=True)
wl.set_current_workspace(workspace)
{'name': 'sample-byop-best-practices', 'id': 1848, 'archived': False, 'created_by': 'john.hummel@wallaroo.ai', 'created_at': '2026-01-28T21:36:07.305247+00:00', 'models': [{'name': 'sample-byop-model', 'versions': 6, 'owner_id': '""', 'last_update_time': datetime.datetime(2026, 2, 3, 19, 14, 30, 946549, tzinfo=tzutc()), 'created_at': datetime.datetime(2026, 1, 28, 21, 43, 36, 253023, tzinfo=tzutc())}], 'pipelines': [{'name': 'byop-sample-pipeline', 'create_time': datetime.datetime(2026, 1, 28, 21, 47, 51, 60242, tzinfo=tzutc()), 'definition': '[]'}]}
Upload Custom Model
Custom Models are uploaded to Wallaroo through the Wallaroo Client upload_model method. For more details, see the Model Upload guide.
The framework is Framework.CUSTOM for arbitrary Python models, and we’ll specify the input and output schemas for the upload.
model_name = "sample-byop-model"
model_file_name = "./byop-sample.zip"
model = wl.upload_model(model_name,
model_file_name,
framework=Framework.CUSTOM,
input_schema=input_schema,
output_schema=output_schema,
convert_wait=True)
model
Waiting for model loading - this will take up to 10min.
Model is pending loading to a container runtime.
Model is attempting loading to a container runtime....
Successful
Ready
| Name | sample-byop-model |
| Version | 3ca8a508-ae2e-46d1-8378-9840632f3210 |
| File Name | byop-sample.zip |
| SHA | 9e300129dd5a8d480a7f1b7dea98fac57f8819338568e471a7bd75e527c7564e |
| Status | ready |
| Error Summary | None |
| Image Path | proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs/mac-deploy:v2025.2.2-6527 |
| Architecture | x86 |
| Acceleration | none |
| Updated At | 2026-28-Jan 21:44:10 |
| Workspace id | 1848 |
| Workspace name | sample-byop-best-practices |
print(model)
{'name': 'sample-byop-model', 'version': '3ca8a508-ae2e-46d1-8378-9840632f3210', 'file_name': 'byop-sample.zip', 'image_path': 'proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs/mac-deploy:v2025.2.2-6527', 'arch': 'x86', 'accel': 'none', 'last_update_time': datetime.datetime(2026, 1, 28, 21, 44, 10, 398110, tzinfo=tzutc())}
Deploy Pipeline
The model is uploaded and ready for use. We’ll add it as a step in our pipeline, then deploy the pipeline. For this example, we allocate 1 CPU and 1 Gi of RAM to the model through the pipeline’s deployment configuration.
pipeline = wl.build_pipeline("byop-sample-pipeline")
pipeline.clear()
pipeline.add_model_step(model)
| name | byop-sample-pipeline |
|---|---|
| created | 2026-01-28 21:47:51.060242+00:00 |
| last_updated | 2026-02-03 19:16:24.812407+00:00 |
| deployed | False |
| workspace_id | 1848 |
| workspace_name | sample-byop-best-practices |
| arch | x86 |
| accel | none |
| tags | |
| versions | bf3877be-2066-4068-bfdb-c16493085273, 82bdc05e-0845-4d86-96ec-91c78b0c7788, d5db77b6-60aa-4b5a-b0ec-39a42dd4b6f2, 3aaa3a30-7d1b-4f5b-b953-cba8827b66d1, 1187fe3e-b6a3-47e4-a246-794cc9c65fb6, 308e5025-2979-444a-b7eb-fb702fb85095, 2a12a336-890e-4867-bd6f-cca5a03aa2f4, abd02b84-b9d1-401d-9499-8b23622c4153, c4b65d4d-eac6-4bc9-bcac-3eaf559468cd, 7efdf762-6ab1-4591-ad36-a470de7b8c67, 2922cc06-62f2-47cf-8012-2de7221a265c, 6a9a876b-d2f5-4117-8854-a7ad08e1f34e |
| steps | sample-byop-model |
| published | False |
deployment_config = DeploymentConfigBuilder() \
.cpus(0.25).memory('512Mi') \
.sidekick_cpus(model, 1) \
.sidekick_memory(model, '1Gi') \
.build()
pipeline.deploy(deployment_config=deployment_config, wait_for_status=False)
Deployment initiated for byop-sample-pipeline. Please check pipeline status.
| name | byop-sample-pipeline |
|---|---|
| created | 2026-01-28 21:47:51.060242+00:00 |
| last_updated | 2026-01-28 21:48:00.033703+00:00 |
| deployed | True |
| workspace_id | 1848 |
| workspace_name | sample-byop-best-practices |
| arch | x86 |
| accel | none |
| tags | |
| versions | 2922cc06-62f2-47cf-8012-2de7221a265c, 6a9a876b-d2f5-4117-8854-a7ad08e1f34e |
| steps | sample-byop-model |
| published | False |
pipeline.status()
{'status': 'Starting',
'details': ['Scaling'],
'engines': [{'ip': '10.4.3.78',
'name': 'engine-6bdbbc5bfb-lvf6l',
'status': 'Running',
'reason': None,
'details': ['containers with unready status: [engine]',
'containers with unready status: [engine]'],
'pipeline_statuses': None,
'model_statuses': None}],
'engine_lbs': [{'ip': '10.4.3.77',
'name': 'engine-lb-d579789c7-wkd7f',
'status': 'Running',
'reason': None,
'details': []}],
'sidekicks': [{'ip': '10.4.3.79',
'name': 'engine-sidekick-sample-byop-model-1398-86c8dc6968-fbbv4',
'status': 'Failed',
'reason': None,
'details': ['containers with unready status: [engine-sidekick-sample-byop-model-1398]',
'containers with unready status: [engine-sidekick-sample-byop-model-1398]'],
'statuses': None}]}
import time

time.sleep(15)

while pipeline.status()['status'] != 'Running':
    time.sleep(15)
    print("Waiting for deployment.")
    display(pipeline.status()['status'])

pipeline.status()['status']
'Running'
Run inference
Everything is in place - we’ll now run a sample inference with the same data used earlier, now in a pandas DataFrame format.
print(input_df)
| input_number | id | |
|---|---|---|
| 0 | 1 | 20000000004093819 |
| 1 | 2 | 20012684296980773 |
| 2 | 3 | 481562342 |
Print the inference results and verify the values match the local inference values.
print(pipeline.infer(input_df))
| time | in.id | in.input_number | out.id | out.result | anomaly.count | |
|---|---|---|---|---|---|---|
| 0 | 2026-01-28 21:51:37.363 | 20000000004093819 | 1 | 20000000004093819 | 21 | 0 |
| 1 | 2026-01-28 21:51:37.363 | 20012684296980773 | 2 | 20012684296980773 | 22 | 0 |
| 2 | 2026-01-28 21:51:37.363 | 481562342 | 3 | 481562342 | 23 | 0 |
For comparison, the local inference results (results_df) from the earlier verification step:
| result | id | |
|---|---|---|
| 0 | 21 | 20000000004093819 |
| 1 | 22 | 20012684296980773 |
| 2 | 23 | 481562342 |
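To make the verification explicit, the deployed output can be compared to the local results programmatically. This is a sketch assuming input_df and the results_df from the local verification step are still in scope; the column names follow the output table above:

# compare deployed pipeline output against the local verification results
deployed_results = pipeline.infer(input_df)

assert deployed_results["out.result"].tolist() == results_df["result"].tolist()
assert deployed_results["out.id"].tolist() == results_df["id"].tolist()
print("Deployed inference matches local verification results.")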
Undeploy Pipelines
The inference is successful, so we will undeploy the pipeline and return the resources back to the cluster.
pipeline.undeploy()
| name | byop-sample-pipeline |
|---|---|
| created | 2026-01-28 21:47:51.060242+00:00 |
| last_updated | 2026-02-03 19:16:24.812407+00:00 |
| deployed | False |
| workspace_id | 1848 |
| workspace_name | sample-byop-best-practices |
| arch | x86 |
| accel | none |
| tags | |
| versions | bf3877be-2066-4068-bfdb-c16493085273, 82bdc05e-0845-4d86-96ec-91c78b0c7788, d5db77b6-60aa-4b5a-b0ec-39a42dd4b6f2, 3aaa3a30-7d1b-4f5b-b953-cba8827b66d1, 1187fe3e-b6a3-47e4-a246-794cc9c65fb6, 308e5025-2979-444a-b7eb-fb702fb85095, 2a12a336-890e-4867-bd6f-cca5a03aa2f4, abd02b84-b9d1-401d-9499-8b23622c4153, c4b65d4d-eac6-4bc9-bcac-3eaf559468cd, 7efdf762-6ab1-4591-ad36-a470de7b8c67, 2922cc06-62f2-47cf-8012-2de7221a265c, 6a9a876b-d2f5-4117-8854-a7ad08e1f34e |
| steps | sample-byop-model |
| published | False |