This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.
Introduction
The following tutorial is a brief example of how to convert an XGBoost Classification ML model with the convert_model method and upload it into your Wallaroo instance.
This tutorial assumes that you have a Wallaroo instance and are running this Notebook from the Wallaroo Jupyter Hub service.
This tutorial will:
- Convert an XGBoost Classification ML model and upload it into the Wallaroo engine.
- Run a sample inference on the converted model in a Wallaroo instance.
This tutorial provides the following:
- xgb_class.pickle: A pretrained XGBoost Classification model with 25 columns.
- xgb_class_eval.json: Test data to perform a sample inference.
Prerequisites
Wallaroo supports the following model versions:
- XGBoost: 1.6.0
- SKLearn: 1.1.2
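Before converting, it can help to confirm that the installed library versions match the supported ones. The following is a minimal sketch; the version_tuple and matches_supported helpers are illustrative only and not part of the Wallaroo SDK:

```python
# Sketch: compare installed library versions against the versions this
# tutorial lists as supported. These helpers are illustrative only.
SUPPORTED = {"xgboost": "1.6.0", "scikit-learn": "1.1.2"}

def version_tuple(version):
    # "1.6.0" -> (1, 6, 0) for component-wise comparison
    return tuple(int(part) for part in version.split("."))

def matches_supported(library, installed):
    required = SUPPORTED.get(library)
    return required is not None and version_tuple(installed) == version_tuple(required)

print(matches_supported("xgboost", "1.6.0"))       # True
print(matches_supported("scikit-learn", "1.2.0"))  # False
```

In practice, the installed version strings can be read from xgboost.__version__ and sklearn.__version__ in the notebook environment.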
Conversion Steps
The Wallaroo autoconverter method convert_model(path, source_type, conversion_arguments) takes 3 parameters. The parameters for XGBoost conversions are:
- path (STRING): The path to the ML model file.
- source_type (ModelConversionSource): The type of ML model to be converted. As of this time, Wallaroo auto-conversion supports the following source types and their associated ModelConversionSource values:
  - sklearn: ModelConversionSource.SKLEARN
  - xgboost: ModelConversionSource.XGBOOST
  - keras: ModelConversionSource.KERAS
- conversion_arguments: The arguments for the conversion based on the type of model being converted. For XGBoost models this is wallaroo.ModelConversion.ConvertXGBoostArgs, which takes the following parameters:
  - name: The name of the model being converted.
  - comment: Any comments for the model.
  - number_of_columns: The number of columns the model was trained on.
  - input_type: A TensorFlow Dtype called in the format ModelConversionInputType.{type}, where {type} is Float, Double, etc., depending on the model.
Import Libraries
The first step is to import the libraries needed.
import wallaroo
from wallaroo.ModelConversion import ConvertXGBoostArgs, ModelConversionSource, ModelConversionInputType
from wallaroo.object import EntityNotFoundError
# used to display dataframe information without truncating
from IPython.display import display
import pandas as pd
pd.set_option('display.max_colwidth', None)
Connect to Wallaroo
Connect to your Wallaroo instance and store the connection in the variable wl.
# Login through local Wallaroo instance
# wl = wallaroo.Client()
# SSO login through keycloak
wallarooPrefix = "YOUR PREFIX"
wallarooSuffix = "YOUR SUFFIX"
wl = wallaroo.Client(api_endpoint=f"https://{wallarooPrefix}.api.{wallarooSuffix}",
                     auth_endpoint=f"https://{wallarooPrefix}.keycloak.{wallarooSuffix}",
                     auth_type="sso")
Arrow Support
As of the 2023.1 release, Wallaroo provides support for pandas DataFrame and Apache Arrow inference inputs. This tutorial allows users to adjust their experience based on whether Arrow support has been enabled in their Wallaroo instance.
If Arrow support has been enabled, set arrowEnabled=True. If it is disabled, or you are not sure, set arrowEnabled=False. The examples below are shown in an Arrow-enabled environment.
import os
# Uncomment the line below to set the ARROW_ENABLED environment variable to True. Otherwise, leave it as is.
# os.environ["ARROW_ENABLED"]="True"
if "ARROW_ENABLED" not in os.environ or os.environ["ARROW_ENABLED"].casefold() == "False".casefold():
    arrowEnabled = False
else:
    arrowEnabled = True
print(arrowEnabled)
Configuration and Methods
The following will set the workspace, pipeline, model name, the model file name used when uploading and converting the XGBoost model, and the sample data.
The function get_workspace(name) will either return the workspace with the requested name, or create it if it does not exist. The function get_pipeline(name) will either return the pipeline with the requested name, or create it in the current workspace if it does not exist.
workspace_name = 'xgboost-classification-autoconvert-workspace'
pipeline_name = 'xgboost-classification-autoconvert-pipeline'
model_name = 'xgb-class-model'
model_file_name = 'xgb_class.pickle'
def get_workspace(name):
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
            break
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name):
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline
Set the Workspace and Pipeline
Set or create the workspace and pipeline based on the names configured earlier.
workspace = get_workspace(workspace_name)
wl.set_current_workspace(workspace)
pipeline = get_pipeline(pipeline_name)
pipeline
Set the Model Autoconvert Parameters
Set the parameters for converting the xgb-class-model.
# the number of columns the model was trained on
NF = 25
model_conversion_args = ConvertXGBoostArgs(
    name=model_name,
    comment="xgboost classification model test",
    number_of_columns=NF,
    input_type=ModelConversionInputType.Float32
)
model_conversion_type = ModelConversionSource.XGBOOST
Upload and Convert the Model
Now we can upload and convert the model. Once finished, it will be stored as {unique-file-id}-converted.onnx.
# convert and upload
model_wl = wl.convert_model(model_file_name, model_conversion_type, model_conversion_args)
Test Inference
With the model uploaded and converted, we can run a sample inference.
Deploy the Pipeline
Add the uploaded and converted model_wl
as a step in the pipeline, then deploy it.
pipeline.add_model_step(model_wl).deploy()
Run the Inference
Use the evaluation data to verify the process completed successfully.
if arrowEnabled:
    sample_data = 'xgb_class_eval.df.json'
    result = pipeline.infer_from_file(sample_data)
    display(result)
else:
    sample_data = 'xgb_class_eval.json'
    result = pipeline.infer_from_file(sample_data)
    display(result[0].data())
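Beyond displaying the raw result, you will usually want to extract the predicted class. As a rough sketch of how that might look for a DataFrame-style result (the column names and score values below are assumptions for illustration; check your pipeline's actual output columns):

```python
import pandas as pd

# Hypothetical DataFrame-style inference result; the column names and
# values are assumptions for illustration, not actual Wallaroo output.
result = pd.DataFrame({
    "time": [1677000000000],
    "out.output": [[0.02, 0.95, 0.03]],  # per-class scores from the classifier
})

# Pick the index of the highest score as the predicted class.
scores = result["out.output"][0]
predicted_class = max(range(len(scores)), key=scores.__getitem__)
print(predicted_class)  # 0.95 is the largest score, at index 1
```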
Undeploy the Pipeline
With the tests complete, we will undeploy the pipeline to return the resources to the Wallaroo instance.
pipeline.undeploy()