XGBoost Classification Auto-Convert Within Wallaroo

How to convert XGBoost ML Classification models and upload to Wallaroo.

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

Introduction

The following tutorial is a brief example of how to convert an XGBoost Classification ML model with the convert_model method and upload it into your Wallaroo instance.

This tutorial assumes that you have a Wallaroo instance and are running this Notebook from the Wallaroo Jupyter Hub service. It demonstrates how to:

  • Convert an XGBoost Classification ML model and upload it into the Wallaroo engine.
  • Run a sample inference on the converted model in a Wallaroo instance.

This tutorial provides the following:

  • xgb_class.pickle: A pretrained XGBoost Classification model with 25 columns.
  • xgb_class_eval.json: Test data to perform a sample inference.

Prerequisites

Wallaroo supports the following model versions:

  • XGBoost: Version 1.6.0
  • SKLearn: 1.1.2

Conversion Steps

The Wallaroo autoconverter is invoked through the convert_model(path, source_type, conversion_arguments) method, which takes 3 parameters. The parameters for XGBoost conversions are:

  • path (STRING): The path to the ML model file.
  • source_type (ModelConversionSource): The type of ML model to be converted. As of this time Wallaroo auto-conversion supports the following source types and their associated ModelConversionSource:
    • sklearn: ModelConversionSource.SKLEARN
    • xgboost: ModelConversionSource.XGBOOST
    • keras: ModelConversionSource.KERAS
  • conversion_arguments: The arguments for the conversion based on the type of model being converted. These are:
    • wallaroo.ModelConversion.ConvertXGBoostArgs: Used for XGBoost models and takes the following parameters:
      • name: The name of the model being converted.
      • comment: Any comments for the model.
      • number_of_columns: The number of columns the model was trained for.
      • input_type: A tensorflow Dtype called in the format ModelConversionInputType.{type}, where {type} is Float, Double, etc. depending on the model.

Import Libraries

The first step is to import the libraries needed.

import wallaroo

from wallaroo.ModelConversion import ConvertXGBoostArgs, ModelConversionSource, ModelConversionInputType
from wallaroo.object import EntityNotFoundError

# used to display dataframe information without truncating
from IPython.display import display
import pandas as pd
pd.set_option('display.max_colwidth', None)

Connect to Wallaroo

Connect to your Wallaroo instance and store the connection into the variable wl.

# Login through local Wallaroo instance

# wl = wallaroo.Client()

# SSO login through keycloak

wallarooPrefix = "YOUR PREFIX"
wallarooSuffix = "YOUR SUFFIX"

wl = wallaroo.Client(api_endpoint=f"https://{wallarooPrefix}.api.{wallarooSuffix}", 
                    auth_endpoint=f"https://{wallarooPrefix}.keycloak.{wallarooSuffix}", 
                    auth_type="sso")
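The two endpoint URLs above follow the same pattern, so they can be built in one place. The following is a minimal sketch; build_endpoints is a hypothetical helper (not part of the Wallaroo SDK), and the prefix and suffix values are placeholders for illustration only.

```python
def build_endpoints(prefix: str, suffix: str) -> dict:
    """Return the API and Keycloak URLs for a Wallaroo instance.

    Mirrors the f-strings used in the connection cell above.
    build_endpoints is a hypothetical helper, not an SDK function.
    """
    if not prefix or not suffix:
        raise ValueError("both prefix and suffix must be non-empty")
    return {
        "api_endpoint": f"https://{prefix}.api.{suffix}",
        "auth_endpoint": f"https://{prefix}.keycloak.{suffix}",
    }


# Placeholder values for illustration only.
endpoints = build_endpoints("myprefix", "example.com")
print(endpoints["api_endpoint"])   # https://myprefix.api.example.com
print(endpoints["auth_endpoint"])  # https://myprefix.keycloak.example.com
```

Because the dictionary keys match the keyword arguments used in the connection cell, the result could be passed as wallaroo.Client(auth_type="sso", **endpoints).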

Arrow Support

As of the 2023.1 release, Wallaroo provides support for dataframe and Arrow for inference inputs. This tutorial allows users to adjust their experience based on whether they have enabled Arrow support in their Wallaroo instance or not.

If Arrow support has been enabled, set arrowEnabled=True. If it is disabled or you're not sure, set arrowEnabled=False.

The examples below will be shown in an arrow enabled environment.

import os
# Uncomment the line below only to set the ARROW_ENABLED environment variable to "True".  Otherwise, leave as is.
# os.environ["ARROW_ENABLED"]="True"

if "ARROW_ENABLED" not in os.environ or os.environ["ARROW_ENABLED"].casefold() == "False".casefold():
    arrowEnabled = False
else:
    arrowEnabled = True
print(arrowEnabled)
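The same check can be wrapped in a small function so it is easier to test against different environments. This is a sketch only; arrow_enabled is a hypothetical helper name, and it deliberately keeps the behavior of the cell above: an unset variable or any casing of "false" disables Arrow, and any other value enables it.

```python
import os


def arrow_enabled(environ=os.environ) -> bool:
    """Return False when ARROW_ENABLED is unset or equals 'false' (any case)."""
    value = environ.get("ARROW_ENABLED")
    if value is None:
        return False
    return value.casefold() != "false"


# Plain dictionaries stand in for os.environ here.
print(arrow_enabled({}))                          # False
print(arrow_enabled({"ARROW_ENABLED": "FALSE"}))  # False
print(arrow_enabled({"ARROW_ENABLED": "True"}))   # True
```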

Configuration and Methods

The following will set the workspace, pipeline, model name, the model file name used when uploading and converting the XGBoost model, and the sample data.

The function get_workspace(name) will either return the workspace with the requested name, or create it if it does not exist. The function get_pipeline(name) will either return the pipeline with the requested name, or create it in the current workspace if it does not exist.

workspace_name = 'xgboost-classification-autoconvert-workspace'
pipeline_name = 'xgboost-classification-autoconvert-pipeline'
model_name = 'xgb-class-model'
model_file_name = 'xgb_class.pickle'

def get_workspace(name):
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name):
    # use the name argument rather than the global pipeline_name
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline
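Both helpers follow a get-or-create pattern: look for an existing object by name, and create one only when none is found. The sketch below shows that pattern in isolation. FakeClient and FakeWorkspace are hypothetical in-memory stand-ins that only mimic the list_workspaces, create_workspace, and name() calls used above, so the example runs without a Wallaroo instance.

```python
class FakeWorkspace:
    """Hypothetical stand-in for a Wallaroo workspace object."""

    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name


class FakeClient:
    """Hypothetical stand-in for the client, holding workspaces in memory."""

    def __init__(self):
        self._workspaces = []

    def list_workspaces(self):
        return list(self._workspaces)

    def create_workspace(self, name):
        workspace = FakeWorkspace(name)
        self._workspaces.append(workspace)
        return workspace


def get_or_create_workspace(client, name):
    """Return the workspace called name, creating it only if it is missing."""
    for ws in client.list_workspaces():
        if ws.name() == name:
            return ws
    return client.create_workspace(name)


client = FakeClient()
first = get_or_create_workspace(client, "demo-workspace")
second = get_or_create_workspace(client, "demo-workspace")
print(first is second)                # True
print(len(client.list_workspaces()))  # 1
```

Calling the helper twice with the same name returns the same object, which is why the tutorial cells can be re-run without creating duplicate workspaces.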

Set the Workspace and Pipeline

Set or create the workspace and pipeline based on the names configured earlier.

workspace = get_workspace(workspace_name)

wl.set_current_workspace(workspace)

pipeline = get_pipeline(pipeline_name)
pipeline

Set the Model Autoconvert Parameters

Set the parameters for converting the xgb-class-model.

#the number of columns
NF = 25

model_conversion_args = ConvertXGBoostArgs(
    name=model_name,
    comment="xgboost classification model test",
    number_of_columns=NF,
    input_type=ModelConversionInputType.Float32
)
model_conversion_type = ModelConversionSource.XGBOOST
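Since number_of_columns must match the number of columns the model was trained for, it can help to confirm that the evaluation rows have that width before running an inference. The following is a minimal sketch under assumptions: check_row_widths is a hypothetical helper, and it assumes the evaluation data has already been loaded into a list of rows; the actual layout of the sample files is not shown in this tutorial.

```python
NF = 25  # the number of columns the model was trained for


def check_row_widths(rows, expected_columns):
    """Raise ValueError if any row does not hold exactly expected_columns values.

    Returns the number of rows checked.
    """
    for index, row in enumerate(rows):
        if len(row) != expected_columns:
            raise ValueError(
                f"row {index} has {len(row)} values, expected {expected_columns}"
            )
    return len(rows)


# Synthetic rows for illustration only, not the tutorial's evaluation data.
rows = [[0.0] * NF, [1.0] * NF]
print(check_row_widths(rows, NF))  # 2
```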

Upload and Convert the Model

Now we can upload and convert the model. Once finished, it will be stored as {unique-file-id}-converted.onnx.

# convert and upload
model_wl = wl.convert_model(model_file_name, model_conversion_type, model_conversion_args)

Test Inference

With the model uploaded and converted, we can run a sample inference.

Deploy the Pipeline

Add the uploaded and converted model_wl as a step in the pipeline, then deploy it.

pipeline.add_model_step(model_wl).deploy()

Run the Inference

Use the evaluation data to verify the process completed successfully.

if arrowEnabled is True:
    sample_data = 'xgb_class_eval.df.json'
    result = pipeline.infer_from_file(sample_data)
    display(result)
else:
    sample_data = 'xgb_class_eval.json'
    result = pipeline.infer_from_file(sample_data)
    display(result[0].data())

Undeploy the Pipeline

With the tests complete, we will undeploy the pipeline to return its resources to the Wallaroo instance.

pipeline.undeploy()
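If an inference raises an error, the undeploy step above is never reached and the pipeline keeps its resources. One way to guard against that is a context manager that always undeploys on exit. This is a sketch, not part of the Wallaroo SDK: deployed is a hypothetical helper that relies only on the deploy() and undeploy() calls used in this tutorial, and FakePipeline is a stand-in so the example runs without a Wallaroo instance.

```python
from contextlib import contextmanager


@contextmanager
def deployed(pipeline):
    """Deploy the pipeline, then undeploy it even if the body raises."""
    pipeline.deploy()
    try:
        yield pipeline
    finally:
        pipeline.undeploy()


class FakePipeline:
    """Hypothetical stand-in that records which calls were made."""

    def __init__(self):
        self.calls = []

    def deploy(self):
        self.calls.append("deploy")
        return self

    def undeploy(self):
        self.calls.append("undeploy")
        return self


fake = FakePipeline()
try:
    with deployed(fake):
        raise RuntimeError("inference failed")
except RuntimeError:
    pass
print(fake.calls)  # ['deploy', 'undeploy']
```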