XGBoost Regression Auto-Convert Within Wallaroo

How to convert XGBoost ML Regression models and upload to Wallaroo with the Wallaroo `convert_model` method.

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

Introduction

The following tutorial is a brief example of how to convert a XGBoost Regression ML model with the convert_model method and upload it into your Wallaroo instance.

This tutorial assumes that you have a Wallaroo instance and are running this Notebook from the Wallaroo Jupyter Hub service.

  • Convert a XGBoost Regression ML model and upload it into the Wallaroo engine.
  • Run a sample inference on the converted model in a Wallaroo instance.

This tutorial provides the following:

  • xgb_reg.pickle: A pretrained XGBoost Regression model with 25 columns.
  • xgb_regression_eval.json: Test data to perform a sample inference.

Conversion Steps

Conversion Steps

To use the Wallaroo autoconverter convert_model(path, source_type, conversion_arguments) method takes 3 parameters. The parameters for XGBoost conversions are:

  • path (STRING): The path to the ML model file.
  • source_type (ModelConversionSource): The type of ML model to be converted. As of this time Wallaroo auto-conversion supports the following source types and their associated ModelConversionSource:
    • sklearn: ModelConversionSource.SKLEARN
    • xgboost: ModelConversionSource.XGBOOST
    • keras: ModelConversionSource.KERAS
  • conversion_arguments: The arguments for the conversion based on the type of model being converted. These are:
    • wallaroo.ModelConversion.ConvertXGBoostArgs: Used for XGBoost models and takes the following parameters:
    • name: The name of the model being converted.
    • comment: Any comments for the model.
    • number_of_columns: The number of columns the model was trained for.
    • input_type: A tensorflow Dtype called in the format ModelConversionInputType.{type}, where {type} is Float, Double, etc depending on the model.

Import Libraries

The first step is to import the libraries needed.

import wallaroo

from wallaroo.ModelConversion import ConvertXGBoostArgs, ModelConversionSource, ModelConversionInputType
from wallaroo.object import EntityNotFoundError

# used to display dataframe information without truncating
from IPython.display import display
import pandas as pd
pd.set_option('display.max_colwidth', None)

Connect to Wallaroo

Connect to your Wallaroo instance and store the connection into the variable wl.

# Login through local Wallaroo instance

# wl = wallaroo.Client()

# SSO login through keycloak

# wallarooPrefix = "YOUR PREFIX"
# wallarooSuffix = "YOUR PREFIX"

wl = wallaroo.Client(api_endpoint=f"https://{wallarooPrefix}.api.{wallarooSuffix}", 
                    auth_endpoint=f"https://{wallarooPrefix}.keycloak.{wallarooSuffix}", 
                    auth_type="sso")

Arrow Support

As of the 2023.1 release, Wallaroo provides support for dataframe and Arrow for inference inputs. This tutorial allows users to adjust their experience based on whether they have enabled Arrow support in their Wallaroo instance or not.

If Arrow support has been enabled, arrowEnabled=True. If disabled or you’re not sure, set it to arrowEnabled=False

The examples below will be shown in an arrow enabled environment.

import os
# Only set the below to make the OS environment ARROW_ENABLED to TRUE.  Otherwise, leave as is.
# os.environ["ARROW_ENABLED"]="True"

if "ARROW_ENABLED" not in os.environ or os.environ["ARROW_ENABLED"].casefold() == "False".casefold():
    arrowEnabled = False
else:
    arrowEnabled = True
print(arrowEnabled)

Configuration and Methods

The following will set the workspace, pipeline, model name, the model file name used when uploading and converting the keras model, and the sample data.

The functions get_workspace(name) will either set the current workspace to the requested name, or create it if it does not exist. The function get_pipeline(name) will either set the pipeline used to the name requested, or create it in the current workspace if it does not exist.

workspace_name = 'xgboost-regression-autoconvert-workspace'
pipeline_name = 'xgboost-regression-autoconvert-pipeline'
model_name = 'xgb-regression-model'
model_file_name = 'xgb_reg.pickle'

def get_workspace(name):
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace= ws
    if(workspace == None):
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name):
    try:
        pipeline = wl.pipelines_by_name(pipeline_name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(pipeline_name)
    return pipeline

Set the Workspace and Pipeline

Set or create the workspace and pipeline based on the names configured earlier.

workspace = get_workspace(workspace_name)

wl.set_current_workspace(workspace)

pipeline = get_pipeline(pipeline_name)
pipeline
name xgboost-regression-autoconvert-pipeline
created 2023-02-22 17:36:31.143795+00:00
last_updated 2023-02-23 16:29:55.996960+00:00
deployed False
tags
versions 6f6a7cf1-b328-4a67-a37d-4047b1082516, 00bb4dd6-3061-4619-9491-a3b7c307f784, 45081f0c-1991-4ce0-907c-6135ca328084, 1220c119-45e9-4ff4-bdfd-8ff2f95486d5
steps xgb-regression-model

Set the Model Autoconvert Parameters

Set the paramters for converting the xgb-class-model.

#the number of columns
NF = 25

model_conversion_args = ConvertXGBoostArgs(
    name=model_name,
    comment="xgboost regression model test",
    number_of_columns=NF,
    input_type=ModelConversionInputType.Float32
)
model_conversion_type = ModelConversionSource.XGBOOST

Upload and Convert the Model

Now we can upload the convert the model. Once finished, it will be stored as {unique-file-id}-converted.onnx.

# convert and upload
model_wl = wl.convert_model(model_file_name, model_conversion_type, model_conversion_args)

Test Inference

With the model uploaded and converted, we can run a sample inference.

Deploy the Pipeline

Add the uploaded and converted model_wl as a step in the pipeline, then deploy it.

pipeline.add_model_step(model_wl).deploy()
Waiting for deployment - this will take up to 45s ............. ok
name xgboost-regression-autoconvert-pipeline
created 2023-02-22 17:36:31.143795+00:00
last_updated 2023-02-24 16:26:10.105654+00:00
deployed True
tags
versions 73345b3e-e141-497e-bcab-e6145cb8b6ec, 6f6a7cf1-b328-4a67-a37d-4047b1082516, 00bb4dd6-3061-4619-9491-a3b7c307f784, 45081f0c-1991-4ce0-907c-6135ca328084, 1220c119-45e9-4ff4-bdfd-8ff2f95486d5
steps xgb-regression-model
pipeline.status()
{'status': 'Running',
 'details': [],
 'engines': [{'ip': '10.48.1.19',
   'name': 'engine-749bb6f4fd-4qt2x',
   'status': 'Running',
   'reason': None,
   'details': [],
   'pipeline_statuses': {'pipelines': [{'id': 'xgboost-regression-autoconvert-pipeline',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'xgb-regression-model',
      'version': '2f30475f-6385-4c5e-bfbb-9d47fa84f8ae',
      'sha': '7414d26cb5495269bc54bcfeebd269d7c74412cbfca07562fc7cb184c55b6f8e',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.48.1.20',
   'name': 'engine-lb-74b4969486-7jzwh',
   'status': 'Running',
   'reason': None,
   'details': []}],
 'sidekicks': []}

Run the Inference

Use the test_class_eval.json as set earlier as our sample_data and perform the inference.

if arrowEnabled is True:
    sample_data = 'xgb_regression_eval.df.json'
    result = pipeline.infer_from_file(sample_data)
    display(result)
else:
    sample_data = 'xgb_regression_eval.json'
    result = pipeline.infer_from_file(sample_data)
    result[0].data()
    
time in.tensor out.variable check_failures
0 2023-02-24 16:26:24.157 [-0.0337420814, -0.1876901281, 0.3183056488, 1.1831088244, -0.3047963287, 1.0713634828, 0.4679136198, 1.1382147115, 2.8101110944, -0.9981048796, -0.2543715265, 0.2845195171, -0.6477265924, -1.2198006181, 2.0592129832, -1.586429512, 0.1884164743, -0.3816011585, 1.0781704305, -0.2251253601, 0.6067409459, 0.9659944831, -0.690207203, -0.3849078305, -1.7806555641] [124.11397] 0
1 2023-02-24 16:26:24.157 [-0.6374335428, 0.9713713274, -0.3899847809, -1.8685333445, 0.6264452739, 1.0778638153, -1.1687273967, -1.9366353171, -0.7583260267, -0.1288186991, 2.2018769654, -0.9383105208, -0.0959982166, 0.6889112707, 1.0172067951, -0.1988865499, 1.3461760224, -0.5692275708, 0.0112450486, -1.0244657911, -0.0065034946, -0.888033574, 2.5997682335, -0.6593191496, 0.4554196997] [-238.03009] 0
2 2023-02-24 16:26:24.157 [0.9847406173, -0.6887896553, -0.9483266359, -0.6146245598, 0.395195321, 0.2237676197, -2.1580851068, -0.8124396117, 0.8795326949, 1.0463472648, -0.2343060791, 1.9127900859, -0.0636431887, 2.7055743269, 1.424242505, 0.1486958646, -0.7771892138, -0.6720552548, 0.9127712446, 0.680721406, 1.5207886874, 1.9579334337, -0.9336538468, -0.2942243461, 0.8563934417] [253.06976] 0
3 2023-02-24 16:26:24.157 [-0.0894312686, 2.0916777545, 0.155086745, 0.8335388277, 0.4376497549, -0.2875695352, -1.272466627, -0.8226918076, -0.8637972417, -0.4856051115, -0.978749107, 0.2675108269, 0.5246808262, -0.96869578, 0.8475004997, 1.0027495438, 0.4704188579, 2.6906210825, 1.34454675, -1.4987055653, 0.680752942, -2.6459314502, 0.6274277031, 1.3640818416, -0.8077878088] [141.34639] 0
4 2023-02-24 16:26:24.157 [-0.9200220805, -1.8760634694, -0.8277296049, 0.6511561005, 1.5066237509, -1.1236118386, -0.3776053288, -0.0445487434, -1.4965713379, -0.1756118518, 0.0317408338, 0.2496108303, 1.6857141605, 0.0339106658, -0.3340227553, -0.3428326984, -0.5932644698, -0.4395685475, -0.6870452688, -0.4132149028, -0.7352879532, 0.2080507404, 0.4575261189, -2.0175947284, 1.154633581] [42.79154] 0

Undeploy the Pipeline

With the tests complete, we will undeploy the pipeline to return the resources back to the Wallaroo instance.

pipeline.undeploy()
Waiting for undeployment - this will take up to 45s ..................................... ok
name xgboost-regression-autoconvert-pipeline
created 2023-02-22 17:36:31.143795+00:00
last_updated 2023-02-24 16:26:10.105654+00:00
deployed False
tags
versions 73345b3e-e141-497e-bcab-e6145cb8b6ec, 6f6a7cf1-b328-4a67-a37d-4047b1082516, 00bb4dd6-3061-4619-9491-a3b7c307f784, 45081f0c-1991-4ce0-907c-6135ca328084, 1220c119-45e9-4ff4-bdfd-8ff2f95486d5
steps xgb-regression-model