sklearn and XGBoost Regression Auto-Conversion Tutorial Within Wallaroo

How to use the Wallaroo convert_model method with a sklearn model.

Machine Learning (ML) models can be converted into a Wallaroo Model object and uploaded into a Wallaroo workspace using the Wallaroo Client convert_model(path, source_type, conversion_arguments) method. This conversion process transforms the model into an open format that can be run across different frameworks at compiled C-language speeds.

The three input parameters are:

  • path (STRING): The path to the ML model file.
  • source_type (ModelConversionSource): The type of ML model to be converted. As of this time Wallaroo auto-conversion supports the following source types and their associated ModelConversionSource:
    • sklearn: ModelConversionSource.SKLEARN
    • xgboost: ModelConversionSource.XGBOOST
  • conversion_arguments: The arguments for the conversion:
    • name: The name of the model being converted.
    • comment: Any comments for the model.
    • number_of_columns: The number of columns the model was trained for.
    • input_type: The ModelConversionInputType, typically Float32 or Double depending on the model.

The following tutorial demonstrates how to convert a sklearn Linear Model and an XGBoost Regression Model, and upload them into a Wallaroo Workspace. The following files are provided for the tutorial:

  • sklearn-linear-model.pickle: A sklearn linear model. An example of training the model is provided in the Jupyter Notebook sklearn-linear-model-example.ipynb. It has 25 columns.
  • xgb_reg.pickle: An XGBoost regression model. An example of training the model is provided in the Jupyter Notebook xgboost-regression-model-example.ipynb. It has 25 columns.

Steps

Prerequisites

Before starting, the following must be available:

  • The model to upload into a workspace.
  • The number of columns the model was trained for.
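
If the column count is not recorded anywhere, it can often be read back from the fitted model itself. The following is a minimal sketch, not part of the Wallaroo SDK: load_pickled_model and infer_number_of_columns are hypothetical helper names, and the sketch assumes the pickled object exposes the scikit-learn style n_features_in_ attribute. Models that lack it need the count supplied by hand.

```python
import pickle


def load_pickled_model(path):
    """Load a pickled model from disk. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def infer_number_of_columns(model):
    """Return the number of input columns the fitted model reports.

    Assumes the scikit-learn convention of an `n_features_in_` attribute;
    raises ValueError when the model does not expose it.
    """
    n_columns = getattr(model, "n_features_in_", None)
    if n_columns is None:
        raise ValueError(
            "model does not expose n_features_in_; supply the column count manually"
        )
    return int(n_columns)
```

Unpickling a model requires the library that created it (sklearn or xgboost) to be installed. For the tutorial files, the reported value should agree with the 25 columns stated above, provided the attribute is present.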

Import Libraries

Import the libraries that will be used for the auto-conversion process.

import pickle
import json

import wallaroo

from wallaroo.ModelConversion import ConvertSKLearnArguments, ConvertXGBoostArgs, ModelConversionSource, ModelConversionInputType

The following code is used to either connect to an existing workspace or to create a new one. For more details on working with workspaces, see the Wallaroo Workspace Management Guide.

def get_workspace(name):
    wl = wallaroo.Client()
    for ws in wl.list_workspaces():
        if ws.name() == name:
            return ws
    return None

def get_or_create_workspace(name):
    wl = wallaroo.Client()
    ws = get_workspace(name)
    if ws is None:
        ws = wl.create_workspace(name)
    return ws
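
The helpers above construct a new wallaroo.Client() on every call. As a sketch of an alternative, the variant below takes an already-connected client as a parameter so that a single connection is reused. It relies only on the list_workspaces(), create_workspace(name), and ws.name() calls already used above; the function name get_or_create_workspace_with is hypothetical.

```python
def get_or_create_workspace_with(client, name):
    """Return the workspace called `name`, creating it if it does not exist.

    `client` is assumed to be an already-connected wallaroo.Client (or any
    object offering list_workspaces() and create_workspace(name)).
    """
    for ws in client.list_workspaces():
        if ws.name() == name:
            return ws
    return client.create_workspace(name)
```

Once a client has been created, the call would look like get_or_create_workspace_with(wl, "testautoconversion").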

Connect to Wallaroo

Connect to your Wallaroo instance.

wl = wallaroo.Client()

Set the Workspace

We’ll connect to or create the workspace testautoconversion and use it for our model testing.

workspace_name = "testautoconversion"
_ = wl.set_current_workspace(get_or_create_workspace(workspace_name))
wl.get_current_workspace()

Set the Model Conversion Arguments

We’ll create two different configurations, one for each of our models:

  • sklearn_model_conversion_args: Used for our sklearn model.
  • xgboost_model_conversion_args: Used for our XGBoost model.

# The number of columns
NF=25

sklearn_model_conversion_args = ConvertSKLearnArguments(
    name="lm-test",
    comment="test linear regression",
    number_of_columns=NF,
    input_type=ModelConversionInputType.Double
)
sklearn_model_conversion_type = ModelConversionSource.SKLEARN

xgboost_model_conversion_args = ConvertXGBoostArgs(
    name="xgb-test-reg",
    comment="xgboost regression model test",
    number_of_columns=NF,
    input_type=ModelConversionInputType.Float32
)
xgboost_model_conversion_type = ModelConversionSource.XGBOOST

Convert the Models

The convert_model method converts the model using the arguments, and uploads it into the current workspace - in this case, testautoconversion. Once complete, we can run get_current_workspace to verify that the models were uploaded.

# converts and uploads the sklearn model.
wl.convert_model('sklearn-linear-model.pickle', sklearn_model_conversion_type, sklearn_model_conversion_args)

# converts and uploads the XGBoost model.
wl.convert_model('xgb_reg.pickle', xgboost_model_conversion_type, xgboost_model_conversion_args)
wl.get_current_workspace()
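
When several models are converted in one go, a missing file is easier to diagnose before any upload starts. The sketch below is one hedged way to wrap the calls above: convert_models is a hypothetical helper, not part of the Wallaroo SDK, and it assumes only that the client exposes convert_model(path, source_type, conversion_arguments) as described earlier.

```python
from pathlib import Path


def convert_models(client, specs):
    """Convert and upload each (path, source_type, conversion_args) in `specs`.

    Checks that every model file exists before the first conversion starts,
    then returns whatever client.convert_model returns for each model, in order.
    """
    missing = [str(path) for path, _, _ in specs if not Path(path).is_file()]
    if missing:
        raise FileNotFoundError("model file(s) not found: " + ", ".join(missing))
    return [
        client.convert_model(str(path), source_type, args)
        for path, source_type, args in specs
    ]
```

With the names defined in this tutorial, a call might look like convert_models(wl, [("xgb_reg.pickle", xgboost_model_conversion_type, xgboost_model_conversion_args)]).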