Model Conversion Tutorials

How to convert ML models into a Wallaroo compatible format.

Wallaroo pipelines support the ONNX standard and models converted using the Model autoconversion method.

These sample guides and their machine learning models can be downloaded from the Wallaroo Tutorials Repository.

1 - Keras Convert and Upload Within Wallaroo

How to convert Keras ML models and upload them to Wallaroo using the Wallaroo auto-conversion method.

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

Introduction

Machine Learning (ML) models can be converted into a Wallaroo Model object and uploaded into a Wallaroo workspace using the Wallaroo Client convert_model(path, source_type, conversion_arguments) method. This conversion process transforms the model into an open format that can be run across different frameworks at compiled C-language speeds.

The following tutorial is a brief example of how to convert a Keras or Tensor ML model to ONNX. This allows organizations that have trained Keras or Tensor models to convert them and use them with Wallaroo.

This tutorial assumes that you have a Wallaroo instance and are running this Notebook from the Wallaroo Jupyter Hub service.

This tutorial demonstrates how to:

  • Convert a keras ML model and upload it into the Wallaroo engine.
  • Run a sample inference on the converted model in a Wallaroo instance.

This tutorial provides the following:

  • simple_sentiment_model.zip: A pre-trained keras sentiment model to be converted. This has 100 columns.

Conversion Steps

The Wallaroo autoconverter method convert_model(path, source_type, conversion_arguments) takes 3 parameters. The parameters for keras conversions are:

  • path (STRING): The path to the ML model file.
  • source_type (ModelConversionSource): The type of ML model to be converted. As of this time Wallaroo auto-conversion supports the following source types and their associated ModelConversionSource:
    • sklearn: ModelConversionSource.SKLEARN
    • xgboost: ModelConversionSource.XGBOOST
    • keras: ModelConversionSource.KERAS
  • conversion_arguments: The arguments for the conversion based on the type of model being converted. These are:
    • wallaroo.ModelConversion.ConvertKerasArguments: Used for converting keras type models and takes the following parameters:
      • name: The name of the model being converted.
      • comment: Any comments for the model.
      • input_type: A tensorflow Dtype called in the format ModelConversionInputType.{type}, where {type} is Float, Double, etc depending on the model.
      • dimensions: Corresponds to the keras xtrain in the format List[Union[None, int, float]].
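
The dimensions argument describes the shape of the training input, with None standing in for the batch axis. The following is a minimal sketch of deriving that value from sample rows, assuming the input is a plain list of equal-length rows. The helper name keras_dimensions is hypothetical and is not part of the Wallaroo SDK.

```python
from typing import List, Optional, Sequence, Tuple


def keras_dimensions(rows: Sequence[Sequence[float]]) -> Tuple[Optional[int], int]:
    """Return (None, n_columns) for a batch of equal-length rows.

    Hypothetical helper for illustration; not part of the Wallaroo SDK.
    """
    if not rows:
        raise ValueError("at least one row is needed to infer the column count")
    n_columns = len(rows[0])
    if any(len(row) != n_columns for row in rows):
        raise ValueError("all rows must have the same number of columns")
    # None marks the batch axis, whose length varies between requests.
    return (None, n_columns)


# Two sample rows with 100 columns each, matching the sentiment model's width.
sample_rows: List[List[float]] = [[0.0] * 100, [1.0] * 100]
print(keras_dimensions(sample_rows))  # (None, 100)
```

The resulting tuple has the same form as the dimensions value passed to ConvertKerasArguments later in this tutorial.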

Import Libraries

The first step is to import the libraries needed.

import wallaroo

from wallaroo.ModelConversion import ConvertKerasArguments, ModelConversionSource, ModelConversionInputType
from wallaroo.object import EntityNotFoundError

Configuration and Methods

The following will set the workspace, pipeline, model name, the model file name used when uploading and converting the keras model, and the sample data.

The function get_workspace(name) will either set the current workspace to the requested name, or create it if it does not exist. The function get_pipeline(name) will either set the pipeline used to the name requested, or create it in the current workspace if it does not exist.

workspace_name = 'keras-autoconvert-workspace'
pipeline_name = 'keras-autoconvert-pipeline'
model_name = 'simple-sentiment-model'
model_file_name = 'simple_sentiment_model.zip'
sample_data = 'simple_sentiment_testdata.json'

def get_workspace(name):
    # Return the workspace with the given name, creating it if it does not exist
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name):
    # Look up the pipeline by the name passed in, or build it if it is not found
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline
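
The find-or-create pattern used by these helpers can be tried without a Wallaroo instance. The sketch below uses FakeClient and FakeWorkspace, which are hypothetical stand-ins written only for this illustration; they mimic just the list_workspaces, create_workspace, and name calls used above and are not the Wallaroo SDK.

```python
class FakeWorkspace:
    """Hypothetical stand-in for a workspace; exposes name() like the loop expects."""

    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name


class FakeClient:
    """Hypothetical in-memory client, used only to exercise the pattern."""

    def __init__(self):
        self._workspaces = []

    def list_workspaces(self):
        return list(self._workspaces)

    def create_workspace(self, name):
        workspace = FakeWorkspace(name)
        self._workspaces.append(workspace)
        return workspace


def get_or_create_workspace(client, name):
    # Reuse an existing workspace with a matching name, otherwise create one.
    for workspace in client.list_workspaces():
        if workspace.name() == name:
            return workspace
    return client.create_workspace(name)


client = FakeClient()
first = get_or_create_workspace(client, "keras-autoconvert-workspace")
second = get_or_create_workspace(client, "keras-autoconvert-workspace")
print(first is second)  # True
```

Calling the helper twice with the same name returns the same object, so rerunning the notebook does not create duplicate workspaces.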

Connect to Wallaroo

Connect to your Wallaroo instance and store the connection into the variable wl.

wl = wallaroo.Client()

Set the Workspace and Pipeline

Set or create the workspace and pipeline based on the names configured earlier.

workspace = get_workspace(workspace_name)

wl.set_current_workspace(workspace)

pipeline = get_pipeline(pipeline_name)
pipeline
name keras-autoconvert-pipeline
created 2022-07-07 16:27:57.437207+00:00
last_updated 2022-07-07 16:28:42.403022+00:00
deployed False
tags
steps simple-sentiment-model

Set the Model Autoconvert Parameters

Set the parameters for converting the simple-sentiment-model. This includes the shape of the model.

model_columns = 100

model_conversion_args = ConvertKerasArguments(
    name=model_name,
    comment="simple keras model",
    input_type=ModelConversionInputType.Float32,
    dimensions=(None, model_columns)
)
model_conversion_type = ModelConversionSource.KERAS

Upload and Convert the Model

Now we can upload and convert the model. Once finished, it will be stored as {unique-file-id}-converted.onnx.

# converts and uploads model.
model_wl = wl.convert_model('simple_sentiment_model.zip', model_conversion_type, model_conversion_args)
model_wl
{'name': 'simple-sentiment-model', 'version': 'c76870f8-e16b-4534-bb17-e18a3e3806d5', 'file_name': '14d9ab8d-47f4-4557-82a7-6b26cb67ab05-converted.onnx', 'last_update_time': datetime.datetime(2022, 7, 7, 16, 41, 22, 528430, tzinfo=tzutc())}
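
The stored file name follows the {unique-file-id}-converted.onnx pattern. The following sketch checks a file name against that pattern and extracts the identifier. It assumes the identifier is a UUID, as it appears to be in the sample output above; the helper name converted_file_id is hypothetical.

```python
import uuid

CONVERTED_SUFFIX = "-converted.onnx"


def converted_file_id(file_name: str) -> uuid.UUID:
    """Return the identifier part of a '<id>-converted.onnx' file name.

    Hypothetical helper; assumes the identifier is a UUID.
    """
    if not file_name.endswith(CONVERTED_SUFFIX):
        raise ValueError(f"{file_name!r} does not end with {CONVERTED_SUFFIX!r}")
    # uuid.UUID raises ValueError if the remaining text is not a valid UUID.
    return uuid.UUID(file_name[: -len(CONVERTED_SUFFIX)])


# File name copied from the sample convert_model output.
file_id = converted_file_id("14d9ab8d-47f4-4557-82a7-6b26cb67ab05-converted.onnx")
print(file_id)  # 14d9ab8d-47f4-4557-82a7-6b26cb67ab05
```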

Test Inference

With the model uploaded and converted, we can run a sample inference.

Add Pipeline Step and Deploy

We will add the model as a step into our pipeline, then deploy it.

pipeline.add_model_step(model_wl).deploy()
Waiting for deployment - this will take up to 45s .... ok
name keras-autoconvert-pipeline
created 2022-07-07 16:27:57.437207+00:00
last_updated 2022-07-07 16:41:23.615423+00:00
deployed True
tags
steps simple-sentiment-model

Run a Test Inference

We can run a test inference from the simple_sentiment_testdata.json file, then display just the results.

sample_data = 'simple_sentiment_testdata.json'
result = pipeline.infer_from_file(sample_data)
result[0].data()
Waiting for inference response - this will take up to 45s .... ok


[array([[0.09469762],
        [0.99103099],
        [0.93407357],
        [0.56030995],
        [0.9964503 ]])]
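
The raw scores can be turned into labels with a threshold. This is only a sketch: it assumes each score is the probability of positive sentiment and uses a cutoff of 0.5, neither of which is stated by the tutorial. The scores are copied from the sample output as plain Python lists.

```python
# Scores copied from the sample inference output above.
scores = [
    [0.09469762],
    [0.99103099],
    [0.93407357],
    [0.56030995],
    [0.9964503],
]

# Assumed cutoff; adjust to whatever the model was trained to mean.
THRESHOLD = 0.5

labels = ["positive" if row[0] >= THRESHOLD else "negative" for row in scores]
print(labels)  # ['negative', 'positive', 'positive', 'positive', 'positive']
```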

Undeploy the Pipeline

With the tests complete, we will undeploy the pipeline to return the resources back to the Wallaroo instance.

pipeline.undeploy()
Waiting for undeployment - this will take up to 45s ................................... ok
name keras-autoconvert-pipeline
created 2022-07-07 16:27:57.437207+00:00
last_updated 2022-07-07 16:57:06.402657+00:00
deployed False
tags
steps simple-sentiment-model

2 - PyTorch to ONNX Outside Wallaroo

How to convert PyTorch ML models into the ONNX format.

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

How to Convert PyTorch to ONNX

The following tutorial is a brief example of how to convert a PyTorch ML model to ONNX. This allows organizations that have trained PyTorch models to convert them and use them with Wallaroo.

This tutorial assumes that you have a Wallaroo instance and are running this Notebook from the Wallaroo Jupyter Hub service. This sample code is based on the guide Convert your PyTorch model to ONNX.

This tutorial provides the following:

  • pytorchbikeshare.pt: a RandomForestRegressor PyTorch model. This model has a total of 58 inputs, and uses the class BikeShareRegressor.

Conversion Process

Libraries

The first step is to import the libraries we will be using. For this example, the PyTorch torch library will be installed and imported into this kernel.

# the Pytorch libraries
# Import into this kernel

import sys
!{sys.executable} -m pip install torch

import torch
import torch.onnx 
Collecting torch
  Downloading torch-1.12.0-cp38-cp38-manylinux1_x86_64.whl (776.3 MB)
     |████████████████████████████████| 776.3 MB 4.7 kB/s
Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.8/site-packages (from torch) (3.7.4.3)
Installing collected packages: torch
Successfully installed torch-1.12.0

Load the Model

To load a PyTorch model into a variable, the model’s class has to be defined. For our example we are using the BikeShareRegressor class as defined below.

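class BikeShareRegressor(torch.nn.Module):
    # input_size, l1, l2, dropout, and output_size are the hyperparameters
    # used when the model was trained; they must be defined before this
    # class is instantiated directly.
    def __init__(self):
        super(BikeShareRegressor, self).__init__()

        self.net = torch.nn.Sequential(torch.nn.Linear(input_size, l1),
                                       torch.nn.ReLU(),
                                       torch.nn.Dropout(p=dropout),
                                       torch.nn.BatchNorm1d(l1),
                                       torch.nn.Linear(l1, l2),
                                       torch.nn.ReLU(),
                                       torch.nn.Dropout(p=dropout),
                                       torch.nn.BatchNorm1d(l2),
                                       torch.nn.Linear(l2, output_size))

    def forward(self, x):
        return self.net(x)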

Now we will load the model into the variable model.

# load the Pytorch model
model = torch.load("./pytorch_bikesharingmodel.pt")

Convert_ONNX Inputs

Now we will define our method Convert_ONNX(), which calls torch.onnx.export with the following inputs:

  • PyTorchModel: the PyTorch model we are converting.

  • modelInputs: the model input or tuple for multiple inputs.

  • onnxPath: The location to save the onnx file.

  • opset_version: The ONNX opset version to export to.

  • input_names: Array of the model’s input names.

  • output_names: Array of the model’s output names.

  • dynamic_axes: Sets variable length axes in the format, replacing the batch_size as necessary: {'modelInput' : { 0 : 'batch_size'}, 'modelOutput' : {0 : 'batch_size'}}

  • export_params: Whether to store the trained parameter weight inside the model file. Defaults to True.

  • do_constant_folding: Sets whether to execute constant folding for optimization. Defaults to True.
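
The dynamic_axes value is a nested dictionary keyed by tensor name. The following sketch builds it from lists of input and output names, so the names only need to be written once. The helper name batch_dynamic_axes is hypothetical; it only produces the dictionary in the format shown above and does not call PyTorch.

```python
from typing import Dict, Iterable


def batch_dynamic_axes(
    tensor_names: Iterable[str], axis: int = 0, label: str = "batch_size"
) -> Dict[str, Dict[int, str]]:
    """Mark one axis of every named tensor as variable length.

    Hypothetical helper for illustration only.
    """
    return {name: {axis: label} for name in tensor_names}


input_names = ["modelInput"]
output_names = ["modelOutput"]

dynamic_axes = batch_dynamic_axes(input_names + output_names)
print(dynamic_axes)
# {'modelInput': {0: 'batch_size'}, 'modelOutput': {0: 'batch_size'}}
```

Marking axis 0 as variable lets the exported model accept batches of any size rather than only the size of the dummy input used during export.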

#Function to Convert to ONNX 
def Convert_ONNX(): 

    # set the model to inference mode 
    model.eval() 

    # Export the model   
    torch.onnx.export(model,         # model being run 
         dummy_input,       # model input (or a tuple for multiple inputs) 
         pypath,       # where to save the model  
         export_params=True,  # store the trained parameter weights inside the model file 
         opset_version=onnx_opset_version,    # the ONNX opset version to export the model to 
         do_constant_folding=True,  # whether to execute constant folding for optimization 
         input_names = ['modelInput'],   # the model's input names 
         output_names = ['modelOutput'], # the model's output names 
         dynamic_axes = {'modelInput' : {0 : 'batch_size'}, 'modelOutput' : {0 : 'batch_size'}} # variable length axes 
    ) 
    print(" ") 
    print('Model has been converted to ONNX') 

Convert the Model

We’ll now set our variables and run our conversion. For our example, the input_size is known to be 58, and the device value we’ll derive from torch.cuda. We’ll also set the ONNX opset version for exporting to 15.

pypath = "pytorchbikeshare.onnx"

input_size = 58

if torch.cuda.is_available():
    device = 'cuda'
else:
    device = 'cpu'

onnx_opset_version = 15

# Set up some dummy input tensor for the model
dummy_input = torch.randn(1, input_size, requires_grad=True).to(device)

Convert_ONNX()
Model has been converted to ONNX

Conclusion

And now our conversion is complete. Please feel free to use this sample code in your own projects.

3 - sklearn and XGBoost Regression Auto-Conversion Tutorial Within Wallaroo

How to use the Wallaroo convert_model method with a sklearn model.

Auto-Conversion And Upload Tutorial

Machine Learning (ML) models can be converted into a Wallaroo Model object and uploaded into a Wallaroo workspace using the Wallaroo Client convert_model(path, source_type, conversion_arguments) method. This conversion process transforms the model into an open format that can be run across different frameworks at compiled C-language speeds.

The three input parameters are:

  • path (STRING): The path to the ML model file.
  • source_type (ModelConversionSource): The type of ML model to be converted. As of this time Wallaroo auto-conversion supports the following source types and their associated ModelConversionSource:
    • sklearn: ModelConversionSource.SKLEARN
    • xgboost: ModelConversionSource.XGBOOST
  • conversion_arguments: The arguments for the conversion:
    • name: The name of the model being converted.
    • comment: Any comments for the model.
    • number_of_columns: The number of columns the model was trained for.
    • input_type: The ModelConversionInputType, typically Float or Double depending on the model.

The following tutorial demonstrates how to convert a sklearn Linear Model and an XGBoost Regression Model, and upload them into a Wallaroo Workspace. The following is provided for the tutorial:

  • sklearn-linear-model.pickle: A sklearn linear model. An example of training the model is provided in the Jupyter Notebook sklearn-linear-model-example.ipynb. It has 25 columns.
  • xgb_reg.pickle: An XGBoost regression model. An example of training the model is provided in the Jupyter Notebook xgboost-regression-model-example.ipynb. It has 25 columns.

Steps

Prerequisites

Before starting, the following must be available:

  • The model to upload into a workspace.
  • The number of columns the model was trained for.

Import Libraries

Import the libraries that will be used for the auto-conversion process.

import pickle
import json

import wallaroo

from wallaroo.ModelConversion import ConvertSKLearnArguments, ConvertXGBoostArgs, ModelConversionSource, ModelConversionInputType
from wallaroo.object import EntityNotFoundError

The following code is used to either connect to an existing workspace or to create a new one. For more details on working with workspaces, see the Wallaroo Workspace Management Guide.

Connect to Wallaroo

Connect to your Wallaroo instance.

wl = wallaroo.Client()

Set the Workspace

We’ll connect to or create the workspace testautoconversion and use it for our model testing.

workspace_name = 'testautoconversion'
def get_workspace(name):
    # Return the workspace with the given name, creating it if it does not exist
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace
workspace = get_workspace(workspace_name)

wl.set_current_workspace(workspace)

wl.get_current_workspace()
{'name': 'testautoconversion', 'id': 12, 'archived': False, 'created_by': '13f4ce0d-cb22-4a5c-b07b-c65e4d730315', 'created_at': '2022-08-02T22:16:30.552476+00:00', 'models': [], 'pipelines': []}

Set the Model Conversion Arguments

We’ll create two different configurations, one for each of our models:

  • sklearn_model_conversion_args: Used for our sklearn model.
  • xgboost_model_conversion_args: Used for our XGBoost model.

# The number of columns
NF=25

sklearn_model_conversion_args = ConvertSKLearnArguments(
    name="lm-test",
    comment="test linear regression",
    number_of_columns=NF,
    input_type=ModelConversionInputType.Double
)
sklearn_model_conversion_type = ModelConversionSource.SKLEARN

xgboost_model_conversion_args = ConvertXGBoostArgs(
    name="xgb-test-reg",
    comment="xgboost regression model test",
    number_of_columns=NF,
    input_type=ModelConversionInputType.Float32
)
xgboost_model_conversion_type = ModelConversionSource.XGBOOST

Convert the Models

The convert_model method converts the model using the arguments, and uploads it into the current workspace - in this case, testautoconversion. Once complete, we can run get_current_workspace to verify that the models were uploaded.

# converts and uploads the sklearn model.
wl.convert_model('sklearn-linear-model.pickle', sklearn_model_conversion_type, sklearn_model_conversion_args)

# converts and uploads the XGBoost model.
wl.convert_model('xgb_reg.pickle', xgboost_model_conversion_type, xgboost_model_conversion_args)
{'name': 'xgb-test-reg', 'version': '9ade0e7a-dc3f-4935-8974-ed8bda12d148', 'file_name': '39c215bb-ae23-4a05-b520-aa0b8d94ba42-converted.onnx', 'last_update_time': datetime.datetime(2022, 8, 3, 14, 26, 58, 413122, tzinfo=tzutc())}
wl.get_current_workspace()
{'name': 'testautoconversion', 'id': 12, 'archived': False, 'created_by': '13f4ce0d-cb22-4a5c-b07b-c65e4d730315', 'created_at': '2022-08-02T22:16:30.552476+00:00', 'models': [{'name': 'lm-test', 'version': '2227f4a5-3139-4bc8-844c-3587546f326a', 'file_name': '2fb7d46d-d92f-4371-872c-5300c52188bb-converted.onnx', 'last_update_time': datetime.datetime(2022, 8, 3, 14, 26, 55, 892457, tzinfo=tzutc())}, {'name': 'xgb-test-reg', 'version': '9ade0e7a-dc3f-4935-8974-ed8bda12d148', 'file_name': '39c215bb-ae23-4a05-b520-aa0b8d94ba42-converted.onnx', 'last_update_time': datetime.datetime(2022, 8, 3, 14, 26, 58, 413122, tzinfo=tzutc())}], 'pipelines': []}
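
The workspace listing can also be checked programmatically rather than by eye. The sketch below assumes the workspace details are available as a plain dictionary shaped like the printed output above; the dictionary here is an abbreviated copy of that output, and how the Wallaroo SDK exposes the same fields is not covered by this tutorial.

```python
# Abbreviated copy of the printed workspace output, as a plain dictionary.
workspace_info = {
    "name": "testautoconversion",
    "models": [
        {
            "name": "lm-test",
            "file_name": "2fb7d46d-d92f-4371-872c-5300c52188bb-converted.onnx",
        },
        {
            "name": "xgb-test-reg",
            "file_name": "39c215bb-ae23-4a05-b520-aa0b8d94ba42-converted.onnx",
        },
    ],
    "pipelines": [],
}

# The two model names set in the conversion arguments.
expected = {"lm-test", "xgb-test-reg"}

uploaded = {model["name"] for model in workspace_info["models"]}
missing = expected - uploaded
print(sorted(missing))  # []
```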

4 - sk-learn Regression Model to ONNX Outside Wallaroo

How to convert sk-learn Regression ML models into the ONNX format.

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

How to Convert sk-learn Regression Model to ONNX

The following tutorial is a brief example of how to convert a scikit-learn (aka sk-learn) regression ML model to ONNX.

This tutorial assumes that you have a Wallaroo instance and are running this Notebook from the Wallaroo Jupyter Hub service.

This tutorial provides the following:

  • demand_curve.pickle: a demand curve trained sk-learn model. Once this file is converted to ONNX format, it can be used as part of the Demand Curve Pipeline Tutorial.

    This model contains 3 columns: UnitPrice, cust_known, and UnitPriceXcust_known.

Conversion Process

Libraries

The first step is to import our libraries we will be using.

# Used to load the sk-learn model
import pickle

# Used for the conversion process
import onnx, skl2onnx, onnxmltools
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType

Next let’s define our model_to_onnx method for converting a sk-learn model to ONNX. This has the following inputs:

  • model: The sk-learn model we’re converting.
  • cols: The number of inputs the model expects.
  • input_type: Determines how to manage float values, which can either be DoubleTensorType or FloatTensorType.
# convert model to ONNX

def model_to_onnx(model, cols, *, input_type='Double'):
    input_type_lower=input_type.lower()
    # How to manage float values
    if input_type=='Double':
        tensor_type=DoubleTensorType
    elif input_type=='Float':
        tensor_type=FloatTensorType
    else:
        raise ValueError("bad input type")
    tensor_size=cols
    initial_type=[(f'{input_type_lower}_input', tensor_type([None, tensor_size]))]
    onnx_model=onnxmltools.convert_sklearn(model,initial_types=initial_type)
    return onnx_model
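
The input name and shape that model_to_onnx passes to the converter can be previewed without the conversion libraries. The following sketch mirrors only the naming and shape logic of the method above; the helper name onnx_input_spec is hypothetical, and it returns plain Python values rather than skl2onnx tensor types.

```python
def onnx_input_spec(cols, input_type="Double"):
    """Return the (input name, shape) pair that model_to_onnx would declare.

    Hypothetical helper that mirrors the naming logic only.
    """
    if input_type not in ("Double", "Float"):
        raise ValueError("bad input type")
    # The ONNX input is named after the lower-cased type, e.g. 'double_input'.
    # None in the shape marks the variable-length batch axis.
    return (f"{input_type.lower()}_input", [None, cols])


print(onnx_input_spec(3))           # ('double_input', [None, 3])
print(onnx_input_spec(3, "Float"))  # ('float_input', [None, 3])
```

Knowing the declared input name is useful later, because inference requests against the converted model refer to the input by that name.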

With our method defined, now it’s time to convert. Let’s load our sk-learn model and save it into the variable sklearn_model.

# load the pickled sk-learn model from disk

with open('./demand_curve.pickle', 'rb') as f:
    sklearn_model = pickle.load(f)
/opt/conda/lib/python3.8/site-packages/sklearn/base.py:310: UserWarning: Trying to unpickle estimator LinearRegression from version 0.24.2 when using version 0.24.1. This might lead to breaking code or invalid results. Use at your own risk.
  warnings.warn(

Now we’ll convert our sklearn_model into the variable onnx_model_converted using our model_to_onnx method. Recall that our sklearn_model has 3 columns.

onnx_model_converted = model_to_onnx(sklearn_model, 3)

Now we can save our model to an ONNX file.

onnx.save_model(onnx_model_converted, "demand_curve.onnx")

5 - Statsmodel Upload to Wallaroo Tutorial

How to upload a Statsmodel ML model into Wallaroo

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

Introduction

Organizations can deploy a Machine Learning (ML) model based on statsmodels directly into Wallaroo through the following process. This conversion process transforms the model into an open format that can be run across different frameworks at compiled C-language speeds.

This example provides the following:

  • train-statsmodel.ipynb: A sample Jupyter Notebook that trains a sample model. The model predicts how many bikes will be rented on each of the next 7 days, based on the previous 7 days’ bike rentals, temperature, and wind speed. Additional files to support this example are:
    • day.csv: Data used to train the sample statsmodel example.
    • infer.py: The inference script that is part of the statsmodel.
  • convert-statsmodel-tutorial.ipynb: A sample Jupyter Notebook that demonstrates how to upload, convert, and deploy the statsmodel example into a Wallaroo instance. Additional files to support this example are:
    • bike_day_model.pkl: A statsmodel ML model trained from the train-statsmodel.ipynb Notebook.

      IMPORTANT NOTE: The statsmodel ML model is composed of two parts that are contained in the .pkl file:

      • The pickled Python runtime expects a dictionary with two keys: model and script:

        • model: the pickled model, which will be automatically loaded into the python runtime with the name ‘model’.
        • script: the text of the python script to be run, in a format similar to the existing python script steps (i.e. defining a wallaroo_json method which operates on the data). In this case, the file infer.py is the script used.
    • bike_day_eval.json: Evaluation data used to test the model’s performance.
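
The two-key package described above can be sketched with the standard pickle module. This is an illustration only: placeholder_model stands in for a trained statsmodels object, and the body and signature of wallaroo_json shown here are assumptions rather than the contents of infer.py.

```python
import io
import pickle

# Text of the script to run. The signature and body are placeholders;
# the tutorial's real script is infer.py.
script_text = '''
def wallaroo_json(data):
    return data
'''

# Placeholder standing in for the trained model object.
placeholder_model = {"coefficients": [0.1, 0.2]}

# The package is a dictionary with exactly the two keys 'model' and 'script'.
package = {"model": placeholder_model, "script": script_text}

# Round-trip through pickle in memory to confirm the structure survives.
buffer = io.BytesIO()
pickle.dump(package, buffer)
buffer.seek(0)
restored = pickle.load(buffer)

print(sorted(restored.keys()))  # ['model', 'script']
```

Writing to a file instead of an in-memory buffer would produce a .pkl file with the same layout as the one uploaded in this tutorial.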

Steps

The following steps will perform the following:

  1. Upload the statsmodel ML model bike_day_model.pkl into a Wallaroo workspace.
  2. Deploy the model into a pipeline.
  3. Run a test inference.
  4. Undeploy the pipeline.

Import Libraries

The first step is to import the libraries that we will need.

import json
import os
import datetime

import wallaroo
from wallaroo.object import EntityNotFoundError

Initialize connection

Start a connection to the Wallaroo instance and save the connection into the variable wl.

wl = wallaroo.Client()

Set Configurations

The following will set the workspace, model name, and pipeline that will be used for this example. If the workspace or pipeline already exist, then they will be assigned for use in this example. If they do not exist, they will be created based on the names listed below.

workspace_name = 'bikedayevalworkspace'
pipeline_name = 'bikedayevalpipeline'
model_name = 'bikedaymodel'
model_file_name = 'bike_day_model.pkl'

Set the Workspace and Pipeline

This sample code will create or use the existing workspace bikedayevalworkspace as the current workspace.

def get_workspace(name):
    # Return the workspace with the given name, creating it if it does not exist
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name):
    # Look up the pipeline by the name passed in, or build it if it is not found
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline

workspace = get_workspace(workspace_name)

wl.set_current_workspace(workspace)

pipeline = get_pipeline(pipeline_name)
pipeline
name bike-day-evel-pipeline
created 2022-07-05 19:09:22.895067+00:00
last_updated 2022-07-05 19:11:16.553505+00:00
deployed False
tags
steps bike-day-model

Upload Pickled Package Statsmodel Model

Upload the statsmodel stored into the pickled package bike_day_model.pkl. See the Notebook train-statsmodel.ipynb for more details on creating this package.

Note that this package is being specified as a python configuration.

file_name = "bike_day_model.pkl"

bike_day_model = wl.upload_model(model_name, model_file_name).configure(runtime="python")

Deploy the Pipeline

We will now add the uploaded model as a step for the pipeline, then deploy it.

pipeline.add_model_step(bike_day_model)
name bike-day-evel-pipeline
created 2022-07-05 19:09:22.895067+00:00
last_updated 2022-07-05 19:11:16.553505+00:00
deployed False
tags
steps bike-day-model
pipeline.deploy()
Waiting for deployment - this will take up to 45s ................. ok
name bike-day-evel-pipeline
created 2022-07-05 19:09:22.895067+00:00
last_updated 2022-07-05 20:10:27.589019+00:00
deployed True
tags
steps bike-day-model
pipeline.status()
{'status': 'Running',
 'details': None,
 'engines': [{'ip': '10.164.3.4',
   'name': 'engine-5f75f487c6-9d456',
   'status': 'Running',
   'reason': None,
   'pipeline_statuses': {'pipelines': [{'id': 'bike-day-evel-pipeline',
      'status': 'Running'}]},
   'model_statuses': {'models': [{'name': 'bike-day-model',
      'version': 'ff154938-4e49-468e-ac6a-4ee37d62a724',
      'sha': 'ba1fc2a6e8b876684f2fd11534ee6212f840f02cbaefaa48615016cb9e90b30c',
      'status': 'Running'}]}}],
 'engine_lbs': [{'ip': '10.164.5.61',
   'name': 'engine-lb-85846c64f8-khznn',
   'status': 'Running',
   'reason': None}]}

Run Inference

Perform an inference from the evaluation data JSON file bike_day_eval.json.

pipeline.infer_from_file('bike_day_eval.json')
Waiting for inference response - this will take up to 45s .. ok


[InferenceResult({'check_failures': [],
  'elapsed': 5369777,
  'model_name': 'bike-day-model',
  'model_version': 'ff154938-4e49-468e-ac6a-4ee37d62a724',
  'original_data': {'holiday': {'0': 0,
                                '1': 0,
                                '2': 0,
                                '3': 0,
                                '4': 0,
                                '5': 0,
                                '6': 0},
                    'temp': {'0': 0.317391,
                             '1': 0.365217,
                             '2': 0.415,
                             '3': 0.54,
                             '4': 0.4725,
                             '5': 0.3325,
                             '6': 0.430435},
                    'windspeed': {'0': 0.184309,
                                  '1': 0.203117,
                                  '2': 0.209579,
                                  '3': 0.231017,
                                  '4': 0.368167,
                                  '5': 0.207721,
                                  '6': 0.288783},
                    'workingday': {'0': 1,
                                   '1': 1,
                                   '2': 1,
                                   '3': 1,
                                   '4': 0,
                                   '5': 0,
                                   '6': 1}},
  'outputs': [{'Json': {'data': [{'forecast': [1882.3784554842296,
                                               2130.607915715519,
                                               2340.8400538168335,
                                               2895.754978556798,
                                               2163.65751556893,
                                               1509.1792126536425,
                                               2431.1838923984033]}],
                        'dim': [1],
                        'v': 1}}],
  'pipeline_name': 'bike-day-evel-pipeline',
  'time': 1657051854529})]
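
The forecast values sit several levels deep in the result. The sketch below walks that nesting, assuming the result is available as a plain dictionary shaped like the printed output; the dictionary is an abbreviated copy of that output, and the accessor methods of the InferenceResult object itself are not shown here.

```python
# Abbreviated copy of the printed inference result, as a plain dictionary.
raw_result = {
    "model_name": "bike-day-model",
    "outputs": [
        {
            "Json": {
                "data": [
                    {
                        "forecast": [
                            1882.3784554842296,
                            2130.607915715519,
                            2340.8400538168335,
                            2895.754978556798,
                            2163.65751556893,
                            1509.1792126536425,
                            2431.1838923984033,
                        ]
                    }
                ],
                "dim": [1],
                "v": 1,
            }
        }
    ],
}

# outputs -> first output -> Json -> data -> first record -> forecast
forecast = raw_result["outputs"][0]["Json"]["data"][0]["forecast"]

# One value per day for the next 7 days.
for day, rentals in enumerate(forecast, start=1):
    print(f"day {day}: {rentals:.0f} rentals")
```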

Undeploy the Pipeline

Undeploy the pipeline and return the resources back to the Wallaroo instance.

pipeline.undeploy()
Waiting for undeployment - this will take up to 45s ................................ ok
name bike-day-evel-pipeline
created 2022-07-05 19:09:22.895067+00:00
last_updated 2022-07-05 20:10:27.589019+00:00
deployed False
tags
steps bike-day-model

6 - XGBoost Classification Auto-Convert Within Wallaroo

How to convert XGBoost ML Classification models and upload to Wallaroo.

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

Introduction

The following tutorial is a brief example of how to convert an XGBoost Classification ML model with the convert_model method and upload it into your Wallaroo instance.

This tutorial assumes that you have a Wallaroo instance and are running this Notebook from the Wallaroo Jupyter Hub service.

This tutorial demonstrates how to:

  • Convert an XGBoost Classification ML model and upload it into the Wallaroo engine.
  • Run a sample inference on the converted model in a Wallaroo instance.

This tutorial provides the following:

  • xgb_class.pickle: A pretrained XGBoost Classification model with 25 columns.
  • xgb_class_eval.json: Test data to perform a sample inference.

Conversion Steps

The Wallaroo autoconverter method convert_model(path, source_type, conversion_arguments) takes 3 parameters. The parameters for XGBoost conversions are:

  • path (STRING): The path to the ML model file.
  • source_type (ModelConversionSource): The type of ML model to be converted. As of this time Wallaroo auto-conversion supports the following source types and their associated ModelConversionSource:
    • sklearn: ModelConversionSource.SKLEARN
    • xgboost: ModelConversionSource.XGBOOST
    • keras: ModelConversionSource.KERAS
  • conversion_arguments: The arguments for the conversion based on the type of model being converted. These are:
    • wallaroo.ModelConversion.ConvertXGBoostArgs: Used for XGBoost models and takes the following parameters:
      • name: The name of the model being converted.
      • comment: Any comments for the model.
      • number_of_columns: The number of columns the model was trained for.
      • input_type: A tensorflow Dtype called in the format ModelConversionInputType.{type}, where {type} is Float, Double, etc depending on the model.

Import Libraries

The first step is to import the libraries needed.

import wallaroo

from wallaroo.ModelConversion import ConvertXGBoostArgs, ModelConversionSource, ModelConversionInputType
from wallaroo.object import EntityNotFoundError

Configuration and Methods

The following will set the workspace, pipeline, model name, the model file name used when uploading and converting the XGBoost model, and the sample data.

The function get_workspace(name) will either set the current workspace to the requested name, or create it if it does not exist. The function get_pipeline(name) will either set the pipeline used to the name requested, or create it in the current workspace if it does not exist.

workspace_name = 'xgboost-classification-autoconvert-workspace'
pipeline_name = 'xgboost-classification-autoconvert-pipeline'
model_name = 'xgb-class-model'
model_file_name = 'xgb_class.pickle'
sample_data = 'xgb_class_eval.json'

def get_workspace(name):
    # Return the workspace with the given name, creating it if it does not exist
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name):
    # Look up the pipeline by the name passed in, or build it if it is not found
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline

Connect to Wallaroo

Connect to your Wallaroo instance and store the connection into the variable wl.

wl = wallaroo.Client()

Set the Workspace and Pipeline

Set or create the workspace and pipeline based on the names configured earlier.

workspace = get_workspace(workspace_name)

wl.set_current_workspace(workspace)

pipeline = get_pipeline(pipeline_name)
pipeline
name xgboost-classification-autoconvert-pipeline
created 2022-08-03 15:35:20.889178+00:00
last_updated 2022-08-03 15:35:20.889178+00:00
deployed (none)
tags
steps

Set the Model Autoconvert Parameters

Set the parameters for converting the xgb-class-model.

#the number of columns
NF = 25

model_conversion_args = ConvertXGBoostArgs(
    name=model_name,
    comment="xgboost classification model test",
    number_of_columns=NF,
    input_type=ModelConversionInputType.Float32
)
model_conversion_type = ModelConversionSource.XGBOOST

Upload and Convert the Model

Now we can upload and convert the model. Once finished, it will be stored as {unique-file-id}-converted.onnx.

# convert and upload
model_wl = wl.convert_model(model_file_name, model_conversion_type, model_conversion_args)

Test Inference

With the model uploaded and converted, we can run a sample inference.

Deploy the Pipeline

Add the uploaded and converted model_wl as a step in the pipeline, then deploy it.

pipeline.add_model_step(model_wl).deploy()
Waiting for deployment - this will take up to 45s .... ok
name xgboost-classification-autoconvert-pipeline
created 2022-08-03 15:35:20.889178+00:00
last_updated 2022-08-03 15:35:23.597027+00:00
deployed True
tags
steps xgb-class-model

Run the Inference

Use xgb_class_eval.json, set earlier as sample_data, to perform the inference.

result = pipeline.infer_from_file(sample_data)
result[0].data()
Waiting for inference response - this will take up to 45s ....... ok


[array([0.e+000, 0.e+000, 5.e-324, 0.e+000, 5.e-324]),
 array([[0.99668795, 0.00331205],
        [0.52999395, 0.47000605],
        [0.14704436, 0.85295564],
        [0.995507  , 0.004493  ],
        [0.19796491, 0.80203509]])]
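
The second array in the result above has one row per input record and two values per row that sum to 1. As a minimal sketch, assuming those values are per-class probabilities (the output itself does not label them), the predicted class for each record can be read off by taking the index of the largest value in each row. The helper name predicted_classes is illustrative and not part of the Wallaroo SDK.

```python
# Values copied from the second array of the sample inference output.
# Assumption: each row holds [P(class 0), P(class 1)] for one input record.
probabilities = [
    [0.99668795, 0.00331205],
    [0.52999395, 0.47000605],
    [0.14704436, 0.85295564],
    [0.995507, 0.004493],
    [0.19796491, 0.80203509],
]

def predicted_classes(rows):
    """Return the index of the highest value in each row (a plain argmax)."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]

labels = predicted_classes(probabilities)
print(labels)  # [0, 0, 1, 0, 1]
```

The nonzero entries of the first array fall at the same positions as the records this sketch assigns to class 1, which suggests that array carries the predicted labels.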

Undeploy the Pipeline

With the tests complete, we will undeploy the pipeline to return the resources back to the Wallaroo instance.

pipeline.undeploy()
Waiting for undeployment - this will take up to 45s .............................. ok
name xgboost-classification-autoconvert-pipeline
created 2022-08-03 15:35:20.889178+00:00
last_updated 2022-08-03 15:35:23.597027+00:00
deployed False
tags
steps xgb-class-model

7 - XGBoost Regression Auto-Convert Within Wallaroo

How to convert XGBoost ML Regression models and upload to Wallaroo with the Wallaroo convert_model method.

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

Introduction

The following tutorial is a brief example of how to convert an XGBoost Regression ML model with the convert_model method and upload it into your Wallaroo instance.

This tutorial assumes that you have a Wallaroo instance and are running this Notebook from the Wallaroo Jupyter Hub service.

This tutorial demonstrates how to:

  • Convert an XGBoost Regression ML model and upload it into the Wallaroo engine.
  • Run a sample inference on the converted model in a Wallaroo instance.

This tutorial provides the following:

  • xgb_reg.pickle: A pretrained XGBoost Regression model with 25 columns.
  • xgb_regression_eval.json: Test data to perform a sample inference.

Conversion Steps

The Wallaroo autoconverter method convert_model(path, source_type, conversion_arguments) takes 3 parameters. The parameters for XGBoost conversions are:

  • path (STRING): The path to the ML model file.
  • source_type (ModelConversionSource): The type of ML model to be converted. As of this time Wallaroo auto-conversion supports the following source types and their associated ModelConversionSource:
    • sklearn: ModelConversionSource.SKLEARN
    • xgboost: ModelConversionSource.XGBOOST
    • keras: ModelConversionSource.KERAS
  • conversion_arguments: The arguments for the conversion based on the type of model being converted. These are:
    • wallaroo.ModelConversion.ConvertXGBoostArgs: Used for XGBoost models and takes the following parameters:
      • name: The name of the model being converted.
      • comment: Any comments for the model.
      • number_of_columns: The number of columns the model was trained for.
      • input_type: A tensorflow Dtype called in the format ModelConversionInputType.{type}, where {type} is Float, Double, etc depending on the model.
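
Because number_of_columns must match the number of columns the model was trained for, it can help to check the sample input against that value before converting. The following is a minimal sketch of such a check in plain Python. The helper check_column_count and the placeholder rows are hypothetical and are not part of the Wallaroo SDK; how the rows are read from the sample file depends on that file's layout, which is not shown here.

```python
def check_column_count(rows, expected_columns):
    """Raise ValueError if any input row does not have the expected width."""
    for index, row in enumerate(rows):
        if len(row) != expected_columns:
            raise ValueError(
                f"row {index} has {len(row)} columns, expected {expected_columns}"
            )
    return True

# The number of columns the model was trained for, as used in this tutorial.
NF = 25

# Placeholder rows for illustration only; real rows would come from the sample data.
sample_rows = [[0.0] * NF, [1.0] * NF]

check_column_count(sample_rows, NF)
```

A mismatch raises a ValueError naming the offending row, which is usually easier to diagnose than a failed inference after deployment.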

Import Libraries

The first step is to import the libraries needed.

import wallaroo

from wallaroo.ModelConversion import ConvertXGBoostArgs, ModelConversionSource, ModelConversionInputType
from wallaroo.object import EntityNotFoundError

Configuration and Methods

The following will set the workspace, pipeline, model name, the model file name used when uploading and converting the XGBoost model, and the sample data.

The function get_workspace(name) returns the workspace with the requested name, creating it if it does not exist. The function get_pipeline(name) returns the pipeline with the requested name, creating it in the current workspace if it does not exist.

workspace_name = 'xgboost-regression-autoconvert-workspace'
pipeline_name = 'xgboost-regression-autoconvert-pipeline'
model_name = 'xgb-regression-model'
model_file_name = 'xgb_reg.pickle'
sample_data = 'xgb_regression_eval.json'

def get_workspace(name):
    # Return the workspace with the given name, creating it if none exists.
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name):
    # Return the named pipeline, building it if it does not exist yet.
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline

Connect to Wallaroo

Connect to your Wallaroo instance and store the connection into the variable wl.

wl = wallaroo.Client()

Set the Workspace and Pipeline

Set or create the workspace and pipeline based on the names configured earlier.

workspace = get_workspace(workspace_name)

wl.set_current_workspace(workspace)

pipeline = get_pipeline(pipeline_name)
pipeline
name xgboost-regression-autoconvert-pipeline
created 2022-07-08 14:12:08.527632+00:00
last_updated 2022-07-08 14:12:08.527632+00:00
deployed (none)
tags
steps

Set the Model Autoconvert Parameters

Set the parameters for converting the xgb-regression-model.

#the number of columns
NF = 25

model_conversion_args = ConvertXGBoostArgs(
    name=model_name,
    comment="xgboost regression model test",
    number_of_columns=NF,
    input_type=ModelConversionInputType.Float32
)
model_conversion_type = ModelConversionSource.XGBOOST

Upload and Convert the Model

Now we can upload and convert the model. Once finished, it will be stored as {unique-file-id}-converted.onnx.

# convert and upload
model_wl = wl.convert_model(model_file_name, model_conversion_type, model_conversion_args)

Test Inference

With the model uploaded and converted, we can run a sample inference.

Deploy the Pipeline

Add the uploaded and converted model_wl as a step in the pipeline, then deploy it.

pipeline.add_model_step(model_wl).deploy()
Waiting for deployment - this will take up to 45s ..... ok
name xgboost-regression-autoconvert-pipeline
created 2022-07-08 14:12:08.527632+00:00
last_updated 2022-07-08 14:12:10.324722+00:00
deployed True
tags
steps xgb-regression-model

Run the Inference

Use xgb_regression_eval.json, set earlier as sample_data, to perform the inference.

result = pipeline.infer_from_file(sample_data)
result[0].data()
[array([[  30.71360016],
        [-202.30688477],
        [ 285.74139404],
        [ -56.76713943],
        [-238.28738403]])]
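
The regression result above is a nested array with one single-element row per input record. As a small sketch in plain Python, with the values copied from the output above, the rows can be flattened into a simple list of predictions for further processing. The variable names are illustrative.

```python
# Values copied from the sample regression output: one single-value row per record.
predictions = [
    [30.71360016],
    [-202.30688477],
    [285.74139404],
    [-56.76713943],
    [-238.28738403],
]

# Take the only element of each row to get one float per input record.
flat_predictions = [row[0] for row in predictions]
print(len(flat_predictions))  # 5
```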

Undeploy the Pipeline

With the tests complete, we will undeploy the pipeline to return the resources back to the Wallaroo instance.

pipeline.undeploy()
Waiting for undeployment - this will take up to 45s ................................. ok
name xgboost-regression-autoconvert-pipeline
created 2022-07-08 14:12:08.527632+00:00
last_updated 2022-07-08 14:12:10.324722+00:00
deployed False
tags
steps xgb-regression-model

8 - XGBoost Convert to ONNX

How to convert an XGBoost model to ONNX using the onnxmltools.convert library.

This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.

How to Convert XGBoost to ONNX

The following tutorial is a brief example of how to convert an XGBoost ML model to the ONNX standard. This allows organizations that have trained XGBoost models to convert them and use them with Wallaroo.

This tutorial assumes that you have a Wallaroo instance and are running this Notebook from the Wallaroo Jupyter Hub service.

This tutorial provides the following:

Conversion Process

Libraries

The first step is to import the libraries we will be using.

import pickle
import onnx
from onnxmltools.convert import convert_xgboost

from skl2onnx.common.data_types import FloatTensorType, DoubleTensorType

Set Variables

The following variables are required to be known before the process can be started:

  • number of columns: The number of columns used by the model.
  • TARGET_OPSET: Verify that the TARGET_OPSET value that will be used in the conversion process matches the current Wallaroo model upload requirements.

# set the number of columns
ncols = 18

# derive the opset value

# from onnx.defs import onnx_opset_version
#from onnxconverter_common.onnx_ex import DEFAULT_OPSET_NUMBER
#TARGET_OPSET = min(DEFAULT_OPSET_NUMBER, onnx_opset_version())

TARGET_OPSET = 15
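
The commented-out lines above derive the opset as the smaller of the converter's default and the opset of the installed onnx package. A minimal sketch of the same idea, extended with a cap for the highest opset the target platform accepts, is shown below. The function pick_target_opset and its parameter names are hypothetical, and the numbers passed in are placeholder values rather than values read from any library; 15 is used as the cap only because it is the value this tutorial sets.

```python
def pick_target_opset(converter_default, installed_onnx_max, platform_max):
    """Return the highest opset that all three constraints allow."""
    return min(converter_default, installed_onnx_max, platform_max)

# In a real environment the first two values would come from
# DEFAULT_OPSET_NUMBER and onnx_opset_version(), as in the commented lines above.
# The numbers here are placeholders for illustration.
TARGET_OPSET = pick_target_opset(
    converter_default=18,
    installed_onnx_max=17,
    platform_max=15,
)
print(TARGET_OPSET)  # 15
```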

Load the XGBoost Model

Next we will load our model that has been saved in the pickle format and unpickle it.

# load the xgboost model
with open("housing_model_xgb.pkl", "rb") as f:
    xgboost_model = pickle.load(f)

Conversion Inputs

The convert_xgboost method has the following format and requires the following inputs:

convert_xgboost({XGBoost Model}, 
                {XGBoost Model Type},
                [
                    ('input', 
                    {Tensor Data Type}([None, {ncols}]))
                ],
                target_opset={TARGET_OPSET})

  1. XGBoost Model: The XGBoost Model to convert.
  2. XGBoost Model Type: The type of XGBoost model. In this example it is a tree-based classifier.
  3. Tensor Data Type: Either FloatTensorType or DoubleTensorType from the skl2onnx.common.data_types library.
  4. ncols: Number of columns in the model.
  5. TARGET_OPSET: The target opset, which can be derived with the commented-out code in the Set Variables step.

Convert the Model

With all of our data in place we can now convert our XGBoost model to ONNX using the convert_xgboost method.

onnx_model_converted = convert_xgboost(xgboost_model, 'tree-based classifier',
                             [('input', FloatTensorType([None, ncols]))],
                             target_opset=TARGET_OPSET)

Save the Model

With the model converted to ONNX, we can now save it and use it in a Wallaroo pipeline.

onnx.save_model(onnx_model_converted, "housing_model_xgb.onnx")