Wallaroo pipelines support the ONNX standard and models converted using the Model autoconversion method.
These sample guides and their machine learning models can be downloaded from the Wallaroo Tutorials Repository.
This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.
Machine Learning (ML) models can be converted into a Wallaroo Model object and uploaded into a Wallaroo workspace using the Wallaroo Client convert_model(path, source_type, conversion_arguments) method. This conversion process transforms the model into an open format that can be run across different frameworks at compiled C-language speeds.
The following tutorial is a brief example of how to convert a Keras or TensorFlow ML model to ONNX. This allows organizations that have trained Keras or TensorFlow models to convert them and use them with Wallaroo.
This tutorial assumes that you have a Wallaroo instance and are running this Notebook from the Wallaroo Jupyter Hub service.
This tutorial demonstrates how to convert a keras ML model and upload it into the Wallaroo engine.
This tutorial provides the following:
simple_sentiment_model.zip: A pre-trained keras sentiment model to be converted. This model has 100 columns.
To use the Wallaroo autoconverter, call the convert_model(path, source_type, conversion_arguments) method, which takes 3 parameters. The parameters for keras conversions are:
path (STRING): The path to the ML model file.
source_type (ModelConversionSource): The type of ML model to be converted. As of this time Wallaroo auto-conversion supports the following source types and their associated ModelConversionSource:
ModelConversionSource.SKLEARN
ModelConversionSource.XGBOOST
ModelConversionSource.KERAS
conversion_arguments: The arguments for the conversion based on the type of model being converted. These are:
wallaroo.ModelConversion.ConvertKerasArguments: Used for converting keras type models and takes the following parameters:
name: The name of the model being converted.
comment: Any comments for the model.
input_type: A tensorflow Dtype called in the format ModelConversionInputType.{type}, where {type} is Float, Double, etc., depending on the model.
dimensions: Corresponds to the keras xtrain in the format List[Union[None, int, float]].
The first step is to import the libraries needed.
import wallaroo
from wallaroo.ModelConversion import ConvertKerasArguments, ModelConversionSource, ModelConversionInputType
from wallaroo.object import EntityNotFoundError
import pandas as pd
# used to display dataframe information without truncating
from IPython.display import display
pd.set_option('display.max_colwidth', None)
The following will set the workspace, pipeline, model name, the model file name used when uploading and converting the keras model, and the sample data.
The function get_workspace(name) will either return the workspace with the requested name, or create it if it does not exist. The function get_pipeline(name) will either return the pipeline with the requested name, or create it in the current workspace if it does not exist.
workspace_name = 'externalkerasautoconvertworkspace'
pipeline_name = 'externalkerasautoconvertpipeline'
model_name = 'externalsimple-sentiment-model'
model_file_name = 'simple_sentiment_model.zip'
sample_data = 'simple_sentiment_testdata.json'
def get_workspace(name):
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace
def get_pipeline(name):
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline
The first step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and available through the Jupyter Hub interface provided with your Wallaroo environment.
This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.
If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). If logging in externally, update the wallarooPrefix and wallarooSuffix variables with the proper DNS information. For more information on Wallaroo DNS settings, see the Wallaroo DNS Integration Guide.
# Client connection from local Wallaroo instance
wl = wallaroo.Client()
Set or create the workspace and pipeline based on the names configured earlier.
workspace = get_workspace(workspace_name)
wl.set_current_workspace(workspace)
pipeline = get_pipeline(pipeline_name)
pipeline
name | externalkerasautoconvertpipeline |
---|---|
created | 2023-05-17 21:13:27.523527+00:00 |
last_updated | 2023-05-17 21:13:27.523527+00:00 |
deployed | (none) |
tags | |
versions | 3948e0dc-d591-4ff5-a48f-b8d17195a806 |
steps |
Set the parameters for converting the simple-sentiment-model. This includes the shape of the model’s input.
model_columns = 100
model_conversion_args = ConvertKerasArguments(
name=model_name,
comment="simple keras model",
input_type=ModelConversionInputType.Float32,
dimensions=(None, model_columns)
)
model_conversion_type = ModelConversionSource.KERAS
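The dimensions value (None, model_columns) describes the input shape: None leaves the batch (row) dimension open, and model_columns fixes each row at 100 values. The following minimal sketch checks a batch of plain Python rows against such a shape before sending it for inference. The helper matches_dimensions is hypothetical and not part of the Wallaroo SDK.

```python
from typing import Optional, Sequence


def matches_dimensions(rows: Sequence[Sequence[float]],
                       dimensions: Sequence[Optional[int]]) -> bool:
    """Return True if a 2-D batch of rows fits a (batch, columns) shape.

    Hypothetical helper for illustration only: a None entry accepts any size,
    mirroring dimensions=(None, model_columns) in the conversion arguments.
    """
    if len(dimensions) != 2:
        raise ValueError("this sketch only handles 2-D (batch, columns) shapes")
    batch_dim, column_dim = dimensions
    # Check the row count unless the batch dimension is left open with None.
    if batch_dim is not None and len(rows) != batch_dim:
        return False
    # Every row must have exactly column_dim values unless that entry is None.
    return all(column_dim is None or len(row) == column_dim for row in rows)


model_columns = 100
batch = [[0.0] * model_columns for _ in range(5)]  # five placeholder rows
print(matches_dimensions(batch, (None, model_columns)))         # True
print(matches_dimensions([[0.0] * 99], (None, model_columns)))  # False
```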
Now we can upload and convert the model. Once finished, it will be stored as {unique-file-id}-converted.onnx.
# converts and uploads model.
model_wl = wl.convert_model(model_file_name, model_conversion_type, model_conversion_args)
model_wl
{'name': 'externalsimple-sentiment-model', 'version': 'c378425b-b70f-465a-a15b-d9e662b15263', 'file_name': '19ec5f96-d3a6-47af-ae6f-928187735de2-converted.onnx', 'image_path': None, 'last_update_time': datetime.datetime(2023, 5, 17, 21, 13, 29, 933149, tzinfo=tzutc())}
With the model uploaded and converted, we can run a sample inference.
We will add the model as a step into our pipeline, then deploy it.
pipeline.add_model_step(model_wl).deploy()
name | externalkerasautoconvertpipeline |
---|---|
created | 2023-05-17 21:13:27.523527+00:00 |
last_updated | 2023-05-17 21:13:30.959401+00:00 |
deployed | True |
tags | |
versions | 7be0dd01-ef82-4335-b60d-6f1cd5287e5b, 3948e0dc-d591-4ff5-a48f-b8d17195a806 |
steps | externalsimple-sentiment-model |
pipeline.status()
{'status': 'Running',
'details': [],
'engines': [{'ip': '10.244.3.139',
'name': 'engine-59fb67fcc6-tns2j',
'status': 'Running',
'reason': None,
'details': [],
'pipeline_statuses': {'pipelines': [{'id': 'externalkerasautoconvertpipeline',
'status': 'Running'}]},
'model_statuses': {'models': [{'name': 'externalsimple-sentiment-model',
'version': 'c378425b-b70f-465a-a15b-d9e662b15263',
'sha': '49f7367eede690b369aef322569c5b54c4133692610a11dc29b14d4c49ea983c',
'status': 'Running'}]}}],
'engine_lbs': [{'ip': '10.244.4.170',
'name': 'engine-lb-584f54c899-fnntk',
'status': 'Running',
'reason': None,
'details': []}],
'sidekicks': []}
We can run a test inference from the simple_sentiment_testdata.df.json file, then display just the results.
sample_data = 'simple_sentiment_testdata.df.json'
result = pipeline.infer_from_file(sample_data)
display(result["out.dense"])
0 [0.094697624]
1 [0.991031]
2 [0.93407357]
3 [0.56030995]
4 [0.9964503]
Name: out.dense, dtype: object
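The out.dense column holds one score per input row, wrapped in a one-element list. If the score is read as the probability of positive sentiment (an assumption; the tutorial does not define the output), the results can be turned into labels with a cutoff. The sketch below uses the five scores shown above, a hypothetical to_labels helper, and an assumed cutoff of 0.5.

```python
# Scores copied from the out.dense output above; each entry is a one-element list.
scores = [[0.094697624], [0.991031], [0.93407357], [0.56030995], [0.9964503]]


def to_labels(rows, threshold=0.5):
    """Flatten one-element score lists and map each score to a label.

    Assumption (not stated in the tutorial): a score at or above the
    threshold means positive sentiment.
    """
    return ["positive" if row[0] >= threshold else "negative" for row in rows]


print(to_labels(scores))
# ['negative', 'positive', 'positive', 'positive', 'positive']
```

In the notebook, the same helper could be applied to result["out.dense"].tolist().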
With the tests complete, we will undeploy the pipeline to return the resources back to the Wallaroo instance.
pipeline.undeploy()
name | externalkerasautoconvertpipeline |
---|---|
created | 2023-05-17 21:13:27.523527+00:00 |
last_updated | 2023-05-17 21:13:30.959401+00:00 |
deployed | False |
tags | |
versions | 7be0dd01-ef82-4335-b60d-6f1cd5287e5b, 3948e0dc-d591-4ff5-a48f-b8d17195a806 |
steps | externalsimple-sentiment-model |
This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.
The following tutorial is a brief example of how to convert a PyTorch ML model to ONNX. This allows organizations that have trained PyTorch models to convert them and use them with Wallaroo.
This tutorial assumes that you have a Wallaroo instance and are running this Notebook from the Wallaroo Jupyter Hub service. This sample code is based on the guide Convert your PyTorch model to ONNX.
This tutorial provides the following:
pytorchbikeshare.pt: a PyTorch regression model. This model has a total of 58 inputs, and uses the class BikeShareRegressor.
This tutorial uses the following Python libraries:
os
torch
The first step is to import the libraries we will be using. For this example, the PyTorch torch library will be imported into this kernel.
# the PyTorch libraries
# Import into this kernel
import torch
import torch.onnx
# nn is used by the BikeShareRegressor class definition
from torch import nn
To load a PyTorch model into a variable, the model’s class has to be defined. For our example we are using the BikeShareRegressor class as defined below.
class BikeShareRegressor(torch.nn.Module):
    # The layer sizes and dropout rate must match the values used in training.
    # torch.load() restores a saved instance without calling __init__.
    def __init__(self, input_size, l1, l2, dropout, output_size):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(input_size, l1),
                                 nn.ReLU(),
                                 nn.Dropout(p=dropout),
                                 nn.BatchNorm1d(l1),
                                 nn.Linear(l1, l2),
                                 nn.ReLU(),
                                 nn.Dropout(p=dropout),
                                 nn.BatchNorm1d(l2),
                                 nn.Linear(l2, output_size))
    def forward(self, x):
        return self.net(x)
Now we will load the model into the variable model.
# load the Pytorch model
model = torch.load("./pytorch_bikesharingmodel.pt")
Now we will define our method Convert_ONNX(), which calls torch.onnx.export with the following arguments:
PyTorchModel: The PyTorch model we are converting.
modelInputs: The model input, or a tuple for multiple inputs.
onnxPath: The location to save the ONNX file.
opset_version: The ONNX version to export to.
input_names: Array of the model’s input names.
output_names: Array of the model’s output names.
dynamic_axes: Sets the variable-length axes in the following format, replacing batch_size as necessary:
{'modelInput' : { 0 : 'batch_size'}, 'modelOutput' : {0 : 'batch_size'}}
export_params: Whether to store the trained parameter weights inside the model file. Defaults to True.
do_constant_folding: Whether to execute constant folding for optimization. Defaults to True.
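The dynamic_axes value is a plain dictionary keyed by the names passed in input_names and output_names. The sketch below builds the same mapping shown above from lists of names, which can help when a model has several inputs or outputs. make_dynamic_axes is a hypothetical helper, not part of torch.onnx.

```python
def make_dynamic_axes(input_names, output_names, axis=0, label="batch_size"):
    """Build a dynamic_axes mapping that marks one axis of every named input
    and output as variable length (by default axis 0, the batch axis)."""
    return {name: {axis: label} for name in list(input_names) + list(output_names)}


axes = make_dynamic_axes(["modelInput"], ["modelOutput"])
print(axes)
# {'modelInput': {0: 'batch_size'}, 'modelOutput': {0: 'batch_size'}}
```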
# Function to convert to ONNX
# Uses the globals model, dummy_input, pypath and onnx_opset_version set below
def Convert_ONNX():
    # set the model to inference mode
    model.eval()
    # Export the model
    torch.onnx.export(model,                     # model being run
                      dummy_input,               # model input (or a tuple for multiple inputs)
                      pypath,                    # where to save the model
                      export_params=True,        # store the trained parameter weights inside the model file
                      opset_version=onnx_opset_version,  # the ONNX version to export the model to
                      do_constant_folding=True,  # whether to execute constant folding for optimization
                      input_names=['modelInput'],    # the model's input names
                      output_names=['modelOutput'],  # the model's output names
                      dynamic_axes={'modelInput': {0: 'batch_size'}, 'modelOutput': {0: 'batch_size'}})  # variable length axes
    print(" ")
    print('Model has been converted to ONNX')
We’ll now set our variables and run our conversion. For our example, the input_size is known to be 58, and the device value is derived from torch.cuda.is_available(). We’ll also set the ONNX version for exporting to 15.
pypath = "pytorchbikeshare.onnx"
input_size = 58
if torch.cuda.is_available():
    device = 'cuda'
else:
    device = 'cpu'
onnx_opset_version = 15
# keep the model and the dummy input on the same device
model.to(device)
# Set up some dummy input tensor for the model
dummy_input = torch.randn(1, input_size, requires_grad=True).to(device)
Convert_ONNX()
================ Diagnostic Run torch.onnx.export version 2.0.0 ================
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================
Model has been converted to ONNX
And now our conversion is complete. Please feel free to use this sample code in your own projects.
Machine Learning (ML) models can be converted into a Wallaroo Model object and uploaded into a Wallaroo workspace using the Wallaroo Client convert_model(path, source_type, conversion_arguments) method. This conversion process transforms the model into an open format that can be run across different frameworks at compiled C-language speeds.
The three input parameters are:
path (STRING): The path to the ML model file.
source_type (ModelConversionSource): The type of ML model to be converted. As of this time Wallaroo auto-conversion supports the following source types and their associated ModelConversionSource:
ModelConversionSource.SKLEARN
ModelConversionSource.XGBOOST
conversion_arguments: The arguments for the conversion:
name: The name of the model being converted.
comment: Any comments for the model.
number_of_columns: The number of columns the model was trained for.
input_type: The ModelConversionInputType, typically Float or Double depending on the model.
The following tutorial demonstrates how to convert a sklearn Linear Model and an XGBoost Regression Model, and upload them into a Wallaroo Workspace. The following is provided for the tutorial:
sklearn-linear-model.pickle: A sklearn linear model. An example of training the model is provided in the Jupyter Notebook sklearn-linear-model-example.ipynb. It has 25 columns.
xgb_reg.pickle: An XGBoost regression model. An example of training the model is provided in the Jupyter Notebook xgboost-regression-model-example.ipynb. It has 25 columns.
This tutorial uses the following Python libraries:
os
wallaroo: The Wallaroo SDK. Included with the Wallaroo JupyterHub service by default.
scikit-learn: Version 1.1.1
xgboost: Version 1.6.2
pickle
Import the libraries that will be used for the auto-conversion process.
import pickle
import json
import wallaroo
from wallaroo.ModelConversion import ConvertSKLearnArguments, ConvertXGBoostArgs, ModelConversionSource, ModelConversionInputType
from wallaroo.object import EntityNotFoundError
# Verify the version of XGBoost used to generate the models
import sklearn
import sklearn.datasets
import xgboost as xgb
print(xgb.__version__)
print(sklearn.__version__)
1.6.2
1.1.2
The following code is used to either connect to an existing workspace or to create a new one. For more details on working with workspaces, see the Wallaroo Workspace Management Guide.
The first step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and available through the Jupyter Hub interface provided with your Wallaroo environment.
This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.
If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). If logging in externally, update the wallarooPrefix and wallarooSuffix variables with the proper DNS information. For more information on Wallaroo DNS settings, see the Wallaroo DNS Integration Guide.
# Client connection from local Wallaroo instance
wl = wallaroo.Client()
We’ll connect to or create the workspace testautoconversion and use it for our model testing.
workspace_name = 'testautoconversion'
def get_workspace(name):
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace
workspace = get_workspace(workspace_name)
wl.set_current_workspace(workspace)
wl.get_current_workspace()
{'name': 'testautoconversion', 'id': 11, 'archived': False, 'created_by': '028c8b48-c39b-4578-9110-0b5bdd3824da', 'created_at': '2023-05-17T21:11:40.856672+00:00', 'models': [], 'pipelines': []}
We’ll create two different configurations, one for each of our models:
sklearn_model_conversion_args: Used for our sklearn model.
xgboost_model_conversion_args: Used for our XGBoost model.
# The number of columns
NF=25
sklearn_model_conversion_args = ConvertSKLearnArguments(
name="sklearntest",
comment="test linear regression",
number_of_columns=NF,
input_type=ModelConversionInputType.Double
)
sklearn_model_conversion_type = ModelConversionSource.SKLEARN
xgboost_model_conversion_args = ConvertXGBoostArgs(
name="xgbtestreg",
comment="xgboost regression model test",
number_of_columns=NF,
input_type=ModelConversionInputType.Float32
)
xgboost_model_conversion_type = ModelConversionSource.XGBOOST
The convert_model method converts the model using the arguments, and uploads it into the current workspace, in this case testautoconversion. Once complete, we can run get_current_workspace to verify that the models were uploaded.
# converts and uploads the sklearn model.
wl.convert_model('sklearn-linear-model.pickle', sklearn_model_conversion_type, sklearn_model_conversion_args)
# converts and uploads the XGBoost model.
wl.convert_model('xgb_reg.pickle', xgboost_model_conversion_type, xgboost_model_conversion_args)
{'name': 'xgbtestreg', 'version': '07b1ff68-0a7b-4687-b64f-76ddc995e7c6', 'file_name': '957178f8-fcd1-496d-b985-29beb89ba46e-converted.onnx', 'image_path': None, 'last_update_time': datetime.datetime(2023, 5, 17, 21, 11, 43, 585229, tzinfo=tzutc())}
wl.get_current_workspace()
{'name': 'testautoconversion', 'id': 11, 'archived': False, 'created_by': '028c8b48-c39b-4578-9110-0b5bdd3824da', 'created_at': '2023-05-17T21:11:40.856672+00:00', 'models': [{'name': 'sklearntest', 'versions': 1, 'owner_id': '""', 'last_update_time': datetime.datetime(2023, 5, 17, 21, 11, 42, 463507, tzinfo=tzutc()), 'created_at': datetime.datetime(2023, 5, 17, 21, 11, 42, 463507, tzinfo=tzutc())}, {'name': 'xgbtestreg', 'versions': 1, 'owner_id': '""', 'last_update_time': datetime.datetime(2023, 5, 17, 21, 11, 43, 585229, tzinfo=tzutc()), 'created_at': datetime.datetime(2023, 5, 17, 21, 11, 43, 585229, tzinfo=tzutc())}], 'pipelines': []}
This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.
The following tutorial is a brief example of how to convert a scikit-learn (aka sk-learn) regression ML model to the ONNX format.
This tutorial assumes that you have a Wallaroo instance and are running this Notebook from the Wallaroo Jupyter Hub service.
This tutorial provides the following:
demand_curve.pickle: a demand curve trained sk-learn model. Once this file is converted to ONNX format, it can be used as part of the Demand Curve Pipeline Tutorial.
This model contains 3 columns: UnitPrice, cust_known, and UnitPriceXcust_known.
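The column names suggest that UnitPriceXcust_known is an interaction term built from the other two columns. Assuming it is their product and that cust_known is a 0/1 flag (neither is stated in this tutorial), input rows for the converted model could be assembled in the documented column order as in the following sketch. demand_curve_row is a hypothetical helper.

```python
def demand_curve_row(unit_price: float, cust_known: int) -> list:
    """Build one input row in the documented column order:
    UnitPrice, cust_known, UnitPriceXcust_known.

    Assumption: UnitPriceXcust_known is the product of the other two columns
    and cust_known is a 0/1 flag; confirm against the training notebook.
    """
    price = float(unit_price)
    known = float(cust_known)
    return [price, known, price * known]


rows = [demand_curve_row(10.0, 1), demand_curve_row(10.0, 0)]
print(rows)  # [[10.0, 1.0, 10.0], [10.0, 0.0, 0.0]]
```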
This tutorial uses the following Python libraries:
pickle
skl2onnx
onnxmltools
onnx
warnings
The first step is to import the libraries we will be using.
# Used to load the sk-learn model
import pickle
# Used for the conversion process
import onnx, skl2onnx, onnxmltools
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.data_types import DoubleTensorType
# ignoring warnings for demonstration
import warnings
warnings.filterwarnings('ignore')
Next let’s define our model_to_onnx method for converting a sk-learn model to ONNX. This has the following inputs:
model: The sk-learn model we’re converting.
cols: The number of inputs the model expects.
input_type: Determines how to manage float values: 'Double' selects DoubleTensorType and 'Float' selects FloatTensorType. Defaults to 'Double'.
# convert model to ONNX
def model_to_onnx(model, cols, *, input_type='Double'):
    input_type_lower = input_type.lower()
    # How to manage float values
    if input_type == 'Double':
        tensor_type = DoubleTensorType
    elif input_type == 'Float':
        tensor_type = FloatTensorType
    else:
        raise ValueError("bad input type")
    tensor_size = cols
    initial_type = [(f'{input_type_lower}_input', tensor_type([None, tensor_size]))]
    onnx_model = onnxmltools.convert_sklearn(model, initial_types=initial_type)
    return onnx_model
With our method defined, now it’s time to convert. Let’s load our sk-learn model and save it into the variable sklearn_model.
# load (unpickle) the sk-learn model
with open('./demand_curve.pickle', 'rb') as f:
    sklearn_model = pickle.load(f)
Now we’ll convert our sklearn_model into the variable onnx_model_converted using our model_to_onnx method. Recall that our sklearn_model has 3 columns.
onnx_model_converted = model_to_onnx(sklearn_model, 3)
Now we can save our model to an onnx file.
onnx.save_model(onnx_model_converted, "demand_curve.onnx")
This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.
Organizations can deploy a Machine Learning (ML) model based on the statsmodels library directly into Wallaroo through the following process. This conversion process transforms the model into an open format that can be run across different frameworks at compiled C-language speeds.
This example provides the following:
train-statsmodel.ipynb: A sample Jupyter Notebook that trains a sample model. The model predicts how many bikes will be rented on each of the next 7 days, based on the previous 7 days’ bike rentals, temperature, and wind speed. Additional files to support this example are:
day.csv: Data used to train the sample statsmodel example.
infer.py: The inference script that is part of the statsmodel.
convert-statsmodel-tutorial.ipynb: A sample Jupyter Notebook that demonstrates how to upload, convert, and deploy the statsmodel example into a Wallaroo instance. Additional files to support this example are:
bike_day_model.pkl: A statsmodel ML model trained from the train-statsmodel.ipynb Notebook.
IMPORTANT NOTE: The statsmodel ML model is composed of two parts that are contained in the .pkl file. The pickled Python runtime expects a dictionary with two keys, model and script:
model: the pickled model, which will be automatically loaded into the Python runtime with the name ‘model’.
script: the text of the Python script to be run, in a format similar to the existing Python script steps (i.e. defining a wallaroo_json method which operates on the data). In this case, the file infer.py is the script used.
bike_day_eval.json: Evaluation data used to test the model’s performance.
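The two-key layout described above can be sketched with the standard pickle module. This is a minimal illustration under stated assumptions: a plain dictionary stands in for the trained statsmodels model, the model object itself (rather than pre-pickled bytes) is stored under model, and the script text is a hypothetical placeholder for infer.py, including the wallaroo_json signature. build_package is a hypothetical helper.

```python
import pickle

# Hypothetical placeholder for the contents of infer.py; the real script and
# the exact wallaroo_json signature it uses are not shown in this tutorial.
SCRIPT = '''
def wallaroo_json(data):
    # operate on the incoming data using the object loaded as `model`
    ...
'''


def build_package(model, script_text: str) -> bytes:
    """Pickle a dictionary with the two keys the Python runtime expects."""
    return pickle.dumps({"model": model, "script": script_text})


# A plain dict stands in for the trained statsmodels model in this sketch.
package_bytes = build_package({"placeholder": True}, SCRIPT)
package = pickle.loads(package_bytes)
print(sorted(package))  # ['model', 'script']
```

To produce a .pkl file, write package_bytes to a file opened in binary mode ("wb").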
The following steps will upload the statsmodel ML model bike_day_model.pkl into a Wallaroo instance, deploy it in a pipeline, and run a sample inference.
The first step is to import the libraries that we will need.
import json
import os
import datetime
import wallaroo
from wallaroo.object import EntityNotFoundError
# used to display dataframe information without truncating
from IPython.display import display
import pandas as pd
pd.set_option('display.max_colwidth', None)
The first step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and available through the Jupyter Hub interface provided with your Wallaroo environment.
This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.
If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). If logging in externally, update the wallarooPrefix and wallarooSuffix variables with the proper DNS information. For more information on Wallaroo DNS settings, see the Wallaroo DNS Integration Guide.
# Login through local Wallaroo instance
wl = wallaroo.Client()
The following will set the workspace, model name, and pipeline that will be used for this example. If the workspace or pipeline already exist, then they will be assigned for use in this example. If they do not exist, they will be created based on the names listed below.
workspace_name = 'statsmodelworkspace'
pipeline_name = 'statsmodelpipeline'
model_name = 'bikedaymodel'
model_file_name = 'bike_day_model.pkl'
This sample code will create or use the existing workspace statsmodelworkspace as the current workspace.
def get_workspace(name):
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace
def get_pipeline(name):
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline
workspace = get_workspace(workspace_name)
wl.set_current_workspace(workspace)
pipeline = get_pipeline(pipeline_name)
pipeline
name | statsmodelpipeline |
---|---|
created | 2023-05-17 21:19:52.898178+00:00 |
last_updated | 2023-05-17 21:19:52.898178+00:00 |
deployed | (none) |
tags | |
versions | 5456dd2a-3167-4b3c-ad3a-85544292a230 |
steps |
Upload the statsmodel stored in the pickled package bike_day_model.pkl. See the Notebook train-statsmodel.ipynb for more details on creating this package.
Note that this package is configured to use the python runtime.
file_name = "bike_day_model.pkl"
bike_day_model = wl.upload_model(model_name, model_file_name).configure(runtime="python")
We will now add the uploaded model as a step for the pipeline, then deploy it.
pipeline.add_model_step(bike_day_model)
name | statsmodelpipeline |
---|---|
created | 2023-05-17 21:19:52.898178+00:00 |
last_updated | 2023-05-17 21:19:52.898178+00:00 |
deployed | (none) |
tags | |
versions | 5456dd2a-3167-4b3c-ad3a-85544292a230 |
steps |
pipeline.deploy()
name | statsmodelpipeline |
---|---|
created | 2023-05-17 21:19:52.898178+00:00 |
last_updated | 2023-05-17 21:19:55.996411+00:00 |
deployed | True |
tags | |
versions | 4af264e3-f427-4b02-b5ad-4f6690b0ee06, 5456dd2a-3167-4b3c-ad3a-85544292a230 |
steps | bikedaymodel |
pipeline.status()
{'status': 'Running',
'details': [],
'engines': [{'ip': '10.244.3.141',
'name': 'engine-c77f759f7-f7fxd',
'status': 'Running',
'reason': None,
'details': [],
'pipeline_statuses': {'pipelines': [{'id': 'statsmodelpipeline',
'status': 'Running'}]},
'model_statuses': {'models': [{'name': 'bikedaymodel',
'version': '66bf61d5-d144-4f77-82f1-58dabf2bbc33',
'sha': '09b50a8e6a5cff566598dae6fb94f5d7d35c94e278373251cd8b1fd9a000c0a7',
'status': 'Running'}]}}],
'engine_lbs': [{'ip': '10.244.4.172',
'name': 'engine-lb-584f54c899-67jbk',
'status': 'Running',
'reason': None,
'details': []}],
'sidekicks': []}
Perform an inference from the evaluation data JSON file bike_day_eval.json.
results = pipeline.infer_from_file('bike_day_eval.json', data_format="custom-json")
display(results)
[{'forecast': [1882.3784555157672,
2130.607915701861,
2340.84005381799,
2895.754978552066,
2163.657515565616,
1509.1792126509536,
2431.1838923957016]}]
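The model predicts bike rentals for each of the next 7 days, so the forecast list above holds seven values. The following sketch pairs each value with its day offset and rounds it to whole bikes. daily_forecast is a hypothetical helper, and it reads only the first record, which is the only one in this output.

```python
# Result copied from the inference output above: a list of dicts whose
# "forecast" key holds seven daily predictions.
results = [{'forecast': [1882.3784555157672, 2130.607915701861, 2340.84005381799,
                         2895.754978552066, 2163.657515565616, 1509.1792126509536,
                         2431.1838923957016]}]


def daily_forecast(results):
    """Map day offset (1 = next day) to the forecast rounded to whole bikes,
    using the first record of the inference result."""
    forecast = results[0]["forecast"]
    return {day: round(value) for day, value in enumerate(forecast, start=1)}


print(daily_forecast(results))
# {1: 1882, 2: 2131, 3: 2341, 4: 2896, 5: 2164, 6: 1509, 7: 2431}
```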
Undeploy the pipeline and return the resources back to the Wallaroo instance.
pipeline.undeploy()
name | statsmodelpipeline |
---|---|
created | 2023-05-17 21:19:52.898178+00:00 |
last_updated | 2023-05-17 21:19:55.996411+00:00 |
deployed | False |
tags | |
versions | 4af264e3-f427-4b02-b5ad-4f6690b0ee06, 5456dd2a-3167-4b3c-ad3a-85544292a230 |
steps | bikedaymodel |
This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.
The following tutorial is a brief example of how to convert an XGBoost Classification ML model with the convert_model method and upload it into your Wallaroo instance.
This tutorial assumes that you have a Wallaroo instance and are running this Notebook from the Wallaroo Jupyter Hub service.
This tutorial demonstrates how to convert an XGBoost Classification ML model and upload it into the Wallaroo engine.
This tutorial provides the following:
xgb_class.pickle: A pretrained XGBoost Classification model with 25 columns.
xgb_class_eval.json: Test data to perform a sample inference.
Wallaroo supports the following model versions:
To use the Wallaroo autoconverter, call the convert_model(path, source_type, conversion_arguments) method, which takes 3 parameters. The parameters for XGBoost conversions are:
path (STRING): The path to the ML model file.
source_type (ModelConversionSource): The type of ML model to be converted. As of this time Wallaroo auto-conversion supports the following source types and their associated ModelConversionSource:
ModelConversionSource.SKLEARN
ModelConversionSource.XGBOOST
ModelConversionSource.KERAS
conversion_arguments: The arguments for the conversion based on the type of model being converted. These are:
wallaroo.ModelConversion.ConvertXGBoostArgs: Used for XGBoost models and takes the following parameters:
name: The name of the model being converted.
comment: Any comments for the model.
number_of_columns: The number of columns the model was trained for.
input_type: A tensorflow Dtype called in the format ModelConversionInputType.{type}, where {type} is Float, Double, etc., depending on the model.
The first step is to import the libraries needed.
import wallaroo
from wallaroo.ModelConversion import ConvertXGBoostArgs, ModelConversionSource, ModelConversionInputType
from wallaroo.object import EntityNotFoundError
# used to display dataframe information without truncating
from IPython.display import display
import pandas as pd
pd.set_option('display.max_colwidth', None)
The first step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and available through the Jupyter Hub interface provided with your Wallaroo environment.
This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.
If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). If logging in externally, update the wallarooPrefix and wallarooSuffix variables with the proper DNS information. For more information on Wallaroo DNS settings, see the Wallaroo DNS Integration Guide.
# Login through local Wallaroo instance
wl = wallaroo.Client()
The following will set the workspace, pipeline, model name, and the model file name used when uploading and converting the XGBoost model.
The function get_workspace(name) will either return the workspace with the requested name, or create it if it does not exist. The function get_pipeline(name) will either return the pipeline with the requested name, or create it in the current workspace if it does not exist.
workspace_name = 'xgboost-classification-autoconvert-workspace'
pipeline_name = 'xgboost-classification-autoconvert-pipeline'
model_name = 'xgb-class-model'
model_file_name = 'xgb_class.pickle'
def get_workspace(name):
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace
def get_pipeline(name):
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline
Set or create the workspace and pipeline based on the names configured earlier.
workspace = get_workspace(workspace_name)
wl.set_current_workspace(workspace)
pipeline = get_pipeline(pipeline_name)
pipeline
name | xgboost-classification-autoconvert-pipeline |
---|---|
created | 2023-05-17 21:21:19.962450+00:00 |
last_updated | 2023-05-17 21:21:19.962450+00:00 |
deployed | (none) |
tags | |
versions | bbe4dce4-f62a-4f4f-a45c-aebbfce23304 |
steps |
Set the parameters for converting the xgb-class-model.
#the number of columns
NF = 25
model_conversion_args = ConvertXGBoostArgs(
name=model_name,
comment="xgboost classification model test",
number_of_columns=NF,
input_type=ModelConversionInputType.Float32
)
model_conversion_type = ModelConversionSource.XGBOOST
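Setting number_of_columns=NF tells the converter that each input row has 25 values, so evaluation data must match. The inference output later in this tutorial shows the input in an in.tensor column. Assuming the .df.json file is a pandas DataFrame saved in records orientation with a tensor column (inferred from that output, not from documented format), a compatible file could be written with the standard library as sketched below. The file name xgb_sample.df.json and the helper write_tensor_records are hypothetical.

```python
import json
import random

NF = 25  # number of columns the model was converted for


def write_tensor_records(rows, path):
    """Write rows as [{"tensor": [...]}, ...] after checking each row has NF values."""
    for i, row in enumerate(rows):
        if len(row) != NF:
            raise ValueError(f"row {i} has {len(row)} values, expected {NF}")
    with open(path, "w") as f:
        json.dump([{"tensor": list(row)} for row in rows], f)


random.seed(0)
rows = [[random.uniform(-3, 3) for _ in range(NF)] for _ in range(2)]
write_tensor_records(rows, "xgb_sample.df.json")  # hypothetical file name
```

Validating every row before opening the file means a malformed batch raises an error without leaving a partial file behind.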
Now we can upload and convert the model. Once finished, it will be stored as {unique-file-id}-converted.onnx.
# convert and upload
model_wl = wl.convert_model(model_file_name, model_conversion_type, model_conversion_args)
With the model uploaded and converted, we can run a sample inference.
Add the uploaded and converted model_wl as a step in the pipeline, then deploy it.
pipeline.add_model_step(model_wl).deploy()
name | xgboost-classification-autoconvert-pipeline |
---|---|
created | 2023-05-17 21:21:19.962450+00:00 |
last_updated | 2023-05-17 21:21:22.906665+00:00 |
deployed | True |
tags | |
versions | 5f7bb0cc-f60d-4cee-8425-c5e85331ae2f, bbe4dce4-f62a-4f4f-a45c-aebbfce23304 |
steps | xgb-class-model |
Use the evaluation data to verify the process completed successfully.
sample_data = 'xgb_class_eval.df.json'
result = pipeline.infer_from_file(sample_data)
display(result)
index | time | in.tensor | out.label | out.probabilities | check_failures |
---|---|---|---|---|---|
0 | 2023-05-17 21:21:34.273 | [-0.9650039837, 1.7162569382, 1.8570196174, 0.7225873636, 1.4614264692, 1.9567455469, 3.1280554236, 2.4737274835, 2.045634687, 0.0697759683, -0.7334890238, 1.4661397464, -1.7339080123, -0.3295498275, -0.5405674404, 0.9325072938, -0.1753815275, 0.8389569878, 0.2995238298, 2.020354449, 0.307715435, -0.786562628, 1.6198295619, -3.1550540615, 2.4493095715] | [1] | [0.45164853, 0.54835147] | 0 |
1 | 2023-05-17 21:21:34.273 | [-0.24290676, -2.7621478465, -1.0460044448, -0.4367771067, 0.7114974086, 3.1152360132, 0.8780655791, 1.5959052391, 0.1291853603, -0.4705432269, -0.2870965835, 0.2758634598, -2.5296629025, -0.8581708475, -0.0447250952, -0.8147113092, 0.3394927614, 0.1165005518, 0.5214230106, 1.0323965467, 0.824008803, -0.2602068525, -2.5164397098, -2.2480625668, 0.7147467132] | [0] | [0.76527536, 0.23472464] | 0 |
2 | 2023-05-17 21:21:34.273 | [0.3261925153, -1.1340263025, -0.0210165684, -0.402436985, 0.1136841647, 1.9756910921, -1.6567823116, -3.0377564302, 1.0839562248, 1.535350752, -1.5641493986, -0.4037836272, -0.0502258358, -1.383033319, -2.1692714889, 0.5474654104, 0.5884733316, -0.6575750129, -0.4456088906, 1.9450809267, -0.5395060067, 0.0020371202, -2.0035740797, 5.3368805176, -1.3683109303] | [1] | [0.011478066, 0.98852193] | 0 |
3 | 2023-05-17 21:21:34.273 | [0.7071268106, -1.1177500788, 0.1311702635, -0.0342823916, 1.4166474292, -0.7600812269, -1.643252821, 1.1809622308, 1.1552655664, -1.4616319423, -1.3196760448, -0.3871231717, -1.0052010294, 0.3757483273, 0.8164121104, 0.6636194102, 0.2054206669, 0.3971757239, 1.0712736575, 0.5687901164, 0.545534547, -0.4022272078, 0.5202183853, -1.1450692638, -1.6687803276] | [1] | [0.40538806, 0.59461194] | 0 |
4 | 2023-05-17 21:21:34.273 | [-0.4753271684, 0.9648567582, 4.1002801029, -0.3474129796, 0.5912316716, -0.3616544697, -2.9339075495, 0.8583809009, -0.7625328481, -1.447786717, -0.0183969915, -0.1028844583, -1.9931308252, -0.6141588978, 1.5368353642, -0.5482829279, 2.1576770706, 0.4772412627, 0.9956210462, 1.7124754134, -0.7415852899, -0.3876944367, 5.7178008466, 7.1237030134, 0.1815704771] | [1] | [0.0016139746, 0.998386] | 0 |
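For this binary classifier, out.label lines up with the class holding the larger value in out.probabilities. The sketch below reproduces the labels from the probabilities in the table above. It illustrates how to read the output; it does not describe how the converted model computes out.label internally.

```python
# out.probabilities values copied from the inference results above.
probabilities = [
    [0.45164853, 0.54835147],
    [0.76527536, 0.23472464],
    [0.011478066, 0.98852193],
    [0.40538806, 0.59461194],
    [0.0016139746, 0.998386],
]


def predicted_label(probs):
    """Index of the most probable class (argmax)."""
    return max(range(len(probs)), key=lambda i: probs[i])


labels = [predicted_label(p) for p in probabilities]
print(labels)  # [1, 0, 1, 1, 1]

# Each row is a probability distribution over the two classes.
assert all(abs(sum(p) - 1.0) < 1e-6 for p in probabilities)
```

The derived labels match the out.label column for all five rows.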
With the tests complete, we will undeploy the pipeline to return the resources to the Wallaroo instance.
pipeline.undeploy()
name | xgboost-classification-autoconvert-pipeline |
---|---|
created | 2023-05-17 21:21:19.962450+00:00 |
last_updated | 2023-05-17 21:21:22.906665+00:00 |
deployed | False |
tags | |
versions | 5f7bb0cc-f60d-4cee-8425-c5e85331ae2f, bbe4dce4-f62a-4f4f-a45c-aebbfce23304 |
steps | xgb-class-model |
This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.
The following tutorial is a brief example of how to convert an XGBoost Regression ML model with the convert_model method and upload it into your Wallaroo instance.
This tutorial assumes that you have a Wallaroo instance and are running this Notebook from the Wallaroo Jupyter Hub service.
This tutorial demonstrates how to convert an XGBoost Regression ML model and upload it into the Wallaroo engine.
This tutorial provides the following:
xgb_reg.pickle: A pretrained XGBoost Regression model with 25 columns.
xgb_regression_eval.json: Test data to perform a sample inference.
The Wallaroo autoconverter method convert_model(path, source_type, conversion_arguments) takes 3 parameters. The parameters for XGBoost conversions are:
path (STRING): The path to the ML model file.
source_type (ModelConversionSource): The type of ML model to be converted. As of this time Wallaroo auto-conversion supports the following source types and their associated ModelConversionSource:
ModelConversionSource.SKLEARN
ModelConversionSource.XGBOOST
ModelConversionSource.KERAS
conversion_arguments: The arguments for the conversion based on the type of model being converted. These are:
wallaroo.ModelConversion.ConvertXGBoostArgs: Used for XGBoost models and takes the following parameters:
name: The name of the model being converted.
comment: Any comments for the model.
number_of_columns: The number of columns the model was trained for.
input_type: A tensorflow Dtype called in the format ModelConversionInputType.{type}, where {type} is the input data type (such as Float32, as used in this tutorial), depending on the model.
The first step is to import the libraries needed.
import wallaroo
from wallaroo.ModelConversion import ConvertXGBoostArgs, ModelConversionSource, ModelConversionInputType
from wallaroo.object import EntityNotFoundError
# used to display dataframe information without truncating
from IPython.display import display
import pandas as pd
pd.set_option('display.max_colwidth', None)
The next step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and available through the Jupyter Hub interface provided with your Wallaroo environment.
This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.
If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). If logging in externally, update the wallarooPrefix and wallarooSuffix variables with the proper DNS information. For more information on Wallaroo DNS settings, see the Wallaroo DNS Integration Guide.
# Login through local Wallaroo instance
wl = wallaroo.Client()
The following will set the workspace, pipeline, model name, and the model file name used when uploading and converting the XGBoost model.
The function get_workspace(name) returns the workspace with the requested name, or creates it if it does not exist. The function get_pipeline(name) returns the pipeline with the requested name, or creates it in the current workspace if it does not exist.
workspace_name = 'xgboost-regression-autoconvert-workspace'
pipeline_name = 'xgboost-regression-autoconvert-pipeline'
model_name = 'xgb-regression-model'
model_file_name = 'xgb_reg.pickle'
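# Return the workspace with the requested name, creating it if it does not exist.
def get_workspace(name):
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace
# Return the pipeline with the requested name, building it in the current workspace if it does not exist.
def get_pipeline(name):
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline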
Set or create the workspace and pipeline based on the names configured earlier.
workspace = get_workspace(workspace_name)
wl.set_current_workspace(workspace)
pipeline = get_pipeline(pipeline_name)
pipeline
name | xgboost-regression-autoconvert-pipeline |
---|---|
created | 2023-05-17 21:21:56.828449+00:00 |
last_updated | 2023-05-17 21:21:56.828449+00:00 |
deployed | (none) |
tags | |
versions | 324433ae-db9a-4d43-9563-ff76df59953d |
steps |
Set the parameters for converting the xgb-regression-model.
# the number of columns
NF = 25
model_conversion_args = ConvertXGBoostArgs(
    name=model_name,
    comment="xgboost regression model test",
    number_of_columns=NF,
    input_type=ModelConversionInputType.Float32
)
model_conversion_type = ModelConversionSource.XGBOOST
Now we can upload and convert the model. Once finished, it will be stored as {unique-file-id}-converted.onnx.
# convert and upload
model_wl = wl.convert_model(model_file_name, model_conversion_type, model_conversion_args)
With the model uploaded and converted, we can run a sample inference.
Add the uploaded and converted model_wl as a step in the pipeline, then deploy it.
pipeline.add_model_step(model_wl).deploy()
name | xgboost-regression-autoconvert-pipeline |
---|---|
created | 2023-05-17 21:21:56.828449+00:00 |
last_updated | 2023-05-17 21:21:59.912121+00:00 |
deployed | True |
tags | |
versions | f5337089-2756-469a-871a-1cb9e3416847, 324433ae-db9a-4d43-9563-ff76df59953d |
steps | xgb-regression-model |
pipeline.status()
{'status': 'Running',
'details': [],
'engines': [{'ip': '10.244.2.161',
'name': 'engine-5578b4dccb-k4xmk',
'status': 'Running',
'reason': None,
'details': [],
'pipeline_statuses': {'pipelines': [{'id': 'xgboost-regression-autoconvert-pipeline',
'status': 'Running'}]},
'model_statuses': {'models': [{'name': 'xgb-regression-model',
'version': 'da71f6f2-28f9-4e33-be56-bec9ffddd3c8',
'sha': 'b39f4982fa58efe81b5dcc8ee40fe4ef3348d0d53aa76d74fd82332e1aac394b',
'status': 'Running'}]}}],
'engine_lbs': [{'ip': '10.244.4.174',
'name': 'engine-lb-584f54c899-gqsk4',
'status': 'Running',
'reason': None,
'details': []}],
'sidekicks': []}
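Before running the inference, it can help to confirm that every component in the status output reports Running. The helper below is a hypothetical sketch that walks only the keys visible in the pipeline.status() output above (status, engines, model_statuses, engine_lbs); other Wallaroo versions may return a different structure.

```python
def pipeline_is_running(status):
    """True only if the pipeline, every engine, every model, and every engine load balancer report 'Running'."""
    if status.get("status") != "Running":
        return False
    for engine in status.get("engines", []):
        if engine.get("status") != "Running":
            return False
        models = engine.get("model_statuses", {}).get("models", [])
        if any(m.get("status") != "Running" for m in models):
            return False
    return all(lb.get("status") == "Running" for lb in status.get("engine_lbs", []))


# Trimmed copy of the pipeline.status() output shown above.
status = {
    "status": "Running",
    "engines": [{
        "name": "engine-5578b4dccb-k4xmk",
        "status": "Running",
        "model_statuses": {"models": [{"name": "xgb-regression-model", "status": "Running"}]},
    }],
    "engine_lbs": [{"name": "engine-lb-584f54c899-gqsk4", "status": "Running"}],
}
print(pipeline_is_running(status))  # True
```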
Use the evaluation data xgb_regression_eval.df.json as our sample_data and perform the inference.
sample_data = 'xgb_regression_eval.df.json'
result = pipeline.infer_from_file(sample_data)
display(result)
index | time | in.tensor | out.variable | check_failures |
---|---|---|---|---|
0 | 2023-05-17 21:22:11.368 | [-0.0337420814, -0.1876901281, 0.3183056488, 1.1831088244, -0.3047963287, 1.0713634828, 0.4679136198, 1.1382147115, 2.8101110944, -0.9981048796, -0.2543715265, 0.2845195171, -0.6477265924, -1.2198006181, 2.0592129832, -1.586429512, 0.1884164743, -0.3816011585, 1.0781704305, -0.2251253601, 0.6067409459, 0.9659944831, -0.690207203, -0.3849078305, -1.7806555641] | [3.069023] | 0 |
1 | 2023-05-17 21:22:11.368 | [-0.6374335428, 0.9713713274, -0.3899847809, -1.8685333445, 0.6264452739, 1.0778638153, -1.1687273967, -1.9366353171, -0.7583260267, -0.1288186991, 2.2018769654, -0.9383105208, -0.0959982166, 0.6889112707, 1.0172067951, -0.1988865499, 1.3461760224, -0.5692275708, 0.0112450486, -1.0244657911, -0.0065034946, -0.888033574, 2.5997682335, -0.6593191496, 0.4554196997] | [41.130955] | 0 |
2 | 2023-05-17 21:22:11.368 | [0.9847406173, -0.6887896553, -0.9483266359, -0.6146245598, 0.395195321, 0.2237676197, -2.1580851068, -0.8124396117, 0.8795326949, 1.0463472648, -0.2343060791, 1.9127900859, -0.0636431887, 2.7055743269, 1.424242505, 0.1486958646, -0.7771892138, -0.6720552548, 0.9127712446, 0.680721406, 1.5207886874, 1.9579334337, -0.9336538468, -0.2942243461, 0.8563934417] | [27.114595] | 0 |
3 | 2023-05-17 21:22:11.368 | [-0.0894312686, 2.0916777545, 0.155086745, 0.8335388277, 0.4376497549, -0.2875695352, -1.272466627, -0.8226918076, -0.8637972417, -0.4856051115, -0.978749107, 0.2675108269, 0.5246808262, -0.96869578, 0.8475004997, 1.0027495438, 0.4704188579, 2.6906210825, 1.34454675, -1.4987055653, 0.680752942, -2.6459314502, 0.6274277031, 1.3640818416, -0.8077878088] | [61.69479] | 0 |
4 | 2023-05-17 21:22:11.368 | [-0.9200220805, -1.8760634694, -0.8277296049, 0.6511561005, 1.5066237509, -1.1236118386, -0.3776053288, -0.0445487434, -1.4965713379, -0.1756118518, 0.0317408338, 0.2496108303, 1.6857141605, 0.0339106658, -0.3340227553, -0.3428326984, -0.5932644698, -0.4395685475, -0.6870452688, -0.4132149028, -0.7352879532, 0.2080507404, 0.4575261189, -2.0175947284, 1.154633581] | [-92.68761] | 0 |
With the tests complete, we will undeploy the pipeline to return the resources to the Wallaroo instance.
pipeline.undeploy()
name | xgboost-regression-autoconvert-pipeline |
---|---|
created | 2023-05-17 21:21:56.828449+00:00 |
last_updated | 2023-05-17 21:21:59.912121+00:00 |
deployed | False |
tags | |
versions | f5337089-2756-469a-871a-1cb9e3416847, 324433ae-db9a-4d43-9563-ff76df59953d |
steps | xgb-regression-model |
This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.
The following tutorial is a brief example of how to convert an XGBoost ML model to the ONNX standard. This allows organizations that have trained XGBoost models to convert them and use them with Wallaroo.
This tutorial assumes that you have a Wallaroo instance and are running this Notebook from the Wallaroo Jupyter Hub service.
This tutorial provides the following:
housing_model_xgb.pkl: A pretrained model used as part of the Notebooks in Production tutorial. This model has a total of 18 columns.
The first step is to import the libraries we will be using.
import onnx
import pickle
from onnxmltools.convert import convert_xgboost
from skl2onnx.common.data_types import FloatTensorType, DoubleTensorType
The following variables must be set before the conversion can start:
# set the number of columns
ncols = 18
TARGET_OPSET = 15
Next we will load the model that was saved in the pickle format and unpickle it.
# load the xgboost model
with open("housing_model_xgb.pkl", "rb") as f:
    xgboost_model = pickle.load(f)
The convert_xgboost method has the following format and requires the following inputs:
convert_xgboost({XGBoost Model},
{XGBoost Model Type},
[
('input',
{Tensor Data Type}([None, {ncols}]))
],
target_opset={TARGET_OPSET})
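{XGBoost Model Type}: In this example, tree-based classifier. In onnxmltools, this second positional argument is the name parameter, which sets the name of the generated ONNX graph.
{Tensor Data Type}: Either FloatTensorType or DoubleTensorType from the skl2onnx.common.data_types library.
With all of our data in place we can now convert our XGBoost model to ONNX using the convert_xgboost method.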
onnx_model_converted = convert_xgboost(xgboost_model, 'tree-based classifier',
                                       [('input', FloatTensorType([None, ncols]))],
                                       target_opset=TARGET_OPSET)
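The input declaration ('input', FloatTensorType([None, ncols])) tells the ONNX graph to accept a 2-D float tensor whose first dimension is left unspecified, so any batch size works as long as each row has ncols (18) features. The stdlib sketch below mimics that shape rule. batch_matches_shape is a hypothetical helper for illustration, not part of onnxmltools or skl2onnx, and it does not check element dtype.

```python
# Shape passed to FloatTensorType above: [None, ncols]. None marks a dimension
# of unspecified size (the batch), so any number of rows is accepted, but each
# row must have exactly ncols features.
ncols = 18
input_shape = [None, ncols]


def batch_matches_shape(batch, shape):
    """Check a 2-D list-of-rows batch against a 2-D shape where None means 'any size'."""
    n_rows_expected, n_cols_expected = shape
    if n_rows_expected is not None and len(batch) != n_rows_expected:
        return False
    return all(len(row) == n_cols_expected for row in batch)


print(batch_matches_shape([[0.0] * ncols], input_shape))        # True
print(batch_matches_shape([[0.0] * ncols] * 100, input_shape))  # True
print(batch_matches_shape([[0.0] * (ncols + 1)], input_shape))  # False
```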
With the model converted to ONNX, we can now save it and use it in a Wallaroo pipeline.
onnx.save_model(onnx_model_converted, "housing_model_xgb.onnx")