Wallaroo SDK Essentials Guide: Model Uploads and Registrations: XGBoost

How to upload and use XGBoost ML Models with Wallaroo

Model Naming Requirements

Model names map onto Kubernetes objects and must be DNS compliant. Model names must consist only of lowercase ASCII alphanumeric characters or dashes (-). Periods (.) and underscores (_) are not allowed.
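The naming rule above can be checked locally before an upload is attempted. The following is a minimal sketch: `is_valid_model_name` is a hypothetical helper, not part of the Wallaroo SDK, and it enforces only the character rule stated here. Wallaroo or Kubernetes may apply further limits (for example on length) that this check does not cover.

```python
import re

# Lowercase ASCII letters, digits, and dashes only, per the rule above.
_MODEL_NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def is_valid_model_name(name: str) -> bool:
    """Return True if `name` uses only lowercase alphanumerics or dashes."""
    return _MODEL_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_model_name("xgboost-classification"))  # True
print(is_valid_model_name("xgboost_classification"))  # False (underscore)
print(is_valid_model_name("XGBoost.v1"))              # False (upper case, period)
```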

Wallaroo supports XGBoost models. The following XGBoost models are either packaged into the Wallaroo Native Runtime, or are packaged into the Wallaroo Containerized Runtime.

Parameter	Description
Web Site	https://xgboost.ai/
Supported Libraries	`scikit-learn==1.3.0` `xgboost==1.7.4`
Framework	`Framework.XGBOOST` aka `xgboost`
Supported File Types	`pickle` (XGB files are not supported.)
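Because only pickle files are accepted, a trained model has to be serialized with Python's `pickle` module before upload. The sketch below shows one way to do that. `save_model_as_pickle` is a hypothetical helper, and the commented usage assumes a model trained with the library versions listed in the table above. The names `X_train` and `y_train` and the file path are placeholders.

```python
import pickle
from pathlib import Path


def save_model_as_pickle(model, path: str) -> Path:
    """Serialize a trained model object to a pickle file and return its path."""
    target = Path(path)
    # Create the destination directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("wb") as f:
        pickle.dump(model, f)
    return target


# Hypothetical usage with a trained XGBoost model:
#   from xgboost import XGBClassifier
#   model = XGBClassifier().fit(X_train, y_train)
#   save_model_as_pickle(model, "./models/xgb_classification_iris.pkl")
```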

During the model upload process, Wallaroo optimizes models by converting them to the Wallaroo Native Runtime, if possible, or running the model directly in the Wallaroo Containerized Runtime. See the Model Deploy guide for details on how to configure pipeline resources based on the model’s runtime.

Since the Wallaroo 2024.1 release, XGBoost support is enhanced to performantly support a wider set of XGBoost models. XGBoost models are not required to be trained with ONNX nomenclature in order to successfully convert to a performant runtime.

XGBoost Types Support

The following XGBoost model types are supported by Wallaroo. XGBoost model types that Wallaroo does not package directly can be deployed as a Custom Model, also known as Bring Your Own Predict (BYOP).

XGBoost Model Type	Wallaroo Packaging Supported
XGBClassifier	√
XGBRegressor	√
Booster Classifier	√
Booster Regressor	√
Booster Random Forest Regressor	√
Booster Random Forest Classifier	√
XGBRFClassifier	√
XGBRFRegressor	√
XGBRanker*	X

* XGBRanker XGBoost models are currently supported by converting them to BYOP models.

XGBoost Schema Inputs

XGBoost input schemas follow a different format than those of other models. To prevent inputs from being out of order, submit the inputs as a single row in the order the model was trained to accept, with all values sharing the same data type. If a model was originally trained to accept inputs of different data types, it must be retrained so that every column uses one data type; pa.float64() is typically a good choice.

For example, the following DataFrame has 4 columns, each column a float.

	sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)
0	5.1	3.5	1.4	0.2
1	4.9	3.0	1.4	0.2

For submission to an XGBoost model, the data input schema will be a single array with 4 float values.

input_schema = pa.schema([
    pa.field('inputs', pa.list_(pa.float32(), list_size=4))
])

When submitted for inference, each DataFrame row is converted so that its column values are expressed as a single array. Submitting the data as a single array rather than as JSON-labeled columns ensures that the values arrive in the exact order the model was trained to accept.

Original DataFrame:

	sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)
0	5.1	3.5	1.4	0.2
1	4.9	3.0	1.4	0.2

Converted DataFrame:

	inputs
0	[5.1, 3.5, 1.4, 0.2]
1	[4.9, 3.0, 1.4, 0.2]
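The conversion shown above can be sketched with pandas as follows. `to_inputs_frame` and `FEATURE_ORDER` are hypothetical names introduced for illustration. The column order is taken from the example DataFrame and is assumed to match the order the model was trained on.

```python
import pandas as pd

# Column order the model was trained on (taken from the example above).
FEATURE_ORDER = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]


def to_inputs_frame(df: pd.DataFrame, feature_order: list) -> pd.DataFrame:
    """Collapse the feature columns of each row into one list under 'inputs'."""
    # Select columns explicitly so the order never depends on the source frame.
    ordered = df[feature_order].astype("float64")
    return pd.DataFrame({"inputs": ordered.to_numpy().tolist()}, index=df.index)


original = pd.DataFrame(
    [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]],
    columns=FEATURE_ORDER,
)
converted = to_inputs_frame(original, FEATURE_ORDER)
print(converted)
```

Selecting the columns by name before flattening means a DataFrame whose columns arrive in a different order still produces arrays in the trained order.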

XGBoost Schema Outputs

Outputs for XGBoost are labeled based on the trained model outputs.

Outputs for XGBoost that are meant to be predictions or probabilities must be labeled as part of the output schema. For example, a model that outputs either 1 or 0 would have the following output schema:

output_schema = pa.schema([
    pa.field('predictions', pa.float32()),
])

When used in Wallaroo, the inference result is contained in the out metadata as out.predictions.

pipeline.infer(dataframe)

	time	in.inputs	out.predictions	anomaly.count
0	2023-07-05 15:11:29.776	[5.1, 3.5, 1.4, 0.2]	0	0
1	2023-07-05 15:11:29.776	[4.9, 3.0, 1.4, 0.2]	0	0

Uploading XGBoost Models

XGBoost models are uploaded to Wallaroo through the Wallaroo Client upload_model method.

Upload XGBoost Model Parameters

The following parameters are available for XGBoost models.

Parameter	Type	Description
`name`	`string` (Required)	The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model.
`path`	`string` (Required)	The path to the model file being uploaded.
`framework`	`string` (Required)	Set as `Framework.XGBOOST`.
`input_schema`	`pyarrow.lib.Schema` (Required)	The input schema in Apache Arrow schema format.
`output_schema`	`pyarrow.lib.Schema` (Required)	The output schema in Apache Arrow schema format.
`convert_wait`	`bool` (Optional) (Default: True)	True: Waits in the script for the model conversion to complete. False: Proceeds with the script without waiting for the model conversion process to complete.

Once the upload process starts, the model is containerized by the Wallaroo instance. This process may take up to 10 minutes.

Upload XGBoost Model Return

upload_model returns a wallaroo.model_version.ModelVersion object with the following fields.

Field	Type	Description
`name`	String	The name of the model.
`version`	String	The model version as a unique UUID.
`file_name`	String	The file name of the model as stored in Wallaroo.
`SHA`	String	The hash value of the model file.
`Status`	String	The status of the model.
`image_path`	String	The image used to deploy the model in the Wallaroo engine.
`last_update_time`	DateTime	When the model was last updated.

Upload XGBoost Model Example

The following example uploads an XGBoost ML model to a Wallaroo instance.

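import pyarrow as pa
import wallaroo
from wallaroo.framework import Framework

# Connect to the Wallaroo instance.
wl = wallaroo.Client()

# Four features submitted as one fixed-size array, in training order.
input_schema = pa.schema([
    pa.field('inputs', pa.list_(pa.float64(), list_size=4))
])

# The output field name becomes out.probabilities in inference results.
output_schema = pa.schema([
    pa.field('probabilities', pa.float64())
])

model = wl.upload_model('xgboost-classification',
                        './models/model-auto-conversion_xgboost_xgb_classification_iris.pkl',
                        framework=Framework.XGBOOST,
                        input_schema=input_schema,
                        output_schema=output_schema)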

Waiting for model loading - this will take up to 10.0min.
Model is pending loading to a native runtime..
Model is attempting loading to a native runtime..incompatible

Model is pending loading to a container runtime.
Model is attempting loading to a container runtime............successful

Ready

Tutorials

The following tutorials are available to show different types of XGBoost models uploaded and deployed to Wallaroo.

Wallaroo Upload and Deploy Tutorials: XGBoost