Wallaroo SDK Essentials Guide: Model Uploads and Registrations: XGBoost
Table of Contents
Model Naming Requirements
Model names map onto Kubernetes objects, and must be DNS compliant. The strings for model names must be lower case ASCII alpha-numeric characters or dash (-) only. .
and _
are not allowed.
Wallaroo supports XGBoost models. The following XGBoost models are either packaged into the Wallaroo Native Runtime, or are packaged into the Wallaroo Containerized Runtime.
Parameter | Description |
---|---|
Web Site | https://xgboost.ai/ |
Supported Libraries |
|
Framework | Framework.XGBOOST aka xgboost |
Supported File Types | pickle (XGB files are not supported.) |
During the model upload process, Wallaroo optimizes models by converting them to the Wallaroo Native Runtime, if possible, or running the model directly in the Wallaroo Containerized Runtime. See the Model Deploy for details on how to configure pipeline resources based on the model’s runtime.
Since the Wallaroo 2024.1 release, XGBoost support is enhanced to performantly support a wider set of XGBoost models. XGBoost models are not required to be trained with ONNX nomenclature in order to successfully convert to a performant runtime.
XGBoost Types Support
The following XGBoost model types are supported by Wallaroo. XGBoost models not supported by Wallaroo are supported via the Custom Models, also known as Bring Your Own Predict (BYOP).
XGBoost Model Type | Wallaroo Packaging Supported |
---|---|
XGBClassifier | √ |
XGBRegressor | √ |
Booster Classifier | √ |
Booster Classifier | √ |
Booster Regressor | √ |
Booster Random Forest Regressor | √ |
Booster Random Forest Classifier | √ |
XGBRFClassifier | √ |
XGBRFRegressor | √ |
XGBRanker* | X |
- XGBRanker XGBoost models are currently supported via converting them to BYOP models.
XGBoost Schema Inputs
XGBoost schema follows a different format than other models. To prevent inputs from being out of order, the inputs should be submitted in a single row in the order the model is trained to accept, with all of the data types being the same. If a model is originally trained to accept inputs of different data types, it will need to be retrained to only accept one data type for each column - typically pa.float64()
is a good choice.
For example, the following DataFrame has 4 columns, each column a float
.
sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | |
---|---|---|---|---|
0 | 5.1 | 3.5 | 1.4 | 0.2 |
1 | 4.9 | 3.0 | 1.4 | 0.2 |
For submission to an XGBoost model, the data input schema will be a single array with 4 float values.
input_schema = pa.schema([
pa.field('inputs', pa.list_(pa.float32(), list_size=4))
])
When submitting as an inference, the DataFrame is converted to rows with the column data expressed as a single array. The data must be in the same order as the model expects, which is why the data is submitted as a single array rather than JSON labeled columns: this insures that the data is submitted in the exact order as the model is trained to accept.
Original DataFrame:
sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | |
---|---|---|---|---|
0 | 5.1 | 3.5 | 1.4 | 0.2 |
1 | 4.9 | 3.0 | 1.4 | 0.2 |
Converted DataFrame:
inputs | |
---|---|
0 | [5.1, 3.5, 1.4, 0.2] |
1 | [4.9, 3.0, 1.4, 0.2] |
XGBoost Schema Outputs
Outputs for XGBoost are labeled based on the trained model outputs.
Outputs for XBoost that are meant to be predictions
or probabilities
must be labeled as part of the output schema. For example, a model that outputs either 1 or 0 as its output would have the output schema as follows:
output_schema = pa.schema([
pa.field('predictions', pa.float32()),
])
When used in Wallaroo, the inference result is contained in the out
metadata as out.predictions
.
pipeline.infer(dataframe)
time | in.inputs | out.predictions | anomaly.count | |
---|---|---|---|---|
0 | 2023-07-05 15:11:29.776 | [5.1, 3.5, 1.4, 0.2] | 0 | 0 |
1 | 2023-07-05 15:11:29.776 | [4.9, 3.0, 1.4, 0.2] | 0 | 0 |
Uploading XGBoost Models
XGBoost models are uploaded to Wallaroo through the Wallaroo Client upload_model
method.
Upload XGBoost Model Parameters
The following parameters are available for XGBoost models.
Parameter | Type | Description |
---|---|---|
name | string (Required) | The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model. |
path | string (Required) | The path to the model file being uploaded. |
framework | string (Required) | Set as the Framework.XGBOOST . |
input_schema | pyarrow.lib.Schema (Required) | The input schema in Apache Arrow schema format. |
output_schema | pyarrow.lib.Schema (Required) | The output schema in Apache Arrow schema format. |
convert_wait | bool (Optional) (Default: True) |
|
Once the upload process starts, the model is containerized by the Wallaroo instance. This process may take up to 10 minutes.
Upload XGBoost Model Return
upload_model
returns a wallaroo.model_version.ModelVersion
object with the following fields.
Field | Type | Description |
---|---|---|
name | String | The name of the model. |
version | String | The model version as a unique UUID. |
file_name | String | The file name of the model as stored in Wallaroo. |
SHA | String | The hash value of the model file. |
Status | String | The status of the model. |
image_path | String | The image used to deploy the model in the Wallaroo engine. |
last_update_time | DateTime | When the model was last updated. |
Upload XGBoost Model Example
The following example is of uploading a PyTorch ML Model to a Wallaroo instance.
input_schema = pa.schema([
pa.field('inputs', pa.list_(pa.float64(), list_size=4))
])
output_schema = pa.schema([
pa.field('probabilities', pa.float64())
])
model = wl.upload_model('xgboost-classification',
'./models/model-auto-conversion_xgboost_xgb_classification_iris.pkl',
framework=Framework.XGBOOST,
input_schema=input_schema,
output_schema=output_schema)
Waiting for model loading - this will take up to 10.0min.
Model is pending loading to a native runtime..
Model is attempting loading to a native runtime..incompatible
Model is pending loading to a container runtime.
Model is attempting loading to a container runtime............successful
Ready
Tutorials
The following tutorials are available to show different types of XGBoost models uploaded and deployed to Wallaroo.