Wallaroo SDK Essentials Guide: Model Uploads and Registrations: XGBoost

How to upload and use XGBoost ML Models with Wallaroo

Model Naming Requirements

Model names map onto Kubernetes objects, and must be DNS compliant. The strings for model names must be lower case ASCII alpha-numeric characters or dash (-) only. . and _ are not allowed.

Wallaroo supports XGBoost models. The following XGBoost models are either packaged into the Wallaroo Native Runtime, or are packaged into the Wallaroo Containerized Runtime.

Web Sitehttps://xgboost.ai/
Supported Libraries
  • scikit-learn==1.3.0
  • xgboost==1.7.4
FrameworkFramework.XGBOOST aka xgboost
Supported File Typespickle (XGB files are not supported.)

During the model upload process, Wallaroo optimizes models by converting them to the Wallaroo Native Runtime, if possible, or running the model directly in the Wallaroo Containerized Runtime. See the Model Deploy for details on how to configure pipeline resources based on the model’s runtime.

Since the Wallaroo 2024.1 release, XGBoost support is enhanced to performantly support a wider set of XGBoost models. XGBoost models are not required to be trained with ONNX nomenclature in order to successfully convert to a performant runtime.

XGBoost Types Support

The following XGBoost model types are supported by Wallaroo. XGBoost models not supported by Wallaroo are supported via the Arbitrary Python models, also known as Bring Your Own Predict (BYOP).

XGBoost Model TypeWallaroo Packaging Supported
Booster Classifier
Booster Classifier
Booster Regressor
Booster Random Forest Regressor
Booster Random Forest Classifier
  • XGBRanker XGBoost models are currently supported via converting them to BYOP models.

XGBoost Schema Inputs

XGBoost schema follows a different format than other models. To prevent inputs from being out of order, the inputs should be submitted in a single row in the order the model is trained to accept, with all of the data types being the same. If a model is originally trained to accept inputs of different data types, it will need to be retrained to only accept one data type for each column - typically pa.float64() is a good choice.

For example, the following DataFrame has 4 columns, each column a float.

 sepal length (cm)sepal width (cm)petal length (cm)petal width (cm)

For submission to an XGBoost model, the data input schema will be a single array with 4 float values.

input_schema = pa.schema([
    pa.field('inputs', pa.list_(pa.float64(), list_size=4))

When submitting as an inference, the DataFrame is converted to rows with the column data expressed as a single array. The data must be in the same order as the model expects, which is why the data is submitted as a single array rather than JSON labeled columns: this insures that the data is submitted in the exact order as the model is trained to accept.

Original DataFrame:

 sepal length (cm)sepal width (cm)petal length (cm)petal width (cm)

Converted DataFrame:

0[5.1, 3.5, 1.4, 0.2]
1[4.9, 3.0, 1.4, 0.2]

XGBoost Schema Outputs

Outputs for XGBoost are labeled based on the trained model outputs.

Outputs for XBoost that are meant to be predictions or probabilities must be labeled as part of the output schema. For example, a model that outputs either 1 or 0 as its output would have the output schema as follows:

output_schema = pa.schema([
    pa.field('predictions', pa.int32())

When used in Wallaroo, the inference result is contained in the out metadata as out.predictions.

02023-07-05 15:11:29.776[5.1, 3.5, 1.4, 0.2]00
12023-07-05 15:11:29.776[4.9, 3.0, 1.4, 0.2]00

Uploading XGBoost Models

XGBoost models are uploaded to Wallaroo through the Wallaroo Client upload_model method.

Upload XGBoost Model Parameters

The following parameters are available for XGBoost models.

namestring (Required)The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model.
pathstring (Required)The path to the model file being uploaded.
frameworkstring (Required)Set as the Framework.XGBOOST.
input_schemapyarrow.lib.Schema (Required)The input schema in Apache Arrow schema format.
output_schemapyarrow.lib.Schema (Required)The output schema in Apache Arrow schema format.
convert_waitbool (Optional) (Default: True)
  • True: Waits in the script for the model conversion completion.
  • False: Proceeds with the script without waiting for the model conversion process to display complete.

Once the upload process starts, the model is containerized by the Wallaroo instance. This process may take up to 10 minutes.

Upload XGBoost Model Return

upload_model returns a wallaroo.model_version.ModelVersion object with the following fields.

nameStringThe name of the model.
versionStringThe model version as a unique UUID.
file_nameStringThe file name of the model as stored in Wallaroo.
SHAStringThe hash value of the model file.
StatusStringThe status of the model.
image_pathStringThe image used to deploy the model in the Wallaroo engine.
last_update_timeDateTimeWhen the model was last updated.

Upload XGBoost Model Example

The following example is of uploading a PyTorch ML Model to a Wallaroo instance.

input_schema = pa.schema([
    pa.field('inputs', pa.list_(pa.float64(), list_size=4))

output_schema = pa.schema([
    pa.field('probabilities', pa.float64())

model = wl.upload_model('xgboost-classification', 

Waiting for model loading - this will take up to 10.0min.
Model is pending loading to a native runtime..
Model is attempting loading to a native runtime..incompatible

Model is pending loading to a container runtime.
Model is attempting loading to a container runtime............successful



The following tutorials are available to show different types of XGBoost models uploaded and deployed to Wallaroo.