Wallaroo SDK Upload Tutorials: XGBoost

How to upload different XGBoost models to Wallaroo.

The following tutorials cover how to upload sample XGBoost models.

Parameter Description
Web Site https://xgboost.ai/
Supported Libraries xgboost==1.7.4
Framework Framework.XGBOOST aka xgboost
Supported File Types pickle (XGB files are not supported.)
Runtime Containerized aka tensorflow / mlflow

XGBoost Schema Inputs

XGBoost schema follows a different format than other models. To prevent inputs from being out of order, the inputs should be submitted in a single row in the order the model is trained to accept, with all of the data types being the same. If a model is originally trained to accept inputs of different data types, it will need to be retrained to only accept one data type for each column - typically pa.float64() is a good choice.

For example, the following DataFrame has 4 columns, each column a float.

  sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2

For submission to an XGBoost model, the data input schema will be a single array with 4 float values.

input_schema = pa.schema([
    pa.field('inputs', pa.list_(pa.float64(), list_size=4))
])

When submitting as an inference, the DataFrame is converted to rows with the column data expressed as a single array. The data must be in the same order as the model expects, which is why the data is submitted as a single array rather than JSON labeled columns: this insures that the data is submitted in the exact order as the model is trained to accept.

Original DataFrame:

  sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2

Converted DataFrame:

  inputs
0 [5.1, 3.5, 1.4, 0.2]
1 [4.9, 3.0, 1.4, 0.2]

XGBoost Schema Outputs

Outputs for XGBoost are labeled based on the trained model outputs. For this example, the output is simply a single output listed as output. In the Wallaroo inference result, it is grouped with the metadata out as out.output.

output_schema = pa.schema([
    pa.field('output', pa.int32())
])
pipeline.infer(dataframe)
  time in.inputs out.output check_failures
0 2023-07-05 15:11:29.776 [5.1, 3.5, 1.4, 0.2] 0 0
1 2023-07-05 15:11:29.776 [4.9, 3.0, 1.4, 0.2] 0 0

Wallaroo SDK Upload Tutorial: RF Regressor Tutorial

How to upload a RF Regressor Tutorial to Wallaroo

Wallaroo SDK Upload Tutorial: XGBoost Classification

How to upload a XGBoost Classification to Wallaroo

Wallaroo SDK Upload Tutorial: XGBoost Regressor

How to upload a XGBoost Regressor to Wallaroo

Wallaroo SDK Upload Tutorial: XGBoost RF Classification

How to upload a XGBoost RF Classification to Wallaroo