Wallaroo SDK Upload Tutorial: RF Regressor Tutorial
How to upload a RF Regressor Tutorial to Wallaroo
The following tutorials cover how to upload sample XGBoost models.
| Parameter | Description |
|---|---|
| Web Site | https://xgboost.ai/ |
| Supported Libraries | xgboost==1.7.4 |
| Framework | Framework.XGBOOST aka xgboost |
| Supported File Types | pickle (XGB files are not supported.) |
| Runtime | Containerized aka tensorflow / mlflow |
XGBoost input schemas follow a different format from those of other models. To keep inputs from arriving out of order, submit them as a single row, in the order the model was trained to accept, with every value sharing the same data type. A model originally trained on inputs of different data types must be retrained so that every column uses the same data type; pa.float64() is typically a good choice.
For example, the following DataFrame has 4 columns, each column a float.
| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
For submission to an XGBoost model, the data input schema will be a single array with 4 float values.
import pyarrow as pa

input_schema = pa.schema([
    pa.field('inputs', pa.list_(pa.float64(), list_size=4))
])
For an inference request, each DataFrame row is converted into a single array of its column values. The data is submitted as one array rather than as JSON-labeled columns because this ensures that the values arrive in exactly the order the model was trained to accept.
Original DataFrame:
| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
Converted DataFrame:
| | inputs |
|---|---|
| 0 | [5.1, 3.5, 1.4, 0.2] |
| 1 | [4.9, 3.0, 1.4, 0.2] |
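The conversion above can be sketched with pandas as follows. This is one possible approach and not necessarily how Wallaroo performs it; the feature_order list is an assumption standing in for the column order used when the model was trained.

```python
import pandas as pd

# Sample rows from the original DataFrame shown above.
df = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9],
    "sepal width (cm)": [3.5, 3.0],
    "petal length (cm)": [1.4, 1.4],
    "petal width (cm)": [0.2, 0.2],
})

# Assumption: this is the column order the model was trained to accept.
feature_order = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]

# Select the columns in training order, force a single data type,
# and collapse each row into one list stored in the "inputs" column.
converted = pd.DataFrame({
    "inputs": df[feature_order].astype("float64").to_numpy().tolist()
})
```

Selecting columns by an explicit list keeps the array order independent of however the source DataFrame happens to be arranged.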
XGBoost outputs are labeled according to the trained model's outputs. In this example, the model returns a single field named output. In the Wallaroo inference result, model outputs are grouped under the out prefix, so this field appears as out.output.
output_schema = pa.schema([
    pa.field('output', pa.int32())
])
pipeline.infer(dataframe)
| | time | in.inputs | out.output | check_failures |
|---|---|---|---|---|
| 0 | 2023-07-05 15:11:29.776 | [5.1, 3.5, 1.4, 0.2] | 0 | 0 |
| 1 | 2023-07-05 15:11:29.776 | [4.9, 3.0, 1.4, 0.2] | 0 | 0 |
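To illustrate reading the result, the sketch below uses a stand-in pandas DataFrame filled with the values from the table above, since a live pipeline is not available here. The variable names result, predictions, and output_columns are illustrative assumptions.

```python
import pandas as pd

# Stand-in for the DataFrame returned by pipeline.infer(dataframe),
# populated with the values shown in the result table above.
result = pd.DataFrame({
    "time": pd.to_datetime(["2023-07-05 15:11:29.776"] * 2),
    "in.inputs": [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]],
    "out.output": [0, 0],
    "check_failures": [0, 0],
})

# Model outputs carry the "out." prefix; submitted inputs carry "in.".
predictions = result["out.output"].tolist()
output_columns = [c for c in result.columns if c.startswith("out.")]
```

Filtering on the out. prefix picks up every output field, which is convenient when a model returns more than one.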
How to upload a RF Regressor Tutorial to Wallaroo
How to upload a XGBoost Classification to Wallaroo
How to upload a XGBoost Regressor to Wallaroo
How to upload a XGBoost RF Classification to Wallaroo