Wallaroo SDK Essentials Guide: Model Uploads and Registrations: XGBoost

How to upload and use XGBoost ML Models with Wallaroo

Model Naming Requirements

Model names map onto Kubernetes objects, and must be DNS compliant. The strings for model names must lower case ASCII alpha-numeric characters or dash (-) only. . and _ are not allowed.

Wallaroo supports XGBoost models by containerizing the model and running as an image.

ParameterDescription
Web Sitehttps://xgboost.ai/
Supported Libraries
  • scikit-learn==1.3.0
  • xgboost==1.7.4
FrameworkFramework.XGBOOST aka xgboost
Supported File Typespickle (XGB files are not supported.)
Runtimeonnx / flight

During the model upload process, the Wallaroo instance will attempt to convert the model to a Native Wallaroo Runtime. If unsuccessful based , it will create a Wallaroo Containerized Runtime for the model. See the model deployment section for details on how to configure pipeline resources based on the model’s runtime.

XGBoost Schema Inputs

XGBoost schema follows a different format than other models. To prevent inputs from being out of order, the inputs should be submitted in a single row in the order the model is trained to accept, with all of the data types being the same. If a model is originally trained to accept inputs of different data types, it will need to be retrained to only accept one data type for each column - typically pa.float64() is a good choice.

For example, the following DataFrame has 4 columns, each column a float.

 sepal length (cm)sepal width (cm)petal length (cm)petal width (cm)
05.13.51.40.2
14.93.01.40.2

For submission to an XGBoost model, the data input schema will be a single array with 4 float values.

input_schema = pa.schema([
    pa.field('inputs', pa.list_(pa.float64(), list_size=4))
])

When submitting as an inference, the DataFrame is converted to rows with the column data expressed as a single array. The data must be in the same order as the model expects, which is why the data is submitted as a single array rather than JSON labeled columns: this insures that the data is submitted in the exact order as the model is trained to accept.

Original DataFrame:

 sepal length (cm)sepal width (cm)petal length (cm)petal width (cm)
05.13.51.40.2
14.93.01.40.2

Converted DataFrame:

 inputs
0[5.1, 3.5, 1.4, 0.2]
1[4.9, 3.0, 1.4, 0.2]

XGBoost Schema Outputs

Outputs for XGBoost are labeled based on the trained model outputs. For this example, the output is simply a single output listed as output. In the Wallaroo inference result, it is grouped with the metadata out as out.output.

output_schema = pa.schema([
    pa.field('output', pa.int32())
])
pipeline.infer(dataframe)
 timein.inputsout.outputcheck_failures
02023-07-05 15:11:29.776[5.1, 3.5, 1.4, 0.2]00
12023-07-05 15:11:29.776[4.9, 3.0, 1.4, 0.2]00

Uploading XGBoost Models

XGBoost models are uploaded to Wallaroo through the Wallaroo Client upload_model method.

Upload XGBoost Model Parameters

The following parameters are required for XGBoost models. Note that while some fields are considered as optional for the upload_model method, they are required for proper uploading of a XGBoost model to Wallaroo.

ParameterTypeDescription
namestring (Required)The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model.
pathstring (Required)The path to the model file being uploaded.
frameworkstring (Required)Set as the Framework.XGBOOST.
input_schemapyarrow.lib.Schema (Required)The input schema in Apache Arrow schema format.
output_schemapyarrow.lib.Schema (Required)The output schema in Apache Arrow schema format.
convert_waitbool (Optional) (Default: True)
  • True: Waits in the script for the model conversion completion.
  • False: Proceeds with the script without waiting for the model conversion process to display complete.

Once the upload process starts, the model is containerized by the Wallaroo instance. This process may take up to 10 minutes.

Model Config Options

Model version configurations are updated with the wallaroo.model_version.config and include the following parameters. Most are optional unless specified.

ParameterTypeDescription
runtimeString (Optional)The model runtime from wallaroo.framework. }
tensor_fields(List[string]) (Optional)A list of alternate input fields. For example, if the model accepts the input fields ['variable1', 'variable2'], tensor_fields allows those inputs to be overridden to ['square_feet', 'house_age'], or other values as required.
input_schemapyarrow.lib.SchemaThe input schema for the model in pyarrow.lib.Schema format.
output_schemapyarrow.lib.SchemaThe output schema for the model in pyarrow.lib.Schema format.
batch_config(List[string]) (Optional)Batch config is either None for multiple-input inferences, or single to accept an inference request with only one row of data.

Upload XGBoost Model Return

The following is returned with a successful model upload and conversion.

FieldTypeDescription
namestringThe name of the model.
versionstringThe model version as a unique UUID.
file_namestringThe file name of the model as stored in Wallaroo.
image_pathstringThe image used to deploy the model in the Wallaroo engine.
last_update_timeDateTimeWhen the model was last updated.

Upload XGBoost Model Example

The following example is of uploading a PyTorch ML Model to a Wallaroo instance.

input_schema = pa.schema([
    pa.field('inputs', pa.list_(pa.float64(), list_size=4))
])

output_schema = pa.schema([
    pa.field('output', pa.float64())
])

model = wl.upload_model('xgboost-classification', 
                        './models/model-auto-conversion_xgboost_xgb_classification_iris.pkl', 
                        framework=Framework.XGBOOST, 
                        input_schema=input_schema, 
                        output_schema=output_schema)

Waiting for model loading - this will take up to 10.0min.
Model is pending loading to a native runtime..
Model is attempting loading to a native runtime..incompatible

Model is pending loading to a container runtime.
Model is attempting loading to a container runtime............successful

Ready

Pipeline Deployment Configurations

Pipeline deployments allocate resources from the cluster to the pipeline and its models with the wallaroo.pipeline.deploy(deployment_config: Optional[wallaroo.deployment_config.DeploymentConfig]) method. The wallaroo.deployment_config.DeploymentConfig.DeploymentConfigBuilder class creates DeploymentConfig settings such as the number of CPUs, the amount of RAM, the architecture, etc. For full details, see the Pipeline deployment configurations guides.

The settings for a pipeline configuration are dependent on whether the model is converted to the Native Runtime space, or Containerized Model Runtime space during the model upload process. The method wallaroo.model_config.runtime() displays which runtime the uploaded model was converted to.

RuntimeTypePipeline Deployment Details
onnxWallaroo NativeSee Native Runtime Configuration Methods
flightWallaroo ContainerSee Containerized Runtime Configuration Methods

Wallaroo Native Runtime Deployment

Wallaroo Native Runtime models typically use the following settings for pipeline resource allocation. See See Native Runtime Configuration Methods for complete options.

ResourceMethodDescription
Replicaswallaroo.deployment_config.DeploymentConfigBuilder.replica_count(count: int)The number of replicas of the Wallaroo Native pipeline resources to allocate. Each replica has the same number of cpus, ram, etc. For example: DeploymentConfigBuilder.replica_count(2)
Auto-allocated replicaswallaroo.deployment_config.DeploymentConfigBuilder.replica_autoscale_min_max(maximum: int, minimum: int = 0)Replicas that will auto-allocate more replicas to the pipeline from 0 to the set maximum as more inference requests are made.
CPUwallaroo.deployment_config.DeploymentConfigBuilder.cpus(core_count: float)Fractional number of cpus to allocate. For example: DeploymentConfigBuilder.cpus(0.5)
Memorywallaroo.deployment_config.DeploymentConfigBuilder.memory(memory_spec: string)Memory resources in Kubernetes Memory resource units
GPUswallaroo.deployment_config.DeploymentConfigBuilder.gpus(core_count: int)Number of GPU’s to deploy; GPUs can only be deployed in whole increments. If used, must be paired with the deployment_label pipeline configuration option.
Deployment Labelwallaroo.deployment_config.DeploymentConfigBuilder.deployment_label(label:string)Required if gpus are set and must match the GPU nodepool label.

The following example shows deploying a Native Wallaroo Runtime model with the pipeline configuration of one replica, half a cpu and 1 Gi of RAM.

Note that for native runtime models, total pipeline resources are shared by all the native runtime models for each replica.

model.config().runtime()

'onnx'

# add the model as a pipeline step
pipeline.add_model_step(model)

# DeploymentConfigBuilder is used to create the pipeline's deployment configuration object
from wallaroo.deployment_config import DeploymentConfigBuilder

# deploy using native runtime deployment
deployment_config_native = DeploymentConfigBuilder() \
    .replica_count(1) \
    .cpus(0.5) \
    .memory('1Gi') \
    .build()

# deploy the pipeline with the pipeline configuration
pipeline.deploy(deployment_config=deployment_config_native)

Wallaroo Containerized Runtime Deployment

Wallaroo Containerized Runtime models typically use the following settings for pipeline resource allocation. See See Containerized Runtime Configuration Methods for complete options.

Containerized Runtime models resources are allocated with the sidekick name, with the containerized model specified for resources.

ResourceMethodDescription
Replicaswallaroo.deployment_config.DeploymentConfigBuilder.replica_count(count: int)The number of replicas of the Wallaroo Native pipeline resources to allocate. Each replica has the same number of cpus, ram, etc.
Auto-allocated replicaswallaroo.deployment_config.DeploymentConfigBuilder.replica_autoscale_min_max(maximum: int, minimum: int = 0)Replicas that will auto-allocate more replicas to the pipeline from 0 to the set maximum as more inference requests are made.
CPUwallaroo.deployment_config.DeploymentConfigBuilder.sidekick_cpus(model: wallaroo.model.Model, core_count: float)Fractional number of cpus to allocate for the containerized model.
Memorywallaroo.deployment_config.DeploymentConfigBuilder.sidekick_memory(model: wallaroo.model.Model, memory_spec: string)Memory resources in Kubernetes Memory resource units
GPUswallaroo.deployment_config.DeploymentConfigBuilder.sidekick_gpus(model: wallaroo.model.Model, core_count: int)Number of GPU’s to deploy; GPUs can only be deployed in whole increments. If used, must be paired with the deployment_label pipeline configuration option.
Deployment Labelwallaroo.deployment_config.DeploymentConfigBuilder.deployment_label(label:string)Required if gpus are set and must match the GPU nodepool label.

The following example shows deploying a Containerized Wallaroo Runtime model with the pipeline configuration of one replica, half a cpu and 1 Gi of RAM.

Note that for containerized models, each containerized model’s resources are set independently of each other and duplicated for each pipeline replica, and are considered separate from the native runtime models.

model_native.config().runtime()

'onnx'

model_containerized.config().runtime()

'flight'

# add the models as a pipeline steps
pipeline.add_model_step(model_native)
pipeline.add_model_step(model_containerized)


# DeploymentConfigBuilder is used to create the pipeline's deployment configuration object
from wallaroo.deployment_config import DeploymentConfigBuilder

# deploy using containerized runtime deployment
deployment_config_containerized = DeploymentConfigBuilder() \
    .replica_count(1) \
    .cpus(0.5) \ # shared by the native runtime models
    .memory('1Gi') \ # shared by the native runtime models
    .sidekick_cpus(model_containerized, 0.5) \ # 0.5 cpu allocated solely for the containerized model
    .sidekick_memory(model_containerized, '1Gi') \ #1 Gi allocated solely for the containerized model
    .build()

# deploy the pipeline with the pipeline configuration
pipeline.deploy(deployment_config=deployment_config_containerized)

Pipeline Deployment Timeouts

Pipeline deployments typically take 45 seconds for Wallaroo Native Runtimes, and 90 seconds for Wallaroo Containerized Runtimes.

If Wallaroo Pipeline deployment times out from a very large or complex ML model being deployed, the timeout is extended from with the wallaroo.Client.Client(request_timeout:int) setting, where request_timeout is in integer seconds. Wallaroo Native Runtime deployments are scaled at 1x the request_timeout setting. Wallaroo Containerized Runtimes are scaled at 2x the request_timeout setting.

The following example shows extending the request_timeout to 2 minutes.

wl = wallaroo.Client(request_timeout=120)

wl.timeout

120