Wallaroo SDK Essentials Guide: Model Uploads and Registrations: Containerized MLFlow

How to upload and use Containerized MLFlow with Wallaroo

Parameter	Description
Web Site	https://mlflow.org
Supported Libraries	mlflow==1.30.0
Runtime	Containerized aka `mlflow`

For models that do not fall under the supported model frameworks, organizations can use containerized MLFlow ML Models.

This guide details how to add ML Models from a model registry service into Wallaroo.

Wallaroo supports both public and private containerized model registries. See the Wallaroo Private Containerized Model Container Registry Guide for details on how to configure a Wallaroo instance with a private model registry.

Wallaroo users can register their trained MLFlow ML Models from a containerized model container registry into their Wallaroo instance and perform inferences with them through a Wallaroo pipeline.

As of this time, Wallaroo only supports MLFlow 1.30.0 containerized models. For information on how to containerize an MLFlow model, see the MLFlow Documentation.

Model Naming Requirements

Model names map onto Kubernetes objects and must be DNS compliant. Model names must contain only lowercase ASCII alphanumeric characters or dashes (-). Periods (.) and underscores (_) are not allowed.
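The naming rule can be checked before a model is registered. The following is a minimal sketch and not part of the Wallaroo SDK: `is_valid_model_name` is a hypothetical helper, and its regular expression encodes only the character rule stated here. Kubernetes DNS naming may impose further limits, such as length or leading and trailing characters, which this sketch does not check.

```python
import re

# Hypothetical helper, not part of the Wallaroo SDK.
# Encodes only the character rule: lowercase ASCII letters, digits, and dashes.
_MODEL_NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def is_valid_model_name(name: str) -> bool:
    """Return True if name uses only lowercase ASCII letters, digits, or dashes."""
    return _MODEL_NAME_PATTERN.fullmatch(name) is not None


# Try one accepted name and two names that break the rule.
for candidate in ["mlflow-statmodels", "mlflow_statmodels", "MLFlow.Model"]:
    print(candidate, is_valid_model_name(candidate))
```

Running a check like this up front can surface a naming problem earlier than a failed registration would.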

Containerized MLFlow Model Operations

Register a Containerized MLFlow Model

Containerized MLFlow models are not uploaded, but registered from a container registry service. This is performed through the wallaroo.client.register_model_image(options) and wallaroo.model_version.configure(options) methods.

IMPORTANT NOTICE

Models registered through the Wallaroo SDK are associated with the current workspace in the SDK session, assigned as the user’s Default Workspace by default. See Wallaroo SDK Essentials Guide: Workspace Management for full details on creating and working with workspaces.

Register a Containerized MLFlow Model Parameters

The following parameters must be set for wallaroo.client.register_model_image(options) and wallaroo.model_version.configure(options) for a Containerized MLFlow model to be registered in Wallaroo.

Register Model Image Parameters

Parameter	Type	Description
`name`	`string` (Required)	The name of the model. Model names are unique per workspace. Models that are registered with the same name are assigned as a new version of the model.
`image`	`string` (Required)	The URL to the containerized MLFlow model in the MLFlow Registry.

Model Version Configuration Parameters

Model version configurations are updated with the wallaroo.model_version.configure(options) method and include the following parameters. Most are optional unless specified.

Parameter	Type	Description
`runtime`	String (Optional)	The model runtime from wallaroo.framework, plus `mlflow` for MLFlow containerized model registrations.
`tensor_fields`	(List[string]) (Optional)	A list of alternate input fields. For example, if the model accepts the input fields `['variable1', 'variable2']`, `tensor_fields` allows those inputs to be overridden to `['square_feet', 'house_age']`, or other values as required.
`input_schema`	pyarrow.lib.Schema	The input schema for the model in `pyarrow.lib.Schema` format.
`output_schema`	pyarrow.lib.Schema	The output schema for the model in `pyarrow.lib.Schema` format.
`batch_config`	(List[string]) (Optional)	Batch config is either `None` for multiple-input inferences, or `single` to accept an inference request with only one row of data.

For model version configuration for MLFlow models, the following must be defined:

runtime: Set as mlflow.
input_schema: The input schema in the Apache Arrow pyarrow.lib.Schema format.
output_schema: The output schema in the Apache Arrow pyarrow.lib.Schema format.

Register a Containerized MLFlow Model Returns

wallaroo.client.register_model_image(options) returns the model version. The model version refers to the version of the model object in Wallaroo. In Wallaroo, a model version update happens when we upload a new model file (artifact) against the same model object name.

Note that models are uploaded to the current workspace assigned in the SDK session. By default, this is the user’s Default Workspace.

Field	Type	Description
`id`	Integer	The numerical identifier of the model version.
`name`	string	The name of the model.
`version`	string	The model version as a unique UUID.
`file_name`	string	The file name of the model as stored in Wallaroo.
`image_path`	string	The image used to deploy the model in the Wallaroo engine.
`last_update_time`	DateTime	When the model was last updated.

Register a Containerized MLFlow Model Example

The following example demonstrates registering a Statsmodels model stored in an MLFlow container with a Wallaroo instance.

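import pyarrow as pa
import wallaroo

# connect to the Wallaroo instance
wl = wallaroo.Client()

# input schema for the model in Apache Arrow format
sm_input_schema = pa.schema([
    pa.field('temp', pa.float32()),
    pa.field('holiday', pa.uint8()),
    pa.field('workingday', pa.uint8()),
    pa.field('windspeed', pa.float32())
])

# output schema for the model in Apache Arrow format
sm_output_schema = pa.schema([
    pa.field('predicted_mean', pa.float32())
])

# register the containerized model image, then configure it with the mlflow runtime
sm_model = wl.register_model_image(
    name="mlflowstatmodels",
    image="ghcr.io/wallaroolabs/wallaroo_tutorials/mlflow-statsmodels-example:2023.1"
).configure(
    "mlflow",
    input_schema=sm_input_schema,
    output_schema=sm_output_schema
)

sm_model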

Name	mlflowstatmodels
Version	eb1bcec8-63fe-4a82-98ea-fc4945786973
File Name	none
SHA	3afd13d9c5070679e284050cd099e84aa2e5cb7c08a788b21d6cb2397615d018
Status	ready
Image Path	ghcr.io/wallaroolabs/wallaroo_tutorials/mlflow-statsmodels-example:2023.1
Architecture	None
Updated At	2024-30-Jan 16:11:55

MLFlow Data Formats

When using containerized MLFlow models with Wallaroo, the inputs and outputs must be named. For example, the following output:

[-12.045839810372835]

Would need to be wrapped with the data values named:

[{"prediction": -12.045839810372835}]

A short code sample for wrapping the data might be:

output_df = pd.DataFrame(prediction, columns=["prediction"])
return output_df
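To make the wrapping step concrete, here is a small sketch that uses only the Python standard library. The helper name `name_outputs` and the default field name `prediction` are assumptions for illustration. A real MLFlow model would more typically return a pandas DataFrame, as in the snippet above.

```python
import json


def name_outputs(values, field="prediction"):
    """Wrap each raw output value in a record keyed by the field name."""
    return [{field: value} for value in values]


# The unnamed output from the example above.
raw_output = [-12.045839810372835]

# Each value becomes a named record.
named_output = name_outputs(raw_output)
print(json.dumps(named_output))
```

The same idea applies to inputs: each value is paired with the field name that the schema declares for it.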

Pipeline Deployment Configurations

Pipeline deployments allocate resources from the cluster to the pipeline and its models with the wallaroo.pipeline.deploy(deployment_config: Optional[wallaroo.deployment_config.DeploymentConfig]) method. The wallaroo.deployment_config.DeploymentConfigBuilder class creates DeploymentConfig settings such as the number of CPUs, the amount of RAM, and the architecture. For full details, see the Pipeline deployment configurations guides.

The settings for a pipeline configuration are dependent on whether the model is converted to the Native Runtime space, or Containerized Model Runtime space during the model upload process. The method wallaroo.model_config.runtime() displays which runtime the uploaded model was converted to.

Runtime	Type	Pipeline Deployment Details
`onnx`	Wallaroo Native	See Native Runtime Configuration Methods
`flight`	Wallaroo Container	See Containerized Runtime Configuration Methods

Wallaroo Native Runtime Deployment

Wallaroo Native Runtime models typically use the following settings for pipeline resource allocation. See Native Runtime Configuration Methods for complete options.

Resource	Method	Description
Replicas	`wallaroo.deployment_config.DeploymentConfigBuilder.replica_count(count: int)`	The number of replicas of the Wallaroo Native pipeline resources to allocate. Each replica has the same number of cpus, ram, etc. For example: `DeploymentConfigBuilder.replica_count(2)`
Auto-allocated replicas	`wallaroo.deployment_config.DeploymentConfigBuilder.replica_autoscale_min_max(maximum: int, minimum: int = 0)`	Automatically allocates additional replicas to the pipeline, from the minimum (default 0) up to the set maximum, as more inference requests are made.
CPU	`wallaroo.deployment_config.DeploymentConfigBuilder.cpus(core_count: float)`	Fractional number of cpus to allocate. For example: `DeploymentConfigBuilder.cpus(0.5)`
Memory	`wallaroo.deployment_config.DeploymentConfigBuilder.memory(memory_spec: string)`	Memory resources in Kubernetes Memory resource units
GPUs	`wallaroo.deployment_config.DeploymentConfigBuilder.gpus(core_count: int)`	Number of GPUs to deploy; GPUs can only be deployed in whole increments. If used, must be paired with the `deployment_label` pipeline configuration option.
Deployment Label	`wallaroo.deployment_config.DeploymentConfigBuilder.deployment_label(label:string)`	Required if `gpus` are set and must match the GPU nodepool label.

The following example shows deploying a Native Wallaroo Runtime model with the pipeline configuration of one replica, half a cpu and 1 Gi of RAM.

Note that for native runtime models, total pipeline resources are shared by all the native runtime models for each replica.

model.config().runtime()

'onnx'

# add the model as a pipeline step
pipeline.add_model_step(model)

# DeploymentConfigBuilder is used to create the pipeline's deployment configuration object
from wallaroo.deployment_config import DeploymentConfigBuilder

# deploy using native runtime deployment
deployment_config_native = DeploymentConfigBuilder() \
    .replica_count(1) \
    .cpus(0.5) \
    .memory('1Gi') \
    .build()

# deploy the pipeline with the pipeline configuration
pipeline.deploy(deployment_config=deployment_config_native)

Wallaroo Containerized Runtime Deployment

Wallaroo Containerized Runtime models typically use the following settings for pipeline resource allocation. See Containerized Runtime Configuration Methods for complete options.

Resources for Containerized Runtime models are allocated with the sidekick methods, which take the containerized model as the first argument to specify which model receives the resources.

Resource	Method	Description
Replicas	`wallaroo.deployment_config.DeploymentConfigBuilder.replica_count(count: int)`	The number of replicas of the Wallaroo Native pipeline resources to allocate. Each replica has the same number of cpus, ram, etc.
Auto-allocated replicas	`wallaroo.deployment_config.DeploymentConfigBuilder.replica_autoscale_min_max(maximum: int, minimum: int = 0)`	Automatically allocates additional replicas to the pipeline, from the minimum (default 0) up to the set maximum, as more inference requests are made.
CPU	`wallaroo.deployment_config.DeploymentConfigBuilder.sidekick_cpus(model: wallaroo.model.Model, core_count: float)`	Fractional number of cpus to allocate for the containerized model.
Memory	`wallaroo.deployment_config.DeploymentConfigBuilder.sidekick_memory(model: wallaroo.model.Model, memory_spec: string)`	Memory resources in Kubernetes Memory resource units
GPUs	`wallaroo.deployment_config.DeploymentConfigBuilder.sidekick_gpus(model: wallaroo.model.Model, core_count: int)`	Number of GPUs to deploy; GPUs can only be deployed in whole increments. If used, must be paired with the `deployment_label` pipeline configuration option.
Deployment Label	`wallaroo.deployment_config.DeploymentConfigBuilder.deployment_label(label:string)`	Required if `gpus` are set and must match the GPU nodepool label.

The following example shows deploying a Containerized Wallaroo Runtime model with the pipeline configuration of one replica, half a cpu and 1 Gi of RAM.

Note that for containerized models, each containerized model’s resources are set independently of each other and duplicated for each pipeline replica, and are considered separate from the native runtime models.

model_native.config().runtime()

'onnx'

model_containerized.config().runtime()

'flight'

# add the models as pipeline steps
pipeline.add_model_step(model_native)
pipeline.add_model_step(model_containerized)


# DeploymentConfigBuilder is used to create the pipeline's deployment configuration object
from wallaroo.deployment_config import DeploymentConfigBuilder

# deploy using containerized runtime deployment
deployment_config_containerized = (
    DeploymentConfigBuilder()
    .replica_count(1)
    .cpus(0.5)  # shared by the native runtime models
    .memory('1Gi')  # shared by the native runtime models
    .sidekick_cpus(model_containerized, 0.5)  # 0.5 cpu allocated solely for the containerized model
    .sidekick_memory(model_containerized, '1Gi')  # 1 Gi allocated solely for the containerized model
    .build()
)

# deploy the pipeline with the pipeline configuration
pipeline.deploy(deployment_config=deployment_config_containerized)
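The allocation rules described above suggest a simple way to estimate the CPU a deployment requests for its models. The sketch below is plain Python arithmetic, not a Wallaroo API. `total_cpus` is a hypothetical helper, and it assumes that every replica receives one shared native allocation plus one independent allocation per containerized model. It ignores any resources used by other Wallaroo components.

```python
def total_cpus(replicas, native_cpus, sidekick_cpus):
    """Estimate the CPUs requested for models across all replicas.

    native_cpus: CPUs shared by all native runtime models in one replica.
    sidekick_cpus: list of per-model CPUs, one entry per containerized model.
    """
    per_replica = native_cpus + sum(sidekick_cpus)
    return replicas * per_replica


# Values from the example above: 1 replica, 0.5 native, 0.5 containerized.
print(total_cpus(1, 0.5, [0.5]))  # 1.0

# Hypothetical scale-out to 3 replicas with the same per-replica settings.
print(total_cpus(3, 0.5, [0.5]))  # 3.0
```

The same arithmetic applies to memory once the Kubernetes memory units are converted to a common unit.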

Pipeline Deployment Timeouts

Pipeline deployments typically take 45 seconds for Wallaroo Native Runtimes, and 90 seconds for Wallaroo Containerized Runtimes.

If a Wallaroo pipeline deployment times out because a very large or complex ML model is being deployed, the timeout can be extended with the wallaroo.Client(request_timeout: int) setting, where request_timeout is an integer number of seconds. Wallaroo Native Runtime deployments are scaled at 1x the request_timeout setting. Wallaroo Containerized Runtimes are scaled at 2x the request_timeout setting.

The following example shows extending the request_timeout to 2 minutes.

import wallaroo

wl = wallaroo.Client(request_timeout=120)

wl.timeout

120
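The scaling rule can be expressed as a small calculation. This is an illustrative sketch only: `effective_deploy_timeout` is a hypothetical helper, not an SDK function, and the labels `native` and `containerized` are assumed names for the two runtime cases described above.

```python
# Multipliers taken from the scaling rule described above.
TIMEOUT_MULTIPLIER = {"native": 1, "containerized": 2}


def effective_deploy_timeout(request_timeout: int, runtime_kind: str) -> int:
    """Return the deployment timeout in seconds for the given runtime kind."""
    return request_timeout * TIMEOUT_MULTIPLIER[runtime_kind]


# With request_timeout=120, as in the example above.
print(effective_deploy_timeout(120, "native"))         # 120
print(effective_deploy_timeout(120, "containerized"))  # 240
```

Under this rule, a 2 minute request_timeout would give a containerized deployment up to 4 minutes before it times out.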