Deployment Configuration with the Wallaroo Dashboard
Pipeline deployment configurations are modified through the Wallaroo Dashboard Pipeline Details page. The following preconditions must be met before editing the deployment configuration through the user interface:
- The pipeline was previously deployed through the Wallaroo SDK or the Wallaroo MLOps API.
- The pipeline is currently undeployed.
Editing Deployment Configuration Steps
The following steps are used for updating a pipeline’s deployment configuration through the Wallaroo Dashboard.
- From the Wallaroo Dashboard, select the workspace the target pipeline is associated with.
- Select View Pipelines.
- Select the pipeline to update.
- From the Details page, verify that the pipeline is Undeployed - the Deploy/Undeploy button will display Deploy if the pipeline is currently undeployed.
- Scroll down to Deployment Configuration and select Edit.
- Edit each field as required. It is highly recommended to edit only existing settings when possible, and to make major modifications through the Wallaroo SDK or Wallaroo MLOps API.
- When finished, select Save and Deploy. The pipeline will be deployed as a new version with the new deployment configuration.
Edit Configuration Deployment Examples
Edit Native Runtime Deployment Configuration Example
The following demonstrates editing the deployment configuration for a Wallaroo Native Runtime deployment.
Edit Containerized Runtime Deployment Configuration Example
The following demonstrates editing the deployment configuration for a Wallaroo Containerized Runtime deployment.
Deployment Configuration Parameters
The following deployment configuration parameters are available for editing. Before starting, the following conditions must be noted:
- Deployment configurations are only available to previously deployed pipelines, whether they were deployed through the Wallaroo SDK or the Wallaroo MLOps API.
- Deployment configurations are only editable through the Wallaroo Dashboard when the pipeline is undeployed.
- Field values must match the expected deployment configuration types - for example, string values for labels, integer values for gpus, etc. The following tables show the deployment configuration parameters for Wallaroo Native Runtimes and Wallaroo Containerized Runtimes.
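As an illustration of that type rule, the following is a minimal sketch of a pre-save check. It is a hypothetical helper, not part of Wallaroo, and the expected types are taken from the parameter tables later in this page:

```python
# Hypothetical pre-save check (not a Wallaroo API): edited values must
# match the expected deployment configuration types - floats for cpu,
# integers for gpus and replicas, strings for memory sizes and labels.
EXPECTED_TYPES = {
    "cpu": (int, float),
    "gpu": int,
    "replicas": int,
    "memory": str,
    "arch": str,
    "accel": str,
}

def check_field(name, value):
    """Return True when the value matches the expected type for the field."""
    expected = EXPECTED_TYPES.get(name)
    return expected is not None and isinstance(value, expected)
```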
Deployment configuration parameters fall under the following elements:
engine
: These elements are specific to Wallaroo Native Runtimes.
engineAux
: These elements are specific to Wallaroo Containerized Runtimes.
The following elements are not editable from the Wallaroo Dashboard Pipeline Details page:
workspace_id
engine_lb
The following examples show different deployment parameters based on the Runtime and configurations.
- Native Runtime Deployment Configuration: All models run on the Wallaroo Native Runtime.
{
"engine": {
"cpu": 0.25,
"arch": "x86",
"accel": "none",
"resources": {
"limits": {
"cpu": 0.25,
"memory": "4Gi"
},
"requests": {
"cpu": 0.25,
"memory": "4Gi"
}
}
},
"enginelb": {},
"engineAux": {
"images": {}
},
"workspace_id": 9,
"node_selector": {}
}
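Related settings such as engine.cpu and the matching engine.resources.limits.cpu / engine.resources.requests.cpu must agree. The following is a minimal sketch (a hypothetical helper, not a Wallaroo API) that updates them together on a configuration shaped like the one above:

```python
import copy

def set_engine_cpu(config, cpus):
    """Return a copy of a deployment configuration with engine.cpu and the
    matching engine.resources.limits.cpu / engine.resources.requests.cpu
    updated together, so the related parameters stay in agreement."""
    updated = copy.deepcopy(config)
    engine = updated["engine"]
    engine["cpu"] = cpus
    engine["resources"]["limits"]["cpu"] = cpus
    engine["resources"]["requests"]["cpu"] = cpus
    return updated
```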
- Containerized Runtime Deployment Configuration: A model is deployed to the Wallaroo Containerized Runtime with no models deployed to the Wallaroo Native Runtime.
{
"engine": {
"cpu": 0.25,
"arch": "x86",
"accel": "none",
"resources": {
"limits": {
"cpu": 0.25,
"memory": "1Gi"
},
"requests": {
"cpu": 0.25,
"memory": "1Gi"
}
}
},
"enginelb": {},
"engineAux": {
"images": {
"clip-vit-2": {
"arch": "x86",
"accel": "none",
"resources": {
"limits": {
"cpu": 2,
"memory": "4Gi"
},
"requests": {
"cpu": 2,
"memory": "4Gi"
}
}
}
}
},
"workspace_id": 10,
"node_selector": {}
}
- Native Runtime Deployment Configuration with multiple replicas: All models run on the Wallaroo Native Runtime, with five replicas deployed.
{
"engine": {
"cpu": 4,
"arch": "x86",
"accel": "none",
"replicas": 5,
"resources": {
"limits": {
"cpu": 4,
"memory": "3Gi"
},
"requests": {
"cpu": 4,
"memory": "3Gi"
}
}
},
"enginelb": {},
"engineAux": {
"images": {}
},
"workspace_id": 9,
"node_selector": {}
}
- Containerized Runtime Deployment Configuration with multiple replicas: A model is deployed to the Wallaroo Containerized Runtime with five replicas and no models deployed to the Wallaroo Native Runtime.
{
"engine": {
"cpu": 0.25,
"arch": "x86",
"accel": "none",
"replicas": 5,
"resources": {
"limits": {
"cpu": 0.25,
"memory": "1Gi"
},
"requests": {
"cpu": 0.25,
"memory": "1Gi"
}
}
},
"enginelb": {},
"engineAux": {
"images": {
"clip-vit-2": {
"arch": "x86",
"accel": "none",
"resources": {
"limits": {
"cpu": 4,
"memory": "3Gi"
},
"requests": {
"cpu": 4,
"memory": "3Gi"
}
}
}
}
},
"workspace_id": 10,
"node_selector": {}
}
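Because each replica receives its own copy of the engine resources, the cluster must supply the per-replica request multiplied by the replica count. A quick sketch (hypothetical helper, assuming per-replica allocation as in Kubernetes resource requests) for configurations like the examples above:

```python
def total_cpu_request(config):
    """Total CPUs requested across all engine replicas; each replica is
    allocated its own engine.resources.requests.cpu."""
    engine = config["engine"]
    return engine.get("replicas", 1) * engine["resources"]["requests"]["cpu"]
```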
- Native Runtime Deployment Configuration with GPU: All models run on the Wallaroo Native Runtime with a GPU assigned and a node selector targeting the accelerator.
{
"engine": {
"cpu": "0.5",
"gpu": 1,
"arch": "x86",
"accel": "none",
"replicas": 5,
"resources": {
"limits": {
"cpu": "0.5",
"nvidia.com/gpu":1,
"memory": "2Gi"
},
"requests": {
"cpu": "0.5",
"nvidia.com/gpu":1,
"memory": "2Gi"
}
}
},
"enginelb": {},
"engineAux": {
"images": {}
},
"workspace_id": 10,
"node_selector": "wallaroo.ai/accelerator: t4"
}
- Containerized Runtime Deployment Configuration with GPU: A model is deployed to the Wallaroo Containerized Runtime with a GPU assigned and no models deployed to the Wallaroo Native Runtime.
{
"engine": {
"cpu": 0.25,
"arch": "x86",
"accel": "none",
"replicas": 5,
"resources": {
"limits": {
"cpu": 0.25,
"memory": "1Gi"
},
"requests": {
"cpu": 0.25,
"memory": "1Gi"
}
}
},
"enginelb": {},
"engineAux": {
"images": {
"llama-cpp-sdk-3": {
"arch": "x86",
"accel": "none",
"resources": {
"limits": {
"cpu": 4,
"nvidia.com/gpu":1,
"memory": "10Gi"
},
"requests": {
"cpu": 4,
"nvidia.com/gpu":1,
"memory": "10Gi"
}
}
}
}
},
"workspace_id": 10,
"node_selector":"wallaroo.ai/accelerator: t4"
}
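In the GPU examples, node_selector is a single "label: value" string. The following small sketch (a hypothetical helper, not a Wallaroo API) splits it into a label/value pair, e.g. for comparison against Kubernetes node labels:

```python
def parse_node_selector(selector):
    """Split a 'label: value' node selector string such as
    'wallaroo.ai/accelerator: t4' into a {label: value} mapping."""
    if not selector or not isinstance(selector, str):
        return {}
    label, _, value = selector.partition(":")
    return {label.strip(): value.strip()}
```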
- Native Runtime Deployment Configuration with CPU-based autoscaling: All models run on the Wallaroo Native Runtime.
{
"engine": {
"cpu": 0.25,
"arch": "x86",
"accel": "none",
"autoscale": {
"type": "cpu",
"replica_max": 5,
"replica_min": 0,
"cpu_utilization": 75
},
"replicas": 2,
"resources": {
"limits": {
"cpu": 0.25,
"memory": "1Gi"
},
"requests": {
"cpu": 0.25,
"memory": "1Gi"
}
}
},
"enginelb": {},
"engineAux": {
"images": {}
},
"workspace_id": 10,
"node_selector": {}
}
- Containerized Runtime Deployment Configuration with CPU-based autoscaling: A model is deployed to the Wallaroo Containerized Runtime with no models deployed to the Wallaroo Native Runtime.
{
"engine": {
"cpu": 0.25,
"arch": "x86",
"accel": "none",
"autoscale": {
"type": "cpu",
"replica_max": 5,
"replica_min": 0,
"cpu_utilization": 75
},
"replicas": 2,
"resources": {
"limits": {
"cpu": 0.25,
"memory": "1Gi"
},
"requests": {
"cpu": 0.25,
"memory": "1Gi"
}
}
},
"enginelb": {},
"engineAux": {
"images": {
"clip-vit-2": {
"arch": "x86",
"accel": "none",
"resources": {
"limits": {
"cpu": 2,
"memory": "4Gi"
},
"requests": {
"cpu": 2,
"memory": "4Gi"
}
}
}
}
},
"workspace_id": 10,
"node_selector": {}
}
When autoscaling with GPUs, the recommended parameters are scale_up_queue_depth, scale_down_queue_depth, and autoscaling_window. For more details, see Wallaroo Deployment via the Wallaroo SDK: Deployment Replicas and Autoscale.
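As an illustration only (this is not Wallaroo's actual scaling algorithm), queue-based autoscaling can be sketched as: add a replica when the average queue depth over the autoscaling window exceeds scale_up_queue_depth, remove one when it falls below scale_down_queue_depth, always staying within replica_min and replica_max. The defaults below mirror the queue-based example configurations that follow:

```python
def scale_decision(avg_queue_depth, replicas, replica_min=0, replica_max=2,
                   scale_up_queue_depth=5, scale_down_queue_depth=1):
    """Return the next replica count for a queue-depth autoscaling sketch.
    This is an illustration of how the parameters interact, not the
    actual Wallaroo scaling implementation."""
    if avg_queue_depth > scale_up_queue_depth and replicas < replica_max:
        return replicas + 1  # queue backing up: scale out
    if avg_queue_depth < scale_down_queue_depth and replicas > replica_min:
        return replicas - 1  # queue drained: scale in, possibly to zero
    return replicas
```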
- Native Runtime Deployment Configuration with queue-based autoscaling: All models run on the Wallaroo Native Runtime with a GPU assigned.
{
"engine": {
"cpu": 0.25,
"gpu": 1,
"arch": "x86",
"accel": "none",
"autoscale":{
"type": "queue",
"replica_max": 2,
"replica_min": 0,
"autoscaling_window": 60,
"scale_up_queue_depth": 5,
"scale_down_queue_depth": 1
},
"resources": {
"limits": {
"cpu": 0.25,
"nvidia.com/gpu":1,
"memory": "1Gi"
},
"requests": {
"cpu": 0.25,
"nvidia.com/gpu":1,
"memory": "1Gi"
}
}
},
"enginelb": {},
"engineAux": {
"images": {}
},
"workspace_id": 10,
"node_selector":"wallaroo.ai/accelerator: t4"
}
- Containerized Runtime Deployment Configuration with queue-based autoscaling: A model is deployed to the Wallaroo Containerized Runtime with a GPU assigned and no models deployed to the Wallaroo Native Runtime.
{
"engine": {
"cpu": 0.25,
"arch": "x86",
"accel": "none",
"autoscale":{
"type": "queue",
"replica_max": 2,
"replica_min": 0,
"autoscaling_window": 60,
"scale_up_queue_depth": 5,
"scale_down_queue_depth": 1
},
"resources": {
"limits": {
"cpu": 0.25,
"memory": "1Gi"
},
"requests": {
"cpu": 0.25,
"memory": "1Gi"
}
}
},
"enginelb": {},
"engineAux": {
"images": {
"clip-vit-2": {
"arch": "x86",
"accel": "none",
"resources": {
"limits": {
"cpu": 2,
"nvidia.com/gpu":1,
"memory": "4Gi"
},
"requests": {
"cpu": 2,
"nvidia.com/gpu":1,
"memory": "4Gi"
}
}
}
}
},
"workspace_id": 10,
"node_selector":"wallaroo.ai/accelerator: t4"
}
Deployment Replicas and Autoscale Parameters
The following parameters are available for controlling replicas and autoscaling options. Note that certain options are mutually exclusive - for example, engine.replicas is mutually exclusive with engine.autoscale.replica_max and engine.autoscale.replica_min. For more details, see Wallaroo Deployment via the Wallaroo SDK: Deployment Replicas and Autoscale.
Replica and autoscale settings apply to both Native and Containerized Runtimes.
Parameters | Type | Description | Related Parameters |
---|---|---|---|
engine.replicas | Integer | The number of replicas to deploy. This allows for multiple deployments of the same models to be deployed to increase inferences through parallelization. | None |
engine.autoscale.type | String | The type of autoscaling. Defaults to cpu . Valid options are cpu (scale on CPU utilization) and queue (scale on inference queue depth). | None |
engine.autoscale.replica_max | Integer | The maximum number of replicas scaled from 0 to some maximum number of replicas. This allows deployments to spin up additional replicas as more resources are required, then spin them back down to save on resources and costs. | None |
engine.autoscale.replica_min | Integer | The minimum number of replicas scaled from the replica_min to some maximum number of replicas. This allows deployments to spin up additional replicas as more resources are required, then spin them back down to save on resources and costs. | None |
engine.autoscale.cpu_utilization | Float | Sets the average CPU percentage metric for when to load or unload another replica. | None |
engine.autoscale.scale_up_queue_depth | Integer | The queue trigger for autoscaling additional replicas up. This requires that the deployment configuration parameter replica_autoscale_min_max is set. | None |
engine.autoscale.scale_down_queue_depth | Integer Default: 1 | Only applies when scale_up_queue_depth is configured. The queue trigger for autoscaling replicas down. | None |
engine.autoscale.autoscaling_window | Integer (Default: 300, Minimum allowed: 60) | The period over which to scale up or scale down resources. Only applies when scale_up_queue_depth is configured. | None |
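The mutual-exclusivity and dependency rules above can be checked mechanically. The following sketch (a hypothetical helper, not a Wallaroo API) applies them to the engine element of a deployment configuration:

```python
def validate_replica_settings(engine):
    """Check the documented replica/autoscale rules: engine.replicas is
    mutually exclusive with engine.autoscale.replica_min/replica_max, and
    scale_down_queue_depth only applies when scale_up_queue_depth is set."""
    errors = []
    autoscale = engine.get("autoscale", {})
    if "replicas" in engine and (
            "replica_min" in autoscale or "replica_max" in autoscale):
        errors.append("engine.replicas is mutually exclusive with "
                      "engine.autoscale.replica_min/replica_max")
    if ("scale_down_queue_depth" in autoscale
            and "scale_up_queue_depth" not in autoscale):
        errors.append("scale_down_queue_depth requires scale_up_queue_depth")
    return errors
```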
Native Runtime Configuration Parameters
The following parameters are available for Wallaroo Native Runtime deployments. Note that resources assigned to the Wallaroo Native Runtime are shared with all models that run in the Native Runtime.
Related Parameters must be edited together. For example, engine.cpu settings must match engine.resources.limits.cpu and engine.resources.requests.cpu.
The following table lists the Native Runtime Configuration parameters.
Parameters | Type | Description | Related Parameters |
---|---|---|---|
engine.cpu | Float | The fractional number of cpus assigned to the Wallaroo Native Runtime per replica. | engine.resources.limits.cpu, engine.resources.requests.cpu |
engine.gpu | Integer | The number of GPUs to assign to the Wallaroo Native Runtime; the examples above use gpu: 1. For GPU configurations the default is NVIDIA when no acceleration is specified. For other GPU configurations please see Inference with Acceleration Libraries during model upload. | engine.resources.limits.nvidia.com/gpu, engine.resources.requests.nvidia.com/gpu |
engine.resources.limits.memory | String | Sets the amount of RAM to allocate to the deployment. The memory_spec string is in the format “{size as number}{unit value}”. The accepted unit values include Ki, Mi, Gi, and Ti; the examples above use Gi. | engine.resources.requests.memory |
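The memory_spec format can be parsed as follows. This sketch assumes the Kubernetes-style binary unit suffixes (Ki, Mi, Gi, Ti) that the deployment examples above use:

```python
def parse_memory(memory_spec):
    """Parse a '{size as number}{unit value}' string such as '4Gi' into
    bytes, assuming Kubernetes-style binary unit suffixes."""
    units = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3, "Ti": 1024 ** 4}
    for suffix, factor in units.items():
        if memory_spec.endswith(suffix):
            return int(float(memory_spec[:-len(suffix)]) * factor)
    return int(memory_spec)  # no recognized suffix: treat as plain bytes
```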
Containerized Runtime Configuration Parameters
The following editable parameters are available for Wallaroo Containerized Runtime deployments. Note that resources assigned to the Wallaroo Containerized Runtime are specific per model. For example, one model may have more cpus and memory assigned than another model, and those resources are exclusive to each model in the Containerized Runtime.
Related Parameters must be edited together. For Containerized Runtime settings, the resources are assigned to each model, so each setting is in the format engineAux.images.{model name}.parameter - for example, the number of cpus assigned to a model named sample-llm would be engineAux.images.sample-llm.resources.limits.cpu and engineAux.images.sample-llm.resources.requests.cpu.
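The per-model naming scheme can be sketched as follows; these are hypothetical helpers (not Wallaroo APIs) for building the dotted parameter path and reading one model's exclusive resource block from a configuration shaped like the examples above:

```python
def model_param_path(model_name, parameter):
    """Dotted path for one model's setting in the Containerized Runtime,
    e.g. engineAux.images.sample-llm.resources.limits.cpu."""
    return f"engineAux.images.{model_name}.{parameter}"

def model_resources(config, model_name):
    """The resource block assigned exclusively to one model under
    engineAux.images in the deployment configuration."""
    return config["engineAux"]["images"][model_name]["resources"]
```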
The following table lists the Containerized Runtime Configuration parameters.
Parameters | Type | Description | Related Parameters |
---|---|---|---|
engineAux.images.{model_name}.resources.limits.cpu | Float | The fractional number of cpus assigned to the model per replica. | engineAux.images.{model_name}.resources.requests.cpu |
engineAux.images.{model_name}.resources.gpu | Integer | The number of GPUs to assign to the model. For GPU configurations the default is NVIDIA when no acceleration is specified. For other GPU configurations please see Inference with Acceleration Libraries during model upload. | |
engineAux.images.{model_name}.resources.limits.memory | String | Sets the amount of RAM to allocate to the model. The memory_spec string is in the format “{size as number}{unit value}”. The accepted unit values include Ki, Mi, Gi, and Ti; the examples above use Gi. | engineAux.images.{model_name}.resources.requests.memory |
Troubleshooting
Uneditable Fields
The following fields cannot be edited through the Wallaroo Dashboard Pipeline Details page:
workspace_id
engine_lb