wallaroo.deployment_config
DeploymentConfig inherits from builtins.dict, so the standard dict methods (get, setdefault, pop, popitem, keys, items, values, update, fromkeys, clear, copy) are available on it.
Configures the minimum and maximum number of replicas for autoscaling.
Sets the average CPU utilization, as a percentage, used as the autoscaling metric.
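For example, a minimal sketch, assuming these two entries describe the replica_autoscale_min_max and autoscale_cpu_utilization builder methods (method and parameter names assumed):

```python
from wallaroo.deployment_config import DeploymentConfigBuilder

# Sketch: autoscale between 1 and 5 replicas, targeting 60% average CPU utilization.
# Keyword names for replica_autoscale_min_max are assumed.
config = (
    DeploymentConfigBuilder()
    .replica_autoscale_min_max(minimum=1, maximum=5)
    .autoscale_cpu_utilization(60)
    .build()
)
```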
Sets the number of GPUs to be used for the model's sidekick container. Only affects image-based models (e.g. MLFlow models) in a deployment.
Parameters
- ModelVersion model_version: The sidekick model to configure.
- int core_count: Number of GPUs to use in this sidekick.
Returns
This DeploymentConfigBuilder instance for chaining.
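A brief sketch of the call, where model_version stands in for a ModelVersion obtained earlier (placeholder, not defined here):

```python
from wallaroo.deployment_config import DeploymentConfigBuilder

# model_version: a ModelVersion obtained earlier (placeholder for this sketch).
config = (
    DeploymentConfigBuilder()
    .sidekick_gpus(model_version, 1)  # one GPU for the sidekick container
    .build()
)
```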
Sets the number of CPUs to be used for the model's sidekick container. Only affects image-based models (e.g. MLFlow models) in a deployment.
Parameters
- ModelVersion model_version: The sidekick model to configure.
- int core_count: Number of CPU cores to use in this sidekick.
Returns
This DeploymentConfigBuilder instance for chaining.
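For instance, under the same assumption that model_version is an existing ModelVersion:

```python
from wallaroo.deployment_config import DeploymentConfigBuilder

# model_version: a ModelVersion obtained earlier (placeholder for this sketch).
config = (
    DeploymentConfigBuilder()
    .sidekick_cpus(model_version, 2)  # two CPU cores for the sidekick container
    .build()
)
```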
Sets the memory to be used for the model's sidekick container. Only affects image-based models (e.g. MLFlow models) in a deployment.
Parameters
- ModelVersion model_version: The sidekick model to configure.
- str memory_spec: Amount of memory (e.g., "2Gi", "500Mi") to use in this sidekick.
Returns
This DeploymentConfigBuilder instance for chaining.
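A sketch, again with model_version as a placeholder for a previously obtained ModelVersion:

```python
from wallaroo.deployment_config import DeploymentConfigBuilder

# model_version: a ModelVersion obtained earlier (placeholder for this sketch).
config = (
    DeploymentConfigBuilder()
    .sidekick_memory(model_version, "2Gi")  # 2Gi of memory for the sidekick container
    .build()
)
```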
Sets the environment variables to be set for the model's sidekick container. Only affects image-based models (e.g. MLFlow models) in a deployment.
Parameters
- ModelVersion model_version: The sidekick model to configure.
- Dict[str, str] environment: Dictionary of environment variable names and their corresponding values to be set in the sidekick container.
Returns
This DeploymentConfigBuilder instance for chaining.
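A sketch, with the environment variable and value shown purely as an illustration:

```python
from wallaroo.deployment_config import DeploymentConfigBuilder

# model_version: a ModelVersion obtained earlier (placeholder for this sketch).
config = (
    DeploymentConfigBuilder()
    .sidekick_env(model_version, {"GUNICORN_CMD_ARGS": "--timeout=180"})  # illustrative value
    .build()
)
```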
Sets the machine architecture for the model's sidekick container. Only affects image-based models (e.g. MLFlow models) in a deployment.
Parameters
- ModelVersion model_version: The sidekick model to configure.
- Optional[Architecture] arch: Machine architecture for this sidekick.
Returns
This DeploymentConfigBuilder instance for chaining.
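A sketch assuming the Architecture enum is importable from wallaroo.engine_config (import path assumed):

```python
from wallaroo.deployment_config import DeploymentConfigBuilder
from wallaroo.engine_config import Architecture  # import path assumed

# model_version: a ModelVersion obtained earlier (placeholder for this sketch).
config = (
    DeploymentConfigBuilder()
    .sidekick_arch(model_version, Architecture.ARM)  # run the sidekick on ARM nodes
    .build()
)
```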
Sets the acceleration option for the model's sidekick container. Only affects image-based models (e.g. MLFlow models) in a deployment.
Parameters
- ModelVersion model_version: The sidekick model to configure.
- Optional[Acceleration] accel: Acceleration option for this sidekick.
Returns
This DeploymentConfigBuilder instance for chaining.
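A sketch assuming the Acceleration enum is importable from wallaroo.engine_config (import path and enum value assumed):

```python
from wallaroo.deployment_config import DeploymentConfigBuilder
from wallaroo.engine_config import Acceleration  # import path assumed

# model_version: a ModelVersion obtained earlier (placeholder for this sketch).
config = (
    DeploymentConfigBuilder()
    .sidekick_accel(model_version, Acceleration.CUDA)  # enum value assumed
    .build()
)
```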
Configure the scale_up_queue_depth threshold as an autoscaling trigger.
This method sets a queue depth threshold above which all pipeline components (including the engine and LLM sidekicks) will incrementally scale up.
The scale_up_queue_depth is calculated as: (number of requests in queue + requests being processed) / number of available replicas over a scaling window.
Notes
- This parameter must be configured to activate queue-based autoscaling.
- No default value is provided.
- When configured, scale_up_queue_depth overrides the default autoscaling trigger (cpu_utilization).
- The setting applies to all components of the pipeline.
- When set, scale_down_queue_depth is automatically set to 1 if not already configured.
Parameters
- queue_depth (int): The threshold value for queue-based autoscaling.
Returns
DeploymentConfigBuilder: The current instance for method chaining.
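A minimal sketch of enabling queue-based autoscaling (replica bounds shown for context; their parameter names are assumed):

```python
from wallaroo.deployment_config import DeploymentConfigBuilder

# Scale up when (queued + in-flight requests) / available replicas exceeds 5
# over the scaling window; this replaces the default cpu_utilization trigger.
config = (
    DeploymentConfigBuilder()
    .replica_autoscale_min_max(minimum=1, maximum=5)
    .scale_up_queue_depth(5)
    .build()
)
```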
Configure the scale_down_queue_depth threshold as an autoscaling trigger.
This method sets a queue depth threshold below which all pipeline components (including the engine and LLM sidekicks) will incrementally scale down.
The scale_down_queue_depth is calculated as: (number of requests in queue + requests being processed) / number of available replicas over a scaling window.
Notes
- This parameter is optional and defaults to 1 if not set.
- scale_down_queue_depth is only applicable when scale_up_queue_depth is configured.
- The setting applies to all components of the pipeline.
- This threshold helps prevent unnecessary scaling down when the workload is still significant but below the scale-up threshold.
Parameters
- queue_depth (int): The threshold value for queue-based downscaling.
Returns
DeploymentConfigBuilder: The current instance for method chaining.
Raises
- ValueError: If scale_up_queue_depth is not configured.
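A sketch pairing the two thresholds, since scale_down_queue_depth requires scale_up_queue_depth to be set (values are illustrative):

```python
from wallaroo.deployment_config import DeploymentConfigBuilder

config = (
    DeploymentConfigBuilder()
    .replica_autoscale_min_max(minimum=1, maximum=5)
    .scale_up_queue_depth(5)
    .scale_down_queue_depth(2)  # scale down once per-replica queue depth drops below 2
    .build()
)
```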
Configure the autoscaling window for incrementally scaling up/down pipeline components.
This method sets the time window over which the autoscaling metrics are evaluated for making scaling decisions. It applies to all components of the pipeline, including the engine and LLM sidekicks.
Notes
- The default value is 300 seconds if not specified.
- This setting is only applicable when scale_up_queue_depth is configured.
- The autoscaling window helps smooth out short-term fluctuations in workload and prevents rapid scaling events.
Parameters
- window_seconds (Optional[int]): The duration of the autoscaling window in seconds. If None, the default value of 300 seconds is used.
Returns
DeploymentConfigBuilder: The current instance for method chaining.
Raises
- ValueError: If scale_up_queue_depth is not configured.
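A sketch widening the evaluation window for queue-based autoscaling (values are illustrative):

```python
from wallaroo.deployment_config import DeploymentConfigBuilder

config = (
    DeploymentConfigBuilder()
    .replica_autoscale_min_max(minimum=1, maximum=5)
    .scale_up_queue_depth(5)
    .autoscaling_window(600)  # evaluate scaling metrics over 600 seconds instead of the 300-second default
    .build()
)
```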