Model Naming Requirements
Model names map onto Kubernetes objects, and must be DNS compliant. The strings for model names must lower case ASCII alpha-numeric characters or dash (-) only. .
and _
are not allowed.
Wallaroo supports Hugging Face models by containerizing the model and running as an image.
Parameter | Description |
---|---|
Web Site | https://huggingface.co/models |
Supported Libraries |
|
Frameworks | The following Hugging Face pipelines are supported by Wallaroo.
|
Runtime | Containerized flight |
During the model upload process, the Wallaroo instance will attempt to convert the model to a Native Wallaroo Runtime. If unsuccessful based , it will create a Wallaroo Containerized Runtime for the model. See the model deployment section for details on how to configure pipeline resources based on the model’s runtime.
Hugging Face Schemas
Input and output schemas for each Hugging Face pipeline are defined below. Note that adding additional inputs not specified below will raise errors, except for the following:
Framework.HUGGING_FACE_IMAGE_TO_TEXT
Framework.HUGGING_FACE_TEXT_CLASSIFICATION
Framework.HUGGING_FACE_SUMMARIZATION
Framework.HUGGING_FACE_TRANSLATION
Additional inputs added to these Hugging Face pipelines will be added as key/pair value arguments to the model’s generate method. If the argument is not required, then the model will default to the values coded in the original Hugging Face model’s source code.
See the Hugging Face Pipeline documentation for more details on each pipeline and framework.
Wallaroo Framework | Reference |
---|---|
Framework.HUGGING_FACE_FEATURE_EXTRACTION |
Schemas:
input_schema = pa.schema([
pa.field('inputs', pa.string())
])
output_schema = pa.schema([
pa.field('output', pa.list_(
pa.list_(
pa.float64(),
list_size=128
),
))
])
Wallaroo Framework | Reference |
---|---|
Framework.HUGGING_FACE_IMAGE_CLASSIFICATION |
Schemas:
input_schema = pa.schema([
pa.field('inputs', pa.list_(
pa.list_(
pa.list_(
pa.int64(),
list_size=3
),
list_size=100
),
list_size=100
)),
pa.field('top_k', pa.int64()),
])
output_schema = pa.schema([
pa.field('score', pa.list_(pa.float64(), list_size=2)),
pa.field('label', pa.list_(pa.string(), list_size=2)),
])
Wallaroo Framework | Reference |
---|---|
Framework.HUGGING_FACE_IMAGE_SEGMENTATION |
Schemas:
input_schema = pa.schema([
pa.field('inputs',
pa.list_(
pa.list_(
pa.list_(
pa.int64(),
list_size=3
),
list_size=100
),
list_size=100
)),
pa.field('threshold', pa.float64()),
pa.field('mask_threshold', pa.float64()),
pa.field('overlap_mask_area_threshold', pa.float64()),
])
output_schema = pa.schema([
pa.field('score', pa.list_(pa.float64())),
pa.field('label', pa.list_(pa.string())),
pa.field('mask',
pa.list_(
pa.list_(
pa.list_(
pa.int64(),
list_size=100
),
list_size=100
),
)),
])
Wallaroo Framework | Reference |
---|---|
Framework.HUGGING_FACE_IMAGE_TO_TEXT |
Any parameter that is not part of the required inputs
list will be forwarded to the model as a key/pair value to the underlying models generate
method. If the additional input is not supported by the model, an error will be returned.
Schemas:
input_schema = pa.schema([
pa.field('inputs', pa.list_( #required
pa.list_(
pa.list_(
pa.int64(),
list_size=3
),
list_size=100
),
list_size=100
)),
# pa.field('max_new_tokens', pa.int64()), # optional
])
output_schema = pa.schema([
pa.field('generated_text', pa.list_(pa.string())),
])
Wallaroo Framework | Reference |
---|---|
Framework.HUGGING_FACE_OBJECT_DETECTION |
Schemas:
input_schema = pa.schema([
pa.field('inputs',
pa.list_(
pa.list_(
pa.list_(
pa.int64(),
list_size=3
),
list_size=100
),
list_size=100
)),
pa.field('threshold', pa.float64()),
])
output_schema = pa.schema([
pa.field('score', pa.list_(pa.float64())),
pa.field('label', pa.list_(pa.string())),
pa.field('box',
pa.list_( # dynamic output, i.e. dynamic number of boxes per input image, each sublist contains the 4 box coordinates
pa.list_(
pa.int64(),
list_size=4
),
),
),
])
Wallaroo Framework | Reference |
---|---|
Framework.HUGGING_FACE_QUESTION_ANSWERING |
Schemas:
input_schema = pa.schema([
pa.field('question', pa.string()),
pa.field('context', pa.string()),
pa.field('top_k', pa.int64()),
pa.field('doc_stride', pa.int64()),
pa.field('max_answer_len', pa.int64()),
pa.field('max_seq_len', pa.int64()),
pa.field('max_question_len', pa.int64()),
pa.field('handle_impossible_answer', pa.bool_()),
pa.field('align_to_words', pa.bool_()),
])
output_schema = pa.schema([
pa.field('score', pa.float64()),
pa.field('start', pa.int64()),
pa.field('end', pa.int64()),
pa.field('answer', pa.string()),
])
Wallaroo Framework | Reference |
---|---|
Framework.HUGGING_FACE_STABLE_DIFFUSION_TEXT_2_IMG |
Schemas:
input_schema = pa.schema([
pa.field('prompt', pa.string()),
pa.field('height', pa.int64()),
pa.field('width', pa.int64()),
pa.field('num_inference_steps', pa.int64()), # optional
pa.field('guidance_scale', pa.float64()), # optional
pa.field('negative_prompt', pa.string()), # optional
pa.field('num_images_per_prompt', pa.string()), # optional
pa.field('eta', pa.float64()) # optional
])
output_schema = pa.schema([
pa.field('images', pa.list_(
pa.list_(
pa.list_(
pa.int64(),
list_size=3
),
list_size=128
),
list_size=128
)),
])
Wallaroo Framework | Reference |
---|---|
Framework.HUGGING_FACE_SUMMARIZATION |
Any parameter that is not part of the required inputs
list will be forwarded to the model as a key/pair value to the underlying models generate
method. If the additional input is not supported by the model, an error will be returned.
Schemas:
input_schema = pa.schema([
pa.field('inputs', pa.string()),
pa.field('return_text', pa.bool_()),
pa.field('return_tensors', pa.bool_()),
pa.field('clean_up_tokenization_spaces', pa.bool_()),
# pa.field('extra_field', pa.int64()), # every extra field you specify will be forwarded as a key/value pair
])
output_schema = pa.schema([
pa.field('summary_text', pa.string()),
])
Wallaroo Framework | Reference |
---|---|
Framework.HUGGING_FACE_TEXT_CLASSIFICATION |
Schemas
input_schema = pa.schema([
pa.field('inputs', pa.string()), # required
pa.field('top_k', pa.int64()), # optional
pa.field('function_to_apply', pa.string()), # optional
])
output_schema = pa.schema([
pa.field('label', pa.list_(pa.string(), list_size=2)), # list with a number of items same as top_k, list_size can be skipped but may lead in worse performance
pa.field('score', pa.list_(pa.float64(), list_size=2)), # list with a number of items same as top_k, list_size can be skipped but may lead in worse performance
])
Wallaroo Framework | Reference |
---|---|
Framework.HUGGING_FACE_TRANSLATION |
Any parameter that is not part of the required inputs
list will be forwarded to the model as a key/pair value to the underlying models generate
method. If the additional input is not supported by the model, an error will be returned.
Schemas:
input_schema = pa.schema([
pa.field('inputs', pa.string()), # required
pa.field('return_tensors', pa.bool_()), # optional
pa.field('return_text', pa.bool_()), # optional
pa.field('clean_up_tokenization_spaces', pa.bool_()), # optional
pa.field('src_lang', pa.string()), # optional
pa.field('tgt_lang', pa.string()), # optional
# pa.field('extra_field', pa.int64()), # every extra field you specify will be forwarded as a key/value pair
])
output_schema = pa.schema([
pa.field('translation_text', pa.string()),
])
Wallaroo Framework | Reference |
---|---|
Framework.HUGGING_FACE_ZERO_SHOT_CLASSIFICATION |
Schemas:
input_schema = pa.schema([
pa.field('inputs', pa.string()), # required
pa.field('candidate_labels', pa.list_(pa.string(), list_size=2)), # required
pa.field('hypothesis_template', pa.string()), # optional
pa.field('multi_label', pa.bool_()), # optional
])
output_schema = pa.schema([
pa.field('sequence', pa.string()),
pa.field('scores', pa.list_(pa.float64(), list_size=2)), # same as number of candidate labels, list_size can be skipped by may result in slightly worse performance
pa.field('labels', pa.list_(pa.string(), list_size=2)), # same as number of candidate labels, list_size can be skipped by may result in slightly worse performance
])
Wallaroo Framework | Reference |
---|---|
Framework.HUGGING_FACE_ZERO_SHOT_IMAGE_CLASSIFICATION |
Schemas:
input_schema = pa.schema([
pa.field('inputs', # required
pa.list_(
pa.list_(
pa.list_(
pa.int64(),
list_size=3
),
list_size=100
),
list_size=100
)),
pa.field('candidate_labels', pa.list_(pa.string(), list_size=2)), # required
pa.field('hypothesis_template', pa.string()), # optional
])
output_schema = pa.schema([
pa.field('score', pa.list_(pa.float64(), list_size=2)), # same as number of candidate labels
pa.field('label', pa.list_(pa.string(), list_size=2)), # same as number of candidate labels
])
Wallaroo Framework | Reference |
---|---|
Framework.HUGGING_FACE_ZERO_SHOT_OBJECT_DETECTION |
Schemas:
input_schema = pa.schema([
pa.field('images',
pa.list_(
pa.list_(
pa.list_(
pa.int64(),
list_size=3
),
list_size=640
),
list_size=480
)),
pa.field('candidate_labels', pa.list_(pa.string(), list_size=3)),
pa.field('threshold', pa.float64()),
# pa.field('top_k', pa.int64()), # we want the model to return exactly the number of predictions, we shouldn't specify this
])
output_schema = pa.schema([
pa.field('score', pa.list_(pa.float64())), # variable output, depending on detected objects
pa.field('label', pa.list_(pa.string())), # variable output, depending on detected objects
pa.field('box',
pa.list_( # dynamic output, i.e. dynamic number of boxes per input image, each sublist contains the 4 box coordinates
pa.list_(
pa.int64(),
list_size=4
),
),
),
])
Wallaroo Framework | Reference |
---|---|
Framework.HUGGING_FACE_SENTIMENT_ANALYSIS | Hugging Face Sentiment Analysis |
Wallaroo Framework | Reference |
---|---|
Framework.HUGGING_FACE_TEXT_GENERATION |
Any parameter that is not part of the required inputs
list will be forwarded to the model as a key/pair value to the underlying models generate
method. If the additional input is not supported by the model, an error will be returned.
input_schema = pa.schema([
pa.field('inputs', pa.string()),
pa.field('return_tensors', pa.bool_()), # optional
pa.field('return_text', pa.bool_()), # optional
pa.field('return_full_text', pa.bool_()), # optional
pa.field('clean_up_tokenization_spaces', pa.bool_()), # optional
pa.field('prefix', pa.string()), # optional
pa.field('handle_long_generation', pa.string()), # optional
# pa.field('extra_field', pa.int64()), # every extra field you specify will be forwarded as a key/value pair
])
output_schema = pa.schema([
pa.field('generated_text', pa.list_(pa.string(), list_size=1))
])
Wallaroo Framework | Reference |
---|---|
Framework.HUGGING_FACE_AUTOMATIC_SPEECH_RECOGNITION |
Sample input and output schema.
input_schema = pa.schema([
pa.field('inputs', pa.list_(pa.float32())), # required: the audio stored in numpy arrays of shape (num_samples,) and data type `float32`
pa.field('return_timestamps', pa.string()) # optional: return start & end times for each predicted chunk
])
output_schema = pa.schema([
pa.field('text', pa.string()), # required: the output text corresponding to the audio input
pa.field('chunks', pa.list_(pa.struct([('text', pa.string()), ('timestamp', pa.list_(pa.float32()))]))), # required (if `return_timestamps` is set), start & end times for each predicted chunk
])
Uploading Hugging Face Models
Hugging Face models are uploaded to Wallaroo through the Wallaroo Client upload_model
method.
Upload Hugging Face Model Parameters
The following parameters are required for Hugging Face models. Note that while some fields are considered as optional for the upload_model
method, they are required for proper uploading of a Hugging Face model to Wallaroo.
Parameter | Type | Description |
---|---|---|
name | string (Required) | The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model. |
path | string (Required) | The path to the model file being uploaded. |
framework | string (Required) | Set as the framework - see the list above for all supported Hugging Face frameworks. |
input_schema | pyarrow.lib.Schema (Required) | The input schema in Apache Arrow schema format. |
output_schema | pyarrow.lib.Schema (Required) | The output schema in Apache Arrow schema format. |
convert_wait | bool (Optional) (Default: True) |
|
arch | wallaroo.engine_config.Architecture | The architecture the model is deployed to. If a model is intended for deployment to an ARM architecture, it must be specified during this step. Values include: X86 (Default): x86 based architectures. ARM : ARM based architectures. |
Once the upload process starts, the model is containerized by the Wallaroo instance. This process may take up to 10 minutes.
Model Config Options
Model version configurations are updated with the wallaroo.model_version.config
and include the following parameters. Most are optional unless specified.
Parameter | Type | Description |
---|---|---|
runtime | String (Optional) | The model runtime from wallaroo.framework. } |
tensor_fields | (List[string]) (Optional) | A list of alternate input fields. For example, if the model accepts the input fields ['variable1', 'variable2'] , tensor_fields allows those inputs to be overridden to ['square_feet', 'house_age'] , or other values as required. |
input_schema | pyarrow.lib.Schema | The input schema for the model in pyarrow.lib.Schema format. |
output_schema | pyarrow.lib.Schema | The output schema for the model in pyarrow.lib.Schema format. |
batch_config | (List[string]) (Optional) | Batch config is either None for multiple-input inferences, or single to accept an inference request with only one row of data. |
Upload Hugging Face Model Return
The following is returned with a successful model upload and conversion.
Field | Type | Description |
---|---|---|
name | string | The name of the model. |
version | string | The model version as a unique UUID. |
file_name | string | The file name of the model as stored in Wallaroo. |
image_path | string | The image used to deploy the model in the Wallaroo engine. |
last_update_time | DateTime | When the model was last updated. |
Upload Hugging Face Model Example
The following example is of uploading a Hugging Face Zero Shot Classification ML Model to a Wallaroo instance.
input_schema = pa.schema([
pa.field('inputs', pa.string()), # required
pa.field('candidate_labels', pa.list_(pa.string(), list_size=2)), # required
pa.field('hypothesis_template', pa.string()), # optional
pa.field('multi_label', pa.bool_()), # optional
])
output_schema = pa.schema([
pa.field('sequence', pa.string()),
pa.field('scores', pa.list_(pa.float64(), list_size=2)), # same as number of candidate labels, list_size can be skipped by may result in slightly worse performance
pa.field('labels', pa.list_(pa.string(), list_size=2)), # same as number of candidate labels, list_size can be skipped by may result in slightly worse performance
])
model = wl.upload_model("hf-zero-shot-classification",
"./models/model-auto-conversion_hugging-face_dummy-pipelines_zero-shot-classification-pipeline.zip",
framework=Framework.HUGGING_FACE_ZERO_SHOT_CLASSIFICATION,
input_schema=input_schema,
output_schema=output_schema,
convert_wait=True)
Waiting for model loading - this will take up to 10.0min.
Model is pending loading to a container runtime..
Model is attempting loading to a container runtime................................................successful
Ready
Pipeline Deployment Configurations
Pipeline deployments allocate resources from the cluster to the pipeline and its models with the wallaroo.pipeline.deploy(deployment_config: Optional[wallaroo.deployment_config.DeploymentConfig])
method. The wallaroo.deployment_config.DeploymentConfig.DeploymentConfigBuilder
class creates DeploymentConfig
settings such as the number of CPUs, the amount of RAM, the architecture, etc. For full details, see the Pipeline deployment configurations guides.
The settings for a pipeline configuration are dependent on whether the model is converted to the Native Runtime space, or Containerized Model Runtime space during the model upload process. The method wallaroo.model_config.runtime()
displays which runtime the uploaded model was converted to.
Runtime | Type | Pipeline Deployment Details |
---|---|---|
onnx | Wallaroo Native | See Native Runtime Configuration Methods |
flight | Wallaroo Container | See Containerized Runtime Configuration Methods |
Wallaroo Native Runtime Deployment
Wallaroo Native Runtime models typically use the following settings for pipeline resource allocation. See See Native Runtime Configuration Methods for complete options.
Resource | Method | Description | |
---|---|---|---|
Replicas | wallaroo.deployment_config.DeploymentConfigBuilder.replica_count(count: int) | The number of replicas of the Wallaroo Native pipeline resources to allocate. Each replica has the same number of cpus, ram, etc. For example: DeploymentConfigBuilder.replica_count(2) | |
Auto-allocated replicas | wallaroo.deployment_config.DeploymentConfigBuilder.replica_autoscale_min_max(maximum: int, minimum: int = 0) | Replicas that will auto-allocate more replicas to the pipeline from 0 to the set maximum as more inference requests are made. | |
CPU | wallaroo.deployment_config.DeploymentConfigBuilder.cpus(core_count: float) | Fractional number of cpus to allocate. For example: DeploymentConfigBuilder.cpus(0.5) | |
Memory | wallaroo.deployment_config.DeploymentConfigBuilder.memory(memory_spec: string) | Memory resources in Kubernetes Memory resource units | |
GPUs | wallaroo.deployment_config.DeploymentConfigBuilder.gpus(core_count: int) | Number of GPU’s to deploy; GPUs can only be deployed in whole increments. If used, must be paired with the deployment_label pipeline configuration option. | |
Deployment Label | wallaroo.deployment_config.DeploymentConfigBuilder.deployment_label(label:string) | Required if gpus are set and must match the GPU nodepool label. |
The following example shows deploying a Native Wallaroo Runtime model with the pipeline configuration of one replica, half a cpu and 1 Gi of RAM.
Note that for native runtime models, total pipeline resources are shared by all the native runtime models for each replica.
model.config().runtime()
'onnx'
# add the model as a pipeline step
pipeline.add_model_step(model)
# DeploymentConfigBuilder is used to create the pipeline's deployment configuration object
from wallaroo.deployment_config import DeploymentConfigBuilder
# deploy using native runtime deployment
deployment_config_native = DeploymentConfigBuilder() \
.replica_count(1) \
.cpus(0.5) \
.memory('1Gi') \
.build()
# deploy the pipeline with the pipeline configuration
pipeline.deploy(deployment_config=deployment_config_native)
Wallaroo Containerized Runtime Deployment
Wallaroo Containerized Runtime models typically use the following settings for pipeline resource allocation. See See Containerized Runtime Configuration Methods for complete options.
Containerized Runtime models resources are allocated with the sidekick
name, with the containerized model specified for resources.
Resource | Method | Description | |
---|---|---|---|
Replicas | wallaroo.deployment_config.DeploymentConfigBuilder.replica_count(count: int) | The number of replicas of the Wallaroo Native pipeline resources to allocate. Each replica has the same number of cpus, ram, etc. | |
Auto-allocated replicas | wallaroo.deployment_config.DeploymentConfigBuilder.replica_autoscale_min_max(maximum: int, minimum: int = 0) | Replicas that will auto-allocate more replicas to the pipeline from 0 to the set maximum as more inference requests are made. | |
CPU | wallaroo.deployment_config.DeploymentConfigBuilder.sidekick_cpus(model: wallaroo.model.Model, core_count: float) | Fractional number of cpus to allocate for the containerized model. | |
Memory | wallaroo.deployment_config.DeploymentConfigBuilder.sidekick_memory(model: wallaroo.model.Model, memory_spec: string) | Memory resources in Kubernetes Memory resource units | |
GPUs | wallaroo.deployment_config.DeploymentConfigBuilder.sidekick_gpus(model: wallaroo.model.Model, core_count: int) | Number of GPU’s to deploy; GPUs can only be deployed in whole increments. If used, must be paired with the deployment_label pipeline configuration option. | |
Deployment Label | wallaroo.deployment_config.DeploymentConfigBuilder.deployment_label(label:string) | Required if gpus are set and must match the GPU nodepool label. |
The following example shows deploying a Containerized Wallaroo Runtime model with the pipeline configuration of one replica, half a cpu and 1 Gi of RAM.
Note that for containerized models, each containerized model’s resources are set independently of each other and duplicated for each pipeline replica, and are considered separate from the native runtime models.
model_native.config().runtime()
'onnx'
model_containerized.config().runtime()
'flight'
# add the models as a pipeline steps
pipeline.add_model_step(model_native)
pipeline.add_model_step(model_containerized)
# DeploymentConfigBuilder is used to create the pipeline's deployment configuration object
from wallaroo.deployment_config import DeploymentConfigBuilder
# deploy using containerized runtime deployment
deployment_config_containerized = DeploymentConfigBuilder() \
.replica_count(1) \
.cpus(0.5) \ # shared by the native runtime models
.memory('1Gi') \ # shared by the native runtime models
.sidekick_cpus(model_containerized, 0.5) \ # 0.5 cpu allocated solely for the containerized model
.sidekick_memory(model_containerized, '1Gi') \ #1 Gi allocated solely for the containerized model
.build()
# deploy the pipeline with the pipeline configuration
pipeline.deploy(deployment_config=deployment_config_containerized)
Pipeline Deployment Timeouts
Pipeline deployments typically take 45 seconds for Wallaroo Native Runtimes, and 90 seconds for Wallaroo Containerized Runtimes.
If Wallaroo Pipeline deployment times out from a very large or complex ML model being deployed, the timeout is extended from with the wallaroo.Client.Client(request_timeout:int)
setting, where request_timeout
is in integer seconds. Wallaroo Native Runtime deployments are scaled at 1x the request_timeout
setting. Wallaroo Containerized Runtimes are scaled at 2x the request_timeout
setting.
The following example shows extending the request_timeout
to 2 minutes.
wl = wallaroo.Client(request_timeout=120)
wl.timeout
120