During the model upload process, the Wallaroo instance will attempt to convert the model to a Native Wallaroo Runtime. If the conversion is unsuccessful, it will create a Wallaroo Containerized Runtime for the model. See the model deployment section for details on how to configure pipeline resources based on the model’s runtime.
Hugging Face Schemas
Input and output schemas for each Hugging Face pipeline are defined below. Note that adding additional inputs not specified below will raise errors, except for the following frameworks:
Framework.HUGGING_FACE_IMAGE_TO_TEXT
Framework.HUGGING_FACE_TEXT_CLASSIFICATION
Framework.HUGGING_FACE_SUMMARIZATION
Framework.HUGGING_FACE_TRANSLATION
Additional inputs added to these Hugging Face pipelines will be passed as key/value pair arguments to the model’s generate method. If an argument is not provided, the model will default to the values coded in the original Hugging Face model’s source code.
Any parameter that is not part of the required inputs list will be forwarded to the model as a key/value pair to the underlying model’s generate method. If the additional input is not supported by the model, an error will be returned.
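This forwarding behavior can be illustrated with a toy sketch (this is not Wallaroo or Hugging Face code; generate and its parameters here are hypothetical stand-ins for the underlying model's generate method):

```python
# Hypothetical stand-in for a Hugging Face model's generate method
def generate(inputs, num_beams=1, max_new_tokens=20):
    return f"generated from {inputs!r} with num_beams={num_beams}"

def run_pipeline(record):
    # Everything beyond the required 'inputs' field is forwarded
    # as key/value pair keyword arguments to generate
    required = {"inputs"}
    extra = {k: v for k, v in record.items() if k not in required}
    return generate(record["inputs"], **extra)

# Extra fields become keyword arguments; omitted ones fall back to defaults
print(run_pipeline({"inputs": "hello", "num_beams": 4}))
# -> generated from 'hello' with num_beams=4
```

Passing a parameter the model's generate method does not support raises a TypeError here, mirroring the error the model returns for unsupported inputs.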
Schemas:
input_schema = pa.schema([
    pa.field('inputs', pa.string()),
    pa.field('return_text', pa.bool_()),
    pa.field('return_tensors', pa.bool_()),
    pa.field('clean_up_tokenization_spaces', pa.bool_()),
    # pa.field('extra_field', pa.int64()), # every extra field you specify will be forwarded as a key/value pair
])

output_schema = pa.schema([
    pa.field('summary_text', pa.string()),
])
input_schema = pa.schema([
    pa.field('inputs', pa.string()), # required
    pa.field('top_k', pa.int64()), # optional
    pa.field('function_to_apply', pa.string()), # optional
])

output_schema = pa.schema([
    pa.field('label', pa.list_(pa.string(), list_size=2)), # list with the same number of items as top_k; list_size can be skipped but may lead to worse performance
    pa.field('score', pa.list_(pa.float64(), list_size=2)), # list with the same number of items as top_k; list_size can be skipped but may lead to worse performance
])
Any parameter that is not part of the required inputs list will be forwarded to the model as a key/value pair to the underlying model’s generate method. If the additional input is not supported by the model, an error will be returned.
Schemas:
input_schema = pa.schema([
    pa.field('inputs', pa.string()), # required
    pa.field('return_tensors', pa.bool_()), # optional
    pa.field('return_text', pa.bool_()), # optional
    pa.field('clean_up_tokenization_spaces', pa.bool_()), # optional
    pa.field('src_lang', pa.string()), # optional
    pa.field('tgt_lang', pa.string()), # optional
    # pa.field('extra_field', pa.int64()), # every extra field you specify will be forwarded as a key/value pair
])

output_schema = pa.schema([
    pa.field('translation_text', pa.string()),
])
input_schema = pa.schema([
    pa.field('inputs', pa.string()), # required
    pa.field('candidate_labels', pa.list_(pa.string(), list_size=2)), # required
    pa.field('hypothesis_template', pa.string()), # optional
    pa.field('multi_label', pa.bool_()), # optional
])

output_schema = pa.schema([
    pa.field('sequence', pa.string()),
    pa.field('scores', pa.list_(pa.float64(), list_size=2)), # same as number of candidate labels; list_size can be skipped but may result in slightly worse performance
    pa.field('labels', pa.list_(pa.string(), list_size=2)), # same as number of candidate labels; list_size can be skipped but may result in slightly worse performance
])
input_schema = pa.schema([
    pa.field('images',
        pa.list_(
            pa.list_(
                pa.list_(
                    pa.int64(),
                    list_size=3
                ),
                list_size=640
            ),
            list_size=480
        )),
    pa.field('candidate_labels', pa.list_(pa.string(), list_size=3)),
    pa.field('threshold', pa.float64()),
    # pa.field('top_k', pa.int64()), # we want the model to return exactly this number of predictions, so we shouldn't specify this
])

output_schema = pa.schema([
    pa.field('score', pa.list_(pa.float64())), # variable output, depending on detected objects
    pa.field('label', pa.list_(pa.string())), # variable output, depending on detected objects
    pa.field('box',
        pa.list_( # dynamic output, i.e. dynamic number of boxes per input image; each sublist contains the 4 box coordinates
            pa.list_(
                pa.int64(),
                list_size=4
            ),
        ),
    ),
])
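The nesting order above means each image is a 480 x 640 x 3 (height, width, channels) structure of int64 pixel values. A minimal stdlib-only sketch of a conforming input (with numpy you would typically produce this via image.astype('int64').tolist()):

```python
height, width, channels = 480, 640, 3

# Build one all-black image matching the schema's nesting:
# outer list_size=480 (rows), then 640 (columns), then 3 (RGB channels)
image = [[[0] * channels for _ in range(width)] for _ in range(height)]

assert len(image) == 480        # rows
assert len(image[0]) == 640     # columns per row
assert len(image[0][0]) == 3    # channel values per pixel
```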
Any parameter that is not part of the required inputs list will be forwarded to the model as a key/value pair to the underlying model’s generate method. If the additional input is not supported by the model, an error will be returned.
input_schema = pa.schema([
    pa.field('inputs', pa.string()),
    pa.field('return_tensors', pa.bool_()), # optional
    pa.field('return_text', pa.bool_()), # optional
    pa.field('return_full_text', pa.bool_()), # optional
    pa.field('clean_up_tokenization_spaces', pa.bool_()), # optional
    pa.field('prefix', pa.string()), # optional
    pa.field('handle_long_generation', pa.string()), # optional
    # pa.field('extra_field', pa.int64()), # every extra field you specify will be forwarded as a key/value pair
])

output_schema = pa.schema([
    pa.field('generated_text', pa.list_(pa.string(), list_size=1))
])
input_schema = pa.schema([
    pa.field('inputs', pa.list_(pa.float32())), # required: the audio stored in numpy arrays of shape (num_samples,) and data type `float32`
    pa.field('return_timestamps', pa.string()) # optional: return start & end times for each predicted chunk
])

output_schema = pa.schema([
    pa.field('text', pa.string()), # required: the output text corresponding to the audio input
    pa.field('chunks', pa.list_(pa.struct([('text', pa.string()), ('timestamp', pa.list_(pa.float32()))]))), # required (if `return_timestamps` is set): start & end times for each predicted chunk
])
1 - Wallaroo API Upload Tutorial: Hugging Face Zero Shot Classification
How to upload a Hugging Face Zero Shot Classification model to Wallaroo via the MLOps API.
To perform the various Wallaroo MLOps API requests, we will use the Wallaroo SDK to generate the necessary tokens. For details on other methods of requesting and using authentication tokens with the Wallaroo MLOps API, see the Wallaroo API Connection Guide.
This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.
If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). For more information on Wallaroo Client settings, see the Client Connection guide.
wl=wallaroo.Client()
Variables
The following variables are set for use throughout the rest of the tutorial:
Wallaroo Workspace
Wallaroo Pipeline
Wallaroo Model name and path
Wallaroo Model Framework
The DNS prefix and suffix for the Wallaroo instance.
To allow this tutorial to be run multiple times or by multiple users in the same Wallaroo instance, a random 4 character suffix will be appended to the workspace name.
Verify that the DNS prefix and suffix match the Wallaroo instance used for this tutorial. See the DNS Integration Guide for more details.
import string
import random

# make a random 4 character suffix to prevent overwriting other user's workspaces
suffix = ''.join(random.choice(string.ascii_lowercase) for i in range(4))

workspace_name = f'hugging-face-zero-shot-api{suffix}'
pipeline_name = 'hugging-face-zero-shot'
model_name = 'zero-shot-classification'
model_file_name = "./models/model-auto-conversion_hugging-face_dummy-pipelines_zero-shot-classification-pipeline.zip"
framework = "hugging-face-zero-shot-classification"

wallarooPrefix = "YOUR PREFIX."
wallarooSuffix = "YOUR SUFFIX"

APIURL = f"https://{wallarooPrefix}api.{wallarooSuffix}"
APIURL
'https://doc-test.api.wallarooexample.ai'
Create the Workspace
In a production environment, the Wallaroo workspace that contains the pipeline and models would be created and deployed. We will quickly recreate those steps using the MLOps API.
Workspaces are created through the MLOps API with the /v1/api/workspaces/create command. This requires the workspace name be provided, and that the workspace not already exist in the Wallaroo instance.
# Retrieve the token
headers = wl.auth.auth_header()

# set the Content-Type header
headers['Content-Type'] = 'application/json'

# Create workspace
apiRequest = f"{APIURL}/v1/api/workspaces/create"

data = {
    "workspace_name": workspace_name
}

response = requests.post(apiRequest, json=data, headers=headers, verify=True).json()
display(response)

# Stored for future examples
workspaceId = response['workspace_id']
{'workspace_id': 9}
Upload the Model
Endpoint:
/v1/api/models/upload_and_convert
Headers:
Content-Type: multipart/form-data
Parameters
name (String, Required): The model name.
visibility (String, Required): Either public or private.
workspace_id (Integer, Required): The numerical ID of the workspace to upload the model to.
conversion (String, Required): The conversion parameters, which include the following:
framework (String, Required): The framework of the model being uploaded. See the list of supported models for more details.
python_version (String, Required): The version of Python required for the model.
requirements (String, Required): Required libraries. Can be [] if the requirements are the default Wallaroo JupyterHub libraries.
input_schema (String, Optional): The input schema in the Apache Arrow pyarrow.lib.Schema format, encoded with base64.b64encode. Only required for non-native runtime models.
output_schema (String, Optional): The output schema in the Apache Arrow pyarrow.lib.Schema format, encoded with base64.b64encode. Only required for non-native runtime models.
Set the Schemas
The input and output schemas will be defined according to the Wallaroo Hugging Face schema requirements. The inputs are then base64 encoded for attachment in the API request.
input_schema = pa.schema([
    pa.field('inputs', pa.string()), # required
    pa.field('candidate_labels', pa.list_(pa.string(), list_size=2)), # required
    pa.field('hypothesis_template', pa.string()), # optional
    pa.field('multi_label', pa.bool_()), # optional
])

output_schema = pa.schema([
    pa.field('sequence', pa.string()),
    pa.field('scores', pa.list_(pa.float64(), list_size=2)), # same as number of candidate labels; list_size can be skipped but may result in slightly worse performance
    pa.field('labels', pa.list_(pa.string(), list_size=2)), # same as number of candidate labels; list_size can be skipped but may result in slightly worse performance
])
We will now build the request to include the required data. We will be using the workspaceId returned when we created our workspace in a previous step, specifying the input and output schemas, and the framework.
# Get the model details

# Retrieve the token
headers = wl.auth.auth_header()

# set the Content-Type header
headers['Content-Type'] = 'application/json'

apiRequest = f"{APIURL}/v1/api/models/list_versions"

data = {
    "model_id": model_name,
    "models_pk_id": modelId
}

status = None

while status != 'ready':
    response = requests.post(apiRequest, json=data, headers=headers, verify=True).json()
    # verify we have the right version
    model = next(model for model in response if model["id"] == modelId)
    display(model)
    status = model['status']
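The loop above polls as fast as possible and never gives up. A slightly safer polling pattern, sketched here with a hypothetical fetch_status helper standing in for the list_versions request, adds a sleep between requests and a timeout:

```python
import time

def wait_until_ready(fetch_status, timeout=600, interval=5):
    """Poll fetch_status() until it returns 'ready' or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status == 'ready':
            return status
        time.sleep(interval)
    raise TimeoutError("model did not reach 'ready' before the timeout")

# Example with a canned sequence of statuses standing in for API responses:
statuses = iter(['pendingloading', 'attemptingloading', 'ready'])
print(wait_until_ready(lambda: next(statuses), interval=0))
# -> ready
```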
The next step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and available through the Jupyter Hub interface provided with your Wallaroo environment.
This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.
If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). If logging in externally, update the wallarooPrefix and wallarooSuffix variables with the proper DNS information. For more information on Wallaroo DNS settings, see the Wallaroo DNS Integration Guide.
wl=wallaroo.Client()
Set Variables and Helper Functions
We’ll set the names of our workspace, pipeline, models, and files. Workspace names must be unique across the Wallaroo instance. For this, we’ll add a randomly generated 4 characters to the workspace name to prevent collisions with other users’ workspaces. If running this tutorial repeatedly, we recommend hard coding the workspace name so it uses the same workspace each time it’s run.
We’ll set up some helper functions that will either use existing workspaces and pipelines, or create them if they do not already exist.
def get_workspace(name):
    workspace = None
    for ws in wl.list_workspaces():
        if ws.name() == name:
            workspace = ws
    if workspace is None:
        workspace = wl.create_workspace(name)
    return workspace

def get_pipeline(name):
    try:
        pipeline = wl.pipelines_by_name(name)[0]
    except EntityNotFoundError:
        pipeline = wl.build_pipeline(name)
    return pipeline

import string
import random

# make a random 4 character suffix to prevent overwriting other user's workspaces
suffix = ''.join(random.choice(string.ascii_lowercase) for i in range(4))

# hard code an empty suffix to keep the same workspace between runs, as recommended above
suffix = ''

workspace_name = f'hf-zero-shot-classification{suffix}'
pipeline_name = 'hf-zero-shot-classification'
model_name = 'hf-zero-shot-classification'
model_file_name = './models/model-auto-conversion_hugging-face_dummy-pipelines_zero-shot-classification-pipeline.zip'
Create Workspace and Pipeline
We will now create the Wallaroo workspace to store our model and set it as the current workspace. Future commands will default to this workspace for pipeline creation, model uploads, etc. We’ll create our Wallaroo pipeline to deploy our model.
The following parameters are required for Hugging Face models. Note that while some fields are considered optional for the upload_model method, they are required for proper upload of a Hugging Face model to Wallaroo.
name (string, Required): The name of the model. Model names are unique per workspace. Models that are uploaded with the same name are assigned as a new version of the model.
path (string, Required): The path to the model file being uploaded.
framework (string, Upload Method Optional, Hugging Face model Required): Set as the model framework, e.g. Framework.HUGGING_FACE_ZERO_SHOT_CLASSIFICATION.
input_schema (pyarrow.lib.Schema, Upload Method Optional, Hugging Face model Required): The input schema in Apache Arrow schema format.
output_schema (pyarrow.lib.Schema, Upload Method Optional, Hugging Face model Required): The output schema in Apache Arrow schema format.
convert_wait (bool, Upload Method Optional, Hugging Face model Optional, Default: True): True: Waits in the script for the model conversion to complete. False: Proceeds with the script without waiting for the model conversion process to complete.
The input and output schemas will be configured for the data inputs and outputs. More information on the available inputs is found in the official 🤗 Hugging Face source code.
input_schema = pa.schema([
    pa.field('inputs', pa.string()), # required
    pa.field('candidate_labels', pa.list_(pa.string(), list_size=2)), # required
    pa.field('hypothesis_template', pa.string()), # optional
    pa.field('multi_label', pa.bool_()), # optional
])

output_schema = pa.schema([
    pa.field('sequence', pa.string()),
    pa.field('scores', pa.list_(pa.float64(), list_size=2)), # same as number of candidate labels; list_size can be skipped but may result in slightly worse performance
    pa.field('labels', pa.list_(pa.string(), list_size=2)), # same as number of candidate labels; list_size can be skipped but may result in slightly worse performance
])
Upload Model
The model will be uploaded with the framework set as Framework.HUGGING_FACE_ZERO_SHOT_CLASSIFICATION.
framework = Framework.HUGGING_FACE_ZERO_SHOT_CLASSIFICATION

model = wl.upload_model(model_name,
                        model_file_name,
                        framework=framework,
                        input_schema=input_schema,
                        output_schema=output_schema,
                        convert_wait=True)
model
Waiting for model loading - this will take up to 10.0min.
Model is pending loading to a container runtime..
Model is attempting loading to a container runtime................................................successful
The model is uploaded and ready for use. We’ll add it as a step in our pipeline, then deploy the pipeline. For this example, we allocate 0.25 CPU and 4 Gi of RAM to the pipeline through the pipeline’s deployment configuration.
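The deployment_config referenced in the deploy call below is not defined elsewhere in this tutorial; a minimal sketch using the SDK's DeploymentConfigBuilder, assuming the 0.25 CPU / 4 Gi allocation described above, might be:

```python
from wallaroo.deployment_config import DeploymentConfigBuilder

# Sketch: allocate 0.25 CPU and 4 Gi of RAM to the pipeline,
# matching the allocation described above
deployment_config = DeploymentConfigBuilder() \
    .cpus(0.25) \
    .memory('4Gi') \
    .build()
```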
pipeline=get_pipeline(pipeline_name)
# clear the pipeline if used previously
pipeline.undeploy()
pipeline.clear()
pipeline.add_model_step(model)
pipeline.deploy(deployment_config=deployment_config)
pipeline.status()
Run Inference
A sample inference will be run. First the pandas DataFrame used for the inference is created, then the inference is run through the pipeline’s infer method.
input_data = {
    "inputs": ["this is a test", "this is another test"], # required
    "candidate_labels": [["english", "german"], ["english", "german"]], # required
    "hypothesis_template": ["This example is {}.", "This example is {}."], # optional: using the defaults, similar to not passing this parameter
    "multi_label": [False, False], # optional: using the defaults, similar to not passing this parameter
}
dataframe=pd.DataFrame(input_data)
dataframe
   inputs                candidate_labels   hypothesis_template  multi_label
0  this is a test        [english, german]  This example is {}.  False
1  this is another test  [english, german]  This example is {}.  False
%time pipeline.infer(dataframe)
CPU times: user 2 µs, sys: 0 ns, total: 2 µs
Wall time: 5.48 µs
   time                     in.candidate_labels  in.hypothesis_template  in.inputs             in.multi_label  out.labels         out.scores                                 out.sequence          check_failures
0  2023-10-20 15:52:07.129  [english, german]    This example is {}.     this is a test        False           [english, german]  [0.504054605960846, 0.49594545364379883]   this is a test        0
1  2023-10-20 15:52:07.129  [english, german]    This example is {}.     this is another test  False           [english, german]  [0.5037839412689209, 0.4962160289287567]   this is another test  0
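As a small post-processing sketch (not part of the tutorial itself), the top label for each row can be read off the out.labels / out.scores pairs in the results above:

```python
# Values taken from the inference results table above
rows = [
    {"labels": ["english", "german"], "scores": [0.504054605960846, 0.49594545364379883]},
    {"labels": ["english", "german"], "scores": [0.5037839412689209, 0.4962160289287567]},
]

def top_label(row):
    # pair each label with its score and keep the label with the highest score
    return max(zip(row["labels"], row["scores"]), key=lambda pair: pair[1])[0]

print([top_label(r) for r in rows])
# -> ['english', 'english']
```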
Undeploy Pipelines
With the tutorial complete, the pipeline is undeployed to return the resources to the cluster.
pipeline.undeploy()
Waiting for undeployment - this will take up to 45s ........................................ ok