whisper-large-v2 Demonstration with Wallaroo
This tutorial can be downloaded as part of the Wallaroo Tutorials repository.
Whisper Demo
The following tutorial demonstrates deploying the openai/whisper-large-v2 model on a Wallaroo pipeline and performing inferences on it using the BYOP feature.
Data Preparation
For this example, the following Python libraries were used: librosa and datasets.
These can be installed with the following command:
pip install librosa datasets --user
Using these libraries, a sample of audio files was retrieved and converted with the following code.
import IPython.display
import librosa
import numpy as np
import pandas as pd
from datasets import load_dataset

# load the sample dataset
dataset = load_dataset("Narsil/asr_dummy")

# load two audio files as (samples, sampling rate) pairs
audio_1, sr_1 = librosa.load(dataset["test"][0]["file"])
audio_2, sr_2 = librosa.load(dataset["test"][1]["file"])
audio_files = [(audio_1, sr_1), (audio_2, sr_2)]

# convert the audio files to numpy values in a DataFrame
input_data = {
    "inputs": [audio_1, audio_2],
    "return_timestamps": ["word", "word"],
}
dataframe = pd.DataFrame(input_data)

# the following will provide a UI to play the audio file samples
def display_audio(audio: np.ndarray, sr: int) -> None:
    IPython.display.display(IPython.display.Audio(data=audio, rate=sr))

for audio, sr in audio_files:
    display_audio(audio, sr)
The resulting pandas DataFrame can either be submitted directly to a deployed Wallaroo pipeline using wallaroo.pipeline.infer, or exported to a pandas Record file in pandas JSON format and used for an inference request using wallaroo.pipeline.infer_from_file.
For this example, the audio files are pre-converted to a JSON pandas Record table file, which is used for the inference request. This removes the requirement to add additional Python libraries to a virtual environment or Wallaroo JupyterHub service. The code above is provided as an example of converting the dataset audio into values for inference requests.
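The export step described above can be sketched as follows. This is a minimal illustration, not the exact code used to produce the tutorial's data file: the sample values are short stand-ins for real audio arrays, and the output path is a hypothetical temporary location.

```python
import json
import os
import tempfile

import pandas as pd

# Stand-in values; real inputs would be the float arrays returned by librosa.load
dataframe = pd.DataFrame({
    "inputs": [[0.0, 0.25, -0.25], [0.5, -0.5]],
    "return_timestamps": ["word", "word"],
})

# Hypothetical output path in a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "sound-examples.df.json")

# orient="records" writes one JSON object per DataFrame row
dataframe.to_json(out_path, orient="records")

# Read the file back to inspect the record layout
with open(out_path) as f:
    records = json.load(f)

print(records[0]["return_timestamps"], len(records[0]["inputs"]))
```

Each row becomes one JSON object whose keys are the column names, so the file can be inspected or edited without pandas.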
Tutorial Steps
Import Libraries
The first step is to import the libraries we’ll be using. These are included by default in the Wallaroo instance’s JupyterHub service or are installed with the Wallaroo SDK.
import json
import os
import wallaroo
from wallaroo.pipeline import Pipeline
from wallaroo.deployment_config import DeploymentConfigBuilder
from wallaroo.framework import Framework
import pyarrow as pa
import numpy as np
import pandas as pd
# ignoring warnings for demonstration
import warnings
warnings.filterwarnings('ignore')
Open a Connection to Wallaroo
The next step is to connect to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and available through the JupyterHub interface provided with your Wallaroo environment.
This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.
If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). For more details on logging in through Wallaroo, see the Wallaroo SDK Essentials Guide: Client Connection.
For this tutorial, the request_timeout option is increased to allow the model conversion and pipeline deployment to proceed without any warning messages.
wl = wallaroo.Client(request_timeout=600)
Set Variables
We’ll set the names of our workspace, pipeline, model, and model file. The names are set here to make updating this tutorial easier.
- IMPORTANT NOTE: Workspace names must be unique across the Wallaroo instance. This tutorial hard codes the workspace name so it functions in the same workspace each time it’s run. If the name collides with another user’s workspace, change it or append a short random suffix to it.
workspace_name = 'whisper-tiny-demo'
pipeline_name = 'whisper-hf-byop'
model_name = 'whisper-byop'
model_file_name = './models/model-auto-conversion_hugging-face_complex-pipelines_asr-whisper-tiny.zip'
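If a unique workspace name is needed, a random suffix can be generated along the following lines. This is a sketch only: the suffix variable and the way it is joined to the base name are illustrative assumptions, not part of the Wallaroo SDK.

```python
import random
import string

# Four random lowercase letters; set suffix = "" to keep the fixed name
suffix = "".join(random.choice(string.ascii_lowercase) for _ in range(4))

base_name = "whisper-tiny-demo"
# Only add the separator when a suffix is actually present
workspace_name = f"{base_name}-{suffix}" if suffix else base_name

print(workspace_name)
```

Because the suffix changes on every run, each run would create a new workspace. Hard coding the name remains the simpler choice when rerunning the tutorial.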
Create Workspace and Pipeline
We will now create the Wallaroo workspace to store our model and set it as the current workspace. Future commands will default to this workspace for pipeline creation, model uploads, etc. We’ll create our Wallaroo pipeline that is used to deploy our Custom Model.
workspace = wl.get_workspace(name=workspace_name, create_if_not_exist=True)
wl.set_current_workspace(workspace)
pipeline = wl.build_pipeline(pipeline_name)
display(wl.get_current_workspace())
{'name': 'whisper-tiny-demo', 'id': 20, 'archived': False, 'created_by': 'fb2916bc-551e-4a76-88e8-0f7d7720a0f9', 'created_at': '2024-07-31T16:43:14.11736+00:00', 'models': [{'name': 'whisper-byop', 'versions': 3, 'owner_id': '""', 'last_update_time': datetime.datetime(2024, 7, 31, 16, 49, 34, 758116, tzinfo=tzutc()), 'created_at': datetime.datetime(2024, 7, 31, 16, 43, 21, 150329, tzinfo=tzutc())}], 'pipelines': [{'name': 'whisper-hf-byop', 'create_time': datetime.datetime(2024, 7, 31, 16, 43, 14, 261908, tzinfo=tzutc()), 'definition': '[]'}]}
Configure & Upload Model
For this example, we will use the openai/whisper-tiny model for the automatic-speech-recognition pipeline task from the official 🤗 Hugging Face hub.
To manually create an automatic-speech-recognition pipeline from the 🤗 Hugging Face hub link above:
- Download the original model from the official 🤗 Hugging Face hub.
from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
pipe.save_pretrained("asr-whisper-tiny/")
As a last step, you can zip the folder containing all needed files as follows:
zip -r asr-whisper-tiny.zip asr-whisper-tiny/
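Where the zip command-line tool is not available, the same archive layout can be produced from Python's standard library. The sketch below uses a temporary directory and a placeholder config.json in place of the real saved pipeline files, so it only demonstrates the archive structure.

```python
import os
import shutil
import tempfile
import zipfile

# Temporary working directory with a placeholder model folder
work_dir = tempfile.mkdtemp()
model_dir = os.path.join(work_dir, "asr-whisper-tiny")
os.makedirs(model_dir)
with open(os.path.join(model_dir, "config.json"), "w") as f:
    f.write("{}")  # placeholder for the files written by save_pretrained

# Rough equivalent of: zip -r asr-whisper-tiny.zip asr-whisper-tiny/
archive_path = shutil.make_archive(
    base_name=os.path.join(work_dir, "asr-whisper-tiny"),
    format="zip",
    root_dir=work_dir,
    base_dir="asr-whisper-tiny",
)

# List the archive contents to confirm the folder is kept as the top level
with zipfile.ZipFile(archive_path) as zf:
    names = zf.namelist()

print(names)
```

Passing base_dir keeps the asr-whisper-tiny/ folder as the top-level entry inside the archive, matching what zip -r produces.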
Configure PyArrow Schema
You can find more info on the available inputs for the automatic-speech-recognition pipeline under the official source code from 🤗 Hugging Face.
The input and output schemas are defined in Apache pyarrow Schema format.
The model is then uploaded with the wallaroo.client.upload_model method, where we define:
- The name to assign the model.
- The model file path.
- The model framework.
- The input and output schemas.
The model is uploaded to the Wallaroo instance, where it is containerized to run with the Wallaroo Inference Engine.
input_schema = pa.schema([
    pa.field('inputs', pa.list_(pa.float32())), # required: the audio stored in numpy arrays of shape (num_samples,) and data type `float32`
    pa.field('return_timestamps', pa.string()) # optional: return start & end times for each predicted chunk
])
output_schema = pa.schema([
    pa.field('text', pa.string()), # required: the output text corresponding to the audio input
    pa.field('chunks', pa.list_(pa.struct([('text', pa.string()), ('timestamp', pa.list_(pa.float32()))]))), # required (if `return_timestamps` is set), start & end times for each predicted chunk
])
model = wl.upload_model(model_name,
                        model_file_name,
                        framework=Framework.HUGGING_FACE_AUTOMATIC_SPEECH_RECOGNITION,
                        input_schema=input_schema,
                        output_schema=output_schema)
model
Waiting for model loading - this will take up to 10.0min.
Model is pending loading to a container runtime..
Model is attempting loading to a container runtime....................successful
Ready
Name | whisper-byop |
---|---|
Version | e8a165c0-b284-44ec-8334-d852120cced2 |
File Name | model-auto-conversion_hugging-face_complex-pipelines_asr-whisper-tiny.zip |
SHA | ddd57c9c8d3ed5417783ebb7101421aa1e79429365d20326155c9c02ae1e8a13 |
Status | ready |
Image Path | proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs/mac-deploy:v2024.2.0-main-5473 |
Architecture | x86 |
Acceleration | none |
Updated At | 2024-01-Aug 15:04:24 |
Workspace id | 20 |
Workspace name | whisper-tiny-demo |
Deploy Pipeline
The model is deployed with the wallaroo.pipeline.deploy(deployment_config) command. For the deployment configuration, we set the memory for the containerized model (aka sidekick) to 8 Gi to accommodate the size of the model, and its CPUs to 4. To optimize performance, a GPU could be assigned to the containerized model.
deployment_config = DeploymentConfigBuilder() \
    .cpus(0.25).memory('1Gi') \
    .sidekick_memory(model, '8Gi') \
    .sidekick_cpus(model, 4.0) \
    .build()
pipeline = wl.build_pipeline(pipeline_name)
pipeline.add_model_step(model)
pipeline.deploy(deployment_config=deployment_config)
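To estimate what this configuration requests from the cluster, the memory strings can be converted to bytes. The helper below is a hypothetical illustration that assumes the values follow the Kubernetes binary-suffix convention (Ki, Mi, Gi, Ti). It is not part of the Wallaroo SDK.

```python
# Binary (power-of-two) suffixes, assumed to follow Kubernetes quantity notation
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}

def quantity_to_bytes(quantity: str) -> int:
    """Convert a string such as '8Gi' to a number of bytes."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # no suffix: already a byte count

# Values taken from the deployment configuration above
memory_bytes = {
    "engine": quantity_to_bytes("1Gi"),
    "sidekick": quantity_to_bytes("8Gi"),
}
total_gib = sum(memory_bytes.values()) / 2**30
total_cpus = 0.25 + 4.0

print(memory_bytes["sidekick"], total_gib, total_cpus)
```

Under this assumption, 8Gi is 8,589,934,592 bytes, slightly more than 8 GB in decimal units, and the deployment as a whole requests 9 Gi of memory and 4.25 CPUs.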
name | whisper-hf-byop |
---|---|
created | 2024-07-31 16:43:14.261908+00:00 |
last_updated | 2024-08-01 15:04:27.100177+00:00 |
deployed | True |
workspace_id | 20 |
workspace_name | whisper-tiny-demo |
arch | x86 |
accel | none |
tags | |
versions | 2eae56f0-6092-4243-9e9c-941ffe161c80, 6e3cc0f9-0593-4935-8632-7984dd5fb14a, e9e72cbc-f75d-45b1-86e3-709d8f526adc, 648838fd-4fcf-47b5-9315-393b5a51a389 |
steps | whisper-byop |
published | False |
After a couple of minutes, we verify the pipeline deployment was successful.
pipeline.status()
{'status': 'Running',
'details': [],
'engines': [{'ip': '10.28.1.43',
'name': 'engine-7988747c5d-h5ff9',
'status': 'Running',
'reason': None,
'details': [],
'pipeline_statuses': {'pipelines': [{'id': 'whisper-hf-byop',
'status': 'Running',
'version': '2eae56f0-6092-4243-9e9c-941ffe161c80'}]},
'model_statuses': {'models': [{'name': 'whisper-byop',
'sha': 'ddd57c9c8d3ed5417783ebb7101421aa1e79429365d20326155c9c02ae1e8a13',
'status': 'Running',
'version': 'e8a165c0-b284-44ec-8334-d852120cced2'}]}}],
'engine_lbs': [{'ip': '10.28.1.42',
'name': 'engine-lb-6b59985857-cv6ht',
'status': 'Running',
'reason': None,
'details': []}],
'sidekicks': [{'ip': '10.28.1.41',
'name': 'engine-sidekick-whisper-byop-35-6c4c65695b-kfnh2',
'status': 'Running',
'reason': None,
'details': [],
'statuses': '\n'}]}
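Rather than reading the status output by eye, the same check can be automated. The sketch below assumes the dictionary layout shown in the output above. The all_running and wait_until_running helpers are hypothetical names, and the sample dictionary is abbreviated from that output.

```python
import time
from typing import Callable

def all_running(status: dict) -> bool:
    """Return True when the pipeline and every listed component report 'Running'."""
    if status.get("status") != "Running":
        return False
    for group in ("engines", "engine_lbs", "sidekicks"):
        for component in status.get(group, []):
            if component.get("status") != "Running":
                return False
    return True

def wait_until_running(get_status: Callable[[], dict],
                       timeout_s: float = 300.0,
                       interval_s: float = 5.0) -> bool:
    """Poll get_status until everything is running or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if all_running(get_status()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)

# Abbreviated copy of the status output shown above
sample_status = {
    "status": "Running",
    "engines": [{"name": "engine-7988747c5d-h5ff9", "status": "Running"}],
    "engine_lbs": [{"name": "engine-lb-6b59985857-cv6ht", "status": "Running"}],
    "sidekicks": [{"name": "engine-sidekick-whisper-byop-35-6c4c65695b-kfnh2",
                   "status": "Running"}],
}

print(all_running(sample_status))
```

With a deployed pipeline, the polling helper could be called as wait_until_running(pipeline.status), since it accepts any callable that returns the status dictionary.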
Run inference on the example dataset
We perform a sample inference with the pre-converted pandas Record JSON file, and display the results.
%%time
result = pipeline.infer_from_file('./data/sound-examples.df.json', timeout=10000)
CPU times: user 127 ms, sys: 49.2 ms, total: 176 ms
Wall time: 5.89 s
display(result)
 | time | in.inputs | in.return_timestamps | out.chunks | out.text | anomaly.count |
---|---|---|---|---|---|---|
0 | 2024-08-01 15:05:15.616 | [0.0003229662, 0.0003370901, 0.0002854846, 0.0... | word | [{'text': ' He', 'timestamp': [0.0, 1.08]}, {'... | He hoped there would be Stu for dinner, turni... | 0 |
1 | 2024-08-01 15:05:15.616 | [0.0010076478, 0.0012469155, 0.0008045971, 0.0... | word | [{'text': ' Stuff', 'timestamp': [29.78, 29.78... | Stuff it into you. His belly calcled him. | 0 |
Evaluate results
Let’s compare the results side by side with the audio inputs.
for transcription in result['out.text'].values:
    print(f"Transcription: {transcription}\n")
Transcription: He hoped there would be Stu for dinner, turnips and carrots and bruised potatoes and fat mutton pieces to be ladled out in thick, peppered, flour-fat and sauce.
Transcription: Stuff it into you. His belly calcled him.
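Because return_timestamps was set to word, each entry in out.chunks pairs a word with its start and end time. The sketch below formats such chunks for reading. The first chunk is copied from the output above, while the second is a made-up value for illustration only.

```python
# Chunk layout as returned in out.chunks: text plus [start, end] in seconds
chunks = [
    {"text": " He", "timestamp": [0.0, 1.08]},     # from the output above
    {"text": " hoped", "timestamp": [1.08, 1.4]},  # hypothetical value
]

def format_chunk(chunk: dict) -> str:
    """Render one chunk as 'start-end: word'."""
    start, end = chunk["timestamp"]
    return f"{start:.2f}-{end:.2f}s: {chunk['text'].strip()}"

lines = [format_chunk(chunk) for chunk in chunks]
for line in lines:
    print(line)
```

For a real inference result, the chunk list for the first row would presumably be read with result.loc[0, 'out.chunks'], based on the column names shown in the results table.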
Undeploy Pipelines
With the demonstration complete, we undeploy the pipeline to return its resources to the Wallaroo instance.
pipeline.undeploy()
name | whisper-hf-byop |
---|---|
created | 2024-07-31 16:43:14.261908+00:00 |
last_updated | 2024-08-01 15:04:27.100177+00:00 |
deployed | False |
workspace_id | 20 |
workspace_name | whisper-tiny-demo |
arch | x86 |
accel | none |
tags | |
versions | 2eae56f0-6092-4243-9e9c-941ffe161c80, 6e3cc0f9-0593-4935-8632-7984dd5fb14a, e9e72cbc-f75d-45b1-86e3-709d8f526adc, 648838fd-4fcf-47b5-9315-393b5a51a389 |
steps | whisper-byop |
published | False |