The following tutorial is available on the Wallaroo GitHub Repository.
This notebook is used in conjunction with the Wallaroo Inference Server Free Edition for Hugging Face Summarizer. This provides a free license for performing inferences through the Hugging Face Summarizer model. For full demonstrations of this model, see Wallaroo Edge Hugging Face LLM Summarization Deployment Demonstration.
Note that the GPU inference server requires a VM with Nvidia GPU CUDA support.
The Hugging Face LLM Summarizer Model takes the following inputs.
Field | Type | Description |
---|---|---|
inputs | String (Required) | One or more articles to summarize. |
return_text | Bool (Optional) | Whether or not to include the decoded texts in the outputs. |
return_tensor | Bool (Optional) | Whether or not to include the tensors of predictions (as token indices) in the outputs. |
clean_up_tokenization_spaces | Bool (Optional) | Whether or not to clean up the potential extra spaces in the text output. |
The model returns the following output.

Field | Type | Description |
---|---|---|
summary_text | String | The summary of the corresponding input. |
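To make the input schema concrete, the following is a minimal sketch in Python of building a pandas-records payload with these fields; the article text and output file name are placeholders rather than the tutorial's sample data.

```python
import json

# Placeholder article text; replace with the content you want summarized.
article = (
    "LinkedIn is a business and employment-focused social media platform "
    "that works through websites and mobile apps."
)

# One record per article in pandas-records form.
# Only `inputs` is required; the remaining flags are optional.
payload = [
    {
        "inputs": article,
        "return_text": True,
        "return_tensor": False,
        "clean_up_tokenization_spaces": False,
    }
]

# Save the payload as a JSON file that can be posted to the inference endpoint.
with open("summarization_input.df.json", "w") as f:
    json.dump(payload, f)
```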
The following HTTPS API endpoints are available for the Wallaroo Inference Server.
The `/pipelines` endpoint (HTTPS GET) returns `pipelines`, an array of deployed pipelines with the following fields: `id`, the pipeline name, and `status`, where `Running` indicates the pipeline is available for inferences.

The following demonstrates using `curl` to retrieve the Pipelines endpoint. Replace the HOSTNAME with the address of your Wallaroo Inference Server.
!curl HOSTNAME:8080/pipelines
{"pipelines":[{"id":"hf-summarizer-standard","status":"Running"}]}
The `/models` endpoint (HTTPS GET) returns `models`, an array of deployed models with the following fields, including `sha`, the hash of the model, and `status`, where `Running` indicates the model is available for inferences.

The following demonstrates using `curl` to retrieve the Models endpoint. Replace the HOSTNAME with the address of your Wallaroo Inference Server.
!curl doc-example-hf-summarizer.westus2.cloudapp.azure.com:8080/models
{"models":[{"name":"hf-summarizer-standard","sha":"ee71d066a83708e7ca4a3c07caf33fdc528bb000039b6ca2ef77fa2428dc6268","status":"Running","version":"7dbae7b4-20d0-40f7-a3f5-eeabdd77f418"}]}
The inference endpoint is an HTTPS POST to `/pipelines/hf-summarizer-standard` and accepts the following Content-Type headers:

- `Content-Type: application/vnd.apache.arrow.file`: For Apache Arrow tables.
- `Content-Type: application/json; format=pandas-records`: For pandas DataFrames in record format.

The request body is a pandas DataFrame in record format or an Apache Arrow table with the inputs described above. The response is returned as `Content-Type: application/json; format=pandas-records`, a pandas DataFrame in record format, in which `original_data` returns `null` if the input may be too long for a proper return, and the `outputs` field holds the summary with `dim` always `[1,1]` and `v` always `1` for this model deployment.

The following example performs an inference using the pandas record input `./data/test_summarization.df.json` with a text string to summarize.
!curl -X POST HOSTNAME:8080/pipelines/hf-summarizer-standard \
-H "Content-Type: application/json; format=pandas-records" \
-d @./data/test_summarization.df.json
[{"check_failures":[],"elapsed":[37000,3245048360],"model_name":"hf-summarizer-standard","model_version":"7dbae7b4-20d0-40f7-a3f5-eeabdd77f418","original_data":null,"outputs":[{"String":{"data":["LinkedIn is a business and employment-focused social media platform that works through websites and mobile apps. It launched on May 5, 2003. LinkedIn allows members (both workers and employers) to create profiles and connect with each other in an online social network which may represent real-world professional relationships."],"dim":[1,1],"v":1}}],"pipeline_name":"hf-summarizer-standard","shadow_data":{},"time":1696454765559}]