The following tutorial is available on the Wallaroo GitHub Repository.
This notebook is used in conjunction with the Wallaroo Inference Server Free Edition for Llama 2. This provides a free license for performing inferences through the Llama 2 model. For more information, see the Llama 2 reference page.
The Llama 2 Model takes the following inputs.
Field | Type | Description |
---|---|---|
text | String (Required) | The prompt for the llama model. |
The Llama 2 model returns the following outputs.
Field | Type | Description |
---|---|---|
generated_text | String | The generated text output. |
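The input schema above can be sketched in Python as a pandas-records payload: a JSON list holding one record per row, each with the required `text` field. The helper name `build_payload` is hypothetical and used only for illustration; it is not part of any Wallaroo library.

```python
import json


def build_payload(prompt: str) -> str:
    """Serialize one prompt as a pandas-records JSON body (a list with one record).

    `build_payload` is an illustrative helper, not a Wallaroo API.
    """
    # The `text` field is marked as required in the input table.
    if not isinstance(prompt, str) or not prompt:
        raise ValueError("text is required and must be a non-empty string")
    return json.dumps([{"text": prompt}])


payload = build_payload("What is a number that can divide 0 evenly?")
print(payload)
```

The resulting string has the same shape as the request body passed to `curl -d` in the inference example.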
The following HTTPS API endpoints are available for Wallaroo Inference Server.
/pipelines
Returns pipelines, a list of pipelines with the following fields.
id: The pipeline identifier.
status: The pipeline status. Running indicates the pipeline is available for inferences.
The following demonstrates using curl to retrieve the Pipelines endpoint. Replace the HOSTNAME with the address of your Wallaroo Inference Server.
!curl HOSTNAME:8080/pipelines
{"pipelines":[{"id":"llama","status":"Running"}]}
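As a minimal sketch of how a client might act on this response, the snippet below parses the JSON body and keeps only the pipelines whose status is Running. The function name `running_pipelines` is hypothetical, and the sample body is copied from the output above rather than fetched from a live server.

```python
import json
from typing import List


def running_pipelines(body: str) -> List[str]:
    """Return the ids of pipelines whose status is Running.

    Illustrative helper only; it is not part of any Wallaroo library.
    """
    data = json.loads(body)
    return [
        p["id"]
        for p in data.get("pipelines", [])
        if p.get("status") == "Running"
    ]


# Sample body copied from the curl output shown above.
sample = '{"pipelines":[{"id":"llama","status":"Running"}]}'
print(running_pipelines(sample))
```

An empty list means no pipeline is ready to accept inference requests yet.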
/models
Returns models, a list of models with the following fields.
name: The name of the model.
sha: The sha hash of the model.
status: The model status. Running indicates the model is available for inferences.
version: The model version identifier.
The following demonstrates using curl to retrieve the Models endpoint. Replace the HOSTNAME with the address of your Wallaroo Inference Server.
!curl HOSTNAME:8080/models
{"models":[{"name":"llama","sha":"0bf8b42da8d35dac656048c53230d8d645abdbef281ec5d230fd80aef18aec95","status":"Running","version":"5291a743-5c38-4448-8122-bd5edec73011"}]}
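The Models response can be handled in a similar way. The sketch below indexes the returned entries by model name so that the `sha`, `status`, and `version` values are easy to look up. The helper `models_by_name` is a hypothetical name, and the sample body is the output shown above.

```python
import json
from typing import Dict


def models_by_name(body: str) -> Dict[str, dict]:
    """Index the entries of a /models response by their `name` field.

    Illustrative helper only; it is not part of any Wallaroo library.
    """
    data = json.loads(body)
    return {m["name"]: m for m in data.get("models", [])}


# Sample body copied from the curl output shown above.
sample = (
    '{"models":[{"name":"llama",'
    '"sha":"0bf8b42da8d35dac656048c53230d8d645abdbef281ec5d230fd80aef18aec95",'
    '"status":"Running",'
    '"version":"5291a743-5c38-4448-8122-bd5edec73011"}]}'
)

llama = models_by_name(sample)["llama"]
print(llama["status"], llama["version"])
```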
The following inference endpoint is available from the Wallaroo Inference Server for Llama 2.
/pipelines/llama
Request headers:
Content-Type: application/vnd.apache.arrow.file: For Apache Arrow tables.
Content-Type: application/json; format=pandas-records: For pandas DataFrame in record format.
Input parameters: a pandas DataFrame in application/json; format=pandas-records OR an Apache Arrow table in application/vnd.apache.arrow.file with the inputs listed in the input table above.
Returns:
Content-Type: application/json; format=pandas-records: pandas DataFrame in record format.
Some returned fields have conditional or fixed values: one is null if the input may be too long for a proper return, one is [1,1] for this model deployment, and one is 1 for this model deployment.
The following example performs an inference using a pandas record input with a text prompt for the model.
!curl -X POST HOSTNAME:8080/pipelines/llama \
-H "Content-Type: application/json; format=pandas-records" \
-d '[{"text":"What is a number that can divide 0 evenly?"}]'
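For readers who prefer to stay in Python, the same request can be sketched with the standard library. This is an illustration under stated assumptions, not a Wallaroo client: the helper name `build_inference_request` is hypothetical, the `http://` scheme mirrors curl's default when no scheme is given, and the response layout is not assumed, so the body is only decoded as JSON in the commented-out send step.

```python
import json
import urllib.request


def build_inference_request(
    hostname: str, prompt: str, pipeline: str = "llama"
) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl example.

    Illustrative helper only. The http scheme is an assumption that
    mirrors curl's default; adjust it for your deployment.
    """
    url = f"http://{hostname}:8080/pipelines/{pipeline}"
    # pandas-records body: a JSON list with one record per input row.
    body = json.dumps([{"text": prompt}]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; format=pandas-records"},
        method="POST",
    )


req = build_inference_request(
    "HOSTNAME", "What is a number that can divide 0 evenly?"
)
print(req.get_method(), req.full_url)

# Sending requires a reachable server; replace HOSTNAME first.
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

Building the request separately from sending it makes it possible to inspect the URL, headers, and body before any network call is made.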