LLM Validation Listeners

LLM Validation Listeners are in-line monitors that validate an LLM’s outputs during the inference process. These validations are implemented as an in-line step in the same Wallaroo pipeline as the LLM, and are customized for whatever monitoring the user requests, such as summary quality, translation quality score, and other use cases.

An LLM Validation Listener follows this process:

  • Each validation step is uploaded as a Bring Your Own Predict (BYOP) or Hugging Face model in Wallaroo. These models monitor the outputs of the LLM and score them based on whatever criteria the data scientist develops; a sketch of such a validation step follows this list.
  • These model steps evaluate inference data directly from the LLM, creating additional fields based on the LLM’s inference output.
    • For example, if the LLM outputs the field text, the validation model could output the fields summary_quality, translation_quality_score, etc.
  • These steps are monitored with Wallaroo assays to analyze the scores each validation step produces and publish assay analyses based on established criteria.
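
The following is a minimal sketch of what such a validation step can look like, assuming Wallaroo's BYOP interface from the mac package; the scoring model wrapped here is hypothetical and stands in for whatever quality metric the data scientist develops.

# minimal BYOP validation step sketch; the wrapped scoring model is hypothetical
import numpy as np
from mac.inference import Inference
from mac.types import InferenceData

class SummaryQualityInference(Inference):
    @property
    def expected_model_types(self) -> set:
        # accept any Python object as the wrapped scoring model
        return {object}

    @Inference.model.setter
    def model(self, model) -> None:
        self._raise_error_if_model_is_wrong_type(model)
        self._model = model

    def _predict(self, input_data: InferenceData) -> InferenceData:
        # score each generated_text against its source text
        scores = np.array([
            self._model.score(source, summary)
            for source, summary in zip(input_data["text"],
                                       input_data["generated_text"])
        ])
        # pass generated_text through and add the score field
        return {"generated_text": input_data["generated_text"],
                "score": scores}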

LLM Validation Listener Example

The following condensed example provides in-line monitoring for a Llama v3 Llamacpp LLM. This LLM was previously uploaded and set to the variable llamav3_llamacpp. For a full example, see the tutorial LLM In-Line Monitoring Example or download the sample Jupyter Notebooks from the Wallaroo Tutorials repository.

This LLM has the following inputs and outputs:

  • Inputs
    • text: String
  • Outputs
    • generated_text: String
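
As a point of reference, the following sketch shows how an LLM with these schemas could be uploaded; the model name and file name here are illustrative, and the full procedure is covered in How to Upload and Deploy LLM Models in Wallaroo.

import pyarrow as pa
import wallaroo
from wallaroo.framework import Framework

wl = wallaroo.Client()

# schemas matching the LLM's inputs and outputs listed above
llm_input_schema = pa.schema([
    pa.field('text', pa.string())
])

llm_output_schema = pa.schema([
    pa.field('generated_text', pa.string())
])

# illustrative model and file names; the BYOP framework wraps the llamacpp runtime
llamav3_llamacpp = wl.upload_model('llamav3-llamacpp',
    'llamav3_llamacpp.zip',
    framework=Framework.CUSTOM,
    input_schema=llm_input_schema,
    output_schema=llm_output_schema
)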

We add the BYOP model summarisation_quality_final.zip, which evaluates the LLM’s generated_text output and scores it. It has the following inputs and outputs:

  • Inputs
    • text: String
    • generated_text: String; the output of the Llama v3 model.
  • Outputs
    • generated_text: String; the same generated_text from the Llama v3 model, passed through as an inference output.
    • score: Float64; the total score based on the generated_text field.

The following is an example of performing a sample inference on the Llama v3 model deployed in Wallaroo through the pipeline pipeline_llm. For details on uploading and deploying LLMs to Wallaroo, see How to Upload and Deploy LLM Models in Wallaroo.

import pyarrow as pa

# set the input_data
text = "Please summarize this text: Simplify production AI for seamless self-checkout or cashierless experiences at scale, enabling any retail store to offer a modern shopping journey. We reduce the technical overhead and complexity for delivering a checkout experience that’s easy and efficient no matter where your stores are located.Eliminate Checkout Delays: Easy and fast model deployment for a smooth self-checkout process, allowing customers to enjoy faster, hassle-free shopping experiences. Drive Operational Efficiencies: Simplifying the process of scaling AI-driven self-checkout solutions to multiple retail locations ensuring uniform customer experiences no matter the location of the store while reducing in-store labor costs. Continuous Improvement: Enabling integrated data insights for informing self-checkout improvements across various locations, ensuring the best customer experience, regardless of where they shop."

# convert to an Apache Arrow table
input_data = pa.Table.from_pydict({"text" : [text]})

# perform the inference
result = pipeline_llm.infer(input_data)

# display the result, output as an Apache Arrow table
display(result)

pyarrow.Table
time: timestamp[ms]
in.text: string not null
out.generated_text: string not null
anomaly.count: int8
----
time: [[2024-05-23 19:17:37.617]]
in.text: [["Please summarize this text: Simplify production AI for seamless self-checkout or cashierless experiences at scale, enabling any retail store to offer a modern shopping journey. We reduce the technical overhead and complexity for delivering a checkout experience that’s easy and efficient no matter where your stores are located.Eliminate Checkout Delays: Easy and fast model deployment for a smooth self-checkout process, allowing customers to enjoy faster, hassle-free shopping experiences. Drive Operational Efficiencies: Simplifying the process of scaling AI-driven self-checkout solutions to multiple retail locations ensuring uniform customer experiences no matter the location of the store while reducing in-store labor costs. Continuous Improvement: Enabling integrated data insights for informing self-checkout improvements across various locations, ensuring the best customer experience, regardless of where they shop."]]
out.generated_text: [[" Here's a summary of the text:

This AI technology simplifies and streamlines self-checkout processes for retail stores, allowing them to offer efficient and modern shopping experiences at scale. It reduces technical complexity and makes it easy to deploy AI-driven self-checkout solutions across multiple locations. The system eliminates checkout delays, drives operational efficiencies by reducing labor costs, and enables continuous improvement through data insights, ensuring a consistent customer experience regardless of location."]]
anomaly.count: [[0]]

Upload and Deploy LLM Validation Listener

The BYOP model summarisation_quality_final.zip is uploaded and deployed as a model step in the same pipeline as the LLM.

import pyarrow as pa
from wallaroo.framework import Framework

# set the input and output schemas
input_schema = pa.schema([
    pa.field('text', pa.string()),
    pa.field('generated_text', pa.string())
]) 

output_schema = pa.schema([
    pa.field('generated_text', pa.string()),
    pa.field('score', pa.float64()),
])

# upload the model with the BYOP framework
validation_model = wl.upload_model('summquality', 
    'summarisation_quality_final.zip',
    framework=Framework.CUSTOM,
    input_schema=input_schema,
    output_schema=output_schema
)

With the validation model uploaded, we add it to the same pipeline as the Llama v3 LLM. The models are provided with the following resources:

  • llamav3_llamacpp: 6 cpus, 10 Gi RAM
  • validation_model: 2 cpus, 8 Gi RAM

# create the pipeline
pipeline_llm = wl.build_pipeline("llm-summ-quality")

# add the model steps
pipeline_llm.add_model_step(llamav3_llamacpp)
pipeline_llm.add_model_step(validation_model)

from wallaroo.deployment_config import DeploymentConfigBuilder

# set the deployment configuration
deployment_config = DeploymentConfigBuilder() \
    .cpus(2).memory('2Gi') \
    .sidekick_cpus(validation_model, 2) \
    .sidekick_memory(validation_model, '8Gi') \
    .sidekick_cpus(llamav3_llamacpp, 6) \
    .sidekick_memory(llamav3_llamacpp, '10Gi') \
    .build()

# deploy the LLM and validation model with the deployment configuration
pipeline_llm.deploy(deployment_config=deployment_config)

LLM Validation Listener Sample Inference

Once deployed, we perform the same inference with the same inputs. The validation model adds the additional field score to the inference output.

# perform the inference
result = pipeline_llm.infer(input_data)

# display the result, output as an Apache Arrow table
display(result)

pyarrow.Table
time: timestamp[ms]
in.text: string not null
out.generated_text: string not null
out.score: float not null
anomaly.count: int8
----
time: [[2024-05-23 20:08:00.423]]
in.text: [["Please summarize this text: Simplify production AI for seamless self-checkout or cashierless experiences at scale, enabling any retail store to offer a modern shopping journey. We reduce the technical overhead and complexity for delivering a checkout experience that’s easy and efficient no matter where your stores are located.Eliminate Checkout Delays: Easy and fast model deployment for a smooth self-checkout process, allowing customers to enjoy faster, hassle-free shopping experiences. Drive Operational Efficiencies: Simplifying the process of scaling AI-driven self-checkout solutions to multiple retail locations ensuring uniform customer experiences no matter the location of the store while reducing in-store labor costs. Continuous Improvement: Enabling integrated data insights for informing self-checkout improvements across various locations, ensuring the best customer experience, regardless of where they shop."]]
out.generated_text: [[" Here's a summary of the text:

This AI technology simplifies and streamlines self-checkout processes for retail stores, allowing them to offer efficient and modern shopping experiences at scale. It reduces technical complexity and makes it easy to deploy AI-driven self-checkout solutions across multiple locations. The system eliminates checkout delays, drives operational efficiencies by reducing labor costs, and enables continuous improvement through data insights, ensuring a consistent customer experience regardless of location."]]
out.score: [[0.837221]]
anomaly.count: [[0]]

The out.score field can be observed through Model Inference Results or monitored with Wallaroo assays.
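
The following is a minimal sketch of building such an assay over the validation score, assuming the Wallaroo client's build_assay interface; exact parameters and the iopath format can vary by Wallaroo version, and the baseline window here is illustrative.

import datetime

# illustrative baseline window drawn from prior inference results
baseline_start = datetime.datetime.now() - datetime.timedelta(days=1)
baseline_end = datetime.datetime.now()

# build an assay against the validation model's score output
assay_builder = wl.build_assay("summary quality score",
    pipeline_llm,
    'summquality',
    baseline_start,
    baseline_end)

# monitor the first element of the score output field
assay_builder = assay_builder.add_iopath("output score 0")

# preview the assay analyses before scheduling the assay
assay_results = assay_builder.build().interactive_run()
assay_results.chart_scores()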
