Automated Vector Database Enrichment in Wallaroo
This tutorial and the assets can be downloaded as part of the Wallaroo Tutorials repository.
RAG LLMs: Automated Vector Database Enrichment in Wallaroo
The following demonstrates using a BAAI (Beijing Academy of Artificial Intelligence) general embedding (BGE) model to update embeddings in a vector database. This process uses Wallaroo features to:
- Deploy the BGE model for embedding computation.
- Create a Wallaroo Data Connector to connect to a vector database.
- Use Wallaroo Inference Automations to batch process documents on a regular basis to update embeddings in the vector database.
The embeddings stored in the vector database provide context for RAG LLMs: text whose embedding matches the query is supplied to the RAG LLM to narrow its responses and reduce hallucinations.
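As a hypothetical illustration of that retrieval step (the function name and prompt wording below are our own, not part of this tutorial), matched text can be spliced into the LLM prompt as grounding context:

```python
# Illustrative only: splice retrieved passages into a RAG prompt.
# The prompt wording and function name are assumptions, not from the tutorial.
def build_rag_prompt(question: str, retrieved_passages: list[str]) -> str:
    # Bullet each retrieved passage so the LLM can only answer from them.
    context = "\n".join(f"- {p}" for p in retrieved_passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

prompt = build_rag_prompt(
    "What happens in The Great Train Robbery?",
    ["A group of bandits stage a brazen train hold-up."],
)
```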
For this example, the MongoDB Atlas Vector Database is used as the representative vector database.
For access to these sample models and for a demonstration:
- Contact your Wallaroo Support Representative OR
- Schedule Your Wallaroo.AI Demo Today.
Library Import
For this tutorial, import the pymongo package. This is used later to interact with the MongoDB Atlas Vector Database.
!pip install pymongo
Collecting pymongo
Using cached pymongo-4.7.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (668 kB)
Collecting dnspython<3.0.0,>=1.16.0
Using cached dnspython-2.6.1-py3-none-any.whl (307 kB)
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.6.1 pymongo-4.7.3
Next we import the libraries used for this tutorial into the notebook.
import json
import os
import pymongo
import wallaroo
from wallaroo.pipeline import Pipeline
from wallaroo.deployment_config import DeploymentConfigBuilder
from wallaroo.framework import Framework
from wallaroo.engine_config import Architecture
import pyarrow as pa
import numpy as np
import pandas as pd
import zipfile
import time
Upload and Deploy BGE Model
This process shows how to upload the sample BGE model to Wallaroo and perform sample inferences through it that generate the embeddings.
Connect to the Wallaroo Instance
This step sets a connection to Wallaroo through the Wallaroo client. The Python library is included in the Wallaroo install and available through the Jupyter Hub interface provided with your Wallaroo environment.
This is accomplished using the wallaroo.Client() command, which provides a URL to grant the SDK permission to your specific Wallaroo environment. When displayed, enter the URL into a browser and confirm permissions. Store the connection into a variable that can be referenced later.
If logging into the Wallaroo instance through the internal JupyterHub service, use wl = wallaroo.Client(). For more information on Wallaroo Client settings, see the Client Connection guide.
wl = wallaroo.Client(request_timeout=480)
workspace = wl.get_workspace("embedding-computation", create_if_not_exist=True)
_ = wl.set_current_workspace(workspace)
Upload BGE Model
Before uploading the BGE model, we define the input and output schemas in Apache PyArrow Schema format.
input_schema = pa.schema([
pa.field('text', pa.string())
])
output_schema = pa.schema([
pa.field('embedding',
pa.list_(
pa.float64(), list_size=768
),
)
])
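The output schema declares each embedding as a fixed-size list of 768 float64 values. A quick sanity check of that shape in plain Python (a convenience sketch, not part of the Wallaroo API):

```python
# Sketch: verify an embedding matches the declared output schema shape
# (a list of exactly 768 float values). Not part of the Wallaroo API.
def conforms_to_output_schema(embedding, size=768):
    return len(embedding) == size and all(
        isinstance(v, float) for v in embedding
    )

assert conforms_to_output_schema([0.0] * 768)
```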
The BGE model is a Hugging Face model encapsulated in the Wallaroo BYOP framework in the file byop_bge_base2.zip. We upload it to Wallaroo via the wallaroo.client.Client.upload_model method, providing the following parameters:
- The name to assign to the BGE model.
- The file path to the model.
- The framework, set to wallaroo.framework.Framework.CUSTOM for our Hugging Face model encapsulated in the BYOP framework.
- The input and output schemas.
For more information, see the Wallaroo Model Upload guide.
model = wl.upload_model('byop-bge-base-v2',
'byop_bge_base2.zip',
framework=Framework.CUSTOM,
input_schema=input_schema,
output_schema=output_schema,
)
model
Waiting for model loading - this will take up to 10.0min.
Model is pending loading to a container runtime..
Model is attempting loading to a container runtime.......................................successful
Ready
Name | byop-bge-base-v2 |
Version | c5bb0af6-eb8a-403b-9ada-bd92d8bdcdc7 |
File Name | byop_bge_base2.zip |
SHA | 4854c685c46258ecbbfe55cf4e516b9f4b578bd87cc14cd0a9be4775e91ced6d |
Status | ready |
Image Path | proxy.replicated.com/proxy/wallaroo/ghcr.io/wallaroolabs/mac-deploy:v2024.1.0-5208 |
Architecture | x86 |
Acceleration | none |
Updated At | 2024-26-Jun 18:11:30 |
Deployment Configuration Settings
Before deploying the model, we set the deployment configuration, which specifies the resources allocated to the model, through the wallaroo.deployment_config.DeploymentConfigBuilder class. For this example, the following resources are allocated to the BGE model:
- CPUs: 4
- Memory: 3 Gi
For more details, see Model Deployment Configuration.
deployment_config = DeploymentConfigBuilder() \
.cpus(1).memory('2Gi') \
.sidekick_cpus(model, 4) \
.sidekick_memory(model, '3Gi') \
.build()
Deploy BGE Model
The BGE model is deployed through the following steps:
- Create a Wallaroo pipeline.
- Set the BGE model as a pipeline step.
- Deploy the pipeline with the wallaroo.pipeline.Pipeline.deploy(deployment_config) method. This deploys the pipeline and sets the deployment configuration.
Once deployed, the BGE model is ready for inference requests.
pipeline = wl.build_pipeline("byop-bge-pipe-base-v2")
pipeline.add_model_step(model)
pipeline.deploy(deployment_config=deployment_config)
Waiting for deployment - this will take up to 480s ....................................................................................................... ok
name | byop-bge-pipe-base-v2 |
---|---|
created | 2024-06-26 18:12:21.032554+00:00 |
last_updated | 2024-06-26 18:12:21.085707+00:00 |
deployed | True |
arch | x86 |
accel | none |
tags | |
versions | e9e8a7de-9aff-400c-a3b3-4bd280ee1923, a0e0949b-01f2-413a-8a09-8875c32f00e8 |
steps | byop-bge-base-v2 |
published | False |
Sample Inference
Models deployed in Wallaroo accept either pandas DataFrames or Apache Arrow tables as inputs. For our inference example, we submit a pandas DataFrame, then see the BGE embedding results in the column out.embedding.
pipeline.infer(pd.DataFrame({"text": ["embed this sentence."]}))
time | in.text | out.embedding | anomaly.count | |
---|---|---|---|---|
0 | 2024-06-26 20:09:20.979 | embed this sentence. | [0.02977638, -0.017274762, 0.048839126, -0.023... | 0 |
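Vector databases rank matches by the similarity between stored embeddings and a query embedding, most commonly cosine similarity. A minimal sketch of that comparison (standard math, independent of Wallaroo or MongoDB):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|).
    # Scores near 1.0 mean the texts are semantically close.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

In practice the vector database performs this ranking internally; the sketch only illustrates what "matching" means for the 768-dimensional vectors in out.embedding.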
Vector Database Connection with Wallaroo
Wallaroo Data Connections define settings that are stored and used for connecting to different data sources. For full details, see Data Connections Management.
The following shows creating a Wallaroo Data Connection and saving the artifacts used to connect to the sample Mongo Atlas Vector database. The Data Connection is assigned to the workspace for use by other workspace team members.
connect = wl.create_connection("mongodb_atlas",
"mongodb",
details={"uri":"mongodb+srv://<username>:<password>@cluster0.lfnspv5.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0"}
)
workspace.add_connection("mongodb_atlas")
Field | Value |
---|---|
Name | mongodb_atlas |
Connection Type | mongodb |
Details | ***** |
Created At | 2024-06-20T22:00:39.396887+00:00 |
Linked Workspaces | [] |
We test the connection using the connection's details() method, which retrieves the stored credentials and other data, then store the movies collection into the collection variable for later use.
client = pymongo.MongoClient(connect.details()["uri"])
db = client.sample_mflix
collection = db.movies
try:
client.admin.command('ping')
print("Pinged your deployment. You successfully connected to MongoDB!")
except Exception as e:
print(e)
Pinged your deployment. You successfully connected to MongoDB!
Run Inference on Documents
With our collection of movie data, we scan through and find any documents that have a plot value, then create a DataFrame from those values. For this example, we limit our selection to 10 elements.
texts = []
for doc in collection.find({'plot':{"$exists": True}}).limit(10):
texts.append(doc['plot'])
data = pd.DataFrame({'text': texts})
display(data)
text | |
---|---|
0 | A group of bandits stage a brazen train hold-u... |
1 | A greedy tycoon decides, on a whim, to corner ... |
2 | Cartoon figures announce, via comic strip ball... |
3 | A woman, with the aid of her police officer sw... |
4 | The cartoonist, Winsor McCay, brings the Dinos... |
5 | Original advertising for the film describes it... |
6 | Young Pauline is left a lot of money when her ... |
7 | An immigrant leaves his sweetheart in Italy to... |
8 | At 10 years old, Owens becomes a ragged orphan... |
9 | Christ takes on the form of a pacifist count t... |
We submit an inference request with our data and get the new embedding values from each submission.
result = pipeline.infer(data, timeout=10000)
result
time | in.text | out.embedding | anomaly.count | |
---|---|---|---|---|
0 | 2024-06-26 20:15:35.259 | A group of bandits stage a brazen train hold-u... | [-0.027950192, -0.054571882, -0.002392033, 0.0... | 0 |
1 | 2024-06-26 20:15:35.259 | A greedy tycoon decides, on a whim, to corner ... | [-0.071634166, -0.0073989113, -0.025931077, -0... | 0 |
2 | 2024-06-26 20:15:35.259 | Cartoon figures announce, via comic strip ball... | [-0.00864067, -0.020116393, 0.035886534, -0.00... | 0 |
3 | 2024-06-26 20:15:35.259 | A woman, with the aid of her police officer sw... | [-0.06523778, -0.09331782, -0.02681339, -0.007... | 0 |
4 | 2024-06-26 20:15:35.259 | The cartoonist, Winsor McCay, brings the Dinos... | [-0.07010095, -0.035720695, -0.03118671, 0.026... | 0 |
5 | 2024-06-26 20:15:35.259 | Original advertising for the film describes it... | [-0.02530954, 0.012174658, -0.016730076, -0.00... | 0 |
6 | 2024-06-26 20:15:35.259 | Young Pauline is left a lot of money when her ... | [-0.03885297, -0.018563386, 0.010222761, -0.00... | 0 |
7 | 2024-06-26 20:15:35.259 | An immigrant leaves his sweetheart in Italy to... | [-0.07279091, -0.050980825, 0.029236948, 0.016... | 0 |
8 | 2024-06-26 20:15:35.259 | At 10 years old, Owens becomes a ragged orphan... | [-0.10594661, 0.0073492057, -0.0008419599, -0.... | 0 |
9 | 2024-06-26 20:15:35.259 | Christ takes on the form of a pacifist count t... | [-0.04602558, -0.013552995, 0.01844381, -0.022... | 0 |
Inference Automation Embedding Generation
This step demonstrates using Wallaroo Inference Automation to generate the embeddings and store the results in our vector database either as a single task, or as a repeated task that scans the database and generates new embeddings on a regular schedule.
Inference Automation Script
The BGE Inference Automation contains the following items:
- main.py: A Python script that uses the Wallaroo connection defined in the step Vector Database Connection with Wallaroo to retrieve the movie plot information, runs the text through the BGE model, then uploads the embeddings into the vector database.
- requirements.txt: A list of the Python libraries required for the main.py script to execute, which includes pymongo==4.7.3.
Before executing the embedding script, the documents in the collection don't contain embedding values. The following is an example of the collection pre-embedding:
title | plot | plot_embedding_hf |
---|---|---|
The Great Train Robbery | A group of bandits stage a brazen train hold-u… | |
A Corner in Wheat | A greedy tycoon decides, on a whim, to corner … | |
Little Nemo | Cartoon figures announce, via comic strip ball… | |
Traffic in Souls | A woman, with the aid of her police officer sw… | |
Gertie the Dinosaur | The cartoonist, Winsor McCay, brings the Dinos… | |
In the Land of the Head Hunters | Original advertising for the film describes it… | |
The Perils of Pauline | Young Pauline is left a lot of money when her … | |
The Italian | An immigrant leaves his sweetheart in Italy to… | |
The Regeneration | At 10 years old, Owens becomes a ragged orphan… | |
Civilization | Christ takes on the form of a pacifist count t… |
The following is a snippet from the main.py script showing the database connection, running inference on the plot text to create the embeddings, and uploading the embeddings into the vector database.
for doc in collection.find({'plot': {"$exists": True}}):
    myquery = {'plot': doc['plot']}  # match the document by its plot text
    data = pd.DataFrame({'text': [doc['plot']]})  # wrap the plot text in a pandas DataFrame
    embedding = pipeline.infer(data)['out.embedding'][0]  # run inference to create the embedding
    update = {'$set': {'plot_embedding_hf': list(embedding)}}  # store the embedding as a plain list
    collection.update_one(myquery, update)  # update the vector database with the new embedding
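For larger collections, inferring one document at a time is slow. A batched variant (a sketch under the assumption that pipeline.infer preserves row order; the helper name is ours) pairs each document's _id with its embedding so a single bulk_write call updates the collection:

```python
# Sketch: pair each document _id with its computed embedding as
# (filter, update) tuples ready for a pymongo bulk write.
# The function name is an assumption, not part of the tutorial's main.py.
def build_updates(docs, embeddings, field="plot_embedding_hf"):
    return [
        ({"_id": doc["_id"]}, {"$set": {field: emb}})
        for doc, emb in zip(docs, embeddings)
    ]

# Usage sketch against the tutorial's objects (not executed here):
# docs = list(collection.find({'plot': {"$exists": True}}))
# data = pd.DataFrame({'text': [d['plot'] for d in docs]})
# embeddings = [list(e) for e in pipeline.infer(data, timeout=10000)['out.embedding']]
# from pymongo import UpdateOne
# collection.bulk_write([UpdateOne(f, u) for f, u in build_updates(docs, embeddings)])
```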
The following is an example of database post-embedding:
title | plot | plot_embedding_hf |
---|---|---|
The Great Train Robbery | A group of bandits stage a brazen train hold-u… | [-0.027950192, -0.054571882, -0.002392033, 0.0…] |
A Corner in Wheat | A greedy tycoon decides, on a whim, to corner … | [-0.071634166, -0.0073989113, -0.025931077, -0…] |
Little Nemo | Cartoon figures announce, via comic strip ball… | [-0.00864067, -0.020116393, 0.035886534, -0.00…] |
Traffic in Souls | A woman, with the aid of her police officer sw… | [-0.06523778, -0.09331782, -0.02681339, -0.007…] |
Gertie the Dinosaur | The cartoonist, Winsor McCay, brings the Dinos… | [-0.07010095, -0.035720695, -0.03118671, 0.026…] |
In the Land of the Head Hunters | Original advertising for the film describes it… | [-0.02530954, 0.012174658, -0.016730076, -0.00…] |
The Perils of Pauline | Young Pauline is left a lot of money when her … | [-0.03885297, -0.018563386, 0.010222761, -0.00…] |
The Italian | An immigrant leaves his sweetheart in Italy to… | [-0.07279091, -0.050980825, 0.029236948, 0.016…] |
The Regeneration | At 10 years old, Owens becomes a ragged orphan… | [-0.10594661, 0.0073492057, -0.0008419599, -0….] |
Civilization | Christ takes on the form of a pacifist count t… | [-0.04602558, -0.013552995, 0.01844381, -0.022…] |
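Once plot_embedding_hf is populated, the embeddings can serve RAG retrieval through Atlas Vector Search. The following builds a $vectorSearch aggregation stage as a plain dictionary; a sketch only, where the index name plot_index is an assumption and must match a vector search index created separately in Atlas:

```python
# Sketch: build a MongoDB Atlas $vectorSearch aggregation stage.
# "plot_index" is a placeholder; it must match an index defined in Atlas.
def vector_search_stage(query_vector, index="plot_index",
                        path="plot_embedding_hf", k=5):
    return {
        "$vectorSearch": {
            "index": index,
            "path": path,
            "queryVector": query_vector,
            "numCandidates": k * 10,  # candidate pool scanned before ranking
            "limit": k,               # number of matches returned
        }
    }

# Usage sketch (not executed here): embed the user question with the same
# BGE pipeline, then run collection.aggregate([vector_search_stage(vec)]).
```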
Upload the Inference Automation as Orchestration
To generate the inference automation, we zip all of the components:
- The Python script that executes the task with the supplied arguments.
- A requirements.txt file that sets what additional Python libraries to install.
files_to_include = [
'orchestration/main.py', # execution script
'orchestration/requirements.txt' # required if you have additional package dependencies beyond what's included in wallaroo environment
]
zipfile_name = 'orchestration.zip'
with zipfile.ZipFile(zipfile_name, mode='w') as archive:
for filename in files_to_include:
archive.write(filename, filename.split('/')[-1])
# verify the contents
with zipfile.ZipFile(zipfile_name, mode='r') as archive:
archive.printdir()
File Name Modified Size
main.py 2024-06-22 20:40:32 1725
requirements.txt 2024-06-22 20:10:28 42
With the inference automation stored in our .zip file, we upload it. Once its status is ready, we can generate new tasks from the inference automation.
Next we can upload our orchestration:
orchestration = wl.upload_orchestration(name='automated-embedding-generation', path='orchestration.zip')
while orchestration.status() != 'ready':
print(orchestration.status())
time.sleep(15)
pending_packaging
packaging
packaging
packaging
BGE Embedding Run Once Task
With our inference automation uploaded and prepared, we can create two types of tasks:
- Run Once Task: Parameters are passed to the inference automation, which generates one execution of the script.
- Run Scheduled Task: The parameters and a schedule are passed; a new task is generated from the inference automation every time the cron schedule is met.
For more details, see Inference Automation: Task Methods.
The following generates a Run Once task, specifying the Wallaroo Connection, pipeline, and workspace. The write_db parameter indicates whether to write the new embeddings to the database or just retrieve the data and run it through the embedding generation process.
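Inside main.py, the write_db argument plausibly gates that final write. A hypothetical sketch of such a gate (the function and flow are assumptions; the source does not show this part of the script):

```python
# Hypothetical: build the update either way, but only write it to the
# database when write_db is True (a dry-run mode otherwise).
# Not taken from the tutorial's main.py.
def apply_embedding(collection, doc, embedding, write_db):
    update = {"$set": {"plot_embedding_hf": embedding}}
    if write_db:
        collection.update_one({"_id": doc["_id"]}, update)
    return update
```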
task = orchestration.run_once(name = 'sample embedding generation',
json_args = {'connection_name': 'mongodb_atlas',
'pipeline_name': 'byop-bge-pipe-base-v2',
'workspace_name': 'embedding-computation',
'write_db': False
})
task
Field | Value |
---|---|
ID | cd125107-7663-40a7-a1e2-b41025288559 |
Name | sample embedding generation |
Last Run Status | failure |
Type | Temporary Run |
Active | True |
Schedule | - |
Created At | 2024-22-Jun 20:37:49 |
Updated At | 2024-22-Jun 20:37:55 |
The following generates the Run Scheduled version of the same task, set to execute daily at 12:01 AM. This allows new embeddings to be generated as the database is updated.
task = orchestration.run_scheduled(name = 'sample embedding generation',
schedule = '1 0 * * *',
json_args = {'connection_name': 'mongodb_atlas',
'pipeline_name': 'byop-bge-pipe-base-v2',
'workspace_name': 'embedding-computation',
'write_db': False
})
With the tutorial complete, we undeploy the pipeline and return the resources to the cluster.
pipeline.undeploy()