Vector Q&A
Why is a vector database used for RAG?
Vector databases
Vector databases are a key part of building scalable AI-powered applications: they provide long-term memory on top of an existing machine learning model.
Without a vector database, you would need to train your model (or models) or re-run your dataset through a model before making a query, which would be slow and expensive.
Why is a vector database useful?
A vector database determines what other data (represented as vectors) is near your input query. This allows you to build different use-cases on top of a vector database, including:
- Semantic search, used to return results similar to the input of the query.
- Classification, used to return the grouping (or groupings) closest to the input query.
- Recommendation engines, used to return content similar to the input based on different criteria (for example, previous product sales or user history).
- Anomaly detection, used to identify whether specific data points are similar to existing data, or different.
Vector databases can also power Retrieval Augmented Generation (RAG) tasks, which allow you to bring additional context to LLMs (Large Language Models) by using the context from a vector search to augment the user prompt.
Vector search
In a traditional vector search use-case, queries are made against a vector database by passing it a query vector, and having the vector database return a configurable list of vectors with the shortest distance (“most similar”) to the query vector.
The step-by-step workflow resembles the following:
1. A developer converts their existing dataset (documentation, images, logs stored in R2) into a set of vector embeddings (a one-way representation) by passing it through a machine learning model that is trained for that data type.
2. The output embeddings are inserted into a Vectorize database index.
3. A search query, classification request, or anomaly detection query is also passed through the same ML model, returning a vector embedding representation of the query.
4. Vectorize is queried with this embedding and returns a set of the most similar vector embeddings to the provided query.
5. The returned embeddings are used to retrieve the original source objects from dedicated storage (for example, R2, KV, or D1) and return them to the user.
In a workflow without a vector database, you would need to pass your entire dataset alongside your query each time, which is neither practical (models have limits on input size) nor efficient, as it would consume significant resources and time.
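The following is a minimal in-memory sketch of that workflow, assuming toy 3-dimensional vectors in place of real model embeddings and a plain Python dictionary in place of a Vectorize index (illustrative only, not the Vectorize API):

import numpy as np
# Toy stand-in embeddings; in practice these come from an embedding model.
corpus = {
    "doc-1: story about an orange cloud": np.array([0.9, 0.1, 0.0]),
    "doc-2: guide to AI careers": np.array([0.1, 0.9, 0.2]),
    "doc-3: dog training tips": np.array([0.0, 0.2, 0.9]),
}
def top_k(query_vec, corpus, k=2):
    # Rank stored vectors by cosine similarity to the query vector.
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sorted(((cos(query_vec, v), doc_id) for doc_id, v in corpus.items()), reverse=True)[:k]
# The query is embedded with the same model, then matched against the index;
# the returned ids are then used to fetch the original objects from storage.
query_vec = np.array([0.2, 0.85, 0.1])  # e.g. the embedding of "how do I plan an AI career?"
print(top_k(query_vec, corpus))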
Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) is an approach used to improve the context provided to an LLM (Large Language Model) in generative AI use-cases, including chatbots and general question-answering applications. The vector database is used to enhance the prompt passed to the LLM by adding additional context alongside the query.
Instead of passing the prompt directly to the LLM, in the RAG approach you:
1. Generate vector embeddings from an existing dataset or corpus (for example, the dataset you want to use to add additional context to the LLM's response). This could be product documentation, research data, technical specifications, or your product catalog and descriptions.
2. Store the output embeddings in a Vectorize database index.
3. When a user initiates a prompt, instead of passing it (without additional context) to the LLM, augment it with additional context:
- The user prompt is passed through the same ML model used for your dataset, returning a vector embedding representation of the query.
- This embedding is used as the query (semantic search) against the vector database, which returns similar vectors.
- These vectors are used to look up the content they relate to (if that content is not stored directly alongside the vectors as metadata).
- This content is provided as context alongside the original user prompt, giving the LLM additional context and allowing it to return an answer that is likely to be far more contextual than one from the standalone prompt.
Refer to the RAG using Workers AI tutorial to learn how to combine Workers AI and Vectorize for generative AI use-cases.
You can learn more about the theory behind RAG by reading the RAG paper.
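As a rough sketch of that flow (the embed, vector_search, and llm_complete names below are hypothetical placeholders for your embedding model, vector database query, and LLM call, not a specific API):

def answer_with_rag(user_prompt, embed, vector_search, llm_complete, top_k=3):
    # 1. Embed the user prompt with the same model used for the dataset.
    query_vec = embed(user_prompt)
    # 2. Semantic search: fetch the most similar stored chunks and their text.
    matches = vector_search(query_vec, top_k=top_k)  # e.g. [(score, chunk_text), ...]
    context = "\n\n".join(chunk for _, chunk in matches)
    # 3. Augment the original prompt with the retrieved context before calling the LLM.
    augmented_prompt = (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {user_prompt}"
    )
    return llm_complete(augmented_prompt)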
Terminology
Databases and indexes
In Vectorize, a database and an index are the same concept. Each index you create is separate from other indexes you create. Vectorize automatically manages optimizing and re-generating the index for you when you insert new data.
Vector Embeddings
Vector embeddings represent the features of a machine learning model as a numerical vector (array of numbers). They are a one-way representation that encodes how a machine learning model understands the input(s) provided to it, based on how the model was originally trained and its internal structure.
For example, a text embedding model available in Workers AI is able to take text input and represent it as a 768-dimension vector. The text This is a story about an orange cloud, when represented as a vector embedding, resembles the following:
[-0.019273685291409492,-0.01913292706012726,<764 dimensions here>,0.0007094172760844231,0.043409910053014755]
When a model considers the features of an input as “similar” (based on its understanding), the distance between the vector embeddings for those two inputs will be short.
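For illustration only (toy 4-dimensional vectors rather than real 768-dimensional embeddings): two inputs the model treats as similar map to nearby vectors, while an unrelated input maps to a distant one.

import numpy as np
orange_cloud = np.array([0.02, -0.01, 0.70, 0.10])
grey_cloud   = np.array([0.03, -0.02, 0.68, 0.12])   # similar text, nearby vector
stock_report = np.array([-0.60, 0.40, 0.05, -0.30])  # unrelated text, distant vector
print(np.linalg.norm(orange_cloud - grey_cloud))    # small distance
print(np.linalg.norm(orange_cloud - stock_report))  # large distance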
Dimensions
Vector dimensions describe the width of a vector embedding: the number of floating-point elements that comprise a given vector.
The number of dimensions is defined by the machine learning model used to generate the vector embeddings, and by how it represents input features based on its internal model and complexity. More dimensions ("wider" vectors) may provide more accuracy at the cost of compute and memory resources, as well as the latency (speed) of vector search.
Refer to the dimensions documentation to learn how to configure the accepted vector dimension size when creating a Vectorize index.
Distance metrics
The distance metric is configured per index and defines how the index determines how close your query vector is to other vectors within the index.
Distance metrics determine how the vector search engine assesses similarity between vectors. Cosine, Euclidean (L2), and dot product are the most commonly used distance metrics in vector search. The machine learning model and the type of embedding you use will determine which distance metric is best suited for your use-case. Different metrics have different scoring characteristics: for example, the cosine distance metric is well suited to text, sentence similarity, and/or document search use-cases, while euclidean can be better suited to image or speech recognition use-cases. Refer to the distance metrics documentation to learn how to configure a distance metric when creating a Vectorize index.
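The following numpy sketch (illustrative only, not Vectorize's implementation) shows how the three common metrics score the same pair of vectors:

import numpy as np
def cosine_distance(a, b):
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
def euclidean_distance(a, b):
    return float(np.linalg.norm(a - b))
def dot_product_score(a, b):
    # Higher means more similar (a similarity score, not a true distance).
    return float(a @ b)
q = np.array([0.2, 0.8, 0.1])
v = np.array([0.1, 0.9, 0.0])
print(cosine_distance(q, v), euclidean_distance(q, v), dot_product_score(q, v))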
Lesson 1: Advanced RAG Pipeline
1. Importing Libraries and Loading Documents
- The code begins by importing necessary libraries and setting up the OpenAI API key.
- It loads a PDF document, extracts its text, and prints some information about the loaded documents.
2. Basic RAG Pipeline
- A basic RAG pipeline is set up using the llama_index library. It involves creating a Document object, an OpenAI language model (LLM), and a VectorStoreIndex.
- A query is performed on the index using the query engine.
3. Evaluation Setup using TruLens
- TruLens, a tool for evaluating retrieval-based models, is introduced.
- A list of evaluation questions is read from a file, and a Tru object is created for evaluation.
4. TruLens Recording and Dashboard
- TruLens recording and dashboard functionalities are utilized to evaluate the basic RAG pipeline.
- The code demonstrates the use of helper functions to set up TruLens recording.
5. Advanced RAG Pipeline - Sentence Window Retrieval
- A more advanced RAG pipeline is introduced, focusing on sentence window retrieval.
- A custom index and query engine are built for sentence-level retrieval.
- TruLens recording and evaluation are performed for the advanced pipeline.
6. Advanced RAG Pipeline - Auto-Merging Retrieval
- Another aspect of the advanced pipeline is auto-merging retrieval.
- A custom index and query engine are built for merging hierarchical document structures.
- TruLens recording and evaluation are performed for this aspect of the pipeline.
Lesson 2: RAG Triad of Metrics
1. Importing Libraries and Setting up TruLens
- The code begins by importing necessary libraries, including TruLens.
- TruLens is initialized, and a list of evaluation questions is read from a file.
2. Setting up RAG Metrics - Answer Relevance
- Feedback functions are introduced for evaluating answer relevance.
- The code sets up a feedback function for answer relevance using TruLens.
3. Setting up RAG Metrics - Context Relevance
- Feedback functions are introduced for evaluating context relevance.
- The code sets up feedback functions for context relevance using TruLens.
4. Setting up RAG Metrics - Groundedness
- Feedback functions are introduced for evaluating groundedness.
- The code sets up a feedback function for groundedness using TruLens.
5. Evaluation of the RAG Application
- TruLens is used to record and evaluate the RAG application using the defined metrics.
- Evaluation questions are queried, and the results, including feedback, are displayed.
Lesson 3: Sentence Window Retrieval
1. Importing Libraries and Loading Documents
- Similar to Lesson 1, the code starts by importing libraries and loading documents.
2. Window-Sentence Retrieval Setup
- The code demonstrates the setup for window-sentence retrieval, explaining how a sentence window node parser is used.
3. Building the Index
- The code builds an index for sentence window retrieval using a custom node parser.
4. Putting it all Together
- Helper functions are provided for building the sentence window index and getting the query engine.
- The overall process of building the index, setting up postprocessors, and running queries is explained.
5. TruLens Evaluation
- TruLens is used for the evaluation of the sentence window retrieval system.
- The code sets up questions for evaluation and runs the evaluations using TruLens.
Lesson 4: Auto-Merging Retrieval
1. Importing Libraries and Loading Documents
- Similar to the previous lessons, the code starts by importing libraries and loading documents.
2. Auto-Merging Retrieval Setup
- The code introduces the setup for auto-merging retrieval using a hierarchical node parser.
- It builds an index for auto-merging retrieval and explains the hierarchical structure.
3. Defining the Retriever and Running the Query Engine
- A retriever is defined for auto-merging retrieval, and a query engine is set up for querying questions.
- The code uses a SentenceTransformerRerank for post-processing.
4. Putting it all Together
- Helper functions are provided for building the auto-merging index and getting the query engine.
- The overall process of building the index, defining the retriever, and running queries is explained.
5. TruLens Evaluation
- TruLens is used for the evaluation of the auto-merging retrieval system.
- The code sets up questions for evaluation and runs the evaluations using TruLens.
This breakdown provides a high-level overview of the code for each lesson.
import utils
import os
import openai
openai.api_key = utils.get_openai_api_key()
from llama_index import SimpleDirectoryReader
documents = SimpleDirectoryReader(
input_files=["./eBook-How-to-Build-a-Career-in-AI.pdf"]
).load_data()
print(type(documents), "\n")
print(len(documents), "\n")
print(type(documents[0]))
print(documents[0])
# Basic RAG pipeline
from llama_index import Document
document = Document(text="\n\n".join([doc.text for doc in documents]))
from llama_index import VectorStoreIndex
from llama_index import ServiceContext
from llama_index.llms import OpenAI
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
service_context = ServiceContext.from_defaults(
llm=llm, embed_model="local:BAAI/bge-small-en-v1.5"
)
index = VectorStoreIndex.from_documents([document],
service_context=service_context)
query_engine = index.as_query_engine()
response = query_engine.query(
"What are steps to take when finding projects to build your experience?"
)
print(str(response))
Evaluation setup using TruLens
eval_questions = []
with open('eval_questions.txt', 'r') as file:
    for line in file:
        # Strip the trailing newline character
        item = line.strip()
        print(item)
        eval_questions.append(item)
# You can try your own question:
new_question = "What is the right AI job for me?"
eval_questions.append(new_question)
print(eval_questions)
from trulens_eval import Tru
tru = Tru()
tru.reset_database()
# For the classroom, we've written some of the code in helper functions inside a utils.py file.
# You can view the utils.py file in the file directory by clicking on the "Jupyter" logo at the top of the notebook.
# In later lessons, you'll get to work directly with the code that's currently wrapped inside these helper functions, to give you more options to customize your RAG pipeline.
from utils import get_prebuilt_trulens_recorder
tru_recorder = get_prebuilt_trulens_recorder(query_engine,
app_id="Direct Query Engine")
with tru_recorder as recording:
    for question in eval_questions:
        response = query_engine.query(question)
records, feedback = tru.get_records_and_feedback(app_ids=[])
records.head()
# launches on http://localhost:8501/
tru.run_dashboard()
# Advanced RAG pipeline
# 1. Sentence Window retrieval
from llama_index.llms import OpenAI
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
from utils import build_sentence_window_index
sentence_index = build_sentence_window_index(
document,
llm,
embed_model="local:BAAI/bge-small-en-v1.5",
save_dir="sentence_index"
)
from utils import get_sentence_window_query_engine
sentence_window_engine = get_sentence_window_query_engine(sentence_index)
window_response = sentence_window_engine.query(
"how do I get started on a personal project in AI?"
)
print(str(window_response))
tru.reset_database()
tru_recorder_sentence_window = get_prebuilt_trulens_recorder(
sentence_window_engine,
app_id = "Sentence Window Query Engine"
)
for question in eval_questions:
    with tru_recorder_sentence_window as recording:
        response = sentence_window_engine.query(question)
        print(question)
        print(str(response))
tru.get_leaderboard(app_ids=[])
# launches on http://localhost:8501/
tru.run_dashboard()
# 2. Auto-merging retrieval
from utils import build_automerging_index
automerging_index = build_automerging_index(
documents,
llm,
embed_model="local:BAAI/bge-small-en-v1.5",
save_dir="merging_index"
)
from utils import get_automerging_query_engine
automerging_query_engine = get_automerging_query_engine(
automerging_index,
)
auto_merging_response = automerging_query_engine.query(
"How do I build a portfolio of AI projects?"
)
print(str(auto_merging_response))
tru.reset_database()
tru_recorder_automerging = get_prebuilt_trulens_recorder(automerging_query_engine,
app_id="Automerging Query Engine")
for question in eval_questions:
    with tru_recorder_automerging as recording:
        response = automerging_query_engine.query(question)
        print(question)
        print(response)
tru.get_leaderboard(app_ids=[])
# launches on http://localhost:8501/
tru.run_dashboard()
Lesson 2: RAG Triad of Metrics
import warnings
warnings.filterwarnings('ignore')
import utils
import os
import openai
openai.api_key = utils.get_openai_api_key()
from trulens_eval import Tru
tru = Tru()
tru.reset_database()
from llama_index import SimpleDirectoryReader
documents = SimpleDirectoryReader(
input_files=["./eBook-How-to-Build-a-Career-in-AI.pdf"]
).load_data()
from llama_index import Document
document = Document(text="\n\n".\
join([doc.text for doc in documents]))
from utils import build_sentence_window_index
from llama_index.llms import OpenAI
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
sentence_index = build_sentence_window_index(
document,
llm,
embed_model="local:BAAI/bge-small-en-v1.5",
save_dir="sentence_index"
)
from utils import get_sentence_window_query_engine
sentence_window_engine = \
get_sentence_window_query_engine(sentence_index)
output = sentence_window_engine.query(
"How do you create your AI portfolio?")
output.response
# Feedback functions
import nest_asyncio
nest_asyncio.apply()
from trulens_eval import OpenAI as fOpenAI
provider = fOpenAI()
# 1. Answer Relevance
from trulens_eval import Feedback
f_qa_relevance = Feedback(
provider.relevance_with_cot_reasons,
name="Answer Relevance"
).on_input_output()
# 2. Context Relevance
from trulens_eval import TruLlama
context_selection = TruLlama.select_source_nodes().node.text
import numpy as np
f_qs_relevance = (
Feedback(provider.qs_relevance,
name="Context Relevance")
.on_input()
.on(context_selection)
.aggregate(np.mean)
)
import numpy as np
f_qs_relevance = (
Feedback(provider.qs_relevance_with_cot_reasons,
name="Context Relevance")
.on_input()
.on(context_selection)
.aggregate(np.mean)
)
# 3. Groundedness
from trulens_eval.feedback import Groundedness
grounded = Groundedness(groundedness_provider=provider)
f_groundedness = (
Feedback(grounded.groundedness_measure_with_cot_reasons,
name="Groundedness"
)
.on(context_selection)
.on_output()
.aggregate(grounded.grounded_statements_aggregator)
)
# Evaluation of the RAG application
from trulens_eval import TruLlama
from trulens_eval import FeedbackMode
tru_recorder = TruLlama(
sentence_window_engine,
app_id="App_1",
feedbacks=[
f_qa_relevance,
f_qs_relevance,
f_groundedness
]
)
eval_questions = []
with open('eval_questions.txt', 'r') as file:
    for line in file:
        # Strip the trailing newline character
        item = line.strip()
        eval_questions.append(item)
eval_questions
eval_questions.append("How can I be successful in AI?")
eval_questions
for question in eval_questions:
    with tru_recorder as recording:
        sentence_window_engine.query(question)
records, feedback = tru.get_records_and_feedback(app_ids=[])
records.head()
import pandas as pd
pd.set_option("display.max_colwidth", None)
records[["input", "output"] + feedback]
tru.get_leaderboard(app_ids=[])
tru.run_dashboard()
Lesson 3: Sentence Window Retrieval
import warnings
warnings.filterwarnings('ignore')
import utils
import os
import openai
openai.api_key = utils.get_openai_api_key()
from llama_index import SimpleDirectoryReader
documents = SimpleDirectoryReader(
input_files=["./eBook-How-to-Build-a-Career-in-AI.pdf"]
).load_data()
print(type(documents), "\n")
print(len(documents), "\n")
print(type(documents[0]))
print(documents[0])
from llama_index import Document
document = Document(text="\n\n".join([doc.text for doc in documents]))
Window-sentence retrieval setup
from llama_index.node_parser import SentenceWindowNodeParser
# create the sentence window node parser w/ default settings
node_parser = SentenceWindowNodeParser.from_defaults(
window_size=3,
window_metadata_key="window",
original_text_metadata_key="original_text",
)
text = "hello. how are you? I am fine! "
nodes = node_parser.get_nodes_from_documents([Document(text=text)])
print([x.text for x in nodes])
print(nodes[1].metadata["window"])
text = "hello. foo bar. cat dog. mouse"
nodes = node_parser.get_nodes_from_documents([Document(text=text)])
print([x.text for x in nodes])
print(nodes[0].metadata["window"])
Building the index
from llama_index.llms import OpenAI
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
from llama_index import ServiceContext
sentence_context = ServiceContext.from_defaults(
llm=llm,
embed_model="local:BAAI/bge-small-en-v1.5",
# embed_model="local:BAAI/bge-large-en-v1.5"
node_parser=node_parser,
)
from llama_index import VectorStoreIndex
sentence_index = VectorStoreIndex.from_documents(
[document], service_context=sentence_context
)
sentence_index.storage_context.persist(persist_dir="./sentence_index")
# This block of code is optional: if an index already exists in the save
# directory it will be loaded; if not, it will be rebuilt
import os
from llama_index import VectorStoreIndex, StorageContext, load_index_from_storage
from llama_index import load_index_from_storage
if not os.path.exists("./sentence_index"):
sentence_index = VectorStoreIndex.from_documents(
[document], service_context=sentence_context
)
sentence_index.storage_context.persist(persist_dir="./sentence_index")
else:
sentence_index = load_index_from_storage(
StorageContext.from_defaults(persist_dir="./sentence_index"),
service_context=sentence_context
)
Building the postprocessor
from llama_index.indices.postprocessor import MetadataReplacementPostProcessor
postproc = MetadataReplacementPostProcessor(
target_metadata_key="window"
)
from llama_index.schema import NodeWithScore
from copy import deepcopy
scored_nodes = [NodeWithScore(node=x, score=1.0) for x in nodes]
nodes_old = [deepcopy(n) for n in nodes]
nodes_old[1].text
replaced_nodes = postproc.postprocess_nodes(scored_nodes)
print(replaced_nodes[1].text)
Adding a reranker
from llama_index.indices.postprocessor import SentenceTransformerRerank
# BAAI/bge-reranker-base
# link: https://huggingface.co/BAAI/bge-reranker-base
rerank = SentenceTransformerRerank(
top_n=2, model="BAAI/bge-reranker-base"
)
from llama_index import QueryBundle
from llama_index.schema import TextNode, NodeWithScore
query = QueryBundle("I want a dog.")
scored_nodes = [
NodeWithScore(node=TextNode(text="This is a cat"), score=0.6),
NodeWithScore(node=TextNode(text="This is a dog"), score=0.4),
]
reranked_nodes = rerank.postprocess_nodes(
scored_nodes, query_bundle=query
)
print([(x.text, x.score) for x in reranked_nodes])
Running the query engine
sentence_window_engine = sentence_index.as_query_engine(
similarity_top_k=6, node_postprocessors=[postproc, rerank]
)
window_response = sentence_window_engine.query(
"What are the keys to building a career in AI?"
)
from llama_index.response.notebook_utils import display_response
display_response(window_response)
Putting it all Together
import os
from llama_index import ServiceContext, VectorStoreIndex, StorageContext
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.indices.postprocessor import MetadataReplacementPostProcessor
from llama_index.indices.postprocessor import SentenceTransformerRerank
from llama_index import load_index_from_storage
def build_sentence_window_index(
    documents,
    llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    sentence_window_size=3,
    save_dir="sentence_index",
):
    # create the sentence window node parser w/ default settings
    node_parser = SentenceWindowNodeParser.from_defaults(
        window_size=sentence_window_size,
        window_metadata_key="window",
        original_text_metadata_key="original_text",
    )
    sentence_context = ServiceContext.from_defaults(
        llm=llm,
        embed_model=embed_model,
        node_parser=node_parser,
    )
    if not os.path.exists(save_dir):
        sentence_index = VectorStoreIndex.from_documents(
            documents, service_context=sentence_context
        )
        sentence_index.storage_context.persist(persist_dir=save_dir)
    else:
        sentence_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=save_dir),
            service_context=sentence_context,
        )
    return sentence_index
def get_sentence_window_query_engine(
    sentence_index, similarity_top_k=6, rerank_top_n=2
):
    # define postprocessors
    postproc = MetadataReplacementPostProcessor(target_metadata_key="window")
    rerank = SentenceTransformerRerank(
        top_n=rerank_top_n, model="BAAI/bge-reranker-base"
    )
    sentence_window_engine = sentence_index.as_query_engine(
        similarity_top_k=similarity_top_k, node_postprocessors=[postproc, rerank]
    )
    return sentence_window_engine
from llama_index.llms import OpenAI
index = build_sentence_window_index(
[document],
llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
save_dir="./sentence_index",
)
query_engine = get_sentence_window_query_engine(index, similarity_top_k=6)
TruLens Evaluation
eval_questions = []
with open('generated_questions.text', 'r') as file:
    for line in file:
        # Strip the trailing newline character
        item = line.strip()
        eval_questions.append(item)
from trulens_eval import Tru
def run_evals(eval_questions, tru_recorder, query_engine):
    for question in eval_questions:
        with tru_recorder as recording:
            response = query_engine.query(question)
from utils import get_prebuilt_trulens_recorder
from trulens_eval import Tru
Tru().reset_database()
Sentence window size = 1
sentence_index_1 = build_sentence_window_index(
documents,
llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
embed_model="local:BAAI/bge-small-en-v1.5",
sentence_window_size=1,
save_dir="sentence_index_1",
)
sentence_window_engine_1 = get_sentence_window_query_engine(
sentence_index_1
)
tru_recorder_1 = get_prebuilt_trulens_recorder(
sentence_window_engine_1,
app_id='sentence window engine 1'
)
run_evals(eval_questions, tru_recorder_1, sentence_window_engine_1)
Tru().run_dashboard()
Note about the dataset of questions
Since this evaluation process takes a long time to run, the following file generated_questions.text contains one question (the one mentioned in the lecture video).
If you would like to explore other possible questions, feel free to explore the file directory by clicking on the "Jupyter" logo at the top right of this notebook. You'll see the following .text files:
generated_questions_01_05.text
generated_questions_06_10.text
generated_questions_11_15.text
generated_questions_16_20.text
generated_questions_21_24.text
Note that running an evaluation on more than one question can take some time, so we recommend choosing one of these files (with 5 questions each) to run and explore the results.
For evaluating a personal project, an eval set of 20 is reasonable.
For evaluating business applications, you may need a set of 100+ in order to cover all the use cases thoroughly.
Note that since API calls can sometimes fail, you may occasionally see null responses, and would want to re-run your evaluations. So running your evaluations in smaller batches can also help you save time and cost by only re-running the evaluation on the batches with issues.
eval_questions = []
with open('generated_questions.text', 'r') as file:
    for line in file:
        # Strip the trailing newline character
        item = line.strip()
        eval_questions.append(item)
Sentence window size = 3
sentence_index_3 = build_sentence_window_index(
documents,
llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
embed_model="local:BAAI/bge-small-en-v1.5",
sentence_window_size=3,
save_dir="sentence_index_3",
)
sentence_window_engine_3 = get_sentence_window_query_engine(
sentence_index_3
)
tru_recorder_3 = get_prebuilt_trulens_recorder(
sentence_window_engine_3,
app_id='sentence window engine 3'
)
run_evals(eval_questions, tru_recorder_3, sentence_window_engine_3)
Tru().run_dashboard()
Lesson 4: Auto-merging Retrieval
import warnings
warnings.filterwarnings('ignore')
import utils
import os
import openai
openai.api_key = utils.get_openai_api_key()
from llama_index import SimpleDirectoryReader
documents = SimpleDirectoryReader(
input_files=["./eBook-How-to-Build-a-Career-in-AI.pdf"]
).load_data()
print(type(documents), "\n")
print(len(documents), "\n")
print(type(documents[0]))
print(documents[0])
Auto-merging retrieval setup
from llama_index import Document
document = Document(text="\n\n".join([doc.text for doc in documents]))
from llama_index.node_parser import HierarchicalNodeParser
# create the hierarchical node parser w/ default settings
node_parser = HierarchicalNodeParser.from_defaults(
chunk_sizes=[2048, 512, 128]
)
nodes = node_parser.get_nodes_from_documents([document])
from llama_index.node_parser import get_leaf_nodes
leaf_nodes = get_leaf_nodes(nodes)
print(leaf_nodes[30].text)
nodes_by_id = {node.node_id: node for node in nodes}
parent_node = nodes_by_id[leaf_nodes[30].parent_node.node_id]
print(parent_node.text)
Building the index
from llama_index.llms import OpenAI
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
from llama_index import ServiceContext
auto_merging_context = ServiceContext.from_defaults(
llm=llm,
embed_model="local:BAAI/bge-small-en-v1.5",
node_parser=node_parser,
)
from llama_index import VectorStoreIndex, StorageContext
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)
automerging_index = VectorStoreIndex(
leaf_nodes, storage_context=storage_context, service_context=auto_merging_context
)
automerging_index.storage_context.persist(persist_dir="./merging_index")
# This block of code is optional: if an index already exists in the save
# directory it will be loaded; if not, it will be rebuilt
import os
from llama_index import VectorStoreIndex, StorageContext, load_index_from_storage
from llama_index import load_index_from_storage
if not os.path.exists("./merging_index"):
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)
automerging_index = VectorStoreIndex(
leaf_nodes,
storage_context=storage_context,
service_context=auto_merging_context
)
automerging_index.storage_context.persist(persist_dir="./merging_index")
else:
automerging_index = load_index_from_storage(
StorageContext.from_defaults(persist_dir="./merging_index"),
service_context=auto_merging_context
)
Defining the retriever and running the query engine
from llama_index.indices.postprocessor import SentenceTransformerRerank
from llama_index.retrievers import AutoMergingRetriever
from llama_index.query_engine import RetrieverQueryEngine
automerging_retriever = automerging_index.as_retriever(
similarity_top_k=12
)
retriever = AutoMergingRetriever(
automerging_retriever,
automerging_index.storage_context,
verbose=True
)
rerank = SentenceTransformerRerank(top_n=6, model="BAAI/bge-reranker-base")
auto_merging_engine = RetrieverQueryEngine.from_args(
    retriever, node_postprocessors=[rerank]  # use the auto-merging retriever defined above
)
auto_merging_response = auto_merging_engine.query(
"What is the importance of networking in AI?"
)
from llama_index.response.notebook_utils import display_response
display_response(auto_merging_response)
Putting it all Together
import os
from llama_index import (
ServiceContext,
StorageContext,
VectorStoreIndex,
load_index_from_storage,
)
from llama_index.node_parser import HierarchicalNodeParser
from llama_index.node_parser import get_leaf_nodes
from llama_index import StorageContext, load_index_from_storage
from llama_index.retrievers import AutoMergingRetriever
from llama_index.indices.postprocessor import SentenceTransformerRerank
from llama_index.query_engine import RetrieverQueryEngine
def build_automerging_index(
    documents,
    llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    save_dir="merging_index",
    chunk_sizes=None,
):
    chunk_sizes = chunk_sizes or [2048, 512, 128]
    node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=chunk_sizes)
    nodes = node_parser.get_nodes_from_documents(documents)
    leaf_nodes = get_leaf_nodes(nodes)
    merging_context = ServiceContext.from_defaults(
        llm=llm,
        embed_model=embed_model,
    )
    storage_context = StorageContext.from_defaults()
    storage_context.docstore.add_documents(nodes)
    if not os.path.exists(save_dir):
        automerging_index = VectorStoreIndex(
            leaf_nodes, storage_context=storage_context, service_context=merging_context
        )
        automerging_index.storage_context.persist(persist_dir=save_dir)
    else:
        automerging_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=save_dir),
            service_context=merging_context,
        )
    return automerging_index
def get_automerging_query_engine(
    automerging_index,
    similarity_top_k=12,
    rerank_top_n=6,
):
    base_retriever = automerging_index.as_retriever(similarity_top_k=similarity_top_k)
    retriever = AutoMergingRetriever(
        base_retriever, automerging_index.storage_context, verbose=True
    )
    rerank = SentenceTransformerRerank(
        top_n=rerank_top_n, model="BAAI/bge-reranker-base"
    )
    auto_merging_engine = RetrieverQueryEngine.from_args(
        retriever, node_postprocessors=[rerank]
    )
    return auto_merging_engine
from llama_index.llms import OpenAI
index = build_automerging_index(
[document],
llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
save_dir="./merging_index",
)
query_engine = get_automerging_query_engine(index, similarity_top_k=6)
TruLens Evaluation
from trulens_eval import Tru
Tru().reset_database()
Two layers
auto_merging_index_0 = build_automerging_index(
documents,
llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
embed_model="local:BAAI/bge-small-en-v1.5",
save_dir="merging_index_0",
chunk_sizes=[2048,512],
)
auto_merging_engine_0 = get_automerging_query_engine(
auto_merging_index_0,
similarity_top_k=12,
rerank_top_n=6,
)
from utils import get_prebuilt_trulens_recorder
tru_recorder = get_prebuilt_trulens_recorder(
auto_merging_engine_0,
app_id ='app_0'
)
eval_questions = []
with open('generated_questions.text', 'r') as file:
    for line in file:
        # Strip the trailing newline character
        item = line.strip()
        eval_questions.append(item)
def run_evals(eval_questions, tru_recorder, query_engine):
    for question in eval_questions:
        with tru_recorder as recording:
            response = query_engine.query(question)
run_evals(eval_questions, tru_recorder, auto_merging_engine_0)
from trulens_eval import Tru
Tru().get_leaderboard(app_ids=[])
Tru().run_dashboard()
Three layers
auto_merging_index_1 = build_automerging_index(
documents,
llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
embed_model="local:BAAI/bge-small-en-v1.5",
save_dir="merging_index_1",
chunk_sizes=[2048,512,128],
)
auto_merging_engine_1 = get_automerging_query_engine(
auto_merging_index_1,
similarity_top_k=12,
rerank_top_n=6,
)
tru_recorder = get_prebuilt_trulens_recorder(
auto_merging_engine_1,
app_id ='app_1'
)
run_evals(eval_questions, tru_recorder, auto_merging_engine_1)
from trulens_eval import Tru
Tru().get_leaderboard(app_ids=[])
Tru().run_dashboard()