Vector Q&A
Why is a vector database used for RAG?
Vector databases
Vector databases are a key part of building scalable AI-powered applications: they provide long-term memory on top of an existing machine learning model.
Without a vector database, you would need to train your model (or models) or re-run your dataset through a model before making a query, which would be slow and expensive.
Why is a vector database useful?
A vector database determines what other data (represented as vectors) is near your input query. This allows you to build different use-cases on top of a vector database, including:
- Semantic search, used to return results similar to the input of the query.
- Classification, used to return the grouping (or groupings) closest to the input query.
- Recommendation engines, used to return content similar to the input based on different criteria (for example, previous product sales or user history).
- Anomaly detection, used to identify whether specific data points are similar to existing data, or different.
Vector databases can also power Retrieval Augmented Generation (RAG) tasks, which allow you to bring additional context to LLMs (Large Language Models) by using the context from a vector search to augment the user prompt.
Vector search
In a traditional vector search use-case, queries are made against a vector database by passing it a query vector, and having the vector database return a configurable list of vectors with the shortest distance (“most similar”) to the query vector.
The step-by-step workflow resembles the following:
1. A developer converts their existing dataset (documentation, images, logs stored in R2) into a set of vector embeddings (a one-way representation) by passing it through a machine learning model that is trained for that data type.
2. The output embeddings are inserted into a Vectorize database index.
3. A search query, classification request, or anomaly detection query is also passed through the same ML model, returning a vector embedding representation of the query.
4. Vectorize is queried with this embedding and returns a set of the most similar vector embeddings to the provided query.
5. The returned embeddings are used to retrieve the original source objects from dedicated storage (for example, R2, KV, or D1) and return them to the user.
In a workflow without a vector database, you would need to pass your entire dataset alongside your query each time, which is neither practical (models have limits on input size) nor efficient, as it would consume significant resources and time.
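The following is a minimal in-memory sketch of that workflow, assuming toy 3-dimensional vectors in place of real model embeddings and a plain Python dictionary in place of a Vectorize index (illustrative only, not the Vectorize API):

import numpy as np
# Toy stand-in embeddings; in practice these come from an embedding model.
corpus = {
    "doc-1: story about an orange cloud": np.array([0.9, 0.1, 0.0]),
    "doc-2: guide to AI careers": np.array([0.1, 0.9, 0.2]),
    "doc-3: dog training tips": np.array([0.0, 0.2, 0.9]),
}
def top_k(query_vec, corpus, k=2):
    # Rank stored vectors by cosine similarity to the query vector.
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sorted(((cos(query_vec, v), doc_id) for doc_id, v in corpus.items()), reverse=True)[:k]
# The query is embedded with the same model, then matched against the index;
# the returned ids are then used to fetch the original objects from storage.
query_vec = np.array([0.2, 0.85, 0.1])  # e.g. the embedding of "how do I plan an AI career?"
print(top_k(query_vec, corpus))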
Retrieval Augmented Generation
Retrieval Augmented Generation (RAG) is an approach used to improve the context provided to an LLM (Large Language Model) in generative AI use-cases, including chatbots and general question-answering applications. The vector database is used to enhance the prompt passed to the LLM by adding additional context alongside the query.
Instead of passing the prompt directly to the LLM, in the RAG approach you:
1. Generate vector embeddings from an existing dataset or corpus (for example, the dataset you want to use to add additional context to the LLM's response). This could be product documentation, research data, technical specifications, or your product catalog and descriptions.
2. Store the output embeddings in a Vectorize database index.
3. When a user initiates a prompt, instead of passing it (without additional context) to the LLM, augment it with additional context:
- The user prompt is passed through the same ML model used for your dataset, returning a vector embedding representation of the query.
- This embedding is used as the query (semantic search) against the vector database, which returns similar vectors.
- These vectors are used to look up the content they relate to (if that content is not stored directly alongside the vectors as metadata).
- This content is provided as context alongside the original user prompt, giving the LLM additional context and allowing it to return an answer that is likely to be far more contextual than one from the standalone prompt.
Refer to the RAG using Workers AI tutorial to learn how to combine Workers AI and Vectorize for generative AI use-cases.
You can learn more about the theory behind RAG by reading the RAG paper.
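As a rough sketch of that flow (the embed, vector_search, and llm_complete names below are hypothetical placeholders for your embedding model, vector database query, and LLM call, not a specific API):

def answer_with_rag(user_prompt, embed, vector_search, llm_complete, top_k=3):
    # 1. Embed the user prompt with the same model used for the dataset.
    query_vec = embed(user_prompt)
    # 2. Semantic search: fetch the most similar stored chunks and their text.
    matches = vector_search(query_vec, top_k=top_k)  # e.g. [(score, chunk_text), ...]
    context = "\n\n".join(chunk for _, chunk in matches)
    # 3. Augment the original prompt with the retrieved context before calling the LLM.
    augmented_prompt = (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {user_prompt}"
    )
    return llm_complete(augmented_prompt)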
Terminology
Databases and indexes
In Vectorize, a database and an index are the same concept. Each index you create is separate from other indexes you create. Vectorize automatically manages optimizing and re-generating the index for you when you insert new data.
Vector Embeddings
Vector embeddings represent the features of a machine learning model as a numerical vector (array of numbers). They are a one-way representation that encodes how a machine learning model understands the input(s) provided to it, based on how the model was originally trained and its internal structure.
For example, a text embedding model available in Workers AI is able to take text input and represent it as a 768-dimension vector. The text This is a story about an orange cloud, when represented as a vector embedding, resembles the following:
[-0.019273685291409492,-0.01913292706012726,<764 dimensions here>,0.0007094172760844231,0.043409910053014755]
When a model considers the features of an input as “similar” (based on its understanding), the distance between the vector embeddings for those two inputs will be short.
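For illustration only (toy 4-dimensional vectors rather than real 768-dimensional embeddings): two inputs the model treats as similar map to nearby vectors, while an unrelated input maps to a distant one.

import numpy as np
orange_cloud = np.array([0.02, -0.01, 0.70, 0.10])
grey_cloud   = np.array([0.03, -0.02, 0.68, 0.12])   # similar text, nearby vector
stock_report = np.array([-0.60, 0.40, 0.05, -0.30])  # unrelated text, distant vector
print(np.linalg.norm(orange_cloud - grey_cloud))    # small distance
print(np.linalg.norm(orange_cloud - stock_report))  # large distance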
Dimensions
Vector dimensions describe the width of a vector embedding: the number of floating-point elements that comprise a given vector.
The number of dimensions is defined by the machine learning model used to generate the vector embeddings, and by how it represents input features based on its internal model and complexity. More dimensions ("wider" vectors) may provide more accuracy at the cost of compute and memory resources, as well as the latency (speed) of vector search.
Refer to the dimensions documentation to learn how to configure the accepted vector dimension size when creating a Vectorize index.
Distance metrics
The distance metric is configured per index and defines how the index determines how close your query vector is to other vectors within the index.
Distance metrics determine how the vector search engine assesses similarity between vectors. Cosine, Euclidean (L2), and dot product are the most commonly used distance metrics in vector search. The machine learning model and the type of embedding you use will determine which distance metric is best suited for your use-case. Different metrics have different scoring characteristics: for example, the cosine distance metric is well suited to text, sentence similarity, and/or document search use-cases, while euclidean can be better suited to image or speech recognition use-cases. Refer to the distance metrics documentation to learn how to configure a distance metric when creating a Vectorize index.
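The following numpy sketch (illustrative only, not Vectorize's implementation) shows how the three common metrics score the same pair of vectors:

import numpy as np
def cosine_distance(a, b):
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
def euclidean_distance(a, b):
    return float(np.linalg.norm(a - b))
def dot_product_score(a, b):
    # Higher means more similar (a similarity score, not a true distance).
    return float(a @ b)
q = np.array([0.2, 0.8, 0.1])
v = np.array([0.1, 0.9, 0.0])
print(cosine_distance(q, v), euclidean_distance(q, v), dot_product_score(q, v))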
Lesson 1: Advanced RAG Pipeline
1. Importing Libraries and Loading Documents
- The code begins by importing necessary libraries and setting up the OpenAI API key.
- It loads a PDF document, extracts its text, and prints some information about the loaded documents.
2. Basic RAG Pipeline
- A basic RAG pipeline is set up using the llama_index library. It involves creating a Document object, an OpenAI language model (LLM), and a VectorStoreIndex.
- A query is performed on the index using the query engine.
3. Evaluation Setup using TruLens
- TruLens, a tool for evaluating retrieval-based models, is introduced.
- A list of evaluation questions is read from a file, and a Tru object is created for evaluation.
4. TruLens Recording and Dashboard
- TruLens recording and dashboard functionalities are utilized to evaluate the basic RAG pipeline.
- The code demonstrates the use of helper functions to set up TruLens recording.
5. Advanced RAG Pipeline - Sentence Window Retrieval
- A more advanced RAG pipeline is introduced, focusing on sentence window retrieval.
- A custom index and query engine are built for sentence-level retrieval.
- TruLens recording and evaluation are performed for the advanced pipeline.
6. Advanced RAG Pipeline - Auto-Merging Retrieval
- Another aspect of the advanced pipeline is auto-merging retrieval.
- A custom index and query engine are built for merging hierarchical document structures.
- TruLens recording and evaluation are performed for this aspect of the pipeline.
Lesson 2: RAG Triad of Metrics
1. Importing Libraries and Setting up TruLens
- The code begins by importing necessary libraries, including TruLens.
- TruLens is initialized, and a list of evaluation questions is read from a file.
2. Setting up RAG Metrics - Answer Relevance
- Feedback functions are introduced for evaluating answer relevance.
- The code sets up a feedback function for answer relevance using TruLens.
3. Setting up RAG Metrics - Context Relevance
- Feedback functions are introduced for evaluating context relevance.
- The code sets up feedback functions for context relevance using TruLens.
4. Setting up RAG Metrics - Groundedness
- Feedback functions are introduced for evaluating groundedness.
- The code sets up a feedback function for groundedness using TruLens.
5. Evaluation of the RAG Application
- TruLens is used to record and evaluate the RAG application using the defined metrics.
- Evaluation questions are queried, and the results, including feedback, are displayed.
Lesson 3: Sentence Window Retrieval
1. Importing Libraries and Loading Documents
- Similar to Lesson 1, the code starts by importing libraries and loading documents.
2. Window-Sentence Retrieval Setup
- The code demonstrates the setup for window-sentence retrieval, explaining how a sentence window node parser is used.
3. Building the Index
- The code builds an index for sentence window retrieval using a custom node parser.
4. Putting it all Together
- Helper functions are provided for building the sentence window index and getting the query engine.
- The overall process of building the index, setting up postprocessors, and running queries is explained.
5. TruLens Evaluation
- TruLens is used for the evaluation of the sentence window retrieval system.
- The code sets up questions for evaluation and runs the evaluations using TruLens.
Lesson 4: Auto-Merging Retrieval
1. Importing Libraries and Loading Documents
- Similar to the previous lessons, the code starts by importing libraries and loading documents.
2. Auto-Merging Retrieval Setup
- The code introduces the setup for auto-merging retrieval using a hierarchical node parser.
- It builds an index for auto-merging retrieval and explains the hierarchical structure.
3. Defining the Retriever and Running the Query Engine
- A retriever is defined for auto-merging retrieval, and a query engine is set up for querying questions.
- The code uses a SentenceTransformerRerank for post-processing.
4. Putting it all Together
- Helper functions are provided for building the auto-merging index and getting the query engine.
- The overall process of building the index, defining the retriever, and running queries is explained.
5. TruLens Evaluation
- TruLens is used for the evaluation of the auto-merging retrieval system.
- The code sets up questions for evaluation and runs the evaluations using TruLens.
This breakdown provides a high-level overview of the code for each lesson.
import utils
import os
import openai
openai.api_key = utils.get_openai_api_key()
from llama_index import SimpleDirectoryReader
documents = SimpleDirectoryReader(
input_files=["./eBook-How-to-Build-a-Career-in-AI.pdf"]
).load_data()
print(type(documents), "\n")
print(len(documents), "\n")
print(type(documents[0]))
print(documents[0])
# Basic RAG pipeline
from llama_index import Document
document = Document(text="\n\n".join([doc.text for doc in documents]))
from llama_index import VectorStoreIndex
from llama_index import ServiceContext
from llama_index.llms import OpenAI
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
service_context = ServiceContext.from_defaults(
llm=llm, embed_model="local:BAAI/bge-small-en-v1.5"
)
index = VectorStoreIndex.from_documents([document],
service_context=service_context)
query_engine = index.as_query_engine()
response = query_engine.query(
"What are steps to take when finding projects to build your experience?"
)
print(str(response))
Evaluation setup using TruLens
eval_questions = []
with open('eval_questions.txt', 'r') as file:
    for line in file:
        # Strip the trailing newline character
        item = line.strip()
        print(item)
        eval_questions.append(item)
# You can try your own question:
new_question = "What is the right AI job for me?"
eval_questions.append(new_question)
print(eval_questions)
from trulens_eval import Tru
tru = Tru()
tru.reset_database()
# For the classroom, we've written some of the code in helper functions inside a utils.py file.
# You can view the utils.py file in the file directory by clicking on the "Jupyter" logo at the top of the notebook.
# In later lessons, you'll get to work directly with the code that's currently wrapped inside these helper functions, to give you more options to customize your RAG pipeline.
from utils import get_prebuilt_trulens_recorder
tru_recorder = get_prebuilt_trulens_recorder(query_engine,
app_id="Direct Query Engine")
with tru_recorder as recording:
    for question in eval_questions:
        response = query_engine.query(question)
records, feedback = tru.get_records_and_feedback(app_ids=[])
records.head()
# launches on http://localhost:8501/
tru.run_dashboard()
# Advanced RAG pipeline
# 1. Sentence Window retrieval
from llama_index.llms import OpenAI
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
from utils import build_sentence_window_index
sentence_index = build_sentence_window_index(
document,
llm,
embed_model="local:BAAI/bge-small-en-v1.5",
save_dir="sentence_index"
)
from utils import get_sentence_window_query_engine
sentence_window_engine = get_sentence_window_query_engine(sentence_index)
window_response = sentence_window_engine.query(
"how do I get started on a personal project in AI?"
)
print(str(window_response))
tru.reset_database()
tru_recorder_sentence_window = get_prebuilt_trulens_recorder(
sentence_window_engine,
app_id = "Sentence Window Query Engine"
)
for question in eval_questions:
    with tru_recorder_sentence_window as recording:
        response = sentence_window_engine.query(question)
        print(question)
        print(str(response))
tru.get_leaderboard(app_ids=[])
# launches on http://localhost:8501/
tru.run_dashboard()
# 2. Auto-merging retrieval
from utils import build_automerging_index
automerging_index = build_automerging_index(
documents,
llm,
embed_model="local:BAAI/bge-small-en-v1.5",
save_dir="merging_index"
)
from utils import get_automerging_query_engine
automerging_query_engine = get_automerging_query_engine(
automerging_index,
)
auto_merging_response = automerging_query_engine.query(
"How do I build a portfolio of AI projects?"
)
print(str(auto_merging_response))
tru.reset_database()
tru_recorder_automerging = get_prebuilt_trulens_recorder(automerging_query_engine,
app_id="Automerging Query Engine")
for question in eval_questions:
    with tru_recorder_automerging as recording:
        response = automerging_query_engine.query(question)
        print(question)
        print(response)
tru.get_leaderboard(app_ids=[])
# launches on http://localhost:8501/
tru.run_dashboard()
Lesson 2: RAG Triad of Metrics
import warnings
warnings.filterwarnings('ignore')
import utils
import os
import openai
openai.api_key = utils.get_openai_api_key()
from trulens_eval import Tru
tru = Tru()
tru.reset_database()
from llama_index import SimpleDirectoryReader
documents = SimpleDirectoryReader(
input_files=["./eBook-How-to-Build-a-Career-in-AI.pdf"]
).load_data()
from llama_index import Document
document = Document(text="\n\n".\
join([doc.text for doc in documents]))
from utils import build_sentence_window_index
from llama_index.llms import OpenAI
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
sentence_index = build_sentence_window_index(
document,
llm,
embed_model="local:BAAI/bge-small-en-v1.5",
save_dir="sentence_index"
)
from utils import get_sentence_window_query_engine
sentence_window_engine = \
get_sentence_window_query_engine(sentence_index)
output = sentence_window_engine.query(
"How do you create your AI portfolio?")
output.response
# Feedback functions
import nest_asyncio
nest_asyncio.apply()
from trulens_eval import OpenAI as fOpenAI
provider = fOpenAI()
# 1. Answer Relevance
from trulens_eval import Feedback
f_qa_relevance = Feedback(
provider.relevance_with_cot_reasons,
name="Answer Relevance"
).on_input_output()
# 2. Context Relevance
from trulens_eval import TruLlama
context_selection = TruLlama.select_source_nodes().node.text
import numpy as np
f_qs_relevance = (
Feedback(provider.qs_relevance,
name="Context Relevance")
.on_input()
.on(context_selection)
.aggregate(np.mean)
)
import numpy as np
f_qs_relevance = (
Feedback(provider.qs_relevance_with_cot_reasons,
name="Context Relevance")
.on_input()
.on(context_selection)
.aggregate(np.mean)
)
# 3. Groundedness
from trulens_eval.feedback import Groundedness
grounded = Groundedness(groundedness_provider=provider)
f_groundedness = (
Feedback(grounded.groundedness_measure_with_cot_reasons,
name="Groundedness"
)
.on(context_selection)
.on_output()
.aggregate(grounded.grounded_statements_aggregator)
)
# Evaluation of the RAG application
from trulens_eval import TruLlama
from trulens_eval import FeedbackMode
tru_recorder = TruLlama(
sentence_window_engine,
app_id="App_1",
feedbacks=[
f_qa_relevance,
f_qs_relevance,
f_groundedness
]
)
eval_questions = []
with open('eval_questions.txt', 'r') as file:
    for line in file:
        # Strip the trailing newline character
        item = line.strip()
        eval_questions.append(item)
eval_questions
eval_questions.append("How can I be successful in AI?")
eval_questions
for question in eval_questions:
    with tru_recorder as recording:
        sentence_window_engine.query(question)
records, feedback = tru.get_records_and_feedback(app_ids=[])
records.head()
import pandas as pd
pd.set_option("display.max_colwidth", None)
records[["input", "output"] + feedback]
tru.get_leaderboard(app_ids=[])
tru.run_dashboard()
Lesson 3: Sentence Window Retrieval
import warnings
warnings.filterwarnings('ignore')
import utils
import os
import openai
openai.api_key = utils.get_openai_api_key()
from llama_index import SimpleDirectoryReader
documents = SimpleDirectoryReader(
input_files=["./eBook-How-to-Build-a-Career-in-AI.pdf"]
).load_data()
print(type(documents), "\n")
print(len(documents), "\n")
print(type(documents[0]))
print(documents[0])
from llama_index import Document
document = Document(text="\n\n".join([doc.text for doc in documents]))
Window-sentence retrieval setup
from llama_index.node_parser import SentenceWindowNodeParser
# create the sentence window node parser w/ default settings
node_parser = SentenceWindowNodeParser.from_defaults(
window_size=3,
window_metadata_key="window",
original_text_metadata_key="original_text",
)
text = "hello. how are you? I am fine! "
nodes = node_parser.get_nodes_from_documents([Document(text=text)])
print([x.text for x in nodes])
print(nodes[1].metadata["window"])
text = "hello. foo bar. cat dog. mouse"
nodes = node_parser.get_nodes_from_documents([Document(text=text)])
print([x.text for x in nodes])
print(nodes[0].metadata["window"])
Building the index
from llama_index.llms import OpenAI
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
from llama_index import ServiceContext
sentence_context = ServiceContext.from_defaults(
llm=llm,
embed_model="local:BAAI/bge-small-en-v1.5",
# embed_model="local:BAAI/bge-large-en-v1.5"
node_parser=node_parser,
)
from llama_index import VectorStoreIndex
sentence_index = VectorStoreIndex.from_documents(
[document], service_context=sentence_context
)
sentence_index.storage_context.persist(persist_dir="./sentence_index")
# This block of code is optional: if an index already exists in the save
# directory it will be loaded; if not, it will be rebuilt
import os
from llama_index import VectorStoreIndex, StorageContext, load_index_from_storage
from llama_index import load_index_from_storage
if not os.path.exists("./sentence_index"):
sentence_index = VectorStoreIndex.from_documents(
[document], service_context=sentence_context
)
sentence_index.storage_context.persist(persist_dir="./sentence_index")
else:
sentence_index = load_index_from_storage(
StorageContext.from_defaults(persist_dir="./sentence_index"),
service_context=sentence_context
)
Building the postprocessor
from llama_index.indices.postprocessor import MetadataReplacementPostProcessor
postproc = MetadataReplacementPostProcessor(
target_metadata_key="window"
)
from llama_index.schema import NodeWithScore
from copy import deepcopy
scored_nodes = [NodeWithScore(node=x, score=1.0) for x in nodes]
nodes_old = [deepcopy(n) for n in nodes]
nodes_old[1].text
replaced_nodes = postproc.postprocess_nodes(scored_nodes)
print(replaced_nodes[1].text)
Adding a reranker
from llama_index.indices.postprocessor import SentenceTransformerRerank
# BAAI/bge-reranker-base
# link: https://huggingface.co/BAAI/bge-reranker-base
rerank = SentenceTransformerRerank(
top_n=2, model="BAAI/bge-reranker-base"
)
from llama_index import QueryBundle
from llama_index.schema import TextNode, NodeWithScore
query = QueryBundle("I want a dog.")
scored_nodes = [
NodeWithScore(node=TextNode(text="This is a cat"), score=0.6),
NodeWithScore(node=TextNode(text="This is a dog"), score=0.4),
]
reranked_nodes = rerank.postprocess_nodes(
scored_nodes, query_bundle=query
)
print([(x.text, x.score) for x in reranked_nodes])
Running the query engine
sentence_window_engine = sentence_index.as_query_engine(
similarity_top_k=6, node_postprocessors=[postproc, rerank]
)
window_response = sentence_window_engine.query(
"What are the keys to building a career in AI?"
)
from llama_index.response.notebook_utils import display_response
display_response(window_response)
Putting it all Together
import os
from llama_index import ServiceContext, VectorStoreIndex, StorageContext
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.indices.postprocessor import MetadataReplacementPostProcessor
from llama_index.indices.postprocessor import SentenceTransformerRerank
from llama_index import load_index_from_storage
def build_sentence_window_index(
    documents,
    llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    sentence_window_size=3,
    save_dir="sentence_index",
):
    # create the sentence window node parser w/ default settings
    node_parser = SentenceWindowNodeParser.from_defaults(
        window_size=sentence_window_size,
        window_metadata_key="window",
        original_text_metadata_key="original_text",
    )
    sentence_context = ServiceContext.from_defaults(
        llm=llm,
        embed_model=embed_model,
        node_parser=node_parser,
    )
    if not os.path.exists(save_dir):
        sentence_index = VectorStoreIndex.from_documents(
            documents, service_context=sentence_context
        )
        sentence_index.storage_context.persist(persist_dir=save_dir)
    else:
        sentence_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=save_dir),
            service_context=sentence_context,
        )
    return sentence_index
def get_sentence_window_query_engine(
    sentence_index, similarity_top_k=6, rerank_top_n=2
):
    # define postprocessors
    postproc = MetadataReplacementPostProcessor(target_metadata_key="window")
    rerank = SentenceTransformerRerank(
        top_n=rerank_top_n, model="BAAI/bge-reranker-base"
    )
    sentence_window_engine = sentence_index.as_query_engine(
        similarity_top_k=similarity_top_k, node_postprocessors=[postproc, rerank]
    )
    return sentence_window_engine
from llama_index.llms import OpenAI
index = build_sentence_window_index(
[document],
llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
save_dir="./sentence_index",
)
query_engine = get_sentence_window_query_engine(index, similarity_top_k=6)
TruLens Evaluation
eval_questions = []
with open('generated_questions.text', 'r') as file:
    for line in file:
        # Strip the trailing newline character
        item = line.strip()
        eval_questions.append(item)
from trulens_eval import Tru
def run_evals(eval_questions, tru_recorder, query_engine):
    for question in eval_questions:
        with tru_recorder as recording:
            response = query_engine.query(question)
from utils import get_prebuilt_trulens_recorder
from trulens_eval import Tru
Tru().reset_database()
Sentence window size = 1
sentence_index_1 = build_sentence_window_index(
documents,
llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
embed_model="local:BAAI/bge-small-en-v1.5",
sentence_window_size=1,
save_dir="sentence_index_1",
)
sentence_window_engine_1 = get_sentence_window_query_engine(
sentence_index_1
)
tru_recorder_1 = get_prebuilt_trulens_recorder(
sentence_window_engine_1,
app_id='sentence window engine 1'
)
run_evals(eval_questions, tru_recorder_1, sentence_window_engine_1)
Tru().run_dashboard()
Note about the dataset of questions
Since this evaluation process takes a long time to run, the following file generated_questions.text contains one question (the one mentioned in the lecture video).
If you would like to explore other possible questions, feel free to explore the file directory by clicking on the "Jupyter" logo at the top right of this notebook. You'll see the following .text files:
generated_questions_01_05.text
generated_questions_06_10.text
generated_questions_11_15.text
generated_questions_16_20.text
generated_questions_21_24.text
Note that running an evaluation on more than one question can take some time, so we recommend choosing one of these files (with 5 questions each) to run and explore the results.
For evaluating a personal project, an eval set of 20 is reasonable.
For evaluating business applications, you may need a set of 100+ in order to cover all the use cases thoroughly.
Note that since API calls can sometimes fail, you may occasionally see null responses, and would want to re-run your evaluations. So running your evaluations in smaller batches can also help you save time and cost by only re-running the evaluation on the batches with issues.
eval_questions = []
with open('generated_questions.text', 'r') as file:
    for line in file:
        # Strip the trailing newline character
        item = line.strip()
        eval_questions.append(item)
Sentence window size = 3
sentence_index_3 = build_sentence_window_index(
documents,
llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
embed_model="local:BAAI/bge-small-en-v1.5",
sentence_window_size=3,
save_dir="sentence_index_3",
)
sentence_window_engine_3 = get_sentence_window_query_engine(
sentence_index_3
)
tru_recorder_3 = get_prebuilt_trulens_recorder(
sentence_window_engine_3,
app_id='sentence window engine 3'
)
run_evals(eval_questions, tru_recorder_3, sentence_window_engine_3)
Tru().run_dashboard()
Lesson 4: Auto-merging Retrieval
import warnings
warnings.filterwarnings('ignore')
import utils
import os
import openai
openai.api_key = utils.get_openai_api_key()
from llama_index import SimpleDirectoryReader
documents = SimpleDirectoryReader(
input_files=["./eBook-How-to-Build-a-Career-in-AI.pdf"]
).load_data()
print(type(documents), "\n")
print(len(documents), "\n")
print(type(documents[0]))
print(documents[0])
Auto-merging retrieval setup
from llama_index import Document
document = Document(text="\n\n".join([doc.text for doc in documents]))
from llama_index.node_parser import HierarchicalNodeParser
# create the hierarchical node parser w/ default settings
node_parser = HierarchicalNodeParser.from_defaults(
chunk_sizes=[2048, 512, 128]
)
nodes = node_parser.get_nodes_from_documents([document])
from llama_index.node_parser import get_leaf_nodes
leaf_nodes = get_leaf_nodes(nodes)
print(leaf_nodes[30].text)
nodes_by_id = {node.node_id: node for node in nodes}
parent_node = nodes_by_id[leaf_nodes[30].parent_node.node_id]
print(parent_node.text)
Building the index
from llama_index.llms import OpenAI
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
from llama_index import ServiceContext
auto_merging_context = ServiceContext.from_defaults(
llm=llm,
embed_model="local:BAAI/bge-small-en-v1.5",
node_parser=node_parser,
)
from llama_index import VectorStoreIndex, StorageContext
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)
automerging_index = VectorStoreIndex(
leaf_nodes, storage_context=storage_context, service_context=auto_merging_context
)
automerging_index.storage_context.persist(persist_dir="./merging_index")
# This block of code is optional: if an index already exists in the save
# directory it will be loaded; if not, it will be rebuilt
import os
from llama_index import VectorStoreIndex, StorageContext, load_index_from_storage
from llama_index import load_index_from_storage
if not os.path.exists("./merging_index"):
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)
automerging_index = VectorStoreIndex(
leaf_nodes,
storage_context=storage_context,
service_context=auto_merging_context
)
automerging_index.storage_context.persist(persist_dir="./merging_index")
else:
automerging_index = load_index_from_storage(
StorageContext.from_defaults(persist_dir="./merging_index"),
service_context=auto_merging_context
)
Defining the retriever and running the query engine
from llama_index.indices.postprocessor import SentenceTransformerRerank
from llama_index.retrievers import AutoMergingRetriever
from llama_index.query_engine import RetrieverQueryEngine
automerging_retriever = automerging_index.as_retriever(
similarity_top_k=12
)
retriever = AutoMergingRetriever(
automerging_retriever,
automerging_index.storage_context,
verbose=True
)
rerank = SentenceTransformerRerank(top_n=6, model="BAAI/bge-reranker-base")
auto_merging_engine = RetrieverQueryEngine.from_args(
    retriever, node_postprocessors=[rerank]  # use the auto-merging retriever defined above
)
auto_merging_response = auto_merging_engine.query(
"What is the importance of networking in AI?"
)
from llama_index.response.notebook_utils import display_response
display_response(auto_merging_response)
Putting it all Together
import os
from llama_index import (
ServiceContext,
StorageContext,
VectorStoreIndex,
load_index_from_storage,
)
from llama_index.node_parser import HierarchicalNodeParser
from llama_index.node_parser import get_leaf_nodes
from llama_index import StorageContext, load_index_from_storage
from llama_index.retrievers import AutoMergingRetriever
from llama_index.indices.postprocessor import SentenceTransformerRerank
from llama_index.query_engine import RetrieverQueryEngine
def build_automerging_index(
    documents,
    llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    save_dir="merging_index",
    chunk_sizes=None,
):
    chunk_sizes = chunk_sizes or [2048, 512, 128]
    node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=chunk_sizes)
    nodes = node_parser.get_nodes_from_documents(documents)
    leaf_nodes = get_leaf_nodes(nodes)
    merging_context = ServiceContext.from_defaults(
        llm=llm,
        embed_model=embed_model,
    )
    storage_context = StorageContext.from_defaults()
    storage_context.docstore.add_documents(nodes)
    if not os.path.exists(save_dir):
        automerging_index = VectorStoreIndex(
            leaf_nodes, storage_context=storage_context, service_context=merging_context
        )
        automerging_index.storage_context.persist(persist_dir=save_dir)
    else:
        automerging_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=save_dir),
            service_context=merging_context,
        )
    return automerging_index
def get_automerging_query_engine(
    automerging_index,
    similarity_top_k=12,
    rerank_top_n=6,
):
    base_retriever = automerging_index.as_retriever(similarity_top_k=similarity_top_k)
    retriever = AutoMergingRetriever(
        base_retriever, automerging_index.storage_context, verbose=True
    )
    rerank = SentenceTransformerRerank(
        top_n=rerank_top_n, model="BAAI/bge-reranker-base"
    )
    auto_merging_engine = RetrieverQueryEngine.from_args(
        retriever, node_postprocessors=[rerank]
    )
    return auto_merging_engine
from llama_index.llms import OpenAI
index = build_automerging_index(
[document],
llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
save_dir="./merging_index",
)
query_engine = get_automerging_query_engine(index, similarity_top_k=6)
TruLens Evaluation
from trulens_eval import Tru
Tru().reset_database()
Two layers
auto_merging_index_0 = build_automerging_index(
documents,
llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
embed_model="local:BAAI/bge-small-en-v1.5",
save_dir="merging_index_0",
chunk_sizes=[2048,512],
)
auto_merging_engine_0 = get_automerging_query_engine(
auto_merging_index_0,
similarity_top_k=12,
rerank_top_n=6,
)
from utils import get_prebuilt_trulens_recorder
tru_recorder = get_prebuilt_trulens_recorder(
auto_merging_engine_0,
app_id ='app_0'
)
eval_questions = []
with open('generated_questions.text', 'r') as file:
    for line in file:
        # Strip the trailing newline character
        item = line.strip()
        eval_questions.append(item)
def run_evals(eval_questions, tru_recorder, query_engine):
    for question in eval_questions:
        with tru_recorder as recording:
            response = query_engine.query(question)
run_evals(eval_questions, tru_recorder, auto_merging_engine_0)
from trulens_eval import Tru
Tru().get_leaderboard(app_ids=[])
Tru().run_dashboard()
Three layers
auto_merging_index_1 = build_automerging_index(
documents,
llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
embed_model="local:BAAI/bge-small-en-v1.5",
save_dir="merging_index_1",
chunk_sizes=[2048,512,128],
)
auto_merging_engine_1 = get_automerging_query_engine(
auto_merging_index_1,
similarity_top_k=12,
rerank_top_n=6,
)
tru_recorder = get_prebuilt_trulens_recorder(
auto_merging_engine_1,
app_id ='app_1'
)
run_evals(eval_questions, tru_recorder, auto_merging_engine_1)
from trulens_eval import Tru
Tru().get_leaderboard(app_ids=[])
Tru().run_dashboard()