From RAGs to Agentic RAGs


Large Language Models (LLMs) have revolutionized natural language processing, but they face a significant limitation: their fixed context length. This constraint makes it challenging for LLMs to generate accurate responses based on vast amounts of information beyond their immediate context window. Enter Retrieval-Augmented Generation (RAG), a groundbreaking solution that combines the power of information retrieval with the generative capabilities of LLMs.
Now, we’re witnessing the next leap forward: Agentic RAGs. These advanced systems not only retrieve and generate content but also actively interact with the retrieved information, enabling autonomous decision-making and contextual adaptability. In this article, we’ll explore the evolution from traditional RAG to agentic RAG, diving deep into their architectures, implementation, and the game-changing role of function calling in LLMs.
Before we delve into agentic RAGs, let’s revisit the fundamentals of traditional RAG systems:
- Document Processing: Long documents are broken into smaller, manageable chunks.
- Embedding: An embedding model converts these text chunks into vector representations (embeddings).
- Storage: The text chunks and their corresponding embeddings are stored in a vector database.
- Query Processing: When a user submits a query, the same embedding model converts it into a vector.
- Retrieval: Using a similarity function (e.g., cosine similarity), the system retrieves the most relevant text chunks from the vector database.
- Generation: The retrieved chunks, along with the original query, are passed to the LLM for processing and response generation.
This approach allows LLMs to access and utilize vast amounts of external information, significantly expanding their knowledge base beyond their training data.
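To make steps 4–6 concrete, here is a minimal numpy sketch of the retrieval step, assuming the chunk embeddings have already been computed; real systems, including the implementation later in this article, delegate this to a vector database such as FAISS.

import numpy as np

def retrieve_top_k(query_vec, chunk_vecs, chunks, k=3):
    """Return the k chunks whose embeddings are most cosine-similar to the query.

    chunk_vecs is an (n_chunks, dim) matrix of precomputed chunk embeddings;
    query_vec is the embedding of the user's query.
    """
    # Cosine similarity = dot product divided by the product of the norms
    sims = (chunk_vecs @ query_vec) / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    top = np.argsort(sims)[::-1][:k]  # indices of the k highest-scoring chunks
    return [chunks[i] for i in top]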
Agentic RAG takes this concept further by introducing an autonomous agent into the retrieval process. Here’s how it differs from traditional RAG:
- Query Optimization: Before interacting with the vector database, an agent leverages the function-calling capabilities of LLMs to refine or expand the user’s query intelligently.
- Iterative Retrieval: The agent can make multiple retrieval attempts, adjusting its approach based on the results of previous queries.
- Context-Aware Processing: The refined query and retrieved documents are passed to the LLM, resulting in more accurate and contextually aware responses.
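Sketched as a control loop, the difference looks something like this. The retrieve, is_sufficient, and refine_query functions below are hypothetical placeholders for the vector-database call and two LLM function calls; the concrete implementation follows later in this article.

def agentic_retrieve(question: str, max_steps: int = 5) -> list:
    # Hypothetical agent loop: the helper names here are placeholders, not a real API.
    query = question
    docs = []
    for _ in range(max_steps):
        docs = retrieve(query)                # query the vector database
        if is_sufficient(question, docs):     # LLM judges whether the docs answer the question
            break
        query = refine_query(question, docs)  # LLM reformulates the query and retries
    return docs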
In the context of AI and machine learning, an agent is an autonomous entity that can perceive its environment, process information, and take actions to achieve specific goals. Unlike a traditional program that follows a fixed set of instructions, an agent is designed to make decisions and act independently based on its inputs.
Agents can exhibit varying levels of intelligence, ranging from simple systems that respond to predefined triggers, to more complex systems capable of learning, adapting, and even reasoning about the future. When integrated into AI systems like RAG, agents don’t just passively retrieve or generate content but can actively engage with and manipulate the retrieved data, assess the relevance, and decide on the next course of action.
This concept of autonomy and decision-making is what underpins the evolution from standard RAG models to agentic RAGs.
Function calling is a crucial feature that enables LLMs to interact with external tools and APIs, significantly expanding their capabilities. In the context of agentic RAGs, function calling allows the agent to:
- Dynamically formulate and refine queries
- Interact with the vector database
- Process and analyze retrieved information
- Make decisions on whether to continue searching or generate a response
Think of function calling as giving the LLM a toolbox. Instead of trying to solve every problem with its internal knowledge, the LLM can now reach for specific tools (functions) to accomplish tasks more efficiently and accurately.
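To make this concrete, here is what a tool definition might look like in the OpenAI-style JSON-schema convention; other providers use slightly different formats, and the names below are illustrative rather than taken from a specific API.

retriever_function = {
    "name": "retriever",
    "description": "Retrieve documents from the knowledge base that are semantically closest to the query.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "A search query in affirmative form, not a question.",
            }
        },
        "required": ["query"],
    },
}
# The LLM never executes the function itself: it emits a call such as
# {"name": "retriever", "arguments": {"query": "..."}}, and the host
# application runs the function and feeds the result back to the model.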
The evolution from RAG to agentic RAG isn’t just a technical improvement; it has real-world implications:
- More Accurate Information: By intelligently refining queries and assessing results, agentic RAG can provide more precise and relevant information.
- Improved Context Understanding: The agent can better understand the nuances of your query and provide answers that truly address your needs.
- Reduced Hallucination: With better retrieval and context understanding, there’s less chance of the AI “making things up” or providing incorrect information.
- Efficiency: Agentic RAG tends to surface relevant documents more often than simple RAG, and it reduces user effort by iteratively refining the query until it finds a formulation that pulls the right documents from the vector database.
Let’s jump into the implementation of agentic RAGs. Before we begin, make sure you have Python 3.12 or later installed on your system.
1. Setting Up Environment Variables
The first step is to initialize the necessary environment variables, such as the Hugging Face API token used to access models via Hugging Face’s endpoints.
import os

from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores.utils import DistanceStrategy
from transformers.agents import Tool, HfEngine, ReactJsonAgent

os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_*******************"  # Use a key with "Write" permissions
Here, the Hugging Face API token is set as an environment variable so the Hugging Face client libraries can authenticate. In practice, avoid hardcoding the token in source code; load it from your environment or a secrets manager.
2. Extracting Text from PDF Documents
We use the PyPDF2 library to read and extract text from PDFs stored in a folder. This block of code loops through each PDF and appends its contents to the all_text variable.
import PyPDF2

folder_path = 'documents'
all_text = ''

# Loop over every PDF in the folder and concatenate the text of all pages
for filename in os.listdir(folder_path):
    if filename.endswith('.pdf'):
        with open(os.path.join(folder_path, filename), 'rb') as pdf_file:
            pdf_reader = PyPDF2.PdfReader(pdf_file)
            for page in pdf_reader.pages:
                all_text += page.extract_text()
This part helps us pre-process documents for later retrieval.
3. Splitting Text into Chunks
To make text retrieval more efficient, we split the extracted text into manageable chunks using RecursiveCharacterTextSplitter. The splitter measures chunk length with a tokenizer from Hugging Face.
from transformers import AutoTokenizer
from langchain.text_splitter import RecursiveCharacterTextSplitter

tokenizer = AutoTokenizer.from_pretrained("thenlper/gte-small")
text_splitter = RecursiveCharacterTextSplitter.from_huggingface_tokenizer(
    tokenizer, chunk_size=500, chunk_overlap=50
)
chunks = text_splitter.split_text(all_text)
This allows the retriever to work on smaller, meaningful sections of text.
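Because the splitter is built from the gte-small tokenizer, chunk_size=500 and chunk_overlap=50 are measured in tokens rather than characters. The overlap keeps sentences that straddle a chunk boundary from losing their surrounding context, and 500 tokens fits within the 512-token input limit of the gte-small embedding model used below.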
4. Initializing the Embedding Model and Vector Database
We use FAISS, an efficient library for similarity search, and Hugging Face embeddings to encode the chunks into vectors. This is a crucial step to enable semantic search later.
embedding_model = HuggingFaceEmbeddings(model_name="thenlper/gte-small")
vectordb = FAISS.from_texts(
    texts=chunks,
    embedding=embedding_model,
    distance_strategy=DistanceStrategy.COSINE,
)
Here, the FAISS index is built with vectors representing the document chunks.
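Before wiring the index into an agent, it is worth a quick sanity check that semantic search returns sensible chunks; the query string below is just a placeholder.

# Retrieve the 3 chunks most similar to a test query and preview them
for doc in vectordb.similarity_search("your test query here", k=3):
    print(doc.page_content[:200])
    print("-----")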
5. Implementing a Custom Retriever Tool
I created a custom tool called RetrieverTool that performs a semantic similarity search using the FAISS index. The tool returns the documents most closely related to the input query.
class RetrieverTool(Tool):
    name = "retriever"
    description = "Using semantic similarity, retrieves some documents from the knowledge base that have the closest embeddings to the input query."
    inputs = {
        "query": {
            "type": "text",
            "description": "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.",
        }
    }
    output_type = "text"

    def __init__(self, vectordb, **kwargs):
        super().__init__(**kwargs)
        self.vectordb = vectordb

    def forward(self, query: str) -> str:
        assert isinstance(query, str), "Your search query must be a string"
        docs = self.vectordb.similarity_search(query, k=7)
        return "\nRetrieved documents:\n" + "".join(
            f"===== Document {i} =====\n" + doc.page_content for i, doc in enumerate(docs)
        )
This tool plays a key role in retrieving relevant information based on the input query.
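Two design choices are worth noting. The tool’s description asks for affirmative-form queries because declarative phrasing tends to sit closer to document text in embedding space than questions do, and k=7 retrieves a generous number of chunks so the agent has enough material to judge whether another retrieval round is needed.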
6. Agentic RAG: Using an Agent to Optimize Queries
The next part is where we use the ReactJsonAgent to call the retriever tool repeatedly with semantically modified queries until a relevant result is found.
retriever_tool = RetrieverTool(vectordb)
llm_engine = HfEngine("meta-llama/Meta-Llama-3-8B-Instruct")
agent = ReactJsonAgent(tools=[retriever_tool], llm_engine=llm_engine)
The agent uses the retriever tool multiple times, adjusting its queries based on feedback until a satisfactory answer is obtained.
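Under the hood, ReactJsonAgent follows the ReAct pattern: at each step the LLM writes out its reasoning and then emits an action as a JSON blob naming a tool and its arguments. The tool’s output is appended to the agent’s memory before the next step, which is what allows it to course-correct its queries.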
7. Running the Agent
Finally, the function run_agentic_rag orchestrates the retrieval process, feeding the agent a query and letting it iterate until it finds an answer.
def run_agentic_rag(question: str) -> str:
    enhanced_question = f"""Using the information contained in your knowledge base, which you can access with the 'retriever' tool,
give a comprehensive answer to the question below.
Respond only to the question asked, response should be concise and relevant to the question.
If you cannot find information, do not give up and try calling your retriever again with different arguments!
Make sure to have covered the question completely by calling the retriever tool several times with semantically different queries.
Your queries should not be questions but affirmative form sentences.

Question:
{question}"""

    return agent.run(enhanced_question)
question = "Your Question or query statement"
answer = run_agentic_rag(question)
print(f"Question: {question}")
print(f"Answer: {answer}")
This process mimics how a human researcher iterates on search terms until they pin down precisely the right information.
The journey from RAG to agentic RAG represents a significant leap in AI’s ability to understand, retrieve, and communicate information. As these systems become more sophisticated, they promise to transform how we interact with information, making the vast wealth of human knowledge more accessible and useful than ever before.
While there are certainly challenges to address — such as ensuring data privacy and preventing misuse — the potential benefits of agentic RAG systems are enormous. They represent not just an improvement in AI technology, but a step towards AI that can truly understand and assist us in meaningful ways.
