A sophisticated Retrieval-Augmented Generation (RAG) system that combines graph-based knowledge representation, multi-step reasoning, and advanced verification mechanisms to answer scientific queries with high accuracy.
- Google Colab account (recommended) or local Python 3.8+
- GROQ API key (free at https://console.groq.com)
- GPU recommended but not required
- Open in Google Colab
  - Upload the notebook or create new cells
- Install Dependencies (Cell 1)
pip install -q groq qdrant-client sentence-transformers scikit-learn kagglehub
- Configure API Key (Cell 4)
- Go to Colab → Secrets (🔑 icon on left sidebar)
- Add secret:
GROQ_API_KEY=your_api_key_here
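Inside the notebook, the key can then be read from Colab's secret store. A minimal sketch, assuming the standard `google.colab.userdata` API (the exact cell contents may differ):

```python
# Read the GROQ API key from Colab's secret store and create a client.
from google.colab import userdata
from groq import Groq

GROQ_API_KEY = userdata.get("GROQ_API_KEY")  # raises if the secret is missing or not shared
client = Groq(api_key=GROQ_API_KEY)
```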
- Run All Cells
- Runtime → Run all
- Wait for dataset download (~2-3 minutes first time)
- Pipeline will automatically initialize
- Query the System (Cell 12)
  - Modify the `test_queries` list
  - Run to get answers with citations
- Core Framework: Agentic RAG, Graph-Enhanced Retrieval (GraphRAG)
- LLMs & Inference: Llama 3.3 70B, Groq API
- Vector Database: Qdrant (In-memory)
- Reasoning Implementation: Chain-of-Thought (CoT), Self-RAG, NLI Verification
- Graph Processing: NetworkX, Semantic Graph Construction
- Libraries: `sentence-transformers`, `scikit-learn`, `kagglehub`
┌─────────────────────────────────────────────────────────────┐
│ USER QUERY │
└─────────────────┬───────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ STAGE 1: RETRIEVAL PLANNING │
│ ├─ Query Analysis (LLM) │
│ ├─ Sub-query Generation │
│ └─ Strategy Selection (sequential/parallel) │
└─────────────────┬───────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ STAGE 2: GRAPH-ENHANCED RETRIEVAL │
│ ├─ Vector Search (Qdrant + SentenceTransformer) │
│ ├─ Query Expansion │
│ ├─ Cross-Encoder Reranking │
│ ├─ Knowledge Graph Walk (optional) │
│ └─ NLI Verification (optional) │
└─────────────────┬───────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ STAGE 3: KNOWLEDGE AGGREGATION │
│ ├─ Multi-document Synthesis │
│ ├─ Chain-of-Thought Reasoning (LLM) │
│ ├─ Confidence Scoring │
│ └─ Progressive Outline Building │
└─────────────────┬───────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ STAGE 4: PROBABILISTIC FUSION │
│ ├─ Weighted Answer Merging │
│ ├─ Confidence-based Filtering │
│ └─ Beam Search Aggregation (LLM) │
└─────────────────┬───────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ STAGE 5: ANSWER GENERATION & VERIFICATION │
│ ├─ Final Answer Synthesis (LLM) │
│ ├─ Citation Formatting │
│ ├─ Self-RAG Reflection │
│ └─ Quality Validation │
└─────────────────┬───────────────────────────────────────────┘
│
▼
FINAL ANSWER
| Section 3 Concept | Implemented? | Evidence in Codebase |
|---|---|---|
| Pre-Retrieval Reasoning | Yes | AdvancedRetrievalPlanner class breaks queries into sub-queries and decides on a strategy (Sequential vs. Parallel). |
| Reasoning-During-Retrieval | Yes | KnowledgeGraphBuilder and retrieve_with_graph_reasoning allow the system to "walk" the graph to find non-obvious related documents. |
| Post-Retrieval Reasoning | Yes | DeepVerificationEngine performs NLI (Natural Language Inference) to verify that retrieved documents logically entail the hypothesis. |
| Reasoning-Enhanced Generation | Yes | ProbabilisticFusionEngine and the final synthesis step use Chain-of-Thought (CoT) and Self-Reflection (checking if the answer is "supported" and "relevant") before outputting the result. |
Defines different reasoning strategies:
- `CHAIN_OF_THOUGHT` - Sequential step-by-step reasoning
- `GRAPH_OF_THOUGHT` - Multi-path graph exploration
- `SELF_REFLECTION` - Answer validation and critique
- `NLI_VERIFICATION` - Natural Language Inference checking
- `PROBABILISTIC_FUSION` - Weighted answer combination
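These strategies map naturally onto an enum. A minimal sketch of how such a `ReasoningType` could be declared (member values are illustrative, not the exact codebase definition):

```python
from enum import Enum

class ReasoningType(Enum):
    CHAIN_OF_THOUGHT = "chain_of_thought"          # sequential step-by-step reasoning
    GRAPH_OF_THOUGHT = "graph_of_thought"          # multi-path graph exploration
    SELF_REFLECTION = "self_reflection"            # answer validation and critique
    NLI_VERIFICATION = "nli_verification"          # natural language inference checking
    PROBABILISTIC_FUSION = "probabilistic_fusion"  # weighted answer combination
```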
Tracks each reasoning iteration:
- step_id: Unique identifier
- query: Current sub-query
- reasoning_type: Strategy used
- retrieved_docs: Source documents
- synthesized_fact: Generated answer
- confidence: Score (0.0-1.0)
- supporting_evidence: Text snippets
- verification_score: NLI score
- graph_path: Navigation history

Query decomposition strategy:
- query: Original question
- sub_queries: Broken down questions
- retrieval_strategy: "sequential" or "parallel"
- expected_hops: Graph traversal depth
- reasoning_path: Step descriptions

Purpose: Creates semantic connections between documents
Key Functions:
- Input: Document text
- Output: List of capitalized entities (names, terms)
- Method: Simple pattern matching (capitalization + length > 3)
- Process:
- Extract entities from each document
- Generate embeddings using SentenceTransformer
- Create nodes with content, entities, embeddings
- Build edges between nodes sharing entities
- Output: NetworkX directed graph
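A minimal sketch of this construction, assuming the capitalization heuristic described above (function names are illustrative, and the SentenceTransformer embedding step is omitted for brevity):

```python
import itertools
import networkx as nx

def extract_entities(text: str) -> set:
    # Heuristic: capitalized tokens longer than 3 characters count as entities.
    return {tok for tok in text.split() if tok[0].isupper() and len(tok) > 3}

def build_graph(docs: list) -> nx.DiGraph:
    graph = nx.DiGraph()
    for i, doc in enumerate(docs):
        graph.add_node(i, content=doc, entities=extract_entities(doc))
    # Link documents that share entities; edge weight = size of the overlap.
    for i, j in itertools.combinations(graph.nodes, 2):
        shared = graph.nodes[i]["entities"] & graph.nodes[j]["entities"]
        if shared:
            graph.add_edge(i, j, weight=len(shared))
            graph.add_edge(j, i, weight=len(shared))
    return graph
```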
- Purpose: Multi-hop document discovery
- Algorithm:
- Start from initial retrieved documents
- Explore neighbors up to `max_hops` away
- Prioritize high-weight connections (entity overlap)
- Use Case: Finding related information not in initial search
Example:
Query: "H. pylori and cancer"
Initial Doc: "H. pylori causes infection"
Graph Walk → Finds: "Infections lead to cancer risk"
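A sketch of the walk itself, reusing the graph built above (a weight-prioritized breadth-first expansion; the real `walk_on_graph` may differ in its details):

```python
def walk_on_graph(graph, start_nodes, max_hops=3, per_hop=3):
    # Breadth-first expansion that prefers high-weight (high entity-overlap) edges.
    visited = set(start_nodes)
    frontier = list(start_nodes)
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            neighbors = sorted(graph.successors(node),
                               key=lambda n: graph[node][n]["weight"],
                               reverse=True)
            for n in neighbors[:per_hop]:  # keep only the strongest links per node
                if n not in visited:
                    visited.add(n)
                    next_frontier.append(n)
        frontier = next_frontier
    return visited
```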
Purpose: Decomposes complex queries into sub-queries
Class: AdvancedRetrievalPlanner
- Process:
- Send query to LLM (Llama 3.3 70B)
- Request JSON structured plan
- Parse sub-queries, strategy, expected hops
- Fallback: Single-query direct retrieval
Example Output:
{
"sub_queries": [
"What is H. pylori?",
"How does H. pylori cause cancer?"
],
"retrieval_strategy": "sequential",
"expected_hops": 2,
"reasoning_path": ["Define pathogen", "Explain mechanism"]
}

Purpose: Validates answer accuracy and relevance
Class: DeepVerificationEngine
- Input:
- Premise: Retrieved document text
- Hypothesis: Query or claim
- Process: LLM checks if premise supports hypothesis
- Output:
{ "entails": true/false, "score": 0.0-1.0, "label": "entailment/neutral/contradiction" }
- Purpose: Self-critique mechanism from Self-RAG paper
- Checks:
- Relevant: Does answer address query?
- Supported: Is answer backed by evidence?
- Useful: Is answer informative?
- Special Case: Recognizes "not found" statements as valid
- Output: Scores + list of issues
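The scoring side of the reflection, including the "not found" special case, can be sketched like this (assuming the three boolean checks have already been obtained from the LLM; names are illustrative):

```python
def score_reflection(answer: str, relevant: bool, supported: bool, useful: bool) -> dict:
    # Honest "not found" refusals count as supported rather than hallucinated.
    if "not found" in answer.lower():
        supported = True
    issues = []
    if not relevant:
        issues.append("answer does not address the query")
    if not supported:
        issues.append("claims are not backed by the evidence")
    if not useful:
        issues.append("answer is uninformative")
    overall = (relevant + supported + useful) / 3  # bools sum as 0/1
    return {"relevant": relevant, "supported": supported, "useful": useful,
            "overall_score": round(overall, 2), "issues": issues}
```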
Purpose: Combines multiple answers into coherent response
Class: ProbabilisticFusionEngine
- Algorithm:
- Filter answers by confidence threshold (>0.3)
- Weight by confidence scores
- Send to LLM for fusion
- Output: Combined answer with overall confidence
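A sketch of the confidence-weighted merge (threshold and weighting as described above; the LLM fusion call itself is stubbed out):

```python
def fuse_answers(answers: list, threshold: float = 0.3) -> dict:
    # Each answer is assumed to be a dict like {"text": str, "confidence": float}.
    # 1. Drop answers below the confidence threshold.
    kept = [a for a in answers if a["confidence"] > threshold]
    if not kept:
        return {"fused_answer": "No sufficiently confident answer found.",
                "confidence": 0.0}
    # 2. Normalize confidences into fusion weights for the LLM prompt.
    total = sum(a["confidence"] for a in kept)
    weighted = [(a["text"], a["confidence"] / total) for a in kept]
    # 3. In the real engine, `weighted` is sent to the LLM for fusion;
    #    here we simply concatenate as a placeholder.
    fused_text = " ".join(text for text, _ in weighted)
    return {"fused_answer": fused_text,
            "confidence": total / len(kept)}  # mean confidence (illustrative)
```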
- Purpose: Build hierarchical outline
- Process: Extract key points from each step
- Output: Structured outline with confidence per point
Purpose: Core retrieval system with multiple strategies
Class: EnhancedHybridRetriever
Models Used:
- Embedder: `all-MiniLM-L6-v2` (384 dimensions)
- Reranker: `cross-encoder/ms-marco-MiniLM-L-6-v2`
- Vector DB: Qdrant (in-memory)
Step-by-Step Process:
- Query Expansion
query_expanded = f"{query} research study evidence"
- Adds context terms to improve recall
- Vector Search

search_results = qdrant.query_points(
    query=query_vector,
    limit=20,  # Over-retrieve for reranking
)
- Cosine similarity search
- Returns 20 candidates
- Cross-Encoder Reranking

pairs = [[query, text] for text in candidates]
rerank_scores = reranker.predict(pairs)
- More accurate than embeddings
- Scores each query-document pair
- Sorts by rerank score
- Graph Enhancement (optional)

start_nodes = top_3_results  # seed the walk with the top-ranked documents
expanded = kg_builder.walk_on_graph(start_nodes)
- Adds related documents
- Increases diversity
- Final Selection
- Returns top_k documents
- Includes rerank scores
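Putting the five steps together, a condensed sketch of the retrieval path (model names are those listed above; the collection name and the `text` payload key are assumptions):

```python
from qdrant_client import QdrantClient
from sentence_transformers import CrossEncoder, SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
qdrant = QdrantClient(":memory:")

def retrieve(query: str, top_k: int = 5) -> list:
    # 1. Query expansion to improve recall.
    expanded = f"{query} research study evidence"
    # 2. Over-retrieve 20 candidates by cosine similarity.
    hits = qdrant.query_points(
        collection_name="abstracts",  # assumed collection name
        query=embedder.encode(expanded).tolist(),
        limit=20,
    ).points
    candidates = [hit.payload["text"] for hit in hits]
    # 3. Cross-encoder reranking: score every (query, document) pair.
    scores = reranker.predict([[query, text] for text in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    # 4. (Optional graph expansion omitted.)  5. Return the top_k documents.
    return [text for text, _ in ranked[:top_k]]
```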
Class: CompleteReasoningRAGPipeline
Main Method: process_query_with_all_techniques(query, enable_graph, enable_nli)
plan = self.planner.create_retrieval_plan(query)

Actions:
- LLM analyzes query complexity
- Generates sub-queries if needed
- Determines retrieval strategy
Output Example:
Strategy: sequential
Sub-queries: 1
For each sub-query:
retrieved = self.retriever.retrieve_with_graph_reasoning(
sub_query, top_k=5, use_graph=True
)

Actions:
- Vector search (20 candidates)
- Rerank to top 5
- Optional: NLI verification of top result
- Create ReasoningStep with:
- Retrieved documents
- Synthesized fact (via `_synthesize_with_cot`)
- Confidence score
- Supporting evidence
progressive_result = self.fusion_engine.progressive_aggregation(
reasoning_steps
)

Actions:
- Build outline from all steps
- Calculate completeness (% of steps processed)
- Prepare for fusion
Output:
{
"outline": [
{"point": "H. pylori is a pathogen...", "confidence": 0.8},
{"point": "It causes cancer by...", "confidence": 0.7}
],
"completeness": 1.0
}

fused = self.fusion_engine.beam_aggregate(
plan.sub_queries, answers
)Actions:
- Weight answers by confidence
- Filter low-confidence (<0.3)
- LLM combines into coherent text
- Calculate fusion confidence
Output:
{
"fused_answer": "Combined text...",
"confidence": 0.6
}

final_answer = self._generate_final_answer(
query, reasoning_steps, fused
)

Process:
- Source Collection

all_sources = []
for step in reasoning_steps:
    for doc in step.retrieved_docs[:2]:
        all_sources.append(doc['text'][:400])
- Prompt Construction

Question: {query}
Sources: [Source 1], [Source 2]...
Instructions:
- Use ONLY provided sources
- Cite as [1], [2], etc.
- State if information not found

- LLM Generation
- Model: Llama 3.3 70B
- Temperature: 0.1 (deterministic)
- Max tokens: 600
- Quality Validation

if 'not found' in answer:
    confidence = 0.2  # Lower for missing info
elif not sources_support_answer:
    confidence = 0.3  # Penalize unsupported claims
reflection = self.verifier.self_rag_reflect(
final_answer, all_evidence
)

Checks:
- ✅ Relevant: Answer addresses query?
- ✅ Supported: Claims backed by evidence?
- ✅ Useful: Provides value to user?
Output:
{
"relevant": false,
"supported": false,
"useful": true,
"overall_score": 0.3,
"issues": ["Missing key mechanism details"]
}

- Source: Kaggle (via kagglehub)
- Size: 5,183 scientific abstracts
- Domain: Biomedical research papers
- Format: CSV with columns: doc_id, title, abstract, structured
def clean_text(text):
    if not text or text == 'nan':
        return ''
    text = ' '.join(text.split())  # Remove extra whitespace
    if len(text) < 100:  # Filter very short texts
        return ''
    return text

- Initialize Qdrant
qdrant_client = QdrantClient(":memory:")
- In-memory vector database
- Fast for <10K documents
- Create Collection

vectors_config = VectorParams(
    size=384,  # MiniLM embedding dimension
    distance=Distance.COSINE,
)
- Generate Embeddings

for doc in corpus:
    text = clean_text(doc['abstract'])
    vector = embedding_model.encode(text)
    qdrant.upload_point(id, vector, payload)  # pseudocode; see the runnable sketch below
- Build Knowledge Graph
- Scrolls through first 500 documents
- Extracts entities
- Creates graph edges
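A condensed, runnable version of the indexing loop (reusing `clean_text` and the `corpus` list from above; the collection name and payload layout are assumptions):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
qdrant = QdrantClient(":memory:")
qdrant.create_collection(
    collection_name="abstracts",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

points = []
for i, doc in enumerate(corpus):  # corpus: list of dicts with an 'abstract' field
    text = clean_text(doc["abstract"])
    if not text:
        continue  # skip empty or too-short abstracts
    points.append(PointStruct(id=i,
                              vector=embedder.encode(text).tolist(),
                              payload={"text": text}))
qdrant.upsert(collection_name="abstracts", points=points)
```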
Final Stats:
✅ Indexed 5,183 documents
✅ Graph: 500 nodes, 79 edges
1. Planning (1 API call)
→ LLM: Analyze query
← Response: Single complex query, sequential strategy
2. Retrieval
→ Vector DB: Search for "cagPAI H. pylori AID expression research study evidence"
← Results: 20 candidates
→ Reranker: Score all pairs
← Top 5 documents (including target paper at index 3354)
3. NLI Verification (1 API call)
→ LLM: Does top document support query?
← Score: 0.50 (moderate support)
4. Synthesis (1 API call)
→ LLM: Answer using these 3 sources
← "cagPAI-positive H. pylori induces aberrant AID expression via IκB kinase"
Confidence: 0.6
5. Fusion (1 API call)
→ LLM: Combine answers (only 1 sub-query here, so pass-through)
← Fused answer with 0.6 confidence
6. Final Answer (1 API call)
→ LLM: Generate answer with citations from 5 sources
← "Infection with cagPAI-positive H. pylori induces aberrant expression
of activation-induced cytidine deaminase (AID) via the IκB kinase [1]."
Confidence: 1.0
7. Reflection (1 API call)
→ LLM: Validate answer against evidence
← Relevant: False, Supported: False, Score: 0.3
(Stricter than actual quality - known issue)
Total: 6 API calls, ~3,400 tokens
query_expanded = f"{query} research study evidence"- Why: Improves recall by adding context
- Tradeoff: May reduce precision
- Solution: Reranking filters noise
Stage 1: Fast vector search (retrieve 20)
Stage 2: Slow cross-encoder rerank (score all, keep 5)
- Benefits: Best of both worlds - speed + accuracy
- Cost: 20 reranker inferences per query
# Planning: temperature=0.2 (creative decomposition)
# Synthesis: temperature=0.1 (faithful to sources)
# Fusion: temperature=0.2 (balanced combination)

Retrieval score (0-1)
→ Synthesis confidence (0-1)
→ Fusion confidence (0-1)
→ Final confidence (0-1)
Lower scores propagate downstream
- Builds answer incrementally
- Each step adds to outline
- Tracks completeness metric
- Confidence Score
- Range: 0.0 - 1.0
- Source: LLM self-assessment
- High (>0.8): Strong source support
- Low (<0.3): Weak/no evidence
- Self-RAG Score
- Range: 0.0 - 1.0
- Components: Relevant + Supported + Useful
- Known Issue: Often underestimates quality (30% for correct answers)
- Completeness
- Percentage of sub-queries answered
- Always 100% for single-query plans
- API Calls: 5-6 per query (simple), 15-20 (complex multi-step)
- Tokens: ~3,000-5,000 per query
- Time: 5-10 seconds per query
- Memory: ~2GB for full dataset in RAM
result = rag_pipeline.process_query_with_all_techniques(
query="Your question",
enable_graph=True, # Use knowledge graph walking
enable_nli=False # Skip NLI verification (saves API calls)
)

In `EnhancedHybridRetriever`:
top_k=5 # Number of documents to retrieve
limit=20 # Candidates before reranking
max_hops=3 # Graph walk depth

In `CompleteReasoningRAGPipeline`:
temperature=0.1 # LLM creativity (lower = more deterministic)
max_tokens=600 # Answer length limit

In `APICounter`:
max_calls=100 # Stop after N API calls

- Issue: Reflection often marks correct answers as unsupported (30%)
- Cause: Overly strict verification prompts
- Impact: Misleading quality metric
- Solution: Ignore Self-RAG score, trust Confidence instead
- Issue: Only first 500 docs used for graph
- Cause: Performance optimization
- Impact: May miss distant connections
- Solution: Increase the scroll `limit` from 500 to 2000
- Issue: Simple capitalization-based
- Cause: No NER model used
- Impact: Misses lowercase entities, over-includes
- Solution: Use spaCy or Transformers NER
- Issue: Same query reprocesses everything
- Cause: No cache implementation
- Impact: Wastes API calls
- Solution: Add LRU cache for embeddings/answers
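A minimal sketch of such caching with the standard library (reusing the `embedder` and `rag_pipeline` objects defined earlier; `lru_cache` requires hashable string keys):

```python
from functools import lru_cache

@lru_cache(maxsize=512)
def cached_embedding(text: str) -> tuple:
    # Tuples are hashable and immutable, so results can live safely in the cache.
    return tuple(embedder.encode(text).tolist())

@lru_cache(maxsize=128)
def cached_answer(query: str) -> str:
    # Re-running an identical query now costs zero additional API calls.
    result = rag_pipeline.process_query_with_all_techniques(query)
    return result["final_answer"]["answer"]
```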
# In AdvancedRetrievalPlanner.create_retrieval_plan()
# (entity1/entity2 would first be parsed from the query, e.g. "compare X and Y")
if "compare" in query.lower():
return RetrievalPlan(
sub_queries=[
f"What is {entity1}?",
f"What is {entity2}?",
f"Compare {entity1} and {entity2}"
],
retrieval_strategy="parallel",
expected_hops=2
)

# In EnhancedHybridRetriever.__init__()
self.reranker = CrossEncoder('cross-encoder/ms-marco-TinyBERT-L-6')
# Faster, slightly less accurate

from rank_bm25 import BM25Okapi
# Tokenize corpus
tokenized = [doc.split() for doc in corpus]
bm25 = BM25Okapi(tokenized)
# Hybrid retrieval
vector_scores = qdrant.query(...)
bm25_scores = bm25.get_scores(query.split())
combined = 0.7*vector_scores + 0.3*bm25_scores

Papers Implemented:
- Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs (arXiv:2507.09477)
  - Note: The full PDF (`2507.09477v2.pdf`) is included in this repository.
- Self-RAG: Self-Reflective Retrieval-Augmented Generation
- Graph-of-Thought: Graph-Based Reasoning for Complex Questions
- Fusion-in-Decoder: Probabilistic Answer Fusion
Models Used:
- Llama 3.3 70B (via GROQ API)
- all-MiniLM-L6-v2 (Sentence Transformers)
- ms-marco-MiniLM-L-6-v2 (Cross-Encoder)
Libraries:
- Qdrant: Vector database
- NetworkX: Graph operations
- SentenceTransformers: Embeddings
Common Issues:
- "API Key not found"
  - Add `GROQ_API_KEY` to Colab secrets
- "Collection not found"
  - Run Cell 11 (dataset loading) before queries
- "Out of memory"
  - Reduce `corpus_subset` size from 5183 to 1000
- "Answers are incorrect"
  - Check if the target document is in the indexed subset
  - Increase `corpus_subset` size
  - Lower the `clean_text` threshold to 50 chars
# Simple query
result = rag_pipeline.process_query_with_all_techniques(
query="What causes gastric cancer?",
enable_graph=False, # Faster
enable_nli=False # Fewer API calls
)
print(result['final_answer']['answer'])
# → "Infection with Helicobacter pylori is a risk factor..."
# Complex multi-hop query
result = rag_pipeline.process_query_with_all_techniques(
query="Explain the molecular pathway from H. pylori infection to cancer",
enable_graph=True, # Multi-hop reasoning
enable_nli=True # Verify each step
)
print(result['reflection']['overall_score'])
# → 0.3 (ignore this, it's often wrong)
print(result['final_answer']['confidence'])
# → 0.8 (trust this instead)

After using this pipeline, you will understand:
- Modern RAG architecture (5-stage)
- Vector databases (Qdrant)
- Semantic search (embeddings + reranking)
- Knowledge graphs for retrieval
- LLM prompting strategies
- Answer fusion techniques
- Self-critique mechanisms
- Citation generation
Version: 1.0
Last Updated: January 2026
License: MIT