A sophisticated Retrieval-Augmented Generation (RAG) system that combines graph-based knowledge representation, multi-step reasoning, and advanced verification mechanisms to answer scientific queries with high accuracy.
- Google Colab account (recommended) or local Python 3.8+
- GROQ API key (free at https://console.groq.com)
- GPU recommended but not required
- Open in Google Colab
  - Upload the notebook or create new cells
- Install Dependencies (Cell 1)
pip install -q groq qdrant-client sentence-transformers scikit-learn kagglehub
- Configure API Key (Cell 4)
- Go to Colab → Secrets (🔑 icon on left sidebar)
- Add secret:
GROQ_API_KEY=your_api_key_here
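Inside the notebook, the key can then be read from Colab's secret store. A minimal sketch, assuming the standard `google.colab.userdata` API (the exact cell contents may differ):

```python
# Read the GROQ API key from Colab's secret store and create a client.
from google.colab import userdata
from groq import Groq

GROQ_API_KEY = userdata.get("GROQ_API_KEY")  # raises if the secret is missing or not shared
client = Groq(api_key=GROQ_API_KEY)
```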
- Run All Cells
- Runtime → Run all
- Wait for dataset download (~2-3 minutes first time)
- Pipeline will automatically initialize
- Query the System (Cell 12)
  - Modify the `test_queries` list
  - Run to get answers with citations
- Core Framework: Agentic RAG, Graph-Enhanced Retrieval (GraphRAG)
- LLMs & Inference: Llama 3.3 70B, Groq API
- Vector Database: Qdrant (In-memory)
- Reasoning Implementation: Chain-of-Thought (CoT), Self-RAG, NLI Verification
- Graph Processing: NetworkX, Semantic Graph Construction
- Libraries: `sentence-transformers`, `scikit-learn`, `kagglehub`
┌─────────────────────────────────────────────────────────────┐
│ USER QUERY │
└─────────────────┬───────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ STAGE 1: RETRIEVAL PLANNING │
│ ├─ Query Analysis (LLM) │
│ ├─ Sub-query Generation │
│ └─ Strategy Selection (sequential/parallel) │
└─────────────────┬───────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ STAGE 2: GRAPH-ENHANCED RETRIEVAL │
│ ├─ Vector Search (Qdrant + SentenceTransformer) │
│ ├─ Query Expansion │
│ ├─ Cross-Encoder Reranking │
│ ├─ Knowledge Graph Walk (optional) │
│ └─ NLI Verification (optional) │
└─────────────────┬───────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ STAGE 3: KNOWLEDGE AGGREGATION │
│ ├─ Multi-document Synthesis │
│ ├─ Chain-of-Thought Reasoning (LLM) │
│ ├─ Confidence Scoring │
│ └─ Progressive Outline Building │
└─────────────────┬───────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ STAGE 4: PROBABILISTIC FUSION │
│ ├─ Weighted Answer Merging │
│ ├─ Confidence-based Filtering │
│ └─ Beam Search Aggregation (LLM) │
└─────────────────┬───────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ STAGE 5: ANSWER GENERATION & VERIFICATION │
│ ├─ Final Answer Synthesis (LLM) │
│ ├─ Citation Formatting │
│ ├─ Self-RAG Reflection │
│ └─ Quality Validation │
└─────────────────┬───────────────────────────────────────────┘
│
▼
FINAL ANSWER
| Section 3 Concept | Implemented? | Evidence in Codebase |
|---|---|---|
| Pre-Retrieval Reasoning | Yes | AdvancedRetrievalPlanner class breaks queries into sub-queries and decides on a strategy (Sequential vs. Parallel). |
| Reasoning-During-Retrieval | Yes | KnowledgeGraphBuilder and retrieve_with_graph_reasoning allow the system to "walk" the graph to find non-obvious related documents. |
| Post-Retrieval Reasoning | Yes | DeepVerificationEngine performs NLI (Natural Language Inference) to verify that retrieved documents logically entail the hypothesis. |
| Reasoning-Enhanced Generation | Yes | ProbabilisticFusionEngine and the final synthesis step use Chain-of-Thought (CoT) and Self-Reflection (checking if the answer is "supported" and "relevant") before outputting the result. |
Defines different reasoning strategies:
- `CHAIN_OF_THOUGHT` - Sequential step-by-step reasoning
- `GRAPH_OF_THOUGHT` - Multi-path graph exploration
- `SELF_REFLECTION` - Answer validation and critique
- `NLI_VERIFICATION` - Natural Language Inference checking
- `PROBABILISTIC_FUSION` - Weighted answer combination
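These strategies map naturally onto an enum. A minimal sketch of how such a `ReasoningType` could be declared (member values are illustrative, not the exact codebase definition):

```python
from enum import Enum

class ReasoningType(Enum):
    CHAIN_OF_THOUGHT = "chain_of_thought"          # sequential step-by-step reasoning
    GRAPH_OF_THOUGHT = "graph_of_thought"          # multi-path graph exploration
    SELF_REFLECTION = "self_reflection"            # answer validation and critique
    NLI_VERIFICATION = "nli_verification"          # natural language inference checking
    PROBABILISTIC_FUSION = "probabilistic_fusion"  # weighted answer combination
```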
Tracks each reasoning iteration:
- step_id: Unique identifier
- query: Current sub-query
- reasoning_type: Strategy used
- retrieved_docs: Source documents
- synthesized_fact: Generated answer
- confidence: Score (0.0-1.0)
- supporting_evidence: Text snippets
- verification_score: NLI score
- graph_path: Navigation history

Query decomposition strategy:
- query: Original question
- sub_queries: Broken down questions
- retrieval_strategy: "sequential" or "parallel"
- expected_hops: Graph traversal depth
- reasoning_path: Step descriptions

Purpose: Creates semantic connections between documents
Key Functions:
- Input: Document text
- Output: List of capitalized entities (names, terms)
- Method: Simple pattern matching (capitalization + length > 3)
- Process:
- Extract entities from each document
- Generate embeddings using SentenceTransformer
- Create nodes with content, entities, embeddings
- Build edges between nodes sharing entities
- Output: NetworkX directed graph
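A minimal sketch of this construction, assuming the capitalization heuristic described above (function names are illustrative, and the SentenceTransformer embedding step is omitted for brevity):

```python
import itertools
import networkx as nx

def extract_entities(text: str) -> set:
    # Heuristic: capitalized tokens longer than 3 characters count as entities.
    return {tok for tok in text.split() if tok[0].isupper() and len(tok) > 3}

def build_graph(docs: list) -> nx.DiGraph:
    graph = nx.DiGraph()
    for i, doc in enumerate(docs):
        graph.add_node(i, content=doc, entities=extract_entities(doc))
    # Link documents that share entities; edge weight = size of the overlap.
    for i, j in itertools.combinations(graph.nodes, 2):
        shared = graph.nodes[i]["entities"] & graph.nodes[j]["entities"]
        if shared:
            graph.add_edge(i, j, weight=len(shared))
            graph.add_edge(j, i, weight=len(shared))
    return graph
```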
- Purpose: Multi-hop document discovery
- Algorithm:
- Start from initial retrieved documents
- Explore neighbors up to `max_hops` away
- Prioritize high-weight connections (entity overlap)
- Use Case: Finding related information not in initial search
Example:
Query: "H. pylori and cancer"
Initial Doc: "H. pylori causes infection"
Graph Walk → Finds: "Infections lead to cancer risk"
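A sketch of the walk itself, reusing the graph built above (a weight-prioritized breadth-first expansion; the real `walk_on_graph` may differ in its details):

```python
def walk_on_graph(graph, start_nodes, max_hops=3, per_hop=3):
    # Breadth-first expansion that prefers high-weight (high entity-overlap) edges.
    visited = set(start_nodes)
    frontier = list(start_nodes)
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            neighbors = sorted(graph.successors(node),
                               key=lambda n: graph[node][n]["weight"],
                               reverse=True)
            for n in neighbors[:per_hop]:  # keep only the strongest links per node
                if n not in visited:
                    visited.add(n)
                    next_frontier.append(n)
        frontier = next_frontier
    return visited
```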
Purpose: Decomposes complex queries into sub-queries
Class: AdvancedRetrievalPlanner
- Process:
- Send query to LLM (Llama 3.3 70B)
- Request JSON structured plan
- Parse sub-queries, strategy, expected hops
- Fallback: Single-query direct retrieval
Example Output:
{
"sub_queries": [
"What is H. pylori?",
"How does H. pylori cause cancer?"
],
"retrieval_strategy": "sequential",
"expected_hops": 2,
"reasoning_path": ["Define pathogen", "Explain mechanism"]
}

Purpose: Validates answer accuracy and relevance
Class: DeepVerificationEngine
- Input:
- Premise: Retrieved document text
- Hypothesis: Query or claim
- Process: LLM checks if premise supports hypothesis
- Output:
{ "entails": true/false, "score": 0.0-1.0, "label": "entailment/neutral/contradiction" }
- Purpose: Self-critique mechanism from Self-RAG paper
- Checks:
- Relevant: Does answer address query?
- Supported: Is answer backed by evidence?
- Useful: Is answer informative?
- Special Case: Recognizes "not found" statements as valid
- Output: Scores + list of issues
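The scoring side of the reflection, including the "not found" special case, can be sketched like this (assuming the three boolean checks have already been obtained from the LLM; names are illustrative):

```python
def score_reflection(answer: str, relevant: bool, supported: bool, useful: bool) -> dict:
    # Honest "not found" refusals count as supported rather than hallucinated.
    if "not found" in answer.lower():
        supported = True
    issues = []
    if not relevant:
        issues.append("answer does not address the query")
    if not supported:
        issues.append("claims are not backed by the evidence")
    if not useful:
        issues.append("answer is uninformative")
    overall = (relevant + supported + useful) / 3  # bools sum as 0/1
    return {"relevant": relevant, "supported": supported, "useful": useful,
            "overall_score": round(overall, 2), "issues": issues}
```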
Purpose: Combines multiple answers into coherent response
Class: ProbabilisticFusionEngine
- Algorithm:
- Filter answers by confidence threshold (>0.3)
- Weight by confidence scores
- Send to LLM for fusion
- Output: Combined answer with overall confidence
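A sketch of the confidence-weighted merge (threshold and weighting as described above; the LLM fusion call itself is stubbed out):

```python
def fuse_answers(answers: list, threshold: float = 0.3) -> dict:
    # Each answer is assumed to be a dict like {"text": str, "confidence": float}.
    # 1. Drop answers below the confidence threshold.
    kept = [a for a in answers if a["confidence"] > threshold]
    if not kept:
        return {"fused_answer": "No sufficiently confident answer found.",
                "confidence": 0.0}
    # 2. Normalize confidences into fusion weights for the LLM prompt.
    total = sum(a["confidence"] for a in kept)
    weighted = [(a["text"], a["confidence"] / total) for a in kept]
    # 3. In the real engine, `weighted` is sent to the LLM for fusion;
    #    here we simply concatenate as a placeholder.
    fused_text = " ".join(text for text, _ in weighted)
    return {"fused_answer": fused_text,
            "confidence": total / len(kept)}  # mean confidence (illustrative)
```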
- Purpose: Build hierarchical outline
- Process: Extract key points from each step
- Output: Structured outline with confidence per point
Purpose: Core retrieval system with multiple strategies
Class: EnhancedHybridRetriever
Models Used:
- Embedder: `all-MiniLM-L6-v2` (384 dimensions)
- Reranker: `cross-encoder/ms-marco-MiniLM-L-6-v2`
- Vector DB: Qdrant (in-memory)
Step-by-Step Process:
- Query Expansion
query_expanded = f"{query} research study evidence"
- Adds context terms to improve recall
- Vector Search

search_results = qdrant.query_points(
    query=query_vector,
    limit=20,  # Over-retrieve for reranking
)
- Cosine similarity search
- Returns 20 candidates
- Cross-Encoder Reranking

pairs = [[query, text] for text in candidates]
rerank_scores = reranker.predict(pairs)
- More accurate than embeddings
- Scores each query-document pair
- Sorts by rerank score
- Graph Enhancement (optional)

start_nodes = top_3_results  # seed the walk with the top-ranked documents
expanded = kg_builder.walk_on_graph(start_nodes)
- Adds related documents
- Increases diversity
- Final Selection
- Returns top_k documents
- Includes rerank scores
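Putting the five steps together, a condensed sketch of the retrieval path (model names are those listed above; the collection name and the `text` payload key are assumptions):

```python
from qdrant_client import QdrantClient
from sentence_transformers import CrossEncoder, SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
qdrant = QdrantClient(":memory:")

def retrieve(query: str, top_k: int = 5) -> list:
    # 1. Query expansion to improve recall.
    expanded = f"{query} research study evidence"
    # 2. Over-retrieve 20 candidates by cosine similarity.
    hits = qdrant.query_points(
        collection_name="abstracts",  # assumed collection name
        query=embedder.encode(expanded).tolist(),
        limit=20,
    ).points
    candidates = [hit.payload["text"] for hit in hits]
    # 3. Cross-encoder reranking: score every (query, document) pair.
    scores = reranker.predict([[query, text] for text in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    # 4. (Optional graph expansion omitted.)  5. Return the top_k documents.
    return [text for text, _ in ranked[:top_k]]
```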
Class: CompleteReasoningRAGPipeline
Main Method: process_query_with_all_techniques(query, enable_graph, enable_nli)
plan = self.planner.create_retrieval_plan(query)

Actions:
- LLM analyzes query complexity
- Generates sub-queries if needed
- Determines retrieval strategy
Output Example:
Strategy: sequential
Sub-queries: 1
For each sub-query:
retrieved = self.retriever.retrieve_with_graph_reasoning(
sub_query, top_k=5, use_graph=True
)

Actions:
- Vector search (20 candidates)
- Rerank to top 5
- Optional: NLI verification of top result
- Create ReasoningStep with:
- Retrieved documents
- Synthesized fact (via `_synthesize_with_cot`)
- Confidence score
- Supporting evidence
progressive_result = self.fusion_engine.progressive_aggregation(
reasoning_steps
)

Actions:
- Build outline from all steps
- Calculate completeness (% of steps processed)
- Prepare for fusion
Output:
{
"outline": [
{"point": "H. pylori is a pathogen...", "confidence": 0.8},
{"point": "It causes cancer by...", "confidence": 0.7}
],
"completeness": 1.0
}

fused = self.fusion_engine.beam_aggregate(
plan.sub_queries, answers
)Actions:
- Weight answers by confidence
- Filter low-confidence (<0.3)
- LLM combines into coherent text
- Calculate fusion confidence
Output:
{
"fused_answer": "Combined text...",
"confidence": 0.6
}

final_answer = self._generate_final_answer(
query, reasoning_steps, fused
)

Process:
- Source Collection

all_sources = []
for step in reasoning_steps:
    for doc in step.retrieved_docs[:2]:
        all_sources.append(doc['text'][:400])
- Prompt Construction

Question: {query}
Sources: [Source 1], [Source 2]...
Instructions:
- Use ONLY provided sources
- Cite as [1], [2], etc.
- State if information not found

- LLM Generation
- Model: Llama 3.3 70B
- Temperature: 0.1 (deterministic)
- Max tokens: 600
- Quality Validation

if 'not found' in answer:
    confidence = 0.2  # Lower for missing info
elif not sources_support_answer:
    confidence = 0.3  # Penalize unsupported claims
reflection = self.verifier.self_rag_reflect(
final_answer, all_evidence
)

Checks:
- ✅ Relevant: Answer addresses query?
- ✅ Supported: Claims backed by evidence?
- ✅ Useful: Provides value to user?
Output:
{
"relevant": false,
"supported": false,
"useful": true,
"overall_score": 0.3,
"issues": ["Missing key mechanism details"]
}

- Source: Kaggle (via kagglehub)
- Size: 5,183 scientific abstracts
- Domain: Biomedical research papers
- Format: CSV with columns: doc_id, title, abstract, structured
def clean_text(text):
    if not text or text == 'nan':
        return ''
    text = ' '.join(text.split())  # Remove extra whitespace
    if len(text) < 100:  # Filter very short texts
        return ''
    return text

- Initialize Qdrant
qdrant_client = QdrantClient(":memory:")
- In-memory vector database
- Fast for <10K documents
- Create Collection

vectors_config = VectorParams(
    size=384,  # MiniLM embedding dimension
    distance=Distance.COSINE,
)
- Generate Embeddings

for doc in corpus:
    text = clean_text(doc['abstract'])
    vector = embedding_model.encode(text)
    qdrant.upload_point(id, vector, payload)  # pseudocode; see the runnable sketch below
- Build Knowledge Graph
- Scrolls through first 500 documents
- Extracts entities
- Creates graph edges
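A condensed, runnable version of the indexing loop (reusing `clean_text` and the `corpus` list from above; the collection name and payload layout are assumptions):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
qdrant = QdrantClient(":memory:")
qdrant.create_collection(
    collection_name="abstracts",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

points = []
for i, doc in enumerate(corpus):  # corpus: list of dicts with an 'abstract' field
    text = clean_text(doc["abstract"])
    if not text:
        continue  # skip empty or too-short abstracts
    points.append(PointStruct(id=i,
                              vector=embedder.encode(text).tolist(),
                              payload={"text": text}))
qdrant.upsert(collection_name="abstracts", points=points)
```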
Final Stats:
✅ Indexed 5,183 documents
✅ Graph: 500 nodes, 79 edges
1. Planning (1 API call)
→ LLM: Analyze query
← Response: Single complex query, sequential strategy
2. Retrieval
→ Vector DB: Search for "cagPAI H. pylori AID expression research study evidence"
← Results: 20 candidates
→ Reranker: Score all pairs
← Top 5 documents (including target paper at index 3354)
3. NLI Verification (1 API call)
→ LLM: Does top document support query?
← Score: 0.50 (moderate support)
4. Synthesis (1 API call)
→ LLM: Answer using these 3 sources
← "cagPAI-positive H. pylori induces aberrant AID expression via IκB kinase"
Confidence: 0.6
5. Fusion (1 API call)
→ LLM: Combine answers (only 1 sub-query here, so pass-through)
← Fused answer with 0.6 confidence
6. Final Answer (1 API call)
→ LLM: Generate answer with citations from 5 sources
← "Infection with cagPAI-positive H. pylori induces aberrant expression
of activation-induced cytidine deaminase (AID) via the IκB kinase [1]."
Confidence: 1.0
7. Reflection (1 API call)
→ LLM: Validate answer against evidence
← Relevant: False, Supported: False, Score: 0.3
(Stricter than actual quality - known issue)
Total: 6 API calls, ~3,400 tokens
query_expanded = f"{query} research study evidence"- Why: Improves recall by adding context
- Tradeoff: May reduce precision
- Solution: Reranking filters noise
Stage 1: Fast vector search (retrieve 20)
Stage 2: Slow cross-encoder rerank (score all, keep 5)
- Benefits: Best of both worlds - speed + accuracy
- Cost: 20 reranker inferences per query
# Planning: temperature=0.2 (creative decomposition)
# Synthesis: temperature=0.1 (faithful to sources)
# Fusion: temperature=0.2 (balanced combination)

Retrieval score (0-1)
→ Synthesis confidence (0-1)
→ Fusion confidence (0-1)
→ Final confidence (0-1)
Lower scores propagate downstream
- Builds answer incrementally
- Each step adds to outline
- Tracks completeness metric
- Confidence Score
- Range: 0.0 - 1.0
- Source: LLM self-assessment
- High (>0.8): Strong source support
- Low (<0.3): Weak/no evidence
- Self-RAG Score
- Range: 0.0 - 1.0
- Components: Relevant + Supported + Useful
- Known Issue: Often underestimates quality (30% for correct answers)
- Completeness
- Percentage of sub-queries answered
- Always 100% for single-query plans
- API Calls: 5-6 per query (simple), 15-20 (complex multi-step)
- Tokens: ~3,000-5,000 per query
- Time: 5-10 seconds per query
- Memory: ~2GB for full dataset in RAM
result = rag_pipeline.process_query_with_all_techniques(
query="Your question",
enable_graph=True, # Use knowledge graph walking
enable_nli=False # Skip NLI verification (saves API calls)
)

In `EnhancedHybridRetriever`:
top_k=5 # Number of documents to retrieve
limit=20 # Candidates before reranking
max_hops=3 # Graph walk depth

In `CompleteReasoningRAGPipeline`:
temperature=0.1 # LLM creativity (lower = more deterministic)
max_tokens=600 # Answer length limit

In `APICounter`:
max_calls=100 # Stop after N API calls

- Issue: Reflection often marks correct answers as unsupported (30%)
- Cause: Overly strict verification prompts
- Impact: Misleading quality metric
- Solution: Ignore Self-RAG score, trust Confidence instead
- Issue: Only first 500 docs used for graph
- Cause: Performance optimization
- Impact: May miss distant connections
- Solution: Increase the scroll `limit` from 500 to 2000
- Issue: Simple capitalization-based
- Cause: No NER model used
- Impact: Misses lowercase entities, over-includes
- Solution: Use spaCy or Transformers NER
- Issue: Same query reprocesses everything
- Cause: No cache implementation
- Impact: Wastes API calls
- Solution: Add LRU cache for embeddings/answers
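A minimal sketch of such caching with the standard library (reusing the `embedder` and `rag_pipeline` objects defined earlier; `lru_cache` requires hashable string keys):

```python
from functools import lru_cache

@lru_cache(maxsize=512)
def cached_embedding(text: str) -> tuple:
    # Tuples are hashable and immutable, so results can live safely in the cache.
    return tuple(embedder.encode(text).tolist())

@lru_cache(maxsize=128)
def cached_answer(query: str) -> str:
    # Re-running an identical query now costs zero additional API calls.
    result = rag_pipeline.process_query_with_all_techniques(query)
    return result["final_answer"]["answer"]
```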
# In AdvancedRetrievalPlanner.create_retrieval_plan()
# (entity1/entity2 would first be parsed from the query, e.g. "compare X and Y")
if "compare" in query.lower():
return RetrievalPlan(
sub_queries=[
f"What is {entity1}?",
f"What is {entity2}?",
f"Compare {entity1} and {entity2}"
],
retrieval_strategy="parallel",
expected_hops=2
)

# In EnhancedHybridRetriever.__init__()
self.reranker = CrossEncoder('cross-encoder/ms-marco-TinyBERT-L-6')
# Faster, slightly less accurate

from rank_bm25 import BM25Okapi
# Tokenize corpus
tokenized = [doc.split() for doc in corpus]
bm25 = BM25Okapi(tokenized)
# Hybrid retrieval
vector_scores = qdrant.query(...)
bm25_scores = bm25.get_scores(query.split())
combined = 0.7*vector_scores + 0.3*bm25_scores

Papers Implemented:
- Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs (arXiv:2507.09477)
  - Note: The full PDF (`2507.09477v2.pdf`) is included in this repository.
- Self-RAG: Self-Reflective Retrieval-Augmented Generation
- Graph-of-Thought: Graph-Based Reasoning for Complex Questions
- Fusion-in-Decoder: Probabilistic Answer Fusion
Models Used:
- Llama 3.3 70B (via GROQ API)
- all-MiniLM-L6-v2 (Sentence Transformers)
- ms-marco-MiniLM-L-6-v2 (Cross-Encoder)
Libraries:
- Qdrant: Vector database
- NetworkX: Graph operations
- SentenceTransformers: Embeddings
Common Issues:
- "API Key not found"
  - Add `GROQ_API_KEY` to Colab secrets
- "Collection not found"
  - Run Cell 11 (dataset loading) before queries
- "Out of memory"
  - Reduce `corpus_subset` size from 5183 to 1000
- "Answers are incorrect"
  - Check if the target document is in the indexed subset
  - Increase `corpus_subset` size
  - Lower the `clean_text` threshold to 50 chars
# Simple query
result = rag_pipeline.process_query_with_all_techniques(
query="What causes gastric cancer?",
enable_graph=False, # Faster
enable_nli=False # Fewer API calls
)
print(result['final_answer']['answer'])
# → "Infection with Helicobacter pylori is a risk factor..."
# Complex multi-hop query
result = rag_pipeline.process_query_with_all_techniques(
query="Explain the molecular pathway from H. pylori infection to cancer",
enable_graph=True, # Multi-hop reasoning
enable_nli=True # Verify each step
)
print(result['reflection']['overall_score'])
# → 0.3 (ignore this, it's often wrong)
print(result['final_answer']['confidence'])
# → 0.8 (trust this instead)

After using this pipeline, you will understand:
- Modern RAG architecture (5-stage)
- Vector databases (Qdrant)
- Semantic search (embeddings + reranking)
- Knowledge graphs for retrieval
- LLM prompting strategies
- Answer fusion techniques
- Self-critique mechanisms
- Citation generation
Version: 1.0
Last Updated: January 2026
License: MIT