A fully functional, locally hosted RAG (Retrieval-Augmented Generation) system that uses Wikipedia as the knowledge base, Ollama for LLM serving, and optional MCP for agentic capabilities.
This project demonstrates a complete end-to-end RAG pipeline that:
- Processes Wikipedia dumps into searchable chunks
- Generates semantic embeddings for intelligent retrieval
- Uses FAISS for fast vector similarity search
- Connects to local Ollama LLMs for generating contextual answers
- Provides both CLI and web interfaces for interaction
- Wikipedia Processing: Parse and clean 262K+ articles from Simple English Wikipedia
- Semantic Search: FAISS-powered vector search over 569K text chunks
- Local LLM: Ollama integration with Mistral/Llama2 models
- Web Interface: Chat UI via OpenWebUI or Flask
- Source Attribution: Every answer includes relevant Wikipedia citations
- Offline Operation: Complete system runs locally
- Memory Efficient: Optimized for 8GB RAM with batch processing
Wikipedia XML → Parser → Chunker → Embeddings → FAISS Index
                                                     ↓
                              RAG Retrieval ← User Query
                                    ↓
                            Context Formation
                                    ↓
                           Ollama LLM (Mistral)
                                    ↓
                        Generated Answer + Sources
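In code, the query path above boils down to embed, search, assemble context, generate. A minimal sketch under assumed names (`index`, the `chunks` field layout, and the prompt wording are illustrative, not the project's actual API):

```python
# Minimal sketch of the query path; `index` (FAISS), `chunks` (list of
# dicts with "text" and "title"), and the prompt wording are assumptions.
import requests
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def answer(query, index, chunks, top_k=5):
    # Embed the query and search the FAISS index (cosine similarity,
    # since the stored vectors are normalized).
    q = model.encode([query], normalize_embeddings=True)
    scores, ids = index.search(q, top_k)
    context = "\n\n".join(chunks[i]["text"] for i in ids[0])

    # Ask the local Ollama server (default port 11434) for an answer.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "mistral",
            "prompt": f"Use only this context to answer.\n\n{context}\n\nQuestion: {query}",
            "stream": False,
        },
    )
    return resp.json()["response"], [chunks[i]["title"] for i in ids[0]]
```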
Data Processing:
- Wikipedia parsing: 262,105 articles extracted
- Text chunking: 569,456 searchable segments
- Embeddings: 384-dim vectors using sentence-transformers
- FAISS index: 834MB, cosine similarity search
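The build step behind those numbers roughly follows this shape: overlapping character chunks, batched embeddings, and an inner-product index over unit vectors so that inner product equals cosine similarity. A hedged sketch; the chunking helper and batch size here are illustrative, not the project's exact code:

```python
# Sketch of the index build: overlapping chunks, batched embeddings,
# and a flat inner-product FAISS index over normalized vectors.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk_text(text, size=1000, overlap=200):
    # Slide a window of `size` characters forward by (size - overlap).
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def build_index(texts, batch_size=100):
    vecs = model.encode(texts, batch_size=batch_size,
                        normalize_embeddings=True, show_progress_bar=True)
    index = faiss.IndexFlatIP(vecs.shape[1])  # 384 dims for MiniLM-L6-v2
    index.add(np.asarray(vecs, dtype="float32"))
    return index
```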
LLM Integration:
- Ollama setup with Mistral 7B model
- RAG pipeline connecting search to generation
- Context-aware response generation
- Source citation system
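Source citation typically works by numbering the retrieved chunks inside the prompt so the model can cite them inline. A sketch of that context formation step (the prompt wording and `hit` field names are assumptions):

```python
# Sketch of context formation with numbered sources so the model can
# cite [1], [2], ... in its answer; the real prompt template may differ.
def build_prompt(question, hits, max_context=4000):
    context, sources = [], []
    for n, hit in enumerate(hits, 1):
        context.append(f"[{n}] {hit['text']}")
        sources.append(f"[{n}] {hit['title']}")
    ctx = "\n\n".join(context)[:max_context]  # respect max_context_length
    prompt = (
        "Answer the question using only the numbered context below, "
        "and cite sources like [1].\n\n"
        f"{ctx}\n\nQuestion: {question}\nAnswer:"
    )
    return prompt, sources
```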
Interfaces:
- Interactive CLI chat
- Flask web interface
- OpenWebUI Docker integration
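The Flask interface reduces to a single chat endpoint. A minimal sketch; the route name, JSON shape, and the `rag_answer` stub are illustrative assumptions, not the project's actual code:

```python
# Minimal Flask chat endpoint in the spirit of web_interface.py;
# route, payload shape, and the stub below are assumptions.
from flask import Flask, jsonify, request

app = Flask(__name__)

def rag_answer(question):
    # Stand-in for the real pipeline call (see rag_pipeline.py).
    return f"(answer to: {question})", []

@app.route("/api/chat", methods=["POST"])
def chat():
    question = request.json["message"]
    answer, sources = rag_answer(question)
    return jsonify({"answer": answer, "sources": sources})

if __name__ == "__main__":
    app.run(port=5000)
```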
Conversation Memory:
- Multi-turn dialogue support
- Session-based conversation tracking
- Context-aware follow-up questions
- Persistent conversation storage
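Session-scoped memory of this shape would cover the points above. A minimal sketch; `conversation_memory.py` may be organized differently:

```python
# Minimal session-based conversation memory: keep the most recent turns
# per session and prepend them to the prompt so follow-ups resolve.
import json
from collections import defaultdict

class ConversationMemory:
    def __init__(self, max_turns=5):
        self.sessions = defaultdict(list)
        self.max_turns = max_turns

    def add(self, session_id, question, answer):
        self.sessions[session_id].append({"q": question, "a": answer})
        # Truncate to the last max_turns exchanges.
        self.sessions[session_id] = self.sessions[session_id][-self.max_turns:]

    def history(self, session_id):
        return "\n".join(f"User: {t['q']}\nAssistant: {t['a']}"
                         for t in self.sessions[session_id])

    def save(self, path):
        # Persistent storage as plain JSON.
        with open(path, "w") as f:
            json.dump(dict(self.sessions), f)
```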
Hybrid Search:
- BM25 keyword search integration
- Reciprocal Rank Fusion (RRF)
- Improved retrieval for exact terms and formulas
- 10-20% better retrieval quality
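Reciprocal Rank Fusion merges the dense (FAISS) and sparse (BM25) result lists by summing 1/(k + rank) for each document across rankings. A minimal sketch; k = 60 is the common default, and the project's constant may differ:

```python
# Reciprocal Rank Fusion: a document's fused score is the sum of
# 1 / (k + rank) over every ranking it appears in.
def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:  # each is a list of doc ids, best first
        for rank, doc_id in enumerate(ranking, 1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse dense (FAISS) and sparse (BM25) candidate lists.
fused = rrf([["d3", "d1", "d7"], ["d1", "d9", "d3"]])
```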
MCP Agents:
- Multi-step planning and execution
- 4 Wikipedia tools (search, compare, multi-search, summarize)
- Automatic mode selection (simple vs agentic)
- Tool call tracing and synthesis
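Automatic mode selection can be as simple as a keyword heuristic that routes multi-entity questions to the agent. A hedged sketch; the hint list below is an assumption, not the project's actual logic:

```python
# Sketch of simple-vs-agentic mode selection: single-topic questions go
# through the plain RAG path, multi-entity ones to the MCP agent.
AGENTIC_HINTS = ("compare", "difference", "versus", " vs ", "both")

def pick_mode(question: str) -> str:
    q = question.lower()
    return "agentic" if any(h in q for h in AGENTIC_HINTS) else "simple"

print(pick_mode("What is artificial intelligence?"))            # simple
print(pick_mode("Compare quantum and classical computing"))     # agentic
```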
- Hardware: MacBook Air M1 (8GB RAM)
- Software: Python 3.9+, Docker (for OpenWebUI)
- Storage: ~5GB for Simple Wikipedia, ~150GB for full Wikipedia
# Clone the repository
git clone https://github.com/bhavyashah10/wikipedia-rag-project.git
cd wikipedia-rag-project
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Test setup
python test_setup.py

# 1. Download Simple English Wikipedia
curl -O https://dumps.wikimedia.org/simplewiki/latest/simplewiki-latest-pages-articles.xml.bz2
mv simplewiki-latest-pages-articles.xml.bz2 data/raw/
# 2. Parse articles from XML
python test_parser.py
# 3. Create text chunks
python src/data_processing/text_chunker.py
# 4. Generate embeddings
python src/embeddings/embedding_generator.py
# 5. Build FAISS index
python src/retrieval/faiss_indexer.py
# 6. Build BM25 index for hybrid search
python src/retrieval/hybrid_search.py

# Install Ollama
brew install ollama
# Start Ollama service
brew services start ollama
# Pull Mistral model (~4GB, takes 5-10 minutes)
ollama pull mistral
# Or use Llama2
# ollama pull llama2:7b-chat

# Start interactive chat (without memory)
python src/llm_integration/rag_pipeline.py
# Start interactive chat (with conversation memory)
python src/llm_integration/rag_with_memory.py
# Start interactive chat (with hybrid search + memory)
python src/llm_integration/rag_with_hybrid_search.py
# Start interactive chat (with MCP agents - full agentic mode)
python src/mcp_agents/rag_with_mcp.py
# Example queries:
# - "What is artificial intelligence?" (simple)
# - "Compare quantum computing and classical computing" (agentic)
# - "What are the differences between mitochondria and chloroplasts?" (multi-step)# Install and run OpenWebUI with Docker
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
# Open browser to http://localhost:3000
# Login and start chatting

# Start web server
cd src/llm_integration
python web_interface.py
# Open browser to http://localhost:5000

wikipedia-rag-project/
├── data/
│ ├── raw/ # Wikipedia XML dumps
│ ├── processed/ # 569K clean chunks (JSON)
│ └── embeddings/ # FAISS index + vectors (1.6GB)
├── src/
│ ├── data_processing/
│ │ ├── wikipedia_parser.py # XML parsing & cleaning
│ │ └── text_chunker.py # Semantic text chunking
│ ├── embeddings/
│ │ └── embedding_generator.py # Vector embeddings (sentence-transformers)
│ ├── retrieval/
│ │ ├── faiss_indexer.py # FAISS index & similarity search
│ │ └── hybrid_search.py # BM25 + FAISS hybrid search
│ ├── llm_integration/
│ │ ├── rag_pipeline.py # Complete RAG pipeline
│ │ ├── rag_with_memory.py # RAG with conversation memory
│ │ ├── rag_with_hybrid_search.py # RAG with hybrid search + memory
│ │ ├── conversation_memory.py # Conversation management
│ │ ├── web_interface.py # Flask web server
│ │ └── templates/
│ │ └── chat.html # Web UI
│ └── mcp_agents/ # MCP agentic layer
│ ├── tools.py # Wikipedia MCP tools
│ ├── agent.py # Planning & execution
│ └── rag_with_mcp.py # RAG with MCP integration
├── config/
│ └── config.yaml # System configuration
├── logs/ # Application logs
├── requirements.txt # Python dependencies
├── test_setup.py # Setup verification
├── test_parser.py # Parser testing
└── check_setup.py # System status check
Edit config/config.yaml to customize:
Processing Settings:
processing:
  chunk_size: 1000          # Characters per chunk
  chunk_overlap: 200        # Overlap between chunks
  min_article_length: 100   # Filter short articles

Embedding Settings:
embeddings:
  model_name: "sentence-transformers/all-MiniLM-L6-v2"
  dimension: 384
  normalize: true

RAG Settings:
rag:
  top_k: 5                  # Number of chunks to retrieve
  score_threshold: 0.7      # Minimum similarity score
  max_context_length: 4000  # Max characters in context

LLM Settings:
llm:
  model: "mistral:latest"   # Ollama model to use
  temperature: 0.7          # Response creativity
  max_tokens: 2048          # Max response length

MCP Settings:
mcp:
  enabled: true             # Enable MCP agents
  max_tool_calls: 5         # Maximum tool iterations
  planning_temperature: 0.3 # Temperature for planning

Issue: Ollama connection failed
# Check if Ollama is running
brew services list | grep ollama
# Restart if needed
brew services restart ollama
# Test connection
curl http://localhost:11434/api/tags

Issue: Out of memory during embedding generation
# Edit config.yaml, reduce batch size
processing:
  batch_size: 50  # Reduce from 100

Issue: FAISS index not found
# Rebuild the index
python src/retrieval/faiss_indexer.py

Issue: Slow query responses
# Check if using GPU acceleration
python -c "import torch; print(torch.backends.mps.is_available())"
# Reduce top_k in config to retrieve fewer chunks
rag:
  top_k: 3  # Reduce from 5

For questions or feedback, please open an issue on GitHub.