
AskMyYouTube

Python 3.8+ Streamlit App Hugging Face License: MIT

🎯 Overview

AskMyYouTube is a Q&A application that answers questions about YouTube videos based on the video's transcript. It fetches the transcript, builds a searchable vector index from it, and uses a language model to generate contextually relevant answers to user queries.

✨ Key Features

  • 🎬 YouTube Video Processing: Automatically extracts and processes YouTube video transcripts
  • 🧠 AI-Powered Q&A: Uses DeepSeek-V3 model for intelligent question answering
  • 🔍 Smart Retrieval: Implements vector-based semantic search for finding relevant content
  • 💬 Interactive Chat Interface: User-friendly Streamlit web application with chat history
  • ⚡ Efficient Indexing: Creates and reuses vector stores for faster subsequent queries
  • 🎯 Context-Aware Responses: Provides answers strictly based on video content

๐Ÿ› ๏ธ Technology Stack

  • Frontend: Streamlit
  • AI Models:
    • DeepSeek-V3-0324 (Chat/Q&A)
    • sentence-transformers/all-MiniLM-L6-v2 (Embeddings)
  • Vector Database: FAISS
  • Text Processing: LangChain
  • YouTube Integration: youtube-transcript-api

📋 Prerequisites

  • Python 3.8 or higher
  • Hugging Face API token
  • Internet connection for YouTube transcript fetching

🚀 Installation

1. Clone the Repository

git clone https://github.com/yourusername/AskMyYouTube.git
cd AskMyYouTube

2. Create Virtual Environment

python -m venv venv

# On Windows
venv\Scripts\activate

# On macOS/Linux
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Environment Configuration

Create a .env file in the root directory:

HUGGINGFACEHUB_API_TOKEN=your_huggingface_token_here
HF_TOKEN=your_huggingface_token_here

Getting your Hugging Face token:

  1. Visit huggingface.co
  2. Sign up or log in to your account
  3. Go to Settings → Access Tokens
  4. Create a new token with "Read" permissions
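Once the `.env` file is in place, the app can read the token at startup. A minimal sketch of that check (the repository itself presumably loads `.env` via python-dotenv's `load_dotenv()` first; the helper name `require_token` below is illustrative, not from the codebase):

```python
import os

def require_token(name: str = "HUGGINGFACEHUB_API_TOKEN") -> str:
    """Return the Hugging Face token from the environment, or fail loudly."""
    token = os.environ.get(name)
    if not token:
        raise RuntimeError(f"{name} is not set; create a .env file or export it")
    return token
```

Failing fast here gives a clearer error than a cryptic authentication failure deep inside a model call.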

🎮 Usage

Starting the Application

streamlit run app.py

The application will be available at http://localhost:8501

How to Use

  1. Enter YouTube URL: Paste any YouTube video URL in the input field
  2. Submit URL: Click "Submit" to process the video transcript
  3. Ask Questions: Use the chat input to ask questions about the video content
  4. View Answers: Get AI-powered answers based on the video transcript
  5. Chat History: Review previous questions and answers in the conversation history

Example Workflow

1. URL: https://www.youtube.com/watch?v=dQw4w9WgXcQ
2. Question: "What is the main topic discussed in this video?"
3. AI Response: [Contextual answer based on video transcript]

๐Ÿ“ Project Structure

AskMyYouTube/
├── configs/
│   └── config.py              # Configuration settings
├── prompts/
│   └── template.txt           # Prompt template for AI model
├── src/
│   ├── indexing/
│   │   ├── chunking.py        # Text chunking utilities
│   │   ├── vector_store.py    # Vector database operations
│   │   └── youtube_transcript.py # YouTube transcript extraction
│   ├── ans_generation.py      # Answer generation pipeline
│   ├── models.py              # AI model loading functions
│   └── pipeline.py            # Main processing pipeline
├── vector_store/              # Cached vector databases
├── app.py                     # Streamlit application
├── requirements.txt           # Project dependencies
├── .env                       # Environment variables
├── .gitignore                 # Git ignore rules
├── LICENSE                    # MIT License
└── README.md                  # This file

โš™๏ธ Configuration

Customize the application behavior by modifying configs/config.py:

# AI Models
EMBEDDING_MODEL_ID = "sentence-transformers/all-MiniLM-L6-v2"
CHAT_MODEL_ID = "deepseek-ai/DeepSeek-V3-0324"

# Text Processing
CHUNK_SIZE = 1000              # Characters per chunk
CHUNK_OVERLAP = 100            # Overlap between chunks

# AI Parameters
MAX_TOKENS = 200               # Maximum response length
TEMPERATURE = 0.3              # Response creativity (0-1)

# Retrieval
RETRIEVAL_K = 5                # Number of relevant chunks to retrieve

🔧 How It Works

1. Video Processing

  • Extracts video ID from YouTube URL
  • Fetches transcript using YouTube Transcript API
  • Handles various URL formats (youtu.be/, youtube.com/watch?v=)
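The URL handling above can be sketched with the standard library alone. This is an illustrative helper, not the repository's actual `youtube_transcript.py`; the extracted ID would then be passed to youtube-transcript-api to fetch the transcript:

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url: str) -> str:
    """Pull the 11-character video ID out of common YouTube URL formats."""
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/")               # https://youtu.be/<id>
    if parsed.hostname in ("www.youtube.com", "youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            return parse_qs(parsed.query)["v"][0]    # .../watch?v=<id>
        if parsed.path.startswith(("/embed/", "/shorts/")):
            return parsed.path.split("/")[2]
    raise ValueError(f"Unrecognized YouTube URL: {url}")
```

For example, `extract_video_id("https://youtu.be/dQw4w9WgXcQ")` and the full `watch?v=` form both resolve to the same ID.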

2. Text Indexing

  • Splits transcript into manageable chunks
  • Creates vector embeddings using sentence transformers
  • Stores in FAISS vector database for fast retrieval
  • Caches vector stores for repeated queries
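The chunking step can be illustrated in pure Python using the `CHUNK_SIZE` and `CHUNK_OVERLAP` values from the configuration. Note this is a simplified sketch: LangChain's splitters (e.g. `RecursiveCharacterTextSplitter`) additionally prefer to break on paragraph and sentence boundaries rather than slicing mid-word, and the resulting chunks are then embedded and stored in FAISS:

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks that overlap by chunk_overlap
    characters, so context spanning a chunk boundary is not lost."""
    step = chunk_size - chunk_overlap
    return [
        text[i:i + chunk_size]
        for i in range(0, max(len(text) - chunk_overlap, 1), step)
    ]
```

A 2,500-character transcript with the default settings yields three chunks, each sharing 100 characters with its neighbor.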

3. Question Answering

  • Processes user queries through semantic search
  • Retrieves most relevant transcript segments
  • Generates contextual answers using DeepSeek-V3
  • Maintains conversation history
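The retrieve-then-answer flow can be shown with a toy similarity search. The real pipeline scores chunks with dense embeddings from all-MiniLM-L6-v2 rather than the word counts used here, but the shape is the same: rank all chunks against the query, keep the top `RETRIEVAL_K`, and hand them to the chat model as context:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, chunks: list[str], k: int = 5) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = Counter(query.lower().split())
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(c.lower().split())), reverse=True)
    return ranked[:k]
```

Only the retrieved chunks are placed in the prompt, which is why answers stay grounded in the video content.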

4. Smart Caching

  • Reuses existing vector stores for the same video
  • Significantly faster response times for repeated queries
  • Automatic cache management

📊 Supported Video Types

  • ✅ Videos with auto-generated captions
  • ✅ Videos with manual captions
  • ✅ Videos with multiple language captions
  • ❌ Videos without any captions
  • ❌ Private or restricted videos

🚨 Troubleshooting

Common Issues

"An error occurred: Try again"

  • Video might not have captions available
  • Check if the video is public and has captions
  • Try a different YouTube video

"Failed to load model"

  • Verify your Hugging Face API token
  • Check internet connectivity
  • Ensure token has proper permissions

Slow initial responses

  • First query processes the entire transcript
  • Subsequent queries use cached data and are faster
  • Processing time depends on video length

Getting Help

  1. Check the video has captions enabled
  2. Verify your .env file configuration
  3. Ensure all dependencies are installed
  4. Try with a different YouTube video

๐Ÿค Contributing

Contributions are welcome! Here's how you can help:

  1. Fork the repository
  2. Create a feature branch
    git checkout -b feature/amazing-feature
  3. Make your changes
  4. Commit your changes
    git commit -m 'Add some amazing feature'
  5. Push to the branch
    git push origin feature/amazing-feature
  6. Open a Pull Request

Development Setup

# Install development dependencies
pip install -r requirements.txt

# Run the application in development mode
streamlit run app.py --server.runOnSave true

๐Ÿ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

📬 Contact

MAHESH KETAM - GitHub Profile

Project Link: https://github.com/yourusername/AskMyYouTube


โญ Star this repository if you find it helpful!
