AskMyYouTube is a Q&A application that lets users ask questions about a YouTube video and get answers grounded in the video's transcript. It fetches the transcript, builds a searchable vector index from it, and passes the most relevant passages to a language model to answer each query.
- 🎬 YouTube Video Processing: Automatically extracts and processes YouTube video transcripts
- 🧠 AI-Powered Q&A: Uses the DeepSeek-V3 model for question answering
- 🔍 Smart Retrieval: Implements vector-based semantic search for finding relevant content
- 💬 Interactive Chat Interface: Streamlit web application with chat history
- ⚡ Efficient Indexing: Creates and reuses vector stores for faster subsequent queries
- 🎯 Context-Aware Responses: Provides answers strictly based on video content
- Frontend: Streamlit
- AI Models:
  - DeepSeek-V3-0324 (Chat/Q&A)
  - sentence-transformers/all-MiniLM-L6-v2 (Embeddings)
- Vector Database: FAISS
- Text Processing: LangChain
- YouTube Integration: youtube-transcript-api
- Python 3.8 or higher
- Hugging Face API token
- Internet connection for YouTube transcript fetching
git clone https://github.com/yourusername/AskMyYouTube.git
cd AskMyYouTube
python -m venv venv
# On Windows
venv\Scripts\activate
# On macOS/Linux
source venv/bin/activate
pip install -r requirements.txt
Create a .env file in the root directory:
HUGGINGFACEHUB_API_TOKEN=your_huggingface_token_here
HF_TOKEN=your_huggingface_token_here
Getting your Hugging Face token:
- Visit Hugging Face
- Sign up/Login to your account
- Go to Settings → Access Tokens
- Create a new token with "Read" permissions
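Before launching the app, it can help to confirm that both token variables are actually visible to Python. The following is a minimal sketch, not part of the project: it only inspects the process environment, so it assumes the values from `.env` have already been exported or loaded by whatever mechanism the app uses. The helper name `missing_tokens` is hypothetical.

```python
import os

# Variable names taken from the .env example in this README.
REQUIRED_VARS = ("HUGGINGFACEHUB_API_TOKEN", "HF_TOKEN")


def missing_tokens(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_tokens()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Both Hugging Face token variables are set.")
```

An empty result means both variables are present; any names returned point at entries to fix in `.env`.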
streamlit run app.py
The application will be available at http://localhost:8501
- Enter YouTube URL: Paste any YouTube video URL in the input field
- Submit URL: Click "Submit" to process the video transcript
- Ask Questions: Use the chat input to ask questions about the video content
- View Answers: Get AI-powered answers based on the video transcript
- Chat History: Review previous questions and answers in the conversation history
1. URL: https://www.youtube.com/watch?v=dQw4w9WgXcQ
2. Question: "What is the main topic discussed in this video?"
3. AI Response: [Contextual answer based on video transcript]
AskMyYouTube/
├── configs/
│   └── config.py                  # Configuration settings
├── prompts/
│   └── template.txt               # Prompt template for AI model
├── src/
│   ├── indexing/
│   │   ├── chunking.py            # Text chunking utilities
│   │   ├── vector_store.py        # Vector database operations
│   │   └── youtube_transcript.py  # YouTube transcript extraction
│   ├── ans_generation.py          # Answer generation pipeline
│   ├── models.py                  # AI model loading functions
│   └── pipeline.py                # Main processing pipeline
├── vector_store/                  # Cached vector databases
├── app.py                         # Streamlit application
├── requirements.txt               # Project dependencies
├── .env                           # Environment variables
├── .gitignore                     # Git ignore rules
├── LICENSE                        # MIT License
└── README.md                      # This file
Customize the application behavior by modifying configs/config.py:
# AI Models
EMBEDDING_MODEL_ID = "sentence-transformers/all-MiniLM-L6-v2"
CHAT_MODEL_ID = "deepseek-ai/DeepSeek-V3-0324"
# Text Processing
CHUNK_SIZE = 1000 # Characters per chunk
CHUNK_OVERLAP = 100 # Overlap between chunks
# AI Parameters
MAX_TOKENS = 200 # Maximum response length
TEMPERATURE = 0.3 # Response creativity (0-1)
# Retrieval
RETRIEVAL_K = 5 # Number of relevant chunks to retrieve
- Extracts video ID from YouTube URL
- Fetches transcript using YouTube Transcript API
- Handles various URL formats (youtu.be/, youtube.com/watch?v=)
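The ID extraction step could look roughly like the sketch below. It is an illustration using only the standard library, not the code in `src/indexing/youtube_transcript.py`; the function name `extract_video_id` is hypothetical, and only the two URL formats named above are handled.

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video ID from a youtu.be/ or youtube.com/watch?v= URL, else None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    if host in ("youtube.com", "m.youtube.com") and parsed.path == "/watch":
        # Standard watch links carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None
```

Note that the sketch expects a full URL including the scheme (`https://`); a bare `youtu.be/...` string has no parsable host and returns `None`.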
- Splits transcript into manageable chunks
- Creates vector embeddings using sentence transformers
- Stores in FAISS vector database for fast retrieval
- Caches vector stores for repeated queries
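To make the effect of `CHUNK_SIZE` and `CHUNK_OVERLAP` concrete, here is a plain sliding-window splitter. It is a simplified stand-in, not the project's `chunking.py`: a LangChain text splitter would typically also try to break on separators such as newlines, which this sketch does not. The function name `chunk_text` is hypothetical; the defaults mirror the values shown in `configs/config.py`.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=100):
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults, each window starts 900 characters after the previous one, so the last 100 characters of one chunk are repeated at the start of the next. The overlap reduces the chance that a sentence relevant to a question is cut in half at a chunk boundary.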
- Processes user queries through semantic search
- Retrieves most relevant transcript segments
- Generates contextual answers using DeepSeek-V3
- Maintains conversation history
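The retrieval and answer steps can be illustrated without any external libraries. The sketch below ranks chunks by cosine similarity and builds a prompt from the top results. It is an assumption-laden toy: the real app delegates embedding to sentence-transformers and search to FAISS, and its prompt lives in `prompts/template.txt`, whose wording is not shown here. The names `cosine_similarity`, `top_k_chunks`, and `build_prompt`, and the prompt text itself, are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs, chunks, k=5):
    """Return the k chunks whose vectors are most similar to query_vec."""
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]


def build_prompt(question, context_chunks):
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only the transcript context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

The default `k=5` matches `RETRIEVAL_K` in the configuration. Raising it gives the model more context per question at the cost of a longer prompt.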
- Reuses existing vector stores for the same video
- Significantly faster response times for repeated queries
- Automatic cache management
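One way to get this reuse behaviour is to key each saved index by video ID and rebuild only on a cache miss. The sketch below assumes one subdirectory per video under `vector_store/`; that layout, the function name `get_or_build_index`, and the `build_fn` callback are all hypothetical and may differ from what `src/indexing/vector_store.py` actually does.

```python
from pathlib import Path


def get_or_build_index(video_id, build_fn, cache_root="vector_store"):
    """Return (index_dir, was_cached); call build_fn(index_dir) only on a cache miss."""
    index_dir = Path(cache_root) / video_id
    # A non-empty directory is treated as an existing, reusable index.
    if index_dir.is_dir() and any(index_dir.iterdir()):
        return index_dir, True
    index_dir.mkdir(parents=True, exist_ok=True)
    build_fn(index_dir)  # expected to embed the transcript and save files here
    return index_dir, False
```

Here `build_fn` stands in for whatever embeds the transcript and writes the index to disk. The first call for a video pays the full indexing cost; later calls for the same ID return immediately with `was_cached` set to `True`.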
- ✅ Videos with auto-generated captions
- ✅ Videos with manual captions
- ✅ Videos with multiple language captions
- ❌ Videos without any captions
- ❌ Private or restricted videos
"An error occurred: Try again"
- Video might not have captions available
- Check if the video is public and has captions
- Try a different YouTube video
"Failed to load model"
- Verify your Hugging Face API token
- Check internet connectivity
- Ensure token has proper permissions
Slow initial responses
- First query processes the entire transcript
- Subsequent queries use cached data and are faster
- Processing time depends on video length
- Check the video has captions enabled
- Verify your .env file configuration
- Ensure all dependencies are installed
- Try with a different YouTube video
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a feature branch
git checkout -b feature/amazing-feature
- Make your changes
- Commit your changes
git commit -m 'Add some amazing feature'
- Push to the branch
git push origin feature/amazing-feature
- Open a Pull Request
# Install development dependencies
pip install -r requirements.txt
# Run the application in development mode
streamlit run app.py --server.runOnSave true
This project is licensed under the MIT License - see the LICENSE file for details.
- Hugging Face for providing the AI models
- Streamlit for the web application framework
- LangChain for the AI pipeline tools
- YouTube Transcript API for transcript extraction
MAHESH KETAM - GitHub Profile
Project Link: https://github.com/yourusername/AskMyYouTube
⭐ Star this repository if you find it helpful!