Unseen. Unheard. Unstoppable.
Your shadow ally in digital accessibility and AI research.
- 🎯 Overview
- ✨ Features
- Academic & Research Applications
- 🔬 Technical Architecture
- 📦 Installation
- Usage Guide
- ⌨️ Keyboard Shortcuts
- 🛡️ Security & Privacy Considerations
- System Requirements
- 🤝 Contributing
- License
- Acknowledgments
GhostMentor is an AI-powered accessibility and research tool designed to assist users in educational environments, accessibility testing, and assistive technology development. It combines the Google Gemini API (multimodal analysis), Faster-Whisper (speech transcription), OpenCV (image processing), and Pygame (on-screen display) into a real-time AI assistance system.
GhostMentor serves multiple legitimate purposes in the academic and research community:
| Purpose | Description |
|---|---|
| 🧪 Accessibility Research | Investigating screen capture exclusion technologies for assistive applications |
| Educational Technology | Developing AI-powered tutoring and learning assistance systems |
| 🔬 Human-Computer Interaction | Researching novel HUD interfaces and overlay technologies |
| ♿ Assistive Technology | Creating tools for users with visual or cognitive impairments |
| AI Research | Exploring multimodal AI integration (vision + speech + text) |
⚠️ Important Notice: This tool is intended exclusively for educational, research, accessibility testing, and ethical development purposes. Users are responsible for ensuring compliance with all applicable laws, regulations, and institutional policies.
GhostMentor sends screen captures to the Google Gemini API for multimodal analysis and streams the answer back:
- Screen Analysis: Captures the screen locally and lets Gemini interpret the text, images, and layout in the screenshot
- Context-Aware Responses: Gemini receives the screenshot together with the prompt, so answers refer to what is currently on screen
- Streaming Output: Responses are streamed, so text appears in the HUD as it is generated
Powered by Faster-Whisper for accurate and efficient speech transcription:
- Low-Latency Processing: Runs the Whisper "base" model on CPU with int8 quantization to keep transcription delay low
- Multi-Language Support: Whisper models support many languages (the transcription example below pins English with language="en")
- Noise Resilience: Whisper models are trained on diverse real-world audio and tolerate moderate background noise
- Beam Search Decoding: Uses beam search (beam_size=5) to improve accuracy on complex speech patterns
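Faster-Whisper's `transcribe()` accepts either a file path or an in-memory waveform; for the latter it expects mono float32 samples at 16 kHz. The sketch below shows only the conversion step, assuming PyAudio recorded 16-bit mono PCM at 16 kHz into a list of byte chunks. `pcm_chunks_to_waveform` is a hypothetical helper, not a function from GhostMentor or Faster-Whisper.

```python
from typing import Iterable

import numpy as np


def pcm_chunks_to_waveform(chunks: Iterable[bytes]) -> np.ndarray:
    """Join 16-bit PCM chunks into a float32 waveform in [-1.0, 1.0).

    Assumes mono int16 samples in native byte order, which is what
    PyAudio delivers for the paInt16 format.
    """
    raw = b"".join(chunks)
    samples = np.frombuffer(raw, dtype=np.int16)
    # int16 covers [-32768, 32767], so dividing by 32768 maps into [-1.0, 1.0)
    return samples.astype(np.float32) / 32768.0
```

The result can be passed directly as the audio argument of `whisper_model.transcribe(...)`. No resampling happens here, so recording at 16 kHz is part of the assumption.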
A modern, non-intrusive interface that integrates seamlessly with your workflow:
- Always-On-Top Design: Stays visible without disrupting your primary tasks
- Scrollable Content: Review extensive AI responses with ease
- Minimal Visual Footprint: Designed to be helpful without being distracting
- Customizable Appearance: Adapt the interface to your preferences
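Scrollable content in a fixed-size overlay usually comes down to wrapping the response into lines and showing a window of them at a clamped offset. A minimal sketch using the standard library's `textwrap`; the function name and the character-based width are assumptions for illustration, since a Pygame HUD would normally measure text in pixels with `pygame.font`.

```python
import textwrap
from typing import List


def visible_lines(text: str, width: int, offset: int, height: int) -> List[str]:
    """Wrap `text` to `width` characters and return `height` lines starting
    at `offset`, clamping the offset so the view stays inside the content."""
    lines: List[str] = []
    for paragraph in text.splitlines() or [""]:
        # Blank lines in the model output become empty HUD rows
        lines.extend(textwrap.wrap(paragraph, width) or [""])
    max_offset = max(0, len(lines) - height)
    start = min(max(offset, 0), max_offset)
    return lines[start:start + height]
```

Scrolling then means adjusting `offset` on a key or mouse-wheel event and redrawing whatever `visible_lines` returns.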
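GhostMentor incorporates privacy-focused design principles:
- Local Transcription: Speech is transcribed on your machine with Faster-Whisper; screenshots are captured locally and sent to the Gemini API only when you request an analysis
- No Data Storage: Captures and transcripts are processed transiently and not retained permanently by GhostMentor
- API Security: Communication with the Gemini API uses encrypted (HTTPS) connections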
GhostMentor provides an excellent platform for researching:
| Research Area | Application |
|---|---|
| Adaptive Learning Systems | Study how AI assistance affects learning outcomes |
| Real-Time Tutoring | Develop and test AI-powered tutoring methodologies |
| Cognitive Load Theory | Research optimal information presentation in HUDs |
| Multimodal Interaction | Explore combining voice, vision, and text inputs |
Essential tool for accessibility researchers and developers:
- Screen Reader Compatibility Testing: Test how applications behave with various assistive technologies
- Visual Impairment Simulation: Understand user experiences with limited visual access
- Assistive Technology Development: Create tools for users with disabilities
- WCAG Compliance Research: Investigate accessibility standard implementation
Perfect for HCI researchers investigating:
- Attention Management: How overlay interfaces affect user focus
- Information Density: Optimal information presentation in limited screen space
- Multimodal Interfaces: Combining visual and auditory feedback channels
- Context-Aware Computing: Systems that adapt to user context and needs
Valuable for software testing professionals:
- UI/UX Testing: Automated interface analysis and feedback
- Accessibility Auditing: Identify accessibility issues in applications
- Cross-Platform Compatibility: Test behavior across different environments
- Documentation Generation: Automated documentation from visual analysis
```text
                  GhostMentor Architecture

 ┌────────────────┐   ┌────────────────┐   ┌────────────────┐
 │ Screen Capture │   │ Audio Capture  │   │Keyboard Handler│
 │    (OpenCV)    │   │   (PyAudio)    │   │   (Keyboard)   │
 └───────┬────────┘   └───────┬────────┘   └───────┬────────┘
         ▼                    ▼                    ▼
 ┌──────────────────────────────────────────────────────────┐
 │                     Processing Core                      │
 │  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐  │
 │  │    Image     │   │    Speech    │   │    Event     │  │
 │  │   Encoding   │   │  Transcribe  │   │   Handling   │  │
 │  │    (PNG)     │   │  (Whisper)   │   │   (Async)    │  │
 │  └──────┬───────┘   └──────┬───────┘   └──────┬───────┘  │
 │         └──────────────────┼──────────────────┘          │
 │                            ▼                             │
 │                  ┌────────────────────┐                  │
 │                  │   Prompt Builder   │                  │
 │                  └─────────┬──────────┘                  │
 └────────────────────────────┼─────────────────────────────┘
                              ▼
                  ┌────────────────────────┐
                  │    Gemini API Call     │
                  │  (Streaming Response)  │
                  └───────────┬────────────┘
                              ▼
                  ┌────────────────────────┐
                  │      HUD Display       │
                  │    (Pygame Window)     │
                  │  Transparent Overlay   │
                  └────────────────────────┘
```
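The Prompt Builder stage in the diagram merges the output of the speech path with a base instruction before the screenshot and prompt go to Gemini. The sketch below is hypothetical: `build_prompt` and its wording are illustrative and not taken from ghostmentor.py.

```python
from typing import Optional


def build_prompt(base_instruction: str, transcript: Optional[str] = None) -> str:
    """Combine a fixed instruction with an optional voice transcript.

    In silent mode (-s) there is no transcript, so only the instruction is used.
    """
    parts = [base_instruction.strip()]
    if transcript and transcript.strip():
        parts.append(f"The user said: {transcript.strip()}")
    return "\n\n".join(parts)
```

The returned string would be the `prompt` element passed alongside the PNG bytes in `model.generate_content([...], stream=True)`.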
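GhostMentor uses Google's Gemini API for multimodal AI processing:

```python
import os
import cv2
import google.generativeai as genai
import numpy as np
from PIL import ImageGrab

# Assumed setup: key in GEMINI_API_KEY; the model name is illustrative
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")
prompt = "Explain what is shown on this screen."

# Image processing pipeline: PIL captures RGB, cv2.imencode expects BGR
image = ImageGrab.grab()
img_bgr = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)
_, buffer = cv2.imencode(".png", img_bgr)
img_bytes = buffer.tobytes()

# API call with streaming: print each chunk as soon as it arrives
response = model.generate_content(
    [{"mime_type": "image/png", "data": img_bytes}, prompt],
    stream=True,
)
for chunk in response:
    print(chunk.text, end="", flush=True)
```

Efficient speech-to-text processing: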
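```python
import numpy as np
from faster_whisper import WhisperModel

# Initialize model with CPU-friendly optimizations (int8 quantization)
whisper_model = WhisperModel(
    "base",
    device="cpu",
    compute_type="int8"
)

# Audio processing pipeline: `data` is assumed to hold 16 kHz mono int16 PCM
audio_np = np.frombuffer(data, dtype=np.int16).astype(np.float32) / 32768.0
segments, info = whisper_model.transcribe(
    audio_np,
    beam_size=5,
    language="en"
)
# `segments` is a generator: decoding happens while iterating over it
transcript = " ".join(segment.text.strip() for segment in segments)
```

Efficient screen capture with OpenCV and PIL: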
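```python
from PIL import ImageGrab
import cv2
import numpy as np

# Capture the full screen; PIL returns an RGB image
image = ImageGrab.grab()
img_array = np.array(image)
# OpenCV works in BGR channel order, so swap channels before cv2 processing
img_bgr = cv2.cvtColor(img_array, cv2.COLOR_RGB2BGR)
```

For privacy-preserving display technology research: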
```python
import ctypes
import win32gui
import win32con
# Window display configuration
WDA_EXCLUDEFROMCAPTURE = 0x00000011
ctypes.windll.user32.SetWindowDisplayAffinity(hwnd, WDA_EXCLUDEFROMCAPTURE)
win32gui.SetWindowPos(hwnd, win32con.HWND_TOPMOST, 100, 100, 800, 200, 0)
```

- Python 3.8+ (Python 3.10 recommended)
- Windows Operating System (primary platform)
- Google Gemini API Key (available from Google AI Studio)
- Clone the Repository

```bash
git clone https://github.com/maruf009sultan/GhostMentor.git
cd GhostMentor
```

- Create Virtual Environment (Recommended)

```bash
python -m venv venv
# Windows
venv\Scripts\activate
# Linux/Mac
source venv/bin/activate
```

- Install Dependencies

```bash
pip install -r requirements.txt
```

Or install manually:

```bash
pip install numpy opencv-python pillow google-generativeai pygame pyaudio faster-whisper keyboard pywin32
```

- Configure API Key

Open ghostmentor.py and add your Gemini API key:

```python
API_KEY = "your-gemini-api-key-here"
```

Security Note: For production use, consider using environment variables or a secure configuration file instead of hardcoding your API key.
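Following the security note, one way to avoid a hardcoded key is to read it from an environment variable at startup. The variable name `GEMINI_API_KEY` and the helper `load_api_key` are assumptions for illustration; GhostMentor as shipped reads the hardcoded `API_KEY`.

```python
import os


def load_api_key(var_name: str = "GEMINI_API_KEY") -> str:
    """Read the Gemini API key from an environment variable (hypothetical helper).

    Raises RuntimeError with a readable message when the variable is unset
    or blank, instead of failing later inside the API client.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before starting GhostMentor")
    return key


# In ghostmentor.py the hardcoded assignment would become:
# API_KEY = load_api_key()
```

On Windows, the variable can be set for the current PowerShell session with `$env:GEMINI_API_KEY = "your-key"`, or in cmd.exe with `set GEMINI_API_KEY=your-key`.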
- Run GhostMentor

```bash
# Full mode (voice-enabled)
python ghostmentor.py -f
# Silent mode (text-only)
python ghostmentor.py -s
```

```text
GhostMentor/
├── ghostmentor.py              # Main application (voice-enabled)
├── gm_unethical.py             # Stealth research module
├── requirements.txt            # Python dependencies
├── LICENSE.md                  # GhostMentor Shadow License
├── README.md                   # This documentation
└── GhostMentor original.mp4    # Demo video
```
GhostMentor offers two primary operating modes:
Voice-enabled operation with complete functionality:
```bash
python ghostmentor.py -f
```

Features:
- ✅ Screen capture and analysis
- ✅ Voice input processing
- ✅ Real-time HUD display
- ✅ Speech transcription
Best For:
- Accessibility research
- Multimodal interaction studies
- Voice-controlled assistance development
Text-only operation for focused analysis:
```bash
python ghostmentor.py -s
```

Features:
- ✅ Screen capture and analysis
- ✅ Real-time HUD display
- ❌ Voice input disabled
Best For:
- Quiet environments
- Text-based research
- Minimal system resource usage
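The two modes map naturally onto mutually exclusive command-line flags. A sketch with `argparse`, assuming long option names `--full`/`--silent` and a fallback to full mode when no flag is given; neither assumption is confirmed by ghostmentor.py.

```python
import argparse
from typing import List, Optional


def parse_mode(argv: Optional[List[str]] = None) -> str:
    """Return "full" or "silent" from -f/--full or -s/--silent."""
    parser = argparse.ArgumentParser(prog="ghostmentor.py")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-f", "--full", dest="mode", action="store_const",
                       const="full", help="voice-enabled mode")
    group.add_argument("-s", "--silent", dest="mode", action="store_const",
                       const="silent", help="text-only mode (no microphone)")
    # Assumption: fall back to full mode when neither flag is given
    parser.set_defaults(mode="full")
    return parser.parse_args(argv).mode
```

Passing both `-f` and `-s` makes argparse print a usage error and exit, which keeps the two modes from being combined by accident.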
Use GhostMentor as a study companion:
1. Open your study materials (PDFs, websites, etc.)
2. Launch GhostMentor
3. Press Ctrl+H to capture the screen
4. Press Ctrl+Enter to get AI assistance
Test application accessibility features:
1. Launch the target application
2. Run GhostMentor alongside it
3. Analyze how information is presented
4. Document accessibility improvements
Integrate into your research workflow:
1. Configure your research parameters
2. Use GhostMentor for data collection
3. Analyze AI responses for patterns
4. Document findings for publication
| Shortcut | Action | Description |
|---|---|---|
| Ctrl + H | 📸 Screenshot | Capture current screen for analysis |
| Ctrl + Enter | ⚡ Analyze | Send captured content to Gemini API |
| Ctrl + G | Reset | Clear transcript and reset history |
| Alt + F4 | 🚪 Exit | Close GhostMentor immediately |
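Global shortcuts like these can be registered with the keyboard package's `add_hotkey(hotkey, callback)`. A sketch with hypothetical handler names standing in for GhostMentor's real actions; the registrar is a parameter so the wiring can be exercised without hooking the real keyboard. Alt+F4 is left out on the assumption that the operating system's window close handles it.

```python
from typing import Callable, Dict, Optional


def register_hotkeys(
    bindings: Dict[str, Callable[[], None]],
    add_hotkey: Optional[Callable[[str, Callable[[], None]], object]] = None,
) -> None:
    """Register each key combination with its handler (defaults to keyboard.add_hotkey)."""
    if add_hotkey is None:
        import keyboard  # imported lazily; on Linux the package requires root
        add_hotkey = keyboard.add_hotkey
    for combo, handler in bindings.items():
        add_hotkey(combo, handler)


# Hypothetical handlers standing in for the capture, analyze, and reset actions
def capture_screen() -> None:
    print("capture")


def analyze() -> None:
    print("analyze")


def reset_history() -> None:
    print("reset")


BINDINGS = {
    "ctrl+h": capture_screen,
    "ctrl+enter": analyze,
    "ctrl+g": reset_history,
}
```

In the application itself, `register_hotkeys(BINDINGS)` followed by `keyboard.wait()` would keep the listener running.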
GhostMentor is designed with privacy in mind:
| Aspect | Implementation |
|---|---|
| Local Transcription | Audio is transcribed locally; screenshots are captured locally and sent to the Gemini API only when you request analysis |
| No Persistent Storage | Data is not stored permanently |
| API Security | Encrypted communication with Gemini API |
| User Control | User initiates all captures and analysis |
DO:
- ✅ Use for legitimate educational purposes
- ✅ Employ in accessibility research
- ✅ Utilize for software testing and development
- ✅ Follow institutional and organizational policies
- ✅ Obtain proper permissions when required
DON'T:
- ❌ Use to violate the terms of service of any platform
- ❌ Employ for academic dishonesty or cheating
- ❌ Use to bypass security measures without authorization
- ❌ Violate the privacy rights of others
- ❌ Engage in any illegal activities
When using GhostMentor in academic or institutional settings:
- Review Policies: Check your institution's policies on AI assistance tools
- Obtain Approval: Secure necessary approvals from ethics committees
- Document Usage: Maintain records of research usage
- Follow Guidelines: Adhere to field-specific ethical guidelines
| Component | Requirement |
|---|---|
| OS | Windows 10/11 |
| Python | 3.8 or higher |
| RAM | 4 GB minimum |
| Storage | 500 MB free space |
| Network | Internet connection for API |
| Component | Recommendation |
|---|---|
| OS | Windows 11 |
| Python | 3.10+ |
| RAM | 8 GB or more |
| CPU | Multi-core processor |
| Network | Stable broadband connection |
```text
numpy>=1.21.0
opencv-python>=4.5.0
pillow>=8.0.0
google-generativeai>=0.3.0
pygame>=2.0.0
pyaudio>=0.2.11
faster-whisper>=0.9.0
keyboard>=0.13.0
pywin32>=300
```
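Several pip package names differ from the modules they install (opencv-python provides `cv2`, pillow provides `PIL`, pywin32 provides `win32gui` and related modules). A small pre-flight check can report what is missing before launch. The mapping below is written for this requirements list, and `missing_packages` is a hypothetical helper, not part of GhostMentor.

```python
import importlib.util
from typing import Dict, List

# pip package name -> importable module name
REQUIRED_MODULES: Dict[str, str] = {
    "numpy": "numpy",
    "opencv-python": "cv2",
    "pillow": "PIL",
    "google-generativeai": "google.generativeai",
    "pygame": "pygame",
    "pyaudio": "pyaudio",
    "faster-whisper": "faster_whisper",
    "keyboard": "keyboard",
    "pywin32": "win32gui",
}


def missing_packages(required: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return the pip names whose module cannot be found, without importing them."""
    missing = []
    for package, module in required.items():
        try:
            found = importlib.util.find_spec(module) is not None
        except ModuleNotFoundError:
            # find_spec imports parent packages, so a dotted name such as
            # "google.generativeai" raises here when the parent is absent
            found = False
        if not found:
            missing.append(package)
    return missing
```

Calling `print(missing_packages())` inside the activated virtual environment should print an empty list once `pip install -r requirements.txt` has succeeded.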
We welcome contributions from the research and development community!
1. Fork the Repository on GitHub (or with the GitHub CLI)

```bash
gh repo fork maruf009sultan/GhostMentor --clone
```

2. Create a Feature Branch

```bash
git checkout -b feature/your-feature-name
```

3. Make Your Changes
   - Follow Python best practices
   - Add appropriate documentation
   - Include tests where applicable
4. Submit a Pull Request
   - Describe your changes clearly
   - Reference any related issues
   - Ensure CI passes
- Code Style: Follow PEP 8 guidelines
- Documentation: Update README for new features
- 🧪 Testing: Include tests for new functionality
- Security: Report vulnerabilities responsibly
| Area | Needs |
|---|---|
| Documentation | Tutorials, API docs, translations |
| Testing | Unit tests, integration tests |
| Features | Accessibility improvements, UI enhancements |
| Research | Academic papers, use case studies |
GhostMentor is released under the GhostMentor Shadow License (GSL).
See LICENSE.md for the complete license text.
- ✅ Free for educational and research use
- ✅ Open source with attribution requirements
- ✅ Modification allowed with license preservation
- ❌ Commercial use restrictions apply
- ❌ No warranty provided
| Technology | Purpose | Link |
|---|---|---|
| Google Gemini | Multimodal AI processing | AI Studio |
| Faster-Whisper | Speech recognition | GitHub |
| OpenCV | Computer vision | Website |
| Pygame | GUI and display | Website |
| PyAudio | Audio capture | Website |
This project was inspired by the open-source community's ongoing efforts to create accessible, AI-powered educational tools. We thank all contributors and researchers in the field of educational technology and assistive computing.
If you use GhostMentor in your research, please consider citing:
```bibtex
@software{ghostmentor2024,
  title = {GhostMentor: An AI-Powered Accessibility and Research Assistant},
  author = {maruf009sultan},
  year = {2024},
  url = {https://github.com/maruf009sultan/GhostMentor},
  note = {Educational and research tool for accessibility testing and AI assistance}
}
```

If GhostMentor has been helpful for your research or educational projects, please consider:
- ⭐ Starring this repository
- 🍴 Forking and contributing
- 📢 Sharing with the research community
- Citing in your publications

Built with ❤️ for the Research & Education Community
"The code doesn't lie. Neither does GhostMentor."