maruf009sultan/GhostMentor

👻 GhostMentor

The AI-Powered Accessibility Assistant

Unseen. Unheard. Unstoppable.

Your shadow ally in digital accessibility and AI research.




🎯 Overview

GhostMentor is an AI-powered accessibility and research tool for educational environments, accessibility testing, and assistive technology development. It combines the Google Gemini API, Faster-Whisper, OpenCV, and Pygame to provide real-time assistance based on screen content and speech input.

🎯 Core Purpose

GhostMentor serves multiple legitimate purposes in the academic and research community:

| Purpose | Description |
|---|---|
| 🧪 Accessibility Research | Investigating screen capture exclusion technologies for assistive applications |
| 🎓 Educational Technology | Developing AI-powered tutoring and learning assistance systems |
| 🔬 Human-Computer Interaction | Researching novel HUD interfaces and overlay technologies |
| ♿ Assistive Technology | Creating tools for users with visual or cognitive impairments |
| 📊 AI Research | Exploring multimodal AI integration (vision + speech + text) |

⚠️ Important Notice: This tool is intended exclusively for educational, research, accessibility testing, and ethical development purposes. Users are responsible for ensuring compliance with all applicable laws, regulations, and institutional policies.


✨ Features

🖥️ Real-Time Screen Analysis

GhostMentor uses the Google Gemini API to analyze captured screen content and stream back a response:

  • Intelligent Image Recognition: Captures and processes screen content using advanced computer vision
  • Context-Aware Responses: AI understands the context of your work and provides relevant assistance
  • Streaming Output: Real-time response streaming for immediate feedback

🎤 Advanced Speech Recognition

Powered by Faster-Whisper for accurate and efficient speech transcription:

  • Low-Latency Processing: Optimized for real-time transcription with minimal delay
  • Multi-Language Support: Supports multiple languages with high accuracy
  • Noise Resilience: Advanced algorithms handle noisy environments effectively
  • Beam Search Decoding: Ensures accurate transcription even with complex speech patterns

🎨 Transparent HUD Interface

A modern, non-intrusive interface that integrates seamlessly with your workflow:

  • Always-On-Top Design: Stays visible without disrupting your primary tasks
  • Scrollable Content: Review extensive AI responses with ease
  • Minimal Visual Footprint: Designed to be helpful without being distracting
  • Customizable Appearance: Adapt the interface to your preferences
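As a rough illustration of the scrollable content behaviour, the sketch below wraps a response to the HUD width and picks the rows that are currently in view. The function name `visible_lines` and its parameters are hypothetical and are not taken from `ghostmentor.py`; the actual HUD may compute wrapping in pixels rather than characters.

```python
import textwrap


def visible_lines(text, width_chars, max_rows, scroll_offset):
    """Wrap `text` to the HUD width and return the rows currently in view."""
    wrapped = []
    for paragraph in text.splitlines():
        # textwrap.wrap returns [] for an empty paragraph; keep it as a blank row
        wrapped.extend(textwrap.wrap(paragraph, width=width_chars) or [""])
    # Clamp the offset so scrolling never runs past the last full page
    last_start = max(0, len(wrapped) - max_rows)
    start = min(max(0, scroll_offset), last_start)
    return wrapped[start:start + max_rows]


# Three wrapped rows, a two-row window, scrolled down by one row
rows = visible_lines("alpha beta gamma delta epsilon",
                     width_chars=11, max_rows=2, scroll_offset=1)
```

Clamping the offset keeps the view stable when the user scrolls past either end of the response.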

🔒 Privacy-Preserving Technology

GhostMentor incorporates privacy-focused design principles:

  • Local Processing: Speech transcription and image encoding run on your machine; only the encoded screenshot and prompt are sent to the Gemini API
  • No Data Storage: Transient processing without permanent data retention
  • API Security: Secure communication with AI services via encrypted connections

🎓 Academic & Research Applications

📚 Educational Technology Research

GhostMentor provides an excellent platform for researching:

| Research Area | Application |
|---|---|
| Adaptive Learning Systems | Study how AI assistance affects learning outcomes |
| Real-Time Tutoring | Develop and test AI-powered tutoring methodologies |
| Cognitive Load Theory | Research optimal information presentation in HUDs |
| Multimodal Interaction | Explore combining voice, vision, and text inputs |

♿ Accessibility Testing & Development

Essential tool for accessibility researchers and developers:

  • Screen Reader Compatibility Testing: Test how applications behave with various assistive technologies
  • Visual Impairment Simulation: Understand user experiences with limited visual access
  • Assistive Technology Development: Create tools for users with disabilities
  • WCAG Compliance Research: Investigate accessibility standard implementation

🔬 Human-Computer Interaction Studies

Perfect for HCI researchers investigating:

  • Attention Management: How overlay interfaces affect user focus
  • Information Density: Optimal information presentation in limited screen space
  • Multimodal Interfaces: Combining visual and auditory feedback channels
  • Context-Aware Computing: Systems that adapt to user context and needs

🧪 Software Testing & QA

Valuable for software testing professionals:

  • UI/UX Testing: Automated interface analysis and feedback
  • Accessibility Auditing: Identify accessibility issues in applications
  • Cross-Platform Compatibility: Test behavior across different environments
  • Documentation Generation: Automated documentation from visual analysis

🔬 Technical Architecture

System Architecture Diagram

┌─────────────────────────────────────────────────────────────────┐
│                    GhostMentor Architecture                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
│  │   Screen     │    │    Audio     │    │   Keyboard   │       │
│  │   Capture    │    │   Capture    │    │   Handler    │       │
│  │  (OpenCV)    │    │  (PyAudio)   │    │  (Keyboard)  │       │
│  └──────┬───────┘    └──────┬───────┘    └──────┬───────┘       │
│         │                   │                   │               │
│         ▼                   ▼                   ▼               │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                     Processing Core                      │   │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │   │
│  │  │   Image     │  │   Speech    │  │   Event     │       │   │
│  │  │  Encoding   │  │ Transcribe  │  │  Handling   │       │   │
│  │  │   (PNG)     │  │ (Whisper)   │  │  (Async)    │       │   │
│  │  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘       │   │
│  │         │                │                │              │   │
│  │         └────────────────┼────────────────┘              │   │
│  │                          ▼                               │   │
│  │               ┌─────────────────────┐                    │   │
│  │               │   Prompt Builder    │                    │   │
│  │               └──────────┬──────────┘                    │   │
│  └──────────────────────────┬───────────────────────────────┘   │
│                             │                                   │
│                             ▼                                   │
│                ┌─────────────────────────┐                      │
│                │     Gemini API Call     │                      │
│                │   (Streaming Response)  │                      │
│                └────────────┬────────────┘                      │
│                             │                                   │
│                             ▼                                   │
│                ┌─────────────────────────┐                      │
│                │       HUD Display       │                      │
│                │     (Pygame Window)     │                      │
│                │   Transparent Overlay   │                      │
│                └─────────────────────────┘                      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
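The "Prompt Builder" stage in the diagram merges the encoded screenshot with any transcribed speech before the Gemini call. A minimal sketch of that step might look like the following; the function name `build_request`, the default instruction text, and the "Spoken context" label are assumptions made for illustration, and only the `[{"mime_type": ..., "data": ...}, prompt]` payload shape comes from the integration code in this README.

```python
def build_request(img_bytes, transcript,
                  instruction="Describe what is on screen and help with it."):
    """Combine a PNG screenshot and an optional transcript into one request payload."""
    prompt = instruction
    spoken = transcript.strip()
    if spoken:
        # Only mention speech when something was actually transcribed
        prompt += "\n\nSpoken context:\n" + spoken
    # Same list shape that is passed to model.generate_content(...)
    return [{"mime_type": "image/png", "data": img_bytes}, prompt]


payload = build_request(b"\x89PNG...", "  what does this error mean  ")
```

Keeping this step as a pure function makes it easy to test without a screen, a microphone, or network access.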

🔧 Core Technologies

Gemini API Integration

GhostMentor uses Google's Gemini API for multimodal AI processing:

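import cv2
import numpy as np
import google.generativeai as genai

# Image processing pipeline.
# `image` is the captured PIL screenshot. PIL arrays are RGB, while
# cv2.imencode expects BGR, so swap the channel order before encoding.
img_array = np.array(image)
img_bgr = cv2.cvtColor(img_array, cv2.COLOR_RGB2BGR)
_, buffer = cv2.imencode(".png", img_bgr)
img_bytes = buffer.tobytes()

# API call with streaming.
# `model` is a configured genai.GenerativeModel; `prompt` is the text prompt.
response = model.generate_content(
    [{"mime_type": "image/png", "data": img_bytes}, prompt],
    stream=True
)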
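With `stream=True`, the response is consumed by iterating over it and reading each chunk's `text`. The sketch below shows one way to accumulate the streamed text and push each update to a display callback. `collect_stream` and `on_update` are hypothetical names, and the fake chunks stand in for real API chunks so the example runs offline. Reading `text` on a chunk without text parts may raise `ValueError`, so the sketch guards against that.

```python
from types import SimpleNamespace


def collect_stream(chunks, on_update=print):
    """Accumulate streamed text, calling `on_update` with the text so far."""
    text = ""
    for chunk in chunks:
        try:
            piece = chunk.text
        except (AttributeError, ValueError):
            # Chunk carried no readable text; skip it
            continue
        if piece:
            text += piece
            on_update(text)
    return text


# Stand-in chunks; with the real API, pass the streaming `response` object instead
fake_chunks = [SimpleNamespace(text="Hello"), SimpleNamespace(text=", world")]
updates = []
final_text = collect_stream(fake_chunks, on_update=updates.append)
```

Passing the HUD's redraw function as `on_update` would let the overlay refresh as each chunk arrives.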

Faster-Whisper Speech Recognition

Efficient speech-to-text processing:

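import numpy as np
from faster_whisper import WhisperModel

# Initialize model with optimizations (int8 quantization on CPU)
whisper_model = WhisperModel(
    "base",
    device="cpu",
    compute_type="int8"
)

# Audio processing pipeline.
# `data` is one raw chunk of 16-bit PCM bytes; scale it to float32 in [-1.0, 1.0).
audio_np = np.frombuffer(data, dtype=np.int16).astype(np.float32) / 32768.0
# `full_audio` is assumed to be the concatenation of the chunks captured so far.
segments, info = whisper_model.transcribe(
    full_audio,
    beam_size=5,
    language="en"
)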
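`transcribe` returns `segments` as a lazy iterable, so transcription only runs as the segments are consumed. A small sketch of turning them into a single transcript string is shown below; `segments_to_text` is a hypothetical helper, and the fake segments stand in for real Faster-Whisper output so the example runs without a model.

```python
from types import SimpleNamespace


def segments_to_text(segments):
    """Join the text of each transcribed segment into one transcript string."""
    parts = []
    for seg in segments:  # iterating is what drives the actual decoding
        cleaned = seg.text.strip()
        if cleaned:
            parts.append(cleaned)
    return " ".join(parts)


# Stand-in segments; real ones also carry start/end timestamps
fake_segments = iter([
    SimpleNamespace(text=" What does this"),
    SimpleNamespace(text="   "),
    SimpleNamespace(text=" function return?"),
])
transcript = segments_to_text(fake_segments)
```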

Screen Capture Technology

Efficient screen capture with OpenCV and PIL:

from PIL import ImageGrab
import cv2
import numpy as np

# Capture the full screen as a PIL image (RGB channel order)
image = ImageGrab.grab()
img_array = np.array(image)
# OpenCV functions expect BGR, so swap the channel order
img_bgr = cv2.cvtColor(img_array, cv2.COLOR_RGB2BGR)

Window Display Affinity

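For privacy-preserving display technology research, the HUD window is set always-on-top and given the WDA_EXCLUDEFROMCAPTURE display affinity, which excludes its contents from screen captures: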

import ctypes
import win32gui
import win32con

# Window display configuration
WDA_EXCLUDEFROMCAPTURE = 0x00000011
ctypes.windll.user32.SetWindowDisplayAffinity(hwnd, WDA_EXCLUDEFROMCAPTURE)
win32gui.SetWindowPos(hwnd, win32con.HWND_TOPMOST, 100, 100, 800, 200, 0)

📦 Installation

Prerequisites

  • Python 3.8+ (Python 3.10 recommended)
  • Windows Operating System (primary platform)
  • Google Gemini API Key (available from Google AI Studio)

🚀 Quick Start

  1. Clone the Repository
git clone https://github.com/maruf009sultan/GhostMentor.git
cd GhostMentor
  2. Create Virtual Environment (Recommended)
python -m venv venv
# Windows
venv\Scripts\activate
# Linux/Mac
source venv/bin/activate
  3. Install Dependencies
pip install -r requirements.txt

Or install manually:

pip install numpy opencv-python pillow google-generativeai pygame pyaudio faster-whisper keyboard pywin32
  4. Configure API Key

Open ghostmentor.py and add your Gemini API key:

API_KEY = "your-gemini-api-key-here"

🔐 Security Note: For production use, consider using environment variables or a secure configuration file instead of hardcoding your API key.

  5. Run GhostMentor
# Full mode (voice-enabled)
python ghostmentor.py -f

# Silent mode (text-only)
python ghostmentor.py -s
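The security note in the API key step suggests environment variables as an alternative to hardcoding. A minimal sketch of that approach follows; the variable name `GEMINI_API_KEY` and the helper `load_api_key` are assumptions, not names used by `ghostmentor.py`.

```python
import os


def load_api_key(env_var="GEMINI_API_KEY"):
    """Read the Gemini API key from the environment instead of the source file."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before starting GhostMentor"
        )
    return key


# Possible usage in place of the hardcoded constant:
#   API_KEY = load_api_key()
#   genai.configure(api_key=API_KEY)
```

On Windows the variable can be set for the current shell session with `set GEMINI_API_KEY=...` before launching the script.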

📁 Project Structure

GhostMentor/
├── 📄 ghostmentor.py          # Main application (voice-enabled)
├── 📄 gm_unethical.py         # Stealth research module
├── 📄 requirements.txt        # Python dependencies
├── 📄 LICENSE.md              # GhostMentor Shadow License
├── 📄 README.md               # This documentation
└── 🎬 GhostMentor original.mp4  # Demo video

🚀 Usage Guide

Operating Modes

GhostMentor offers two primary operating modes:

🔊 Full Mode (-f)

Voice-enabled operation with complete functionality:

python ghostmentor.py -f

Features:

  • ✅ Screen capture and analysis
  • ✅ Voice input processing
  • ✅ Real-time HUD display
  • ✅ Speech transcription

Best For:

  • Accessibility research
  • Multimodal interaction studies
  • Voice-controlled assistance development

🔇 Silent Mode (-s)

Text-only operation for focused analysis:

python ghostmentor.py -s

Features:

  • ✅ Screen capture and analysis
  • ✅ Real-time HUD display
  • ❌ Voice input disabled

Best For:

  • Quiet environments
  • Text-based research
  • Minimal system resource usage
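The two modes are selected with the `-f` and `-s` flags. One way such flags could be parsed is sketched below with `argparse`; the long option names `--full` and `--silent` and the function `parse_mode` are hypothetical, and the actual script may handle its arguments differently.

```python
import argparse


def parse_mode(argv=None):
    """Return "full" or "silent" depending on which mode flag was given."""
    parser = argparse.ArgumentParser(prog="ghostmentor.py")
    # Exactly one of the two modes must be chosen
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-f", "--full", action="store_true",
                       help="voice-enabled mode")
    group.add_argument("-s", "--silent", action="store_true",
                       help="text-only mode, no audio capture")
    args = parser.parse_args(argv)
    return "full" if args.full else "silent"


mode = parse_mode(["-s"])
```

Making the group required means running the script without a flag prints a usage message instead of silently picking a mode.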

🎯 Common Use Cases

1. Educational Assistance

Use GhostMentor as a study companion:

1. Open your study materials (PDFs, websites, etc.)
2. Launch GhostMentor
3. Press Ctrl+H to capture the screen
4. Press Ctrl+Enter to get AI assistance

2. Accessibility Testing

Test application accessibility features:

1. Launch the target application
2. Run GhostMentor alongside it
3. Analyze how information is presented
4. Document accessibility improvements

3. Research & Development

Integrate into your research workflow:

1. Configure your research parameters
2. Use GhostMentor for data collection
3. Analyze AI responses for patterns
4. Document findings for publication

⌨️ Keyboard Shortcuts

| Shortcut | Action | Description |
|---|---|---|
| Ctrl + H | 📸 Screenshot | Capture current screen for analysis |
| Ctrl + Enter | ⚡ Analyze | Send captured content to Gemini API |
| Ctrl + G | 🔄 Reset | Clear transcript and reset history |
| Alt + F4 | 🚪 Exit | Close GhostMentor immediately |
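The capture, analyze, and reset shortcuts amount to a small state machine: a screenshot must exist before it can be analyzed, and reset clears everything. The sketch below models that flow and shows how the handlers could be bound with the `keyboard` package's `add_hotkey`. The `Session` class, its method names, and the injected `grab`/`send` callables are hypothetical and only illustrate the flow.

```python
class Session:
    """Holds the state that the keyboard shortcuts act on."""

    def __init__(self):
        self.screenshot = None
        self.transcript = ""
        self.history = []

    def capture(self, grab=lambda: b"<png bytes>"):
        # Ctrl+H: store the latest screenshot
        self.screenshot = grab()

    def analyze(self, send=lambda shot, transcript: "response"):
        # Ctrl+Enter: nothing to send until a capture exists
        if self.screenshot is None:
            return None
        reply = send(self.screenshot, self.transcript)
        self.history.append(reply)
        return reply

    def reset(self):
        # Ctrl+G: clear transcript and history
        self.screenshot = None
        self.transcript = ""
        self.history.clear()


def bind_hotkeys(session):
    """Register the shortcuts; needs the third-party `keyboard` package."""
    import keyboard  # imported lazily so the class above works without it
    keyboard.add_hotkey("ctrl+h", session.capture)
    keyboard.add_hotkey("ctrl+enter", session.analyze)
    keyboard.add_hotkey("ctrl+g", session.reset)
```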

πŸ›‘οΈ Security & Privacy Considerations

πŸ” Data Handling

GhostMentor is designed with privacy in mind:

| Aspect | Implementation |
|---|---|
| Local Processing | Screen capture and speech transcription run locally; captured content is sent to the Gemini API for analysis |
| No Persistent Storage | Data is not stored permanently |
| API Security | Encrypted communication with Gemini API |
| User Control | User initiates all captures and analysis |

⚠️ Responsible Use Guidelines

DO:

  • ✅ Use for legitimate educational purposes
  • ✅ Employ in accessibility research
  • ✅ Utilize for software testing and development
  • ✅ Follow institutional and organizational policies
  • ✅ Obtain proper permissions when required

DON'T:

  • ❌ Use to violate terms of service of any platform
  • ❌ Employ for academic dishonesty or cheating
  • ❌ Use to bypass security measures without authorization
  • ❌ Violate privacy rights of others
  • ❌ Engage in any illegal activities

🏛️ Institutional Compliance

When using GhostMentor in academic or institutional settings:

  1. Review Policies: Check your institution's policies on AI assistance tools
  2. Obtain Approval: Secure necessary approvals from ethics committees
  3. Document Usage: Maintain records of research usage
  4. Follow Guidelines: Adhere to field-specific ethical guidelines

📊 System Requirements

Minimum Requirements

| Component | Requirement |
|---|---|
| OS | Windows 10/11 |
| Python | 3.8 or higher |
| RAM | 4 GB minimum |
| Storage | 500 MB free space |
| Network | Internet connection for API |

Recommended Specifications

| Component | Recommendation |
|---|---|
| OS | Windows 11 |
| Python | 3.10+ |
| RAM | 8 GB or more |
| CPU | Multi-core processor |
| Network | Stable broadband connection |

Dependency Versions

numpy>=1.21.0
opencv-python>=4.5.0
pillow>=8.0.0
google-generativeai>=0.1.0
pygame>=2.0.0
pyaudio>=0.2.11
faster-whisper>=0.9.0
keyboard>=0.13.0
pywin32>=300
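To check an environment against these minimums, a short script along the following lines could be used. It is only a sketch: `parse_requirement` and `check_requirements` are hypothetical helpers, only the `name>=version` form used above is handled, and the numeric comparison is deliberately simple (the `packaging` library compares versions more robustly).

```python
from importlib import metadata


def parse_requirement(line):
    """Split "name>=1.2.3" into ("name", (1, 2, 3))."""
    name, _, minimum = line.strip().partition(">=")
    return name, tuple(int(p) for p in minimum.split(".") if p.isdigit())


def check_requirements(lines):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for line in lines:
        name, minimum = parse_requirement(line)
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        # Compare only the purely numeric components of the installed version
        numeric = tuple(int(p) for p in installed.split(".") if p.isdigit())
        if numeric < minimum:
            problems.append(f"{name}: {installed} is older than required")
    return problems
```

Calling `check_requirements(open("requirements.txt").read().splitlines())` would report missing or outdated packages before the first run.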

🤝 Contributing

We welcome contributions from the research and development community!

How to Contribute

  1. Fork the Repository

    Fork the repository on GitHub, or use the GitHub CLI (git itself has no fork command):

    gh repo fork maruf009sultan/GhostMentor --clone
  2. Create a Feature Branch

    git checkout -b feature/your-feature-name
  3. Make Your Changes

    • Follow Python best practices
    • Add appropriate documentation
    • Include tests where applicable
  4. Submit a Pull Request

    • Describe your changes clearly
    • Reference any related issues
    • Ensure CI passes

Contribution Guidelines

  • 📝 Code Style: Follow PEP 8 guidelines
  • 📖 Documentation: Update README for new features
  • 🧪 Testing: Include tests for new functionality
  • 🔒 Security: Report vulnerabilities responsibly

Areas for Contribution

| Area | Needs |
|---|---|
| Documentation | Tutorials, API docs, translations |
| Testing | Unit tests, integration tests |
| Features | Accessibility improvements, UI enhancements |
| Research | Academic papers, use case studies |

📜 License

GhostMentor is released under the GhostMentor Shadow License (GSL).

See LICENSE.md for the complete license text.

License Summary

  • ✅ Free for educational and research use
  • ✅ Open source with attribution requirements
  • ✅ Modification allowed with license preservation
  • ❌ Commercial use restrictions apply
  • ❌ No warranty provided

🙏 Acknowledgments

Technologies & Libraries

| Technology | Purpose | Link |
|---|---|---|
| Google Gemini | Multimodal AI processing | AI Studio |
| Faster-Whisper | Speech recognition | GitHub |
| OpenCV | Computer vision | Website |
| Pygame | GUI and display | Website |
| PyAudio | Audio capture | Website |

Inspired By

This project was inspired by the open-source community's ongoing efforts to create accessible, AI-powered educational tools. We thank all contributors and researchers in the field of educational technology and assistive computing.

Academic References

If you use GhostMentor in your research, please consider citing:

@software{ghostmentor2024,
  title = {GhostMentor: An AI-Powered Accessibility and Research Assistant},
  author = {maruf009sultan},
  year = {2024},
  url = {https://github.com/maruf009sultan/GhostMentor},
  note = {Educational and research tool for accessibility testing and AI assistance}
}

📞 Support & Community

GitHub Issues Discussions


⭐ Show Your Support

If GhostMentor has been helpful for your research or educational projects, please consider:

  • ⭐ Starring this repository
  • 🍴 Forking and contributing
  • 📢 Sharing with the research community
  • 📝 Citing in your publications

Built with ❤️ for the Research & Education Community

"The code doesn't lie. Neither does GhostMentor."



About

πŸ•ΆοΈ GhostMentor – An AI-powered research and educational tool for accessibility testing, HCI studies, and assistive technology development. πŸ”₯ Open-source platform exploring multimodal AI integration with real-time screen analysis and voice recognition. 🎯 Perfect as an AI study companion, accessibility research tool, or educational assistant.
