# Testing the McGurk Effect in Multimodal Machine Learning Models
## Overview

This project investigates the susceptibility of multimodal machine learning models to sensory conflicts using the McGurk Effect, an audio-visual illusion in which mismatched auditory and visual cues alter perception. By building congruent, mismatched, and adversarial datasets from the GRID audiovisual corpus, we evaluate three architectures: a Bimodal Autoencoder, a Multimodal Transformer, and an Audio-Visual Convolutional Neural Network (AVCNN). The results reveal significant vulnerabilities in multimodal systems, underscoring the need for more robust models in real-world applications.
## Objectives

- Simulate the McGurk Effect by creating mismatched audio-visual datasets (a pairing sketch follows this list).
- Evaluate multimodal models' performance on congruent and incongruent input pairs.
- Identify model architectures most resilient to sensory conflicts.
- Provide insights into improving robustness in multimodal systems.
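As a concrete illustration of the first objective, the sketch below pairs each audio track with either its own video (congruent) or the video of another utterance (mismatched). The `build_pairs` helper and the `(audio_path, video_path, phoneme_label)` sample layout are illustrative assumptions, not the repository's actual API.

```python
import random

def build_pairs(samples, mismatch_ratio=0.5, seed=0):
    """Pair each audio track with its own video (congruent) or with
    the video of a randomly chosen other utterance (mismatched).

    `samples` is a list of (audio_path, video_path, phoneme_label)
    tuples; this layout is a placeholder, not the project's real API.
    """
    rng = random.Random(seed)
    pairs = []
    for audio, video, label in samples:
        if rng.random() < mismatch_ratio:
            _, other_video, other_label = rng.choice(samples)
            pairs.append({"audio": audio, "video": other_video,
                          "audio_label": label, "video_label": other_label,
                          # A random draw can still land on the same phoneme.
                          "congruent": label == other_label})
        else:
            pairs.append({"audio": audio, "video": video,
                          "audio_label": label, "video_label": label,
                          "congruent": True})
    return pairs
```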
## Methodology

### Dataset Preprocessing
- Audio: Converted to Mel spectrograms (64 or 40 bands), normalized, and padded to consistent lengths.
- Video: Extracted frames resized to 112x112 pixels, with an additional pipeline for mouth regions resized to 60x80 pixels.
- Alignment: Paired audio and video features by phoneme timestamps (see the sketch below).
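A minimal sketch of this pipeline, assuming `librosa` for audio and OpenCV for video; the 16 kHz sample rate, the `max_frames` padding length, and the function names are assumptions rather than the repository's exact code.

```python
import cv2
import librosa
import numpy as np

def audio_features(path, n_mels=64, max_frames=200):
    """Log-mel spectrogram, normalized and padded to a fixed length."""
    y, sr = librosa.load(path, sr=16000)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    logmel = librosa.power_to_db(mel)
    logmel = (logmel - logmel.mean()) / (logmel.std() + 1e-8)
    # Pad (with zeros) or truncate the time axis to a fixed length.
    if logmel.shape[1] < max_frames:
        logmel = np.pad(logmel, ((0, 0), (0, max_frames - logmel.shape[1])))
    return logmel[:, :max_frames]

def video_frames(path, size=(112, 112)):
    """Read a clip and resize every frame to `size` (width, height).

    The mouth-region pipeline would additionally crop a lip ROI and
    resize it to 60x80 before stacking; that step is omitted here.
    """
    cap = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.resize(frame, size))
    cap.release()
    return np.stack(frames) if frames else np.empty((0, size[1], size[0], 3))
```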
### Model Architectures Tested
- Bimodal Autoencoder: Encodes shared latent features of audio and video (a minimal sketch appears after this list).
- Multimodal Transformer: Utilizes self-attention and cross-modal attention for phoneme classification.
- AVCNN: Tests robustness using adversarial perturbations.
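To make the shared-latent idea concrete, here is a minimal PyTorch sketch of a bimodal autoencoder; the layer widths and flattened input dimensions are placeholders, not the trained model's configuration.

```python
import torch
import torch.nn as nn

class BimodalAutoencoder(nn.Module):
    """Encode each modality separately, fuse into one shared latent
    vector, then reconstruct both modalities from that vector."""

    def __init__(self, audio_dim=64 * 200, video_dim=112 * 112, latent_dim=256):
        super().__init__()
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, 512), nn.ReLU())
        self.video_enc = nn.Sequential(nn.Linear(video_dim, 512), nn.ReLU())
        self.fuse = nn.Linear(512 + 512, latent_dim)  # shared latent space
        self.audio_dec = nn.Linear(latent_dim, audio_dim)
        self.video_dec = nn.Linear(latent_dim, video_dim)

    def forward(self, audio, video):
        z = self.fuse(torch.cat([self.audio_enc(audio),
                                 self.video_enc(video)], dim=-1))
        return self.audio_dec(z), self.video_dec(z), z

# Reconstruction loss over both modalities drives the shared encoding.
model = BimodalAutoencoder()
audio = torch.randn(8, 64 * 200)   # flattened log-mel inputs (placeholder)
video = torch.randn(8, 112 * 112)  # flattened grayscale frames (placeholder)
a_hat, v_hat, _ = model(audio, video)
loss = nn.functional.mse_loss(a_hat, audio) + nn.functional.mse_loss(v_hat, video)
```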
## Results

- Bimodal Autoencoder: Effectively captures shared features but struggles with mismatched pairs.
- Multimodal Transformer: Achieved 88.78% training accuracy and 85.75% validation accuracy, and produced human-like predictions in 30% of cases.
- AVCNN: Demonstrated high susceptibility to adversarial mismatches, with 80% of predictions fooled (a metric sketch follows this list).
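Rates like these can be computed along the following lines; the exact definitions of "fooled" and "human-like" below are illustrative assumptions, reusing the pair dictionaries from the pairing sketch above.

```python
def mcgurk_metrics(predictions, pairs):
    """Summarize how predictions shift on mismatched (incongruent) pairs.

    `predictions[i]` is the model's phoneme label for `pairs[i]`, where
    `pairs` comes from `build_pairs` above; definitions are illustrative.
    """
    incongruent = [(p, pred) for p, pred in zip(pairs, predictions)
                   if not p["congruent"]]
    if not incongruent:
        return {"fooled_rate": 0.0, "visual_capture_rate": 0.0}
    # "Fooled": the prediction no longer matches the true (audio) label.
    fooled = sum(pred != p["audio_label"] for p, pred in incongruent)
    # "Visual capture": the prediction follows the mismatched video label,
    # loosely analogous to a human McGurk-style response.
    visual = sum(pred == p["video_label"] for p, pred in incongruent)
    return {"fooled_rate": fooled / len(incongruent),
            "visual_capture_rate": visual / len(incongruent)}
```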
## Future Work

- Extend the analysis to include additional modalities.
- Test robustness across larger datasets and diverse real-world scenarios.
- Investigate whether increased data volume enhances or compromises model robustness.
## Author

- Satrajit Ghosh
- ECE, Rutgers New Brunswick
- Email: sg2231@rutgers.edu
## References

- GRID Audiovisual Corpus
- Related research papers cited in the methodology and results sections.