A multimodal English ↔ Twi model trained with a LeJEPA-style objective (predictive MSE + SIGReg on encoder pools).
Learns a shared latent space across text and audio in both languages by training a predictor to anticipate target representations from context — no reconstruction loss, no cascaded pipeline.
Three training objectives:
- A — Audio → Text (both languages)
- B — Text → Text (translation)
- C — Text → Audio (both languages)
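All three objectives share the same predictive setup; a minimal sketch, assuming toy linear modules in place of the shared transformer trunk (shapes are illustrative only):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins for the shared encoder and predictor; the real model
# uses a shared transformer trunk, so these shapes are illustrative.
encoder = torch.nn.Linear(32, 16)    # maps a pooled input view into the shared latent space
predictor = torch.nn.Linear(16, 16)  # anticipates target latents from context latents

ctx = torch.randn(4, 32)  # context view, e.g. pooled text features
tgt = torch.randn(4, 32)  # target view, e.g. pooled audio features

pred = predictor(encoder(ctx))
with torch.no_grad():      # no reconstruction loss: the target is an encoded
    target = encoder(tgt)  # representation, never raw audio or text

# Predictive MSE on L2-normalised pooled vectors
loss = F.mse_loss(F.normalize(pred, dim=-1), F.normalize(target, dim=-1))
loss.backward()
```

The stop-gradient on the target side is what distinguishes this from an autoencoding objective: the predictor chases representations, not inputs.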
| File | Purpose |
|---|---|
| `model.py` | `MMT_JEPA` (shared encoder + predictor) |
| `sigreg.py` | SIGReg loss (LeJEPA) |
| `dataset.py` | `ObjA`, `ObjB`, `ObjC` dataset classes |
| `tokenizer.py` | Trains a joint BPE tokenizer on all text data |
| `train.py` | SSL pretraining (all objectives) |
| `train_decoder.py` | Decoder fine-tuning on a frozen or tunable JEPA |
Install dependencies:

```
pip install torch torchaudio soundfile sentencepiece datasets
```

1. Train the tokenizer:

```
python tokenizer.py
# outputs: tokenizer.model, tokenizer.vocab
```

2. Train the model:

```
python train.py
```

Checkpoints are saved to `checkpoints/epoch{N}.pt` after each epoch.
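A checkpoint under the `checkpoints/epoch{N}.pt` pattern can be written and inspected as below; note the payload keys (`"epoch"`, `"model"`) are an assumption for illustration, not the actual schema used by `train.py`:

```python
import os
import torch

os.makedirs("checkpoints", exist_ok=True)

# Simulate a saved checkpoint; the keys here are hypothetical.
torch.save({"epoch": 1, "model": {}}, "checkpoints/epoch1.pt")

# Load on CPU regardless of the device it was trained on
ckpt = torch.load("checkpoints/epoch1.pt", map_location="cpu")
print(ckpt["epoch"])
```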
| Objective | Dataset |
|---|---|
| A + C (English audio) | LibriSpeech train-clean-100 |
| A + C (Twi audio) | twi-speech-text-multispeaker-16k |
| B (translation) | twi-english-paragraph-dataset_news · english-twi-sentences-non-nouns · english-twi-nouns-v2 |
All datasets load automatically via HuggingFace on first run.
Edit ModelConfig in model.py to change capacity:
```python
d_model = 512         # embedding dimension
trunk_layers = 6      # shared transformer depth
vocab_size = 16_000
n_mels = 80
sample_rate = 16_000
sigreg_lambda = 0.02  # LeJEPA trade-off
```

Use `TinyMMT_JEPAConfig` in `config.py` for training runs.

Notes:
- Pooled predictions and targets are L2-normalised before the MSE; SIGReg runs on the raw pooled ctx/tgt stacks.
- Loss: `(1 - λ) · MSE + λ · SIGReg`, with `λ = sigreg_lambda`.
- Possible `COLLAPSE` log when the embedding `std` is tiny or cosine similarity is near 1.
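The loss combination and the collapse heuristics can be sketched as follows. This is a stand-alone sketch: the SIGReg term is stubbed with a simple variance penalty purely to show where it enters the loss (the actual LeJEPA statistic is different), and the collapse thresholds are illustrative:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
sigreg_lambda = 0.02

pred = torch.randn(8, 16)    # pooled predictions
target = torch.randn(8, 16)  # pooled targets

# MSE on L2-normalised pooled vectors
mse = F.mse_loss(F.normalize(pred, dim=-1), F.normalize(target, dim=-1))

# Stand-in for SIGReg on the raw pooled stacks: a variance penalty,
# NOT the real LeJEPA statistic, used only to place the term.
raw = torch.cat([pred, target], dim=0)
sigreg = (raw.var(dim=0) - 1.0).pow(2).mean()

loss = (1 - sigreg_lambda) * mse + sigreg_lambda * sigreg

# Collapse heuristics mirroring the COLLAPSE log: tiny per-dimension std,
# or mean pairwise cosine similarity near 1 (thresholds are hypothetical).
std = raw.std(dim=0).mean()
normed = F.normalize(raw, dim=-1)
cos = (normed @ normed.T).mean()
collapsed = bool(std < 1e-4 or cos > 0.99)
```

With random (non-collapsed) embeddings, `std` sits near 1 and the mean cosine similarity stays well below the threshold, so no collapse is flagged.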