MLA-CamemBERT

MLA-CamemBERT is a project aimed at reproducing and adapting the CamemBERT model, a French-optimized version of RoBERTa.

📚 Description

This project provides a complete pipeline to:

Load and preprocess French-language datasets.
Adapt a multilingual model to the French language.
Fine-tune the model on various NLP tasks, including:
- POS Tagging (Part-of-Speech Tagging)
- Dependency Parsing
- Natural Language Inference (NLI)
- Named Entity Recognition (NER)

The model is designed to be efficient to train and easily integratable into NLP applications.

🛠️ Installation

Clone the repository:

git clone https://github.com/salhiraid/MLA-CamemBERT.git
cd MLA-CamemBERT

Create a virtual environment and install dependencies:

Copy code
python -m venv env
source env/bin/activate  # On Windows, use `env\Scripts\activate`
pip install -r requirements.txt

🚀 Usage Data Preprocessing Use the datasets.py script to handle loading and preparing data for training. Ensure that your datasets are correctly formatted before proceeding:

 python src/datasets.py

Training All training processes are conducted using Jupyter notebooks located in the notebooks/ directory. You can run the notebooks to train the model from scratch or fine-tune it on specific downstream tasks.
Model Implementation The implementation of the CamemBERT-like model from scratch is located in the src/model/ directory. You can directly use this model in your experiments:

  from src.model.camembert_model import CamemBERTBase

Evaluation Evaluate the performance of your trained model on specific tasks using the evaluation notebooks or scripts:

notebooks/evaluate_nli.ipynb notebooks/evaluate_ner.ipynb

📊 Results The project achieves competitive performance on the following tasks:

NLI (Natural Language Inference): Accuracy of 85% on the XNLI dataset. NER (Named Entity Recognition): F1-score of 91% on the CoNLL-2003 dataset. POS Tagging and Dependency Parsing: Results comparable to the original CamemBERT paper across multiple French treebanks (e.g., GSD, Sequoia).

🤝 Contributors

Noureddine Khaous
Raid Salhi
Amine Ouguouenoune
Ramy Larabi

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MLA-CamemBERT

📚 Description

🛠️ Installation

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 100 Commits
notebooks		notebooks
src		src
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

MLA-CamemBERT

📚 Description

🛠️ Installation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages