A deep learning-based system for recognizing handwritten Arabic text using Convolutional Neural Networks (CNN), Bidirectional LSTM, attention mechanisms, and CTC loss. This project provides both a trained model and a user-friendly PyQt6 interface for real-time Arabic handwriting recognition.
This project was developed as part of the Deep Learning course at École d'Ingénierie Digitale et d'Intelligence Artificielle (EIDIA), Euro Mediterranean University of Fez. The system achieves a 96.33% character accuracy rate and a 79.98% word accuracy rate on the IFN/ENIT dataset with beam search decoding (see the results table below).
- Arabic handwritten text recognition
- Real-time inference with GUI application
- Multiple decoding methods (Greedy and Beam Search)
- Pre-trained model ready for deployment
- Comprehensive preprocessing pipeline
The model combines several state-of-the-art techniques:
- Feature Extraction: Pre-trained ResNet50 (ImageNet) for visual feature extraction
- Sequential Processing: Bidirectional LSTM layers for temporal dependencies
- Attention Mechanism: Self-attention for enhanced context modeling
- Output Alignment: CTC loss for sequence alignment without explicit character-level annotation
```
Input Image (100×300) → ResNet50 → BiLSTM → Attention → CTC → Arabic Text
```
IFN/ENIT Database: A specialized dataset for Arabic handwritten text recognition
- Size: 26,000+ handwritten words from 411 different writers
- Content: Tunisian city names in Arabic
- Splits:
- Training: Sets A, B, C (23,301 images after augmentation)
- Validation: Subset of training data (1,044 images)
- Testing: Set D
- Augmentation: Applied geometric transformations, brightness/contrast adjustments, and noise addition
Option 1: Using pip (Recommended)

```bash
pip install -r requirements.txt
```

Option 2: Using conda

```bash
conda env create -f environment.yml
conda activate arabic-ocr
```

- Clone the repository

```bash
git clone <repository-url>
cd ARABIC_OCR_INTERFACE
```

- Install dependencies

```bash
pip install -r requirements.txt
```

- Run the GUI Application

```bash
python main.py
```

- Using the Interface
- Click "تحميل صورة" (Upload Image) to select an Arabic handwritten image
- View results from both Greedy and Beam Search decoding methods
- Click "الرجوع إلى الشاشة الرئيسية" (Return to Main Screen) to try another image
```python
from inference import infer_image
from inference_bm import infer_image2

# Greedy decoding
result_greedy = infer_image('path/to/your/image.jpg')
print(f"Greedy result: {result_greedy}")

# Beam search decoding
result_beam = infer_image2('path/to/your/image.jpg', method='beam', beam_width=30)
print(f"Beam search result: {result_beam}")
```

```
ARABIC_OCR_INTERFACE/
├── main.py                      # Main application entry point
├── first_screen.py              # Home screen GUI component
├── second_screen.py             # Results display GUI component
├── inference.py                 # Greedy decoding inference
├── inference_bm.py              # Beam search decoding inference
├── ocr_model.keras              # Pre-trained model (96 MB)
├── char_code_files/             # Character encoding mappings
│   ├── chars_to_codes.json      # Character to code mapping
│   └── codes_to_chars.json      # Code to character mapping
├── set_a_images/                # Sample training images
├── set_d_images/                # Sample test images
├── OCR_PROJECT_NOTEBOOK.ipynb   # Complete training notebook
```
- Input Size: 100×300 grayscale images
- Architecture: ResNet50 + 2×BiLSTM(512,256) + Attention + CTC
- Output Classes: 120 (Arabic characters + blank token)
- Training: 34 epochs with adaptive learning rate
- Loss Function: CTC (Connectionist Temporal Classification)
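For illustration, here is a minimal Keras sketch of this architecture. Layer sizes follow the specs above, but the reshaping between ResNet50 and the LSTMs, the attention wiring, and the CTC loss plumbing are assumptions, not the exact code from OCR_PROJECT_NOTEBOOK.ipynb:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(100, 300, 1))
rgb = layers.Concatenate()([inputs, inputs, inputs])      # ResNet50 expects 3 channels
backbone = tf.keras.applications.ResNet50(include_top=False, weights='imagenet')
feat = backbone(rgb)                                      # -> (batch, 4, 10, 2048)
seq = layers.Permute((2, 1, 3))(feat)                     # time steps run along the width
seq = layers.Reshape((10, 4 * 2048))(seq)
seq = layers.Bidirectional(layers.LSTM(512, return_sequences=True))(seq)
seq = layers.Bidirectional(layers.LSTM(256, return_sequences=True))(seq)
seq = layers.Attention()([seq, seq])                      # self-attention over time steps
outputs = layers.Dense(120, activation='softmax')(seq)    # 120 classes incl. CTC blank
model = Model(inputs, outputs)

def ctc_loss(y_true, y_pred):
    # Wraps Keras's built-in CTC; assumes labels are padded to a fixed length.
    batch = tf.shape(y_pred)[0]
    input_len = tf.fill([batch, 1], tf.shape(y_pred)[1])
    label_len = tf.fill([batch, 1], tf.shape(y_true)[1])
    return tf.keras.backend.ctc_batch_cost(y_true, y_pred, input_len, label_len)

model.compile(optimizer='adam', loss=ctc_loss)
```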
- Grayscale conversion
- Resize to 100×300 pixels
- Binary thresholding
- Color inversion (text appears white on black background)
- Normalization to [0,1] range
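A minimal OpenCV sketch of these steps (the thresholding method and exact values are assumptions; inference.py may differ):

```python
import cv2
import numpy as np

def preprocess(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)          # grayscale conversion
    img = cv2.resize(img, (300, 100))                     # width=300, height=100
    _, img = cv2.threshold(img, 0, 255,
                           cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binary thresholding
    img = cv2.bitwise_not(img)                            # invert: white text on black
    img = img.astype(np.float32) / 255.0                  # normalize to [0, 1]
    return img[..., np.newaxis]                           # add channel dim for the model
```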
The system uses a specialized encoding for Arabic characters that accounts for their contextual forms:
- Position-aware encoding: Characters are encoded based on their position in the word (beginning, middle, end, isolated)
- 120 unique tokens: Covering all Arabic character variations plus the blank token
- Example: `kaB` → قـ (the Arabic letter Qaf in word-initial form)
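A minimal sketch of mapping predicted codes back to Arabic glyphs with the JSON files shipped in char_code_files/ (the exact JSON schema and the sample code sequence are assumptions):

```python
import json

# Load the code-to-character mapping bundled with the project
with open('char_code_files/codes_to_chars.json', encoding='utf-8') as f:
    codes_to_chars = json.load(f)

predicted_codes = ['kaB']  # hypothetical decoder output (position-aware codes)
text = ''.join(codes_to_chars.get(code, '') for code in predicted_codes)
print(text)
```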
- Greedy Decoding: Selects most probable character at each time step
- Beam Search: Explores multiple hypotheses with configurable beam width (default: 30)
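For illustration, both modes can be expressed with Keras's built-in CTC decoder; a minimal sketch follows (inference.py and inference_bm.py may implement decoding differently):

```python
import tensorflow as tf

def ctc_decode(y_pred, greedy=True, beam_width=30):
    # y_pred: (batch, time_steps, num_classes) softmax output of the model
    input_len = tf.fill([tf.shape(y_pred)[0]], tf.shape(y_pred)[1])
    decoded, _ = tf.keras.backend.ctc_decode(
        y_pred, input_length=input_len, greedy=greedy, beam_width=beam_width
    )
    return decoded[0]  # best path as class indices, padded with -1
```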
| Metric | Greedy Decoding | Beam Search (width=20) |
|---|---|---|
| Character Accuracy Rate (CAR) | 96.46% | 96.33% |
| Word Accuracy Rate (WAR) | 79.31% | 79.98% |
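As a reference, here is a minimal sketch of how CAR and WAR are conventionally computed (standard edit-distance definitions; not necessarily the notebook's exact evaluation code):

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def car(reference, hypothesis):
    # Character Accuracy Rate: 1 - edit distance / reference length
    return 1.0 - levenshtein(reference, hypothesis) / max(len(reference), 1)

def war(references, hypotheses):
    # Word Accuracy Rate: fraction of exact word matches
    return sum(r == h for r, h in zip(references, hypotheses)) / len(references)
```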
The model was trained in multiple phases:
- Phase 1: Full model training (epochs 1-16)
- Phase 2: Fine-tuning last 30 layers (epochs 17-26)
- Phase 3: Fine-tuning last 50 layers (epochs 27-34)
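A minimal sketch of how such phased unfreezing is typically done in Keras, continuing from the architecture sketch above (whether "last N layers" refers to the ResNet50 backbone, plus the optimizer settings and the `train_ds`/`val_ds` pipelines, are assumptions):

```python
from tensorflow.keras.optimizers import Adam

# Phase 2 (epochs 17-26): unfreeze only the last 30 layers of the backbone
backbone.trainable = True
for layer in backbone.layers[:-30]:
    layer.trainable = False
model.compile(optimizer=Adam(1e-4), loss=ctc_loss)  # lower LR for fine-tuning
model.fit(train_ds, validation_data=val_ds, initial_epoch=16, epochs=26)

# Phase 3 (epochs 27-34): widen to the last 50 layers
for layer in backbone.layers[-50:]:
    layer.trainable = True
model.compile(optimizer=Adam(1e-5), loss=ctc_loss)
model.fit(train_ds, validation_data=val_ds, initial_epoch=26, epochs=34)
```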
- Modern Arabic Interface: RTL support with Arabic text
- Dual Decoding Display: Shows results from both decoding methods
- Image Preview: Displays uploaded image with proper scaling
- Intuitive Navigation: Simple two-screen interface
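As an illustration of the RTL setup, here is a minimal PyQt6 sketch (the widgets and layout are hypothetical, not taken from first_screen.py):

```python
import sys
from PyQt6.QtCore import Qt
from PyQt6.QtWidgets import QApplication, QLabel, QPushButton, QVBoxLayout, QWidget

app = QApplication(sys.argv)
window = QWidget()
window.setLayoutDirection(Qt.LayoutDirection.RightToLeft)  # mirror layout for Arabic
layout = QVBoxLayout(window)
layout.addWidget(QLabel("التعرف على النص العربي المكتوب بخط اليد"))
layout.addWidget(QPushButton("تحميل صورة"))
window.show()
sys.exit(app.exec())
```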
- Hardware: Google Colab (A100 GPU) and RTX 4070 Laptop GPU
- Training Time: ~5 hours for complete training
- Inference Speed: Real-time processing for single images
- Training: Sets A, B, C (90% for training, 10% for validation)
- Testing: Set D (independent evaluation)
- Data Cleaning: Removed samples with annotation errors (e.g., shadda issues, transcription errors)
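A minimal sketch of the 90/10 split (scikit-learn is an assumption; the file paths and labels below are placeholders):

```python
from sklearn.model_selection import train_test_split

paths = ['set_a_images/sample1.png', 'set_a_images/sample2.png']  # placeholder file list
labels = ['تونس', 'صفاقس']                                         # placeholder transcriptions
train_p, val_p, train_y, val_y = train_test_split(
    paths, labels, test_size=0.10, random_state=42  # 10% held out for validation
)
```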
Applied to increase robustness and dataset size (10,435 → 25,887 images):
- Geometric: Rotation (±3°), shearing
- Photometric: Brightness/contrast adjustment
- Noise: Gaussian noise, motion blur
- Random Application: 0-3 augmentations per image
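A minimal sketch of such a pipeline using albumentations (the library choice and parameter ranges are assumptions; the notebook may differ):

```python
import albumentations as A
import cv2

augment = A.Compose([
    A.Rotate(limit=3, p=0.5),                 # geometric: ±3° rotation
    A.Affine(shear=(-5, 5), p=0.3),           # geometric: shearing (range assumed)
    A.RandomBrightnessContrast(p=0.5),        # photometric adjustment
    A.GaussNoise(p=0.3),                      # additive Gaussian noise
    A.MotionBlur(blur_limit=3, p=0.3),        # simulated motion blur
])

image = cv2.imread('set_a_images/sample1.png', cv2.IMREAD_GRAYSCALE)
augmented = augment(image=image)['image']     # each transform fires independently
```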
- Mobile Deployment: Optimize model for mobile inference
- Multi-language Support: Extend to other Arabic dialects and languages
- Real-time Video OCR: Process video streams for continuous text recognition



