
LinguaForge


License: MIT · Python 3.11+ · Next.js 14 · FastAPI

LinguaForge is an AI-powered ebook translation platform. It translates your PDF and EPUB files into 24+ languages, using Google's google/translategemma-4b-it model via the HuggingFace Inference API.


Table of Contents

  • Features
  • Architecture
  • Prerequisites
  • Quick Start (Docker)
  • Local Development
  • Environment Variables
  • API Endpoints
  • Translation Pipeline
  • Security
  • Deployment
  • Project Structure
  • Contributing
  • License

Features

  • Multi-Format Support: Seamlessly extract and translate text from both PDF and EPUB files while preserving structural elements like chapters.
  • AI-Powered Translation: Utilizes Google's TranslateGemma model for high-quality, contextual translations in over 24 languages.
  • Customizable Tones: Choose translation tones (e.g., fluent, formal, casual) to suit your reading preference.
  • Scalable Pipeline: Asynchronous processing with Redis-backed state management ensures robust handling of large books.
  • Modern UI: A sleek, responsive Next.js 14 frontend built with Tailwind CSS and shadcn/ui.
  • Secure: API key and JWT-based authentication, rate limiting, and strict file validation.

Architecture

┌──────────────┐     ┌──────────────┐     ┌─────────┐
│   Frontend   │────▶│   Backend    │────▶│  Redis  │
│  Next.js 14  │     │   FastAPI    │     │  State  │
│  shadcn/ui   │     │   Uvicorn    │     │  Store  │
└──────────────┘     └──────┬───────┘     └─────────┘
                            │
                     ┌──────▼───────┐
                     │  HuggingFace │
                     │ Inference API│
                     └──────────────┘
Layer      Technology Stack
Frontend   Next.js 14 (App Router), TypeScript, Tailwind CSS, shadcn/ui
Backend    FastAPI, Uvicorn, httpx (async), PyMuPDF, ebooklib
State      Redis (for job tracking and chunk translation storage)
Model      google/translategemma-4b-it via HuggingFace Inference API
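
The diagram and table above correspond roughly to a three-service Compose setup. A hedged sketch follows — service names, ports, and wiring are assumptions, and the repository's actual docker-compose.yml is authoritative:

```yaml
services:
  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    depends_on:
      - backend

  backend:
    build: ./backend
    env_file: backend/.env
    ports:
      - "8000:8000"
    environment:
      # inside the Compose network, Redis is reachable by service name
      REDIS_URL: redis://redis:6379/0
    depends_on:
      - redis

  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data

volumes:
  redis-data:
```

Note the hostname in REDIS_URL is the Compose service name (redis), not localhost — this matches the backend's documented default of redis://redis:6379/0.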

Prerequisites

  • Docker and Docker Compose (for the Quick Start path)
  • Python 3.11+ and Node.js 18.17+ (for local development)
  • Redis 7 (via Docker or a native install)
  • A HuggingFace account and API token (for the translation model)

Quick Start (Docker)

The fastest way to get LinguaForge up and running is via Docker Compose.

# 1. Clone the repository
git clone https://github.com/iamEtornam/LinguaForge.git
cd LinguaForge

# 2. Create environment files from templates
cp .env.example .env
cp backend/.env.example backend/.env
cp frontend/.env.example frontend/.env

# 3. Generate an API key and JWT secret
cd backend
python3 -m app.utils.api_keys   # Copy the generated key
cd ..

# 4. Edit .env files with your values
#    - Set HUGGINGFACE_API_KEY to your HuggingFace token
#    - Set API_KEYS to the key generated in step 3
#    - Set JWT_SECRET_KEY (generate with: openssl rand -hex 32)
#    - Set NEXT_PUBLIC_API_KEY in frontend/.env to the same API key

# 5. Build and start all services
docker compose up --build

# 6. Access the application
#    Frontend:  http://localhost:3000
#    API docs:  http://localhost:8000/docs
#    Health:    http://localhost:8000/health

To stop the services and clean up:

docker compose down           # Stop containers
docker compose down -v        # Stop and remove volumes (clears Redis data)

Local Development

Use this setup when you want hot-reloading and faster iteration during development.

1. Start Redis

Redis is required for job state tracking. Run it via Docker:

docker run -d --name linguaforge-redis -p 6379:6379 redis:7-alpine

(Alternatively, install it natively via your package manager, e.g., brew install redis on macOS).

2. Backend Setup

cd backend

# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate        # macOS/Linux
# venv\Scripts\activate         # Windows

# Install dependencies
pip install -r requirements.txt

# Set up environment
cp .env.example .env
# Edit .env:
#   HUGGINGFACE_API_KEY=hf_your_key_here
#   API_KEYS=lf_your_generated_key
#   JWT_SECRET_KEY=your_jwt_secret        (generate: openssl rand -hex 32)
#   REDIS_URL=redis://localhost:6379/0

# Generate an API key (optional helper)
python -m app.utils.api_keys

# Start the backend with hot-reload
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

The API will be available at http://localhost:8000. Interactive docs are at http://localhost:8000/docs.

3. Frontend Setup

cd frontend

# Install dependencies
npm install

# Set up environment
cp .env.example .env
# Edit .env:
#   NEXT_PUBLIC_API_URL=http://localhost:8000/api
#   NEXT_PUBLIC_API_KEY=lf_your_api_key    (same key as backend API_KEYS)

# Start the dev server
npm run dev

The application will be available at http://localhost:3000.


Environment Variables

Backend (backend/.env)

Variable                   Required   Default                  Description
HUGGINGFACE_API_KEY        Yes        —                        HuggingFace API token for translation model
API_KEYS                   Yes        —                        Comma-separated API keys for client auth
JWT_SECRET_KEY             Yes        —                        Secret for signing JWT tokens
REDIS_URL                  No         redis://redis:6379/0     Redis connection string
CORS_ORIGINS               No         http://localhost:3000    Allowed CORS origins (comma-separated)
FILE_SIZE_LIMIT_MB         No         10                       Max upload file size in MB
MAX_CHUNK_SIZE             No         500                      Max tokens per translation chunk
MAX_CONCURRENT_REQUESTS    No         5                        Max parallel HuggingFace API calls
RATE_LIMIT_PER_MINUTE      No         5                        Rate limit per IP per minute
RATE_LIMIT_PER_HOUR        No         50                       Rate limit per IP per hour
RATE_LIMIT_PER_DAY         No         200                      Rate limit per IP per day
MAX_EXTRACTED_TEXT_MB      No         50                       Max extracted text size in MB
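
Put together, a minimal backend/.env for local development might look like the following. Every value here is a placeholder — generate your own secrets, and note that optional settings simply keep their defaults when omitted:

```
# backend/.env -- placeholder values only
HUGGINGFACE_API_KEY=hf_your_token_here
API_KEYS=lf_your_generated_key
# generate with: openssl rand -hex 32
JWT_SECRET_KEY=replace_me_with_a_random_hex_string
REDIS_URL=redis://localhost:6379/0
CORS_ORIGINS=http://localhost:3000
# optional overrides
FILE_SIZE_LIMIT_MB=10
MAX_CONCURRENT_REQUESTS=5
```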

Frontend (frontend/.env)

Variable               Required   Default                      Description
NEXT_PUBLIC_API_URL    No         http://localhost:8000/api    Backend API base URL
NEXT_PUBLIC_API_KEY    Yes        —                            API key for backend auth

API Endpoints

All endpoints except the health check are prefixed with /api. Unless noted below, authentication is required via the X-API-Key header or a JWT Bearer token.

Method   Endpoint                   Auth      Description
GET      /health                    No        Health check
POST     /api/auth/token            API Key   Exchange API key for JWT token
POST     /api/upload                Yes       Upload PDF/EPUB file (max 10MB)
POST     /api/translate/{job_id}    Yes       Start translation for a job
GET      /api/status/{job_id}       Yes       Get translation progress
GET      /api/download/{job_id}     Yes       Download translated file

Example: Upload and Translate via CLI

API_KEY="lf_your_key_here"

# 1. Get a JWT token
TOKEN=$(curl -s -X POST http://localhost:8000/api/auth/token \
  -H "X-API-Key: $API_KEY" | python3 -c "import sys,json; print(json.load(sys.stdin)['access_token'])")

# 2. Upload a file
JOB_ID=$(curl -s -X POST http://localhost:8000/api/upload \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@mybook.pdf" | python3 -c "import sys,json; print(json.load(sys.stdin)['job_id'])")

echo "Job ID: $JOB_ID"

# 3. Start translation (English to French, fluent tone)
curl -X POST "http://localhost:8000/api/translate/$JOB_ID" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"source_lang": "en", "target_lang": "fr", "tone": "fluent"}'

# 4. Poll status until complete
curl -s "http://localhost:8000/api/status/$JOB_ID" \
  -H "Authorization: Bearer $TOKEN"

# 5. Download the translated file
curl -O -J "http://localhost:8000/api/download/$JOB_ID" \
  -H "Authorization: Bearer $TOKEN"
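
Step 4 above fetches the progress once; in practice you will want to poll until the job reaches a terminal state. A sketch of such a loop — the "status" field name and its "completed"/"failed" values are assumptions about the response shape, so check the real payload at /docs:

```shell
# ASSUMPTION: /api/status/{job_id} returns JSON with a "status" field
# whose terminal values are "completed" and "failed".
fetch_status() {
  curl -s "http://localhost:8000/api/status/$1" \
    -H "Authorization: Bearer $TOKEN" |
    python3 -c "import sys, json; print(json.load(sys.stdin)['status'])"
}

wait_for_job() {
  job_id=$1
  while :; do
    status=$(fetch_status "$job_id")
    echo "job $job_id: $status"
    case "$status" in
      completed) return 0 ;;
      failed)    return 1 ;;
    esac
    sleep "${POLL_INTERVAL:-5}"   # override POLL_INTERVAL to poll faster
  done
}

# Usage: wait_for_job "$JOB_ID" && echo "ready to download"
```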

Translation Pipeline

  1. Upload: The file is validated (type, size, MIME check) and stored temporarily; a background job is initialized in Redis.
  2. Extract: Text is extracted from PDF (PyMuPDF) or EPUB (ebooklib), preserving chapter structure.
  3. Chunk: Text is split on paragraph boundaries while respecting the per-chunk token limit.
  4. Translate: Chunks are sent to the HuggingFace Inference API with configurable concurrency and exponential-backoff retries.
  5. Download: Translated chunks are reassembled in order and served as a plain text file; temporary assets are then deleted.
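
The retry behavior in step 4 can be illustrated with a generic shell helper. This is a sketch only — the backend implements its retries in Python around the HuggingFace calls, and the attempt limit and delays here are illustrative, not the backend's actual values:

```shell
# Run a command, retrying with exponential backoff on failure.
retry_backoff() {
  max_attempts=5
  delay=1
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))        # 1s, 2s, 4s, 8s, ...
    attempt=$((attempt + 1))
  done
  echo "giving up after $max_attempts attempts" >&2
  return 1
}

# Usage: retry_backoff curl -fsS http://localhost:8000/health
```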

Security

  • Authentication: Dual-layer API key + JWT token-based access control.
  • File Validation: Strict MIME type enforcement and file extension checks prevent malicious uploads.
  • Size Limits: Configurable maximum upload payload (default 10MB).
  • Rate Limiting: Granular per-IP limits on all endpoints via slowapi.
  • CORS: Restricted strictly to configured origins.
  • Security Headers: Hardened HTTP headers applied via FastAPI middleware.
  • Non-root Containers: Docker images are configured to run as unprivileged users.
  • Auto-cleanup: Temporary files are immediately deleted post-download to prevent storage bloat.

Deployment

Option 1: Docker Compose (Recommended for VPS/Cloud VM)

Ideal for single-server setups (e.g., DigitalOcean, Hetzner, AWS EC2).

# Clone and configure
git clone https://github.com/iamEtornam/LinguaForge.git
cd LinguaForge
cp .env.example .env
cp backend/.env.example backend/.env
cp frontend/.env.example frontend/.env
# Fill in all required environment variables

# Build and start in detached mode
docker compose up --build -d

# Verify everything is running
docker compose ps
docker compose logs -f backend

Note: Set up a reverse proxy (such as Nginx or Caddy) to expose the frontend (port 3000) and API (port 8000) securely over HTTPS. Recommended specs: 2+ vCPUs, 2 GB+ RAM; allow ~512 MB for Redis.
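
With Caddy, for instance, the proxying (with automatic HTTPS) can be as small as this — app.example.com and api.example.com are placeholder domains that must point at your server first:

```
app.example.com {
    reverse_proxy localhost:3000
}

api.example.com {
    reverse_proxy localhost:8000
}
```

Remember to set CORS_ORIGINS on the backend to the frontend's public origin, and NEXT_PUBLIC_API_URL on the frontend to the API domain.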

Production Checklist

  • Set strong, unique values for API_KEYS and JWT_SECRET_KEY
  • Configure CORS_ORIGINS to your production domain(s)
  • Deploy behind a reverse proxy (Nginx/Caddy) with SSL/TLS
  • Restrict firewall rules to expose only ports 80/443
  • Enable log rotation for Docker containers
  • Set up external monitoring/alerting (pinging /health)

Option 2: Split Deployment

Deploy the frontend and backend independently when you need to scale or update them separately.

Backend (Railway / Render / Fly.io)

  1. Create a web service pointing to the backend/ directory.
  2. Build command: pip install -r requirements.txt
  3. Start command: uvicorn app.main:app --host 0.0.0.0 --port 8000
  4. Inject all backend environment variables.
  5. Provision a Redis add-on and map its URL to REDIS_URL.

Frontend (Vercel / Netlify)

  1. Import the repository and set the root directory to frontend/.
  2. Framework preset: Next.js.
  3. Inject environment variables:
    • NEXT_PUBLIC_API_URL = <your-deployed-backend-url>/api
    • NEXT_PUBLIC_API_KEY = <your-api-key>
  4. Deploy.

Redis: Use a managed provider like Upstash, Redis Cloud, or a platform add-on.
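
Managed providers hand you a single connection string; TLS-enabled instances use the rediss:// scheme. The host and credentials below are placeholders:

```
REDIS_URL=rediss://default:your_password@your-redis-host:6379
```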


Project Structure

LinguaForge/
├── backend/                    # Python FastAPI backend
│   ├── app/
│   │   ├── main.py             # App entry point & lifespan
│   │   ├── routes.py           # API endpoint handlers
│   │   ├── config.py           # Settings (pydantic-settings)
│   │   ├── auth.py             # JWT & API key authentication
│   │   ├── middleware.py       # Rate limiting & security headers
│   │   ├── translator.py       # HuggingFace translation service
│   │   ├── parsers.py          # PDF & EPUB text extraction
│   │   ├── chunker.py          # Text chunking logic
│   │   ├── redis_client.py     # Redis state management
│   │   ├── models.py           # Pydantic request/response models
│   │   ├── logger.py           # Logging configuration
│   │   └── utils/
│   │       └── api_keys.py     # API key generation utility
│   ├── verify_security.py      # Security test suite
│   ├── requirements.txt
│   ├── Dockerfile
│   └── .env.example
├── frontend/                   # Next.js TypeScript frontend
│   ├── src/
│   │   ├── app/                # Next.js App Router (pages & layout)
│   │   ├── components/         # React components + shadcn/ui
│   │   ├── hooks/              # Custom React hooks
│   │   └── lib/                # API client, auth, constants
│   ├── package.json
│   ├── Dockerfile
│   └── .env.example
├── docker-compose.yml
├── .env.example
├── .gitignore
├── LICENSE
├── CONTRIBUTING.md
├── CODE_OF_CONDUCT.md
└── README.md

Contributing

Contributions are highly appreciated! Please review our Contributing Guide before submitting a pull request.

License

This project is licensed under the MIT License.
