
LinguaForge


License: MIT · Python 3.11+ · Next.js 14 · FastAPI

LinguaForge is an AI-powered ebook translation platform. It translates your PDF and EPUB files into 24+ languages, using Google's google/translategemma-4b-it model via the HuggingFace Inference API.


Table of Contents

  • Features
  • Architecture
  • Prerequisites
  • Quick Start (Docker)
  • Local Development
  • Environment Variables
  • API Endpoints
  • Translation Pipeline
  • Security
  • Deployment
  • Project Structure
  • Contributing
  • License

Features

  • Multi-Format Support: Seamlessly extract and translate text from both PDF and EPUB files while preserving structural elements like chapters.
  • AI-Powered Translation: Utilizes Google's TranslateGemma model for high-quality, contextual translations in over 24 languages.
  • Customizable Tones: Choose translation tones (e.g., fluent, formal, casual) to suit your reading preference.
  • Scalable Pipeline: Asynchronous processing with Redis-backed state management ensures robust handling of large books.
  • Modern UI: A sleek, responsive Next.js 14 frontend built with Tailwind CSS and shadcn/ui.
  • Secure: API key and JWT-based authentication, rate limiting, and strict file validation.

Architecture

┌──────────────┐     ┌──────────────┐     ┌─────────┐
│   Frontend   │────▶│   Backend    │────▶│  Redis  │
│  Next.js 14  │     │   FastAPI    │     │  State  │
│  shadcn/ui   │     │   Uvicorn    │     │  Store  │
└──────────────┘     └──────┬───────┘     └─────────┘
                            │
                     ┌──────▼───────┐
                     │  HuggingFace │
                     │ Inference API│
                     └──────────────┘
Layer      Technology Stack
Frontend   Next.js 14 (App Router), TypeScript, Tailwind CSS, shadcn/ui
Backend    FastAPI, Uvicorn, httpx (async), PyMuPDF, ebooklib
State      Redis (for job tracking and chunk translation storage)
Model      google/translategemma-4b-it via HuggingFace Inference API
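
The diagram and table above correspond roughly to a three-service Compose setup. A hedged sketch follows — service names, ports, and wiring are assumptions, and the repository's actual docker-compose.yml is authoritative:

```yaml
services:
  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    depends_on:
      - backend

  backend:
    build: ./backend
    env_file: backend/.env
    ports:
      - "8000:8000"
    environment:
      # inside the Compose network, Redis is reachable by service name
      REDIS_URL: redis://redis:6379/0
    depends_on:
      - redis

  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data

volumes:
  redis-data:
```

Note the hostname in REDIS_URL is the Compose service name (redis), not localhost — this matches the backend's documented default of redis://redis:6379/0.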

Prerequisites

  • Docker and Docker Compose (for the Quick Start path)
  • Python 3.11+ and Node.js 18.17+ (for local development)
  • Redis 7 (via Docker or a native install)
  • A HuggingFace account and API token (for the translation model)

Quick Start (Docker)

The fastest way to get LinguaForge up and running is via Docker Compose.

# 1. Clone the repository
git clone https://github.com/iamEtornam/LinguaForge.git
cd LinguaForge

# 2. Create environment files from templates
cp .env.example .env
cp backend/.env.example backend/.env
cp frontend/.env.example frontend/.env

# 3. Generate an API key and JWT secret
cd backend
python3 -m app.utils.api_keys   # Copy the generated key
cd ..

# 4. Edit .env files with your values
#    - Set HUGGINGFACE_API_KEY to your HuggingFace token
#    - Set API_KEYS to the key generated in step 3
#    - Set JWT_SECRET_KEY (generate with: openssl rand -hex 32)
#    - Set NEXT_PUBLIC_API_KEY in frontend/.env to the same API key

# 5. Build and start all services
docker compose up --build

# 6. Access the application
#    Frontend:  http://localhost:3000
#    API docs:  http://localhost:8000/docs
#    Health:    http://localhost:8000/health

To stop the services and clean up:

docker compose down           # Stop containers
docker compose down -v        # Stop and remove volumes (clears Redis data)

Local Development

Use this setup when you want hot-reloading and faster iteration during development.

1. Start Redis

Redis is required for job state tracking. Run it via Docker:

docker run -d --name linguaforge-redis -p 6379:6379 redis:7-alpine

(Alternatively, install it natively via your package manager, e.g., brew install redis on macOS).

2. Backend Setup

cd backend

# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate        # macOS/Linux
# venv\Scripts\activate         # Windows

# Install dependencies
pip install -r requirements.txt

# Set up environment
cp .env.example .env
# Edit .env:
#   HUGGINGFACE_API_KEY=hf_your_key_here
#   API_KEYS=lf_your_generated_key
#   JWT_SECRET_KEY=your_jwt_secret        (generate: openssl rand -hex 32)
#   REDIS_URL=redis://localhost:6379/0

# Generate an API key (optional helper)
python -m app.utils.api_keys

# Start the backend with hot-reload
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

The API will be available at http://localhost:8000. Interactive docs are at http://localhost:8000/docs.

3. Frontend Setup

cd frontend

# Install dependencies
npm install

# Set up environment
cp .env.example .env
# Edit .env:
#   NEXT_PUBLIC_API_URL=http://localhost:8000/api
#   NEXT_PUBLIC_API_KEY=lf_your_api_key    (same key as backend API_KEYS)

# Start the dev server
npm run dev

The application will be available at http://localhost:3000.


Environment Variables

Backend (backend/.env)

Variable                   Required   Default                  Description
HUGGINGFACE_API_KEY        Yes        —                        HuggingFace API token for translation model
API_KEYS                   Yes        —                        Comma-separated API keys for client auth
JWT_SECRET_KEY             Yes        —                        Secret for signing JWT tokens
REDIS_URL                  No         redis://redis:6379/0     Redis connection string
CORS_ORIGINS               No         http://localhost:3000    Allowed CORS origins (comma-separated)
FILE_SIZE_LIMIT_MB         No         10                       Max upload file size in MB
MAX_CHUNK_SIZE             No         500                      Max tokens per translation chunk
MAX_CONCURRENT_REQUESTS    No         5                        Max parallel HuggingFace API calls
RATE_LIMIT_PER_MINUTE      No         5                        Rate limit per IP per minute
RATE_LIMIT_PER_HOUR        No         50                       Rate limit per IP per hour
RATE_LIMIT_PER_DAY         No         200                      Rate limit per IP per day
MAX_EXTRACTED_TEXT_MB      No         50                       Max extracted text size in MB
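
Put together, a minimal backend/.env for local development might look like the following. Every value here is a placeholder — generate your own secrets, and note that optional settings simply keep their defaults when omitted:

```
# backend/.env -- placeholder values only
HUGGINGFACE_API_KEY=hf_your_token_here
API_KEYS=lf_your_generated_key
# generate with: openssl rand -hex 32
JWT_SECRET_KEY=replace_me_with_a_random_hex_string
REDIS_URL=redis://localhost:6379/0
CORS_ORIGINS=http://localhost:3000
# optional overrides
FILE_SIZE_LIMIT_MB=10
MAX_CONCURRENT_REQUESTS=5
```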

Frontend (frontend/.env)

Variable               Required   Default                      Description
NEXT_PUBLIC_API_URL    No         http://localhost:8000/api    Backend API base URL
NEXT_PUBLIC_API_KEY    Yes        —                            API key for backend auth

API Endpoints

All endpoints except the health check are prefixed with /api. Unless noted below, authentication is required via the X-API-Key header or a JWT Bearer token.

Method   Endpoint                   Auth      Description
GET      /health                    No        Health check
POST     /api/auth/token            API Key   Exchange API key for JWT token
POST     /api/upload                Yes       Upload PDF/EPUB file (max 10MB)
POST     /api/translate/{job_id}    Yes       Start translation for a job
GET      /api/status/{job_id}       Yes       Get translation progress
GET      /api/download/{job_id}     Yes       Download translated file

Example: Upload and Translate via CLI

API_KEY="lf_your_key_here"

# 1. Get a JWT token
TOKEN=$(curl -s -X POST http://localhost:8000/api/auth/token \
  -H "X-API-Key: $API_KEY" | python3 -c "import sys,json; print(json.load(sys.stdin)['access_token'])")

# 2. Upload a file
JOB_ID=$(curl -s -X POST http://localhost:8000/api/upload \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@mybook.pdf" | python3 -c "import sys,json; print(json.load(sys.stdin)['job_id'])")

echo "Job ID: $JOB_ID"

# 3. Start translation (English to French, fluent tone)
curl -X POST "http://localhost:8000/api/translate/$JOB_ID" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"source_lang": "en", "target_lang": "fr", "tone": "fluent"}'

# 4. Poll status until complete
curl -s "http://localhost:8000/api/status/$JOB_ID" \
  -H "Authorization: Bearer $TOKEN"

# 5. Download the translated file
curl -O -J "http://localhost:8000/api/download/$JOB_ID" \
  -H "Authorization: Bearer $TOKEN"
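
Step 4 above fetches the progress once; in practice you will want to poll until the job reaches a terminal state. A sketch of such a loop — the "status" field name and its "completed"/"failed" values are assumptions about the response shape, so check the real payload at /docs:

```shell
# ASSUMPTION: /api/status/{job_id} returns JSON with a "status" field
# whose terminal values are "completed" and "failed".
fetch_status() {
  curl -s "http://localhost:8000/api/status/$1" \
    -H "Authorization: Bearer $TOKEN" |
    python3 -c "import sys, json; print(json.load(sys.stdin)['status'])"
}

wait_for_job() {
  job_id=$1
  while :; do
    status=$(fetch_status "$job_id")
    echo "job $job_id: $status"
    case "$status" in
      completed) return 0 ;;
      failed)    return 1 ;;
    esac
    sleep "${POLL_INTERVAL:-5}"   # override POLL_INTERVAL to poll faster
  done
}

# Usage: wait_for_job "$JOB_ID" && echo "ready to download"
```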

Translation Pipeline

  1. Upload: The file is validated (type, size, MIME check) and stored temporarily; a background job is initialized in Redis.
  2. Extract: Text is extracted from PDF (PyMuPDF) or EPUB (ebooklib), preserving chapter structure.
  3. Chunk: Text is split on paragraph boundaries while respecting the per-chunk token limit.
  4. Translate: Chunks are sent to the HuggingFace Inference API with configurable concurrency and exponential-backoff retries.
  5. Download: Translated chunks are reassembled in order and served as a plain text file; temporary assets are then deleted.
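
The retry behavior in step 4 can be illustrated with a generic shell helper. This is a sketch only — the backend implements its retries in Python around the HuggingFace calls, and the attempt limit and delays here are illustrative, not the backend's actual values:

```shell
# Run a command, retrying with exponential backoff on failure.
retry_backoff() {
  max_attempts=5
  delay=1
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))        # 1s, 2s, 4s, 8s, ...
    attempt=$((attempt + 1))
  done
  echo "giving up after $max_attempts attempts" >&2
  return 1
}

# Usage: retry_backoff curl -fsS http://localhost:8000/health
```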

Security

  • Authentication: Dual-layer API key + JWT token-based access control.
  • File Validation: Strict MIME type enforcement and file extension checks prevent malicious uploads.
  • Size Limits: Configurable maximum upload payload (default 10MB).
  • Rate Limiting: Granular per-IP limits on all endpoints via slowapi.
  • CORS: Restricted strictly to configured origins.
  • Security Headers: Hardened HTTP headers applied via FastAPI middleware.
  • Non-root Containers: Docker images are configured to run as unprivileged users.
  • Auto-cleanup: Temporary files are immediately deleted post-download to prevent storage bloat.

Deployment

Option 1: Docker Compose (Recommended for VPS/Cloud VM)

Ideal for single-server setups (e.g., DigitalOcean, Hetzner, AWS EC2).

# Clone and configure
git clone https://github.com/iamEtornam/LinguaForge.git
cd LinguaForge
cp .env.example .env
cp backend/.env.example backend/.env
cp frontend/.env.example frontend/.env
# Fill in all required environment variables

# Build and start in detached mode
docker compose up --build -d

# Verify everything is running
docker compose ps
docker compose logs -f backend

Note: Set up a reverse proxy (such as Nginx or Caddy) to expose the frontend (port 3000) and API (port 8000) securely over HTTPS. Recommended specs: 2+ vCPUs, 2 GB+ RAM; allow ~512 MB for Redis.
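
With Caddy, for instance, the proxying (with automatic HTTPS) can be as small as this — app.example.com and api.example.com are placeholder domains that must point at your server first:

```
app.example.com {
    reverse_proxy localhost:3000
}

api.example.com {
    reverse_proxy localhost:8000
}
```

Remember to set CORS_ORIGINS on the backend to the frontend's public origin, and NEXT_PUBLIC_API_URL on the frontend to the API domain.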

Production Checklist

  • Set strong, unique values for API_KEYS and JWT_SECRET_KEY
  • Configure CORS_ORIGINS to your production domain(s)
  • Deploy behind a reverse proxy (Nginx/Caddy) with SSL/TLS
  • Restrict firewall rules to expose only ports 80/443
  • Enable log rotation for Docker containers
  • Set up external monitoring/alerting (pinging /health)

Option 2: Split Deployment

Deploy the frontend and backend independently when you need to scale or update them separately.

Backend (Railway / Render / Fly.io)

  1. Create a web service pointing to the backend/ directory.
  2. Build command: pip install -r requirements.txt
  3. Start command: uvicorn app.main:app --host 0.0.0.0 --port 8000
  4. Inject all backend environment variables.
  5. Provision a Redis add-on and map its URL to REDIS_URL.

Frontend (Vercel / Netlify)

  1. Import the repository and set the root directory to frontend/.
  2. Framework preset: Next.js.
  3. Inject environment variables:
    • NEXT_PUBLIC_API_URL = <your-deployed-backend-url>/api
    • NEXT_PUBLIC_API_KEY = <your-api-key>
  4. Deploy.

Redis: Use a managed provider like Upstash, Redis Cloud, or a platform add-on.
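
Managed providers hand you a single connection string; TLS-enabled instances use the rediss:// scheme. The host and credentials below are placeholders:

```
REDIS_URL=rediss://default:your_password@your-redis-host:6379
```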


Project Structure

LinguaForge/
├── backend/                    # Python FastAPI backend
│   ├── app/
│   │   ├── main.py             # App entry point & lifespan
│   │   ├── routes.py           # API endpoint handlers
│   │   ├── config.py           # Settings (pydantic-settings)
│   │   ├── auth.py             # JWT & API key authentication
│   │   ├── middleware.py       # Rate limiting & security headers
│   │   ├── translator.py       # HuggingFace translation service
│   │   ├── parsers.py          # PDF & EPUB text extraction
│   │   ├── chunker.py          # Text chunking logic
│   │   ├── redis_client.py     # Redis state management
│   │   ├── models.py           # Pydantic request/response models
│   │   ├── logger.py           # Logging configuration
│   │   └── utils/
│   │       └── api_keys.py     # API key generation utility
│   ├── verify_security.py      # Security test suite
│   ├── requirements.txt
│   ├── Dockerfile
│   └── .env.example
├── frontend/                   # Next.js TypeScript frontend
│   ├── src/
│   │   ├── app/                # Next.js App Router (pages & layout)
│   │   ├── components/         # React components + shadcn/ui
│   │   ├── hooks/              # Custom React hooks
│   │   └── lib/                # API client, auth, constants
│   ├── package.json
│   ├── Dockerfile
│   └── .env.example
├── docker-compose.yml
├── .env.example
├── .gitignore
├── LICENSE
├── CONTRIBUTING.md
├── CODE_OF_CONDUCT.md
└── README.md

Contributing

Contributions are highly appreciated! Please review our Contributing Guide before submitting a pull request.

License

This project is licensed under the MIT License.
