LinguaForge is an AI-powered ebook translation platform. It translates PDF and EPUB files into 24+ languages using the google/translategemma-4b-it model via the HuggingFace Inference API.
- Features
- Architecture
- Prerequisites
- Quick Start (Docker)
- Local Development
- Environment Variables
- API Endpoints
- Translation Pipeline
- Security
- Deployment
- Project Structure
- Contributing
- License
- Multi-Format Support: Seamlessly extract and translate text from both PDF and EPUB files while preserving structural elements like chapters.
- AI-Powered Translation: Utilizes Google's TranslateGemma model for high-quality, contextual translations in over 24 languages.
- Customizable Tones: Choose translation tones (e.g., fluent, formal, casual) to suit your reading preference.
- Scalable Pipeline: Asynchronous processing with Redis-backed state management ensures robust handling of large books.
- Modern UI: A sleek, responsive Next.js 14 frontend built with Tailwind CSS and shadcn/ui.
- Secure: API key and JWT-based authentication, rate limiting, and strict file validation.
┌──────────────┐ ┌──────────────┐ ┌─────────┐
│ Frontend │────▶│ Backend │────▶│ Redis │
│ Next.js 14 │ │ FastAPI │ │ State │
│ shadcn/ui │ │ Uvicorn │ │ Store │
└──────────────┘ └──────┬───────┘ └─────────┘
│
┌──────▼───────┐
│ HuggingFace │
│ Inference API│
└──────────────┘
| Layer | Technology Stack |
|---|---|
| Frontend | Next.js 14 (App Router), TypeScript, Tailwind CSS, shadcn/ui |
| Backend | FastAPI, Uvicorn, httpx (async), PyMuPDF, ebooklib |
| State | Redis (for job tracking and chunk translation storage) |
| Model | google/translategemma-4b-it via HuggingFace Inference API |
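Redis holds the per-job state: the job's status and progress, plus the translated chunks awaiting reassembly. As a rough illustration only (the key and field names below are assumptions, not the actual `redis_client.py` schema), job tracking with `redis.asyncio` could look like this:

```python
# Hypothetical sketch of Redis-backed job tracking (key and field names are assumed).
import redis.asyncio as redis

r = redis.from_url("redis://localhost:6379/0", decode_responses=True)

async def create_job(job_id: str, total_chunks: int) -> None:
    # One hash per job holds its status and progress counters.
    await r.hset(f"job:{job_id}", mapping={
        "status": "pending",
        "total_chunks": total_chunks,
        "completed_chunks": 0,
    })

async def store_chunk(job_id: str, index: int, translated: str) -> None:
    # Translated chunks live in a separate hash, keyed by index for ordered reassembly.
    await r.hset(f"job:{job_id}:chunks", str(index), translated)
    await r.hincrby(f"job:{job_id}", "completed_chunks", 1)
```

Keeping chunks in a hash keyed by index lets the download step reassemble them in order once every chunk is translated.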
- Docker and Docker Compose (for containerized setup)
- Python 3.11+ (for local backend development)
- Node.js 20+ and npm (for local frontend development)
- A HuggingFace API key with Inference API access.
The fastest way to get LinguaForge up and running is via Docker Compose.
# 1. Clone the repository
git clone https://github.com/iamEtornam/LinguaForge.git
cd LinguaForge
# 2. Create environment files from templates
cp .env.example .env
cp backend/.env.example backend/.env
cp frontend/.env.example frontend/.env
# 3. Generate an API key and JWT secret
cd backend
python3 -m app.utils.api_keys # Copy the generated key
cd ..
# 4. Edit .env files with your values
# - Set HUGGINGFACE_API_KEY to your HuggingFace token
# - Set API_KEYS to the key generated in step 3
# - Set JWT_SECRET_KEY (generate with: openssl rand -hex 32)
# - Set NEXT_PUBLIC_API_KEY in frontend/.env to the same API key
# 5. Build and start all services
docker compose up --build
# 6. Access the application
# Frontend: http://localhost:3000
# API docs: http://localhost:8000/docs
# Health:   http://localhost:8000/health

To stop the services and clean up:

docker compose down      # Stop containers
docker compose down -v   # Stop and remove volumes (clears Redis data)

For hot-reloading and faster iteration during development, run the services locally as described below.
Redis is required for job state tracking. Run it via Docker:
docker run -d --name linguaforge-redis -p 6379:6379 redis:7-alpine

(Alternatively, install it natively via your package manager, e.g., `brew install redis` on macOS.)
cd backend
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate # macOS/Linux
# venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Set up environment
cp .env.example .env
# Edit .env:
# HUGGINGFACE_API_KEY=hf_your_key_here
# API_KEYS=lf_your_generated_key
# JWT_SECRET_KEY=your_jwt_secret (generate: openssl rand -hex 32)
# REDIS_URL=redis://localhost:6379/0
# Generate an API key (optional helper)
python -m app.utils.api_keys
# Start the backend with hot-reload
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

The API will be available at http://localhost:8000. Interactive docs are at http://localhost:8000/docs.
cd frontend
# Install dependencies
npm install
# Set up environment
cp .env.example .env
# Edit .env:
# NEXT_PUBLIC_API_URL=http://localhost:8000/api
# NEXT_PUBLIC_API_KEY=lf_your_api_key (same key as backend API_KEYS)
# Start the dev server
npm run dev

The application will be available at http://localhost:3000.
Backend (`backend/.env`):

| Variable | Required | Default | Description |
|---|---|---|---|
| `HUGGINGFACE_API_KEY` | Yes | — | HuggingFace API token for the translation model |
| `API_KEYS` | Yes | — | Comma-separated API keys for client auth |
| `JWT_SECRET_KEY` | Yes | — | Secret for signing JWT tokens |
| `REDIS_URL` | No | `redis://redis:6379/0` | Redis connection string |
| `CORS_ORIGINS` | No | `http://localhost:3000` | Allowed CORS origins (comma-separated) |
| `FILE_SIZE_LIMIT_MB` | No | `10` | Max upload file size in MB |
| `MAX_CHUNK_SIZE` | No | `500` | Max tokens per translation chunk |
| `MAX_CONCURRENT_REQUESTS` | No | `5` | Max parallel HuggingFace API calls |
| `RATE_LIMIT_PER_MINUTE` | No | `5` | Rate limit per IP per minute |
| `RATE_LIMIT_PER_HOUR` | No | `50` | Rate limit per IP per hour |
| `RATE_LIMIT_PER_DAY` | No | `200` | Rate limit per IP per day |
| `MAX_EXTRACTED_TEXT_MB` | No | `50` | Max extracted text size in MB |
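For reference, here is a minimal sketch of how these variables could be loaded with `pydantic-settings`. Field names mirror the table above; the actual `config.py` may declare them differently, and the rate-limit variables are omitted for brevity:

```python
# Hypothetical settings sketch; names mirror the table above (rate-limit fields omitted).
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    huggingface_api_key: str
    api_keys: str  # comma-separated, e.g. "lf_key1,lf_key2"
    jwt_secret_key: str
    redis_url: str = "redis://redis:6379/0"
    cors_origins: str = "http://localhost:3000"
    file_size_limit_mb: int = 10
    max_chunk_size: int = 500
    max_concurrent_requests: int = 5
    max_extracted_text_mb: int = 50

settings = Settings()  # reads the environment / .env on instantiation
```

Environment variable matching in `pydantic-settings` is case-insensitive by default, so the lowercase fields map onto the uppercase variable names above.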
Frontend (`frontend/.env`):

| Variable | Required | Default | Description |
|---|---|---|---|
| `NEXT_PUBLIC_API_URL` | No | `http://localhost:8000/api` | Backend API base URL |
| `NEXT_PUBLIC_API_KEY` | Yes | — | API key for backend auth |
All endpoints except the health check are prefixed with `/api`. Authentication uses the `X-API-Key` header or a JWT Bearer token.
| Method | Endpoint | Auth | Description |
|---|---|---|---|
| `GET` | `/health` | No | Health check |
| `POST` | `/api/auth/token` | API Key | Exchange API key for JWT token |
| `POST` | `/api/upload` | Yes | Upload PDF/EPUB file (max 10MB) |
| `POST` | `/api/translate/{job_id}` | Yes | Start translation for a job |
| `GET` | `/api/status/{job_id}` | Yes | Get translation progress |
| `GET` | `/api/download/{job_id}` | Yes | Download translated file |
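For a scripted client, here is a minimal Python sketch of the token exchange and status polling using `httpx`. The `access_token` and `job_id` fields match the curl walkthrough below; the `status` field in the status response is an assumption:

```python
# Hypothetical client sketch; the "status" field in the status response is assumed.
import time
import httpx

API = "http://localhost:8000/api"
API_KEY = "lf_your_key_here"

def get_token() -> str:
    # Exchange the API key for a JWT.
    resp = httpx.post(f"{API}/auth/token", headers={"X-API-Key": API_KEY})
    resp.raise_for_status()
    return resp.json()["access_token"]

def wait_for_job(job_id: str, token: str) -> dict:
    # Poll /api/status/{job_id} until the job finishes.
    headers = {"Authorization": f"Bearer {token}"}
    while True:
        state = httpx.get(f"{API}/status/{job_id}", headers=headers).json()
        if state.get("status") in ("completed", "failed"):
            return state
        time.sleep(5)
```

The curl walkthrough below covers the same flow end to end, including upload and download.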
API_KEY="lf_your_key_here"
# 1. Get a JWT token
TOKEN=$(curl -s -X POST http://localhost:8000/api/auth/token \
-H "X-API-Key: $API_KEY" | python3 -c "import sys,json; print(json.load(sys.stdin)['access_token'])")
# 2. Upload a file
JOB_ID=$(curl -s -X POST http://localhost:8000/api/upload \
-H "Authorization: Bearer $TOKEN" \
-F "file=@mybook.pdf" | python3 -c "import sys,json; print(json.load(sys.stdin)['job_id'])")
echo "Job ID: $JOB_ID"
# 3. Start translation (English to French, fluent tone)
curl -X POST "http://localhost:8000/api/translate/$JOB_ID" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"source_lang": "en", "target_lang": "fr", "tone": "fluent"}'
# 4. Poll status until complete
curl -s "http://localhost:8000/api/status/$JOB_ID" \
-H "Authorization: Bearer $TOKEN"
# 5. Download the translated file
curl -O -J "http://localhost:8000/api/download/$JOB_ID" \
-H "Authorization: Bearer $TOKEN"- Upload: File is validated (type, size, MIME check) and stored temporarily. A background job is initialized in Redis.
- Extract: Text is extracted from the PDF (PyMuPDF) or EPUB (ebooklib), preserving chapter structure.
- Chunk: Text is split on paragraph boundaries, keeping each chunk within the configured token limit.
- Translate: Chunks are sent to the HuggingFace Inference API with configurable concurrency and exponential-backoff retries (see the sketch after this list).
- Download: Translated chunks are reassembled in order, served as a plain-text file, and temporary assets are deleted.
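As a rough sketch of the Translate step's concurrency and retry behaviour (this is not the actual `translator.py`: the endpoint, prompt format, and response shape are assumptions, and the real limits come from the environment variables above):

```python
# Rough sketch of bounded concurrency + exponential backoff (not the actual translator.py).
import asyncio
import httpx

HF_URL = "https://api-inference.huggingface.co/models/google/translategemma-4b-it"
HEADERS = {"Authorization": "Bearer hf_your_token"}
semaphore = asyncio.Semaphore(5)  # MAX_CONCURRENT_REQUESTS

async def translate_chunk(client: httpx.AsyncClient, text: str, retries: int = 3) -> str:
    prompt = f"Translate the following text from English to French:\n\n{text}"  # assumed prompt shape
    async with semaphore:
        for attempt in range(retries):
            try:
                resp = await client.post(HF_URL, headers=HEADERS, json={"inputs": prompt}, timeout=60)
                resp.raise_for_status()
                return resp.json()[0]["generated_text"]  # assumed response shape
            except (httpx.HTTPStatusError, httpx.TransportError):
                if attempt == retries - 1:
                    raise
                await asyncio.sleep(2 ** attempt)  # back off 1s, 2s, ... before retrying

async def translate_all(chunks: list[str]) -> list[str]:
    async with httpx.AsyncClient() as client:
        return await asyncio.gather(*(translate_chunk(client, c) for c in chunks))
```

The semaphore caps in-flight requests at the configured concurrency, while failed calls back off exponentially before retrying.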
- Authentication: Dual-layer API key + JWT token-based access control.
- File Validation: Strict MIME type enforcement and file extension checks prevent malicious uploads.
- Size Limits: Configurable maximum upload payload (default 10MB).
- Rate Limiting: Granular per-IP limits on all endpoints via `slowapi` (see the sketch after this list).
- CORS: Restricted strictly to configured origins.
- Security Headers: Hardened HTTP headers applied via FastAPI middleware.
- Non-root Containers: Docker images are configured to run as unprivileged users.
- Auto-cleanup: Temporary files are immediately deleted post-download to prevent storage bloat.
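As referenced in the rate-limiting item above, wiring `slowapi` into FastAPI looks roughly like this (the limit shown matches the documented default; the project's actual `middleware.py` may differ):

```python
# Minimal slowapi sketch; the project's actual middleware.py may differ.
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)  # rate-limit per client IP
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/api/upload")
@limiter.limit("5/minute")  # RATE_LIMIT_PER_MINUTE default
async def upload(request: Request):
    ...
```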
Ideal for single-server setups (e.g., DigitalOcean, Hetzner, AWS EC2).
# Clone and configure
git clone https://github.com/iamEtornam/LinguaForge.git
cd LinguaForge
cp .env.example .env
cp backend/.env.example backend/.env
cp frontend/.env.example frontend/.env
# Fill in all required environment variables
# Build and start in detached mode
docker compose up --build -d
# Verify everything is running
docker compose ps
docker compose logs -f backend

Note: Set up a reverse proxy (like Nginx or Caddy) to expose ports `3000` and `8000` securely via HTTPS.

Recommended specs: 2+ vCPU, 2GB+ RAM. Redis needs ~512MB.
- Set strong, unique values for `API_KEYS` and `JWT_SECRET_KEY`
- Configure `CORS_ORIGINS` to your production domain(s)
- Deploy behind a reverse proxy (Nginx/Caddy) with SSL/TLS
- Restrict firewall rules to expose only ports 80/443
- Enable log rotation for Docker containers
- Set up external monitoring/alerting (pinging `/health`)
Deploy services independently for enhanced scalability.
Backend (Railway / Render / Fly.io)
- Create a web service pointing to the `backend/` directory.
- Build command: `pip install -r requirements.txt`
- Start command: `uvicorn app.main:app --host 0.0.0.0 --port 8000`
- Inject all backend environment variables.
- Provision a Redis add-on and map its URL to `REDIS_URL`.
Frontend (Vercel / Netlify)
- Import the repository and set the root directory to `frontend/`.
- Framework preset: Next.js.
- Inject environment variables:
  - `NEXT_PUBLIC_API_URL=<your-deployed-backend-url>/api`
  - `NEXT_PUBLIC_API_KEY=<your-api-key>`
- Deploy.
Redis

Use a managed provider like Upstash, Redis Cloud, or a platform add-on.
LinguaForge/
├── backend/ # Python FastAPI backend
│ ├── app/
│ │ ├── main.py # App entry point & lifespan
│ │ ├── routes.py # API endpoint handlers
│ │ ├── config.py # Settings (pydantic-settings)
│ │ ├── auth.py # JWT & API key authentication
│ │ ├── middleware.py # Rate limiting & security headers
│ │ ├── translator.py # HuggingFace translation service
│ │ ├── parsers.py # PDF & EPUB text extraction
│ │ ├── chunker.py # Text chunking logic
│ │ ├── redis_client.py # Redis state management
│ │ ├── models.py # Pydantic request/response models
│ │ ├── logger.py # Logging configuration
│ │ └── utils/
│ │ └── api_keys.py # API key generation utility
│ ├── verify_security.py # Security test suite
│ ├── requirements.txt
│ ├── Dockerfile
│ └── .env.example
├── frontend/ # Next.js TypeScript frontend
│ ├── src/
│ │ ├── app/ # Next.js App Router (pages & layout)
│ │ ├── components/ # React components + shadcn/ui
│ │ ├── hooks/ # Custom React hooks
│ │ └── lib/ # API client, auth, constants
│ ├── package.json
│ ├── Dockerfile
│ └── .env.example
├── docker-compose.yml
├── .env.example
├── .gitignore
├── LICENSE
├── CONTRIBUTING.md
├── CODE_OF_CONDUCT.md
└── README.md
Contributions are highly appreciated! Please review our Contributing Guide before submitting a pull request.
This project is licensed under the MIT License.
