# Book-Keeping AI

Production-grade AI bookkeeping system — async demand forecasting, entity extraction, scheduled alerts.
- Overview
- Architecture
- Services
- Quick Start
- API Reference
- Configuration
- Running Tests
- Deployment
- Contributors
## Overview

Book-Keeping AI automates two core operations for small and medium businesses:
- Transaction Entity Extraction — parse natural-language transaction descriptions (e.g., "John Doe bought 2 apples for $5") into structured records (customer, item, quantity, price).
- Demand Forecasting — upload historical sales data and receive an AI-generated 6-month demand forecast with low-stock alerts, delivered asynchronously via a Celery task queue.
Key improvements over the previous iteration:

| Area | Before | After |
|---|---|---|
| Forecast execution | Synchronous — request blocks until SARIMA+LSTM finishes (minutes) | Async Celery task; API returns `202 Accepted` + `task_id` immediately |
| Alerts | All fire at once on demand | Celery Beat scheduler runs daily at 08:00 UTC; per-item email config |
| Global state | `data_file_path` global variable (thread-unsafe) | File-ID system; each upload gets a unique hash-based ID |
| Input validation | `data.get(...)` with no schema | marshmallow schemas on every endpoint, typed error messages |
| Rate limiting | None | flask-limiter backed by Redis |
| Auth | None | `X-API-Key` header middleware (swap for JWT in production) |
| Entity detection | Two separate v1/v2 apps | Single unified service; `backend` param selects LLM / NLP / auto |
| Tests | None | pytest suite covering upload, forecast, alert config, entity extraction |
| Logging | `print` / basic Flask logger | Structured `%(asctime)s [%(levelname)s]` to stdout (JSON-ready) |
| Model selection | Fixed 70/30 SARIMA/LSTM weights | Dynamic weighting via hold-out MAPE — better model gets more influence |
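The dynamic-weighting idea can be sketched as inverse-MAPE weights computed on a hold-out window. This is a minimal illustration of the technique named in the table, not the service's actual code; function and variable names are assumptions.

```python
def mape(actual, predicted):
    """Mean absolute percentage error over a hold-out window.
    Assumes actual values are nonzero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def blend_weights(holdout, sarima_pred, lstm_pred):
    """Inverse-MAPE weighting: the model with the lower hold-out error
    gets proportionally more influence in the final blend."""
    inv = (1.0 / mape(holdout, sarima_pred), 1.0 / mape(holdout, lstm_pred))
    total = sum(inv)
    return inv[0] / total, inv[1] / total  # (w_sarima, w_lstm), sums to 1

# Example: SARIMA is twice as accurate on the hold-out, so it gets ~2/3 weight
w_s, w_l = blend_weights([100, 110], [95, 104.5], [90, 99])
```

The blended forecast would then be `w_s * sarima_forecast + w_l * lstm_forecast`, replacing the old fixed 70/30 split.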
## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                        Client / Frontend                        │
└───────────────────────────┬─────────────────────────────────────┘
                            │ HTTP
                     ┌──────▼──────┐
                     │    Nginx    │  Port 80 / 443
                     │  (reverse   │  TLS termination
                     │   proxy)    │  Rate limiting (L7)
                     └──┬───────┬──┘
                        │       │
            ┌───────────▼─┐   ┌─▼─────────────┐
            │ Forecast API│   │  Entity API   │
            │ Flask:5000  │   │  Flask:5001   │
            └──────┬──────┘   └───────────────┘
                   │ enqueue
            ┌──────▼──────┐
            │    Redis    │  Broker + Result backend
            │    :6379    │  Rate-limit storage
            └──┬───────┬──┘
               │       │
     ┌─────────▼─┐   ┌─▼──────────┐
     │  Celery   │   │  Celery    │
     │  Worker   │   │  Beat      │
     │ (forecast │   │ (scheduler │
     │   jobs)   │   │ 08:00 UTC) │
     └───────────┘   └────────────┘
```
Forecast request flow:

```
Client
  │
  ├─ POST /api/v1/upload  (multipart CSV/XLSX)
  │    └─ returns { file_id }
  │
  ├─ POST /api/v1/forecast  { item_id, file_id, horizon_months }
  │    └─ enqueues Celery task → returns { task_id, poll_url }
  │
  └─ GET /api/v1/tasks/{task_id}  (poll until status == "success")
       └─ returns { future_months[], predicted_demand[], alerts[], plot_url }
```
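The upload → forecast → poll sequence above can be driven from Python like this. A minimal stdlib sketch: the base URL and API key are assumptions for a local Docker Compose deployment, and function names are illustrative.

```python
import json
import time
import urllib.request

BASE = "http://localhost:5000/api/v1"  # assumed local deployment
HEADERS = {"X-API-Key": "changeme", "Content-Type": "application/json"}

def task_url(task_id):
    """Rebuild the poll_url returned by POST /forecast from a task id."""
    return f"{BASE}/tasks/{task_id}"

def start_forecast(item_id, file_id, horizon_months=6):
    """Enqueue a forecast job; returns the 202 body with task_id/poll_url."""
    body = json.dumps({"item_id": item_id, "file_id": file_id,
                       "horizon_months": horizon_months}).encode()
    req = urllib.request.Request(f"{BASE}/forecast", data=body,
                                 headers=HEADERS, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def wait_for_result(task_id, interval=2.0, timeout=300.0):
    """Poll GET /tasks/{task_id} until a terminal status or timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        req = urllib.request.Request(task_url(task_id), headers=HEADERS)
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        if body["status"] in ("success", "failure"):
            return body
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish in {timeout}s")
```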
Daily alert flow:

```
Celery Beat (08:00 UTC daily)
  └─ check_all_alerts task
       ├─ loads alert_configs.json
       ├─ finds most recent uploaded dataset
       ├─ runs predict_demand for each configured item
       └─ for each item where demand > stock + threshold:
            ├─ logs [ALERT] warning
            └─ (stub) sends email via provider integration
```
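The per-item alert decision above can be pictured as a pure check over the forecast output. An illustrative sketch only; the data shapes and the `items_to_alert` name are assumptions, not the actual task code.

```python
def items_to_alert(forecasts, stocks, thresholds):
    """Return (item_id, period) pairs where predicted demand exceeds
    current stock plus the configured low-stock threshold.

    forecasts:  {item_id: [(period, demand), ...]}
    stocks:     {item_id: units on hand}
    thresholds: {item_id: low_stock_threshold}
    """
    hits = []
    for item_id, series in forecasts.items():
        limit = stocks.get(item_id, 0) + thresholds.get(item_id, 0)
        for period, demand in series:
            if demand > limit:
                hits.append((item_id, period))
    return hits

alerts = items_to_alert(
    {"A001": [("2024-08", 120.0), ("2024-09", 180.0)]},
    {"A001": 150},
    {"A001": 20.0},
)
# "2024-09" exceeds 150 + 20; "2024-08" does not
```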
## Services

| Service | Port | Purpose |
|---|---|---|
| `forecast-api` | 5000 | Demand forecasting REST API |
| `entity-api` | 5001 | Entity extraction REST API |
| `forecast-worker` | — | Celery worker (forecast jobs) |
| `forecast-beat` | — | Celery Beat (daily alert scheduler) |
| `redis` | 6379 | Broker, result backend, rate-limit store |
| `nginx` | 80/443 | Reverse proxy, TLS |
## Quick Start

Prerequisites:

- Docker ≥ 24 and Docker Compose ≥ 2.20
- A Groq API key for LLM-based entity extraction

```sh
git clone https://github.com/GDSC-VIT/book-keeping-ai.git
cd book-keeping-ai
cp .env.example .env
# Fill in API_KEY, SECRET_KEY, and GROQ_API_KEY in .env

docker compose up --build -d
docker compose ps

curl http://localhost:5000/health   # {"status":"ok","service":"demand-forecast"}
curl http://localhost:5001/health   # {"status":"ok","service":"entity-detection"}
```

## API Reference

All endpoints require the `X-API-Key` header.
### POST /api/v1/upload

```
Content-Type: multipart/form-data
X-API-Key: <your-key>
file: <CSV or XLSX>
```

Your file must contain these columns:
| Column | Type | Example |
|---|---|---|
| `transaction_date` | ISO date | 2023-06-15 |
| `item_id` | string | A001 |
| `quantity` | number | 42 |
Response `201`:

```json
{ "message": "File uploaded successfully", "file_id": "a3f9c2e10b4d" }
```

### POST /api/v1/forecast

```
Content-Type: application/json
X-API-Key: <your-key>

{
  "item_id": "A001",
  "file_id": "a3f9c2e10b4d",
  "horizon_months": 6
}
```

Response `202`:
```json
{
  "task_id": "d3b07384-d9a1-...",
  "status": "queued",
  "poll_url": "/api/v1/tasks/d3b07384-d9a1-..."
}
```

### GET /api/v1/tasks/{task_id}

```
X-API-Key: <your-key>
```

Response `200` (complete):
```json
{
  "task_id": "d3b07384-d9a1-...",
  "status": "success",
  "result": {
    "item_id": "A001",
    "horizon_months": 6,
    "future_months": ["2024-07", "2024-08", "2024-09", "2024-10", "2024-11", "2024-12"],
    "predicted_demand": [120.5, 133.2, 128.7, 141.0, 156.3, 149.8],
    "alerts": [
      {
        "level": "warning",
        "period": "2024-09",
        "message": "Reorder 28 units of item 'A001' by 2024-09 ..."
      }
    ],
    "plot_url": "/api/v1/plots/A001",
    "generated_at": "2024-06-15T08:00:00Z"
  }
}
```

### POST /api/v1/alerts
```
Content-Type: application/json
X-API-Key: <your-key>

{
  "item_id": "A001",
  "low_stock_threshold": 20.0,
  "notify_email": "ops@yourcompany.com"
}
```

Response `201`:

```json
{ "message": "Alert configuration saved", "item_id": "A001" }
```

Alerts are evaluated daily at 08:00 UTC by the Celery Beat scheduler.
### POST /api/v1/extract

```
Content-Type: application/json
X-API-Key: <your-key>

{
  "text": "John Doe bought 2 apples for $5",
  "backend": "auto"
}
```

`backend` options: `"auto"` (default — LLM, falls back to NLP), `"llm"`, `"nlp"`.
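The `"auto"` policy can be pictured as an LLM-first call with an NLP fallback. An illustrative sketch with placeholder extractor callables, not the service's actual dispatch code.

```python
def extract(text, backend="auto", llm_extract=None, nlp_extract=None):
    """Dispatch to the requested backend; 'auto' tries the LLM first
    and falls back to NLP if the LLM call raises."""
    if backend == "llm":
        return {**llm_extract(text), "_backend_used": "llm"}
    if backend == "nlp":
        return {**nlp_extract(text), "_backend_used": "nlp"}
    try:  # backend == "auto"
        return {**llm_extract(text), "_backend_used": "llm"}
    except Exception:
        return {**nlp_extract(text), "_backend_used": "nlp"}

# Stub extractors standing in for the real LLM/NLP backends:
def fake_llm(text):
    raise RuntimeError("LLM unavailable")

def fake_nlp(text):
    return {"ItemName": "apples"}

result = extract("2 apples for $5", backend="auto",
                 llm_extract=fake_llm, nlp_extract=fake_nlp)
# result["_backend_used"] is "nlp" because the LLM stub failed
```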
Response `200`:

```json
{
  "CustomerName": "John Doe",
  "ItemName": "apples",
  "ItemQuantity": "2",
  "Price": "5",
  "_backend_used": "llm"
}
```

### POST /api/v1/extract/entities

```
Content-Type: application/json
X-API-Key: <your-key>

{ "text": "apples less than 50 rs" }
```

Response `200`:

```json
{ "object": "apples", "action": "less", "range": "50" }
```

Supported formats:
| Input | Response |
|---|---|
| "item less than N" | `{ "action": "less", "range": "N" }` |
| "item more than N" | `{ "action": "more", "range": "N" }` |
| "item more than N less than M" | `{ "action": "range", "min": "N", "max": "M" }` |
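A minimal sketch of how these three phrase forms could be matched with regular expressions. This is an illustration of the format table above; the service's actual NLP backend may work differently.

```python
import re

# Patterns for the three supported phrase forms (most specific first).
RANGE = re.compile(r"^(?P<object>.+?) more than (?P<min>\d+) less than (?P<max>\d+)")
MORE = re.compile(r"^(?P<object>.+?) more than (?P<range>\d+)")
LESS = re.compile(r"^(?P<object>.+?) less than (?P<range>\d+)")

def parse_range_query(text):
    """Parse 'item less/more than N' style queries into a dict."""
    m = RANGE.match(text)
    if m:
        return {"object": m["object"], "action": "range",
                "min": m["min"], "max": m["max"]}
    for action, pat in (("more", MORE), ("less", LESS)):
        m = pat.match(text)
        if m:
            return {"object": m["object"], "action": action, "range": m["range"]}
    return None

print(parse_range_query("apples less than 50 rs"))
# {'object': 'apples', 'action': 'less', 'range': '50'}
```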
## Configuration

All config is via environment variables. Copy `.env.example` to `.env`.

| Variable | Default | Description |
|---|---|---|
| `API_KEY` | `changeme` | Shared API key for all endpoints |
| `SECRET_KEY` | `changeme-secret` | Flask session secret |
| `GROQ_API_KEY` | — | Required for LLM entity extraction |
| `REDIS_URL` | `redis://redis:6379/0` | Redis connection string |
| `UPLOAD_FOLDER` | `uploads` | Path for uploaded datasets |
| `MAX_UPLOAD_MB` | `32` | Maximum upload size in MB |
| `LOG_LEVEL` | `INFO` | `DEBUG` / `INFO` / `WARNING` / `ERROR` |
| `ENV` | `production` | `development` enables hourly beat schedule |
| `ALLOWED_ORIGINS` | `*` | Comma-separated CORS origins |
## Running Tests

```sh
# Install test deps
pip install pytest

# Run the full test suite
pytest tests/ -v

# Run only entity detection tests
pytest tests/test_entity_detection.py -v
```

## Deployment

- Set `API_KEY`, `SECRET_KEY`, and `GROQ_API_KEY` to real secrets (use Docker secrets or a vault)
- Set `ALLOWED_ORIGINS` to your frontend domain
- Add TLS certificates to `nginx/certs/` and configure `nginx/nginx.conf`
- Set `ENV=production` to disable the development hourly beat schedule
- Point `notify_email` alert configs to a real email address and integrate a mail provider (SendGrid, SES, etc.) in `tasks._emit_alert()`
- Use a persistent volume for `redis-data` and `forecast-uploads`
- Consider adding a Postgres database to replace the `alert_configs.json` file for multi-node deployments
```sh
# Run more Celery workers for parallel forecast jobs
docker compose up -d --scale forecast-worker=4
```

## Contributors
Made with ❤️ by Souvik Mahanta
