Search, organize, and manage your local files through WhatsApp.
Pinpoint indexes your documents, PDFs, spreadsheets, images, and media into a local SQLite database, then lets you search and work with them through natural language — either via WhatsApp or direct API calls.
I kept running into the same problems:
- Traveling and need my passport scan? It's somewhere on my laptop. I shouldn't have to dig through folders from my phone. I want to ask "find my passport" on WhatsApp and get it instantly.
- Important receipts and documents get buried in chats. Someone sends a file on WhatsApp, and a week later it's gone in the scroll. I wanted to forward it once, say "save this to Tax 2025," and know it would stay organized and searchable.
- Small business work often starts in WhatsApp. Orders, invoices, screenshots, spreadsheet photos. I wanted to turn those into something usable without manually opening every file and retyping data.
- Photos are messy and time-consuming. Hundreds of wedding, trip, or family photos need sorting, grouping, and sometimes face recognition. I wanted that to be fast instead of a weekend chore.
- I forget useful details. Phone numbers, due dates, client info, things buried in files or chat history. I wanted a system that could remember them and bring them back later.
If your files are scattered across your computer and you live on WhatsApp, Pinpoint puts them together.
- "Find the Sharma invoice from last month" — searches across all your indexed documents instantly
- "What's in that Excel in Downloads?" — reads and analyzes spreadsheets, CSVs, PDFs on the fly
- "Find everyone over 30 in the contacts spreadsheet" — search, filter, group, sort within Excel/CSV files
- "Create an Excel with these expense totals" — generates spreadsheets, text files, charts from conversation
- Send a purchase order image, ask "turn this into Excel" — extracts tables from images/PDFs into spreadsheets
- "Merge these 3 PDFs into one" — merge, split PDFs, convert images to PDF and back
- "Move all receipts to the Tax folder" — batch file operations through conversation
- "Group my wedding photos by category" — AI classifies photos into groups and sorts them into folders
- "Cull my camera roll — keep the best 80%" — AI scores every photo for quality and separates rejects
- "Who is this person?" — remembers faces, recognizes them across photos later
- "OCR this scanned document" — extracts text from images and scanned PDFs
- "Send me that PDF" — sends files directly to your WhatsApp
- Send a photo/file to the bot — saves it to your PC, renames it, organizes into folders you choose. Important WhatsApp images and documents never get lost
- "Create a folder called Tax 2025 and move all receipts there" — creates folders and organizes files through conversation
- "Find Sharma's phone number from that Excel" — searches inside spreadsheets with smart phone/ID normalization (finds "920-889-6630" when you type "9208896630")
- "Where's that chart I made yesterday?" — searches files Pinpoint itself created in past conversations
- "Send an email to john@company.com with the Q1 report" — Gmail send, Calendar create, Drive upload (requires gws CLI + Google auth)
- "Remind me to call the dentist at 3pm" — persistent reminders that survive restarts and reconnects
- "Remember that my car insurance expires in March" — persistent memory across conversations
- "Watch my Documents folder" — auto-indexes new files as they appear
- Voice messages — transcribes and responds to audio messages
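The phone/ID normalization mentioned above can be illustrated with a minimal sketch. It assumes that normalization means stripping everything except digits before comparing; the helper names `normalize_digits` and `find_rows` are hypothetical and not taken from Pinpoint's code.

```python
import re


def normalize_digits(value: str) -> str:
    """Keep only the digits so differently formatted numbers compare equal."""
    return re.sub(r"\D", "", value)


def find_rows(rows, query):
    """Return rows where any cell contains the query's digits after normalization."""
    needle = normalize_digits(query)
    if not needle:
        return []
    return [
        row
        for row in rows
        if any(needle in normalize_digits(str(cell)) for cell in row)
    ]


# Illustrative rows standing in for a contacts spreadsheet.
rows = [
    ("Sharma", "920-889-6630"),
    ("Patel", "(415) 555-0100"),
]
matches = find_rows(rows, "9208896630")
```

Here `matches` contains only the Sharma row, because both the stored value and the query reduce to the same digit string.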
Your Files ──> Indexer ──> SQLite/FTS5 ──> Search API ──> WhatsApp Bot
(local) (extract) (local DB) (FastAPI) (Gemini AI)
Indexing pipeline
- Text extraction: PDF, DOCX, PPTX, XLSX, CSV, EPUB, images (OCR), plain text
- Chunking: Chonkie RecursiveChunker (2500-char chunks for section-level search)
- Fact extraction: Gemini extracts key facts (names, dates, amounts) at index time, stored separately
- Embeddings: Gemini Embedding 2 (768-dim) for chunk-level semantic search
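To make the chunking step concrete, here is a minimal plain-Python sketch of recursive chunking with a 2500-character limit. It does not use Chonkie and is not Pinpoint's implementation; the function name `recursive_chunks` and the separator order are assumptions chosen for illustration.

```python
def recursive_chunks(text, max_chars=2500, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most max_chars, preferring coarse separators."""
    if len(text) <= max_chars:
        return [text] if text else []
    for sep in separators:
        if sep not in text:
            continue
        chunks, current = [], ""
        for part in text.split(sep):
            candidate = part if not current else current + sep + part
            if len(candidate) <= max_chars:
                current = candidate  # keep packing pieces into the current chunk
                continue
            if current:
                chunks.append(current)
            if len(part) <= max_chars:
                current = part
            else:
                # A single piece is still too large: retry with finer separators.
                chunks.extend(recursive_chunks(part, max_chars, separators))
                current = ""
        if current:
            chunks.append(current)
        return chunks
    # No separator present at all: fall back to a hard cut.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


# Five paragraphs of about 1000 characters each pack two per chunk.
sample = "\n\n".join("para %d " % i + "x" * 1000 for i in range(5))
chunks = recursive_chunks(sample)
```

Splitting on paragraph breaks first keeps related sentences together, which is the usual motivation for section-level chunks.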
Document search pipeline
- FTS5 full-text search with BM25 scoring (porter stemming, unicode61 tokenizer)
- Three-tier lexical fallback: strict → relaxed (synonym-aware) → broad (OR)
- Metadata-aware ranking: boosts matches in filename, title, path, and identifier-like terms
- Coverage scoring: penalizes results that match only one of several query concepts
- Ambiguity detection: clustered near-tie results trigger clarification instead of a wrong guess
- Strong signal shortcut: skips expensive stages when the top result is clearly dominant
- Search transparency: each result explains why it matched (title, path, chunk, identifier)
- Progressive disclosure: document overview before full text (saves tokens)
- Feedback loop: logs which searches helped or escalated, for future ranking improvements
- Available but not default: Gemini query expansion, embedding cosine similarity, LLM reranker
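The lexical part of this pipeline can be sketched with Python's built-in `sqlite3` module, assuming an SQLite build that includes FTS5. The table name `docs`, the sample rows, and the `search` helper are hypothetical, and the sketch covers only a strict (AND) tier and a broad (OR) tier; the synonym-aware relaxed tier and the ranking boosts are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Porter stemming layered on the unicode61 tokenizer, as named in the list above.
conn.execute(
    "CREATE VIRTUAL TABLE docs USING fts5(path, body, tokenize='porter unicode61')"
)
conn.executemany(
    "INSERT INTO docs (path, body) VALUES (?, ?)",
    [
        ("invoices/sharma_march.pdf", "Invoice 4821 for Sharma Traders, amount due 5200"),
        ("tax/receipt_fuel.pdf", "Fuel receipt, paid in cash"),
        ("notes/meeting.txt", "Meeting notes about invoices and pending payments"),
    ],
)


def quote(term):
    """Quote a term so FTS5 treats it as a literal string, not query syntax."""
    return '"' + term.replace('"', '""') + '"'


def search(conn, query, limit=5):
    """Try a strict AND query first, then fall back to a broad OR query."""
    terms = [quote(t) for t in query.split()]
    if not terms:
        return "none", []
    tiers = [("strict", " AND ".join(terms)), ("broad", " OR ".join(terms))]
    for tier, match_expr in tiers:
        rows = conn.execute(
            "SELECT path, bm25(docs) AS score FROM docs "
            "WHERE docs MATCH ? ORDER BY score LIMIT ?",
            (match_expr, limit),
        ).fetchall()
        if rows:
            return tier, rows
    return "none", []


strict_tier, strict_rows = search(conn, "sharma invoice")
broad_tier, broad_rows = search(conn, "sharma passport")
```

FTS5's `bm25()` returns smaller values for better matches, so sorting in ascending order puts the best result first.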
Visual search
- Image search: Gemini Embedding 2 text-to-image similarity across photo folders
- Video search: embed sampled frames, find matching moments by description
- Photo grouping: embed images + category names, classify by cosine similarity (Gemini Embedding 2)
- Photo culling: Gemini Flash vision scores each photo for technical + aesthetic quality
- Face recognition: InsightFace detection with persistent face memory (known_faces table)
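Photo grouping by cosine similarity, as described in the list above, can be sketched as follows. The three-dimensional vectors are toy stand-ins for real embeddings (which the list above gives as 768-dimensional), and the `cosine` and `classify` helpers are hypothetical names, not Pinpoint's API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(image_vecs, category_vecs):
    """Assign each image to the category whose embedding is most similar."""
    groups = {}
    for name, vec in image_vecs.items():
        best = max(category_vecs, key=lambda c: cosine(vec, category_vecs[c]))
        groups.setdefault(best, []).append(name)
    return groups


# Toy embeddings: in practice both sides would come from the same embedding model.
category_vecs = {"ceremony": [1.0, 0.0, 0.0], "food": [0.0, 1.0, 0.0]}
image_vecs = {"IMG_001.jpg": [0.9, 0.1, 0.0], "IMG_002.jpg": [0.2, 0.8, 0.1]}
groups = classify(image_vecs, category_vecs)
```

Because category names and images share one embedding space, classification needs no per-photo vision call, only a similarity comparison.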
Bot intelligence
- 82 Gemini tool declarations, intent-grouped per message (subset loaded based on detected intent)
- 23 skill files loaded by detected intent, not all at once
- Action ledger: records what the bot actually did versus what it claimed, so hallucinated actions can be caught
- Cost circuit breaker: $0.10 per-message budget, hard stop
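A per-message cost circuit breaker could look roughly like the sketch below. The class and exception names are hypothetical, and the per-call costs are made-up figures; only the $0.10 budget comes from the list above.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a charge would push a message past its budget."""


class CostBreaker:
    def __init__(self, budget_usd=0.10):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd):
        """Record a cost, or stop hard if it would exceed the budget."""
        if self.spent_usd + cost_usd > self.budget_usd:
            raise BudgetExceeded(
                f"would spend {self.spent_usd + cost_usd:.2f} of {self.budget_usd:.2f} USD"
            )
        self.spent_usd += cost_usd


# One breaker per incoming message; each model call is charged before it runs.
breaker = CostBreaker()
breaker.charge(0.04)
breaker.charge(0.04)
try:
    breaker.charge(0.04)  # a third call would exceed the budget
    tripped = False
except BudgetExceeded:
    tripped = True
```

Checking before spending, rather than after, is what makes the limit a hard stop.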
Core indexing and search run locally on your machine. Optional features (WhatsApp bot, Gemini AI, Google Workspace, web search) send data to external services when used.
Important: Search only finds files that have been indexed. Files get indexed when you:
- Explicitly index a file or folder (/index-file, /index)
- Watch a folder: new files are picked up every 60 minutes
- Read or analyze a file: it is auto-indexed in the background for future searches
Pinpoint does not scan your entire computer automatically. You control what gets indexed.
Every interaction builds a local cache that makes future operations faster and cheaper:
- Documents — text, chunks, and embeddings stored after first index. Re-search is instant, no re-extraction.
- Images — embeddings cached after first search or group. Next time you search or group the same folder, cached images are free.
- Videos — frame embeddings stored per video. Searching the same video again costs nothing.
- Photo scores — culling scores cached by file mtime. Re-running cull on the same folder skips already-scored photos.
- Photo classifications — grouping results cached. Re-grouping reuses existing classifications.
- Faces — detected face data cached per image. Recognition on already-scanned photos is instant.
- Facts — extracted key facts stored per document. Fact search never re-extracts.
- Search queries — query expansion and reranking results cached. Repeated searches are free.
If you cancel a long job halfway (like embedding 1000 photos), the work already done is saved. Next run picks up where it left off.
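Caching by file mtime and resuming after a cancelled job can be sketched like this. The `photo_scores` table, the `cached_score` helper, and the fake scoring function are assumptions for illustration; Pinpoint's actual schema is not shown in this README.

```python
import os
import sqlite3
import tempfile


def cached_score(conn, path, score_fn):
    """Return a cached score unless the file changed since it was scored."""
    mtime = os.path.getmtime(path)
    row = conn.execute(
        "SELECT score FROM photo_scores WHERE path = ? AND mtime = ?",
        (path, mtime),
    ).fetchone()
    if row:
        return row[0]  # cache hit: no model call needed
    score = score_fn(path)
    conn.execute(
        "INSERT OR REPLACE INTO photo_scores (path, mtime, score) VALUES (?, ?, ?)",
        (path, mtime, score),
    )
    conn.commit()  # commit per file so a cancelled job keeps finished work
    return score


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photo_scores (path TEXT PRIMARY KEY, mtime REAL, score REAL)")

calls = []


def fake_score(path):
    """Stand-in for an expensive vision call; records each invocation."""
    calls.append(path)
    return 0.75


with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as f:
    f.write(b"not a real photo")
    photo = f.name

first = cached_score(conn, photo, fake_score)
second = cached_score(conn, photo, fake_score)  # served from the cache
os.remove(photo)
```

Keying on path plus mtime means an edited photo is rescored automatically, while untouched files are skipped.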
Pinpoint has a 4-layer memory system that learns from everyday use:
Conversation memory — Keeps the last 50 messages per session. In the bot flow, long conversations are compacted instead of simply truncated, so important outcomes can survive even when older turns are compressed. Idle chats reset after 60 minutes.
Persistent personal memory — "Remember that my passport number is X12345." Stored permanently in SQLite, searchable with FTS5, and survives restarts. When you save a new fact, Gemini can decide to add it, update an existing memory, merge complementary details, ignore duplicates, or supersede a contradiction with an audit trail. You can also forget by description — "forget my old address" — without needing an internal ID.
Document fact extraction — When a file is indexed, Gemini extracts key facts such as names, dates, amounts, and topics, then stores them separately from the raw document text. You can search facts directly without reopening the full file.
Face memory — "Remember this is John." Saves face embeddings persistently so future face detection runs can recognize the same person across photos.
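The conversation-memory limits (last 50 messages, reset after 60 idle minutes) can be sketched with a bounded deque. This simplified version drops the oldest messages rather than compacting them, so it does not reproduce the compaction behavior described above; the `Session` class is a hypothetical name.

```python
import time
from collections import deque


class Session:
    def __init__(self, max_messages=50, idle_seconds=60 * 60):
        self.messages = deque(maxlen=max_messages)  # oldest entries fall off
        self.idle_seconds = idle_seconds
        self.last_seen = None

    def add(self, role, text, now=None):
        """Append a message, clearing history first if the chat went idle."""
        now = time.time() if now is None else now
        if self.last_seen is not None and now - self.last_seen > self.idle_seconds:
            self.messages.clear()
        self.messages.append((role, text))
        self.last_seen = now


session = Session()
for i in range(60):
    session.add("user", str(i), now=i)  # timestamps one second apart
kept = len(session.messages)
oldest = session.messages[0]

session.add("user", "hello again", now=59 + 3601)  # more than an hour later
after_idle = len(session.messages)
```

Passing `now` explicitly keeps the example deterministic; in normal use the wall clock is read instead.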
pip install pinpoint-search
npm install -g pinpoint-bot
pinpoint setup
pinpoint start

That's it. pinpoint setup asks for your Gemini API key and writes config to ~/.pinpoint/.env. pinpoint start launches the API and WhatsApp bot; scan the QR code to pair.
Requirements: Python 3.11+, Node.js 20+
pinpoint doctor

pip install pinpoint-search
pinpoint setup
pinpoint api

The API runs at http://localhost:5123. Interactive docs are at http://localhost:5123/docs.
git clone https://github.com/vijishmadhavan/pinpoint
cd pinpoint
conda env create -f environment.yml
conda activate pinpoint
cd bot && npm install && cd ..
./start.sh

pip install pinpoint-search[ocr] # Tesseract OCR (Gemini handles OCR without it)
pip install pinpoint-search[faces] # Face recognition (CPU)
pip install pinpoint-search[faces-gpu] # Face recognition (GPU)
pip install pinpoint-search[all] # Everything

| Ask this | Pinpoint does this |
|---|---|
| "Find invoice 4821" | Searches indexed documents by content and filename |
| "Read the quarterly report PDF" | Extracts and returns the text |
| "Search for Sharma across all files" | Full-text search with ranked results |
| "Analyze the sales spreadsheet" | Loads Excel/CSV into pandas, runs queries |
| "Filter rows where amount > 5000" | Search, filter, group, sort within spreadsheets |
| "Create an Excel summary of Q1 expenses" | Generates new spreadsheets from conversation |
| (send purchase order image) "make this an Excel" | Extracts tables from images/PDFs into spreadsheets |
| "Merge invoice_1.pdf and invoice_2.pdf" | Merge, split PDFs |
| "Convert these images to a single PDF" | Images to PDF, PDF to images |
| "Make a bar chart of sales by month" | Generates charts from data |
| "Move old files to archive" | Batch file operations |
| "Watch my Downloads folder" | Auto-indexes new files every 60 minutes |
| "Find photos of the beach" | Visual image search across your photos |
| "OCR this scanned receipt" | Extracts text from images/scanned PDFs |
| "Group wedding photos by category" | AI classifies and sorts photos into folders |
| "Cull my camera roll, keep best 80%" | AI scores photos, separates rejects into a folder |
| "Who is this person?" | Face detection + recognition across photos |
| "Send me that report" | Sends the file to your WhatsApp chat |
| (send a photo/file to bot) | Saves to PC, renames, puts in your chosen folder |
| "Make a folder called Invoices 2025" | Creates folders on your computer |
| "Find Sharma's number from contacts.xlsx" | Searches inside spreadsheets, normalizes phone/ID formats |
| "Where's that chart I made yesterday?" | Searches files Pinpoint created in past conversations |
| "Email john@company.com the Q1 report" | Gmail send with attachment (needs gws CLI setup) |
| "Remind me at 5pm to call bank" | Persistent reminders — survive restarts |
| "Remember my passport number is X" | Stores in persistent memory |
| (voice message) | Transcribes audio and responds |
Stable core — works without any API keys:
- Document search (FTS5)
- File read/list/move/rename/delete
- Auto-indexing on file access
- Watch folders
- Background job tracking
- Data analysis (Excel, CSV)
Optional — needs Gemini API key or extra setup:
- WhatsApp bot (needs WhatsApp pairing + either Gemini or Ollama)
- OCR, captioning, fact extraction (Gemini-powered)
- Photo culling/scoring (Gemini Flash — vision judges quality, needs to "see" each photo)
- Photo grouping by category (Gemini Embedding 2 — cheap, classifies by similarity not vision)
- Visual image/video search (Gemini Embedding 2 — text-to-image similarity)
- Face recognition (needs insightface — GPU optional, CPU works)
- Google Workspace: Gmail, Calendar, Drive (needs gws CLI: npm install -g @googleworkspace/cli && gws auth login)
- Web search (needs Jina or LangSearch API key)
Note: Gemini Embedding 2 is used for image/video/photo features. Document text search uses FTS5 by default — embedding-based document search exists but is not the default path.
pinpoint/
run_api.py # Backend entrypoint (port 5123)
api/ # FastAPI routers
core.py # health, indexing
search.py # document search, facts, web read
files.py # file ops, watch folders, background jobs
data.py # Excel/CSV analysis
media.py # image/video/audio search, OCR
photos.py # photo scoring, culling, grouping
faces.py # face detection and recognition
transform.py # file/image/PDF transforms
memory.py # conversation memory
google.py # Google Workspace integration
search_pipeline.py # Search: FTS5, ranking, ambiguity detection
indexing_service.py # Shared index/chunk/embed pipeline
job_service.py # Persistent background job lifecycle
database.py # SQLite schema and helpers
extractors.py # Text extraction (PDF, Office, images, OCR)
bot/
index.js # WhatsApp bot entrypoint
src/tools.js # Gemini tool declarations
src/llm.js # LLM loop (Gemini / Ollama)
src/skills.js # Skill system for tool routing
For deeper details: docs/architecture.md
The main product surface is intentionally small:
- api/: FastAPI routers and API-facing behavior
- pinpoint/: Python package and CLI entry points
- bot/: WhatsApp bot package
- skills/: skill markdowns shipped with the product
- benchmarks/: search evaluation datasets, reports, and benchmark scripts
- tests/: regression and packaging coverage
- docs/: product docs, troubleshooting, and release notes
Internal planning notes and downloaded comparison repos are kept out of the GitHub-facing product surface.
Pinpoint stores shared CLI/bot config in ~/.pinpoint/.env. pinpoint setup writes that file for you.
Key variables:
| Variable | Required? | What it does |
|---|---|---|
| GEMINI_API_KEY | For AI features | Enables bot, OCR, media, photo workflows |
| API_SECRET | No | If set, all API requests need X-API-Secret header |
| GEMINI_MODEL | No | Defaults to gemini-3.1-flash-lite-preview |
| OLLAMA_MODEL | No | Use local Ollama instead of Gemini for bot |
| JINA_API_KEY | No | Enables web search via Jina |
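
As a rough illustration of how a client might use these settings, the sketch below parses `.env`-style text and attaches the X-API-Secret header to a request. It assumes the file uses plain `KEY=value` lines; the `parse_env` and `build_request` helpers and the example values are hypothetical. The request is only constructed here, not sent.

```python
from urllib.request import Request


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def build_request(url, env):
    """Attach X-API-Secret only when API_SECRET is configured."""
    headers = {}
    if env.get("API_SECRET"):
        headers["X-API-Secret"] = env["API_SECRET"]
    return Request(url, headers=headers)


# Example values only; a real setup would read ~/.pinpoint/.env instead.
env = parse_env("# written by setup\nGEMINI_API_KEY=example-key\nAPI_SECRET=example-secret\n")
req = build_request("http://localhost:5123/docs", env)
```

Passing `req` to `urllib.request.urlopen` would send it to a running API.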
conda run -n pinpoint python -m pytest tests/ -q

The suite covers search, indexing, file operations, jobs, packaging, security, and API contracts.
Useful commands:
pinpoint setup
pinpoint doctor
pinpoint api
pinpoint start
pinpoint search "invoice 4821"
pinpoint index /path/to/folder
pinpoint status
pinpoint logs

Pinpoint includes offline search evaluation and load testing:
# Search quality
python evaluate_search.py --dataset benchmarks/search_relevance_v4_mixed.json --corpus benchmarks/corpus_v4_mixed
# Concurrent load
python load_test_search.py --corpus benchmarks/corpus_v4_mixed --rounds 10 --concurrency 8

Current results on the mixed-domain offline benchmark: 94.4% success@1, perfect recall, ~4.3ms average query latency. In a concurrent load test (concurrency=2, rounds=1): ~216 QPS, ~9.2ms average wall latency.
See benchmarks/README.md for details.
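
For readers unfamiliar with the metrics, here is a minimal sketch of how success@1 and recall are commonly computed. The definitions and the toy data are assumptions for illustration; the benchmark scripts may define or aggregate these differently.

```python
def success_at_1(results, relevant):
    """Fraction of queries whose top-ranked result is a relevant document."""
    hits = sum(
        1 for query, ranked in results.items()
        if ranked and ranked[0] in relevant[query]
    )
    return hits / len(results)


def recall(results, relevant):
    """Fraction of all relevant documents that appear anywhere in the results."""
    found = sum(len(set(ranked) & relevant[query]) for query, ranked in results.items())
    total = sum(len(relevant[query]) for query in results)
    return found / total


# Toy data: ranked results per query and the documents judged relevant.
results = {
    "invoice 4821": ["a.pdf", "b.pdf"],
    "passport scan": ["c.jpg", "d.jpg"],
}
relevant = {
    "invoice 4821": {"a.pdf"},
    "passport scan": {"d.jpg"},
}
s1 = success_at_1(results, relevant)
r = recall(results, relevant)
```

In this toy case recall is perfect while success@1 is only 0.5, which shows why the two numbers are reported separately.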
- Architecture — how the system is built
- Benchmark Summary — what Pinpoint has measured so far
- Stability Policy — what Pinpoint treats as stable, optional, or experimental
- Troubleshooting — common issues and fixes
- Release Checklist — what to check before pushing
- Contributing — how to contribute
Pinpoint is a work in progress. I built it for my own daily use and it works well for my workflow, but it is not yet polished for general audiences. Expect rough edges, occasional slow responses on large files, and setup that still requires some technical comfort. Issues, feedback, and contributions are welcome.
Pinpoint learned heavily from these open-source projects:
- OpenClaw — WhatsApp bot patterns, tool calling, skill system
- QMD — search pipeline, RRF fusion, BM25 scoring
- Mem0 — memory dedup, conflict detection, LLM-powered merge
- Supermemory — memory priority, fact extraction patterns
- Khoj — RAG patterns, search architecture
- Gemini CLI — tool calling efficiency, loop detection, compaction
- Chonkie — document chunking
- Claude Code — context management, compaction prompts
