Purva Kekan purva-kekan

Purva Prakash Kekan

AI Analyst | Building LLM-Powered Systems | MS Analytics @ Northeastern (3.93 GPA)

👋 About Me

AI Analyst specializing in LLMs, RAG systems, and production ML infrastructure. Currently at IpserLab building CAVO, an AI-powered travel platform that uses retrieval-augmented generation and recommendation systems.

I design and deploy end-to-end AI/ML solutions, from ETL pipelines and feature engineering to model deployment with Docker, Kubernetes, and AWS. Previous experience includes data engineering at HDFC Bank, India's largest private sector bank, on datasets of 500K+ financial records, and advanced analytics at Northeastern University (3.93 GPA).

Seeking: AI Engineer | ML Engineer | Data Scientist roles focused on LLMs, MLOps, and production AI systems

📍 Boston, MA | Open to Relocation Across US

🛠️ Technical Stack

AI/ML & LLMs

Python, Scikit-learn, XGBoost, TensorFlow Serving, TorchServe, SARIMA, Recommendation Systems, LLMs, RAG, LangChain, LlamaIndex, LangGraph, Haystack, Hugging Face Transformers, OpenAI API, GPT-4V

Data Engineering & Databases

ETL Pipelines, Data Modeling, dbt, Apache Spark, Ray, Kafka, Celery, Alembic, NGINX, PostgreSQL, Redis, Feast, FAISS, Chroma, SQL, Pandas, NumPy

MLOps & Infrastructure

MLflow, Weights & Biases, LangSmith, DVC, Phoenix (Arize AI), AWS (S3, SageMaker), Databricks, Docker, Kubernetes, Terraform, GitHub Actions, CI/CD, Prometheus, Grafana, FastAPI

Business Intelligence & Visualization

Power BI, Tableau, Looker Studio, Streamlit, Excel, DAX, Matplotlib, Seaborn, PyVis, NetworkX, ArcGIS

🚀 Featured Projects

🤖 Atomic Habits: Semantic Relationship Mapping

RAG-based semantic mapping app with interactive data visualization for 50+ concepts from James Clear's "Atomic Habits." Built using LangChain, LlamaIndex, and LangGraph for advanced NLP processing, with FAISS and Chroma for vector similarity search. Features interactive PyVis concept network graphs.

Tech Stack: LangChain, LlamaIndex, LangGraph, Hugging Face Transformers, FAISS, Chroma, Streamlit, PyVis, NetworkX
Live Demo
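
The concept-network idea can be illustrated with a minimal sketch: embed each concept, then draw an edge between any two concepts whose vectors are similar enough. This is not the project's code. The 3-dimensional embeddings, the `build_concept_edges` helper, and the 0.8 threshold are all assumptions made up for illustration, and plain cosine similarity stands in here for the FAISS/Chroma vector search the project actually uses.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_concept_edges(embeddings, threshold=0.8):
    """Return (concept_a, concept_b, similarity) for pairs at or above threshold.

    `embeddings` maps a concept name to its vector. The threshold is a
    hypothetical tuning knob, not a value taken from the project.
    """
    edges = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(embeddings.items(), 2):
        sim = cosine(vec_a, vec_b)
        if sim >= threshold:
            edges.append((name_a, name_b, round(sim, 3)))
    return edges


# Toy embeddings invented for this example; a real pipeline would get
# these from a sentence-embedding model.
toy_embeddings = {
    "habit stacking": [0.9, 0.1, 0.2],
    "implementation intentions": [0.8, 0.2, 0.3],
    "identity change": [0.1, 0.9, 0.4],
}

edges = build_concept_edges(toy_embeddings)
for a, b, sim in edges:
    print(f"{a} <-> {b}: {sim}")
```

An edge list in this shape could then be handed to a graph library such as NetworkX or PyVis for rendering.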

🫁 Respiratory Mortality Analysis System

Scalable ML and data pipeline platform for real-time predictions on 200K+ CDC respiratory records. Built production-grade infrastructure with FastAPI for API endpoints, Redis for caching, and Celery for asynchronous task processing. Uses AWS S3 for storage, with data preprocessing and ML models built in Scikit-learn.

Tech Stack: Scikit-learn, Pandas, NumPy, FastAPI, NGINX, Redis, AWS S3, Celery, Alembic

🏥 Boston 311 Opioid Crisis: Geospatial Analytics & Forecasting

ML forecasting platform analyzing 3M+ Boston 311 service request records; the analysis showed a 15% reduction in demand after policy implementation. Built geospatial analytics using ArcGIS to map opioid-related incidents. Developed SARIMA time-series forecasting models and interactive Power BI dashboards with advanced DAX measures.

Tech Stack: Python, Pandas, NumPy, Matplotlib, Seaborn, Power BI, SARIMA, DAX, PostgreSQL, ArcGIS

💼 Professional Experience

AI Analyst | IpserLab | May 2025 – Present

Building CAVO, an AI-powered travel-planning platform leveraging LLMs, RAG, and recommendation systems

Engineered end-to-end Python and SQL ETL pipelines using LangChain, RAG, and GitHub Actions CI/CD, cutting failed pipeline runs by ~30% and reducing iteration time by ~25%
Designed reusable ML feature datasets with Feast and FAISS vector similarity search to support real-time itinerary and travel recommendation experiments
Built Python data validation and anomaly detection layer for RAG pipeline with Prometheus monitoring and hybrid search (BM25 + vector similarity), reducing low-quality outputs by ~30%
Developed internal Streamlit and Power BI dashboards wired to Weights & Biases, LangSmith, and OpenAI API/GPT-4V to debug conversation flows, cutting model triage time by ~30%
Prototyped containerized ML services with Docker and Kubernetes on AWS and Databricks, reducing setup effort for new experiments by 40%
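
The hybrid search mentioned above (BM25 + vector similarity) can be sketched roughly as follows. This is an illustrative toy, not the CAVO implementation. How the two scores are combined is not stated here, so the min-max normalization, the weighted sum, the `alpha` parameter, and the sample documents and vectors are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(d) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def min_max(values):
    """Rescale scores to [0, 1] so lexical and semantic scores are comparable."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def hybrid_rank(query_tokens, query_vec, docs, doc_vecs, alpha=0.5):
    """Rank document indices by alpha * BM25 + (1 - alpha) * cosine similarity."""
    lexical = min_max(bm25_scores(query_tokens, docs))
    semantic = min_max([cosine(query_vec, v) for v in doc_vecs])
    combined = [alpha * l + (1 - alpha) * s for l, s in zip(lexical, semantic)]
    return sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)


# Toy corpus and 2-d vectors invented for this example.
docs = [
    ["budget", "hotel", "boston"],
    ["beach", "resort", "miami"],
    ["cheap", "hostel", "boston"],
]
doc_vecs = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.3]]

ranking = hybrid_rank(["budget", "boston"], [1.0, 0.0], docs, doc_vecs)
print(ranking)
```

Blending the two signals lets exact keyword matches and semantically close passages both surface; `alpha` shifts the balance toward one or the other.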

Data Engineer | HDFC Bank Ltd | Aug 2022 – Aug 2023

India's largest private sector bank with $21B+ annual revenue, 173,000+ employees

Built SQL-based ETL transformations and data models for credit risk analytics on 500K+ financial records using AWS SageMaker Feature Store, Tecton, and BigQuery, reducing lineage rework by 40%
Automated batch and incremental data pipelines with Python and dbt, using DVC and Terraform to enforce data governance, reducing prediction conflicts by 20% in regulatory reporting
Increased processing efficiency by 35% with Apache Spark for distributed computing and Ray for parallel processing; designed Kafka streaming pipelines for real-time ingestion, improving operational efficiency by 45%
Established production-grade monitoring for ML services using TensorFlow Serving, TorchServe, and Grafana with Phoenix (Arize AI) integration
Implemented MLflow-based experiment tracking and model versioning integrated with Haystack NLP pipelines and Weights & Biases

🎓 Education

Master of Science in Analytics | Northeastern University | Sept 2023 – May 2025
GPA: 3.93/4.0 | CPS Scholars and Leader Award

Bachelor of Science in Information Technology | Mumbai University | Sept 2020 – May 2023
GPA: 3.70/4.0

📫 Let's Connect

Open to: AI Engineer | ML Engineer | Data Scientist roles (Full-time)
Location: Boston, MA | Open to Relocation Across US

Transforming complex data challenges into scalable AI solutions that drive real-world impact
