purva-kekan/README.md

Purva Prakash Kekan

AI Analyst | Building LLM-Powered Systems | MS Analytics @ Northeastern (3.93 GPA)

Portfolio | LinkedIn | Email


👋 About Me

AI Analyst specializing in LLMs, RAG systems, and production ML infrastructure. Currently building CAVO, an AI-powered travel platform at IpserLab that leverages retrieval-augmented generation and recommendation systems.

I design and deploy end-to-end AI/ML solutions, from ETL pipelines and feature engineering to model deployment with Docker, Kubernetes, and AWS. Previous experience includes data engineering at HDFC Bank, India's largest private sector bank, where I worked with 500K+ financial records, and advanced analytics at Northeastern University (3.93 GPA).

Seeking: AI Engineer | ML Engineer | Data Scientist roles focused on LLMs, MLOps, and production AI systems

📍 Boston, MA | Open to Relocation Across US


🛠️ Technical Stack

AI/ML & LLMs

Python, Scikit-learn, XGBoost, TensorFlow Serving, TorchServe, SARIMA, Recommendation Systems, LLMs, RAG, LangChain, LlamaIndex, LangGraph, Haystack, Hugging Face Transformers, OpenAI API, GPT-4V

Data Engineering & Databases

ETL Pipelines, Data Modeling, dbt, Apache Spark, Ray, Kafka, Celery, Alembic, NGINX, PostgreSQL, Redis, Feast, FAISS, Chroma, SQL, Pandas, NumPy

MLOps & Infrastructure

MLflow, Weights & Biases, LangSmith, DVC, Phoenix (Arize AI), AWS (S3, SageMaker), Databricks, Docker, Kubernetes, Terraform, GitHub Actions CI/CD, Prometheus, Grafana, FastAPI

Business Intelligence & Visualization

Power BI, Tableau, Looker Studio, Streamlit, Excel, DAX, Matplotlib, Seaborn, PyVis, NetworkX, ArcGIS


🚀 Featured Projects

RAG-based semantic mapping app with interactive data visualization for 50+ concepts from James Clear's "Atomic Habits." Built using LangChain, LlamaIndex, and LangGraph for advanced NLP processing, with FAISS and Chroma for vector similarity search. Features interactive PyVis concept network graphs.

Tech Stack: LangChain, LlamaIndex, LangGraph, Hugging Face Transformers, FAISS, Chroma, Streamlit, PyVis, NetworkX
Live Demo

Scalable ML and data pipeline platform for real-time predictions on 200K+ CDC respiratory records. Built production-grade infrastructure with FastAPI for high-performance API endpoints, Redis for caching, and Celery for asynchronous task processing. Deployed on AWS S3 with comprehensive data preprocessing and ML models using Scikit-learn.

Tech Stack: Scikit-learn, Pandas, NumPy, FastAPI, NGINX, Redis, AWS S3, Celery, Alembic

ML forecasting platform analyzing 3M+ Boston 311 service request records, achieving 15% demand reduction post-policy implementation. Built comprehensive geospatial analytics using ArcGIS to map opioid-related incidents. Developed SARIMA time-series forecasting models and interactive Power BI dashboards with advanced DAX measures.

Tech Stack: Python, Pandas, NumPy, Matplotlib, Seaborn, Power BI, SARIMA, DAX, PostgreSQL, ArcGIS


💼 Professional Experience

AI Analyst | IpserLab | May 2025 – Present

Building CAVO, an AI-powered travel-planning platform leveraging LLMs, RAG, and recommendation systems

  • Engineered end-to-end Python and SQL ETL pipelines using LangChain, RAG, and GitHub Actions CI/CD, cutting failed pipeline runs by ~30% and reducing iteration time by ~25%
  • Designed reusable ML feature datasets with Feast and FAISS vector similarity search to support real-time itinerary and travel recommendation experiments
  • Built Python data validation and anomaly detection layer for RAG pipeline with Prometheus monitoring and hybrid search (BM25 + vector similarity), reducing low-quality outputs by ~30%
  • Developed internal Streamlit and Power BI dashboards wired to Weights & Biases, LangSmith, and OpenAI API/GPT-4V to debug conversation flows, cutting model triage time by ~30%
  • Prototyped containerized ML services with Docker and Kubernetes on AWS and Databricks, reducing setup effort for new experiments by 40%

Data Engineer | HDFC Bank Ltd | Aug 2022 – Aug 2023

India's largest private sector bank with $21B+ annual revenue, 173,000+ employees

  • Built SQL-based ETL transformations and data models for credit risk analytics on 500K+ financial records using AWS SageMaker Feature Store, Tecton, and BigQuery, reducing lineage rework by 40%
  • Automated batch and incremental data pipelines with Python and dbt, using DVC and Terraform to enforce data governance, reducing prediction conflicts by 20% in regulatory reporting
  • Increased processing efficiency by 35% with Apache Spark for distributed computing and Ray for parallel processing; designed Kafka streaming pipelines for real-time ingestion, improving operational efficiency by 45%
  • Established production-grade monitoring for ML services using TensorFlow Serving, TorchServe, and Grafana with Phoenix (Arize AI) integration
  • Implemented MLflow-based experiment tracking and model versioning integrated with Haystack NLP pipelines and Weights & Biases

🎓 Education

Master of Science in Analytics | Northeastern University | Sept 2023 – May 2025
GPA: 3.93/4.0 | CPS Scholars and Leader Award

Bachelor of Science in Information Technology | Mumbai University | Sept 2020 – May 2023
GPA: 3.70/4.0


📫 Let's Connect

Open to: AI Engineer | ML Engineer | Data Scientist roles (Full-time)
Location: Boston, MA | Open to Relocation Across US

View Portfolio

Transforming complex data challenges into scalable AI solutions that drive real-world impact

Pinned

  1. healthcare-analytics-geospatial-analysis (Public)

    To understand the rise in opioid use in the city of Boston, this project analyzes a decade of Boston 311 Service Requests data to identify the trends and patterns in the syrin…

  2. atomichabits-nlp-project (Public)

    A fun Streamlit app that digs into James Clear’s Atomic Habits using NLP. Chapter-wise summaries + sentiment scores, top keywords and word clouds across the book, concept network showing connection…

    Python

  3. enterprise-risk-management-reporting (Public)

    This project examines enterprise risk management (ERM) analysis and its importance in any business organization. It consists of report generation for the ERM of the Oracle - Cerner Acquisitio…

  4. respiratory-mortality-system (Public)

    A production-ready machine learning platform for analyzing and predicting respiratory-related mortality trends in the United States. This system processes CDC WONDER database records spanning 1999-…

    Python