AI Analyst | Building LLM-Powered Systems | MS Analytics @ Northeastern (3.93 GPA)
AI Analyst specializing in LLMs, RAG systems, and production ML infrastructure. Currently at IpserLab building CAVO, an AI-powered travel platform leveraging retrieval-augmented generation and recommendation systems.
I design and deploy end-to-end AI/ML solutions, from ETL pipelines and feature engineering to model deployment with Docker, Kubernetes, and AWS. Previous experience includes data engineering at HDFC Bank (India's largest private bank) on 500K+ financial records, and advanced analytics at Northeastern University (3.93 GPA).
Seeking: AI Engineer | ML Engineer | Data Scientist roles focused on LLMs, MLOps, and production AI systems
📍 Boston, MA | Open to Relocation Across US
Python, Scikit-learn, XGBoost, TensorFlow Serving, TorchServe, SARIMA, Recommendation Systems, LLMs, RAG, LangChain, LlamaIndex, LangGraph, Haystack, Hugging Face Transformers, OpenAI API, GPT-4V
ETL Pipelines, Data Modeling, dbt, Apache Spark, Ray, Kafka, Celery, Alembic, NGINX, PostgreSQL, Redis, Feast, FAISS, Chroma, SQL, Pandas, NumPy
MLflow, Weights & Biases, LangSmith, DVC, Phoenix (Arize AI), AWS (S3, SageMaker), Databricks, Docker, Kubernetes, Terraform, GitHub Actions CI/CD, Prometheus, Grafana, FastAPI
Power BI, Tableau, Looker Studio, Streamlit, Excel, DAX, Matplotlib, Seaborn, PyVis, NetworkX, ArcGIS
RAG-based semantic mapping app with interactive data visualization for 50+ concepts from James Clear's "Atomic Habits." Built with LangChain, LlamaIndex, and LangGraph for advanced NLP, with FAISS and Chroma for vector similarity search. Features interactive PyVis concept network graphs.
Tech Stack: LangChain, LlamaIndex, LangGraph, Hugging Face Transformers, FAISS, Chroma, Streamlit, PyVis, NetworkX
Live Demo
Scalable ML and data pipeline platform for real-time predictions on 200K+ CDC respiratory records. Built production-grade infrastructure with FastAPI for high-performance API endpoints, Redis for caching, and Celery for asynchronous task processing. Backed by AWS S3, with comprehensive data preprocessing and ML models built using Scikit-learn.
Tech Stack: Scikit-learn, Pandas, NumPy, FastAPI, NGINX, Redis, AWS S3, Celery, Alembic
ML forecasting platform analyzing 3M+ Boston 311 service request records, achieving 15% demand reduction post-policy implementation. Built comprehensive geospatial analytics using ArcGIS to map opioid-related incidents. Developed SARIMA time-series forecasting models and interactive Power BI dashboards with advanced DAX measures.
Tech Stack: Python, Pandas, NumPy, Matplotlib, Seaborn, Power BI, SARIMA, DAX, PostgreSQL, ArcGIS
Building CAVO, an AI-powered travel-planning platform leveraging LLMs, RAG, and recommendation systems
- Engineered end-to-end Python and SQL ETL pipelines using LangChain, RAG, and GitHub Actions CI/CD, cutting failed pipeline runs by ~30% and reducing iteration time by ~25%
- Designed reusable ML feature datasets with Feast and FAISS vector similarity search to support real-time itinerary and travel recommendation experiments
- Built Python data validation and anomaly detection layer for RAG pipeline with Prometheus monitoring and hybrid search (BM25 + vector similarity), reducing low-quality outputs by ~30%
- Developed internal Streamlit and Power BI dashboards wired to Weights & Biases, LangSmith, and OpenAI API/GPT-4V to debug conversation flows, cutting model triage time by ~30%
- Prototyped containerized ML services with Docker and Kubernetes on AWS and Databricks, reducing setup effort for new experiments by 40%
India's largest private sector bank with $21B+ annual revenue, 173,000+ employees
- Built SQL-based ETL transformations and data models for credit risk analytics on 500K+ financial records using AWS SageMaker Feature Store, Tecton, and BigQuery, reducing lineage rework by 40%
- Automated batch and incremental data pipelines with Python and dbt, using DVC and Terraform to enforce data governance, reducing prediction conflicts in regulatory reporting by 20%
- Increased processing efficiency by 35% with Apache Spark for distributed computing and Ray for parallel processing; designed Kafka streaming pipelines for real-time ingestion, improving operational efficiency by 45%
- Established production-grade monitoring for ML services served with TensorFlow Serving and TorchServe, using Grafana with Phoenix (Arize AI) integration
- Implemented MLflow-based experiment tracking and model versioning integrated with Haystack NLP pipelines and Weights & Biases
Master of Science in Analytics | Northeastern University | Sept 2023 – May 2025
GPA: 3.93/4.0 | CPS Scholars and Leader Award
Bachelor of Science in Information Technology | Mumbai University | Sept 2020 – May 2023
GPA: 3.70/4.0
Open to: AI Engineer | ML Engineer | Data Scientist roles (Full-time)
Location: Boston, MA | Open to Relocation Across US
