pronzzz/movie-recommendation-system

🎬 Movie Recommendation System

Python 3.9+ | License: MIT | Tests | Code Style: Black

A hybrid movie recommendation engine combining Collaborative Filtering and Content-Based Filtering to deliver personalized, diverse, and explainable recommendations. Built to address real-world challenges including cold-start, popularity bias, and lack of transparency.


✨ Features

| Feature | Description |
|---|---|
| Hybrid Architecture | Combines SVD-based Collaborative Filtering + TF-IDF Content-Based Filtering |
| Cold-Start Handling | Adaptive blending shifts weight to content features for new users |
| Diversity Promotion | MMR-based re-ranking reduces popularity bias and promotes long-tail items |
| Explainability | Human-readable explanations for every recommendation |
| Comprehensive Metrics | RMSE, Precision@K, NDCG, Coverage, Diversity, Novelty |
| Multiple Datasets | Supports MovieLens 100K, 1M, 10M, and 20M variants |

Challenges Addressed

This system tackles key challenges in modern recommender systems:

  • Cold-Start Problem: New users/items lack interaction history → Content-based fallback + preference elicitation
  • Data Sparsity: Most user-item ratings are missing → Matrix factorization handles sparse data efficiently
  • Popularity Bias: Popular items dominate → MMR re-ranking + long-tail promotion
  • Lack of Diversity: Recommendations too similar → Intra-list diversity optimization
  • Explainability: Black-box models lack trust → Genre and similarity-based explanations

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    Hybrid Recommender System                     β”‚
β”‚                                                                  β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”              β”‚
β”‚  β”‚  Collaborative   β”‚         β”‚  Content-Based   β”‚              β”‚
β”‚  β”‚    Filtering     β”‚         β”‚    Filtering     β”‚              β”‚
β”‚  β”‚                  β”‚         β”‚                  β”‚              β”‚
β”‚  β”‚  SVD Matrix      β”‚         β”‚  TF-IDF Genres   β”‚              β”‚
β”‚  β”‚  Factorization   β”‚         β”‚  + User Profiles β”‚              β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜              β”‚
β”‚           β”‚                            β”‚                         β”‚
β”‚           β”‚     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”       β”‚                         β”‚
β”‚           └────►│   Adaptive   β”‚β—„β”€β”€β”€β”€β”€β”€β”˜                         β”‚
β”‚                 β”‚   Blending   β”‚                                 β”‚
β”‚                 β”‚  (Ξ± weight)  β”‚                                 β”‚
β”‚                 β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜                                 β”‚
β”‚                        β”‚                                         β”‚
β”‚                 β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”                                 β”‚
β”‚                 β”‚   Diversity  β”‚                                 β”‚
β”‚                 β”‚   Reranker   β”‚                                 β”‚
β”‚                 β”‚    (MMR)     β”‚                                 β”‚
β”‚                 β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜                                 β”‚
β”‚                        β”‚                                         β”‚
β”‚                 β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”                                 β”‚
β”‚                 β”‚  Explanation β”‚                                 β”‚
β”‚                 β”‚  Generator   β”‚                                 β”‚
β”‚                 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Component Details

| Component | Technology | Purpose |
|---|---|---|
| Collaborative Filter | SVD (scikit-surprise) | Learn latent user/item factors from ratings |
| Content-Based Filter | TF-IDF (scikit-learn) | Build item profiles from genres, user profiles from history |
| Hybrid Combiner | Weighted average | Blend CF and CBF scores with adaptive α |
| Diversity Reranker | MMR algorithm | Balance relevance vs. diversity in final list |
| Explanation Generator | Rule-based NLG | Generate human-readable recommendation reasons |
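
To make the content-based row concrete, here is a minimal pure-Python sketch of genre TF-IDF item profiles and a rating-weighted user profile. It is an illustration under stated assumptions, not the repository's code: the names `tfidf_profiles`, `user_profile`, and `cosine` and the toy catalog are hypothetical, and the smoothed IDF formula is chosen to mirror scikit-learn's default rather than taken from this project.

```python
import math
from collections import Counter

def tfidf_profiles(item_genres):
    """Map item_id -> L2-normalized TF-IDF vector (dict: genre -> weight).

    Assumption: smoothed IDF, ln((1 + n) / (1 + df)) + 1.
    """
    n_items = len(item_genres)
    df = Counter(g for genres in item_genres.values() for g in set(genres))
    profiles = {}
    for item_id, genres in item_genres.items():
        tf = Counter(genres)
        vec = {g: tf[g] * (math.log((1 + n_items) / (1 + df[g])) + 1) for g in tf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        profiles[item_id] = {g: w / norm for g, w in vec.items()}
    return profiles

def user_profile(rated, profiles):
    """Rating-weighted sum of the profiles of the items a user has rated."""
    profile = Counter()
    for item_id, rating in rated.items():
        for genre, weight in profiles[item_id].items():
            profile[genre] += rating * weight
    return dict(profile)

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy catalog (hypothetical item IDs and genres)
catalog = {
    1: ["Action", "Sci-Fi"],
    2: ["Action", "Adventure"],
    3: ["Romance", "Drama"],
}
profiles = tfidf_profiles(catalog)
me = user_profile({1: 5.0}, profiles)  # the user rated item 1 with 5 stars
scores = {i: cosine(me, p) for i, p in profiles.items() if i != 1}
```

In this toy case item 2 shares the Action genre with the user's history and therefore outranks item 3, which shares nothing and scores zero.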

🚀 Installation

Prerequisites

  • Python 3.9 or higher
  • pip package manager

Install from Source

# Clone the repository
git clone https://github.com/pronzzz/movie-recommendation-system.git
cd movie-recommendation-system

# Install dependencies
pip install numpy pandas scipy scikit-learn scikit-surprise requests

# Optional: Install development dependencies
pip install pytest pytest-cov black flake8

Verify Installation

# Run tests to verify everything works
PYTHONPATH=src pytest tests/ -v

🎯 Quick Start

from movie_recommender import MovieLensLoader, HybridRecommender

# Load MovieLens dataset (auto-downloads if needed)
loader = MovieLensLoader(variant="100k")
ratings, movies = loader.load()

# Train hybrid model
recommender = HybridRecommender(alpha=0.7)  # 70% CF, 30% CBF
recommender.fit(ratings, movies)

# Get recommendations with explanations
recs = recommender.recommend(user_id=1, n=10, explain=True)

for item_id, score, explanation in recs:
    movie = movies[movies["item_id"] == item_id].iloc[0]
    print(f"🎬 {movie['title']}")
    print(f"   Score: {score:.2f} | {explanation}\n")

Sample Output:

🎬 Star Wars (1977)
   Score: 4.72 | Recommended because you enjoy Sci-Fi and Action movies.

🎬 Raiders of the Lost Ark (1981)
   Score: 4.65 | Similar to 'Indiana Jones' which you rated highly.

🎬 The Matrix (1999)
   Score: 4.58 | Users with similar taste have rated this highly.
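
The three explanation styles in the sample output could be produced by simple templates. The sketch below is one possible rule ordering, not the project's actual explanation generator: the function `explain` and its parameters are hypothetical, and the priority (genre match, then similar item, then collaborative fallback) is an assumption.

```python
def explain(item_genres, favorite_genres, similar_liked_title=None):
    """Pick an explanation template for one recommended item.

    item_genres: set of genres of the recommended movie.
    favorite_genres: the user's preferred genres, most preferred first.
    similar_liked_title: title of a highly rated similar movie, if any.
    """
    shared = [g for g in favorite_genres if g in item_genres]
    if shared:
        return f"Recommended because you enjoy {' and '.join(shared[:2])} movies."
    if similar_liked_title:
        return f"Similar to '{similar_liked_title}' which you rated highly."
    return "Users with similar taste have rated this highly."

genre_reason = explain({"Sci-Fi", "Action"}, ["Sci-Fi", "Action"])
item_reason = explain({"Adventure"}, ["Sci-Fi"], similar_liked_title="Indiana Jones")
cf_reason = explain({"Thriller"}, ["Sci-Fi"])
```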

📖 Usage Guide

Data Loading

from movie_recommender.data import MovieLensLoader, train_test_split

# Load different dataset sizes
loader = MovieLensLoader(variant="100k")  # Options: 100k, 1m, 10m, 20m
ratings, movies = loader.load()

# Get dataset statistics
stats = loader.get_statistics()
print(f"Users: {stats['n_users']}, Movies: {stats['n_movies']}")
print(f"Ratings: {stats['n_ratings']}, Sparsity: {stats['sparsity']:.2%}")

# Split for training/testing
train_data, test_data = train_test_split(ratings, test_size=0.2)

Individual Models

from movie_recommender.models import CollaborativeFilter, ContentBasedFilter

# Collaborative Filtering only
cf = CollaborativeFilter(n_factors=50, n_epochs=20)
cf.fit(ratings)
cf_recs = cf.recommend(user_id=1, n=10)

# Content-Based Filtering only
cbf = ContentBasedFilter(use_genres=True)
cbf.fit(ratings, movies)
cbf_recs = cbf.recommend(user_id=1, n=10)

# Find similar items
similar = cf.get_similar_items(item_id=50, n=5)
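
For intuition about what `n_factors` and `n_epochs` control, the following is a small, dependency-free sketch of matrix factorization trained with stochastic gradient descent on observed ratings only. It is a simplification under stated assumptions: the README says the collaborative filter uses SVD from scikit-surprise, whereas this toy version omits the user and item bias terms, and `train_mf`, `predict`, `rmse`, and the toy ratings are hypothetical names and data.

```python
import math
import random

def train_mf(ratings, n_factors=2, n_epochs=200, lr=0.02, reg=0.02, seed=0):
    """Fit r_ui ~ mu + p_u . q_i on (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u, _, _ in ratings}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for _, i, _ in ratings}
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + sum(pf * qf for pf, qf in zip(P[u], Q[i])))
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)  # regularized gradient step
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, P, Q

def predict(model, u, i):
    mu, P, Q = model
    return mu + sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def rmse(model, ratings):
    return math.sqrt(
        sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings) / len(ratings)
    )

# Toy data: users 0 and 1 like items 0 and 1; user 2 likes item 2.
# The pair (user 1, item 2) is deliberately left unobserved.
train = [(0, 0, 5), (0, 1, 4), (0, 2, 1), (1, 0, 4), (1, 1, 5),
         (2, 0, 1), (2, 1, 2), (2, 2, 5)]
model = train_mf(train)
train_rmse = rmse(model, train)
baseline_rmse = math.sqrt(sum((r - model[0]) ** 2 for _, _, r in train) / len(train))
missing_pred = predict(model, 1, 2)  # estimate for the unobserved pair
```

Only observed triples enter the training loop, which is why this family of models copes with sparse rating matrices: missing entries cost nothing during training and are filled in afterwards by `predict`.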

Hybrid Recommendations

from movie_recommender.models import HybridRecommender

# Configure hybrid model
hybrid = HybridRecommender(
    alpha=0.7,              # CF weight (0.0 = pure CBF, 1.0 = pure CF)
    cold_start_threshold=5, # Users with fewer ratings get more CBF
    cf_params={"n_factors": 100, "n_epochs": 25},
)

hybrid.fit(train_data, movies)

# Recommendations with explanations
recs = hybrid.recommend(user_id=42, n=10, explain=True)
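
The README does not spell out how `alpha` and `cold_start_threshold` interact, so the following is only one plausible reading: below the threshold, the CF weight is scaled down linearly with the number of ratings the user has, and the blended score is a weighted average. The functions `adaptive_alpha` and `blend` are hypothetical and are not part of the package API.

```python
def adaptive_alpha(n_ratings, base_alpha=0.7, cold_start_threshold=5):
    """CF weight for a user; assumed rule: linear ramp below the threshold."""
    if n_ratings >= cold_start_threshold:
        return base_alpha
    return base_alpha * n_ratings / cold_start_threshold

def blend(cf_score, cbf_score, alpha):
    """Weighted average: alpha = 1.0 is pure CF, alpha = 0.0 is pure CBF."""
    return alpha * cf_score + (1.0 - alpha) * cbf_score

cf_score, cbf_score = 4.0, 3.0
brand_new = blend(cf_score, cbf_score, adaptive_alpha(0))     # no ratings yet
warming_up = blend(cf_score, cbf_score, adaptive_alpha(2))    # 2 of 5 ratings
established = blend(cf_score, cbf_score, adaptive_alpha(20))  # full CF weight
```

Under this rule a user with no ratings receives the pure content-based score, and the CF contribution grows until the threshold is reached.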

Diversity Re-ranking

from movie_recommender.models import DiversityReranker

# Get base recommendations from a fitted model
# (`recommender` is the HybridRecommender trained in Quick Start)
base_recs = recommender.recommend(user_id=1, n=50)

# Re-rank for diversity
reranker = DiversityReranker(lambda_param=0.5)  # 0=diversity, 1=relevance
diverse_recs = reranker.rerank(base_recs, n=10)
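
As a rough illustration of what an MMR re-ranker does with `lambda_param`, here is a minimal greedy Maximal Marginal Relevance sketch. It is not the `DiversityReranker` implementation: `mmr_rerank`, the Jaccard genre similarity, and the toy candidates are assumptions, and relevance scores are assumed to be normalized to [0, 1] so they are comparable with similarities.

```python
def mmr_rerank(candidates, similarity, n=10, lambda_param=0.5):
    """Greedy MMR over (item_id, relevance) pairs.

    Each step picks the item maximizing
    lambda * relevance - (1 - lambda) * max similarity to already chosen items.
    """
    remaining = dict(candidates)
    selected = []
    while remaining and len(selected) < n:
        def mmr_score(item):
            max_sim = max((similarity(item, s) for s, _ in selected), default=0.0)
            return lambda_param * remaining[item] - (1 - lambda_param) * max_sim
        best = max(remaining, key=mmr_score)
        selected.append((best, remaining.pop(best)))
    return selected

# Toy data (hypothetical): A and B are near-duplicates, C is different.
genres = {"A": {"Action", "Sci-Fi"}, "B": {"Action", "Sci-Fi"}, "C": {"Romance"}}

def jaccard(a, b):
    return len(genres[a] & genres[b]) / len(genres[a] | genres[b])

base = [("A", 0.9), ("B", 0.85), ("C", 0.6)]
balanced = mmr_rerank(base, jaccard, n=2, lambda_param=0.5)
relevance_only = mmr_rerank(base, jaccard, n=2, lambda_param=1.0)
```

With `lambda_param=0.5` the second slot goes to C instead of the near-duplicate B; with `lambda_param=1.0` the original relevance order is kept.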

Model Evaluation

from movie_recommender.evaluation import evaluate_recommender

metrics = evaluate_recommender(
    recommender=hybrid,
    test_data=test_data,
    train_data=train_data,
    k=10,
    relevance_threshold=4.0,
)

print(f"RMSE: {metrics['rmse']:.3f}")
print(f"Precision@10: {metrics['precision@k']:.3f}")
print(f"NDCG@10: {metrics['ndcg@k']:.3f}")
print(f"Coverage: {metrics['coverage']:.2%}")
print(f"Novelty: {metrics['novelty']:.2f}")

📊 Evaluation Metrics

Accuracy Metrics

| Metric | Description | Optimal |
|---|---|---|
| RMSE | Root Mean Square Error for rating prediction | Lower ↓ |
| MAE | Mean Absolute Error for rating prediction | Lower ↓ |
| Precision@K | Fraction of the top-K recommendations that are relevant | Higher ↑ |
| Recall@K | Fraction of all relevant items that appear in the top-K | Higher ↑ |
| NDCG@K | Normalized Discounted Cumulative Gain | Higher ↑ |
| Hit Rate | Share of users with at least one relevant recommendation | Higher ↑ |
| MRR | Mean Reciprocal Rank of first relevant item | Higher ↑ |
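
The ranking metrics above can be sketched for a single user as follows. This assumes binary relevance (an item is relevant when its held-out rating meets a threshold such as the `relevance_threshold=4.0` used earlier); the function names and the toy lists are hypothetical and may differ from those in `metrics.py`.

```python
import math

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    return sum(1 for item in recommended[:k] if item in relevant) / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    """Binary-gain NDCG: DCG of the list divided by DCG of an ideal ordering."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return dcg / idcg if idcg else 0.0

def reciprocal_rank(recommended, relevant):
    """1 / position of the first relevant item, or 0.0 if none is found."""
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

recommended = [10, 20, 30, 40, 50]  # ranked item IDs for one user
relevant = {20, 50, 60}             # held-out items the user rated highly

p5 = precision_at_k(recommended, relevant, 5)
r5 = recall_at_k(recommended, relevant, 5)
n5 = ndcg_at_k(recommended, relevant, 5)
rr = reciprocal_rank(recommended, relevant)
```

Averaging `reciprocal_rank` over users gives MRR, and Hit Rate is the share of users for whom it is non-zero within the top-K.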

Beyond-Accuracy Metrics

| Metric | Description | Purpose |
|---|---|---|
| Coverage | % of catalog ever recommended | Reduce filter bubbles |
| Diversity | Intra-list pairwise dissimilarity | Avoid repetitive lists |
| Novelty | Average inverse popularity | Promote long-tail discovery |
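
A minimal sketch of the three beyond-accuracy metrics follows. The exact formulas used by the project are not stated here, so these are common formulations chosen as assumptions: novelty is computed as mean self-information, -log2(popularity / n_users), which is one way to express "inverse popularity". All function names and the toy inputs are hypothetical.

```python
import math
from itertools import combinations

def catalog_coverage(rec_lists, catalog_size):
    """Share of the catalog that appears in at least one recommendation list."""
    recommended = {item for recs in rec_lists for item in recs}
    return len(recommended) / catalog_size

def intra_list_diversity(recs, similarity):
    """Mean pairwise dissimilarity (1 - similarity) within one list."""
    pairs = list(combinations(recs, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - similarity(a, b) for a, b in pairs) / len(pairs)

def novelty(recs, popularity, n_users):
    """Mean self-information of the recommended items (assumed definition)."""
    return sum(-math.log2(popularity[i] / n_users) for i in recs) / len(recs)

coverage = catalog_coverage([[1, 2], [2, 3]], catalog_size=10)

# Toy similarity: items with the same parity count as identical.
same_parity = lambda a, b: 1.0 if a % 2 == b % 2 else 0.0
diversity = intra_list_diversity([1, 2, 3], same_parity)

# Item 1 was rated by 50 of 100 users, item 2 by 25 of 100.
avg_novelty = novelty([1, 2], popularity={1: 50, 2: 25}, n_users=100)
```

Rarer items contribute more bits of self-information, so a list drawn from the long tail scores higher on novelty than a list of blockbusters.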

πŸ“ Project Structure

movie-recommendation-system/
β”œβ”€β”€ πŸ“„ pyproject.toml          # Package metadata & dependencies
β”œβ”€β”€ πŸ“„ README.md               # This file
β”œβ”€β”€ πŸ“„ LICENSE                 # MIT License
β”œβ”€β”€ πŸ“„ CONTRIBUTING.md         # Contribution guidelines
β”œβ”€β”€ πŸ“„ GUIDE.md                # Detailed implementation guide
β”‚
β”œβ”€β”€ πŸ“‚ src/movie_recommender/  # Main package
β”‚   β”œβ”€β”€ πŸ“‚ data/              # Data loading & preprocessing
β”‚   β”‚   β”œβ”€β”€ loader.py         # MovieLens dataset loader
β”‚   β”‚   └── preprocessing.py  # Train/test split, normalization
β”‚   β”‚
β”‚   β”œβ”€β”€ πŸ“‚ models/            # Recommendation algorithms
β”‚   β”‚   β”œβ”€β”€ base.py           # Abstract base recommender
β”‚   β”‚   β”œβ”€β”€ cf.py             # Collaborative filtering (SVD)
β”‚   β”‚   β”œβ”€β”€ cbf.py            # Content-based filtering (TF-IDF)
β”‚   β”‚   β”œβ”€β”€ hybrid.py         # Hybrid combiner
β”‚   β”‚   └── reranker.py       # Diversity re-ranking (MMR)
β”‚   β”‚
β”‚   β”œβ”€β”€ πŸ“‚ evaluation/        # Metrics & evaluation
β”‚   β”‚   └── metrics.py        # RMSE, Precision@K, NDCG, etc.
β”‚   β”‚
β”‚   β”œβ”€β”€ πŸ“‚ explainability/    # Explanation generation
β”‚   β”‚   └── explanations.py   # Human-readable reasons
β”‚   β”‚
β”‚   └── utils.py              # Logging, formatting helpers
β”‚
β”œβ”€β”€ πŸ“‚ tests/                  # Unit tests (44 tests)
β”‚   β”œβ”€β”€ test_data.py          # Data layer tests
β”‚   β”œβ”€β”€ test_models.py        # Model tests
β”‚   └── test_evaluation.py    # Metrics tests
β”‚
β”œβ”€β”€ πŸ“‚ examples/              # Usage examples
β”‚   └── demo_usage.py         # End-to-end demo script
β”‚
└── πŸ“‚ data/                  # Downloaded datasets (gitignored)

📚 Datasets

This project uses the MovieLens datasets from GroupLens Research.

| Dataset | Users | Movies | Ratings | Size |
|---|---|---|---|---|
| MovieLens 100K | 943 | 1,682 | 100,000 | 5 MB |
| MovieLens 1M | 6,040 | 3,706 | 1,000,209 | 25 MB |
| MovieLens 10M | 69,878 | 10,677 | 10,000,054 | 265 MB |
| MovieLens 20M | 138,493 | 27,278 | 20,000,263 | 500 MB |

Datasets are automatically downloaded on first use and cached in the data/ directory.
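
The figures in the table are enough to work out how sparse each rating matrix is, using sparsity = 1 - ratings / (users × movies). The short sketch below does that arithmetic; the `sparsity` helper is hypothetical and is not claimed to match what `get_statistics()` computes internally.

```python
# (users, movies, ratings) as listed in the dataset table
datasets = {
    "100K": (943, 1_682, 100_000),
    "1M": (6_040, 3_706, 1_000_209),
    "10M": (69_878, 10_677, 10_000_054),
    "20M": (138_493, 27_278, 20_000_263),
}

def sparsity(n_users, n_movies, n_ratings):
    """Fraction of user-movie pairs that have no rating."""
    return 1.0 - n_ratings / (n_users * n_movies)

results = {name: sparsity(*dims) for name, dims in datasets.items()}
for name, value in results.items():
    print(f"MovieLens {name}: {value:.2%} of the rating matrix is empty")
```

For MovieLens 100K this comes to roughly 93.7%, and the larger variants are sparser still.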


🧪 Testing

# Run all tests
PYTHONPATH=src pytest tests/ -v

# Run with coverage report
PYTHONPATH=src pytest tests/ --cov=src/movie_recommender --cov-report=html

# Run specific test file
PYTHONPATH=src pytest tests/test_models.py -v

Test Coverage:

  • test_data.py - 12 tests for data loading and preprocessing
  • test_models.py - 12 tests for CF, CBF, Hybrid, and Reranker
  • test_evaluation.py - 20 tests for all evaluation metrics

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Development Setup

# Clone and install dev dependencies
git clone https://github.com/pronzzz/movie-recommendation-system.git
cd movie-recommendation-system
pip install -e ".[dev]"

# Run linting
black src/ tests/
flake8 src/ tests/

# Run tests
pytest tests/ -v

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


πŸ™ Acknowledgments


Made with ❤️ by pronzzz
