🚗 Fuel Efficiency Prediction System ⛽

Python TensorFlow scikit-learn License: MIT Maintenance

An interpretable, robust regression system for predicting vehicle fuel efficiency (MPG) using the Auto MPG dataset. This project goes beyond typical ML demos by emphasizing understanding over accuracy and treating regression as a decision-support system.



🎯 What Makes This Different

| Typical Auto MPG Projects | This Project |
| --- | --- |
| Train one neural network | Compare multiple models for honest evaluation |
| Print a loss value | Explain why predictions behave as they do |
| Declare success | Demonstrate when NNs help and when they don't |
| Black-box predictions | Prioritize interpretability over marginal gains |

💡 Philosophy: Numbers are only impressive when they explain something real.


📁 Project Structure

Fuel-efficiency-prediction/
├── 📂 data/
│   └── loader.py              # Data ingestion from UCI repository
├── 📂 preprocessing/
│   ├── cleaner.py             # Missing value handling, encoding
│   └── features.py            # Normalization, train-test split
├── 📂 eda/
│   └── visualizations.py      # Pairplots, correlation analysis
├── 📂 models/
│   ├── neural_network.py      # Keras Sequential model (64→32→1)
│   └── baselines.py           # Linear Regression, Ridge, Random Forest
├── 📂 training/
│   └── trainer.py             # Training loop with early stopping
├── 📂 evaluation/
│   ├── metrics.py             # MAE, RMSE, R² calculations
│   ├── plots.py               # Diagnostic visualizations
│   └── interpretability.py    # Feature importance analysis
├── 📂 utils/
│   └── config.py              # Hyperparameters and constants
├── 📂 outputs/                # Generated plots and artifacts
├── 🐍 main.py                 # End-to-end pipeline
├── 📄 requirements.txt
├── 📄 GUIDE.md                # Detailed usage guide
├── 📄 LICENSE
└── 📄 README.md

🚀 Quick Start

Prerequisites

  • Python 3.8 or higher
  • pip package manager

1. Clone the Repository

git clone https://github.com/pronzzz/fuel-effeciency-prediction.git
cd fuel-effeciency-prediction

2. Install Dependencies

pip install -r requirements.txt

3. Run the Full Pipeline

python main.py

Command Line Options

| Option | Description |
| --- | --- |
| --eda-only | Only run exploratory data analysis |
| --skip-eda | Skip EDA for faster execution |
| --verbose | Enable detailed output |
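
As a rough sketch of how these three flags could be declared, assuming main.py uses the standard-library argparse module (the README does not say which parser it uses): the function name build_parser and the choice to make --eda-only and --skip-eda mutually exclusive are assumptions for illustration.

```python
import argparse

def build_parser():
    """Hypothetical parser for the three documented flags."""
    parser = argparse.ArgumentParser(
        description="Fuel efficiency prediction pipeline"
    )
    # Assumption: running only EDA and skipping EDA cannot be combined.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--eda-only", action="store_true",
                      help="Only run exploratory data analysis")
    mode.add_argument("--skip-eda", action="store_true",
                      help="Skip EDA for faster execution")
    parser.add_argument("--verbose", action="store_true",
                        help="Enable detailed output")
    return parser

# Equivalent to: python main.py --skip-eda --verbose
args = build_parser().parse_args(["--skip-eda", "--verbose"])
print(args.eda_only, args.skip_eda, args.verbose)
```

argparse turns the dashes in a flag name into underscores, so --skip-eda is read back as args.skip_eda.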

📊 Dataset

The Auto MPG dataset from the UCI Machine Learning Repository contains fuel efficiency data for automobiles from model years 1970 to 1982.

| Feature | Description | Type |
| --- | --- | --- |
| mpg | Miles per gallon | Target |
| cylinders | Number of cylinders | Discrete (3-8) |
| displacement | Engine displacement (cubic inches) | Continuous |
| horsepower | Engine horsepower | Continuous |
| weight | Vehicle weight (lbs) | Continuous |
| acceleration | 0-60 mph time (seconds) | Continuous |
| model_year | Year (70-82) | Discrete |
| origin | Country (USA/Europe/Japan) | Categorical |

Dataset Statistics:

  • 📦 398 samples
  • 🔒 8 columns (7 predictors + 1 target)
  • ❓ 6 missing values (all in horsepower)
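
The sketch below shows one way the missing horsepower values and the categorical origin column could be handled. It is not the repository's cleaner.py: dropping incomplete rows (rather than imputing them), the function name clean_and_encode, and the two sample rows are assumptions for illustration. The raw UCI file marks missing horsepower with "?" and codes origin as 1, 2, 3.

```python
# Origin codes as used in the raw UCI file.
ORIGIN_NAMES = {1: "USA", 2: "Europe", 3: "Japan"}
NUMERIC_COLUMNS = ["cylinders", "displacement", "horsepower",
                   "weight", "acceleration", "model_year"]

def clean_and_encode(rows):
    """Drop rows with missing horsepower and one-hot encode origin.

    Returns (features, targets); each feature vector holds the
    6 numeric predictors followed by 3 origin indicators (9 values).
    """
    features, targets = [], []
    for row in rows:
        if row["horsepower"] == "?":  # missing-value marker
            continue                  # assumption: drop, do not impute
        numeric = [float(row[name]) for name in NUMERIC_COLUMNS]
        one_hot = [1.0 if int(row["origin"]) == code else 0.0
                   for code in sorted(ORIGIN_NAMES)]
        features.append(numeric + one_hot)
        targets.append(float(row["mpg"]))
    return features, targets

# Two illustrative rows in the dataset's column layout.
rows = [
    {"mpg": "18.0", "cylinders": "8", "displacement": "307.0",
     "horsepower": "130.0", "weight": "3504", "acceleration": "12.0",
     "model_year": "70", "origin": "1"},
    {"mpg": "25.0", "cylinders": "4", "displacement": "98.0",
     "horsepower": "?", "weight": "2046", "acceleration": "19.0",
     "model_year": "71", "origin": "1"},
]

features, targets = clean_and_encode(rows)
print(len(features), len(features[0]))
```

Six numeric predictors plus three origin indicators give the 9 inputs that the neural network expects.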

🧠 Model Architecture

Neural Network

┌─────────────────────────────────────────────┐
│  INPUT (9 features after one-hot encoding)  │
└─────────────────────┬───────────────────────┘
                      ▼
┌─────────────────────────────────────────────┐
│  Dense(64, ReLU)                            │
│  Purpose: Learn feature combinations        │
└─────────────────────┬───────────────────────┘
                      ▼
┌─────────────────────────────────────────────┐
│  Dense(32, ReLU)                            │
│  Purpose: Compress to efficiency-relevant   │
└─────────────────────┬───────────────────────┘
                      ▼
┌─────────────────────────────────────────────┐
│  Dense(1, Linear)                           │
│  Purpose: Predict MPG                       │
└─────────────────────────────────────────────┘

Total Parameters: 2,753 (640 + 2,080 + 33 across the three Dense layers, assuming 9 inputs)
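
The parameter total can be checked by hand: a fully connected layer with bias has inputs × units weights plus units biases. The short sketch below applies that formula to the 9 → 64 → 32 → 1 stack shown in the diagram; the helper name dense_params is an assumption for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units

layer_units = [64, 32, 1]   # Dense(64), Dense(32), Dense(1)
n_inputs = 9                # features after one-hot encoding

per_layer = []
for units in layer_units:
    per_layer.append(dense_params(n_inputs, units))
    n_inputs = units        # this layer's output feeds the next layer

total = sum(per_layer)
print(per_layer, total)     # [640, 2080, 33] 2753
```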

Baselines for Comparison

| Model | Purpose |
| --- | --- |
| Linear Regression | Simple interpretable baseline |
| Ridge Regression | L2 regularization for multicollinearity |
| Random Forest | Nonlinear tree-based ensemble |

🔍 Key Insights

1️⃣ Weight Dominates MPG Prediction

Vehicle weight is the single strongest predictor of fuel efficiency:

  • Physics: Heavier cars need more energy to accelerate (F = ma)
  • Correlation: r ≈ -0.83 with MPG
  • Engineering insight: 500 lb reduction → 3-5 MPG improvement
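
Both the correlation and the "MPG per 500 lb" figure come from simple statistics on the weight and MPG columns. The sketch below shows how each could be computed; the four data points are invented toy values on a perfect straight line, so they give r = -1 rather than the dataset's r ≈ -0.83, and the function names are assumptions for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ols_slope(xs, ys):
    """Slope of the least-squares line y = a + b * x."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov / var_x

# Toy values, NOT taken from the Auto MPG dataset.
weights_lb = [2000.0, 2500.0, 3000.0, 3500.0]
mpg = [35.0, 30.0, 25.0, 20.0]

r = pearson_r(weights_lb, mpg)
slope = ols_slope(weights_lb, mpg)        # MPG change per extra pound
mpg_gain_per_500_lb = slope * -500.0      # effect of removing 500 lb

print(f"r={r:.2f}  slope={slope:.4f} MPG/lb  "
      f"500 lb lighter: {mpg_gain_per_500_lb:+.1f} MPG")
```

A single-variable slope like this ignores the other features, which are correlated with weight, so it is a first approximation rather than a causal estimate.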

2️⃣ When Neural Networks Help (and When They Don't)

For tabular data with ~400 samples:

| Observation | Implication |
| --- | --- |
| NN provides modest improvement over linear regression | Complexity may not be worth it |
| Random Forest often matches NN performance | Simpler is often better |
| Baseline models are highly competitive | Start simple, add complexity only when needed |

3️⃣ Interpretability > Marginal Accuracy

In real-world engineering decisions:

Knowing "reduce weight by 500 lbs → +3 MPG" is more valuable than knowing "predicted MPG is 32.7".


📈 Generated Outputs

After running the pipeline, check outputs/ for:

| File | Description |
| --- | --- |
| pairplot.png | Feature relationships and correlations |
| correlation_heatmap.png | Multicollinearity detection |
| target_distribution.png | MPG distribution analysis |
| features_by_origin.png | Feature differences by car origin |
| predicted_vs_actual_*.png | Prediction accuracy per model |
| residuals_*.png | Error pattern analysis |
| training_history.png | Neural network convergence |
| feature_importance_*.png | Which features matter most |

📖 Detailed Guide

GUIDE.md provides a comprehensive walkthrough, including:

  • Step-by-step pipeline explanation
  • Understanding each visualization
  • Interpreting model results
  • Customizing hyperparameters
  • Extending the system


🛠️ Future Extensions

  • SHAP values for game-theoretic feature importance
  • Uncertainty estimation for predictions
  • Scenario simulation (e.g., "What if we reduce weight by 500 lbs?")
  • Gradient Boosting (XGBoost, LightGBM) baselines
  • Cross-validation for more robust evaluation
  • Web interface for interactive predictions

🀝 Contributing

Contributions are welcome! Here's how to get started:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


Made with ❤️ for interpretable machine learning
