An interpretable, robust regression system for predicting vehicle fuel efficiency (MPG) using the Auto MPG dataset. This project goes beyond typical ML demos by emphasizing understanding over accuracy and treating regression as a decision-support system.
- What Makes This Different
- Project Structure
- Quick Start
- Dataset
- Model Architecture
- Key Insights
- Generated Outputs
- Detailed Guide
- Future Extensions
- Contributing
- License
| Typical Auto MPG Projects | This Project |
|---|---|
| Train one neural network | Compare multiple models for honest evaluation |
| Print a loss value | Explain why predictions behave as they do |
| Declare success | Demonstrate when NNs help and when they don't |
| Black-box predictions | Prioritize interpretability over marginal gains |
Philosophy: Numbers are only impressive when they explain something real.
Fuel-efficiency-prediction/
├── data/
│   └── loader.py             # Data ingestion from UCI repository
├── preprocessing/
│   ├── cleaner.py            # Missing value handling, encoding
│   └── features.py           # Normalization, train-test split
├── eda/
│   └── visualizations.py     # Pairplots, correlation analysis
├── models/
│   ├── neural_network.py     # Keras Sequential model (64→32→1)
│   └── baselines.py          # Linear Regression, Ridge, Random Forest
├── training/
│   └── trainer.py            # Training loop with early stopping
├── evaluation/
│   ├── metrics.py            # MAE, RMSE, R² calculations
│   ├── plots.py              # Diagnostic visualizations
│   └── interpretability.py   # Feature importance analysis
├── utils/
│   └── config.py             # Hyperparameters and constants
├── outputs/                  # Generated plots and artifacts
├── main.py                   # End-to-end pipeline
├── requirements.txt
├── GUIDE.md                  # Detailed usage guide
├── LICENSE
└── README.md
- Python 3.8 or higher
- pip package manager
git clone https://github.com/pronzzz/fuel-effeciency-prediction.git
cd fuel-effeciency-prediction

pip install -r requirements.txt

python main.py

| Option | Description |
|---|---|
| `--eda-only` | Only run exploratory data analysis |
| `--skip-eda` | Skip EDA for faster execution |
| `--verbose` | Enable detailed output |
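As a rough illustration of how the three options could be wired up, here is a minimal `argparse` sketch. It is not taken from `main.py`: the helper names `build_parser` and `plan_stages`, the stage names, and the choice to make `--eda-only` and `--skip-eda` mutually exclusive are all assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three documented flags (hypothetical wiring)."""
    parser = argparse.ArgumentParser(description="Fuel efficiency prediction pipeline")
    # Assumption: the two EDA flags contradict each other, so only one is allowed.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--eda-only", action="store_true",
                       help="Only run exploratory data analysis")
    group.add_argument("--skip-eda", action="store_true",
                       help="Skip EDA for faster execution")
    parser.add_argument("--verbose", action="store_true",
                        help="Enable detailed output")
    return parser


def plan_stages(args: argparse.Namespace) -> list:
    """Return the pipeline stages implied by the flags (stage names are illustrative)."""
    if args.eda_only:
        return ["eda"]
    stages = ["preprocess", "train", "evaluate"]
    if not args.skip_eda:
        stages.insert(0, "eda")
    return stages


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--skip-eda", "--verbose"])
print(plan_stages(args), "verbose =", args.verbose)
```

Passing both `--eda-only` and `--skip-eda` to this sketch makes `argparse` exit with a usage error, which is one reasonable way to handle the conflict.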
The Auto MPG dataset from the UCI Machine Learning Repository contains fuel efficiency data for automobiles from model years 1970 to 1982.
| Feature | Description | Type |
|---|---|---|
| `mpg` | Miles per gallon | Target |
| `cylinders` | Number of cylinders | Discrete (3-8) |
| `displacement` | Engine displacement (cubic inches) | Continuous |
| `horsepower` | Engine horsepower | Continuous |
| `weight` | Vehicle weight (lbs) | Continuous |
| `acceleration` | 0-60 mph time (seconds) | Continuous |
| `model_year` | Year (70-82) | Discrete |
| `origin` | Country (USA/Europe/Japan) | Categorical |
Dataset Statistics:
- 398 samples
- 8 features (7 predictors + 1 target)
- 6 missing values (horsepower)
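The sketch below shows one way the missing `horsepower` values and the categorical `origin` column could be turned into model inputs. The raw UCI file marks missing horsepower with `?` and codes origin as 1, 2, 3 (USA, Europe, Japan). Everything else is an assumption for illustration: median imputation (dropping the six rows would be an equally plausible choice), the helper names, and the toy rows, which are made-up values rather than dataset records.

```python
from statistics import median

# Origin codes in the UCI file: 1 = USA, 2 = Europe, 3 = Japan.
ORIGINS = {1: "USA", 2: "Europe", 3: "Japan"}
NUMERIC = ["cylinders", "displacement", "horsepower", "weight",
           "acceleration", "model_year"]


def parse_value(text):
    """Convert a raw field to float; '?' means the value is missing."""
    return None if text == "?" else float(text)


def impute_median(rows, column):
    """Fill missing entries of `column` with the median of the observed ones."""
    fill = median(r[column] for r in rows if r[column] is not None)
    for r in rows:
        if r[column] is None:
            r[column] = fill
    return rows


def to_features(row):
    """Six numeric predictors followed by a three-way one-hot origin: 9 inputs."""
    one_hot = [1.0 if row["origin"] == code else 0.0 for code in sorted(ORIGINS)]
    return [row[name] for name in NUMERIC] + one_hot


# Toy rows with made-up values; the second one has a missing horsepower.
rows = [
    {"cylinders": 8.0, "displacement": 300.0, "horsepower": parse_value("130"),
     "weight": 3500.0, "acceleration": 12.0, "model_year": 70.0, "origin": 1},
    {"cylinders": 4.0, "displacement": 100.0, "horsepower": parse_value("?"),
     "weight": 2000.0, "acceleration": 19.0, "model_year": 71.0, "origin": 2},
    {"cylinders": 4.0, "displacement": 110.0, "horsepower": parse_value("90"),
     "weight": 2300.0, "acceleration": 15.0, "model_year": 72.0, "origin": 3},
]

features = [to_features(r) for r in impute_median(rows, "horsepower")]
print(len(features[0]), features[1])
```

One-hot encoding `origin` is what takes the 7 predictors to 9 model inputs: 6 numeric columns plus 3 indicator columns.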
┌─────────────────────────────────────────────┐
│  INPUT (9 features after one-hot encoding)  │
└──────────────────────┬──────────────────────┘
                       ▼
┌─────────────────────────────────────────────┐
│               Dense(64, ReLU)               │
│     Purpose: Learn feature combinations     │
└──────────────────────┬──────────────────────┘
                       ▼
┌─────────────────────────────────────────────┐
│               Dense(32, ReLU)               │
│  Purpose: Compress to efficiency-relevant   │
└──────────────────────┬──────────────────────┘
                       ▼
┌─────────────────────────────────────────────┐
│              Dense(1, Linear)               │
│            Purpose: Predict MPG             │
└─────────────────────────────────────────────┘
Total Parameters: 2,753 (640 + 2,080 + 33 across the three Dense layers)
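The parameter total can be checked by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The short sketch below does that arithmetic for the 9→64→32→1 stack; `dense_params` is a helper defined here for illustration, not part of the project.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


layer_sizes = [9, 64, 32, 1]  # input width, then the three Dense layers
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer, sum(per_layer))  # [640, 2080, 33] 2753
```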
| Model | Purpose |
|---|---|
| Linear Regression | Simple interpretable baseline |
| Ridge Regression | L2 regularization for multicollinearity |
| Random Forest | Nonlinear tree-based ensemble |
Vehicle weight is the single strongest predictor of fuel efficiency:
- Physics: Heavier cars need more energy to accelerate (F = ma)
- Correlation: r ≈ -0.83 with MPG
- Engineering insight: a 500 lb reduction ≈ 3-5 MPG improvement
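The correlation figure above is a Pearson coefficient. As a reminder of what that number measures, here is a minimal pure-Python sketch. The weight and MPG values are toy numbers invented for the example, not rows from the dataset, so the printed value will not match the -0.83 reported above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy values for illustration only.
weights_lb = [2000, 2500, 3000, 3500, 4000, 4500]
mpg = [34, 28, 25, 19, 17, 13]

r = pearson_r(weights_lb, mpg)
print(f"r = {r:.3f}")  # strongly negative: heavier toy cars get fewer MPG
```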
For tabular data with ~400 samples:
| Observation | Implication |
|---|---|
| NN provides modest improvement over linear regression | Complexity may not be worth it |
| Random Forest often matches NN performance | Simpler is often better |
| Baseline models are highly competitive | Start simple, add complexity only when needed |
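A comparison like the one above only holds if every model is scored on the same held-out targets with the same metrics. The sketch below implements the three metrics the project lists (MAE, RMSE, R²) in plain Python and ranks two placeholder models. The function names, model names, and numbers are all made up for illustration and are not results from this project.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error, in the target's units (MPG)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large misses more than MAE."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def compare(y_true, predictions):
    """Score every model on the same targets and sort by MAE (best first)."""
    table = [(name, mae(y_true, p), rmse(y_true, p), r2(y_true, p))
             for name, p in predictions.items()]
    return sorted(table, key=lambda row: row[1])


# Made-up targets and predictions, for illustration only.
y_test = [20.0, 30.0, 25.0, 15.0]
predictions = {
    "model_a": [22.0, 28.0, 24.0, 17.0],
    "model_b": [21.0, 29.0, 25.0, 16.0],
}

for name, m, rm, r in compare(y_test, predictions):
    print(f"{name}: MAE={m:.2f} RMSE={rm:.2f} R2={r:.3f}")
```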
In real-world engineering decisions:
Knowing "reduce weight by 500 lbs → +3 MPG" is more valuable than knowing "predicted MPG is 32.7".
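Under a linear model, a statement like this is just a coefficient read aloud: the predicted change in MPG is the weight coefficient times the change in weight, with the other features held fixed. The sketch below makes that explicit. The slope is back-calculated from the rule of thumb quoted above (+3 MPG for -500 lb), so it is an illustrative value and not a coefficient fitted by this project; `mpg_change` is a hypothetical helper.

```python
def mpg_change(weight_delta_lb: float, mpg_per_lb: float) -> float:
    """Predicted MPG change under a linear model, other features held fixed."""
    return mpg_per_lb * weight_delta_lb


# Slope implied by "+3 MPG for -500 lb"; illustrative, not a fitted value.
MPG_PER_LB = 3.0 / -500.0

print(round(mpg_change(-500, MPG_PER_LB), 6))  # 3.0
print(round(mpg_change(-250, MPG_PER_LB), 6))  # 1.5
```

A neural network has no single coefficient like this to read off, which is part of the interpretability trade-off the project emphasizes.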
After running the pipeline, check outputs/ for:
| File | Description |
|---|---|
| `pairplot.png` | Feature relationships and correlations |
| `correlation_heatmap.png` | Multicollinearity detection |
| `target_distribution.png` | MPG distribution analysis |
| `features_by_origin.png` | Feature differences by car origin |
| `predicted_vs_actual_*.png` | Prediction accuracy per model |
| `residuals_*.png` | Error pattern analysis |
| `training_history.png` | Neural network convergence |
| `feature_importance_*.png` | Which features matter most |
See GUIDE.md for a comprehensive walkthrough, including:
- Step-by-step pipeline explanation
- Understanding each visualization
- Interpreting model results
- Customizing hyperparameters
- Extending the system
- SHAP values for game-theoretic feature importance
- Uncertainty estimation for predictions
- Scenario simulation (e.g., "What if we reduce weight by 500 lbs?")
- Gradient Boosting (XGBoost, LightGBM) baselines
- Cross-validation for more robust evaluation
- Web interface for interactive predictions
Contributions are welcome! Here's how to get started:
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- UCI Machine Learning Repository for the Auto MPG dataset
- Original dataset from StatLib library, Carnegie Mellon University
Made with ❤️ for interpretable machine learning