Final Year CSE Project - An intelligent Network Intrusion Detection System combining Machine Learning, Deep Learning, and Explainable AI for real-time threat detection.
- Project Overview
- Features
- Architecture
- Dataset
- Installation
- Usage
- Module Description
- Results
- Technologies Used
- Future Enhancements
- Contributors
- License
This project develops an intelligent Network Intrusion Detection System (NIDS) that automatically detects and classifies malicious network traffic in real time using Machine Learning and Deep Learning techniques.
- Offline Detection Phase: Train and evaluate multiple ML models on labeled network traffic data
- Real-Time Detection Phase: Capture live packets using Scapy and classify traffic on the fly
- Explainability (XAI): Use SHAP to visualize and justify each model decision
- Interactive Dashboard: Build a Streamlit interface to display metrics, live detections, and feature importances
- Multi-Model Ensemble: Random Forest, XGBoost, SVM, KNN, Deep Learning (LSTM/CNN)
- Real-Time Detection: Live packet capture and classification using Scapy
- Explainable AI: SHAP-based interpretability for model decisions
- Interactive Dashboard: Streamlit-based web interface for monitoring
- Class Balancing: SMOTE implementation for handling imbalanced datasets
- Comprehensive Visualization: Performance metrics, confusion matrices, ROC curves
- Alert System: Real-time intrusion alerts and logging
- Historical Analytics: Attack pattern analysis and trends
Data flows through the following stages in order, each feeding the next:
- Network Traffic Source (Live Packets / Dataset Files)
- Data Preprocessing Module
  - Load Dataset (CICIDS 2017 / NSL-KDD)
  - Clean Data (Remove duplicates, handle missing values)
  - Encode Features (Label/One-Hot Encoding)
  - Normalize Features (StandardScaler/MinMaxScaler)
  - Balance Dataset (SMOTE)
  - Train-Test Split
- Feature Selection Module
  - Correlation Analysis
  - Univariate Selection (Chi-square, ANOVA)
  - Tree-Based Importance (Random Forest)
  - Recursive Feature Elimination (RFE)
  - Dimensionality Reduction (PCA, optional)
- Model Training Module
  - Traditional ML: Random Forest, XGBoost, SVM, KNN, Logistic Regression, Naive Bayes
  - Deep Learning: LSTM, CNN, CNN-LSTM Hybrid
  - Hyperparameter Tuning (GridSearchCV/RandomizedSearchCV)
  - Cross-Validation
  - Model Evaluation (Accuracy, Precision, Recall, F1, ROC-AUC)
- SHAP Explainability Module
  - Generate SHAP Values
  - Global Feature Importance
  - Local Prediction Explanations
  - Summary Plots, Force Plots, Waterfall Plots
- Real-Time Detection Module
  - Live Packet Capture (Scapy)
  - Feature Extraction from Packets
  - Real-Time Classification
  - Intrusion Logging & Alerts
- Interactive Dashboard (Streamlit)
  - Live Traffic Monitoring
  - Model Performance Metrics
  - SHAP Visualizations
  - Historical Analytics
  - Alert Management
CICIDS 2017 (primary dataset):
- Source: Canadian Institute for Cybersecurity
- Features: 80+ network traffic features
- Attack Types: DDoS, PortScan, Brute Force, Web Attacks, Infiltration, Botnet
- Size: ~2.8M records
NSL-KDD (fallback dataset):
- Source: NSL-KDD (improved version of KDD Cup 99)
- Features: 41 network features
- Attack Categories: DoS, Probe, R2L, U2R
- Python 3.10 or higher
- pip package manager
- Administrator/Root privileges (for packet capture)
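Because packet capture needs elevated privileges, it can help to check for them before starting a capture. The following is a minimal sketch and is not taken from the project code; the function name `has_capture_privileges` is hypothetical. On Linux, root is sufficient but not strictly necessary (a process granted the right capabilities can also capture), so treat the result as a hint rather than a guarantee.

```python
import ctypes
import os


def has_capture_privileges() -> bool:
    """Return True when running as root (Unix) or Administrator (Windows)."""
    if hasattr(os, "geteuid"):
        # Unix-like systems: effective user ID 0 is root.
        return os.geteuid() == 0
    try:
        # Windows: IsUserAnAdmin() returns non-zero for elevated processes.
        return bool(ctypes.windll.shell32.IsUserAnAdmin())
    except (AttributeError, OSError):
        # Unknown platform or the call is unavailable: assume no privileges.
        return False


if has_capture_privileges():
    print("Privileges look sufficient for packet capture.")
else:
    print("Run as Administrator/root before starting packet capture.")
```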
git clone https://github.com/Priyanshu-Ku/NIDS-ML.git
cd "Network Intrusion Detection using ML (NIDS)/NIDS-ML"

Windows (PowerShell):
python -m venv venv
.\venv\Scripts\Activate.ps1

Linux/Mac:
python3 -m venv venv
source venv/bin/activate

pip install --upgrade pip
pip install -r requirements.txt

- Download CICIDS 2017 from the Canadian Institute for Cybersecurity
- Extract and place the CSV files in the data/raw/ folder
- Or use NSL-KDD as a fallback

python -c "import numpy, pandas, sklearn, tensorflow, scapy; print('All packages installed successfully!')"

cd src
python data_preprocessing.py

This will:
- Load the raw dataset
- Clean and preprocess data
- Apply encoding and normalization
- Balance classes using SMOTE
- Save preprocessed data
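The cleaning and normalization steps above can be illustrated with a small standard-library sketch. This is not the project's `data_preprocessing.py`; the function names `clean_rows` and `min_max_scale` and the sample rows are assumptions made for illustration. It drops exact duplicates and rows with missing or non-finite values, then rescales each column to the range [0, 1].

```python
import math


def clean_rows(rows):
    """Drop exact duplicates and rows containing None, NaN, or infinite values."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(row)
        if key in seen:
            continue  # exact duplicate of a row already kept
        if any(v is None or (isinstance(v, float) and not math.isfinite(v)) for v in row):
            continue  # missing or non-finite value
        seen.add(key)
        cleaned.append(list(row))
    return cleaned


def min_max_scale(rows):
    """Scale each column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical two-feature rows, for illustration only.
raw = [
    [10.0, 200.0],
    [10.0, 200.0],         # exact duplicate
    [30.0, float("inf")],  # infinite value
    [20.0, None],          # missing value
    [50.0, 400.0],
    [30.0, 300.0],
]
cleaned = clean_rows(raw)
scaled = min_max_scale(cleaned)
print(cleaned)
print(scaled)
```

In a real pipeline the scaling parameters are usually computed on the training split only and then reused on the test split, so that test data does not influence preprocessing.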
python feature_selection.py
This selects the most important features to improve model performance.
python model_training.py
This will:
- Train multiple ML models
- Perform hyperparameter tuning
- Evaluate model performance
- Save trained models to the models/ folder
python shap_explainability.py
This generates SHAP explanations and visualizations.
Note: Requires administrator/root privileges
# Windows (Run as Administrator)
python realtime_detection.py
# Linux/Mac
sudo python realtime_detection.py
streamlit run dashboard.py
Access the dashboard at http://localhost:8501
data_preprocessing.py handles all data preprocessing tasks: loading, cleaning, encoding, normalization, balancing, and train-test splitting.
Key Classes:
- DataPreprocessor: Main preprocessing pipeline
Key Functions:
- load_data(): Load dataset from CSV
- clean_data(): Remove duplicates and handle missing values
- encode_features(): Encode categorical features
- normalize_features(): Standardize/normalize features
- balance_dataset(): Apply SMOTE for class balancing
- split_data(): Train-test split with stratification
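To make the class-balancing step concrete, here is a simplified standard-library sketch of the idea behind SMOTE: each synthetic sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. It is an illustration only, not the imbalanced-learn implementation and not the project's `balance_dataset()`; the function name `smote`, its parameters, and the sample points are assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Generate n_synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class samples with two features each.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_synthetic=6)
print(len(new_points))
```

Oversampling is usually applied to the training split only, so that synthetic points derived from test samples do not leak into evaluation.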
feature_selection.py implements several feature selection techniques to identify the most important features.
Key Classes:
- FeatureSelector: Feature selection pipeline
Key Methods:
- correlation_analysis(): Remove highly correlated features
- univariate_selection(): Chi-square, ANOVA F-test
- tree_based_importance(): Random Forest importance
- recursive_feature_elimination(): RFE
- apply_pca(): Dimensionality reduction
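As an illustration of correlation-based filtering, the sketch below keeps a feature only if its absolute Pearson correlation with every already-kept feature stays at or below a threshold. It is a standard-library sketch, not the project's `correlation_analysis()`; the function names, the 0.95 threshold, and the feature names are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.95):
    """Keep a feature only if |r| with every already-kept feature is <= threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature columns, for illustration only.
features = {
    "fwd_packets": [1, 2, 3, 4, 5],
    "fwd_bytes": [100, 200, 300, 400, 500],  # perfectly correlated with fwd_packets
    "flow_duration": [5, 1, 4, 2, 3],
}
print(drop_correlated(features))
```

The result depends on column order: the first feature of a correlated pair is the one that survives.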
model_training.py trains and evaluates multiple ML and DL models.
Key Classes:
- MLModelTrainer: Traditional ML model training
- DeepLearningTrainer: Deep learning model training
Key Models:
- Random Forest, XGBoost, SVM, KNN, Logistic Regression, Naive Bayes
- LSTM, CNN, CNN-LSTM Hybrid
Key Features:
- Hyperparameter tuning with GridSearchCV
- Cross-validation
- Comprehensive evaluation metrics
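The evaluation metrics named in this README follow directly from the confusion-matrix counts. The sketch below computes accuracy, precision, recall, and F1 for a binary attack/benign labeling; the function name `binary_metrics` and the label vectors are made up for illustration and are not project results.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 1 = attack, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On heavily imbalanced traffic, accuracy alone can look high even when most attacks are missed, which is why precision, recall, and F1 are reported alongside it.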
shap_explainability.py provides explainability for model predictions using SHAP.
Key Classes:
- SHAPExplainer: SHAP-based explainability
Key Visualizations:
- Summary plots (global importance)
- Force plots (individual predictions)
- Waterfall plots (feature contributions)
- Dependence plots (feature interactions)
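SHAP attributions are based on Shapley values, which split the difference between a prediction and a baseline prediction across the input features. The sketch below computes exact Shapley values for a toy scoring function by enumerating every feature coalition. It does not use the shap library and is not the project's `SHAPExplainer`; the scoring function, feature names, and baseline are hypothetical, and exact enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model at x relative to baseline.

    Features absent from a coalition take their baseline value.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_score(features):
    """Hypothetical attack score: two linear terms plus one interaction."""
    syn_rate, pkt_size, duration = features
    return 2.0 * syn_rate + 0.5 * pkt_size + syn_rate * duration


x = [3.0, 4.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_score, x, baseline)
print(phi)
```

The contributions sum to the difference between the score at `x` and the score at the baseline, and the interaction term is shared equally between the two features involved.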
realtime_detection.py captures and analyzes live network traffic in real time.
Key Classes:
- PacketFeatureExtractor: Extract features from packets
- RealtimeDetector: Real-time classification engine
Key Features:
- Live packet capture using Scapy
- Real-time feature extraction
- On-the-fly classification
- Intrusion logging and alerts
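Real-time feature extraction generally means grouping captured packets into flows and computing per-flow statistics. The sketch below shows that aggregation on plain dictionaries so it runs without Scapy or elevated privileges. It is not the project's `PacketFeatureExtractor`; the packet keys, feature names, and sample packets are assumptions, and flows here are unidirectional for simplicity.

```python
from collections import defaultdict


def extract_flow_features(packets):
    """Aggregate per-packet records into per-flow features.

    Each packet is a dict with hypothetical keys:
    time, src, dst, sport, dport, proto, length.
    """
    flows = defaultdict(list)
    for pkt in packets:
        key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
        flows[key].append(pkt)

    features = {}
    for key, pkts in flows.items():
        times = [p["time"] for p in pkts]
        sizes = [p["length"] for p in pkts]
        duration = max(times) - min(times)
        features[key] = {
            "packet_count": len(pkts),
            "total_bytes": sum(sizes),
            "mean_packet_size": sum(sizes) / len(sizes),
            "duration": duration,
            "packets_per_second": len(pkts) / duration if duration > 0 else 0.0,
        }
    return features


# Hypothetical packet records, for illustration only.
web = {"src": "10.0.0.5", "dst": "10.0.0.9", "sport": 40000, "dport": 80, "proto": "TCP"}
ssh = {"src": "10.0.0.7", "dst": "10.0.0.9", "sport": 53000, "dport": 22, "proto": "TCP"}
packets = [
    {**web, "time": 0.0, "length": 60},
    {**web, "time": 0.5, "length": 1500},
    {**ssh, "time": 1.0, "length": 74},
    {**web, "time": 2.0, "length": 540},
]
flow_features = extract_flow_features(packets)
for flow, stats in flow_features.items():
    print(flow, stats)
```

Whatever features are computed on live traffic need to match the definitions of the features the model was trained on; otherwise the classifier sees inputs it was never fitted to.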
dashboard.py is an interactive Streamlit dashboard for visualization and monitoring.
Key Pages:
- Home: Overview and system status
- Model Performance: Comparison and metrics
- Live Detection: Real-time monitoring
- Explainability: SHAP visualizations
- Analytics: Historical data analysis
- Settings: Configuration options
(The values below are placeholders, to be updated after model training.)
| Model | Accuracy | Precision | Recall | F1-Score | ROC-AUC |
|---|---|---|---|---|---|
| Random Forest | 98.5% | 98.3% | 98.7% | 98.5% | 99.2% |
| XGBoost | 98.2% | 98.0% | 98.4% | 98.2% | 99.0% |
| SVM | 96.8% | 96.5% | 97.0% | 96.7% | 98.5% |
| Deep Learning (LSTM) | 97.5% | 97.2% | 97.8% | 97.5% | 98.8% |
- Python 3.10+
- scikit-learn: Traditional ML algorithms
- XGBoost: Gradient boosting
- TensorFlow/PyTorch: Deep learning
- imbalanced-learn: SMOTE for class balancing
- SHAP: Explainable AI
- pandas: Data manipulation
- NumPy: Numerical computing
- scipy: Scientific computing
- Scapy: Packet capture and analysis
- Matplotlib: Static plots
- Seaborn: Statistical visualization
- Plotly: Interactive visualizations
- Streamlit: Dashboard framework
- joblib: Model persistence
- logging: System logging
- tqdm: Progress bars
- Ensemble Stacking: Combine multiple models for better accuracy
- Distributed Detection: Multi-node deployment for large networks
- Auto-ML: Automated model selection and hyperparameter tuning
- Advanced DL: Transformer-based models, GNNs for network graphs
- Edge Deployment: Deploy on edge devices (Raspberry Pi, IoT gateways)
- Cloud Integration: AWS/Azure deployment with auto-scaling
- Zero-Day Detection: Anomaly detection for unknown attacks
- Mobile App: Android/iOS app for mobile monitoring
- API Development: REST API for integration with SIEM systems
- Blockchain Logging: Immutable intrusion logs using blockchain
- [Your Name] - Lead Developer - [GitHub Profile]
- [Teammate 1] - ML Engineer - [GitHub Profile]
- [Teammate 2] - Data Scientist - [GitHub Profile]
Supervisor: [Supervisor Name]
Institution: [Your College/University]
Year: 2024-2025
This project is licensed under the MIT License - see the LICENSE file for details.
- Canadian Institute for Cybersecurity for CICIDS 2017 dataset
- NSL-KDD dataset providers
- Open-source community for amazing tools and libraries
- Our project supervisor for guidance and support
For questions or collaboration:
- Email: [your.email@example.com]
- LinkedIn: [Your LinkedIn]
- Project Link: [GitHub Repository URL]
If you use this project in your research, please cite:
@misc{nids-ml-2025,
author = {Your Name},
title = {Network Intrusion Detection System using Machine Learning},
year = {2025},
publisher = {GitHub},
journal = {GitHub Repository},
howpublished = {\url{https://github.com/yourusername/nids-ml}}
}
Star this repository if you find it helpful!
Report bugs and suggest features in the Issues section.
Last Updated: November 10, 2025