🛡️ Network Intrusion Detection System using Machine Learning (NIDS-ML)

Final Year CSE Project - An intelligent Network Intrusion Detection System combining Machine Learning, Deep Learning, and Explainable AI for real-time threat detection.


🎯 Project Overview

This project develops an intelligent Network Intrusion Detection System (NIDS) that automatically detects and classifies malicious network traffic in real-time using advanced Machine Learning and Deep Learning techniques.

Primary Goals

  1. Offline Detection Phase – Train and evaluate multiple ML models on labeled network traffic data
  2. Real-Time Detection Phase – Capture live packets using Scapy and classify traffic on the fly
  3. Explainability (XAI) – Use SHAP to visualize and justify each model decision
  4. Interactive Dashboard – Build a Streamlit interface to display metrics, live detections, and feature importances

✨ Features

  • 🎯 Multi-Model Ensemble: Random Forest, XGBoost, SVM, KNN, Deep Learning (LSTM/CNN)
  • 🔍 Real-Time Detection: Live packet capture and classification using Scapy
  • 🧠 Explainable AI: SHAP-based interpretability for model decisions
  • 📊 Interactive Dashboard: Streamlit-based web interface for monitoring
  • ⚖️ Class Balancing: SMOTE implementation for handling imbalanced datasets
  • 🎨 Comprehensive Visualization: Performance metrics, confusion matrices, ROC curves
  • 🚨 Alert System: Real-time intrusion alerts and logging
  • 📈 Historical Analytics: Attack pattern analysis and trends

๐Ÿ—๏ธ Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                        NETWORK TRAFFIC SOURCE                        โ”‚
โ”‚                    (Live Packets / Dataset Files)                    โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                โ”‚
                                โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                      DATA PREPROCESSING MODULE                       โ”‚
โ”‚   โ€ข Load Dataset (CICIDS 2017 / NSL-KDD)                           โ”‚
โ”‚   โ€ข Clean Data (Remove duplicates, handle missing values)           โ”‚
โ”‚   โ€ข Encode Features (Label/One-Hot Encoding)                        โ”‚
โ”‚   โ€ข Normalize Features (StandardScaler/MinMaxScaler)                โ”‚
โ”‚   โ€ข Balance Dataset (SMOTE)                                         โ”‚
โ”‚   โ€ข Train-Test Split                                                โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                โ”‚
                                โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                     FEATURE SELECTION MODULE                         โ”‚
โ”‚   โ€ข Correlation Analysis                                            โ”‚
โ”‚   โ€ข Univariate Selection (Chi-square, ANOVA)                       โ”‚
โ”‚   โ€ข Tree-Based Importance (Random Forest)                          โ”‚
โ”‚   โ€ข Recursive Feature Elimination (RFE)                            โ”‚
โ”‚   โ€ข Dimensionality Reduction (PCA - Optional)                      โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                โ”‚
                                โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                       MODEL TRAINING MODULE                          โ”‚
โ”‚                                                                      โ”‚
โ”‚   โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”            โ”‚
โ”‚   โ”‚  Traditional ML      โ”‚    โ”‚   Deep Learning      โ”‚            โ”‚
โ”‚   โ”‚  โ€ข Random Forest     โ”‚    โ”‚   โ€ข LSTM             โ”‚            โ”‚
โ”‚   โ”‚  โ€ข XGBoost           โ”‚    โ”‚   โ€ข CNN              โ”‚            โ”‚
โ”‚   โ”‚  โ€ข SVM               โ”‚    โ”‚   โ€ข CNN-LSTM Hybrid  โ”‚            โ”‚
โ”‚   โ”‚  โ€ข KNN               โ”‚    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜            โ”‚
โ”‚   โ”‚  โ€ข Logistic Reg.     โ”‚                                        โ”‚
โ”‚   โ”‚  โ€ข Naive Bayes       โ”‚                                        โ”‚
โ”‚   โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                                        โ”‚
โ”‚                                                                      โ”‚
โ”‚   โ€ข Hyperparameter Tuning (GridSearchCV/RandomizedSearchCV)        โ”‚
โ”‚   โ€ข Cross-Validation                                                โ”‚
โ”‚   โ€ข Model Evaluation (Accuracy, Precision, Recall, F1, ROC-AUC)   โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                โ”‚
                                โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                   SHAP EXPLAINABILITY MODULE                         โ”‚
โ”‚   โ€ข Generate SHAP Values                                            โ”‚
โ”‚   โ€ข Global Feature Importance                                       โ”‚
โ”‚   โ€ข Local Prediction Explanations                                   โ”‚
โ”‚   โ€ข Summary Plots, Force Plots, Waterfall Plots                    โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                โ”‚
                                โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                   REAL-TIME DETECTION MODULE                         โ”‚
โ”‚   โ€ข Live Packet Capture (Scapy)                                    โ”‚
โ”‚   โ€ข Feature Extraction from Packets                                 โ”‚
โ”‚   โ€ข Real-Time Classification                                        โ”‚
โ”‚   โ€ข Intrusion Logging & Alerts                                     โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                โ”‚
                                โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                     INTERACTIVE DASHBOARD                            โ”‚
โ”‚                         (Streamlit)                                  โ”‚
โ”‚   โ€ข Live Traffic Monitoring                                         โ”‚
โ”‚   โ€ข Model Performance Metrics                                       โ”‚
โ”‚   โ€ข SHAP Visualizations                                            โ”‚
โ”‚   โ€ข Historical Analytics                                            โ”‚
โ”‚   โ€ข Alert Management                                                โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

📊 Dataset

Primary: CICIDS 2017 Dataset

  • Source: Canadian Institute for Cybersecurity
  • Features: 80+ network traffic features
  • Attack Types: DDoS, PortScan, Brute Force, Web Attacks, Infiltration, Botnet
  • Size: ~2.8M records

Fallback: NSL-KDD Dataset

  • Source: NSL-KDD (improved version of KDD Cup 99)
  • Features: 41 network features
  • Attack Categories: DoS, Probe, R2L, U2R

🚀 Installation

Prerequisites

  • Python 3.10 or higher
  • pip package manager
  • Administrator/Root privileges (for packet capture)

Step 1: Clone the Repository

git clone https://github.com/Priyanshu-Ku/NIDS-ML.git
cd NIDS-ML

Step 2: Create Virtual Environment

Windows (PowerShell):

python -m venv venv
.\venv\Scripts\Activate.ps1

Linux/Mac:

python3 -m venv venv
source venv/bin/activate

Step 3: Install Dependencies

pip install --upgrade pip
pip install -r requirements.txt

Step 4: Download Dataset

  1. Download the CICIDS 2017 dataset from the Canadian Institute for Cybersecurity website
  2. Extract the archive and place the CSV files in the data/raw/ folder
  3. Alternatively, use NSL-KDD as a fallback

Step 5: Verify Installation

python -c "import numpy, pandas, sklearn, tensorflow, scapy; print('All packages installed successfully!')"

💻 Usage

1. Data Preprocessing

cd src
python data_preprocessing.py

This will:

  • Load the raw dataset
  • Clean and preprocess data
  • Apply encoding and normalization
  • Balance classes using SMOTE
  • Save preprocessed data

2. Feature Selection

python feature_selection.py

Select the most important features for better model performance.
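
As an illustration of one technique this step uses, tree-based importance ranking can be sketched as below; the function name and the `k` default are hypothetical, not `feature_selection.py`'s actual interface.

```python
# Sketch of tree-based feature selection: rank features by a Random
# Forest's impurity-based importances and keep the top k.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def top_k_features(X, y, feature_names, k=10):
    rf = RandomForestClassifier(n_estimators=100, random_state=42)
    rf.fit(X, y)
    # Sort feature indices by importance, descending
    order = np.argsort(rf.feature_importances_)[::-1][:k]
    return [feature_names[i] for i in order]
```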

3. Model Training

python model_training.py

This will:

  • Train multiple ML models
  • Perform hyperparameter tuning
  • Evaluate model performance
  • Save trained models to models/ folder
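
A minimal sketch of the tuning-plus-evaluation loop for one model, assuming a Random Forest and a deliberately small parameter grid (the real script trains more models and searches a larger space):

```python
# Illustrative training loop: tune a Random Forest with GridSearchCV
# and report held-out accuracy.  The grid here is a toy example.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import accuracy_score

def train_random_forest(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )
    grid = GridSearchCV(
        RandomForestClassifier(random_state=42),
        {"n_estimators": [50, 100], "max_depth": [None, 10]},
        cv=3, scoring="f1_macro",
    )
    grid.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, grid.predict(X_te))
    return grid.best_estimator_, acc
```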

4. SHAP Explainability

python shap_explainability.py

Generate SHAP explanations and visualizations.

5. Real-Time Detection

Note: Requires administrator/root privileges

# Windows (Run as Administrator)
python realtime_detection.py

# Linux/Mac
sudo python realtime_detection.py
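
The detection loop pairs a feature extractor with a trained model. The sketch below stubs the packet as a plain dict so it runs without Scapy or root privileges; in `realtime_detection.py` the same fields would be read from Scapy packet layers, and the field names here are assumptions, not the script's actual schema.

```python
# Illustrative real-time classification flow with a stubbed packet.
import numpy as np

FIELDS = ["pkt_len", "src_port", "dst_port", "ttl"]  # hypothetical schema

def extract_features(pkt: dict) -> np.ndarray:
    # Missing fields default to 0 so malformed packets don't crash the loop
    return np.array([[pkt.get(f, 0) for f in FIELDS]], dtype=float)

def handle_packet(pkt: dict, model, log: list):
    x = extract_features(pkt)
    label = model.predict(x)[0]
    if label != 0:  # 0 = benign in this sketch
        log.append({"packet": pkt, "label": int(label)})  # alert / log it
    return label
```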

6. Launch Dashboard

streamlit run dashboard.py

Access the dashboard at http://localhost:8501


๐Ÿ“ Module Description

data_preprocessing.py

Handles all data preprocessing tasks including loading, cleaning, encoding, normalization, balancing, and train-test splitting.

Key Classes:

  • DataPreprocessor: Main preprocessing pipeline

Key Functions:

  • load_data(): Load dataset from CSV
  • clean_data(): Remove duplicates and handle missing values
  • encode_features(): Encode categorical features
  • normalize_features(): Standardize/normalize features
  • balance_dataset(): Apply SMOTE for class balancing
  • split_data(): Train-test split with stratification

feature_selection.py

Implements various feature selection techniques to identify the most important features.

Key Classes:

  • FeatureSelector: Feature selection pipeline

Key Methods:

  • correlation_analysis(): Remove highly correlated features
  • univariate_selection(): Chi-square, ANOVA F-test
  • tree_based_importance(): Random Forest importance
  • recursive_feature_elimination(): RFE
  • apply_pca(): Dimensionality reduction
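
As an example, correlation-based pruning in the style of `correlation_analysis()` can be sketched as dropping one feature from every highly correlated pair (the 0.95 threshold is an assumed default):

```python
# Sketch of correlation-based feature pruning: for each pair with
# |Pearson r| above the threshold, drop one of the two features.
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is considered once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return df.drop(columns=to_drop)
```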

model_training.py

Trains and evaluates multiple ML and DL models.

Key Classes:

  • MLModelTrainer: Traditional ML model training
  • DeepLearningTrainer: Deep learning model training

Key Models:

  • Random Forest, XGBoost, SVM, KNN, Logistic Regression, Naive Bayes
  • LSTM, CNN, CNN-LSTM Hybrid

Key Features:

  • Hyperparameter tuning with GridSearchCV
  • Cross-validation
  • Comprehensive evaluation metrics
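
The evaluation step can be sketched as cross-validated scoring plus the held-out metrics listed above (the binary case is shown for ROC-AUC); the function shape is illustrative, not the trainer's real interface:

```python
# Sketch of model evaluation: 5-fold CV score plus held-out metrics.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)
from sklearn.model_selection import cross_val_score

def evaluate(model, X_tr, y_tr, X_te, y_te):
    cv = cross_val_score(model, X_tr, y_tr, cv=5, scoring="f1_macro")
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    return {
        "cv_f1_mean": cv.mean(),
        "accuracy": accuracy_score(y_te, pred),
        "precision": precision_score(y_te, pred, average="macro"),
        "recall": recall_score(y_te, pred, average="macro"),
        "f1": f1_score(y_te, pred, average="macro"),
        # ROC-AUC from positive-class probabilities (binary case)
        "roc_auc": roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]),
    }
```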

shap_explainability.py

Provides explainability for model predictions using SHAP.

Key Classes:

  • SHAPExplainer: SHAP-based explainability

Key Visualizations:

  • Summary plots (global importance)
  • Force plots (individual predictions)
  • Waterfall plots (feature contributions)
  • Dependence plots (feature interactions)

realtime_detection.py

Captures and analyzes live network traffic in real-time.

Key Classes:

  • PacketFeatureExtractor: Extract features from packets
  • RealtimeDetector: Real-time classification engine

Key Features:

  • Live packet capture using Scapy
  • Real-time feature extraction
  • On-the-fly classification
  • Intrusion logging and alerts

dashboard.py

Interactive Streamlit dashboard for visualization and monitoring.

Key Pages:

  • 🏠 Home: Overview and system status
  • 📊 Model Performance: Comparison and metrics
  • 🔴 Live Detection: Real-time monitoring
  • 🧠 Explainability: SHAP visualizations
  • 📈 Analytics: Historical data analysis
  • ⚙️ Settings: Configuration options

📈 Results

(To be updated after model training)

Expected Performance Metrics

Model                  Accuracy  Precision  Recall  F1-Score  ROC-AUC
Random Forest          98.5%     98.3%      98.7%   98.5%     99.2%
XGBoost                98.2%     98.0%      98.4%   98.2%     99.0%
SVM                    96.8%     96.5%      97.0%   96.7%     98.5%
Deep Learning (LSTM)   97.5%     97.2%      97.8%   97.5%     98.8%

🛠️ Technologies Used

Programming Languages

  • Python 3.10+

Machine Learning & AI

  • scikit-learn: Traditional ML algorithms
  • XGBoost: Gradient boosting
  • TensorFlow/PyTorch: Deep learning
  • imbalanced-learn: SMOTE for class balancing
  • SHAP: Explainable AI

Data Processing

  • pandas: Data manipulation
  • NumPy: Numerical computing
  • scipy: Scientific computing

Network & Security

  • Scapy: Packet capture and analysis

Visualization

  • Matplotlib: Static plots
  • Seaborn: Statistical visualization
  • Plotly: Interactive visualizations
  • Streamlit: Dashboard framework

Utilities

  • joblib: Model persistence
  • logging: System logging
  • tqdm: Progress bars

🔮 Future Enhancements

  1. Ensemble Stacking: Combine multiple models for better accuracy
  2. Distributed Detection: Multi-node deployment for large networks
  3. Auto-ML: Automated model selection and hyperparameter tuning
  4. Advanced DL: Transformer-based models, GNNs for network graphs
  5. Edge Deployment: Deploy on edge devices (Raspberry Pi, IoT gateways)
  6. Cloud Integration: AWS/Azure deployment with auto-scaling
  7. Zero-Day Detection: Anomaly detection for unknown attacks
  8. Mobile App: Android/iOS app for mobile monitoring
  9. API Development: REST API for integration with SIEM systems
  10. Blockchain Logging: Immutable intrusion logs using blockchain

👥 Contributors

  • [Your Name] - Lead Developer - [GitHub Profile]
  • [Teammate 1] - ML Engineer - [GitHub Profile]
  • [Teammate 2] - Data Scientist - [GitHub Profile]

Supervisor: [Supervisor Name]
Institution: [Your College/University]
Year: 2024-2025


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • Canadian Institute for Cybersecurity for CICIDS 2017 dataset
  • NSL-KDD dataset providers
  • Open-source community for amazing tools and libraries
  • Our project supervisor for guidance and support

📞 Contact

For questions or collaboration:


📝 Citation

If you use this project in your research, please cite:

@misc{nids-ml-2025,
  author = {Your Name},
  title = {Network Intrusion Detection System using Machine Learning},
  year = {2025},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/yourusername/nids-ml}}
}

⭐ Star this repository if you find it helpful!

๐Ÿ› Report bugs and suggest features in the Issues section.


Last Updated: November 10, 2025
