🛡️ Network Intrusion Detection System (NIDS)

A machine learning-based system that classifies network traffic as normal or malicious, comparing the performance of seven supervised learning algorithms on a real-world cybersecurity benchmark dataset.

📌 Table of Contents

Overview
Dataset
Methodology
Tech Stack
Results
Key Findings
Project Structure
Getting Started
Future Improvements

Overview

Network Intrusion Detection Systems (NIDS) are a critical layer of defense in modern cybersecurity infrastructure. They monitor network traffic and flag suspicious connections that may indicate attacks such as DoS, probing, or unauthorized remote access.

This project builds a multi-class classifier that labels each network connection as normal or as one of four attack categories (DoS, Probe, R2L, U2R). Seven ML models are trained, tuned, and compared using standard classification metrics to determine the best-performing approach.

Dataset

KDD Cup 1999, a widely used benchmark in network intrusion detection research.

Source file: KDDCup Data 10 Percent.csv (a 10% subset of the full dataset)
Size: ~494,000 connection records
Features: 41 features per connection (protocol type, service, flag, byte counts, etc.)
Labels: 1 normal class + 4 attack categories (DoS, Probe, R2L, U2R), further broken down into ~22 specific attack subtypes

Supporting files:

kddcup.txt — column names and feature descriptions
training_attack_types.txt — mapping of attack subtypes to their parent categories
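
A minimal sketch of how the subtype-to-category mapping might be loaded. It assumes training_attack_types.txt holds one whitespace-separated `subtype category` pair per line, as in the original KDD Cup distribution; the copy in this repository is assumed to match. The function name `parse_attack_types` and the sample text are illustrative, not taken from the notebook.

```python
import io

def parse_attack_types(lines):
    """Build {subtype: category} from 'subtype category' lines."""
    mapping = {"normal": "normal"}  # normal traffic is not listed as an attack
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        subtype, category = parts
        mapping[subtype] = category
    return mapping

# Illustrative excerpt, not the full file
sample = io.StringIO(
    "back dos\nsmurf dos\nportsweep probe\nguess_passwd r2l\nrootkit u2r\n"
)
attack_map = parse_attack_types(sample)

# Labels in the original KDD files carry a trailing dot ("smurf."),
# so it is stripped before the lookup (assumption about the CSV used here)
raw_label = "smurf."
category = attack_map[raw_label.rstrip(".")]
print(category)
```

Passing an open file handle instead of the `io.StringIO` object would work the same way, since the function only iterates over lines.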

⚠️ Note: The KDD Cup 1999 dataset is a well-known academic benchmark. While it remains useful for learning and prototyping, it does not reflect the complexity of modern network traffic. Real-world deployment would require a contemporary dataset such as CICIDS2017.

Methodology

The project follows a standard ML pipeline:

1. Data Preprocessing

Column assignment using kddcup.txt
Label encoding of categorical features (protocol_type, service, flag)
Attack subtype mapping to parent categories via training_attack_types.txt
Feature scaling with StandardScaler
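
The preprocessing steps above can be sketched roughly as follows. This is an illustration on a four-row stand-in DataFrame, not the notebook's code: the column subset, the sample values, and the `attack_map` excerpt are assumptions made for the example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Tiny stand-in for the real CSV; only a few of the 41 columns are shown
df = pd.DataFrame({
    "duration": [0, 0, 2, 0],
    "protocol_type": ["tcp", "icmp", "tcp", "udp"],
    "service": ["http", "ecr_i", "telnet", "private"],
    "flag": ["SF", "SF", "SF", "S0"],
    "src_bytes": [181, 1032, 126, 0],
    "label": ["normal", "smurf", "guess_passwd", "normal"],
})

# Hypothetical excerpt of the subtype -> category mapping
attack_map = {"normal": "normal", "smurf": "dos", "guess_passwd": "r2l"}
df["category"] = df["label"].map(attack_map)

# Label-encode each categorical column with its own encoder
encoders = {}
for col in ["protocol_type", "service", "flag"]:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# Scale features to zero mean and unit variance
X = df.drop(columns=["label", "category"])
y = df["category"]
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.shape, list(y))
```

When a train/test split is used, fitting the scaler on the training portion only and then applying it to the test portion avoids leaking test statistics into training.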

2. Exploratory Data Analysis

Class distribution analysis
Feature correlation heatmaps using Seaborn
Visualizations of attack type frequencies
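
As a rough illustration of the analysis above, the class distribution and the correlation matrix behind a heatmap can be computed with pandas alone. The data below is synthetic and the column choice is an assumption; only the technique mirrors the steps listed.

```python
import pandas as pd

# Synthetic stand-in for the preprocessed data (values are illustrative only)
df = pd.DataFrame({
    "src_bytes": [100, 200, 300, 400, 500, 600],
    "dst_bytes": [10, 20, 30, 40, 50, 60],
    "count": [6, 5, 4, 3, 2, 1],
    "category": ["dos", "dos", "dos", "dos", "normal", "probe"],
})

# Share of each class, largest first
class_share = df["category"].value_counts(normalize=True)
print(class_share)

# Pearson correlation between numeric features;
# this matrix is what a heatmap would display
corr = df.drop(columns=["category"]).corr()
print(corr.round(2))
```

Passing `corr` to `seaborn.heatmap` renders the matrix as a heatmap, and `class_share.plot(kind="bar")` gives a quick view of attack frequencies.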

3. Model Training & Hyperparameter Tuning

Seven classifiers trained and evaluated
GridSearchCV used for hyperparameter optimization on select models
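
A minimal sketch of tuning one of the classifiers with GridSearchCV, run on synthetic data so that it is self-contained. The parameter grid, the 3-fold setting, and the choice of Random Forest here are assumptions for illustration; the grids actually searched in the notebook are not stated.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic 3-class data standing in for the scaled KDD features
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid: 2 x 2 = 4 candidate settings
param_grid = {"n_estimators": [25, 50], "max_depth": [None, 5]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,                # 3-fold cross-validation on the training split
    scoring="accuracy",
)
search.fit(X_train, y_train)

print(search.best_params_)
print(round(search.score(X_test, y_test), 3))
```

After fitting, `search.best_estimator_` is refit on the full training split with the best parameters, so it can be evaluated directly on held-out data.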

4. Evaluation

Each model was evaluated on:

Accuracy
Precision, Recall, F1-Score (weighted)
Confusion Matrix
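
The metrics above can be computed with scikit-learn as in the following sketch. The toy labels are invented for the example and do not come from the project's results.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Toy labels: 0 = normal, 1 = dos, 2 = probe (illustrative only)
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]

accuracy = accuracy_score(y_true, y_pred)

# "weighted" averages per-class scores by each class's support
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
print(cm)
```

Weighted recall always equals accuracy, so on a heavily imbalanced dataset `average="macro"` says more about how rare classes such as R2L and U2R are handled.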

Tech Stack

| Category | Libraries and components |
| --- | --- |
| Data Manipulation | `pandas`, `numpy` |
| Visualization | `matplotlib`, `seaborn` |
| Machine Learning | `scikit-learn`, `xgboost` |
| Preprocessing | `StandardScaler`, `LabelEncoder` (scikit-learn) |
| Tuning | `GridSearchCV` (scikit-learn) |
| Environment | `jupyter` |

Results

| Model | Accuracy |
| --- | --- |
| ✅ Random Forest | 99.76% |
| Decision Tree | 99.59% |
| K-Nearest Neighbors | 99.41% |
| XGBoost | 99.39% |
| AdaBoost | 94.54% |
| Hist Gradient Boosting | 93.10% |
| Gradient Boosting | 92.86% |

Key Findings

Random Forest achieved the highest accuracy (99.76%), likely due to its ensemble nature reducing variance on this structured tabular dataset.
Random Forest, Decision Tree, and XGBoost outperformed AdaBoost, Gradient Boosting, and Hist Gradient Boosting by roughly 5 to 7 percentage points. XGBoost is itself a boosting method, and scikit-learn's boosting models use trees by default, so the gap is not a simple trees-versus-boosting split.
KNN performed surprisingly well (99.41%), suggesting the feature space is highly clustered by class after scaling.
The lower accuracy of AdaBoost, Gradient Boosting, and Hist Gradient Boosting is possibly due to the heavy class imbalance in the KDD Cup dataset or to their hyperparameter settings. Because XGBoost reached 99.39% on the same data, boosting in itself does not appear to be the limiting factor.
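
To illustrate why class imbalance matters when reading accuracy figures, the sketch below scores a baseline that always predicts the majority class. The 95/5 split is an invented example, not the KDD Cup class distribution.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Synthetic labels: 95% class 0, 5% class 1 (proportions are illustrative)
y = np.array([0] * 95 + [1] * 5)
X = np.zeros((len(y), 1))  # features are ignored by the dummy model

# Baseline that always predicts the most frequent class
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = baseline.predict(X)

acc = accuracy_score(y, y_pred)
bal_acc = balanced_accuracy_score(y, y_pred)  # mean of per-class recall
print(f"accuracy={acc:.2f} balanced_accuracy={bal_acc:.2f}")
```

The baseline reaches 0.95 accuracy while detecting none of the minority class, which is why per-class recall or balanced accuracy is a useful companion to the accuracy column.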

Project Structure

Network-Intrusion-Detection-System/
│
├── NIDS.ipynb                     # Main notebook: EDA, training, evaluation
├── KDDCup Data 10 Percent.csv     # Training dataset (10% sample)
├── Workshop Dataset.csv           # Additional dataset used during analysis
├── kddcup.txt                     # Feature/column name definitions
├── training_attack_types.txt      # Attack subtype → category mapping
├── requirements.txt               # Python dependencies
└── README.md

Getting Started

Prerequisites

Python 3.8+
Jupyter Notebook

Installation

```shell
# Clone the repository
git clone https://github.com/periwalkeshav/Network-Intrusion-Detection-System.git
cd Network-Intrusion-Detection-System

# Install dependencies
pip install -r requirements.txt
```

Running the Notebook

```shell
jupyter notebook NIDS.ipynb
```

Make sure KDDCup Data 10 Percent.csv, kddcup.txt, and training_attack_types.txt are in the same directory as the notebook before running.
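
A small pre-flight check along these lines could be run in the first notebook cell to confirm the files are in place. The helper name `missing_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from the project structure
REQUIRED_FILES = [
    "KDDCup Data 10 Percent.csv",
    "kddcup.txt",
    "training_attack_types.txt",
]

def missing_files(directory="."):
    """Return the required files that are not present in `directory`."""
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]

missing = missing_files(".")
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All required files found.")
```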

Future Improvements

Evaluate on a modern dataset (e.g., CICIDS2017, UNSW-NB15) for more realistic benchmarking
Address class imbalance using SMOTE or class-weighted loss functions
Add a deep learning baseline (LSTM or 1D CNN) for sequential traffic analysis
Build a real-time inference pipeline using a trained model artifact
Perform feature importance analysis to identify the most predictive network attributes
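
Two of the items above, class weighting and feature importance analysis, might look roughly like the following. This is a sketch on synthetic imbalanced data, not planned project code; the sample sizes, class proportions, and model settings are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

# Synthetic imbalanced data: roughly 90% / 10% (illustrative, not KDD)
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4,
    weights=[0.9, 0.1], random_state=0,
)

# "balanced" weights are n_samples / (n_classes * class_count)
classes = np.unique(y)
weights = compute_class_weight("balanced", classes=classes, y=y)
print(dict(zip(classes.tolist(), weights.round(2).tolist())))

# The same weighting can be applied inside the model
model = RandomForestClassifier(
    n_estimators=50, class_weight="balanced", random_state=0
).fit(X, y)

# Impurity-based importances, highest first (indices refer to feature columns)
ranking = np.argsort(model.feature_importances_)[::-1]
print(ranking[:3])
```

Impurity-based importances can favor high-cardinality features, so `sklearn.inspection.permutation_importance` on held-out data is a common cross-check.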
