This project implements a Transformer-based framework for detecting and reconstructing multi-stage cyber attacks from Security Information and Event Management (SIEM) logs. The goal is to model event sequences at the session level, identify high-risk attack activity, and reconstruct plausible attack chains using attention-based correlations.
The system performs:
- Sequence-level attack detection using a Transformer encoder
- Risk-based supervision derived from SIEM metadata
- Post-hoc attack reconstruction using attention weights
- Greedy decoding and graph-based correlation analysis
- Analyst-friendly visualizations of reconstructed attack chains
The project satisfies all required deliverables of the assignment, including probability-based inference, reconstructed attack chains in JSON format, and visual attack-chain representations.
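One way an encoder can expose attention weights for post-hoc reconstruction is a custom layer built on `nn.MultiheadAttention`, since PyTorch's stock `nn.TransformerEncoder` does not return them directly. The following is a minimal sketch only; class names, dimensions, and the pooling choice are illustrative assumptions, not the project's actual implementation:

```python
import torch
import torch.nn as nn

class AttnEncoder(nn.Module):
    """Single-layer Transformer encoder that also returns attention weights."""
    def __init__(self, d_model=64, nhead=4, num_classes=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x):
        # Self-attention; weights averaged over heads are returned by default.
        a_out, a_weights = self.attn(x, x, x, need_weights=True)
        x = self.norm1(x + a_out)
        x = self.norm2(x + self.ff(x))
        # Mean-pool over positions for a sequence-level classification logit.
        logits = self.head(x.mean(dim=1))
        return logits, a_weights

model = AttnEncoder()
seq = torch.randn(2, 10, 64)      # (batch, seq_len, d_model)
logits, attn = model(seq)
print(logits.shape, attn.shape)   # torch.Size([2, 2]) torch.Size([2, 10, 10])
```

The returned `(batch, seq_len, seq_len)` attention matrix is what later reconstruction steps can consume.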
The experiments use the Advanced SIEM Dataset, a large-scale synthetic SIEM log dataset hosted on Hugging Face:
Dataset URL: https://huggingface.co/datasets/darkknight25/Advanced_SIEM_Dataset
```
Advanced-SIEM-Transformer/
│
├── data/
│   └── 1_data_loading.py        # Preprocessing and sequence construction
│
├── src/
│   └── model/
│       └── transformer.py       # Transformer encoder with attention extraction
│
├── train_transformer.py         # Training, evaluation, and reconstruction
├── reconstructed_chains.py      # Generates reconstructed_chains.json
│
├── processed/                   # Generated preprocessing artifacts (ignored by git)
├── results/                     # Model outputs and figures (ignored by git)
│
├── README.md
└── .gitignore
```
This project requires Python 3.10+ and the following core libraries:
```bash
pip install numpy pandas scikit-learn torch matplotlib networkx
```

Run the preprocessing script to:
- Load the dataset
- Normalize timestamps
- Construct sessions
- Build fixed-length sequences
- Encode features and labels
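The session-windowing idea behind these steps can be sketched roughly as follows; the column names, window length, and labeling rule here are assumptions for illustration, not the dataset's actual schema:

```python
import numpy as np
import pandas as pd

SEQ_LEN = 16  # assumed fixed sequence length

def build_sequences(df: pd.DataFrame, seq_len: int = SEQ_LEN):
    """Group events by session, order them in time, and pad/truncate
    each session to a fixed-length sequence of event codes."""
    df = df.sort_values("timestamp")
    sequences, labels = [], []
    for _, session in df.groupby("session_id"):
        codes = session["event_code"].to_numpy()[:seq_len]
        padded = np.zeros(seq_len, dtype=np.int64)
        padded[: len(codes)] = codes
        sequences.append(padded)
        # Label the whole sequence as attack if any event in it is flagged.
        labels.append(int(session["is_attack"].any()))
    return np.stack(sequences), np.array(labels)

events = pd.DataFrame({
    "session_id": [1, 1, 2, 2, 2],
    "timestamp":  [3, 1, 5, 4, 6],
    "event_code": [7, 2, 9, 4, 1],
    "is_attack":  [0, 0, 1, 0, 0],
})
X, y = build_sequences(events)
print(X.shape, y.tolist())  # (2, 16) [0, 1]
```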
```bash
python3 data/1_data_loading.py
```

This step generates artifacts in the processed/ directory, including:
- `sequences_cat.npy`
- `sequences_num.npy`
- `sequence_labels.npy`
- `sequence_event_ids.pkl`
Train the Transformer model, evaluate performance, and generate reconstruction artifacts:
```bash
python3 train_transformer.py
```

This script performs:
- Model training with class-weighted loss
- Evaluation with accuracy, precision, recall, F1-score, and ROC-AUC
- Attention extraction for reconstruction
- Greedy decoding of attack paths
- Graph reconstruction of event correlations
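Class-weighted loss for imbalanced attack labels can be set up along these lines (a sketch using inverse-frequency weights; the project's actual weighting scheme may differ):

```python
import numpy as np
import torch
import torch.nn as nn

# Inverse-frequency class weights from the training labels (assumed binary).
labels = np.array([0, 0, 0, 0, 1])       # toy label distribution
counts = np.bincount(labels, minlength=2)
weights = counts.sum() / (2.0 * counts)  # rarer class gets a larger weight

criterion = nn.CrossEntropyLoss(weight=torch.tensor(weights, dtype=torch.float32))
logits = torch.randn(5, 2)
loss = criterion(logits, torch.tensor(labels))
print(weights, float(loss))              # weights: [0.625 2.5]
```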
Outputs are saved in the results/ directory.
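As a rough illustration of the greedy decoding and graph-reconstruction steps, the sketch below follows the strongest attention edge out of each event and thresholds attention into a networkx digraph. Function names, the threshold, and the toy attention matrix are assumptions, not the project's actual code:

```python
import numpy as np
import networkx as nx

def greedy_chain(attn: np.ndarray, start: int, steps: int = 4):
    """Greedily follow the strongest attention edge from each event,
    skipping events already on the chain to avoid cycles."""
    chain, current = [start], start
    for _ in range(steps):
        weights = attn[current].copy()
        weights[chain] = -np.inf      # mask visited events
        current = int(np.argmax(weights))
        chain.append(current)
    return chain

def attention_graph(attn: np.ndarray, threshold: float = 0.2):
    """Keep only attention edges at or above a threshold as a directed graph."""
    g = nx.DiGraph()
    rows, cols = np.where(attn >= threshold)
    for i, j in zip(rows, cols):
        if i != j:
            g.add_edge(int(i), int(j), weight=float(attn[i, j]))
    return g

attn = np.array([
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.2, 0.1, 0.1, 0.6],
    [0.3, 0.3, 0.2, 0.2],
])
print(greedy_chain(attn, start=0, steps=3))  # [0, 1, 2, 3]
g = attention_graph(attn)
```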
Generate analyst-ready reconstructed attack chains in JSON format:
```bash
python3 reconstructed_chains.py
```

This produces:
results/reconstructed_chains.json
The JSON file contains reconstructed attack chains with:
- Ordered event IDs
- Anomaly scores
- Attention-based influence values
- Reconstruction method metadata
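A chain entry might look roughly like the following; the field names are illustrative, not the exact schema emitted by the script:

```json
{
  "chain_id": 1,
  "method": "greedy_attention_decoding",
  "events": [
    {"event_id": "evt-104", "anomaly_score": 0.91, "attention_influence": 0.37},
    {"event_id": "evt-221", "anomaly_score": 0.84, "attention_influence": 0.29}
  ]
}
```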
The following visual artifacts are generated automatically:
| File | Description |
|---|---|
| `confusion_matrix.png` | Classification performance |
| `roc_curve.png` | ROC curve for sequence-level detection |
| `mean_attention.npy` | Mean attention matrix (used for heatmaps) |
| `chain_1_timeline.png` | Timeline view of reconstructed attack chain |
| `chain_1_graph.png` | Graph visualization of reconstructed attack chain |
These figures are suitable for direct inclusion in the assignment report.
- Due to the synthetic and highly imbalanced nature of the dataset, ROC-AUC should be interpreted with caution.
- Attention weights are used as a proxy for event correlation and are not guaranteed to represent true causality.
- Reconstruction is performed post-hoc and does not influence model training.