Online Algorithms for Real-Time Statistical Computations

Statistics Course Thesis — Cybersecurity

This repository contains the LaTeX source, Python experiments, and compiled PDF for a thesis investigating online algorithms for real-time statistical computation, with applications to intrusion detection and machine learning.

Abstract

This project investigates online algorithms for real-time statistical computation and their applications to cybersecurity and machine learning. We focus on streaming estimators (e.g., online mean and variance), change-point detection methods, and online learning techniques relevant to intrusion detection and anomaly detection in network traffic.

Repository Structure

├── main.tex                 # Root LaTeX document
├── main.pdf                 # Compiled thesis (tracked for convenience)
├── references.bib           # BibLaTeX bibliography
├── chapters/                # Chapter source files
│   ├── 1-Introduction.tex
│   ├── 2-Background.tex
│   ├── 3-Estimation.tex     # Online estimators (Welford, EMA)
│   ├── 4-Detection.tex      # CUSUM, EWMA control charts
│   ├── 5-Learning.tex       # SGD, online logistic regression
│   ├── 6-Study.tex          # NSL-KDD case study
│   └── 7-Conclusion.tex
├── experiment/              # Python experiment code
│   ├── experiment_nsl_kdd.py
│   └── requirements.txt
└── data/                    # NSL-KDD dataset
    ├── KDDTrain+.txt
    └── KDDTest+.txt

Building the Thesis

Requirements

TeX Live 2022+ or equivalent LaTeX distribution
Biber (for BibLaTeX bibliography processing)

Build Commands

# Recommended: use latexmk for automated builds
latexmk -pdf main.tex

# Manual build (if latexmk unavailable)
pdflatex main.tex
biber main
pdflatex main.tex
pdflatex main.tex

Running the Experiment

The case study compares batch vs online logistic regression on the NSL-KDD intrusion detection dataset.

Setup

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/macOS
# venv\Scripts\activate   # Windows

# Install dependencies
pip install -r experiment/requirements.txt

Run

python experiment/experiment_nsl_kdd.py \
	--data-root data \
	--eta 0.01 \
	--lambda-reg 1e-4 \
	--bootstrap-runs 1000 \
	--bootstrap-seed 42 \
	--results-json experiment/results/latest_results.json

Key CLI flags:

--data-root, --train-file, --test-file select alternative NSL-KDD splits.
--eta, --lambda-reg, --threshold tune the online learner without editing code.
--bootstrap-runs, --bootstrap-seed request variability estimates via paired bootstrap resampling of the test stream.
--results-json controls where a machine-readable metrics artifact is stored (use --no-json to skip).
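
The paired bootstrap behind --bootstrap-runs and --bootstrap-seed can be illustrated with a minimal sketch. This is not the code in experiment_nsl_kdd.py: the function name paired_bootstrap_diff, the choice of accuracy as the metric, and the percentile interval are assumptions made for illustration. The key idea is that both models are scored on the same resampled indices in every run, so the difference between them is estimated on identical test examples.

```python
import random

def paired_bootstrap_diff(y_true, pred_a, pred_b, runs=1000, seed=42):
    """Bootstrap the accuracy difference (model A minus model B).

    Returns (mean difference, lower bound, upper bound) of an
    approximate 95% percentile interval. Hypothetical helper.
    """
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    for _ in range(runs):
        # Draw one set of indices and reuse it for both models (the "paired" part).
        idx = [rng.randrange(n) for _ in range(n)]
        acc_a = sum(pred_a[i] == y_true[i] for i in idx) / n
        acc_b = sum(pred_b[i] == y_true[i] for i in idx) / n
        diffs.append(acc_a - acc_b)
    diffs.sort()
    lower = diffs[int(0.025 * (runs - 1))]
    upper = diffs[int(0.975 * (runs - 1))]
    return sum(diffs) / runs, lower, upper
```

If the resulting interval excludes zero, the observed gap between the two models is unlikely to be an artifact of which test examples happened to be sampled.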

Expected Output

The script prints dataset statistics, confusion matrices, and a publication-ready LaTeX table comparing:

Batch Logistic Regression: trained once on the full training set with class_weight="balanced".
Online Logistic Regression (SGD): pre-trained with a single pass, then updated prequentially (each test example is predicted first and used for an update afterwards) on the test stream.
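
The prequential (test-then-train) protocol used for the online learner can be sketched in plain Python. This is an illustrative sketch, not the repository's implementation: the class OnlineLogReg and the function prequential_accuracy are hypothetical names, and the parameters eta, lambda_reg, and threshold are assumed to play the roles suggested by the CLI flags of the same names.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class OnlineLogReg:
    """Logistic regression trained by SGD, one example at a time (hypothetical)."""

    def __init__(self, n_features, eta=0.01, lambda_reg=1e-4, threshold=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.eta = eta                # learning rate
        self.lambda_reg = lambda_reg  # L2 penalty strength
        self.threshold = threshold    # decision threshold on the probability

    def predict_proba(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def predict(self, x):
        return 1 if self.predict_proba(x) >= self.threshold else 0

    def update(self, x, y):
        # One SGD step on the L2-regularised log loss for a single example.
        err = self.predict_proba(x) - y
        self.w = [wi - self.eta * (err * xi + self.lambda_reg * wi)
                  for wi, xi in zip(self.w, x)]
        self.b -= self.eta * err

def prequential_accuracy(model, stream):
    """Predict each example before learning from it; return running accuracy."""
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)  # test first
        model.update(x, y)                     # then train
        total += 1
    return correct / total if total else 0.0
```

Under this protocol, pre-training amounts to calling update once per training example, after which prequential_accuracy is run over the test stream. Every test example is scored before the model has seen its label.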

The JSON artifact contains the same metrics, hyperparameters, confusion matrices, runtimes, and (when enabled) bootstrap summaries so that Table 6.1 and the reported confidence intervals can be regenerated directly from the repository.

Note: The executable code currently covers only the logistic-regression study from Chapter 6. The CUSUM/EWMA schemes discussed in Chapter 4 are presented at the theoretical level and do not yet have accompanying simulation scripts in this repository.
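
For orientation only, a one-sided CUSUM detector of the kind treated in Chapter 4 can be sketched in a few lines. This sketch is not part of the repository, and the names cusum_upper, mu0 (in-control mean), k (reference value), and h (decision threshold) are assumptions chosen for illustration.

```python
def cusum_upper(stream, mu0, k, h):
    """One-sided CUSUM for an upward mean shift (hypothetical helper).

    Accumulates deviations above mu0 + k and resets at zero.
    Returns the index of the first alarm, or None if no alarm is raised.
    """
    s = 0.0
    for t, x in enumerate(stream):
        s = max(0.0, s + (x - mu0 - k))
        if s > h:
            return t
    return None

# Mean shifts from 0 to 1 at index 50.
data = [0.0] * 50 + [1.0] * 50
alarm_index = cusum_upper(data, mu0=0.0, k=0.5, h=2.0)
```

Larger values of h reduce false alarms at the cost of a longer detection delay, and k is commonly set to half the shift size one wants to detect.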

Key Topics Covered

Chapter  Topic              Key Algorithms
3        Online Estimation  Welford's algorithm, EMA
4        Change Detection   CUSUM, EWMA control charts
5        Online Learning    SGD, online logistic regression
6        Case Study         NSL-KDD intrusion detection
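
As a concrete reference for the Chapter 3 estimators, the following is a minimal sketch of Welford's algorithm and an exponential moving average (EMA). The class and function names are hypothetical and are not taken from the thesis code. Seeding the EMA with the first observation is one common convention, assumed here.

```python
class Welford:
    """Single-pass running mean and variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Unbiased sample variance; undefined for fewer than two points.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")

def ema(stream, alpha):
    """Yield the exponential moving average, seeded with the first value."""
    avg = None
    for x in stream:
        avg = x if avg is None else alpha * x + (1 - alpha) * avg
        yield avg
```

Both estimators use constant memory per stream, which is what makes them suitable for real-time monitoring: each new observation is folded into the state and then discarded.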

License

Code (experiment/): MIT License
Dataset (data/): Public domain (see data/README.md for attribution)

See LICENSE.md for full details.

Author

Aldo Ristori
Master of Science in Cybersecurity — Statistics 25/26
