Hybrid risk scoring engine combining an XGBoost attack predictor, an NLP payload classifier, and CVE-enriched features.
For complete technical coverage of every file under ml/, see ml/ML_MODULE_DOCUMENTATION.md.
The ML module provides the intelligent core of AIris Security. It is used in two ways:
- **Embedded in the backend** — `ml_service.py` loads trained models at startup and runs `run_hybrid_ml()` after every scan
- **Standalone microservice** — `inference_api.py` exposes a FastAPI endpoint on port 9000 (optional)
Core capabilities:
| Capability | Implementation |
|---|---|
| Payload classification | TF-IDF + Logistic Regression (payload_classifier.joblib) |
| Attack type prediction | Multi-class classifier (attack_predictor.joblib) |
| Risk scoring | Hybrid XGBoost + NLP blend, then scanner-evidence boosts |
| CVE context | NVD data enrichment via parse_cve_data.py |
```
ml/
├── src/
│   ├── __init__.py
│   ├── build_payload_dataset.py     Build labelled payload CSV from raw sources
│   ├── build_attack_dataset.py      Build attack feature dataset from scan fixtures
│   ├── data_ingest.py               Data loading and validation utilities
│   ├── feature_pipeline.py          Feature extraction shared by training + inference
│   ├── parse_cve_data.py            Parse NVD JSON feeds → processed CSV
│   ├── train_payload_classifier.py  Train TF-IDF + LogReg payload classifier
│   ├── train_attack_predictor.py    Train multi-class attack predictor
│   ├── inference.py                 Inference engine (used by backend ml_service)
│   └── inference_api.py             Optional standalone FastAPI ML microservice
│
├── models/                          Saved models (output of training scripts)
│   ├── payload_classifier.joblib
│   ├── attack_predictor.joblib
│   └── attack_label_encoder.joblib
│
├── data/
│   ├── raw/                         Source data (not committed)
│   │   ├── cve/                     NVD JSON feeds
│   │   └── payloads/                Raw payload text files
│   └── processed/                   Cleaned CSVs ready for training
│       ├── payloads.csv
│       └── attack_features.csv
│
├── tests/
│   ├── test_inference.py            Unit tests for inference.py
│   └── test_real_site_simulation.py Integration tests against fixture scan data
│
├── notebooks/                       EDA and evaluation notebooks
├── reports/                         Generated metrics / confusion matrices
├── requirements.txt
└── README.md
```
```bash
cd ml
python -m venv .venv

# Windows
.venv\Scripts\Activate.ps1
# Linux / Mac
source .venv/bin/activate

pip install -r requirements.txt
```

Key dependencies:

```
numpy>=1.21
pandas>=1.3
scikit-learn>=1.0
xgboost>=1.5
joblib>=1.1
nltk>=3.6
tqdm>=4.62
matplotlib>=3.4
```
Download NVD JSON feeds and parse them:

```bash
# Download (example — 2023 feed)
# https://nvd.nist.gov/vuln/data-feeds
python src/parse_cve_data.py --input data/raw/cve/ --output data/processed/
```

Output columns: `cve_id`, `description`, `severity`, `cvss_score`, `attack_vector`
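Once parsed, the processed CSV can be consumed with the standard library alone. A minimal sketch, assuming the output columns listed above (the `avg_cvss` helper itself is hypothetical, not part of the module):

```python
import csv

def avg_cvss(processed_csv: str) -> float:
    """Average CVSS score across parsed CVE rows (hypothetical helper)."""
    scores = []
    with open(processed_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            try:
                scores.append(float(row["cvss_score"]))
            except (KeyError, ValueError):
                continue  # skip rows with missing or malformed scores
    return sum(scores) / len(scores) if scores else 0.0
```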
Sources: PayloadsAllTheThings, SecLists, synthetically generated benign samples.
```bash
python src/build_payload_dataset.py
# Writes: data/processed/payloads.csv
```

Sample rows:

| payload | label |
|---|---|
| `' OR '1'='1' --` | SQLI |
| `<script>alert(1)</script>` | XSS |
| `../../etc/passwd` | PATH_TRAVERSAL |
| `normal search query` | BENIGN |
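This dataset feeds the TF-IDF + Logistic Regression training described below. A toy sketch of that approach with scikit-learn, using a made-up eight-sample dataset (not the project's real data or exact hyperparameters):

```python
# Toy sketch of the TF-IDF + Logistic Regression payload classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

payloads = [
    "' OR '1'='1' --", "1' UNION SELECT null --",
    "<script>alert(1)</script>", "<img src=x onerror=alert(1)>",
    "../../etc/passwd", "..%2f..%2fwindows/win.ini",
    "normal search query", "hello world",
]
labels = ["SQLI", "SQLI", "XSS", "XSS",
          "PATH_TRAVERSAL", "PATH_TRAVERSAL", "BENIGN", "BENIGN"]

# Character n-grams cope better with short, symbol-heavy payloads than words.
clf = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
    ("logreg", LogisticRegression(max_iter=1000)),
])
clf.fit(payloads, labels)

print(clf.predict(["'; DROP TABLE users; --"])[0])
```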
Built from scan result fixtures:
```bash
python src/build_attack_dataset.py
# Writes: data/processed/attack_features.csv
```

Features: `open_port_count`, `critical_port_flag`, `nikto_warning_count`, `ssl_issues_flag`, `dir_critical_count`, `cve_avg_severity`, ...
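The feature names above suggest how a scan result maps to model inputs. A minimal sketch (the exact logic and the "critical" port set are assumptions; the real extraction lives in `feature_pipeline.py`):

```python
def extract_attack_features(scan: dict) -> dict:
    """Sketch of deriving dataset columns from a scan result (assumed logic)."""
    cves = scan.get("cve_list", [])
    critical_ports = {21, 23, 445, 3389}  # assumed "critical" port set
    open_ports = scan.get("open_ports", [])
    return {
        "open_port_count": len(open_ports),
        "critical_port_flag": int(any(p in critical_ports for p in open_ports)),
        "nikto_warning_count": scan.get("nikto_warnings", 0),
        "ssl_issues_flag": int(bool(scan.get("ssl_issues"))),
        "dir_critical_count": scan.get("dir_critical_count", 0),
        "cve_avg_severity": (
            sum(c["severity"] for c in cves) / len(cves) if cves else 0.0
        ),
    }
```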
```bash
python src/train_payload_classifier.py
```

- Algorithm: TF-IDF vectoriser + Logistic Regression
- Input: raw payload strings
- Output classes: `SQLI`, `XSS`, `PATH_TRAVERSAL`, `COMMAND_INJECTION`, `BENIGN`
- Saved to: `models/payload_classifier.joblib`
- Typical accuracy: ~92% on the held-out test set
```bash
python src/train_attack_predictor.py
```

- Algorithm: multi-class XGBoost classifier
- Input: numerical scan features from `feature_pipeline.py`
- Output classes: `SQLi`, `XSS`, `RCE`, `Path Traversal`, `Weak SSL`, `Open Port`, `NONE`, ...
- Saved to: `models/attack_predictor.joblib`, `models/attack_label_encoder.joblib`
```python
from ml.src.inference import predict_attack

result = predict_attack(findings, scanner_results)
# Returns: {attack_type, risk_score, confidence, predicted_attack_types, cve_context}
```

Run the standalone microservice:

```bash
uvicorn ml.src.inference_api:app --host 0.0.0.0 --port 9000
```

Example request:

```json
{
  "scan": {
    "open_ports": [22, 80, 443],
    "nikto_warnings": 5,
    "ssl_issues": true,
    "dir_critical_count": 2,
    "scanner_text": "Possible SQLi detected in /search?q=",
    "cve_list": [{"severity": 9.8}, {"severity": 7.5}]
  }
}
```

Example response:

```json
{
  "predicted_attack": "SQLi",
  "probabilities": {"SQLi": 0.91, "XSS": 0.05, "NONE": 0.04},
  "risk_score": 84,
  "explanation": "High SQL-related signals found."
}
```

`backend/app/services/ml_service.py` implements the full hybrid pipeline:
```
Scanner findings
  │
  ├── XGBoost attack predictor → attack_prediction / confidence
  └── NLP payload classifier   → payload confidence score
  │
Hybrid risk blend
  │
Scanner evidence boosts:
  • SSL weak ciphers    +3 each (cap +15)
  • Deprecated TLS      +5 flat
  • Exposed dir paths   +2 each (cap +20)
  • Critical dir finds  +10 flat
  │
clamp [0, 100]
  │
risk_score
```
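The boost stage above reduces to a few lines of arithmetic. A minimal sketch of the stated rules (the real implementation, including the ML blend that produces the base score, lives in `backend/app/services/ml_service.py`):

```python
def apply_evidence_boosts(base_score: float, weak_ciphers: int,
                          deprecated_tls: bool, exposed_dirs: int,
                          critical_dir_find: bool) -> int:
    """Sketch of the scanner-evidence boost stage (assumed faithful to the
    rules listed in the pipeline diagram)."""
    score = base_score
    score += min(3 * weak_ciphers, 15)       # weak SSL ciphers: +3 each, cap +15
    score += 5 if deprecated_tls else 0      # deprecated TLS: +5 flat
    score += min(2 * exposed_dirs, 20)       # exposed dir paths: +2 each, cap +20
    score += 10 if critical_dir_find else 0  # critical dir finds: +10 flat
    return int(max(0, min(100, score)))      # clamp to [0, 100]
```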
AIris provides 6 comprehensive ML metrics (1 baseline + 5 enhanced features) for deep security analysis:
**1. Risk Score**
- Range: 0-100
- Purpose: Overall threat severity assessment
- Calculation: Hybrid blend of ML confidence (60%) + scanner evidence (40%)
- Output: Integer score with color-coded severity (Critical/High/Medium/Low)
**2. Severity Distribution**
- Purpose: Shape of risk, i.e. how findings are distributed across severity levels
- Output Structure:

```json
{
  "critical": 5,
  "high": 8,
  "medium": 12,
  "low": 6,
  "informational": 3,
  "total": 34,
  "percentages": {"critical": 14.7, "high": 23.5, ...},
  "shape": "top-heavy"
}
```

- Shape Classifications:
  - `top-heavy`: ≥50% critical/high findings (urgent action required)
  - `balanced`: mixed distribution (steady remediation)
  - `low-heavy`: ≥50% low/info findings (hardened target)
- Use Case: Understand risk concentration and prioritize remediation efforts
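The shape classification above can be sketched as follows (the tie-breaking order between the two ≥50% checks is an assumption):

```python
def severity_shape(counts: dict) -> dict:
    """Sketch of the severity-distribution metric using the stated thresholds."""
    total = sum(counts.values())
    pct = {k: round(100 * v / total, 1) for k, v in counts.items()} if total else {}
    top = counts.get("critical", 0) + counts.get("high", 0)
    low = counts.get("low", 0) + counts.get("informational", 0)
    if total and top / total >= 0.5:
        shape = "top-heavy"   # urgent action required
    elif total and low / total >= 0.5:
        shape = "low-heavy"   # hardened target
    else:
        shape = "balanced"    # steady remediation
    return {**counts, "total": total, "percentages": pct, "shape": shape}
```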
**3. Attack Surface Exposure**
- Range: 0-100
- Purpose: Structural exposure measurement (independent of attack likelihood)
- Scoring Breakdown:
  - Port exposure (30 pts): each open port = +3 points
  - Path exposure (25 pts): exposed web paths = +2.5 points each
  - Protocol weakness (25 pts): deprecated TLS + weak ciphers = +5 points each
  - Service visibility (20 pts): services with CVEs = +4 points each
- Output: Integer score with exposure level (Minimal/Low/Moderate/High)
- Use Case: Measure attack surface before implementing security controls
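With the per-category point budgets above, the exposure score is a capped sum. A minimal sketch (capping each component at its stated budget is an assumption about how the budgets are enforced):

```python
def exposure_score(open_ports, exposed_paths, weak_protocol_items, cve_services):
    """Sketch of the attack-surface exposure metric with per-category caps."""
    score = (
        min(3 * len(open_ports), 30)             # port exposure, max 30
        + min(2.5 * len(exposed_paths), 25)      # path exposure, max 25
        + min(5 * len(weak_protocol_items), 25)  # protocol weakness, max 25
        + min(4 * len(cve_services), 20)         # service visibility, max 20
    )
    return int(score)
```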
**4. Attack Prediction Confidence**
- Purpose: Attack type classification with model uncertainty analysis
- Output Structure:

```json
{
  "primary": {"attack_type": "SQL Injection", "probability": 0.68},
  "secondary": {"attack_type": "XSS", "probability": 0.22},
  "uncertainty_gap": 0.46,
  "confidence_level": "high"
}
```

- Confidence Levels:
  - `high`: gap ≥ 0.4 (clear primary threat)
  - `medium`: gap 0.2-0.4 (monitor evolving threats)
  - `low`: gap < 0.2 (multiple attack vectors likely)
- Use Case: Understand model certainty and prepare for multiple attack scenarios
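The uncertainty gap is simply the difference between the top two class probabilities, bucketed by the thresholds above. A minimal sketch:

```python
def prediction_confidence(probabilities: dict) -> dict:
    """Sketch of the uncertainty-gap metric from a class-probability dict."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    (p1_name, p1), (p2_name, p2) = ranked[0], ranked[1]
    gap = round(p1 - p2, 4)
    if gap >= 0.4:
        level = "high"      # clear primary threat
    elif gap >= 0.2:
        level = "medium"    # monitor evolving threats
    else:
        level = "low"       # multiple attack vectors likely
    return {
        "primary": {"attack_type": p1_name, "probability": p1},
        "secondary": {"attack_type": p2_name, "probability": p2},
        "uncertainty_gap": gap,
        "confidence_level": level,
    }
```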
**5. Exploitability Score**
- Range: 0-100
- Purpose: CVSS-inspired ease-of-exploitation metric
- Scoring Factors:
  - Access Complexity (0-40): remote service exposure
  - Authentication Bypass (0-30): auth vulnerability detection
  - Impact Severity (0-30): CVE CVSS scores
- Output Structure:

```json
{
  "score": 78,
  "level": "high",
  "factors": {
    "access_complexity": 30,
    "authentication_required": 20,
    "impact_score": 28
  }
}
```

- Levels: Critical (80+), High (60-79), Medium (40-59), Low (<40)
- Use Case: Assess immediate exploitation risk and prioritize patching windows
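Given the three factor scores, the metric is their sum mapped through the level bands above. A minimal sketch (summing the factors directly is an assumption consistent with the example output):

```python
def exploitability(access_complexity: int, authentication_required: int,
                   impact_score: int) -> dict:
    """Sketch of the exploitability metric: factor sum (0-40, 0-30, 0-30)
    mapped to the stated level bands."""
    score = access_complexity + authentication_required + impact_score
    if score >= 80:
        level = "critical"
    elif score >= 60:
        level = "high"
    elif score >= 40:
        level = "medium"
    else:
        level = "low"
    return {"score": score, "level": level,
            "factors": {"access_complexity": access_complexity,
                        "authentication_required": authentication_required,
                        "impact_score": impact_score}}
```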
**6. Remediation Priorities**
- Purpose: Prescriptive ranked action list with estimated risk reduction
- Output Structure:

```json
[
  {
    "priority": 1,
    "category": "Patch/Update",
    "finding_count": 8,
    "severity_breakdown": {"critical": 3, "high": 5},
    "estimated_risk_reduction": 35,
    "actions": [
      "Update Apache to 2.4.59 (CVE-2024-1234)",
      "Patch OpenSSL to 3.0.14 (CVE-2024-5678)"
    ]
  }
]
```

- Categories:
  - Patch/Update: CVE-related vulnerabilities requiring software updates
  - Configuration: SSL/TLS settings, headers, server misconfigurations
  - Access Control: exposed paths, open ports, permission issues
  - Input Validation: SQLi, XSS, and other injection vulnerabilities
- Priority Calculation:
  - Critical findings = 10 points each
  - High findings = 5 points each
  - Medium findings = 2 points each
  - Boosted by exploitability level (1.3x-1.5x multiplier)
- Use Case: Create actionable remediation roadmap with estimated impact
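The priority calculation above can be sketched as follows. The exact multiplier per exploitability level is an assumption within the stated 1.3x-1.5x range:

```python
def remediation_priority_points(severity_breakdown: dict,
                                exploitability_level: str) -> float:
    """Sketch of the priority score: severity points times an
    exploitability boost (per-level multipliers are assumed)."""
    points = (10 * severity_breakdown.get("critical", 0)   # critical: 10 pts
              + 5 * severity_breakdown.get("high", 0)      # high: 5 pts
              + 2 * severity_breakdown.get("medium", 0))   # medium: 2 pts
    boost = {"critical": 1.5, "high": 1.4, "medium": 1.3}.get(
        exploitability_level, 1.0)
    return points * boost
```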
```bash
cd ml
pytest tests/
```

Tests cover:
- `test_inference.py` — unit tests for `predict_attack()` with fixture data
- `test_real_site_simulation.py` — integration test simulating a full scan result
After training, evaluation reports are saved to reports/:
- `model_performance.html` — interactive metrics (accuracy, F1, ROC)
- `confusion_matrix.png` — multi-class confusion matrix
Payload classifier metrics (typical):
| Metric | Value |
|---|---|
| Accuracy | ~92 % |
| F1 (macro) | ~0.91 |
| Precision | ~0.93 |
| Recall | ~0.90 |
All training data is sourced from public, open-licence repositories:
| Source | Licence | Use |
|---|---|---|
| PayloadsAllTheThings | MIT | Malicious payload samples |
| NVD / NIST | Public domain (US Govt) | CVE severity statistics |
| Synthetic benign samples | N/A — self-generated | Balance payload dataset |
| Kaggle SQLi/XSS datasets | CC0 / public | Additional payload labels |
No proprietary, private, or personally identifiable data is used. No payloads are executed against real systems. Models detect attacks — they do not generate them.
Last updated: March 2026 · v2.1.0