Hatch — Biophysical Protein Disorder Classifier

Disorder signals accumulate like water in a cup. Sustained, uncancelled disorder overflows it. Pure biophysics — no neural network, no sequence alignment, no training database.

89.70% accuracy. 0.052 ms per sequence. Four features. Pure arithmetic.

Forked from kmesiab/concept-model-experiment-protein-folding as an experimental implementation of the Concept Model (4-Layer Matrix) framework.

Performance

Version	Architecture	Mean F1	F1 Folded	F1 Disordered	Test Set
v1	Sliding window, full forgiveness	35.2%	34.9%	35.5%	400 seqs
v2	Consensus drain	46.9%	31.8%	61.9%	400 seqs
v3	Hysteresis thresholds	65.2%	64.3%	66.2%	400 seqs
v4	AUC-weighted scoring	70.0%	73.9%	66.1%	400 seqs
v5	Coordinate Descent calibration	75.4%	77.0%	73.8%	400 seqs
v6	Multi-scale ensemble	75.4%	77.0%	73.8%	400 seqs
v7	Global classifier + CD (7 features, k≥5)	89.4%	90.2%	88.6%	5,680 seqs
v7.1	4-feature unanimous vote	89.7%	89.8%	89.6%	5,680 seqs

The sliding window approach plateaued at 75.4% mean F1 after six versions of architectural improvement. A single architectural change, computing features over the full sequence instead of 30-residue windows, produced a 14.0-point jump to 89.4% (the test set also grew from 400 to 5,680 sequences at this step). A subsequent ablation study showed that the 3 weakest features add noise rather than signal: dropping them and requiring unanimous agreement among the 4 strong features gives the best result, 89.70% accuracy, with a simpler model.

Quick Start

# v7.1 — recommended (4 features, unanimous vote, 89.70% accuracy)
from inference_engine_v7_1 import load_v71_classifier

clf = load_v71_classifier()

# Hemoglobin alpha (folded)
result = clf.classify("MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPH...")
# {'prediction': 'FOLDED', 'conditions_met': 4, 'confidence': 1.0, 'elapsed_ms': 0.052}

# p53 TAD (disordered)
result = clf.classify("MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLML...")
# {'prediction': 'DISORDERED', 'conditions_met': 2, 'confidence': 0.50, 'elapsed_ms': 0.041}

The Features

Hatch classifies proteins using biophysical properties computed as whole-protein averages. An ablation study identified a clean two-tier structure:

Tier 1 — The Four Signal Features (v7.1)

A protein is predicted FOLDED if all 4 conditions are met (unanimous vote). These 4 features alone achieve 89.70% accuracy.

Feature	Window AUC	Direction	CD Threshold	Biological Rationale
Bulky Hydrophobic Freq	0.838	high=folded	0.2607	Hydrophobic core packing requires W, C, F, Y, I, V, L
Shannon Entropy	0.785	high=folded	3.8419	Folded proteins require diverse amino acid composition
Flexibility	0.778	low=folded	0.8345	Rigid backbone is a prerequisite for stable tertiary structure
Hydrophobicity	0.764	high=folded	0.4030	Hydrophobic collapse drives folding
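
Two of the four Tier 1 features follow directly from their definitions and can be sketched without any lookup table. The sketch below is an illustration, not the code in training_engine.py: the function names are hypothetical, and entropy is assumed to be measured in bits, since the 3.8419 threshold exceeds the natural-log maximum of ln 20 ≈ 3.00. Flexibility and Hydrophobicity depend on per-residue scales that this README does not specify, so they are left out.

```python
import math
from collections import Counter

# Residue set listed in the Tier 1 table for the hydrophobic core.
BULKY_HYDROPHOBIC = set("WCFYIVL")

def bulky_hydrophobic_freq(seq: str) -> float:
    """Fraction of residues that are W, C, F, Y, I, V or L."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(aa in BULKY_HYDROPHOBIC for aa in seq) / len(seq)

def shannon_entropy(seq: str) -> float:
    """Shannon entropy of the amino acid composition, in bits (assumed base 2)."""
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())

# Toy input: each of the 20 standard amino acids exactly once.
toy = "ACDEFGHIKLMNPQRSTVWY"
print(round(bulky_hydrophobic_freq(toy), 3), round(shannon_entropy(toy), 3))
# prints: 0.35 4.322
```

Both functions make a single pass over the sequence, in line with the O(n) cost stated for the classifier.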

Tier 2 — Noise Features (dropped in v7.1)

These 3 features have AUC ≤ 0.62 at the window level. Including them in the vote dilutes the signal from Tier 1 and slightly reduces overall accuracy.

Feature	Window AUC	Why Dropped
Proline Frequency	0.617	Marginal discriminative power; adds noise to the vote
Net Charge	0.599	Single charged residues dominate at short sequence lengths
H-Bond Potential	0.598	Barely above random; its AUC margin over chance (0.098) is 3.45× smaller than Bulky Hydrophobic's (0.338)

Ablation Results

Configuration	Accuracy	Mean F1
Bulky Hydrophobic alone	84.24%	84.24%
Top 4 features, k≥3	86.46%	86.15%
Top 4 features, k≥4 (unanimous)	89.70%	89.70%
Full 7 features, k≥5 (v7)	89.47%	89.41%

Bulky Hydrophobic Frequency alone, a single threshold on a single feature, already outperforms PONDR-FIT (81%), IUPred (80%), and DISOPRED3 (82%). The other three Tier 1 features add the remaining 5.46 points (84.24% to 89.70%) by catching edge cases where hydrophobic composition is atypical.
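
The k≥3 and k≥4 rows of the ablation table differ only in how many conditions must agree. A minimal sketch of that k-of-4 vote follows. The thresholds and directions are copied from the Tier 1 table, while the dictionary keys, function names, and example feature values are made up for illustration.

```python
# Thresholds and directions copied from the Tier 1 table.
# The dictionary keys are hypothetical names chosen for this sketch.
TIER1 = {
    "bulky_hydrophobic_freq": (0.2607, True),   # high = folded
    "shannon_entropy":        (3.8419, True),   # high = folded
    "flexibility":            (0.8345, False),  # low  = folded
    "hydrophobicity":         (0.4030, True),   # high = folded
}

def conditions_met(features: dict) -> int:
    """Count how many Tier 1 conditions point towards FOLDED."""
    met = 0
    for name, (threshold, folded_is_high) in TIER1.items():
        value = features[name]
        ok = value >= threshold if folded_is_high else value <= threshold
        met += int(ok)
    return met

def classify(features: dict, k: int = 4) -> str:
    """FOLDED if at least k of the 4 conditions are met; k=4 is the unanimous vote."""
    return "FOLDED" if conditions_met(features) >= k else "DISORDERED"

# Made-up feature values: three conditions pass, hydrophobicity fails.
example = {
    "bulky_hydrophobic_freq": 0.30,
    "shannon_entropy": 4.00,
    "flexibility": 0.80,
    "hydrophobicity": 0.35,
}
print(conditions_met(example), classify(example, k=3), classify(example, k=4))
# prints: 3 FOLDED DISORDERED
```

A protein that passes three of the four conditions flips from FOLDED to DISORDERED when the vote becomes unanimous. Proteins like this are the ones on which the k≥3 and k≥4 configurations disagree.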

AUC values measured on 794,870 training windows (646,623 folded, 148,247 disordered).
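
For reference, a per-feature AUC can be read as the probability that a randomly chosen folded window scores higher than a randomly chosen disordered one. The sketch below computes it by brute force and reports it direction-free. This assumes the Window AUC column is direction-corrected, which the table suggests (Flexibility is low=folded yet listed above 0.5) but does not state. The function names are hypothetical. The pairwise loop is quadratic, so for 794,870 windows a rank-based routine such as scikit-learn's roc_auc_score would be the practical choice.

```python
def pairwise_auc(folded, disordered):
    """P(folded value > disordered value), counting ties as one half."""
    wins = 0.0
    for f in folded:
        for d in disordered:
            if f > d:
                wins += 1.0
            elif f == d:
                wins += 0.5
    return wins / (len(folded) * len(disordered))

def directional_auc(folded, disordered):
    """AUC that ignores whether high or low values indicate folding."""
    a = pairwise_auc(folded, disordered)
    return max(a, 1.0 - a)

# Toy values (made up): a 'low = folded' feature separates the classes perfectly.
folded_vals = [0.70, 0.75, 0.80]
disordered_vals = [0.85, 0.90]
print(pairwise_auc(folded_vals, disordered_vals), directional_auc(folded_vals, disordered_vals))
# prints: 0.0 1.0
```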

The Scale Inversion: Shannon Entropy wins globally (85.51% solo) while Bulky Hydrophobic wins locally (AUC=0.838 at W=30). The same two features, the same proteins — the ranking inverts depending on the scale of measurement. Entropy is the global blueprint. Hydrophobicity is the local engine. See docs/SCIENCE.md for the full analysis.

The v7.1 Architecture

def classify(seq):
    # Whole-protein global feature averages
    features = compute_features(seq)   # 4 floats, indexed by feature name

    # Unanimous vote: all 4 conditions must be met
    conditions_met = sum(
        features[feat] >= threshold[feat] if folded_is_high[feat]
        else features[feat] <= threshold[feat]
        for feat in FEATURE_NAMES
    )

    return 'FOLDED' if conditions_met == 4 else 'DISORDERED'

O(n) time. No GPU. No external database. No sequence alignment. The simplest correct version of Hatch.

The Experiment: v1 → v7

This classifier was not designed from scratch. It was discovered through seven iterations of controlled experimentation, each testing a specific hypothesis about what was limiting performance.

v1: The "Entropy Eraser" (35.2% Mean F1)

The first classifier used a sliding window state machine: a "cup" fills when a window scores below threshold, and overflows to predict DISORDERED. The critical flaw was Full Forgiveness — a single passing window reset the cup to zero instantly, destroying all accumulated disorder evidence the moment the sequence encountered one ordered patch.
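
The failure mode can be reproduced with a minimal state machine. This is a sketch under assumptions: each failing window adds one unit to the cup, and overflow_at is a hypothetical capacity. The README does not state the actual fill increment or capacity.

```python
def cup_full_forgiveness(window_passes, overflow_at=3):
    """window_passes: iterable of bools, True means the window looks ordered.

    overflow_at is a hypothetical capacity chosen for this sketch.
    """
    level = 0
    for passed in window_passes:
        if passed:
            level = 0  # full forgiveness: one ordered window empties the cup
        else:
            level += 1
            if level >= overflow_at:
                return "DISORDERED"
    return "FOLDED"

# Six of eight windows fail, but the cup never reaches 3 because each pass resets it.
mostly_disordered = [False, False, True, False, False, True, False, False]
print(cup_full_forgiveness(mostly_disordered))
# prints: FOLDED
```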

v2: Consecutive Window Consensus Drain (+11.7%)

Hypothesis: A single passing window should not erase disorder evidence. Sustained order requires multiple consecutive passing windows.

The cup now drains only after N=2 consecutive passing windows. Mean F1 rose from 35.2% to 46.9%, and Disordered F1 improved by 26.4 points (35.5% to 61.9%). The N=2 optimum has a plausible biological reading: two consecutive W=30 windows span ~31–35 residues, roughly the length of a stable alpha-helix or beta-strand pair. This suggests that the minimum "sustained order" signal is approximately one secondary structure element.
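
A minimal sketch of the consensus drain, under the same assumptions as any simple cup model: each failing window adds one unit, overflow_at is a hypothetical capacity, and a drain empties the cup completely. Only n_consecutive=2 comes from the text.

```python
def cup_consensus_drain(window_passes, overflow_at=3, n_consecutive=2):
    """Drain the cup only after n_consecutive ordered windows in a row.

    overflow_at and the drain-to-zero behaviour are assumptions of this sketch.
    """
    level = 0
    streak = 0  # current run of consecutive passing windows
    for passed in window_passes:
        if passed:
            streak += 1
            if streak >= n_consecutive:
                level = 0
        else:
            streak = 0
            level += 1
            if level >= overflow_at:
                return "DISORDERED"
    return "FOLDED"

print(cup_consensus_drain([False, False, True, False]))        # lone pass does not drain
print(cup_consensus_drain([False, False, True, True, False]))  # two passes in a row drain
# prints: DISORDERED, then FOLDED
```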

v3: Hysteresis Asymmetric Thresholds (+18.3%)

Hypothesis: Marginal order (score=3/7) should stop the cup from filling even if it cannot drain it.

Decoupling fill and drain thresholds (T_fill=3, T_drain=2) prevents the system from "chattering" at the decision boundary — the engineering concept of hysteresis applied to biophysical classification. Mean F1 jumped from 46.9% to 65.2%. The dominant driver was T_fill=3 — a phase transition, not a gradual improvement.
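
The dead band between the two thresholds can be sketched as follows. T_fill=3 is taken from the text. The drain value used here (5) and the capacity are placeholders, because the README does not say in what units T_drain=2 is expressed. The function name and the drain-to-zero behaviour are also assumptions.

```python
def cup_hysteresis(scores, t_fill=3, t_drain=5, overflow_at=3):
    """scores: per-window count of folded-like conditions met (0 to 7).

    score <  t_fill            -> cup fills by one unit
    score >= t_drain           -> cup drains to zero
    t_fill <= score < t_drain  -> dead band: the level is held
    t_drain and overflow_at are hypothetical values for this sketch.
    """
    if t_drain < t_fill:
        raise ValueError("this sketch expects t_drain >= t_fill")
    level = 0
    for score in scores:
        if score < t_fill:
            level += 1
            if level >= overflow_at:
                return "DISORDERED"
        elif score >= t_drain:
            level = 0
        # otherwise: marginal order, neither fill nor drain
    return "FOLDED"

print(cup_hysteresis([2, 2, 3, 3, 2]))  # marginal windows hold the level, then overflow
print(cup_hysteresis([2, 2, 5, 2]))     # a strongly ordered window drains the cup
# prints: DISORDERED, then FOLDED
```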

v4: AUC-Weighted Feature Scoring (+4.8%)

Hypothesis: H-Bond Potential (AUC=0.598) should not have the same vote as Bulky Hydrophobic Frequency (AUC=0.838).

Replacing the binary feature sum with AUC-weighted scores improved Mean F1 from 65.2% to 70.0%. The unexpected finding: Bulky Hydrophobic Frequency, not Shannon Entropy, is the single most discriminative feature. The Hydrophobic Core signal dominates because disordered proteins structurally cannot maintain a dense hydrophobic core.
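
One simple way to realise AUC weighting is to let each satisfied condition contribute its AUC instead of 1, normalised so that a window meeting all conditions scores 1.0. The README does not give the exact weighting formula, so this is an assumed form. The AUC values are copied from the feature tables, and the dictionary keys and function names are hypothetical.

```python
# Window-level AUCs copied from the Tier 1 and Tier 2 tables; key names are hypothetical.
WINDOW_AUC = {
    "bulky_hydrophobic_freq": 0.838,
    "shannon_entropy": 0.785,
    "flexibility": 0.778,
    "hydrophobicity": 0.764,
    "proline_freq": 0.617,
    "net_charge": 0.599,
    "hbond_potential": 0.598,
}

def binary_score(met: dict) -> int:
    """v1 to v3 style: every satisfied condition counts as one vote."""
    return sum(bool(met[name]) for name in WINDOW_AUC)

def auc_weighted_score(met: dict) -> float:
    """Assumed v4-style score: satisfied conditions weighted by AUC, scaled to [0, 1]."""
    total = sum(WINDOW_AUC.values())
    return sum(WINDOW_AUC[name] for name in WINDOW_AUC if met[name]) / total

only_bulky = {name: name == "bulky_hydrophobic_freq" for name in WINDOW_AUC}
only_hbond = {name: name == "hbond_potential" for name in WINDOW_AUC}

# Both windows get the same binary score, but different weighted scores.
print(binary_score(only_bulky), binary_score(only_hbond))
print(auc_weighted_score(only_bulky) > auc_weighted_score(only_hbond))
```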

v5: Coordinate Descent Per-Feature Calibration (+5.4%)

Coordinate Descent over 7 per-feature thresholds (25 candidates each, ±30% of distribution overlap) improved Mean F1 from 70.0% to 75.4%. The dominant shift: the proline_freq threshold dropped by 50%. The midpoint threshold had been treating proteins with 3–5% proline as "disordered", when low proline is actually a folded signal.
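
The general shape of coordinate descent over thresholds is sketched below: sweep one threshold through its candidate values while the others stay fixed, keep any strict improvement, and stop when a full pass changes nothing. This is a toy illustration, not the code in feature_threshold_optimizer.py. It uses two made-up features with 3 candidates each instead of 7 features with 25, plain accuracy as the objective, and hypothetical names throughout.

```python
def coordinate_descent(start, candidates, objective, max_passes=10):
    """Optimise one threshold at a time, holding the others fixed.

    Returns (best_thresholds, best_score, passes_used).
    """
    best = dict(start)
    best_score = objective(best)
    names = list(start)
    passes = 0
    for _ in range(max_passes):
        passes += 1
        improved = False
        for name in names:
            for value in candidates[name]:
                trial = {**best, name: value}
                score = objective(trial)
                if score > best_score:
                    best, best_score, improved = trial, score, True
        if not improved:
            break  # a full pass changed nothing: converged
    return best, best_score, passes

# Toy data (made up): (feature a, feature b, is_folded); both features are high = folded.
SAMPLES = [
    (0.9, 0.9, True), (0.8, 0.7, True), (0.7, 0.8, True),
    (0.3, 0.9, False), (0.9, 0.2, False), (0.2, 0.3, False),
]

def accuracy(thresholds):
    """Accuracy of a unanimous two-feature vote on the toy samples."""
    hits = 0
    for a, b, is_folded in SAMPLES:
        predicted = a >= thresholds["a"] and b >= thresholds["b"]
        hits += predicted == is_folded
    return hits / len(SAMPLES)

grid = {"a": [0.1, 0.5, 0.95], "b": [0.1, 0.5, 0.95]}
best, score, passes = coordinate_descent({"a": 0.1, "b": 0.1}, grid, accuracy)
print(best, score, passes)
# prints: {'a': 0.5, 'b': 0.5} 1.0 2
```

The second pass finds no improvement and serves only to confirm convergence. Because only strict improvements are accepted, a poor starting point can leave the search on a plateau.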

v6: The Wall

Multi-scale ensembles (W=20/30/40), positional terminal de-weighting, and the Uversky charge-hydrophobicity ratio all failed to improve performance. The Uversky ratio — which achieves AUC=0.85–0.90 at the full-sequence level in the literature — scored only AUC=0.57 at W=30. The signal is macroscopic.

The architectural conclusion: The sliding window ceiling is not a tuning failure. The 7 biophysical features are global thermodynamic properties. At W=30, a single charged residue can dominate the net_charge of a window. The sliding window approach was treating local fluctuations as global disorder evidence.
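
The effect of a single charged residue is easy to check numerically. The sketch below assumes net charge is computed per residue as (K + R − D − E) / length, with histidine ignored. The README does not give the exact definition, and the function names and the sequence are made up.

```python
def net_charge_per_residue(seq: str) -> float:
    """(K + R - D - E) / length; histidine is ignored (an assumption of this sketch)."""
    positive = sum(aa in "KR" for aa in seq)
    negative = sum(aa in "DE" for aa in seq)
    return (positive - negative) / len(seq)

def windowed(seq: str, w: int = 30) -> list:
    """Net charge per residue for every full-length window of size w."""
    return [net_charge_per_residue(seq[i:i + w]) for i in range(len(seq) - w + 1)]

# Made-up 300-residue sequence: neutral alanines with a single lysine in the middle.
seq = "A" * 150 + "K" + "A" * 149
global_value = net_charge_per_residue(seq)
window_max = max(windowed(seq))
print(round(global_value, 4), round(window_max, 4))
# prints: 0.0033 0.0333
```

One lysine moves a 30-residue window by 1/30, ten times its effect on the 300-residue average. This is the kind of local fluctuation that full-sequence averaging suppresses.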

v7: Global Classifier + Coordinate Descent (+14.0%)

Computing the 7 features over the entire protein sequence instead of 30-residue windows raised mean F1 by 14.0 points, from 75.4% to 89.4%. The CD converged in just 2 passes (vs 4 passes for the window classifier) because the full-sequence averaging already removes the inter-feature correlations that required multiple passes to resolve at the window level.

Key CD finding: Bulky Hydrophobic Frequency threshold did not move at all. The dominant feature was already perfectly calibrated at the midpoint. All gains came from secondary features where the midpoints were systematically biased — H-Bond Potential (−0.0575) and Shannon Entropy (+0.0780).

Key Scientific Findings

1. The Biophysical Information Horizon. The sliding window classifier has a hard ceiling at ~75% mean F1 for this feature set. The ceiling exists because the 7 features are macroscopic thermodynamic properties that require full-sequence averaging to be discriminative. This is the "Scale Mismatch": the Uversky ratio achieves AUC=0.85 at the protein level but only AUC=0.57 at W=30.

2. The Hydrophobic Core is the dominant signal. Bulky Hydrophobic Frequency (AUC=0.838, Cohen's d=1.39) is the single most discriminative feature. Disordered proteins cannot maintain a dense hydrophobic core because it requires stable tertiary structure. This is a first-principles measurement of a known biochemical principle.

3. The N=2 structural invariant. The optimal consecutive window requirement is N=2, corresponding to ~31–35 residues — the length of one secondary structure element. The grid search found this independently across multiple parameter sweeps.

4. Coordinate Descent converges faster on global features. 4 passes for window-level calibration vs 2 passes for global calibration. The full-sequence averaging removes the inter-feature correlations that required multiple passes to resolve at the window level.

5. The dominant feature needs no calibration. Bulky Hydrophobic Frequency threshold did not move during CD optimization. The midpoint was already the optimal decision boundary for the most discriminative feature. The physics was right; we just needed to measure it carefully enough to see it.

6. Entropy is the global blueprint; hydrophobicity is the local engine. Shannon Entropy alone achieves 85.51% global accuracy, outperforming Bulky Hydrophobic alone (84.24%) — despite Bulky Hydrophobic having higher window-level AUC (0.838 vs 0.785). The ranking inverts at the global scale because sequence complexity is the information-theoretic prerequisite for folding: a protein cannot encode a unique 3D structure without sufficient amino acid diversity. See docs/SCIENCE.md.

Experiment Documentation

Full per-version experiment documentation is in docs/EXPERIMENTS.md.

Document	Description
docs/EXPERIMENTS.md	Full experiment index with summary table and key findings
docs/EXP_V1.md	v1: Full-Forgiveness baseline — the "Entropy Eraser" failure mode
docs/EXP_V2.md	v2: Consecutive Window Consensus Drain — +26.4% disordered F1
docs/EXP_V3.md	v3: Hysteresis Asymmetric Thresholds — the Hydraulic Phase Transition
docs/EXP_V4.md	v4: AUC-Weighted Feature Scoring — Bulky Hydrophobic is dominant
docs/EXP_V5.md	v5: Coordinate Descent Calibration — plateau entry at 75.4%
docs/EXP_V6.md	v6: Multi-Scale Ensemble + Uversky Ratio — plateau confirmed
docs/EXP_V7.md	v7: Global Classifier + CD — 89.47% accuracy
docs/SCIENCE.md	Deep-dive: entropy vs hydrophobicity, scale mismatch, robustness analysis
docs/ROADMAP.md	v8+ plan: Hybrid architecture targeting 91%+

Repository Structure

hatch/
├── inference_engine_v7_1.py     ← Production classifier (use this) ✓
├── inference_engine_v7.py       ← 7-feature global classifier
├── inference_engine_v5.py       ← Best sliding window classifier
├── global_classifier.py         ← Global classifier training + evaluation
├── global_cd_optimizer.py       ← Coordinate Descent optimizer (global)
├── training_engine.py           ← Feature computation + windowed training
├── feature_importance_scan.py   ← Per-feature AUC scan on 794,870 windows
├── feature_threshold_optimizer.py ← Coordinate Descent optimizer (windows)
├── global_thresholds_v7_cd.json ← CD-optimized thresholds (v7)
├── optimized_thresholds_v5.json ← CD-optimized thresholds (v5)
├── hatch_v7_final.png           ← Final experiment summary visualization
└── docs/
    ├── EXPERIMENTS.md           ← Experiment index + summary table
    ├── SCIENCE.md               ← Deep-dive: entropy vs hydrophobicity finding
    ├── ROADMAP.md               ← v8+ directions
    └── EXP_V1.md → EXP_V7.md   ← Per-version experiment docs

Install & Run

pip install numpy pandas scikit-learn matplotlib requests

# Classify a sequence (v7.1 — production, recommended)
python3 inference_engine_v7_1.py

# Run the full v7 pipeline (downloads data, trains, evaluates)
python3 global_classifier.py
python3 global_cd_optimizer.py

# Reproduce sliding window experiments
python3 grid_search_v5.py

Data Sources

Folded: RCSB PDB — representative non-redundant chains
Disordered: DisProt — experimentally validated intrinsically disordered proteins

Reference

If you use this code, please credit this repository and the original Concept Model framework:

Mesiab, K. (2024). Emergent Concept Modeling: A Paradigm Shift in AI. Substack

Built with first principles, not black boxes.

