Status: Baseline
Mean F1: 35.2%
F1 Folded: 34.9% | F1 Disordered: 35.5%
Best config: W=30, K=3, score_threshold=5
This experiment tests whether a sliding-window state machine can classify proteins as folded or disordered. The machine accumulates "disorder evidence" in a cup, adding one unit for each failing window, and overflows when the cup level exceeds a threshold K. A single passing window (one where the sequence meets ≥ score_threshold of the 7 biophysical conditions) resets the cup to zero: full forgiveness.
cup_level = 0
for each window w_i:
    score = count(features meeting folded condition)  # 0–7
    if score >= score_threshold:
        cup_level = 0  # Full reset: the Hatch opens
    else:
        cup_level += 1
        if cup_level > K:
            return DISORDERED
return FOLDED
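The pseudocode above can be sketched as runnable Python. This is a minimal illustration, not the experiment's actual code: the function name `classify` and the idea of passing precomputed per-window scores (the 0–7 count of satisfied conditions) are assumptions made here, and the feature computation itself is left out.

```python
from typing import Sequence

FOLDED = "FOLDED"
DISORDERED = "DISORDERED"


def classify(window_scores: Sequence[int], k: int = 3, score_threshold: int = 5) -> str:
    """Run the full-forgiveness cup over precomputed per-window scores (0-7).

    Hypothetical helper: `window_scores` stands in for the per-window count
    of biophysical conditions met, which the source computes elsewhere.
    """
    cup_level = 0
    for score in window_scores:
        if score >= score_threshold:
            cup_level = 0  # passing window: full reset
        else:
            cup_level += 1  # failing window: one more unit of disorder evidence
            if cup_level > k:
                return DISORDERED  # cup overflowed
    return FOLDED


print(classify([2, 1, 3, 2]))        # four failing windows in a row -> DISORDERED
print(classify([2, 1, 3, 6, 2, 1]))  # reset at the fourth window -> FOLDED
```

The defaults mirror the best configuration reported above (K=3, score_threshold=5). With K=3, overflow needs four consecutive failing windows, which is why the second call returns FOLDED even though five of its six windows fail.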
The classifier performed at near-random levels (35.2% mean F1) across all 60 parameter combinations. The grid search found no configuration that significantly outperformed random chance. The two per-class F1 scores were approximately equal (34.9% folded, 35.5% disordered), indicating that the classifier was not capturing a meaningful signal.
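As a sketch of how such figures can be computed, the snippet below derives per-class F1 and their unweighted mean. It assumes that "Mean F1" is the plain average of the two per-class scores, which is consistent with the reported numbers ((34.9 + 35.5) / 2 = 35.2) but is not stated explicitly. The function names and the toy labels are illustrative and are not the experiment's data.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_f1(y_true: Sequence[str], y_pred: Sequence[str],
            labels: Sequence[str] = ("FOLDED", "DISORDERED")) -> float:
    """Unweighted (macro) average of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy labels for illustration only.
y_true = ["FOLDED", "FOLDED", "DISORDERED", "DISORDERED"]
y_pred = ["FOLDED", "DISORDERED", "FOLDED", "DISORDERED"]
print(f1_for_class(y_true, y_pred, "FOLDED"))      # 0.5
print(f1_for_class(y_true, y_pred, "DISORDERED"))  # 0.5
print(mean_f1(y_true, y_pred))                     # 0.5

# Consistency check against the reported per-class figures.
print(round((34.9 + 35.5) / 2, 1))  # 35.2
```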
The "Full-Forgiveness" mechanism is an Entropy Eraser. Disordered proteins frequently contain short hydrophobic patches or low-complexity regions that happen to score ≥ score_threshold in a single window. When this occurs, the cup resets to zero, destroying all accumulated evidence of disorder. The classifier has no memory of sustained disorder: the cup overflows only after more than K consecutive failing windows, so an occasional ordered window, recurring before the cup can refill, is sufficient to prevent overflow no matter how many windows fail overall.
The cup trace for the p53 TAD (the p53 transactivation domain, a textbook disordered protein) showed the cup resetting to zero at position 9 (a short hydrophobic patch in the TAD), then slowly refilling, then resetting again. The protein was classified as FOLDED despite being one of the best-characterized disordered proteins in the literature.
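A cup trace of this kind can be reproduced with a few lines of Python. The sketch below is illustrative only: `cup_trace` is a hypothetical helper, and the window scores are invented to show the reset pattern; they are not the p53 TAD scores from the experiment.

```python
from typing import List, Sequence


def cup_trace(window_scores: Sequence[int], score_threshold: int = 5) -> List[int]:
    """Cup level after each window, with no early exit, so resets stay visible."""
    trace: List[int] = []
    cup_level = 0
    for score in window_scores:
        # Passing window: reset to zero. Failing window: add one unit.
        cup_level = 0 if score >= score_threshold else cup_level + 1
        trace.append(cup_level)
    return trace


# Invented scores: three failing windows, then one passing window, repeated.
scores = [1, 2, 1, 6, 2, 1, 2, 5, 1, 1, 2, 7]
trace = cup_trace(scores)
print(trace)           # [1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0]
print(max(trace) > 3)  # False: the cup never exceeds K=3
```

In this made-up sequence 9 of 12 windows fail, yet the cup never exceeds K=3, so the state machine would return FOLDED. This is the same refill-and-reset pattern described for the p53 TAD trace.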
The failure mode lies not in the features or the thresholds but in the state machine's memory model. Full forgiveness is biologically incorrect: a single ordered window in a disordered protein is not evidence of a global fold. It is evidence of a local hydrophobic patch, which is common in intrinsically disordered proteins.
This experiment established the baseline and identified the specific failure mode that all subsequent versions address.