The project is no longer short on individual components. The repo has working sampling/eval harnesses, strict sequence validators, ESM proxy scoring, mined historical failures, Phase 7 structural evidence, and a hardened Phase 8 DPO dataset. The gap is that these pieces are still bricks, not a finished house: we need a coherent training-and-validation loop that teaches the model what physical protein realism means instead of repeatedly asking SFT to imitate narrow positive pockets.
Current checkpoint:
- repo state: main tagged at phase8-natural-positive-dpo-checkpoint
- active local dataset: data/phase8_dpo/dpo_preferences_hybrid_10k.jsonl
- dataset hash: 083ccc9ffa4c66f43451abc26664f548262162d3ab7ff5eba120ffd0de1b0e9c
- rows: 10,000
- chosen side: reviewed natural PETase/cutinase reference records only
- rejected side: Phase 7 fold-failed generated hard negatives plus length-preserving synthetic artifact replacements
- DPO smoke status: completed
- DPO pilot status: 3,000 pairs trained for 1 epoch
- DPO pilot checkpoint: tinker://68b86c30-7c34-5c97-bb55-01e139610267:train:0/weights/phase8-bio-dpo-pilot-3k-final
- current DPO-only evidence: one completed post-DPO slice, p12, temperature 0.8, seed 7
- current structural evidence: folded subset remains weak, 0 / 5 CA-triad passes and mean pLDDT 25.61-36.27
- interpretation: this is a budget-limited warning slice, not a high-resolution estimate of DPO-only yield
- canonical Phase 8 pilot note: phase8_dpo_pilot_readout.md
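
Before any paid run, the checkpoint facts above (dataset hash and row count) can be checked locally. The following is a minimal sketch, assuming the recorded hash is a SHA-256 digest over the raw file bytes (the 64-hex-character length is consistent with that, but the source does not state the algorithm) and that each non-empty line of the JSONL file is one row. The helper names `sha256_and_rows` and `matches_checkpoint` are hypothetical and are not part of the repo.

```python
import hashlib


def sha256_and_rows(path):
    """Return (hex digest, non-empty line count) for a JSONL file.

    Assumption: the checkpoint hash is SHA-256 over the raw bytes.
    """
    digest = hashlib.sha256()
    rows = 0
    with open(path, "rb") as handle:
        for line in handle:
            # Feeding the file line by line yields the same digest as
            # hashing the whole file at once.
            digest.update(line)
            if line.strip():
                rows += 1
    return digest.hexdigest(), rows


def matches_checkpoint(path, expected_hash, expected_rows):
    """True only if both the digest and the row count match."""
    actual_hash, actual_rows = sha256_and_rows(path)
    return actual_hash == expected_hash and actual_rows == expected_rows
```

A call such as `matches_checkpoint("data/phase8_dpo/dpo_preferences_hybrid_10k.jsonl", "083ccc9f...", 10_000)`, with the full hash from the checkpoint list, would then act as a cheap preflight gate.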
- The old positive-only SFT path has a clear empirical limit. It can learn local sequence motifs and shortcut geometry proxies, but it repeatedly failed to produce durable clean single-domain fold behavior.
- Phase 7 converted that failure into useful supervision. ColabFold separated the natural cutinase control from the generated panel, showing that high sequence-level scores were not enough.
- The April 29 natural-positive DPO rebuild fixed the largest paid-run blocker: generated fold-failed rows are no longer used as chosen positives.
- The active repo surface is clean enough to iterate from: current code, current docs, ignored local data, and archived historical clutter have distinct roles.
- The May 30 paid DPO pilot showed that the custom-loss DPO path is operational: DPO loss fell and reward margins rose across the 3k-pair run.
- We do not yet have evidence that one DPO/preference pass will produce a foldable novel enzyme.
- The one completed DPO-only eval slice is too small to estimate DPO-only yield.
- The folded DPO subset did not validate structurally, but the slice is too thin to determine whether this is objective failure, sampling variance, prompt/temperature sensitivity, or selection/folding noise.
- We do not yet know whether sparse OPD is sufficient without full-vocabulary logits.
- The project has not yet shown closed-loop improvement from structural failure evidence back into generation quality.
- ESM and sequence-level catalytic geometry remain useful filters, but they are not proof of fold or function.
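
For readers who want the two pilot metrics (DPO loss and reward margin) made concrete, here is a minimal sketch of the standard per-pair DPO formulation. It is not the repo's custom loss, whose details are not given here. The inputs are summed sequence log-probabilities under the policy and the frozen reference model, and `beta=0.1` is an illustrative default, not a value taken from the pilot.

```python
import math


def dpo_pair_stats(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Return (loss, reward_margin) for one preference pair.

    Standard DPO: implicit reward = beta * (policy logp - reference logp),
    loss = -log(sigmoid(chosen_reward - rejected_reward)).
    """
    chosen_reward = beta * (policy_chosen - ref_chosen)
    rejected_reward = beta * (policy_rejected - ref_rejected)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(m)) == log(1 + exp(-m)); written in a numerically stable form.
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return loss, margin


# A policy identical to the reference has zero margin.
untrained = dpo_pair_stats(-120.0, -118.0, -120.0, -118.0)
# A policy that raised the chosen log-prob and lowered the rejected one.
trained = dpo_pair_stats(-115.0, -125.0, -120.0, -118.0)
```

In this formulation a falling loss and a rising margin are two views of the same quantity, so they show that the optimizer separated the pairs, not that the generated sequences fold.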
The next scientific shape should be DPO characterization plus sparse OPD/multi-teacher comparison:
- Use the base PLM as the natural protein-language prior.
- Use natural PETase/cutinase records as the positive manifold anchor.
- Keep the May 30 DPO checkpoint as the active DPO-only baseline.
- When budget permits, run more DPO-only slices across prompts, temperatures, and seeds to locate where DPO helps or fails.
- Use fold-failed generated artifacts and new low-confidence generated candidates as explicit rejected examples.
- Prepare sparse OPD as the comparison branch for structural hallucination versus novelty.
- Validate DPO versus DPO + sparse OPD on matched compact structural slices before scaling either path.
This direction is stronger than another SFT replay because it directly targets the failure mode we actually observed: the model can satisfy local sequence screens while missing global structural realism.
- infrastructure/readiness: green for DPO; yellow for sparse OPD because teacher traces and target build still need to run
- dataset/preflight quality: green for DPO, not for a production claim
- immediate novel functional protein odds: yellow/red until a folded post-DPO candidate exists
- in-silico foldable novel candidate odds: yellow, plausible but unproven
- novel ML discovery odds: yellow/green, because the failure mode, dataset construction, and validation loop are already concrete
- main risk: mistaking one budget-limited DPO slice for a verdict, in either direction
The practical conclusion: proceed, but treat solo DPO as unresolved. The next paid work should either characterize DPO-only at higher resolution or run a matched sparse-OPD comparison, depending on budget.
The project is at a strategy reset.
The stage-b-lite mined-data engine, strict validators, robustness harness, and local repair tooling all work operationally. The scientific issue is that the current Kimi sampling plus strict-SFT plus repair loop is not reliably producing or preserving a robust PETase/cutinase-family manifold at the short-context p12/p24 gates.
Current canonical mined pool:
- 1,597,184 raw candidates across the first 1.0M tranche plus the 596,992 add-on tranche
- 179 exact-unique functional hits
- 54 exact-unique family-faithful hits
- 197 lineage clusters at 0.85
Core references:
- reports/raft/topoff1m-a-stageb-lite-1p6m-postprocess-20260329/bundle_summary.json
- reports/raft/topoff1m-a-stageb-lite-1p6m-postprocess-20260329/retrain_readiness_selected_only.json
v7 remains the best empirical branch.
- stage-A checkpoint: tinker://59c10b59-45ec-5ed4-92a9-7c06e4241d0b:train:0/weights/pearl-micro-sft-topoff1m-a-strict-core-v7-repair-stagea-lr1e6-ep3
- stage-B-lite checkpoint: tinker://7bb7b832-45c0-5ac0-8cea-1c3bc3f1d7ea:train:0/weights/pearl-micro-sft-topoff1m-a-strict-core-v7-repair-stageb-lite-lr5e7-ep1
- stage-A p48 smoke passed:
  - hits by seed [0, 2, 1]
  - prompt coverage 3 / 48
- full stage-B-lite robustness failed:
  - p12: [0, 0, 0], coverage 0 / 12
  - p24: [0, 2, 0], coverage 2 / 24
  - p48: [0, 3, 1], coverage 4 / 48
Interpretation:
v7 proved that repair-derived strict data can transfer, but it did not prove the model learned a broad, durable manifold.
References:
- reports/raft/topoff1m-a-strict-core-v7-repair-20260412/strict_core_v7_stage_a_summary.json
- reports/raft/topoff1m-a-strict-core-v7-repair-20260412/strict_core_v7_stage_b_lite_summary.json
- reports/robustness/pearl-topoff1m-a-strict-core-v7-repair-stageb-lite-robustness-2phase-p12p24p48-t08-s41s53s67/robustness_summary.json
v8 was built to broaden v7 with bucket-capped strict selection and more bridge-anchor diversity. It failed the intended test.
- stage-A checkpoint: tinker://0e007439-8486-58fd-8a5a-9769ced7e0b2:train:0/weights/pearl-micro-sft-topoff1m-a-strict-core-v8-coverage-stagea-lr1e6-ep3
- stage-B-lite checkpoint: tinker://789989aa-dbe7-522b-a82a-1bccd9060a06:train:0/weights/pearl-micro-sft-topoff1m-a-strict-core-v8-coverage-stageb-lite-lr5e7-ep1
- stage-A p48 smoke:
  - seed 41: 3 functional, 2 family-faithful
  - seed 53: 1 functional, 0 family-faithful
  - seed 67: 0 functional, 0 family-faithful
- full stage-B-lite robustness:
  - p12: functional [0, 0, 0], family-faithful [0, 0, 0]
  - p24: functional [0, 0, 0], family-faithful [0, 0, 0]
  - p48: functional [0, 3, 3], family-faithful [0, 0, 0]
- stage-A p12/p24 diagnostic:
  - p12: functional [0, 0, 0], family-faithful [0, 0, 0]
  - p24: functional [0, 0, 0], family-faithful [0, 0, 0]
Interpretation:
Stage B was not the only problem. The v8 stage-A generator itself failed the short-context manifold test.
The v9 rescue tried to repair v8 p12/p24 near-misses locally before training a new branch.
Config:
- configs/experiments/repair/topoff1m_a_v9_p12p24_repair_20260421.json
- configs/experiments/strict/topoff1m_a_strict_core_v9_p12p24_repair_20260421.json
Repair pool:
- 12 source audits
- 134 geometry-dominant near-misses
- 0 tier-2 hits
- mean ESM score 31.6049
- mean geometry score 0.5971
Native repair:
- 134 hits processed
- 47,489 local variants evaluated
- 79 loose survivors
- max survivor ESM 99.08
- mean survivor ESM 95.943
Strict validation:
- 0 strict shortlist
- 0 strict bridge
- 0 strict family
- 0 strict consensus
- 79 / 79 rejected
Dominant rejection reasons:
- 79 failed family core screen
- 79 missing family serine motif
- 79 outside family length band
- 61 above strict catalytic gap limit
Readiness:
- ready_for_retrain: false
- base positives: 0
- survivor positives: 0
Interpretation:
The repair pass found stable geometry-ish sequences, but they were not strict PETase/cutinase-family sequences. The failure is family-manifold drift, not runtime failure.
The manifold pivot produced a validator-first offline constructor and then a capped v1.1 p24-only train/gate. The branch completed operationally, but failed scientifically.
Artifacts:
- postmortem report: reports/analysis/manifold_v11_gate_postmortem_20260423/audit.md
- robustness summary: reports/robustness/pearl-topoff1m-a-manifold-v11-stagea-gate-p24-t08-s41s53s67-c128/robustness_summary.json
- gate decision: reports/robustness/pearl-topoff1m-a-manifold-v11-stagea-gate-p24-t08-s41s53s67-c128/p24_gate_decision.json
Gate result:
- completed runs: 3
- tier-2 hits by seed: [0, 0, 0]
- prompt coverage: 0 / 24
- selected candidates: 72
- raw candidates audited: 9,216
- raw single-motif candidates: 3,030
- raw geometry-valid candidates: 218
- raw ESM-valid candidates: 41
- raw single-motif plus geometry plus ESM candidates: 0
Interpretation:
v1.1 did not fail because the selector missed a hidden strict candidate. The sampled pool itself had no candidate satisfying the tier-2 proxy conjunction. The branch learned proxy fragments, especially stability-only and geometry-only rows, but did not enter the strict PETase/cutinase functional intersection.
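
The point that each proxy can be satisfied on its own while the conjunction stays empty can be illustrated with a small counting sketch. The `ProxyFlags` record and its field names are hypothetical stand-ins for whatever the audit rows actually contain, and the toy rows are invented for illustration, not taken from the v1.1 pool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyFlags:
    # Hypothetical per-candidate booleans; the real audit schema may differ.
    single_motif: bool
    geometry_valid: bool
    esm_valid: bool


def conjunction_counts(rows):
    """Count each proxy separately and the rows that pass all three."""
    counts = {"single_motif": 0, "geometry_valid": 0, "esm_valid": 0, "all_three": 0}
    for row in rows:
        counts["single_motif"] += row.single_motif
        counts["geometry_valid"] += row.geometry_valid
        counts["esm_valid"] += row.esm_valid
        counts["all_three"] += (
            row.single_motif and row.geometry_valid and row.esm_valid
        )
    return counts


# Toy pool: every proxy is satisfied somewhere, but never all in one row.
toy_pool = [
    ProxyFlags(True, True, False),   # geometry-valid, ESM-failing
    ProxyFlags(True, False, True),   # ESM-valid, geometry-failing
    ProxyFlags(True, False, False),  # single-motif background
    ProxyFlags(False, False, False), # motif failure
]
toy_counts = conjunction_counts(toy_pool)
```

The marginal counts alone can look encouraging. Only the `all_three` count corresponds to the tier-2 proxy conjunction, and that is the number that was zero for v1.1.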
The v1.2 offline lane builder has split the failed v1.1 pool into actionable lanes:
- 43 geometry-valid but ESM-failing rows
- 41 ESM-valid but geometry-failing rows
- 2,946 single-motif background negatives
- 6,186 motif-failure negatives
- 55 selected length-offtarget failures
These lanes are diagnostic/constructor inputs only. They are not a paid training set until offline replay produces nonzero single-motif plus geometry plus ESM candidates.
The first v1.2 offline repair-frontier pass produced a narrow positive:
- 4,678 strict pre-ESM repaired candidates
- 580 prompt-length/core-screen trainable pre-ESM candidates
- geometry-valid/ESM-failing smoke: 0 / 32 ESM-gate passes
- ESM-valid/geometry-failing smoke: 24 / 24 ESM-gate passes
- ESM-valid smoke score range: min 94.93, mean 95.9562, max 96.82
Interpretation:
v1.2 has shown that geometry can be repaired into high-ESM candidates for at least one ESM-valid source scaffold. The first ready smoke was too narrow because all 24 prompt-length-valid candidates came from one source row.
The follow-up one-per-source diagnostic changed the bottleneck:
- 41 ESM-valid/geometry-failing source representatives scored after repair
- 40 / 41 passed ESM>=85
- 35 / 41 passed ESM>=95
- only 1 / 41 remained ready under the original prompt-length gate
Interpretation:
The ESM-valid lane has real source breadth after geometry repair. The failure is prompt/length conditioning, not family-space viability.
The v1.2 breadth selector and length-retargeted curriculum are now built:
- selected strict/core/ESM repair candidates: 39
- unique sources: 38
- unique exact lengths: 29
- ESM score range: min 87.72, mean 98.0928, max 99.99
- prompt-retargeted rows: 37 / 39
- stage-A dataset: 47 rows, including 39 selected repairs and 8 purebred anchors
- max prompt-length delta after retargeting: 0
Interpretation:
v1.2 was a reasonable small paid p24-only proof because it was not a replay of the original failed prompts. The scientific bet was length-retargeted manifold distillation from repaired strict/core/ESM examples.
The v1.2 paid p24 proof recovered real but narrow transfer:
- completed runs: 3 / 3
- tier-2 hits by seed: [1, 1, 1]
- recovered functional hits: 3
- recovered family-faithful hits: 2
- prompt coverage: 3 / 24
- hit prompt steps: 2, 7, 14
The v1.3 follow-up replayed the v1.2 hits and added nearby support prompts, but regressed:
- stage-A dataset: 64 rows
- composition: 39 v1.2 breadth anchors, 8 support prompt scaffolds, 9 gate-hit replays, 8 purebred anchors
- tier-2 hits by seed: [0, 0, 1]
- prompt coverage: 1 / 24
- family-faithful hits: 0
- only recovered tier-2 event: seed 67, prompt step 11, bridge-only
Interpretation:
v1.2 showed a narrow family-faithful basin exists. v1.3 showed that support-prompt widening and higher trainable/stability counts are not enough to preserve that basin.
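
The gate numbers used throughout (hits by seed and prompt coverage) can be derived from a flat list of hit events. This is a minimal sketch, assuming prompt coverage means the number of distinct prompt steps with at least one tier-2 hit across all seeds; that reading is consistent with the figures above but is not confirmed by the source. The function `gate_summary` is hypothetical. For v1.2, the pairing of seeds to prompt steps is invented for illustration, since only the steps (2, 7, 14) and one hit per seed are reported.

```python
def gate_summary(hits, seeds, n_prompts):
    """Summarize a gate from (seed, prompt_step) hit events.

    Assumption: coverage counts distinct prompt steps hit by any seed.
    """
    hits = list(hits)
    hits_by_seed = [sum(1 for seed, _ in hits if seed == s) for s in seeds]
    covered_steps = {step for _, step in hits}
    return {
        "hits_by_seed": hits_by_seed,
        "prompt_coverage": (len(covered_steps), n_prompts),
    }


SEEDS = (41, 53, 67)

# v1.2 p24 proof: the seed-to-step assignment here is hypothetical.
v12 = gate_summary([(41, 2), (53, 7), (67, 14)], SEEDS, n_prompts=24)

# v1.3 p24 gate: the single event (seed 67, prompt step 11) is as reported.
v13 = gate_summary([(67, 11)], SEEDS, n_prompts=24)
```

Reporting both numbers matters because several hits on one prompt raise the per-seed count without widening coverage.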
- mining/data engine: operational
- eval/finalization engine: operational
- local repair tooling: operational
- strict validator: operational and useful
- v7: best historical branch, but narrow and possibly partly lucky
- v8: failed to broaden v7; regressed at p12/p24
- v9 repair rescue: failed to create trainable strict data from p12/p24 near-misses
- manifold v1.1: completed p24-only gate but produced 0 tier-2 hits and 0 raw strict-conjunction candidates
- manifold v1.2: recovered real but narrow post-ESM signal, with 3 tier-2 hits and 2 family-faithful hits across 3 / 24 prompts
- manifold v1.3: widened support prompts but regressed to 1 bridge-only tier-2 hit, 0 family-faithful hits, and 1 / 24 prompt coverage
- passive local-exploit lane in finalized corpus: absent
- current SFT/mining loop and current manifold stage-A replay recipe are not reliable routes to the strict manifold without a strategy change
Current governing objective:
Construct candidates inside the PETase/cutinase family manifold before optimizing stability or training behavior.
Current negative result:
Length-retargeting was necessary, but it was not sufficient. v1.3 showed that widening nearby prompt support can increase trainability and stability while still losing family-faithful bridge transfer. The next branch must optimize for family-faithful manifold retention, not just stability, geometry, or trainability.
Current positive result:
The tooling is good enough to separate bridge-only, stability-only, and family-faithful outcomes. That makes another blind replay hard to justify and gives the next offline constructor branch a clean positive/negative panel to learn from.
The scaffold-first pivot now has a concrete local entrypoint:
- config: configs/experiments/manifold/topoff1m_a_phase1_constructor_20260422.json
- runner: scripts/manifold_construction_experiment.py
- summary: reports/manifold/topoff1m-a-manifold-phase1-20260422/summary.json
- round-trip report: reports/manifold/topoff1m-a-manifold-phase1-20260422/roundtrip_report.json
Current Phase 1 result:
- 12,619 unique sequences in the scaffold bank
- 4,893 family-manifold scaffolds
- 3,769 strict-manifold scaffolds
- 274 strict candidate positives
- 272 strict-positive rows round-tripped with 0 rejects
- 79 recovered v9 negative rows, with 0 negative family-manifold passes
The shallow same-length search now has an ESM-scored frontier:
- frontier: reports/manifold/topoff1m-a-manifold-phase1-20260422/phase2_pre_esm_frontier.jsonl
- summary: reports/manifold/topoff1m-a-manifold-phase1-20260422/phase2_pre_esm_summary.json
- scored frontier: reports/manifold/topoff1m-a-manifold-phase1-20260422/phase2_esm_scored.jsonl
- score summary: reports/manifold/topoff1m-a-manifold-phase1-20260422/phase2_esm_score_summary.json
Current Phase 2 result:
- 10,000 strict-manifold same-length candidates
- 4,067 one-mutants
- 5,933 two-mutants
- 96 selected parent scaffolds
- 79 contributing parent scaffolds before the frontier cap was reached
- 8 unique lengths
- 10,000 / 10,000 ESM-scored on the L40S
- min 99.73, mean 99.9121, max 99.98
- all 10,000 scored >=95
- diversity/readiness selection passed with 230 selected strict candidates
- selected pool covers 79 parent scaffolds, 8 lengths, 133 bridge-quality rows across 48 parents, and 100 two-mutants
- selected ESM summary: min 99.8, mean 99.9225, max 99.98
We built the first small curriculum from the Phase 2 selected pool and tested whether the signal transferred back into Kimi generation.
Artifacts:
- config: configs/experiments/strict/topoff1m_a_manifold_curriculum_v1_20260422.json
- dataset summary: reports/raft/topoff1m-a-manifold-curriculum-v1-20260422/manifold_v1_stage_a_summary.json
- training report: reports/warmstart/pearl-micro-sft-topoff1m-a-manifold-v1-stagea-lr8e7-ep2/report.json
- robustness summary: reports/robustness/pearl-topoff1m-a-manifold-v1-stagea-gate-p12p24-t08-s41s53s67-c128/robustness_summary.json
- p12 gate: reports/robustness/pearl-topoff1m-a-manifold-v1-stagea-gate-p12p24-t08-s41s53s67-c128/p12_gate_decision.json
- p24 gate: reports/robustness/pearl-topoff1m-a-manifold-v1-stagea-gate-p12p24-t08-s41s53s67-c128/p24_gate_decision.json
Curriculum:
- 238 pairs
- 230 selected manifold Phase 2 rows
- 8 canonical purebred rows
- 234 unique sequences
- 133 bridge-quality selected rows
Gate result:
- p12: passed, tier-2 hits by seed [1, 2, 0], 2 / 3 seeds with hits, 3 prompts covered
- p24: failed, tier-2 hits by seed [0, 1, 0], 1 / 3 seeds with hits, 1 prompt covered
Interpretation:
The manifold pool is not inert; it can induce strict hits. But v1 still behaves like a narrow attractor, not a robust learned manifold. The immediate failure is p24 prompt coverage, not runtime or scoring infrastructure.
The v1.1 repair attacks the specific v1 failure mode: p24 prompt/length coverage.
Artifacts:
- audit report: reports/analysis/manifold_v1_gate_audit_20260422/audit.md
- audit JSON: reports/analysis/manifold_v1_gate_audit_20260422/audit.json
- v1.1 config: configs/experiments/strict/topoff1m_a_manifold_curriculum_v11_20260422.json
- v1.1 dataset summary: reports/raft/topoff1m-a-manifold-curriculum-v11-20260422/manifold_v11_stage_a_summary.json
Audit read:
- 23 p24 prompt holes
- 1 weak-hit p24 prompt
- 20 / 20 unique p24 requested lengths absent from the Phase 2 selected pool
- strict scaffold anchors exist at or within 1 aa of those p24 requested lengths
v1.1 dataset:
- 216 rows
- 160 balanced high-ESM Phase 2 anchors
- 48 exact p24 prompt-replay strict scaffold anchors
- 8 canonical purebred anchors
- 33 length buckets
- p24 replay anchor mean absolute length delta 0.042; max absolute delta 1
Interpretation:
v1.1 is not another blind retry. It directly patches the p24 length/prompt hole that v1 exposed. It is still only an offline dataset until reviewed.
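
As a worked check on the replay-anchor length statistic: with 48 anchors, integer lengths, and a maximum absolute delta of 1, a mean absolute delta of 0.042 is what results if exactly two anchors miss their requested length by one residue (2 / 48 is approximately 0.0417). That breakdown is an inference from the reported numbers, not something the dataset summary states. The lengths below are made up, and `length_delta_stats` is a hypothetical helper.

```python
def length_delta_stats(requested_lengths, anchor_lengths):
    """Return (mean absolute delta, max absolute delta) over paired lengths."""
    deltas = [abs(a - r) for r, a in zip(requested_lengths, anchor_lengths)]
    return sum(deltas) / len(deltas), max(deltas)


# Illustrative lengths only: 48 requests, two anchors off by one residue.
requested = [240] * 48
anchors = [240] * 46 + [241, 239]

mean_delta, max_delta = length_delta_stats(requested, anchors)
rounded_mean = round(mean_delta, 3)
```

Under these assumptions `rounded_mean` comes out at 0.042 and `max_delta` at 1, matching the reported pair.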
Primary next phase:
- consume the manifold v2 objective panel before spending again
- freeze its v1.2 family-faithful hits as positive anchors
- treat its v1.3 stable-only and geometry-only finalists as hard negatives
- include its v9/v1.1 drift examples as additional negative contrast
- start from natural references, canonical purebreds, old strict hits, mined family-faithful reps, April 12 strict repairs, and the v1.2 family-faithful hits
- infer and lock active-site blueprints
- permit only same-length edits that preserve:
  - family length band
  - canonical GxSxG motif identity
  - single active-site motif
  - catalytic S/D/H spacing
  - family core screen
- optimize ESM/stability and novelty only after strict family validity is guaranteed
- require nonzero family-faithful density and prompt/length obedience offline before any new paid gate
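
The same-length edit constraints in the list above can be sketched as a pre-filter. This is an illustrative check under stated assumptions, not the repo's strict validator: the motif is matched as the regex `G.S.G`, the blueprint is a tuple of 0-based parent indices for the catalytic S/D/H residues, and the length band is passed in as a hypothetical `(low, high)` pair. The family core screen is repo-specific and is not modeled here.

```python
import re

# Lookahead so overlapping motif occurrences are all counted.
GXSXG = re.compile(r"(?=(G.S.G))")


def motif_sites(seq):
    """Return [(start_index, five_residue_motif), ...] for every G.S.G site."""
    return [(m.start(), m.group(1)) for m in GXSXG.finditer(seq)]


def edit_preserves_blueprint(parent, child, triad_positions, length_band):
    """True if a same-length edit keeps the locked active-site blueprint.

    Checks: identical length inside the band, exactly one GxSxG site with
    unchanged position and residues, and unchanged catalytic residues at
    the blueprint indices (which also fixes their spacing).
    """
    low, high = length_band
    if len(child) != len(parent) or not (low <= len(child) <= high):
        return False
    parent_sites = motif_sites(parent)
    if len(parent_sites) != 1 or motif_sites(child) != parent_sites:
        return False
    return all(child[i] == parent[i] for i in triad_positions)


# Toy 15-residue example; real family sequences are far longer.
PARENT = "AAGYSQGAADAAHAA"   # motif GYSQG at 2, S at 4, D at 9, H at 12
TRIAD = (4, 9, 12)
BAND = (10, 20)
```

For example, `edit_preserves_blueprint(PARENT, "MAGYSQGAADAAHAA", TRIAD, BAND)` accepts a substitution outside the blueprint, while mutating the motif serine or the histidine is rejected. Running the invariant check before scoring follows the ordering stated above: ESM/stability and novelty are optimized only after strict family validity is guaranteed.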
Current v2 objective panel:
- 2 v1.2 family-faithful positive anchors
- 45 v1.3 hard negatives
- 305 v9/v1.1 drift negatives
- 190 historical support positives
- readiness: not paid-gate ready; this is the objective input for the next offline constructor pass
Current v2 offline constructor and curriculum:
- 340 hard-gated pre-ESM frontier candidates
- expanded scoring pool: 192 candidates, all ESM>=85
- final reselected set: 34 strict/core/ESM candidates
- breadth: 18 parent source keys, 14 exact lengths, and 8 length bins
- finalized curriculum: 42 rows, with 34 v2-selected candidates and 8 purebred anchors
- p24/c128 diagnostic: completed operationally but failed durability with tier-2 hits [0, 1, 0], prompt coverage 1 / 24, and 0 family-faithful hits
Current status:
- v2.1: Learned geometry but collapsed stability (repeat-assisted signal).
- v2.2: Restored stability but lost bridge basin.
- v2.3: Rediscovered bridge but revealed major "tandem-repeat" artifact loophole.
- v2.4: Clean-room revalidation (repeat gate enforced). 0 bridge hits.
- v2.5: Revealed boundary optimization (16aa repeat dependency), but found True Unicorn v1 (v2.5-Hit2).
- v2.6: Clean-manifold promotion. 0 clean hits. Proved SFT cannot generatively expand the clean bridge without anti-artifact constraints.
- v2.7: K2.6 control. 0 clean hits. Confirmed limitation persists in stronger models.
- Verdict: SFT discovery campaign complete. Generative SFT limit reached.
Interpretation:
The clean bridge manifold expansion now requires either local library design/directed evolution around True Unicorn v1, or contrastive/preference/RL training with explicit anti-artifact penalties. The generative SFT discovery campaign is formally concluded.
Reference:
- manifold_construction.md
- reports/analysis/manifold_v2_objective_panel_20260424/v2_objective_panel_summary.json
- reports/analysis/manifold_v2_offline_constructor_20260424/v2_constructor_summary.json
- reports/analysis/manifold_v2_offline_constructor_20260424_batch2/v2_constructor_final_selection_summary.json
- reports/curriculum/manifold_v2_20260424/summary.json
- reports/curriculum/manifold_v21_20260424/summary.json
Optional paid diagnostic:
- 50k-75k exact p12/p24 hole sweep
- only scale to 250k-300k targeted mining if strict or near-strict density appears
- avoid a blind 1M run unless smaller diagnostics justify it
- do not use paid mining as the immediate next step after the manifold v1.x failures
Current ruled-out default paths:
- another tiny strict-core SFT tweak
- training on the failed v9 repair outputs
- retrying manifold v1 unchanged
- launching a v1.4-shaped replay of v1.3
- treating p48 functional hits without family-faithful signal as success
- blind 1M mining as the next default move
- continuing the local Gemma path unchanged
The paid v1.2 p24 proof changed the diagnosis:
- recovered functional hits: 3, one in each seed
- family-faithful hits: 2
- recovered hit prompt lengths: 241, 215, 236
- prompt coverage across seeds: 3 / 24
Interpretation:
v1.2 did reach the strict family manifold often enough to show the pivot was real. The remaining problem is basin width, not total absence of hits.
The v1.3 offline branch tested whether nearby support prompts would widen that basin:
- keep the 39 breadth-positive v1.2 anchors
- add 9 exact replays of the recovered gate hits
- add 8 scaffold-backed support prompts around the recovered hit lengths
- keep 8 purebred anchors
The paid v1.3 p24 gate failed that bet:
- tier-2 hits by seed: [0, 0, 1]
- prompt coverage: 1 / 24
- family-faithful hits: 0
- only recovered tier-2 event: seed 67, prompt step 11, bridge-only
Interpretation:
v1.3 raised trainable and stability-dominant counts, but did not preserve the v1.2 family-faithful basin. The next pass should be a v2 offline objective redesign, not another paid replay.
Reference artifacts:
- reports/analysis/manifold_v12_gate_audit_20260423/audit.md
- reports/raft/topoff1m-a-manifold-curriculum-v13-20260423/manifold_v13_stage_a.jsonl
- reports/raft/topoff1m-a-manifold-curriculum-v13-20260423/manifold_v13_stage_a_summary.json
- configs/experiments/strict/topoff1m_a_manifold_curriculum_v13_20260423.json
- supported workflow control flow is config-driven
- shared reusable logic lives under src/pearl
- historical PETase campaign wrappers live under archive/2026q1_topoff1m_a/scripts with compatibility symlinks left behind in scripts/
For full chronology and engineering incidents, use: