Workflows

This repo currently supports a small set of reusable workflows. If a task does not fit one of these, it is probably campaign history rather than a supported surface.

1. Mine

Purpose:

  • run a stage1 mining wave from a chosen prior over a prompt pack or prompt slice

Current scientific default:

  • use mining only when the team explicitly wants a paid diagnostic or targeted tranche
  • do not treat a blind broad 1M run as the default next move
  • if mining is used next, start with a 50k-75k p12/p24 exact-hole sweep before scaling to a 250k-300k targeted tranche
  • the prepared coverage-aware million remains a fallback reference design, but the current recommendation is offline manifold v2 construction

Primary entrypoints:

Config-driven mining examples:

Local-hosted stage1 trial path:

Outputs:

  • reports/raft/...
  • reports/raft_prompt_packs/...

2. Postprocess

Purpose:

  • finalize mining waves
  • merge finalized waves into exact-deduped and lineage-aware bundles
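
As an illustration of what "exact-deduped and lineage-aware" could mean in practice, here is a minimal sketch. It assumes each finalized row is a dict with a `sequence` key and that lineage is simply the list of waves a sequence appeared in; `merge_waves` and the row shape are hypothetical, not the repo's actual bundle format.

```python
def merge_waves(waves):
    """Merge finalized waves into one bundle.

    `waves` maps a wave name to a list of row dicts with a "sequence" key
    (hypothetical shape). Rows are deduped on the exact sequence string, and
    each surviving row records every wave it was seen in under "lineage".
    """
    merged = {}
    for wave_name, rows in waves.items():
        for row in rows:
            seq = row["sequence"]
            if seq not in merged:
                # First sighting: keep the row and start its lineage.
                merged[seq] = {**row, "lineage": [wave_name]}
            elif wave_name not in merged[seq]["lineage"]:
                # Exact duplicate from another wave: extend lineage only.
                merged[seq]["lineage"].append(wave_name)
    return list(merged.values())


waves = {
    "wave_a": [{"sequence": "MKT"}, {"sequence": "MKV"}],
    "wave_b": [{"sequence": "MKT"}, {"sequence": "MAA"}],
}
bundle = merge_waves(waves)
```

Here `bundle` holds three rows, and the row for `MKT` carries both waves in its lineage, so the dedup does not erase where a sequence came from.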

Primary entrypoints:

Outputs:

  • reports/raft/.../finalization_summary.json
  • reports/raft/.../bundle_summary.json

3. Analyze

Purpose:

  • inventory the broader finalized historical mining universe, not just the current canonical retrain bundle
  • measure anchor neighborhoods around strict and bridge-only hits before dedup/clustering compresses local structure away
  • test whether saved historical surfaces already contain a usable anchor-centered repair lane

Primary entrypoints:

Config-driven analysis example:

Operational rule:

  • this workflow is safe to stage on a dry-run launch pad first
  • do not process the historical corpus until you explicitly launch it

Current scientific read:

  • finalized-hit surfaces came back sparse
  • widening to screened finalized-report winners also came back sparse
  • so the analysis workflow now supports a real negative result as well as a discovery pass
  • current conclusion: no passive local basin exists in the saved finalized corpus; repair has to come from candidate-audit surfaces or deliberate perturbation generation

Outputs:

  • reports/analysis/.../universe
  • reports/analysis/.../neighborhoods
  • reports/analysis/.../shortlist

4. Build Dataset

Purpose:

  • build strict retrain curricula from old strict hits, mined strict reps, purebreds, and optional small anchors

Primary entrypoint:

Important current behavior:

5. Repair

Purpose:

  • build a repair pool from candidate-audit surfaces
  • run same-length native repair around the strongest geometry-positive anchors
  • validate strict survivors and check whether they materially improve retrain readiness

Primary entrypoints:

Config-driven repair example:

Operational rule:

  • treat this as the current gated branch, not as an unproven side experiment
  • scale it only with explicit concentration caps across parent runs and source waves
  • use a capped parent pool before native repair instead of trusting the raw merged pool order
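
The capping rule above can be sketched as a greedy selection. This is only an illustration under assumptions: the field names (`score`, `parent_run`, `source_wave`), the cap values, and both helper functions are hypothetical and do not describe the repo's implementation.

```python
from collections import Counter


def cap_parent_pool(candidates, max_per_run, max_per_wave, pool_size):
    """Pick up to `pool_size` parents, best score first, while limiting how
    many parents any single parent run or source wave may contribute."""
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    per_run, per_wave = Counter(), Counter()
    pool = []
    for cand in ranked:
        if len(pool) >= pool_size:
            break
        if per_run[cand["parent_run"]] >= max_per_run:
            continue  # this parent run is already at its cap
        if per_wave[cand["source_wave"]] >= max_per_wave:
            continue  # this source wave is already at its cap
        pool.append(cand)
        per_run[cand["parent_run"]] += 1
        per_wave[cand["source_wave"]] += 1
    return pool


def max_share(pool, key):
    """Largest fraction of the pool that comes from a single value of `key`."""
    counts = Counter(c[key] for c in pool)
    return max(counts.values()) / len(pool) if pool else 0.0


candidates = [
    {"id": "a", "score": 0.9, "parent_run": "r1", "source_wave": "w1"},
    {"id": "b", "score": 0.8, "parent_run": "r1", "source_wave": "w1"},
    {"id": "c", "score": 0.7, "parent_run": "r1", "source_wave": "w2"},
    {"id": "d", "score": 0.6, "parent_run": "r2", "source_wave": "w2"},
]
pool = cap_parent_pool(candidates, max_per_run=2, max_per_wave=2, pool_size=3)
```

In this toy input, candidate `c` is skipped even though it outranks `d`, because run `r1` has already reached its cap. That is the difference between a capped pool and the raw merged pool order.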

Current result:

  • the April 10 pilot succeeded:
    • 48 parents
    • 13,033 evaluated variants
    • 577 survivors
    • 192 strict shortlist rows
    • 122 strict-bridge consensus rows
    • readiness passed
  • the April 12 diversity-capped scale-up also succeeded:
    • 96 parents
    • 28,030 evaluated variants
    • 1,071 survivors
    • 231 strict shortlist rows
    • 128 strict-bridge consensus rows
    • readiness passed with 443 deduped tier-2 positives, 280 tier-1 proxy positives, largest_cluster_share = 0.0293, and max_source_share = 0.0655
  • the repair-derived strict set was then promoted into strict-core-v7-repair, which passed the stricter p48 smoke gate and reached stage-b-lite
  • the April 21/22 p12/p24 v9 repair rescue failed:
    • 134 geometry-dominant near-misses
    • 47,489 local variants evaluated
    • 79 loose repair survivors
    • 0 strict shortlist rows
    • 0 strict bridge/family/consensus rows
    • readiness failed with 0 retrain positives
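
To put the three runs on a common scale, the counts above can be turned into per-variant yields. The snippet uses only the figures listed in this section; the dictionary keys and the `rates` helper are illustrative names.

```python
# Counts copied from the three repair runs listed above.
runs = {
    "april_10_pilot": {"evaluated": 13_033, "survivors": 577, "strict_shortlist": 192},
    "april_12_scale_up": {"evaluated": 28_030, "survivors": 1_071, "strict_shortlist": 231},
    "april_21_22_v9_rescue": {"evaluated": 47_489, "survivors": 79, "strict_shortlist": 0},
}


def rates(run):
    """Return (survivor rate, strict shortlist rate) per evaluated variant."""
    survivor_rate = run["survivors"] / run["evaluated"]
    strict_rate = run["strict_shortlist"] / run["evaluated"]
    return survivor_rate, strict_rate


for name, run in runs.items():
    survivor_rate, strict_rate = rates(run)
    print(f"{name}: survivors {survivor_rate:.2%}, strict shortlist {strict_rate:.2%}")
```

The survivor rate falls from roughly 4.4% and 3.8% in the two successful runs to about 0.17% in the v9 rescue, and the strict shortlist rate drops to zero, even though the v9 rescue evaluated the most variants.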

Current operating rule:

  • repair remains a supported tool, but not a sufficient p12/p24 rescue path as currently implemented
  • do not train on failed repair survivors that pass ESM but fail family scaffold validation
  • future repair should preserve family manifold constraints before optimizing ESM or local geometry

6. Train

Purpose:

  • run SFT warmstarts from a dataset and base checkpoint

Primary entrypoints:

For strict experiment chains, the supported path is now config-driven:

Operator rule:

  • v9 strict config is a record of the attempted path, not a branch to train from the failed repair output
  • only build/train a new strict branch after the source strict pool passes readiness
  • manifold v1.x configs are now records of tested branch shapes; do not launch another paid replay without a new offline constructor result

7. Robustness

Purpose:

  • run fixed p12/p24/p48 robustness suites
  • run smaller p48 smoke gates before promoting a branch

Current strict-branch promotion rule:

  • a smoke pass now requires at least 2 seeds with hits and at least 2 prompts hit across seeds
  • isolated strict hits no longer justify automatic stage-B promotion
  • p48 functional hits without family-faithful signal no longer justify optimism by themselves
  • p12/p24 diagnostics should be run before spending on broader robustness when a branch is suspected of short-context collapse
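
The first rule above is mechanical enough to express as a check. The sketch below assumes hits are recorded as a mapping from seed to the set of prompt ids that produced a strict hit, and reads "at least 2 prompts hit across seeds" as two distinct prompts in the union over all seeds; both the data shape and the function name are assumptions.

```python
def smoke_pass(hits_by_seed, min_seeds=2, min_prompts=2):
    """Return True when enough seeds produced hits and enough distinct
    prompts were hit across all seeds.

    `hits_by_seed` maps a seed to the set of prompt ids with a strict hit
    (hypothetical shape).
    """
    seeds_with_hits = [seed for seed, prompts in hits_by_seed.items() if prompts]
    prompts_hit = set().union(*hits_by_seed.values())
    return len(seeds_with_hits) >= min_seeds and len(prompts_hit) >= min_prompts


isolated = {0: {"p1"}, 1: set(), 2: set()}   # one seed, one prompt
same_prompt = {0: {"p1"}, 1: {"p1"}}         # two seeds, but one prompt
spread = {0: {"p1"}, 1: {"p2"}}              # two seeds, two prompts
```

Only `spread` passes. The `isolated` case is the single strict hit that no longer justifies stage-B promotion.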

Primary entrypoints:

8. Reranker

Purpose:

  • build prompt-matched preference pairs from mined outputs
  • train a lightweight reranker and compare it against scalar proxy baselines
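
A minimal sketch of prompt-matched pair construction follows. It assumes each mined row carries a `prompt_id`, a `sequence`, and a scalar `score`, and that a pair is only formed between two outputs of the same prompt; the field names, the margin parameter, and the function itself are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_preference_pairs(rows, min_margin=0.0):
    """Build (chosen, rejected) pairs from outputs that share a prompt.

    Pairs whose score gap is not larger than `min_margin` are dropped, so
    near-ties do not become noisy preference labels.
    """
    by_prompt = defaultdict(list)
    for row in rows:
        by_prompt[row["prompt_id"]].append(row)

    pairs = []
    for prompt_id, group in by_prompt.items():
        for a, b in combinations(group, 2):
            if abs(a["score"] - b["score"]) <= min_margin:
                continue
            chosen, rejected = (a, b) if a["score"] > b["score"] else (b, a)
            pairs.append({
                "prompt_id": prompt_id,
                "chosen": chosen["sequence"],
                "rejected": rejected["sequence"],
            })
    return pairs


rows = [
    {"prompt_id": "p1", "sequence": "MKTA", "score": 0.9},
    {"prompt_id": "p1", "sequence": "MKTV", "score": 0.2},
    {"prompt_id": "p2", "sequence": "MAAA", "score": 0.5},
]
pairs = build_preference_pairs(rows)
```

Prompt `p2` contributes no pair because it has a single output. Matching on prompt keeps the reranker from learning prompt difficulty instead of output quality.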

Current scientific role:

  • reranker is a diagnostic / measurement lane first
  • do not treat it as a generator-training default until it clearly beats the strongest scalar reward baselines on the harder held-out splits

Primary entrypoints:

9. Manifold Construction

Purpose:

  • construct PETase/cutinase-family candidates from valid scaffolds instead of asking a generator to rediscover the whole joint constraint
  • preserve family length band, motif identity, active-site blueprint, catalytic spacing, and family-core screen as hard constraints
  • only optimize stability, novelty, and diversity after strict family validity is guaranteed
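
The ordering described above (hard constraints first, optimization second) can be sketched as a filter followed by a ranking. The validators here are toy placeholders: the length band, the `GS` motif, and the `stability` field are invented for illustration and are not the family constraints used in this repo.

```python
def select_candidates(candidates, validators, objective, top_k):
    """Drop every candidate that fails any hard validator, then rank the
    remaining ones by the soft objective and keep the best `top_k`."""
    valid = [c for c in candidates if all(check(c) for check in validators)]
    return sorted(valid, key=objective, reverse=True)[:top_k]


# Toy hard constraints (placeholders, not the real family checks).
def in_length_band(cand):
    return 4 <= len(cand["sequence"]) <= 6


def has_motif(cand):
    return "GS" in cand["sequence"]


candidates = [
    {"sequence": "AGSA", "stability": 0.40},
    {"sequence": "GSGSG", "stability": 0.90},
    {"sequence": "AAAA", "stability": 1.00},       # best score, no motif
    {"sequence": "AGSAAAAAA", "stability": 0.95},  # outside length band
]
picked = select_candidates(
    candidates,
    validators=[in_length_band, has_motif],
    objective=lambda c: c["stability"],
    top_k=2,
)
```

The highest-scoring candidate is rejected outright because it fails a hard constraint. The soft objective never gets a chance to trade validity for score.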

Current scientific role:

  • this is the recommended hard-route pivot after the v8 p12/p24 collapse and the failed v9 local repair rescue
  • Phase 1 is a supported validator-first local workflow
  • repair-frontier and curriculum builders are now available for offline branch design
  • the next pass should consume the v2 objective panel and construct new candidates offline before paying for another gate

Primary entrypoints:

Config-driven manifold example:

Current Phase 1 result:

  • 12,619 unique sequences in the scaffold bank
  • 4,893 family-manifold scaffolds
  • 3,769 strict-manifold scaffolds
  • 274 strict candidate positives
  • 0 strict-positive round-trip rejects
  • 79 recovered v9 negative rows with 0 family-manifold passes

Current Phase 2 result:

  • 10,000 same-length strict-manifold candidates
  • 4,067 one-mutants and 5,933 two-mutants
  • 79 contributing parent scaffolds
  • all 10,000 ESM-scored on the L40S
  • ESM scores: min 99.73, mean 99.9121, max 99.98
  • diversity selection passed readiness with 230 selected strict candidates, 79 parent scaffolds, 8 lengths, 133 bridge-quality rows, and 100 two-mutants
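
For same-length candidates, the one-mutant versus two-mutant split is a Hamming distance to the parent scaffold. The sketch below shows that classification on toy sequences; the function names and example sequences are illustrative only.

```python
from collections import Counter


def mutation_order(parent, candidate):
    """Number of positions where a same-length candidate differs from its parent."""
    if len(parent) != len(candidate):
        raise ValueError("mutation order is only defined for same-length candidates")
    return sum(1 for p, c in zip(parent, candidate) if p != c)


def tally_orders(pairs):
    """Count candidates per mutation order from (parent, candidate) pairs."""
    return Counter(mutation_order(parent, cand) for parent, cand in pairs)


pairs = [
    ("MKTAY", "MKTAF"),  # one substitution
    ("MKTAY", "MRTAF"),  # two substitutions
    ("MKTAY", "AKTAY"),  # one substitution
]
tally = tally_orders(pairs)
```

Applied to the Phase 2 set, a tally of this kind would be expected to return the 4,067 and 5,933 counts listed above, which together account for all 10,000 candidates.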

Manifold v1 transfer result:

Manifold v1.1 through v1.3 result:

Manifold v2.3 to v2.7 results:

  • v2.3 recovered the bridge but was found to be repeat-dependent (32-35aa).
  • v2.4 enforced a 20aa repeat gate and found 0 clean hits, exposing the limit of the prior objective.
  • v2.5 revealed the model optimizing to the boundary (16aa repeat dependency).
  • v2.5-Hit2 was authenticated as the first True Unicorn (repeat-independent).
  • v2.6 built a clean manifold around v2.5-Hit2 but generated 0 clean hits, proving SFT cannot generatively expand the clean bridge without explicit anti-artifact constraints.
  • v2.7 confirmed the same limitation persists in the stronger K2.6 base model.
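
A repeat gate of the kind referenced above can be sketched as a longest-repeated-substring check. This section does not define how repeats are measured, so the sketch assumes exact, non-overlapping repeats and treats a repeat of exactly the gate length as a failure; both choices, and the function names, are assumptions.

```python
def longest_repeat(seq):
    """Length of the longest substring that occurs at least twice in `seq`
    without the two occurrences overlapping (O(n^2) dynamic programme)."""
    n = len(seq)
    best = 0
    prev = [0] * (n + 1)
    for i in range(1, n + 1):
        cur = [0] * (n + 1)
        for j in range(i + 1, n + 1):
            # Extend a match ending at (i-1, j-1) only if the two copies
            # would still not overlap.
            if seq[i - 1] == seq[j - 1] and prev[j - 1] < j - i:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def passes_repeat_gate(seq, gate_length=20):
    """Accept only sequences whose longest exact repeat is shorter than the gate."""
    return longest_repeat(seq) < gate_length


unit = "ACDEFGHIKLMNPQRS"        # 16 distinct residues
boundary_case = unit + "TVWY" + unit
```

`boundary_case` carries a 16 aa repeat and still passes a 20 aa gate, which mirrors the v2.5 observation: a fixed threshold can be satisfied by repeats that sit just under it.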

Next workflow (Campaign Concluded):

  • The generative SFT discovery campaign is formally concluded.
  • The project pivots to Phase 6: either offline Local Library Design, or contrastive/preference/RL training (for example DPO) with explicit anti-artifact penalties.

Reference: