Clarify FX Borzoi rescue trajectory

AbdelStark · AbdelStark · commit ed7ba25ddd91 · 2026-06-11T13:54:51.000+02:00
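
Reviewer note: the Stage 1 go/no-go rule introduced by this change (identical row counts, at least 10,000 matched rows, enough positives and negatives, and an explicit record of whether the optional fipip join was run) can be sketched roughly as follows. This is a minimal illustration, not code from the repository. The names `sha256_receipt`, `AlignmentAudit`, `audit_row_alignment`, and the `min_per_class` threshold are hypothetical, and the labels and scores are assumed to be already loaded as plain Python sequences.

```python
import hashlib
from dataclasses import dataclass, field


def sha256_receipt(payload: bytes) -> str:
    """Hex SHA-256 digest, usable as an artifact receipt in a manifest."""
    return hashlib.sha256(payload).hexdigest()


@dataclass
class AlignmentAudit:
    """Outcome of the row-alignment audit (hypothetical structure)."""
    slice_rows: int
    score_rows: int
    positives: int
    negatives: int
    fipip_join_run: bool
    failures: list = field(default_factory=list)

    @property
    def go(self) -> bool:
        # The audit passes only when no failure was recorded.
        return not self.failures


def audit_row_alignment(labels, scores, *, min_matched=10_000,
                        min_per_class=1, fipip_join_run=False):
    """Check a TraitGym slice against a row-aligned score vector.

    min_per_class is a placeholder assumption: the plan asks for
    "enough positives and negatives" without fixing a number.
    """
    failures = []
    if len(labels) != len(scores):
        failures.append("row-count mismatch between slice and score vector")
    elif len(labels) < min_matched:
        failures.append(f"only {len(labels)} matched rows; need {min_matched}")

    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if min(positives, negatives) < min_per_class:
        failures.append("too few positives or negatives")

    return AlignmentAudit(
        slice_rows=len(labels),
        score_rows=len(scores),
        positives=positives,
        negatives=negatives,
        fipip_join_run=fipip_join_run,
        failures=failures,
    )
```

With `fipip_join_run=False`, a report built from this result would state that the exact join was not run and would make no claim of exact fipip table overlap. Row-order, duplicate-key, and split checks are left out of the sketch.
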
diff --git a/README.md b/README.md
@@ -324,8 +324,9 @@ jobs. Fixture outputs are test evidence, not model results.
   useful planning behavior.
 - No GenoLeWM-FX model or demo ships; the FX pivot is stopped at the
   feasibility gate.
-- The precomputed-Borzoi rescue path is an overlap audit first, not a
-  training or benchmark claim.
+- The precomputed-Borzoi rescue path is a TraitGym-native row-alignment
+  audit first; a full fipip table join is optional staged provenance, not
+  a training or benchmark claim.
 - No clinical utility claim; the public model evidence is not clinical
   evidence.
 - Personal-genome workflows are local-first, but local execution is not
diff --git a/docs/index.md b/docs/index.md
@@ -51,7 +51,8 @@ for Carbon-backed training paths.
 - No useful-planning claim from the current planning demo.
 - No GenoLeWM-FX model or demo ships; the FX pivot is stopped at the
   feasibility gate.
-- The only active FX follow-up is a narrow precomputed-Borzoi overlap
-  audit, not a model-quality claim.
+- The only active FX follow-up is a narrow TraitGym-native
+  precomputed-Borzoi row-alignment audit; a full fipip table join is
+  optional staged provenance, not a model-quality claim.
 - No runtime or privacy assurance beyond local execution contracts and
   checksum provenance.
diff --git a/docs/research/fx-borzoi-rescue-plan.md b/docs/research/fx-borzoi-rescue-plan.md
@@ -4,26 +4,44 @@ Status: follow-up trajectory #266 after issue #257.
 
 This plan does not reverse the #257 kill decision. It defines one narrow
 way to reopen the FX question without running expensive teacher
-inference: use precomputed Borzoi scores as the teacher-derived
-functional substrate, then test whether enough public TraitGym variants
-overlap to justify the residual-model path.
+inference: use public precomputed Borzoi-derived scores as the
+teacher-derived functional substrate, then test whether enough public
+TraitGym variants can be aligned to justify the residual-model path.
 
 ## Source Fact Pattern
 
 The statgen/fipip repository documents precomputed Borzoi scores for
 more than 19 million common and low-frequency variants. Those scores are
 based on hg19 and include both variant-effect predictions and principal
-components derived from those VEPs.
-
-That changes the blocker from "run a teacher" to "prove overlap,
-coordinate compatibility, score semantics, and reproducibility." If the
-overlap is too small or the score columns do not support the FX target,
-the kill decision remains in force.
+components derived from those VEPs. The public score table is available
+through Google Cloud Storage under
+`seqnn-share/sniff/borzoi_102_annotation_set/`, with the main compressed
+annotation table published as `sniff_102_annotations.gz`.
+
+The practical implementation finding is narrower than the original
+overlap wording. The full fipip table is public but large, so a
+first-pass rescue must not claim to have staged or joined it unless a
+local copy is explicitly provided. TraitGym also publishes row-aligned
+precomputed Borzoi artifacts for its matched slices, including the
+compact `complex_traits_matched_9/preds/all/Borzoi_L2_L2.plus.all.parquet`
+score vector. That artifact is the default executable substrate for the
+rescue trajectory. A full fipip table exact join remains an optional
+provenance and coordinate-integrity audit, not a prerequisite for the
+first cache/baseline gate.
+
+This changes the blocker from "run a teacher" to "prove row alignment,
+source identity, score semantics, split integrity, and reproducibility."
+If the TraitGym-native Borzoi artifact is missing, row counts do not
+match, labels/splits are unusable, the exact fipip join (when run)
+contradicts the row-aligned substrate, or the score columns do not
+support the FX target, the kill decision remains in force.
 
 Relevant public sources:
 
 - fipip precomputed Borzoi score path:
   <https://github.com/statgen/fipip>
+- fipip public score prefix:
+  <https://console.cloud.google.com/storage/browser/seqnn-share/sniff/borzoi_102_annotation_set>
 - Borzoi model repository:
   <https://github.com/calico/borzoi>
 - TraitGym regulatory variant benchmark:
@@ -36,8 +54,13 @@ or cached for a public 10k-50k slice. The rescue hypothesis is narrower:
 
 - input: TraitGym variant identity, local edit/action metadata, source
   metadata, and optional GenoLeWM/Carbon features;
-- target: precomputed Borzoi VEP or Borzoi-PC score columns matched by
-  normalized variant identity;
+- primary target: TraitGym-native, row-aligned precomputed Borzoi score
+  artifacts matched to the public TraitGym slice by row identity and
+  validated against the slice length, labels, split policy, and artifact
+  receipts;
+- optional audit target: fipip Borzoi VEP or Borzoi-PC score columns
+  matched by normalized variant identity when the large score table is
+  staged locally;
 - objective: predict a residual over zero/source-only, Carbon, direct
   Borzoi score, and linear/logistic probe baselines;
 - success: a locked overlap-backed benchmark shows a meaningful gain
@@ -57,39 +80,56 @@ superiority, or proof of useful planning.
 Lock the rescue-specific contract before building caches:
 
 - exact source URLs and revisions;
-- fipip/precomputed-score access path;
-- genome build and liftover rules;
+- fipip/precomputed-score access path and TraitGym-native Borzoi
+  artifact path;
+- genome build and liftover rules for any exact fipip table join;
 - variant key normalization for `chrom,pos,ref,alt`;
 - selected Borzoi VEP/PC score columns;
+- row-alignment requirements for TraitGym-native score vectors;
 - minimum overlap threshold;
 - leakage and split rules;
 - claim boundaries.
 
 ### Stage 1 - Overlap Audit
 
-Join TraitGym variants to precomputed Borzoi score identities without
-training a model. The go threshold is:
+Validate TraitGym variants against public precomputed Borzoi artifacts
+without training a model. The default go threshold is:
 
-- at least 10,000 matched variants in one primary public task slice;
-- no unresolved ref/alt flips or build mismatches in the matched set;
+- at least 10,000 matched variants in one primary public task slice,
+  where "matched" means the public TraitGym slice and the row-aligned
+  Borzoi score vector have identical row counts and stable artifact
+  receipts;
+- no unresolved row-order, duplicate-key, label, or split mismatch in
+  the matched set;
 - enough positives and negatives after the locked holdout rule;
 - a publishable overlap manifest with checksums.
 
+If a local fipip score table is staged, the audit should also join by
+`chrom,pos,ref,alt` with explicit handling for allele flips, strand
+issues, multi-allelic rows, and duplicates. If that exact join is not
+run, the report must say so directly and must not claim exact fipip
+table overlap.
+
 If this fails, stop and publish the overlap no-go report.
 
 ### Stage 2 - Score Cache
 
-Materialize only the matched precomputed columns and metadata needed for
-the experiment. The cache must include source revision, checksum,
-genome-build handling, matched/unmatched counts, split identity, and
+Materialize only the matched precomputed score columns and metadata
+needed for the experiment. The first cache should consume the compact
+TraitGym-native Borzoi score vector, not the full fipip table. If a
+later exact fipip join is run, the cache may add selected fipip-derived
+VEP/PC columns behind the same manifest contract.
+
+The cache must include source revision, checksum, genome-build handling
+where applicable, matched/unmatched counts, split identity, and
 redaction-safe commands.
 
 ### Stage 3 - Baseline Gate
 
-Run source-only, label-prior, Carbon where applicable, direct Borzoi
-score, and linear/logistic probe baselines. The path continues only if
-the task is not saturated and the Borzoi-derived target has enough
-signal to make a residual model meaningful.
+Run source-only, label-prior, Carbon where applicable, direct
+TraitGym-native Borzoi score, and linear/logistic probe baselines. The
+path continues only if the task is not saturated and the Borzoi-derived
+target has enough signal to make a residual model meaningful.
 
 ### Stage 4 - Residual Model And Locked Eval
 
@@ -104,9 +144,12 @@ historical kill report for the teacher-inference path. The follow-up
 children are:
 
 - #267 - lock rescue contract and coordinate rules;
-- #268 - audit TraitGym coverage by precomputed Borzoi scores;
-- #269 - build the manifest-backed precomputed Borzoi score cache;
-- #270 - run the leakage-aware baseline and saturation gate;
+- #268 - audit TraitGym-native Borzoi score-vector alignment, and record
+  exact fipip join status separately;
+- #269 - build the manifest-backed precomputed Borzoi score cache from
+  the compact TraitGym-native score vector first;
+- #270 - run the leakage-aware baseline and saturation gate against the
+  direct TraitGym-native Borzoi score and simple probes;
 - #271 - train a residual model only after the baseline gate;
 - #272 - publish the locked result or overlap kill report.
 
diff --git a/docs/research/fx-decision-package.md b/docs/research/fx-decision-package.md
@@ -44,11 +44,22 @@ target was not reproducible under the contract.
 ## Follow-Up Trajectory
 
 The #257 kill decision remains correct for the teacher-inference path.
-A separate follow-up trajectory can test whether precomputed Borzoi
-scores from statgen/fipip rescue the idea without running a teacher. The
-follow-up must start with an overlap audit against TraitGym variant
-identities and must stop quickly if there is not a reproducible,
-public, checksum-backed 10k-50k matched slice.
+A separate follow-up trajectory can test whether public precomputed
+Borzoi-derived scores rescue the idea without running a teacher. The
+trajectory now has two explicit lanes:
+
+- the default executable lane uses TraitGym's row-aligned
+  `Borzoi_L2_L2.plus.all` score artifact for the matched complex-trait
+  slice;
+- the optional provenance lane joins against the large statgen/fipip
+  Borzoi table only when that table is explicitly staged locally.
+
+The follow-up must start with an alignment and artifact-receipt audit
+against TraitGym identities, labels, splits, and the row-aligned Borzoi
+score vector. It must stop quickly if there is not a reproducible,
+public, checksum-backed 10k-50k matched slice. If the optional full
+fipip table join is not run, reports must say that directly and must not
+claim exact fipip table overlap.
 
 That follow-up is tracked in #266 and documented in
 [GenoLeWM-FX Borzoi rescue plan](fx-borzoi-rescue-plan.md).
diff --git a/tests/lint/test_fx_contract_docs.py b/tests/lint/test_fx_contract_docs.py
@@ -64,7 +64,8 @@ def test_public_docs_link_fx_research_without_success_language() -> None:
     assert "FX pivot" in combined
     assert "No GenoLeWM-FX model or demo ships" in combined
     assert "precomputed-Borzoi" in normalized
-    assert "overlap audit" in normalized
+    assert "row-alignment audit" in normalized
+    assert "full fipip table join is optional staged provenance" in normalized
     assert "GenoLeWM-FX improves" not in combined
     assert "GenoLeWM-FX outperforms" not in combined
 
@@ -75,6 +76,9 @@ def test_fx_borzoi_rescue_plan_is_overlap_first_and_claim_bounded() -> None:
     required = (
         "This plan does not reverse the #257 kill decision",
         "precomputed Borzoi scores",
+        "TraitGym-native, row-aligned precomputed Borzoi score artifacts",
+        "full fipip table exact join remains an optional",
+        "must not claim exact fipip table overlap",
         "more than 19 million common and low-frequency variants",
         "based on hg19",
         "at least 10,000 matched variants",