Commit 6bb593a
fixup(factors): address strict_bench code review findings
Independent adversarial review (run before upstream maintainer
review) surfaced 10 issues in run_bench_strict. This commit fixes
the high- and medium-impact ones in-place and adds 14 regression
tests. The strict bench's design contract is unchanged; the changes
are limited to correctness and backward-compatibility fixes.
Fixes
-----
A1. OOS train/test boundary date is no longer double-counted.
.loc[:t] and .loc[t:] are both label-inclusive in pandas, so the
split date previously appeared in both buckets. Replaced with
explicit comparisons:
    train = alpha_full[alpha_full.index <= oos_ts]
    test  = alpha_full[alpha_full.index > oos_ts]
Also added ic_count_train / ic_count_test per row so
categorise_strict can later enforce per-bucket min_ic_count.
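A minimal sketch of the boundary behaviour, using a made-up five-day
series (the dates, values and the overlap_* names are illustrative, not
taken from the code base):

```python
import pandas as pd

# Hypothetical daily IC series; dates and values are illustrative only.
idx = pd.date_range("2024-01-01", periods=5, freq="D")
alpha_full = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5], index=idx)
oos_ts = pd.Timestamp("2024-01-03")

# Label-based slices include both endpoints, so the split date
# lands in both halves.
old_train = alpha_full.loc[:oos_ts]
old_test = alpha_full.loc[oos_ts:]
overlap_old = old_train.index.intersection(old_test.index)

# Explicit comparisons keep the split date in train only.
train = alpha_full[alpha_full.index <= oos_ts]
test = alpha_full[alpha_full.index > oos_ts]
overlap_new = train.index.intersection(test.index)

# Per-bucket observation counts, in the spirit of ic_count_train / ic_count_test.
ic_count_train, ic_count_test = len(train), len(test)
```

With the old slicing the two buckets hold six observations between them
for a five-row series; with the comparisons they hold exactly five.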
A2. compute_random_ic_series now uses an inner join across seeds.
pd.concat(axis=1, join='inner') keeps only the dates present for every
seed, so each retained value is the mean over *all* seeds rather than
a hodgepodge of 1-seed and 5-seed averages on dates where some seeds
dropped out due to the _MIN_VALID_PER_DATE=5 guard.
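To illustrate the difference, here is a small sketch with three
hypothetical per-seed IC series, two of which are missing some dates.
The series names and values are assumptions made up for the example:

```python
import pandas as pd

d = pd.date_range("2024-01-01", periods=4, freq="D")

# Hypothetical per-seed IC series; seed1 and seed2 have dropped some dates.
seed_ics = [
    pd.Series([0.10, 0.20, 0.30, 0.40], index=d, name="seed0"),
    pd.Series([0.30, 0.40], index=d[[0, 3]], name="seed1"),
    pd.Series([0.20, 0.10, 0.60], index=d[[0, 1, 3]], name="seed2"),
]

# Inner join: only dates present for every seed survive, so each mean
# is taken over all three seeds.
inner = pd.concat(seed_ics, axis=1, join="inner").mean(axis=1)

# Outer join: every date survives, and mean() silently skips NaN, so
# some dates average three seeds, others two or one.
outer = pd.concat(seed_ics, axis=1, join="outer").mean(axis=1)
```

Here the inner join keeps two dates, while the outer join keeps four,
including one date whose "mean" is a single seed's value.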
A3. on_progress callback is now exception-safe in every branch.
Refactored to a single _fire_progress() helper invoked from both
the empty-IC continue path and the normal end-of-loop path.
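One way such a helper could look is sketched below. The callback
signature (done, total), the run_loop driver and the logging behaviour
are assumptions for illustration; only the _fire_progress name and the
two call sites come from this commit:

```python
import logging

logger = logging.getLogger(__name__)


def _fire_progress(on_progress, done, total):
    """Call the optional callback; log and swallow anything it raises."""
    if on_progress is None:
        return
    try:
        on_progress(done, total)
    except Exception:
        logger.warning("on_progress callback raised; ignoring", exc_info=True)


def run_loop(ic_by_factor, on_progress=None):
    """Hypothetical stand-in for the per-factor loop."""
    processed = []
    total = len(ic_by_factor)
    for i, (name, ic) in enumerate(ic_by_factor.items(), start=1):
        if not ic:
            # Empty-IC path: still report progress before skipping.
            _fire_progress(on_progress, i, total)
            continue
        processed.append(name)
        # Normal end-of-loop path.
        _fire_progress(on_progress, i, total)
    return processed
```

Because both paths go through the same helper, a misbehaving callback
cannot abort the bench from either branch.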
A4. _shuffle_within_rows pins ±inf in place like NaN.
Switched the mask from ~np.isnan() to np.isfinite() so an
inf/-inf cell stays at its original position. Defensive against
third-party zoos that bypass _validate_output.
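A rough sketch of the masking idea follows. The function below is a
hypothetical stand-in, not the actual _shuffle_within_rows body; it
assumes a 2-D float array and a NumPy Generator:

```python
import numpy as np


def shuffle_within_rows(values, rng):
    """Permute the finite cells of each row; NaN and +/-inf stay put."""
    out = np.array(values, dtype=float, copy=True)
    for row in out:  # each row is a view into `out`
        movable = np.isfinite(row)  # False for NaN, inf and -inf
        row[movable] = rng.permutation(row[movable])
    return out


x = np.array([
    [1.0, np.nan, 2.0, np.inf, 3.0],
    [-np.inf, 4.0, 5.0, np.nan, 6.0],
])
shuffled = shuffle_within_rows(x, np.random.default_rng(0))
```

With a ~np.isnan() mask the inf cells would count as movable and could
be shuffled into other columns; np.isfinite() excludes them.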
A5. OOS sign-flip is now categorised reversed_strict, not train_only.
Bucket order:
    t_full >= thr AND t_test <= -thr         → reversed_strict (most diagnostic failure)
    t_full >= thr AND t_test in noise band   → train_only (benign decay)
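The ordering of these two checks can be sketched as follows. This is
not the real categorise_strict: the threshold default of 2.0, the
reading of "noise band" as |t_test| < thr, and the "other" fallback for
every remaining case are assumptions made for the example:

```python
def categorise_oos(t_full, t_test, thr=2.0):
    """Illustrative two-rule classifier for the OOS buckets in A5."""
    if t_full >= thr and t_test <= -thr:
        # Significant in-sample, significantly opposite out-of-sample.
        return "reversed_strict"
    if t_full >= thr and abs(t_test) < thr:
        # Significant in-sample, faded into noise out-of-sample.
        return "train_only"
    # The remaining buckets are outside the scope of this sketch.
    return "other"
```

Checking the sign flip first means a strongly negative t_test can never
fall through to the more benign train_only label.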
A6. Per-bucket ic_count fields surfaced on each row.
A7. Sorting uses unrounded _ir_raw / _alpha_t_full_raw / _ic_mean_raw
helper keys to keep top-N stable across runs.
C1. Wire schema regression — strict result now carries legacy aliases
alive/reversed/dead/by_theme alongside the strict-specific keys.
Existing dashboards keep rendering without code changes:
    alive    = confirmed_alive
    reversed = reversed_strict
    dead     = noise + train_only
by_theme is built by a strict-aware variant that emits both the
new four-way and the legacy three-way counts per theme.
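The alias mapping amounts to something like the sketch below. The
add_legacy_aliases helper is hypothetical, and the example treats the
buckets as integer counts; the same `+` would also concatenate lists if
the buckets held rows instead:

```python
def add_legacy_aliases(strict):
    """Return a copy of the strict result with the legacy keys added."""
    out = dict(strict)
    out["alive"] = strict["confirmed_alive"]
    out["reversed"] = strict["reversed_strict"]
    out["dead"] = strict["noise"] + strict["train_only"]
    return out


# Hypothetical four-way counts for one theme.
strict_counts = {
    "confirmed_alive": 4,
    "reversed_strict": 1,
    "noise": 10,
    "train_only": 3,
}
wire_counts = add_legacy_aliases(strict_counts)
```

The strict keys are kept alongside the aliases, so old and new
consumers can read the same payload.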
C2. _slim payload re-adds formula_latex so the wiki / dashboard top-N
cards keep showing the formula column.
C5. Error envelope is now schema-complete from the start. Every error
path (empty zoo, bad universe, bad forward-returns, bad oos_split)
returns a dict with zeroed counters, empty lists, and the rail
metadata intact — downstream consumers can depend on every key
being present regardless of status.
C6. n_random_seeds=0 is clamped to 1, AND the effective value is
persisted to entry['n_random_seeds'] so the wire response doesn't
lie about the seed count.
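In sketch form, the clamp-and-persist step could look like this. The
effective_seed_count helper is hypothetical, and clamping negative
values to 1 as well is an assumption; the commit only mentions 0:

```python
def effective_seed_count(requested):
    """Clamp the requested seed count to at least 1."""
    return max(1, int(requested))


# Hypothetical request entry as it might arrive with n_random_seeds=0.
entry = {"n_random_seeds": 0}

# Persist the value actually used so the response reports it truthfully.
entry["n_random_seeds"] = effective_seed_count(entry["n_random_seeds"])
```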
Internal sort-helper keys (_ir_raw etc) are stripped from public_rows
before they reach the wire payload — only _category is retained so
external consumers can read the bucket label.
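The sort-then-strip flow from A7 and the paragraph above might be
sketched as follows. The row contents and the top_n_public helper are
made up for illustration; only the _ir_raw and _category key names come
from this commit:

```python
# Hypothetical rows: f1 and f2 tie on the rounded "ir" but differ on
# the unrounded helper key.
rows = [
    {"name": "f1", "ir": 0.12, "_ir_raw": 0.1234, "_category": "confirmed_alive"},
    {"name": "f2", "ir": 0.12, "_ir_raw": 0.1239, "_category": "confirmed_alive"},
    {"name": "f3", "ir": 0.05, "_ir_raw": 0.0501, "_category": "noise"},
]


def top_n_public(rows, n):
    """Rank on the unrounded key, then drop private helper keys."""
    ranked = sorted(rows, key=lambda r: r["_ir_raw"], reverse=True)
    return [
        {k: v for k, v in r.items() if not k.startswith("_") or k == "_category"}
        for r in ranked[:n]
    ]


top = top_n_public(rows, 2)
```

Ranking on the rounded value would leave f1 and f2 tied, so their
relative order would depend on input order; the raw key breaks the tie.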
Tests
-----
14 new regression tests in test_bench_strict.py bring the strict
total to 33. With the 988 existing tests, the full
agent/tests/factors/ suite is 14 + 988 = 1002 passed, 1 skipped, in
24.25 s. Zero regressions in the existing factor tests.
The new tests:
- test_oos_train_test_split_does_not_double_count_boundary (A1)
- test_compute_random_ic_series_inner_joins_seed_dates (A2)
- test_run_bench_strict_on_progress_exception_is_caught (A3)
- test_shuffle_handles_inf_like_nan (A4)
- test_categorise_oos_sign_flip_is_reversed_strict_not_train_only (A5)
- test_categorise_oos_decay_to_noise_band_is_train_only (A5 companion)
- test_run_bench_strict_emits_legacy_alive_dead_reversed_keys (C1)
- test_run_bench_strict_legacy_alive_equals_confirmed_alive (C1)
- test_run_bench_strict_top_lists_include_formula_latex (C2)
- test_run_bench_strict_empty_zoo_returns_schema_with_counters (C5)
- test_run_bench_strict_n_random_seeds_zero_is_clamped (C6)
- test_run_bench_strict_rows_drop_underscore_prefixed_sort_keys (sort key
hygiene)
- test_run_bench_strict_catches_planted_alive_signal (closes the
'integration tests cheat' finding)
- test_run_bench_strict_catches_planted_reversed_signal (same)
Signed-off-by: Soli22de <177382421+Soli22de@users.noreply.github.com>
1 parent b4f983d commit 6bb593a
2 files changed
Lines changed: 488 additions & 46 deletions