Commit 6bb593a

fixup(factors): address strict_bench code review findings
Independent adversarial review (run before upstream maintainer review)
surfaced 10 issues in run_bench_strict. This commit fixes the high- and
medium-impact ones in-place and adds 14 regression tests. The strict
bench's design contract is unchanged; the changes are correctness and
back-compat-only.

Fixes
-----

A1. OOS train/test boundary date no longer double-counts. .loc[:t] and
    .loc[t:] are both label-inclusive in pandas, so the split date
    previously appeared in both buckets. Replaced with explicit
    comparisons:

        train = alpha_full[alpha_full.index <= oos_ts]
        test  = alpha_full[alpha_full.index >  oos_ts]

    Also added ic_count_train / ic_count_test per row so
    categorise_strict can later enforce per-bucket min_ic_count.

A2. compute_random_ic_series now uses an inner join across seeds.
    pd.concat(axis=1, join='inner') ensures every retained date is the
    mean of *all* available seeds, not a hodgepodge of 1-seed and
    5-seed averages on dates where some seeds dropped out due to the
    _MIN_VALID_PER_DATE=5 guard.

A3. on_progress callback is now exception-safe in every branch.
    Refactored to a single _fire_progress() helper invoked from both
    the empty-IC continue path and the normal end-of-loop path.

A4. _shuffle_within_rows pins ±inf in place like NaN. Switched the mask
    from ~np.isnan() to np.isfinite() so an inf/-inf cell stays at its
    original position. Defensive against third-party zoos that bypass
    _validate_output.

A5. OOS sign-flip is now categorised reversed_strict, not train_only.
    Bucket order:

        t_full >= thr AND t_test <= -thr        → reversed_strict
                                                  (most diagnostic failure)
        t_full >= thr AND t_test in noise band  → train_only
                                                  (benign decay)

A6. Per-bucket ic_count fields surfaced on each row.

A7. Sorting uses unrounded _ir_raw / _alpha_t_full_raw / _ic_mean_raw
    helper keys to keep top-N stable across runs.

C1. Wire schema regression: strict result now carries legacy aliases
    alive/reversed/dead/by_theme alongside the strict-specific keys.
    Existing dashboards keep rendering without code changes:

        alive    = confirmed_alive
        reversed = reversed_strict
        dead     = noise + train_only

    by_theme is built by a strict-aware variant that emits both the new
    four-way and the legacy three-way counts per theme.

C2. _slim payload re-adds formula_latex so the wiki / dashboard top-N
    cards keep showing the formula column.

C5. Error envelope is now schema-complete from the start. Every error
    path (empty zoo, bad universe, bad forward-returns, bad oos_split)
    returns a dict with zeroed counters, empty lists, and the rail
    metadata intact, so downstream consumers can depend on every key
    being present regardless of status.

C6. n_random_seeds=0 is clamped to 1, AND the effective value is
    persisted to entry['n_random_seeds'] so the wire response doesn't
    lie about the seed count.

Internal sort-helper keys (_ir_raw etc.) are stripped from public_rows
before they reach the wire payload; only _category is retained so
external consumers can read the bucket label.

Tests
-----

14 new regression tests in test_bench_strict.py; total now 33 strict +
988 existing. The full agent/tests/factors/ suite is 1002 passed,
1 skipped, 24.25 s. Zero regressions in the existing factor tests.
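
As a rough illustration of fixes A1 and A2 above, the sketch below reproduces the two pandas behaviours on toy data. The names alpha_full and oos_ts come from the commit message; the dates, values, and the seed_a / seed_b / per_seed / random_ic names are invented for the example and are not taken from run_bench_strict.

```python
import numpy as np
import pandas as pd

# Toy daily series standing in for alpha_full (values are made up).
idx = pd.date_range("2024-01-01", periods=6, freq="D")
alpha_full = pd.Series(np.arange(6, dtype=float), index=idx)
oos_ts = pd.Timestamp("2024-01-03")

# A1, old behaviour: label-based .loc slices include both endpoints,
# so the split date ends up in the train AND the test bucket.
train_old = alpha_full.loc[:oos_ts]
test_old = alpha_full.loc[oos_ts:]
overlap_old = train_old.index.intersection(test_old.index)

# A1, fixed behaviour: explicit comparisons give disjoint buckets.
train = alpha_full[alpha_full.index <= oos_ts]
test = alpha_full[alpha_full.index > oos_ts]
overlap_new = train.index.intersection(test.index)

# A2: two hypothetical per-seed IC series; seed_b is missing the first date.
seed_a = pd.Series([0.1, 0.2, 0.3], index=idx[:3])
seed_b = pd.Series([0.3, 0.4], index=idx[1:3])

# The inner join keeps only dates present for every seed, so each
# row of the mean is an average over the same number of seeds.
per_seed = pd.concat([seed_a, seed_b], axis=1, join="inner")
random_ic = per_seed.mean(axis=1)

print(len(overlap_old), len(overlap_new), len(random_ic))
```

With the default outer join, the first date would survive with a NaN for seed_b, and mean(axis=1) would silently average over one seed only; the inner join drops that date instead.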
The new tests:

- test_oos_train_test_split_does_not_double_count_boundary (A1)
- test_compute_random_ic_series_inner_joins_seed_dates (A2)
- test_run_bench_strict_on_progress_exception_is_caught (A3)
- test_shuffle_handles_inf_like_nan (A4)
- test_categorise_oos_sign_flip_is_reversed_strict_not_train_only (A5)
- test_categorise_oos_decay_to_noise_band_is_train_only (A5 companion)
- test_run_bench_strict_emits_legacy_alive_dead_reversed_keys (C1)
- test_run_bench_strict_legacy_alive_equals_confirmed_alive (C1)
- test_run_bench_strict_top_lists_include_formula_latex (C2)
- test_run_bench_strict_empty_zoo_returns_schema_with_counters (C5)
- test_run_bench_strict_n_random_seeds_zero_is_clamped (C6)
- test_run_bench_strict_rows_drop_underscore_prefixed_sort_keys
  (sort key hygiene)
- test_run_bench_strict_catches_planted_alive_signal
  (closes the 'integration tests cheat' finding)
- test_run_bench_strict_catches_planted_reversed_signal (same)

Signed-off-by: Soli22de <177382421+Soli22de@users.noreply.github.com>
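
For readers who want to see what a check like test_shuffle_handles_inf_like_nan (A4) might exercise, here is a minimal sketch. The function shuffle_within_rows below is a hypothetical stand-in written for this example; the real _shuffle_within_rows is not shown in the commit message, and only the np.isfinite() mask is taken from it.

```python
import numpy as np


def shuffle_within_rows(values: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Permute the finite cells of each row; NaN and +/-inf stay in place.

    Hypothetical re-implementation for illustration only.
    """
    out = values.astype(float, copy=True)
    for i in range(out.shape[0]):
        # np.isfinite is False for NaN, +inf and -inf alike, so all three
        # are excluded from the permutation and keep their positions.
        finite = np.isfinite(out[i])
        out[i, finite] = rng.permutation(out[i, finite])
    return out


x = np.array(
    [
        [1.0, np.nan, 2.0, np.inf, 3.0],
        [-np.inf, 4.0, 5.0, 6.0, np.nan],
    ]
)
y = shuffle_within_rows(x, np.random.default_rng(0))

# Non-finite cells are unchanged; finite cells are a permutation of the input.
print(np.isnan(y[0, 1]), np.isposinf(y[0, 3]), np.isneginf(y[1, 0]))
```

A mask built from ~np.isnan() alone would treat the inf cells as ordinary values and move them around the row, which is the behaviour A4 describes as fixed.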
1 parent b4f983d commit 6bb593a

2 files changed

Lines changed: 488 additions & 46 deletions
