[nightshift] investigate slow/flaky CI tests #5762

Open
claude-nightshift[bot] wants to merge 1 commit into main from nightshift/ci-tests-20260515

@claude-nightshift
Contributor

Twin tests run in red
Same path traced by two siblings
Prune the slower branch

(seed: fd25305b)

Summary

The Nightshift CI test audit flagged four slow unit tests in the Marin - Unit workflow. One of them, tests/processing/classification/deduplication/test_fuzzy.py::test_high_char_5gram_jaccard_pair_clusters_end_to_end (~33s), provides no unique coverage:

  • It runs the full normalize → minhash → fuzzy_dups pipeline on a synthetic J≈0.97 char-5-gram pair and asserts both docs share one dup_cluster_id with exactly one canonical.
  • test_fuzzy_dups_single_source_schema_and_pair already runs the same end-to-end pipeline (including connected components and per-source attr-output emission) on the fox-corpus fuzzy pair (test_contaminated_1 / test_high_overlap, also well above the LSH collision threshold at b=26, r=11) and asserts the same shared-dup_cluster_id/exactly-one-canonical properties.
  • LSH-level recall across the Jaccard band is independently covered by the parametrized test_high_char_5gram_jaccard_pairs_share_lsh_bucket (which exercises the synthetic-pair construction at b=26, r=11 across target_j ∈ {0.95, 0.97, 0.99} × 20 seeds).
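
As a rough illustration of why J≥0.95 pairs sit "well above" the collision threshold, here is a minimal sketch of the standard MinHash banding model. It assumes b=26 means 26 bands and r=11 means 11 rows per band, which is the usual convention but is not confirmed by this PR. The helper names (`char_ngrams`, `jaccard`, `lsh_collision_probability`, `approx_threshold`) are hypothetical and are not taken from the Marin codebase.

```python
"""Sketch of the textbook LSH banding model; not the pipeline's actual code."""


def char_ngrams(text: str, n: int = 5) -> set[str]:
    # Set of all contiguous character n-grams (empty if the text is shorter than n).
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    # Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets are treated as identical.
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def lsh_collision_probability(j: float, bands: int, rows: int) -> float:
    # Probability that two documents with Jaccard similarity j share at least
    # one band, assuming independent MinHash rows: 1 - (1 - j^rows)^bands.
    return 1.0 - (1.0 - j ** rows) ** bands


def approx_threshold(bands: int, rows: int) -> float:
    # Common approximation of the S-curve's steepest point: (1/bands)^(1/rows).
    return (1.0 / bands) ** (1.0 / rows)


if __name__ == "__main__":
    # Assumed reading of the PR's parameters: b = bands, r = rows per band.
    B, R = 26, 11
    print(f"approx threshold: {approx_threshold(B, R):.3f}")
    for target_j in (0.95, 0.97, 0.99):
        p = lsh_collision_probability(target_j, B, R)
        print(f"J={target_j}: miss probability {1.0 - p:.2e}")
```

Under this model the approximate threshold falls near J≈0.74, so pairs at J≥0.95 are expected to collide in at least one band almost surely. That matches the description above, though the real pipeline's hashing details may differ.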

Removing the redundant test trims ~32s from the unit-test job's slowest tail without weakening coverage.

The other three slow candidates (test_fuzzy_dups_single_source_schema_and_pair, test_connected_components_happy_path, test_ghalogs_public_normalize_steps_write_datakit_normalized_train_partition) each provide unique end-to-end coverage of a real pipeline path. Their cost is dominated by intrinsic ZephyrContext / StepRunner setup, so no speedup within the scope of this pass justified changing them.

Test plan

  • ./infra/pre-commit.py --files tests/processing/classification/deduplication/test_fuzzy.py — green.
  • uv run pytest tests/processing/classification/deduplication/test_fuzzy.py — 71 passed, 1 xfailed (data_integration tests included).

`test_high_char_5gram_jaccard_pair_clusters_end_to_end` runs the full
normalize → minhash → fuzzy_dups pipeline on a synthetic J≈0.97 pair to
verify that high-Jaccard pairs share one `dup_cluster_id` with exactly
one canonical. The same end-to-end path with the same assertions is
already exercised by `test_fuzzy_dups_single_source_schema_and_pair` on
the fox-corpus high-Jaccard pair (also well above the LSH threshold at
b=26, r=11). LSH-level recall across the Jaccard band is independently
covered by the parametrized `test_high_char_5gram_jaccard_pairs_share_lsh_bucket`.
Removing the redundant test trims ~32s from the unit-test job's slowest
tail without weakening coverage.
claude-nightshift bot added the agent-generated (Created by automation/agent) and nightshift (Automated nightshift fixes) labels May 15, 2026
claude-nightshift bot enabled auto-merge (squash) May 15, 2026 14:25