[evals] Add capped ASR/OCR noisy-text perplexity-gap slices#5118
[evals] Add capped ASR/OCR noisy-text perplexity-gap slices#5118
Conversation
|
Claude finished @dlwh's task in 6m 17s —— View job Code review
Findings: 1 minor AGENTS.md violation, 0 bugs. Four parallel reviewers (2× sonnet CLAUDE.md/AGENTS.md compliance, 2× opus bug hunt) looked at the three new files. Bug hunters found nothing — cap logic ( One real AGENTS.md violation posted inline: local import at |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b5b9e68f41
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| assert datasets[clean_key].tags[-1] == "variant:clean" | ||
|
|
||
|
|
||
| def test_materialize_noisy_asr_ocr_raw_respects_per_slice_cap(tmp_path, monkeypatch): |
There was a problem hiding this comment.
Move test import to module top level
The local import from experiments.evals import asr_ocr_noisy_ppl violates the repository rule in /workspace/marin/AGENTS.md that imports must be at the top of the file except for circular-dependency or optional-dependency cases, and neither exception applies here because the same module is already imported at module load time. Keeping it local introduces unnecessary inconsistency and can delay import-time failures to test execution instead of collection.
Useful? React with 👍 / 👎.
|
|
||
|
|
||
| def test_materialize_noisy_asr_ocr_raw_respects_per_slice_cap(tmp_path, monkeypatch): | ||
| from experiments.evals import asr_ocr_noisy_ppl |
There was a problem hiding this comment.
This local import violates the AGENTS.md rule: "All imports at the top of the file. No local imports except to break circular dependencies or guard optional deps."
The module isn't imported at the top (the top-level import at L7-L16 pulls specific names, not the module object) and this usage — monkeypatch.setattr(asr_ocr_noisy_ppl, "load_dataset", ...) — is neither a circular-dep workaround nor an optional-dep guard.
Move the import to the top alongside the existing imports:
from experiments.evals import asr_ocr_noisy_pplThen drop the local import here.
Add an opt-in ASR/OCR noisy-text eval bundle for issue #5097 with paired clean/noisy slice registration and a dedicated perplexity-gap experiment entrypoint. HypR ASR hypotheses are linearized to top-1 noisy text and ReadingTimeMachine OCR source/target pairs are preserved as noisy/clean fields, both capped to 512 rows per source to avoid bulk downloads. Includes synthetic tests for row linearization, capped materialization, and clean/noisy registration naming.
Fixes #5097