Skip to content

[evals] Add capped ASR/OCR noisy-text perplexity-gap slices#5118

Open
dlwh wants to merge 1 commit intomainfrom
agent/20260423-fix-5097
Open

[evals] Add capped ASR/OCR noisy-text perplexity-gap slices#5118
dlwh wants to merge 1 commit intomainfrom
agent/20260423-fix-5097

Conversation

@dlwh
Copy link
Copy Markdown
Member

@dlwh dlwh commented Apr 23, 2026

Add an opt-in ASR/OCR noisy-text eval bundle for issue #5097 with paired clean/noisy slice registration and a dedicated perplexity-gap experiment entrypoint. HypR ASR hypotheses are linearized to top-1 noisy text and ReadingTimeMachine OCR source/target pairs are preserved as noisy/clean fields, both capped to 512 rows per source to avoid bulk downloads. Includes synthetic tests for row linearization, capped materialization, and clean/noisy registration naming.

Fixes #5097

@dlwh dlwh added the agent-generated Created by automation/agent label Apr 23, 2026
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 23, 2026

Claude finished @dlwh's task in 6m 17s —— View job


Code review

  • Verify PR is reviewable
  • Collect relevant CLAUDE.md/AGENTS.md paths
  • Summarize PR changes
  • Run 4 parallel review agents (2× CLAUDE.md compliance, 2× bug hunt)
  • Validate flagged issues
  • Post inline review comments

Findings: 1 minor AGENTS.md violation, 0 bugs.

Four parallel reviewers (2× sonnet CLAUDE.md/AGENTS.md compliance, 2× opus bug hunt) looked at the three new files. Bug hunters found nothing — cap logic (enumerate + break index >= row_cap) is correct, streaming dataset is consumed once, path construction and gzip writes match codebase precedent, and verified API signatures (ResourceConfig.with_cpu/with_tpu, remote(...), raw_text_dataset, default_model_perplexity_gap, HfDatasetSpec, ExecutorStep) all line up.

One real AGENTS.md violation posted inline: local import at tests/evals/test_asr_ocr_noisy_ppl.py:48 — see inline comment. A candidate isinstance(..., str) tautological-test flag and the load_dataset monkeypatch concern were considered but not posted — the isinstance isn't purely tautological (would fail if a Path were returned) and load_dataset is an HF network boundary, which is an allowed I/O mock per AGENTS.md.
• Branch: agent/20260423-fix-5097

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b5b9e68f41

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

assert datasets[clean_key].tags[-1] == "variant:clean"


def test_materialize_noisy_asr_ocr_raw_respects_per_slice_cap(tmp_path, monkeypatch):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Move test import to module top level

The local import from experiments.evals import asr_ocr_noisy_ppl violates the repository rule in /workspace/marin/AGENTS.md that imports must be at the top of the file except for circular-dependency or optional-dependency cases, and neither exception applies here because the same module is already imported at module load time. Keeping it local introduces unnecessary inconsistency and can delay import-time failures to test execution instead of collection.

Useful? React with 👍 / 👎.



def test_materialize_noisy_asr_ocr_raw_respects_per_slice_cap(tmp_path, monkeypatch):
from experiments.evals import asr_ocr_noisy_ppl
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This local import violates the AGENTS.md rule: "All imports at the top of the file. No local imports except to break circular dependencies or guard optional deps."

The module isn't imported at the top (the top-level import at L7-L16 pulls specific names, not the module object) and this usage — monkeypatch.setattr(asr_ocr_noisy_ppl, "load_dataset", ...) — is neither a circular-dep workaround nor an optional-dep guard.

Move the import to the top alongside the existing imports:

from experiments.evals import asr_ocr_noisy_ppl

Then drop the local import here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent-generated Created by automation/agent

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[evals] Add ASR and OCR-noisy text PPL evals

1 participant