Skip to content

ci: add real-data task input A/B snapshot gate#1109

Merged
Luodian merged 3 commits into
dev-v0d7from
feat/task-input-ab-ci
Feb 20, 2026
Merged

ci: add real-data task input A/B snapshot gate#1109
Luodian merged 3 commits into
dev-v0d7from
feat/task-input-ab-ci

Conversation

@Luodian
Copy link
Copy Markdown
Contributor

@Luodian Luodian commented Feb 20, 2026

Summary

  • add a request-boundary snapshot checker (tools/task_input_capture.py) that loads representative real tasks and compares base/head request payload boundaries without serializing image/video content
  • add real-task capture spec coverage for MMMU + MMLU + MMLU loglikelihood (test/eval/task_input_specs/redundancy_refactor.yaml)
  • add a dedicated CI workflow (.github/workflows/task-input-ab.yml) to capture base/head snapshots, compare JSON output, and upload artifacts on mismatch

What changed in CI

  • this gate checks request concatenation and answer-format-sensitive boundaries from real datasets
  • chat task coverage (mmmu_val) uses text_and_structure mode to verify message text + role/content-type structure while excluding media payloads
  • MMLU paths use request_only mode to verify ctx + continuation boundary stability
  • pull_request runs pin checker/spec from the base revision to avoid mutable-comparator false negatives
  • workflow now triggers on both lmms_eval/tasks/** and lmms_eval/api/** changes

Why

  • protect refactors from silently changing model-facing request formatting
  • keep CI representative and useful (real data) while avoiding expensive/noisy image/video payload comparisons

Use representative tasks to compare request-boundary payloads between base and head without serializing image/video content.
@Luodian Luodian merged commit 0f33ba0 into dev-v0d7 Feb 20, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant