fix(jumpscore): align message format and video lookup by mathCrazyy · Pull Request #1330 · EvolvingLMMs-Lab/lmms-eval

mathCrazyy · 2026-05-12T12:34:25Z

Summary

Remove legacy count QA context from doc_to_messages so evaluation input matches the intended timestamp-only prompt.

In scope

Update lmms_eval/tasks/jumpscore/utils.py.
Change JumpScore chat message construction from multi-turn count QA context to a single-turn timestamp query.

Out of scope

No model implementation changes.
No metric or scoring logic changes.
No dataset content changes.
No changes to prompts outside JumpScore.
No changes to other tasks.

Validation

Confirmed the committed diff only touches lmms_eval/tasks/jumpscore/utils.py.
Verified the message construction now produces one user turn containing the video and timestamp question.
Verified the video lookup includes existing cache paths plus HF snapshot fallback paths.
Pushed commit cf7f49f to origin/main.

Risk / Compatibility

Low risk for model code because this only changes JumpScore task input construction.
Expected behavior change for JumpScore evaluation prompts: legacy count QA history is no longer included.
Compatible with existing cache layouts; adds support for HF snapshot cache layout as a fallback.
Results may differ from prior JumpScore runs because the evaluation input format is now aligned to the single-turn protocol.

Type of Change

Bug fix (non-breaking change)
Evaluation/task configuration alignment

…dule import The judge server was initialized at module import time, causing OpenAI API errors in CI environments where OPENAI_API_KEY is not set. Now the server is created on first use via _get_judge_server() instead.

…or on module import" This reverts commit 18dd0c3.

…wnload snapshot_download was called at module level, causing CI to fail when loading task configs without HF credentials. Moved to _get_cache_dir() which is called on first actual use, following the same pattern as other tasks (e.g. vbvr/utils.py).

…dule import The judge server was initialized at module level, causing an OpenAIError in CI environments where OPENAI_API_KEY is not set. Replaced the top-level initialization with _get_judge_server(), which creates the server on first actual use, consistent with how jump_rope/utils.py handles its HF download.

The BASE worktree may contain pre-existing import-time errors (e.g. module-level OpenAI client init requiring OPENAI_API_KEY, or network calls at import time). These cause the BASE capture step to fail, blocking all PRs even when the PR itself introduces no regression. Changes: - Add continue-on-error: true to 'Capture BASE snapshot' step - Update 'Compare snapshots' to skip diff when base.json is absent, printing a clear warning instead of failing the workflow

…or on module import" This reverts commit 917a3ed.

…hot fails" This reverts commit 86f7f9a.

kcz358 · 2026-05-14T12:14:17Z

JumpScore does not zip the data yet. Will it be zipped later? If this is the case, I will merge this PR. Thanks!

mathCrazyy · 2026-05-15T00:33:43Z

JumpScore does not zip the data yet. Will it be zipped later? If this is the case, I will merge this PR. Thanks!

Thank you for the review!
The data has been updated to zip format, and I’ve already adapted the code to support it.
Feel free to merge this PR when you’re ready. Thanks!

mathCrazyy added 13 commits May 11, 2026 15:09

feat: add jump rope evaluation task

455d699

Revert "fix(mmmu): lazy-load judge server to avoid OpenAI API key err…

e4c6438

…or on module import" This reverts commit 18dd0c3.

refactor(jump_rope): rename task directory from jump_rope to jumpscore

1f26f50

Revert "fix(mmmu): lazy-load judge server to avoid OpenAI API key err…

191ff52

…or on module import" This reverts commit 917a3ed.

Revert "ci(task-input-ab): gracefully skip comparison when BASE snaps…

4ecc683

…hot fails" This reverts commit 86f7f9a.

fix(jumpscore): configure video cache in yaml

ac2becf

fix(jumpscore): expose map metric

c8ccfc5

Merge branch 'EvolvingLMMs-Lab:main' into main

3f95650

fix(jumpscore): align message format and video lookup

cf7f49f

kcz358 reviewed May 13, 2026

View reviewed changes

Comment thread lmms_eval/tasks/jumpscore/utils.py Outdated

fix(jumpscore): remove snapshot cache fallback

a28aad4

mathCrazyy and others added 2 commits May 15, 2026 08:26

fix(jumpscore): support zipped video cache

0d93b72

style: auto-fix lint (black + isort)

9eb13a9

kcz358 merged commit a1ba778 into EvolvingLMMs-Lab:main May 15, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(jumpscore): align message format and video lookup#1330

fix(jumpscore): align message format and video lookup#1330
kcz358 merged 16 commits into
EvolvingLMMs-Lab:mainfrom
mathCrazyy:main

mathCrazyy commented May 12, 2026 •

edited

Loading

Uh oh!

Uh oh!

kcz358 commented May 14, 2026

Uh oh!

mathCrazyy commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mathCrazyy commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

In scope

Out of scope

Validation

Risk / Compatibility

Type of Change

Uh oh!

Uh oh!

kcz358 commented May 14, 2026

Uh oh!

mathCrazyy commented May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mathCrazyy commented May 12, 2026 •

edited

Loading