fix(api/task): handle None generation responses in process_results by dankit · Pull Request #1311 · EvolvingLMMs-Lab/lmms-eval

dankit · 2026-04-26T22:52:59Z

Summary

Prevent Task.process_results from crashing when a generated response slot (resps) is None. This can happen when the token budget is insufficient and the thinking phase of generation has not finished, so no response is produced. Currently, even if 99 of 100 examples succeed, the entire eval is discarded.
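To make the failure mode concrete, here is a minimal sketch. postprocess_all is a hypothetical stand-in for a per-sample postprocessing loop, not the actual lmms_eval code; it only shows that a single None reaching .strip() raises AttributeError and takes the whole batch down with it.

```python
from typing import List, Optional


def postprocess_all(responses: List[Optional[str]]) -> List[str]:
    # Hypothetical stand-in for per-sample postprocessing (not lmms_eval's code).
    # Every response is stripped unconditionally, as before the fix.
    return [r.strip() for r in responses]


# 99 normal responses plus one missing generation, e.g. the thinking phase
# used up the token budget before any answer was produced.
responses: List[Optional[str]] = ["  answer  "] * 99 + [None]

try:
    postprocess_all(responses)
    batch_lost = False
except AttributeError as err:
    # None.strip() raises AttributeError, so none of the 99 good results survive.
    batch_lost = True
    print(f"whole batch lost: {err}")
```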
This change keeps the run from being discarded by treating missing generation responses as empty strings, so postprocessing can finish and users can inspect the degraded eval outputs, the token counts on empty responses, and the failure context. Without it, the failure is not observable, even with verbose output.
This is distinct from prior fixes for OpenAI-compatible message.content normalization and empty results = [] handling; this PR covers resps = [None] / resps = [[None]].
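Both shapes named above can be handled by unwrapping one level of nesting before the None check. The sketch below assumes resps is a non-empty list whose first element is either the generated string, None, or a list wrapping one of those; first_response_text is a hypothetical helper name, not a function in lmms_eval.

```python
from typing import Any, List


def first_response_text(resps: List[Any]) -> str:
    """Hypothetical helper: return the first generated text, or '' if it is missing.

    Assumes resps is non-empty; the separate results = [] case is not handled here.
    """
    first = resps[0]
    # Unwrap one level of nesting: [[None]] -> None, [[" text "]] -> " text ".
    # An empty inner list is also treated as a missing response.
    if isinstance(first, list):
        first = first[0] if first else None
    # A missing generation becomes an empty string; real strings are stripped as before.
    return "" if first is None else first.strip()


for shape in ([None], [[None]], [" ok "], [[" ok "]]):
    print(repr(shape), "->", repr(first_response_text(shape)))
```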

In scope

Updates lmms_eval/api/task.py so generated outputs are checked for None before calling .strip().
Applies to the generate_until and generate_visual_cot paths, and to the later generate_until path in the same file.
Preserves existing behavior for normal string responses.
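A minimal sketch of the guard described in the scope list, assuming it can be expressed as a small helper. strip_or_empty and postprocess_all are hypothetical names for illustration, and the actual change in lmms_eval/api/task.py may be written inline instead. It shows that normal strings keep the usual .strip() result while a None slot becomes an empty string, so a batch with one missing response still completes.

```python
from typing import List, Optional


def strip_or_empty(text: Optional[str]) -> str:
    # Missing generation (None) becomes "", normal strings get the usual .strip().
    return "" if text is None else text.strip()


def postprocess_all(responses: List[Optional[str]]) -> List[str]:
    # Hypothetical per-sample loop: with the guard, one None no longer aborts the batch.
    return [strip_or_empty(r) for r in responses]


responses: List[Optional[str]] = ["  answer  "] * 99 + [None]
cleaned = postprocess_all(responses)
print(len(cleaned), cleaned.count(""))  # prints: 100 1
```

Testing with "is None" rather than truthiness is deliberate in this sketch: an empty string still goes through .strip() unchanged, and any other unexpected type still fails loudly instead of being silently coerced.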

Out of scope

Does not change model generation, token budgeting, scoring logic, metrics, or sample logging.
Does not introduce new logging or warning behavior, to keep the fix minimal and non-disruptive.

Validation

IFEval run with a missing generation response; also ran the MMMU val set, mmstar, and screenspot_v2. | sample size: N=full run | key metrics: postprocessing completes; sample outputs and token counts remain inspectable even when results are degraded | result: pass
uv run pre-commit run --all-files | sample size: N=all files | key metrics: Python formatting via black and import ordering via isort | result: pass

Risk / Compatibility

Low risk: normal string responses keep the same .strip() behavior, and only None generation responses are treated as empty responses instead of aborting postprocessing.
This could be perceived as making a missing response less visible; I kept the change minimal and non-disruptive, but can add explicit logging if maintainers prefer stronger observability.
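If maintainers do want stronger observability, one option is to emit a warning at the point where None is replaced. The sketch below uses the Python standard library logging module purely for illustration; lmms_eval may use a different logger, and strip_or_empty_logged and its doc_id parameter are hypothetical. It is not what this PR implements.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def strip_or_empty_logged(text: Optional[str], doc_id: object = None) -> str:
    # Hypothetical variant: same None -> "" mapping, but leaves a trace in the logs.
    if text is None:
        logger.warning(
            "Missing generation response for doc_id=%s; treating it as an empty string",
            doc_id,
        )
        return ""
    return text.strip()


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(repr(strip_or_empty_logged(None, doc_id=42)))
    print(repr(strip_or_empty_logged("  kept  ", doc_id=43)))
```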

P.S. I know the guidelines ask for an issue to be opened before a PR when the bug is new, but this is in the same category as #1218, and I don't think it is purely an API-level issue. There are probably a few different ways to look at this.

Type of Change

fix(api/task): handle None generation responses in process_results

33a2b08

kcz358 approved these changes May 6, 2026

kcz358 merged commit a31a7de into EvolvingLMMs-Lab:main May 6, 2026
1 of 2 checks passed

kcz358 merged 1 commit into EvolvingLMMs-Lab:main from dankit:fix/none-generation-response-postprocess
