
fix(normalizations): guard against index out of range in LogProbTokenNorm #1180

Open
inakiLakunza wants to merge 1 commit into huggingface:main from inakiLakunza:patch-2

Conversation

@inakiLakunza


## Problem

When running evaluations with cached predictions, `normalize_log_probs` crashes with an `IndexError: list index out of range` inside the `LogProbTokenNorm` case. This happens because the cached `output_tokens` list can be shorter than `choices_logprob`, for example when a task's choices change between runs but the cache is not invalidated.

Affected path:
`lighteval/metrics/normalizations.py` → `normalize_log_probs` → `LogProbTokenNorm` branch
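The failure mode can be demonstrated with a minimal sketch (the variable names mirror the description above but are illustrative, not lighteval's actual internals):

```python
# Stale-cache scenario: three choices were scored this run, but the
# cached token lists only cover two of them.
choices_logprob = [-1.2, -0.5, -2.3]   # one logprob per choice
choices_tokens = [[101, 7], [42, 9]]   # cached tokens, one entry short

try:
    # Iterating over len(choices_logprob) indexes past the end of
    # choices_tokens, which is exactly the reported crash.
    normalized = [
        choices_logprob[i] / len(choices_tokens[i])
        for i in range(len(choices_logprob))
    ]
except IndexError as err:
    print(err)  # list index out of range
```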

## Fix

- Use `min(len(choices_logprob), len(choices_tokens))` to safely cap the iteration range instead of blindly iterating over `len(choices_logprob)`.
- Emit a `logger.warning` when truncation occurs so users are alerted to potential cache corruption and can take action (e.g. clearing the cache).
- Add a module-level `logger = logging.getLogger(__name__)` for consistent logging.
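The guarded version can be sketched as follows; the function name is hypothetical (the actual change lives inside `normalize_log_probs`), but the guard and warning follow the bullets above:

```python
import logging

logger = logging.getLogger(__name__)


def normalize_token_logprobs(choices_logprob, choices_tokens):
    """Sketch of the length-guarded LogProbTokenNorm branch."""
    # Cap the range so a short (possibly stale) token cache cannot
    # trigger an IndexError.
    n = min(len(choices_logprob), len(choices_tokens))
    if n < len(choices_logprob):
        logger.warning(
            "choices_tokens (%d) is shorter than choices_logprob (%d); "
            "truncating. The prediction cache may be stale.",
            len(choices_tokens),
            len(choices_logprob),
        )
    return [choices_logprob[i] / len(choices_tokens[i]) for i in range(n)]
```

With matching lengths the behavior is unchanged; with a short token list the result is truncated and a warning is logged instead of crashing.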

## Changes

- `src/lighteval/metrics/normalizations.py`
  - Added `import logging` and a module-level logger
  - Replaced the bare list comprehension in `LogProbTokenNorm` with a length-guarded version

## How to reproduce the original bug

1. Run an evaluation that uses `LogProbTokenNorm` (e.g. `belebele_mkd_Cyrl_cf`).
2. Allow results to be cached.
3. Modify the number of choices for the task, or use a stale cache from a previous run.
4. Re-run; the pipeline crashes at the metric computation stage with `IndexError: list index out of range`.
