[BUG] IndexError: list index out of range #1162

@2niuhe

Description

Describe the bug

IndexError: list index out of range

To Reproduce

```
(eval_venv) root@b2c98f779d6b:~/workshop# cat qwen3_nothink.yaml
model_parameters:
  provider: "openai"
  model_name: "openai/qwen3-1.7b"
  base_url: "http://192.168.5.39:9001/v1"
  api_key: "EMPTY"
  generation_parameters:
    temperature: 0.7
    top_k: 20
    top_p: 0.8
    min_p: 0
```

(eval_venv) root@b2c98f779d6b:~/workshop# lighteval endpoint litellm ./qwen3_nothink.yaml 'ifbench_multiturn,lcb:codegeneration_release_latest,narrativeqa' --max

Generating train split: 10 examples [00:00, 3347.68 examples/s]
Generating train split: 10 examples [00:00, 3121.46 examples/s]
[2026-01-31 17:30:06,855] [    INFO]: --- POST-PROCESSING MODEL RESPONSES --- (pipeline.py:349)
[2026-01-31 17:30:06,855] [    INFO]: --- COMPUTING METRICS --- (pipeline.py:376)
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.80s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.76s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.79s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.79s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.89s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.86s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.79s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.81s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.79s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.80s/it]
╭──────────────────────────────── Traceback (most recent call last) ─────────────────────────────────╮
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/main_endpoint.py:312 in litellm    │
│                                                                                                    │
│   309 │   │   metric_options=metric_options,                                                       │
│   310 │   )                                                                                        │
│   311 │                                                                                            │
│ ❱ 312 │   pipeline.evaluate()                                                                      │
│   313 │                                                                                            │
│   314 │   pipeline.show_results()                                                                  │
│   315                                                                                              │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/pipeline.py:291 in evaluate        │
│                                                                                                    │
│   288 │   │                                                                                        │
│   289 │   │   if self.is_main_process():                                                           │
│   290 │   │   │   self._post_process_outputs(outputs)                                              │
│ ❱ 291 │   │   │   self._compute_metrics(outputs)                                                   │
│   292 │   │   │                                                                                    │
│   293 │   │   │   self.evaluation_tracker.general_config_logger.log_end_time()                     │
│   294 │   │   │   self.evaluation_tracker.metrics_logger.aggregate(                                │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/pipeline.py:391 in                 │
│ _compute_metrics                                                                                   │
│                                                                                                    │
│   388 │   │   │   │   docs = [doc for doc, _ in samples]                                           │
│   389 │   │   │   │   responses = [response for _, response in samples]                            │
│   390 │   │   │   │                                                                                │
│ ❱ 391 │   │   │   │   outputs = apply_metric(                                                      │
│   392 │   │   │   │   │   docs=docs,                                                               │
│   393 │   │   │   │   │   responses=responses,                                                     │
│   394 │   │   │   │   │   metrics=metric_category_metrics,                                         │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/metrics/__init__.py:54 in          │
│ apply_metric                                                                                       │
│                                                                                                    │
│   51 │   │   # Add non-batched metric results for this sample                                      │
│   52 │   │   for metric in non_batched_metrics:                                                    │
│   53 │   │   │   output.update(                                                                    │
│ ❱ 54 │   │   │   │   metric.compute_sample(                                                        │
│   55 │   │   │   │   │   model_response=responses[i],                                              │
│   56 │   │   │   │   │   doc=docs[i],                                                              │
│   57 │   │   │   │   )                                                                             │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/metrics/utils/metric_utils.py:59   │
│ in compute_sample                                                                                  │
│                                                                                                    │
│    56 │   │                                                                                        │
│    57 │   │   if isinstance(self, MetricGrouping):                                                 │
│    58 │   │   │   return sample_level_fn(**kwargs)                                                 │
│ ❱  59 │   │   return {self.metric_name: sample_level_fn(**kwargs)}                                 │
│    60 │                                                                                            │
│    61 │   def get_corpus_aggregations(self) -> dict:                                               │
│    62 │   │   if isinstance(self, MetricGrouping):                                                 │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/metrics/metrics_sample.py:131 in   │
│ compute                                                                                            │
│                                                                                                    │
│    128 │   │   """
│    129 │   │   results = []                                                                        │
│    130 │   │   # We might need to flatten golds if they are a list of lists                        │
│ ❱  131 │   │   golds = doc.get_golds()                                                             │
│    132 │   │   for gold in golds:                                                                  │
│    133 │   │   │   for pred in model_response.final_text:                                          │
│    134 │   │   │   │   results.append(self.compute_one_item(gold=gold, pred=pred))                 │
│                                                                                                    │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/tasks/requests.py:222 in get_golds │
│                                                                                                    │
│   219 │   │   gold_indices = as_list(self.gold_index)                                              │
│   220 │   │   golds = []                                                                           │
│   221 │   │   for gold_ix in gold_indices:                                                         │
│ ❱ 222 │   │   │   golds.extend(as_list(self.choices[gold_ix]))                                     │
│   223 │   │   return golds                                                                         │
│   224 │                                                                                            │
│   225 │   def __repr__(self):                                                                      │
╰────────────────────────────────────────────────────────────────────────────────────────────────────╯
IndexError: list index out of range

Expected behavior

The evaluation should run to completion and report metrics for all three tasks without raising an IndexError.

Version info

lighteval 0.13.0
