Describe the bug
IndexError: list index out of range
To Reproduce
(eval_venv) root@b2c98f779d6b:~/workshop# cat qwen3_nothink.yaml
model_parameters:
  provider: "openai"
  model_name: "openai/qwen3-1.7b"
  base_url: "http://192.168.5.39:9001/v1"
  api_key: "EMPTY"
  generation_parameters:
    temperature: 0.7
    top_k: 20
    top_p: 0.8
    min_p: 0
(eval_venv) root@b2c98f779d6b:~/workshop# lighteval endpoint litellm ./qwen3_nothink.yaml 'ifbench_multiturn,lcb:codegeneration_release_latest,narrativeqa' --max
Generating train split: 10 examples [00:00, 3347.68 examples/s]
Generating train split: 10 examples [00:00, 3121.46 examples/s]
[2026-01-31 17:30:06,855] [ INFO]: --- POST-PROCESSING MODEL RESPONSES --- (pipeline.py:349)
[2026-01-31 17:30:06,855] [ INFO]: --- COMPUTING METRICS --- (pipeline.py:376)
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.80s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.76s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.79s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.79s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.89s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.86s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.79s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.81s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.79s/it]
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.80s/it]
╭──────────────────────────────── Traceback (most recent call last) ─────────────────────────────────╮
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/main_endpoint.py:312 in litellm │
│ │
│ 309 │ │ metric_options=metric_options, │
│ 310 │ ) │
│ 311 │ │
│ ❱ 312 │ pipeline.evaluate() │
│ 313 │ │
│ 314 │ pipeline.show_results() │
│ 315 │
│ │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/pipeline.py:291 in evaluate │
│ │
│ 288 │ │ │
│ 289 │ │ if self.is_main_process(): │
│ 290 │ │ │ self._post_process_outputs(outputs) │
│ ❱ 291 │ │ │ self._compute_metrics(outputs) │
│ 292 │ │ │ │
│ 293 │ │ │ self.evaluation_tracker.general_config_logger.log_end_time() │
│ 294 │ │ │ self.evaluation_tracker.metrics_logger.aggregate( │
│ │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/pipeline.py:391 in │
│ _compute_metrics │
│ │
│ 388 │ │ │ │ docs = [doc for doc, _ in samples] │
│ 389 │ │ │ │ responses = [response for _, response in samples] │
│ 390 │ │ │ │ │
│ ❱ 391 │ │ │ │ outputs = apply_metric( │
│ 392 │ │ │ │ │ docs=docs, │
│ 393 │ │ │ │ │ responses=responses, │
│ 394 │ │ │ │ │ metrics=metric_category_metrics, │
│ │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/metrics/__init__.py:54 in │
│ apply_metric │
│ │
│ 51 │ │ # Add non-batched metric results for this sample │
│ 52 │ │ for metric in non_batched_metrics: │
│ 53 │ │ │ output.update( │
│ ❱ 54 │ │ │ │ metric.compute_sample( │
│ 55 │ │ │ │ │ model_response=responses[i], │
│ 56 │ │ │ │ │ doc=docs[i], │
│ 57 │ │ │ │ ) │
│ │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/metrics/utils/metric_utils.py:59 │
│ in compute_sample │
│ │
│ 56 │ │ │
│ 57 │ │ if isinstance(self, MetricGrouping): │
│ 58 │ │ │ return sample_level_fn(**kwargs) │
│ ❱ 59 │ │ return {self.metric_name: sample_level_fn(**kwargs)} │
│ 60 │ │
│ 61 │ def get_corpus_aggregations(self) -> dict: │
│ 62 │ │ if isinstance(self, MetricGrouping): │
│ │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/metrics/metrics_sample.py:131 in │
│ compute │
│ │
│ 128 │ │ """ │
│ 129 │ │ results = [] │
│ 130 │ │ # We might need to flatten golds if they are a list of lists │
│ ❱ 131 │ │ golds = doc.get_golds() │
│ 132 │ │ for gold in golds: │
│ 133 │ │ │ for pred in model_response.final_text: │
│ 134 │ │ │ │ results.append(self.compute_one_item(gold=gold, pred=pred)) │
│ │
│ /root/workshop/eval_venv/lib/python3.12/site-packages/lighteval/tasks/requests.py:222 in get_golds │
│ │
│ 219 │ │ gold_indices = as_list(self.gold_index) │
│ 220 │ │ golds = [] │
│ 221 │ │ for gold_ix in gold_indices: │
│ ❱ 222 │ │ │ golds.extend(as_list(self.choices[gold_ix])) │
│ 223 │ │ return golds │
│ 224 │ │
│ 225 │ def __repr__(self): │
╰────────────────────────────────────────────────────────────────────────────────────────────────────╯
IndexError: list index out of range
Expected behavior
The evaluation should run to completion and report metrics for all three tasks (ifbench_multiturn, lcb:codegeneration_release_latest, narrativeqa) without raising an IndexError.
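For context, the crash site in tasks/requests.py reduces to the following sketch. The functions below are simplified stand-ins written for this report, not imports from lighteval: the failure happens when a gold_index points past the end of a doc's choices list, e.g. when a generative task produces docs with empty choices but a sample-level metric still calls get_golds.

```python
def as_list(item):
    # Wrap a scalar in a list, mirroring lighteval's as_list helper.
    return item if isinstance(item, list) else [item]

def get_golds(choices, gold_index):
    """Simplified sketch of Doc.get_golds from the traceback above."""
    golds = []
    for gold_ix in as_list(gold_index):
        # Raises IndexError when gold_ix >= len(choices), e.g. a doc with
        # no choices paired with a metric that expects gold references.
        golds.extend(as_list(choices[gold_ix]))
    return golds

# A doc with empty choices reproduces the crash:
try:
    get_golds(choices=[], gold_index=0)
except IndexError as e:
    print(f"IndexError: {e}")  # prints "IndexError: list index out of range"
```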
Version info
lighteval 0.13.0