[BENCHMARK BUG] Failed to load dataset #12

@mabuyun

📊 Benchmark Bug Description:

  • What benchmark or evaluation is affected?
    Gaia2

  • What unexpected behavior did you observe?
    The run fails with "Failed to load dataset", even though the latest cached version of the dataset has already been downloaded to the local cache directory.

  • What were you trying to measure or evaluate?
    Walk through the mock evaluation process using the pre-downloaded meta-agents-research-environments/gaia2 dataset.

🎯 Benchmark Bug Category:

  • 📈 Performance Issue (slow execution, memory usage, timeouts)
  • 📊 Metrics Issue (incorrect calculations, missing metrics)
  • 🔄 Reproducibility Issue (inconsistent results across runs)
  • 💾 Data Issue (incorrect datasets, missing data, data corruption)
  • ⚖️ Evaluation Issue (scoring problems, comparison errors)
  • 🏃 Execution Issue (benchmark fails to run, crashes during evaluation)

📋 Benchmark Details

Benchmark Information:

  • Benchmark Name: Gaia2
  • Benchmark Version/Commit: [e.g., commit hash, tag, branch]
  • Dataset Used: meta-agents-research-environments/gaia2
  • Dataset Split: validation
  • Dataset Config: mini, execution, search, adaptability, time, ambiguity

Command Used:

# Check if the dataset exists
ls /home/martin/.cache/huggingface/datasets/meta-agents-research-environments___gaia2/ 
adaptability  ambiguity  demo  execution  mini  search  time

# run
are-benchmark gaia2-run --hf-dataset meta-agents-research-environments/gaia2  --hf-split validation  --model gpt-4o --provider mock
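
The `ls` check above only shows the top-level config directories. A minimal sketch for looking one level deeper, assuming the usual `datasets` cache layout of `<config>/<version>/<hash>` under the dataset directory (that layout is an assumption here, and `list_cached_configs` is a hypothetical helper, not part of `are` or `datasets`):

```python
from pathlib import Path
from typing import Dict, List


def list_cached_configs(dataset_cache_dir: str) -> Dict[str, List[str]]:
    """Map each cached config directory to its '<version>/<hash>' subfolders."""
    root = Path(dataset_cache_dir).expanduser()
    layout: Dict[str, List[str]] = {}
    if not root.is_dir():
        # Nothing cached at this path.
        return layout
    for config_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        entries: List[str] = []
        for version_dir in sorted(p for p in config_dir.iterdir() if p.is_dir()):
            for hash_dir in sorted(p for p in version_dir.iterdir() if p.is_dir()):
                entries.append(f"{version_dir.name}/{hash_dir.name}")
        layout[config_dir.name] = entries
    return layout


if __name__ == "__main__":
    cache = "/home/martin/.cache/huggingface/datasets/meta-agents-research-environments___gaia2"
    for config, entries in list_cached_configs(cache).items():
        print(config, entries)
```

A config that shows an empty list would mean the directory exists but no prepared version is stored under it, which could be worth ruling out before looking at the loader.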

🔄 Steps to Reproduce

Environment Setup:

  1. Dataset preparation: ...
  2. Model/Agent configuration: ...
  3. Environment setup: ...
  4. Command execution: ...

Reproduction Steps:

  1. Step 1
  2. Step 2
  3. Step 3
  4. Observe the issue

Frequency:

  • Always happens
  • Happens most of the time (>75%)
  • Happens sometimes (25-75%)
  • Rarely happens (<25%)
  • Only under specific conditions (describe below)

📋 Error Information

Error Messages:

2025-10-10 17:00:16,105 - MainThread - WARNING - datasets.load - Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 17:00:16,106 - MainThread - ERROR - are.simulation.benchmark.huggingface_loader - Failed to load dataset meta-agents-research-environments/gaia2/mini/validation: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'mini-ac06a8d2d8a73f66'
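
The WARNING line shows that `datasets` first tried the Hugging Face Hub and only then fell back to the cache. A possible way to skip the Hub lookup is to run the command with the offline environment variables `HF_HUB_OFFLINE` and `HF_DATASETS_OFFLINE` set. This is only a sketch: `offline_env` is a hypothetical helper, and it is not confirmed that offline mode changes the outcome of the cache lookup itself.

```python
import os
from typing import Dict, Optional


def offline_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with Hugging Face offline mode enabled."""
    env = dict(os.environ if base is None else base)
    env["HF_HUB_OFFLINE"] = "1"       # read by huggingface_hub
    env["HF_DATASETS_OFFLINE"] = "1"  # read by datasets
    return env


if __name__ == "__main__":
    env = offline_env()
    print({k: v for k, v in env.items() if k.startswith("HF_")})
```

The returned mapping can be passed as `env=` to `subprocess.run` when launching the `are-benchmark` command from a script.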

Stack Trace (if applicable):

Traceback (most recent call last):
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/are/simulation/benchmark/huggingface_loader.py", line 163, in _create_huggingface_scenario_iterator
    ds = load_dataset(
         ^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1392, in load_dataset
    builder_instance = load_dataset_builder(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1166, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
                                       ^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 124, in __init__
    config_name, version, hash = _find_hash_in_cache(
                                 ^^^^^^^^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 64, in _find_hash_in_cache
    raise ValueError(
ValueError: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'mini-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
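
The ValueError asks for `mini-ac06a8d2d8a73f66` while the cache only lists `mini`, so the requested name carries an extra suffix that the cached directory name does not have. The sketch below just makes that comparison explicit; `explain_config_mismatch` is a hypothetical helper, and reading the suffix as a hash appended by `datasets` is an assumption, not something confirmed by the log.

```python
from typing import List


def explain_config_mismatch(requested: str, available: List[str]) -> str:
    """Describe how a requested config id relates to the cached config names."""
    if requested in available:
        return f"'{requested}' is cached"
    # Split off whatever follows the last '-' and retry with the base name.
    base, sep, suffix = requested.rpartition("-")
    if sep and base in available:
        return (
            f"'{base}' is cached, but the lookup asked for "
            f"'{requested}' (extra suffix '{suffix}')"
        )
    return f"neither '{requested}' nor a base name of it is cached"


if __name__ == "__main__":
    cached = ["adaptability", "ambiguity", "demo", "execution", "mini", "search", "time"]
    print(explain_config_mismatch("mini-ac06a8d2d8a73f66", cached))
```

If the suffix does come from extra arguments passed to `load_dataset`, comparing the call in `huggingface_loader.py` with a plain `load_dataset(name, config, split=...)` call might narrow this down.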

Log Output:

are-benchmark gaia2-run --hf-dataset meta-agents-research-environments/gaia2  --hf-split validation  --model mock --provider mock -l 1
2025-10-10 16:56:40,741 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - No output directory specified. Using default: ./gaia2_results
2025-10-10 16:56:40,742 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Starting GAIA2 submission pipeline...
2025-10-10 16:56:40,742 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Standard phase will run configs: ambiguity, adaptability, execution, search, time
2025-10-10 16:56:40,742 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Agent2Agent and Noise phases will run mini config only
2025-10-10 16:56:40,742 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - This includes: standard runs, agent2agent runs, and noise runs
2025-10-10 16:56:40,742 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - === Phase: Running standard configurations ===
2025-10-10 16:56:40,742 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Running standard config: ambiguity
Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:57:11,478 - MainThread - WARNING - datasets.load - Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:57:11,479 - MainThread - ERROR - are.simulation.benchmark.huggingface_loader - Failed to load dataset meta-agents-research-environments/gaia2/ambiguity/validation: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'ambiguity-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
Traceback (most recent call last):
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/are/simulation/benchmark/huggingface_loader.py", line 163, in _create_huggingface_scenario_iterator
    ds = load_dataset(
         ^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1392, in load_dataset
    builder_instance = load_dataset_builder(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1166, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
                                       ^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 124, in __init__
    config_name, version, hash = _find_hash_in_cache(
                                 ^^^^^^^^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 64, in _find_hash_in_cache
    raise ValueError(
ValueError: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'ambiguity-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
2025-10-10 16:57:11,497 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Running each scenario 3 times to improve variance
2025-10-10 16:57:11,498 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Starting.
2025-10-10 16:57:11,498 - MainThread - INFO - are.simulation.multi_scenario_runner - Running scenarios in parallel with 5 workers
Running Ambiguity scenarios: 0it [00:00, ?it/s, Success=0.0%]
2025-10-10 16:57:11,499 - MainThread - ERROR - are.simulation.benchmark.gaia2_submission - Phase 'standard' config 'ambiguity' failed with error: No scenarios processed
2025-10-10 16:57:11,499 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Running standard config: adaptability
Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:57:42,248 - MainThread - WARNING - datasets.load - Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:57:42,249 - MainThread - ERROR - are.simulation.benchmark.huggingface_loader - Failed to load dataset meta-agents-research-environments/gaia2/adaptability/validation: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'adaptability-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
Traceback (most recent call last):
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/are/simulation/benchmark/huggingface_loader.py", line 163, in _create_huggingface_scenario_iterator
    ds = load_dataset(
         ^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1392, in load_dataset
    builder_instance = load_dataset_builder(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1166, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
                                       ^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 124, in __init__
    config_name, version, hash = _find_hash_in_cache(
                                 ^^^^^^^^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 64, in _find_hash_in_cache
    raise ValueError(
ValueError: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'adaptability-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
2025-10-10 16:57:42,249 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Running each scenario 3 times to improve variance
2025-10-10 16:57:42,250 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Starting.
2025-10-10 16:57:42,250 - MainThread - INFO - are.simulation.multi_scenario_runner - Running scenarios in parallel with 5 workers
Running Adaptability scenarios: 0it [00:00, ?it/s, Success=0.0%]
2025-10-10 16:57:42,250 - MainThread - ERROR - are.simulation.benchmark.gaia2_submission - Phase 'standard' config 'adaptability' failed with error: No scenarios processed
2025-10-10 16:57:42,250 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Running standard config: execution
Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:58:13,013 - MainThread - WARNING - datasets.load - Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:58:13,014 - MainThread - ERROR - are.simulation.benchmark.huggingface_loader - Failed to load dataset meta-agents-research-environments/gaia2/execution/validation: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'execution-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
Traceback (most recent call last):
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/are/simulation/benchmark/huggingface_loader.py", line 163, in _create_huggingface_scenario_iterator
    ds = load_dataset(
         ^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1392, in load_dataset
    builder_instance = load_dataset_builder(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1166, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
                                       ^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 124, in __init__
    config_name, version, hash = _find_hash_in_cache(
                                 ^^^^^^^^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 64, in _find_hash_in_cache
    raise ValueError(
ValueError: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'execution-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
2025-10-10 16:58:13,014 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Running each scenario 3 times to improve variance
2025-10-10 16:58:13,014 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Starting.
2025-10-10 16:58:13,015 - MainThread - INFO - are.simulation.multi_scenario_runner - Running scenarios in parallel with 5 workers
Running Execution scenarios: 0it [00:00, ?it/s, Success=0.0%]
2025-10-10 16:58:13,015 - MainThread - ERROR - are.simulation.benchmark.gaia2_submission - Phase 'standard' config 'execution' failed with error: No scenarios processed
2025-10-10 16:58:13,015 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Running standard config: search
Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:58:43,738 - MainThread - WARNING - datasets.load - Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:58:43,739 - MainThread - ERROR - are.simulation.benchmark.huggingface_loader - Failed to load dataset meta-agents-research-environments/gaia2/search/validation: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'search-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
Traceback (most recent call last):
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/are/simulation/benchmark/huggingface_loader.py", line 163, in _create_huggingface_scenario_iterator
    ds = load_dataset(
         ^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1392, in load_dataset
    builder_instance = load_dataset_builder(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1166, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
                                       ^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 124, in __init__
    config_name, version, hash = _find_hash_in_cache(
                                 ^^^^^^^^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 64, in _find_hash_in_cache
    raise ValueError(
ValueError: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'search-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
2025-10-10 16:58:43,739 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Running each scenario 3 times to improve variance
2025-10-10 16:58:43,739 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Starting.
2025-10-10 16:58:43,740 - MainThread - INFO - are.simulation.multi_scenario_runner - Running scenarios in parallel with 5 workers
Running Search scenarios: 0it [00:00, ?it/s, Success=0.0%]
2025-10-10 16:58:43,740 - MainThread - ERROR - are.simulation.benchmark.gaia2_submission - Phase 'standard' config 'search' failed with error: No scenarios processed
2025-10-10 16:58:43,740 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Running standard config: time
Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:59:14,563 - MainThread - WARNING - datasets.load - Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:59:14,564 - MainThread - ERROR - are.simulation.benchmark.huggingface_loader - Failed to load dataset meta-agents-research-environments/gaia2/time/validation: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'time-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
Traceback (most recent call last):
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/are/simulation/benchmark/huggingface_loader.py", line 163, in _create_huggingface_scenario_iterator
    ds = load_dataset(
         ^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1392, in load_dataset
    builder_instance = load_dataset_builder(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1166, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
                                       ^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 124, in __init__
    config_name, version, hash = _find_hash_in_cache(
                                 ^^^^^^^^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 64, in _find_hash_in_cache
    raise ValueError(
ValueError: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'time-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
2025-10-10 16:59:14,564 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Running each scenario 3 times to improve variance
2025-10-10 16:59:14,564 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Starting.
2025-10-10 16:59:14,564 - MainThread - INFO - are.simulation.multi_scenario_runner - Running scenarios in parallel with 5 workers
Running Time scenarios: 0it [00:00, ?it/s, Success=0.0%]
2025-10-10 16:59:14,565 - MainThread - ERROR - are.simulation.benchmark.gaia2_submission - Phase 'standard' config 'time' failed with error: No scenarios processed
2025-10-10 16:59:14,565 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - === Phase: Running agent2agent configurations ===
2025-10-10 16:59:14,565 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Running agent2agent config: mini
Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:59:45,336 - MainThread - WARNING - datasets.load - Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:59:45,337 - MainThread - ERROR - are.simulation.benchmark.huggingface_loader - Failed to load dataset meta-agents-research-environments/gaia2/mini/validation: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'mini-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
Traceback (most recent call last):
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/are/simulation/benchmark/huggingface_loader.py", line 163, in _create_huggingface_scenario_iterator
    ds = load_dataset(
         ^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1392, in load_dataset
    builder_instance = load_dataset_builder(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1166, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
                                       ^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 124, in __init__
    config_name, version, hash = _find_hash_in_cache(
                                 ^^^^^^^^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 64, in _find_hash_in_cache
    raise ValueError(
ValueError: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'mini-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
2025-10-10 16:59:45,338 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Running each scenario 3 times to improve variance
2025-10-10 16:59:45,338 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Starting.
2025-10-10 16:59:45,338 - MainThread - INFO - are.simulation.multi_scenario_runner - Running scenarios in parallel with 5 workers
Running Mini scenarios with Agent2Agent: 0it [00:00, ?it/s, Success=0.0%]
2025-10-10 16:59:45,338 - MainThread - ERROR - are.simulation.benchmark.gaia2_submission - Phase 'agent2agent' config 'mini' failed with error: No scenarios processed
2025-10-10 16:59:45,338 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - === Phase: Running noise configurations ===
2025-10-10 16:59:45,339 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Running noise config: mini
Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 17:00:16,105 - MainThread - WARNING - datasets.load - Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 17:00:16,106 - MainThread - ERROR - are.simulation.benchmark.huggingface_loader - Failed to load dataset meta-agents-research-environments/gaia2/mini/validation: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'mini-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
Traceback (most recent call last):
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/are/simulation/benchmark/huggingface_loader.py", line 163, in _create_huggingface_scenario_iterator
    ds = load_dataset(
         ^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1392, in load_dataset
    builder_instance = load_dataset_builder(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1166, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
                                       ^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 124, in __init__
    config_name, version, hash = _find_hash_in_cache(
                                 ^^^^^^^^^^^^^^^^^^^^
  File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 64, in _find_hash_in_cache
    raise ValueError(
ValueError: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'mini-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
2025-10-10 17:00:16,106 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Running each scenario 3 times to improve variance
2025-10-10 17:00:16,106 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Starting.
2025-10-10 17:00:16,106 - MainThread - INFO - are.simulation.multi_scenario_runner - Running scenarios in parallel with 5 workers
Running Mini scenarios with Noise: 0it [00:00, ?it/s, Success=0.0%]
2025-10-10 17:00:16,107 - MainThread - ERROR - are.simulation.benchmark.gaia2_submission - Phase 'noise' config 'mini' failed with error: No scenarios processed
2025-10-10 17:00:16,107 - MainThread - WARNING - are.simulation.benchmark.gaia2_submission - The following 7 phase/config(s) failed entirely and were skipped:
2025-10-10 17:00:16,107 - MainThread - WARNING - are.simulation.benchmark.gaia2_submission -   - standard/ambiguity
2025-10-10 17:00:16,107 - MainThread - WARNING - are.simulation.benchmark.gaia2_submission -   - standard/adaptability
2025-10-10 17:00:16,107 - MainThread - WARNING - are.simulation.benchmark.gaia2_submission -   - standard/execution
2025-10-10 17:00:16,107 - MainThread - WARNING - are.simulation.benchmark.gaia2_submission -   - standard/search
2025-10-10 17:00:16,107 - MainThread - WARNING - are.simulation.benchmark.gaia2_submission -   - standard/time
2025-10-10 17:00:16,107 - MainThread - WARNING - are.simulation.benchmark.gaia2_submission -   - agent2agent/mini
2025-10-10 17:00:16,107 - MainThread - WARNING - are.simulation.benchmark.gaia2_submission -   - noise/mini
2025-10-10 17:00:16,107 - MainThread - WARNING - are.simulation.benchmark.gaia2_submission - Check the logs above for specific error details.
2025-10-10 17:00:16,107 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - GAIA2 submission summary:
2025-10-10 17:00:16,107 - MainThread - INFO - are.simulation.benchmark.gaia2_submission -   Total phase/configs attempted: 7
2025-10-10 17:00:16,107 - MainThread - INFO - are.simulation.benchmark.gaia2_submission -   Successful phase/configs: 0
2025-10-10 17:00:16,107 - MainThread - INFO - are.simulation.benchmark.gaia2_submission -   Failed phase/configs: 7
2025-10-10 17:00:16,107 - MainThread - INFO - are.simulation.benchmark.gaia2_submission -   Total scenarios processed: 0
2025-10-10 17:00:16,107 - MainThread - ERROR - are.simulation.benchmark.gaia2_submission - All phase/configs failed. No results generated.
2025-10-10 17:00:16,108 - MainThread - WARNING - are.simulation.benchmark.cli - No results to report.
2025-10-10 17:00:16,108 - MainThread - INFO - are.simulation.benchmark.cli - All Done.
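
Since the log repeats the same traceback for every config, a short summary may be easier to read. A minimal sketch that extracts the failed phase/config pairs from a saved log, assuming the "Phase '...' config '...' failed with error: ..." wording seen above stays the same (`failed_phase_configs` is a hypothetical helper):

```python
import re
from typing import List, Tuple

# Matches lines such as:
#   ... Phase 'standard' config 'ambiguity' failed with error: No scenarios processed
FAILURE_RE = re.compile(
    r"Phase '(?P<phase>[^']+)' config '(?P<config>[^']+)' failed with error: (?P<error>.+)"
)


def failed_phase_configs(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (phase, config, error) for every failure line found in the log."""
    return [
        (m.group("phase"), m.group("config"), m.group("error").strip())
        for m in FAILURE_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "2025-10-10 16:57:11,499 - MainThread - ERROR - are.simulation.benchmark.gaia2_submission - "
        "Phase 'standard' config 'ambiguity' failed with error: No scenarios processed\n"
    )
    for phase, config, error in failed_phase_configs(sample):
        print(f"{phase}/{config}: {error}")
```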

Labels: bug (Something isn't working)
