Open
Labels
bug (Something isn't working)
Description
📊 Benchmark Bug Description:
- What benchmark or evaluation is affected?
Gaia2
- What unexpected behavior did you observe?
The run still fails with "Failed to load dataset", even though the dataset has already been downloaded and its latest cached version is present in the local cache directory.
- What were you trying to measure or evaluate?
Use the pre-downloaded meta-agents-research-environments/gaia2 dataset to walk through the mock evaluation process.
🎯 Benchmark Bug Category:
- 📈 Performance Issue (slow execution, memory usage, timeouts)
- 📊 Metrics Issue (incorrect calculations, missing metrics)
- 🔄 Reproducibility Issue (inconsistent results across runs)
- 💾 Data Issue (incorrect datasets, missing data, data corruption)
- ⚖️ Evaluation Issue (scoring problems, comparison errors)
- 🏃 Execution Issue (benchmark fails to run, crashes during evaluation)
📋 Benchmark Details
Benchmark Information:
- Benchmark Name: Gaia2
- Benchmark Version/Commit: [e.g., commit hash, tag, branch]
- Dataset Used: meta-agents-research-environments/gaia2
- Dataset Split: validation
- Dataset Config: mini, execution, search, adaptability, time, ambiguity
Command Used:
# Check if the dataset exists
ls /home/martin/.cache/huggingface/datasets/meta-agents-research-environments___gaia2/
adaptability ambiguity demo execution mini search time
# run
are-benchmark gaia2-run --hf-dataset meta-agents-research-environments/gaia2 --hf-split validation --model gpt-4o --provider mock
🔄 Steps to Reproduce
Environment Setup:
- Dataset preparation: ...
- Model/Agent configuration: ...
- Environment setup: ...
- Command execution: ...
Reproduction Steps:
- Step 1
- Step 2
- Step 3
- Observe the issue
Frequency:
- Always happens
- Happens most of the time (>75%)
- Happens sometimes (25-75%)
- Rarely happens (<25%)
- Only under specific conditions (describe below)
📋 Error Information
Error Messages:
2025-10-10 17:00:16,105 - MainThread - WARNING - datasets.load - Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 17:00:16,106 - MainThread - ERROR - are.simulation.benchmark.huggingface_loader - Failed to load dataset meta-agents-research-environments/gaia2/mini/validation: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'mini-ac06a8d2d8a73f66'
Stack Trace (if applicable):
Traceback (most recent call last):
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/are/simulation/benchmark/huggingface_loader.py", line 163, in _create_huggingface_scenario_iterator
ds = load_dataset(
^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1392, in load_dataset
builder_instance = load_dataset_builder(
^^^^^^^^^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1166, in load_dataset_builder
builder_instance: DatasetBuilder = builder_cls(
^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 124, in __init__
config_name, version, hash = _find_hash_in_cache(
^^^^^^^^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 64, in _find_hash_in_cache
raise ValueError(
ValueError: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'mini-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
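The ValueError above asks for config id 'mini-ac06a8d2d8a73f66', while the cache only lists the plain name 'mini'. A minimal sketch for checking which config directories actually exist on disk follows. It assumes the dataset cache folder is laid out as `<config>/<version>/<hash>`, which is my understanding of the `datasets` cache layout and has not been verified against this installation. The helper names `list_cached_configs` and `has_config_id` are hypothetical, and the path is the one from the `ls` command in this report.

```python
from pathlib import Path


def list_cached_configs(dataset_cache_root):
    """Map each config directory name to its (version, hash) subdirectories.

    Assumes a <config>/<version>/<hash> layout under the dataset cache folder.
    Returns an empty dict if the folder does not exist.
    """
    root = Path(dataset_cache_root)
    found = {}
    if not root.is_dir():
        return found
    for config_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        entries = []
        for version_dir in sorted(p for p in config_dir.iterdir() if p.is_dir()):
            for hash_dir in sorted(p for p in version_dir.iterdir() if p.is_dir()):
                entries.append((version_dir.name, hash_dir.name))
        found[config_dir.name] = entries
    return found


def has_config_id(dataset_cache_root, config_id):
    """True if a directory named exactly `config_id` holds at least one cached build."""
    return bool(list_cached_configs(dataset_cache_root).get(config_id))


if __name__ == "__main__":
    # Path taken from the `ls` command in this report.
    cache_root = (
        "/home/martin/.cache/huggingface/datasets/"
        "meta-agents-research-environments___gaia2"
    )
    for config, builds in list_cached_configs(cache_root).items():
        print(config, builds)
    # Config ids taken from the error message above.
    for wanted in ("mini", "mini-ac06a8d2d8a73f66"):
        print(wanted, "present:", has_config_id(cache_root, wanted))
```

If the plain `mini` directory is reported as present and the suffixed id is not, that would be consistent with the error message: the cache lookup is searching for a directory name that was never written.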
Log Output:
are-benchmark gaia2-run --hf-dataset meta-agents-research-environments/gaia2 --hf-split validation --model mock --provider mock -l 1
2025-10-10 16:56:40,741 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - No output directory specified. Using default: ./gaia2_results
2025-10-10 16:56:40,742 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Starting GAIA2 submission pipeline...
2025-10-10 16:56:40,742 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Standard phase will run configs: ambiguity, adaptability, execution, search, time
2025-10-10 16:56:40,742 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Agent2Agent and Noise phases will run mini config only
2025-10-10 16:56:40,742 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - This includes: standard runs, agent2agent runs, and noise runs
2025-10-10 16:56:40,742 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - === Phase: Running standard configurations ===
2025-10-10 16:56:40,742 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Running standard config: ambiguity
Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:57:11,478 - MainThread - WARNING - datasets.load - Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:57:11,479 - MainThread - ERROR - are.simulation.benchmark.huggingface_loader - Failed to load dataset meta-agents-research-environments/gaia2/ambiguity/validation: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'ambiguity-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
Traceback (most recent call last):
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/are/simulation/benchmark/huggingface_loader.py", line 163, in _create_huggingface_scenario_iterator
ds = load_dataset(
^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1392, in load_dataset
builder_instance = load_dataset_builder(
^^^^^^^^^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1166, in load_dataset_builder
builder_instance: DatasetBuilder = builder_cls(
^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 124, in __init__
config_name, version, hash = _find_hash_in_cache(
^^^^^^^^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 64, in _find_hash_in_cache
raise ValueError(
ValueError: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'ambiguity-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
2025-10-10 16:57:11,497 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Running each scenario 3 times to improve variance
2025-10-10 16:57:11,498 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Starting.
2025-10-10 16:57:11,498 - MainThread - INFO - are.simulation.multi_scenario_runner - Running scenarios in parallel with 5 workers
Running Ambiguity scenarios: 0it [00:00, ?it/s, Success=0.0%]
2025-10-10 16:57:11,499 - MainThread - ERROR - are.simulation.benchmark.gaia2_submission - Phase 'standard' config 'ambiguity' failed with error: No scenarios processed
2025-10-10 16:57:11,499 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Running standard config: adaptability
Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:57:42,248 - MainThread - WARNING - datasets.load - Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:57:42,249 - MainThread - ERROR - are.simulation.benchmark.huggingface_loader - Failed to load dataset meta-agents-research-environments/gaia2/adaptability/validation: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'adaptability-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
Traceback (most recent call last):
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/are/simulation/benchmark/huggingface_loader.py", line 163, in _create_huggingface_scenario_iterator
ds = load_dataset(
^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1392, in load_dataset
builder_instance = load_dataset_builder(
^^^^^^^^^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1166, in load_dataset_builder
builder_instance: DatasetBuilder = builder_cls(
^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 124, in __init__
config_name, version, hash = _find_hash_in_cache(
^^^^^^^^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 64, in _find_hash_in_cache
raise ValueError(
ValueError: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'adaptability-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
2025-10-10 16:57:42,249 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Running each scenario 3 times to improve variance
2025-10-10 16:57:42,250 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Starting.
2025-10-10 16:57:42,250 - MainThread - INFO - are.simulation.multi_scenario_runner - Running scenarios in parallel with 5 workers
Running Adaptability scenarios: 0it [00:00, ?it/s, Success=0.0%]
2025-10-10 16:57:42,250 - MainThread - ERROR - are.simulation.benchmark.gaia2_submission - Phase 'standard' config 'adaptability' failed with error: No scenarios processed
2025-10-10 16:57:42,250 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Running standard config: execution
Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:58:13,013 - MainThread - WARNING - datasets.load - Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:58:13,014 - MainThread - ERROR - are.simulation.benchmark.huggingface_loader - Failed to load dataset meta-agents-research-environments/gaia2/execution/validation: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'execution-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
Traceback (most recent call last):
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/are/simulation/benchmark/huggingface_loader.py", line 163, in _create_huggingface_scenario_iterator
ds = load_dataset(
^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1392, in load_dataset
builder_instance = load_dataset_builder(
^^^^^^^^^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1166, in load_dataset_builder
builder_instance: DatasetBuilder = builder_cls(
^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 124, in __init__
config_name, version, hash = _find_hash_in_cache(
^^^^^^^^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 64, in _find_hash_in_cache
raise ValueError(
ValueError: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'execution-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
2025-10-10 16:58:13,014 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Running each scenario 3 times to improve variance
2025-10-10 16:58:13,014 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Starting.
2025-10-10 16:58:13,015 - MainThread - INFO - are.simulation.multi_scenario_runner - Running scenarios in parallel with 5 workers
Running Execution scenarios: 0it [00:00, ?it/s, Success=0.0%]
2025-10-10 16:58:13,015 - MainThread - ERROR - are.simulation.benchmark.gaia2_submission - Phase 'standard' config 'execution' failed with error: No scenarios processed
2025-10-10 16:58:13,015 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Running standard config: search
Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:58:43,738 - MainThread - WARNING - datasets.load - Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:58:43,739 - MainThread - ERROR - are.simulation.benchmark.huggingface_loader - Failed to load dataset meta-agents-research-environments/gaia2/search/validation: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'search-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
Traceback (most recent call last):
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/are/simulation/benchmark/huggingface_loader.py", line 163, in _create_huggingface_scenario_iterator
ds = load_dataset(
^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1392, in load_dataset
builder_instance = load_dataset_builder(
^^^^^^^^^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1166, in load_dataset_builder
builder_instance: DatasetBuilder = builder_cls(
^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 124, in __init__
config_name, version, hash = _find_hash_in_cache(
^^^^^^^^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 64, in _find_hash_in_cache
raise ValueError(
ValueError: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'search-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
2025-10-10 16:58:43,739 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Running each scenario 3 times to improve variance
2025-10-10 16:58:43,739 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Starting.
2025-10-10 16:58:43,740 - MainThread - INFO - are.simulation.multi_scenario_runner - Running scenarios in parallel with 5 workers
Running Search scenarios: 0it [00:00, ?it/s, Success=0.0%]
2025-10-10 16:58:43,740 - MainThread - ERROR - are.simulation.benchmark.gaia2_submission - Phase 'standard' config 'search' failed with error: No scenarios processed
2025-10-10 16:58:43,740 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Running standard config: time
Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:59:14,563 - MainThread - WARNING - datasets.load - Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:59:14,564 - MainThread - ERROR - are.simulation.benchmark.huggingface_loader - Failed to load dataset meta-agents-research-environments/gaia2/time/validation: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'time-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
Traceback (most recent call last):
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/are/simulation/benchmark/huggingface_loader.py", line 163, in _create_huggingface_scenario_iterator
ds = load_dataset(
^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1392, in load_dataset
builder_instance = load_dataset_builder(
^^^^^^^^^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1166, in load_dataset_builder
builder_instance: DatasetBuilder = builder_cls(
^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 124, in __init__
config_name, version, hash = _find_hash_in_cache(
^^^^^^^^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 64, in _find_hash_in_cache
raise ValueError(
ValueError: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'time-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
2025-10-10 16:59:14,564 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Running each scenario 3 times to improve variance
2025-10-10 16:59:14,564 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Starting.
2025-10-10 16:59:14,564 - MainThread - INFO - are.simulation.multi_scenario_runner - Running scenarios in parallel with 5 workers
Running Time scenarios: 0it [00:00, ?it/s, Success=0.0%]
2025-10-10 16:59:14,565 - MainThread - ERROR - are.simulation.benchmark.gaia2_submission - Phase 'standard' config 'time' failed with error: No scenarios processed
2025-10-10 16:59:14,565 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - === Phase: Running agent2agent configurations ===
2025-10-10 16:59:14,565 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Running agent2agent config: mini
Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:59:45,336 - MainThread - WARNING - datasets.load - Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 16:59:45,337 - MainThread - ERROR - are.simulation.benchmark.huggingface_loader - Failed to load dataset meta-agents-research-environments/gaia2/mini/validation: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'mini-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
Traceback (most recent call last):
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/are/simulation/benchmark/huggingface_loader.py", line 163, in _create_huggingface_scenario_iterator
ds = load_dataset(
^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1392, in load_dataset
builder_instance = load_dataset_builder(
^^^^^^^^^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1166, in load_dataset_builder
builder_instance: DatasetBuilder = builder_cls(
^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 124, in __init__
config_name, version, hash = _find_hash_in_cache(
^^^^^^^^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 64, in _find_hash_in_cache
raise ValueError(
ValueError: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'mini-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
2025-10-10 16:59:45,338 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Running each scenario 3 times to improve variance
2025-10-10 16:59:45,338 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Starting.
2025-10-10 16:59:45,338 - MainThread - INFO - are.simulation.multi_scenario_runner - Running scenarios in parallel with 5 workers
Running Mini scenarios with Agent2Agent: 0it [00:00, ?it/s, Success=0.0%]
2025-10-10 16:59:45,338 - MainThread - ERROR - are.simulation.benchmark.gaia2_submission - Phase 'agent2agent' config 'mini' failed with error: No scenarios processed
2025-10-10 16:59:45,338 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - === Phase: Running noise configurations ===
2025-10-10 16:59:45,339 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Running noise config: mini
Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 17:00:16,105 - MainThread - WARNING - datasets.load - Using the latest cached version of the dataset since meta-agents-research-environments/gaia2 couldn't be found on the Hugging Face Hub
2025-10-10 17:00:16,106 - MainThread - ERROR - are.simulation.benchmark.huggingface_loader - Failed to load dataset meta-agents-research-environments/gaia2/mini/validation: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'mini-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
Traceback (most recent call last):
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/are/simulation/benchmark/huggingface_loader.py", line 163, in _create_huggingface_scenario_iterator
ds = load_dataset(
^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1392, in load_dataset
builder_instance = load_dataset_builder(
^^^^^^^^^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/load.py", line 1166, in load_dataset_builder
builder_instance: DatasetBuilder = builder_cls(
^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 124, in __init__
config_name, version, hash = _find_hash_in_cache(
^^^^^^^^^^^^^^^^^^^^
File "/data/repo/are-gaia2/.venv/lib/python3.12/site-packages/datasets/packaged_modules/cache/cache.py", line 64, in _find_hash_in_cache
raise ValueError(
ValueError: Couldn't find cache for meta-agents-research-environments/gaia2 for config 'mini-ac06a8d2d8a73f66'
Available configs in the cache: ['adaptability', 'ambiguity', 'demo', 'execution', 'mini', 'search', 'time']
2025-10-10 17:00:16,106 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Running each scenario 3 times to improve variance
2025-10-10 17:00:16,106 - MainThread - INFO - are.simulation.benchmark.scenario_executor - Starting.
2025-10-10 17:00:16,106 - MainThread - INFO - are.simulation.multi_scenario_runner - Running scenarios in parallel with 5 workers
Running Mini scenarios with Noise: 0it [00:00, ?it/s, Success=0.0%]
2025-10-10 17:00:16,107 - MainThread - ERROR - are.simulation.benchmark.gaia2_submission - Phase 'noise' config 'mini' failed with error: No scenarios processed
2025-10-10 17:00:16,107 - MainThread - WARNING - are.simulation.benchmark.gaia2_submission - The following 7 phase/config(s) failed entirely and were skipped:
2025-10-10 17:00:16,107 - MainThread - WARNING - are.simulation.benchmark.gaia2_submission - - standard/ambiguity
2025-10-10 17:00:16,107 - MainThread - WARNING - are.simulation.benchmark.gaia2_submission - - standard/adaptability
2025-10-10 17:00:16,107 - MainThread - WARNING - are.simulation.benchmark.gaia2_submission - - standard/execution
2025-10-10 17:00:16,107 - MainThread - WARNING - are.simulation.benchmark.gaia2_submission - - standard/search
2025-10-10 17:00:16,107 - MainThread - WARNING - are.simulation.benchmark.gaia2_submission - - standard/time
2025-10-10 17:00:16,107 - MainThread - WARNING - are.simulation.benchmark.gaia2_submission - - agent2agent/mini
2025-10-10 17:00:16,107 - MainThread - WARNING - are.simulation.benchmark.gaia2_submission - - noise/mini
2025-10-10 17:00:16,107 - MainThread - WARNING - are.simulation.benchmark.gaia2_submission - Check the logs above for specific error details.
2025-10-10 17:00:16,107 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - GAIA2 submission summary:
2025-10-10 17:00:16,107 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Total phase/configs attempted: 7
2025-10-10 17:00:16,107 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Successful phase/configs: 0
2025-10-10 17:00:16,107 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Failed phase/configs: 7
2025-10-10 17:00:16,107 - MainThread - INFO - are.simulation.benchmark.gaia2_submission - Total scenarios processed: 0
2025-10-10 17:00:16,107 - MainThread - ERROR - are.simulation.benchmark.gaia2_submission - All phase/configs failed. No results generated.
2025-10-10 17:00:16,108 - MainThread - WARNING - are.simulation.benchmark.cli - No results to report.
2025-10-10 17:00:16,108 - MainThread - INFO - are.simulation.benchmark.cli - All Done.
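The log above repeats the same failure for every phase and config. As a rough aid for triage, the sketch below tallies the failed phase/config pairs and the config ids that the cache lookup could not find. The regular expressions are written against the log lines shown in this report only, so they are an assumption about the log format in general, and `summarize_log` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches e.g.: Phase 'standard' config 'ambiguity' failed with error: No scenarios processed
FAILED_RE = re.compile(
    r"Phase '(?P<phase>[^']+)' config '(?P<config>[^']+)' failed with error: (?P<error>.*)$"
)
# Matches e.g.: Couldn't find cache for <dataset> for config 'ambiguity-ac06a8d2d8a73f66'
MISSING_RE = re.compile(
    r"Couldn't find cache for (?P<dataset>\S+) for config '(?P<config_id>[^']+)'"
)


def summarize_log(text):
    """Return (failed, missing) extracted from benchmark log text.

    failed  -- list of (phase, config, error) tuples, in log order
    missing -- Counter of config ids reported as missing from the cache
    """
    failed = []
    missing = Counter()
    for line in text.splitlines():
        match = FAILED_RE.search(line)
        if match:
            failed.append((match["phase"], match["config"], match["error"].strip()))
        # The traceback repeats the same message in its ValueError line,
        # so only logger ERROR lines are counted to avoid double counting.
        if " - ERROR - " in line:
            match = MISSING_RE.search(line)
            if match:
                missing[match["config_id"]] += 1
    return failed, missing
```

Run over the log above, this should list seven failed phase/config pairs, matching the summary the tool prints itself, with the same hash suffix on every missing config id.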