Description
From `runbenchmark.20201223T011330.log`, saved to S3 (the same error occurred on all datasets):
```
[DEBUG] [amlb.benchmark:01:13:30.196] Using constraint definition: { 'cores': 8,
  'folds': 10,
  'max_runtime_seconds': 3600,
  'min_vol_size_mb': 1000000,
  'name': '1h8c'}.
[INFO] [amlb.benchmarks.openml:01:13:30.196] Loading openml suite 269.
[ERROR] [amlb:01:13:30.498] https://www.openml.org/api/v1/xml/study/269 returned code 107: Database connection error. Usually due to high server load. Please wait for N seconds and try again. - None
Traceback (most recent call last):
  File "runbenchmark.py", line 118, in <module>
    bench = amlb.Benchmark(args.framework, args.benchmark, args.constraint)
  File "/repo/amlb/benchmark.py", line 75, in __init__
    self.benchmark_def, self.benchmark_name, self.benchmark_path = rget().benchmark_definition(benchmark_name, self.constraint_def)
  File "/repo/amlb/resources.py", line 181, in benchmark_definition
    hard_defaults, tasks, benchmark_path, benchmark_name = benchmark_load(name, self.config.benchmarks.definition_dir)
  File "/repo/amlb/benchmarks/parser.py", line 19, in benchmark_load
    benchmark_name, benchmark_path, tasks = load_oml_benchmark(name)
  File "/repo/amlb/benchmarks/openml.py", line 38, in load_oml_benchmark
    suite = openml.study.get_suite(oml_id)
  File "/repo/venv/lib/python3.7/site-packages/openml/study/functions.py", line 29, in get_suite
    suite = cast(OpenMLBenchmarkSuite, _get_study(suite_id, entity_type="task"))
  File "/repo/venv/lib/python3.7/site-packages/openml/study/functions.py", line 71, in _get_study
    xml_string = openml._api_calls._perform_api_call(call_suffix, "get")
  File "/repo/venv/lib/python3.7/site-packages/openml/_api_calls.py", line 62, in _perform_api_call
    __check_response(response, url, file_elements)
  File "/repo/venv/lib/python3.7/site-packages/openml/_api_calls.py", line 143, in __check_response
    raise __parse_server_exception(response, url, file_elements=file_elements)
openml.exceptions.OpenMLServerException: https://www.openml.org/api/v1/xml/study/269 returned code 107: Database connection error. Usually due to high server load. Please wait for N seconds and try again. - None
```
This was caused by the following command:

```
python runbenchmark.py AutoGluon openml/s/269 1h8c -m aws -f 0 -p 100
```
Note: This is on a fork of the automlbenchmark repo where I have raised the maximum `-p` value to 100.

Running the test datasets works correctly.
When I run the classification datasets, ~45 of the 66 datasets fail with similar errors, while the others succeed:

```
python runbenchmark.py AutoGluon openml/s/271 1h8c -m aws -f 0 -p 100
```
This used to work fine in previous versions of automlbenchmark (from May 28th), where I would sometimes even use the equivalent of `-p 400` with the original 39 datasets.
It seems odd that it is trying to fetch the study instead of a particular dataset. Why does an EC2 instance need to fetch the whole study when it is only supposed to be training on a single dataset? Also, perhaps it makes sense to wrap the data access in a retry loop with exponential backoff; the current behavior seems quite brittle.
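As a rough illustration of the retry idea, here is a minimal sketch of a generic exponential-backoff helper (the `retry_with_backoff` name and parameters are my own, not part of amlb or openml-python; the commented usage around `openml.study.get_suite` is hypothetical):

```python
import time


def retry_with_backoff(fn, retryable=(Exception,), max_retries=5, base_delay=1.0):
    """Call fn(), retrying on the given exception types.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between
    attempts, and re-raises the last exception once max_retries attempts
    have been exhausted.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise  # out of attempts; surface the original error
            time.sleep(base_delay * (2 ** attempt))


# Hypothetical usage around the failing call in load_oml_benchmark:
# suite = retry_with_backoff(
#     lambda: openml.study.get_suite(oml_id),
#     retryable=(openml.exceptions.OpenMLServerException,),
# )
```

Since code 107 is explicitly described by the server as transient ("wait for N seconds and try again"), something along these lines would let most of the ~45 failing runs recover instead of aborting.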