code 107: Database connection error. #229

Closed
@Innixma

From the runbenchmark.20201223T011330.log file saved to S3 (the same error occurred on all datasets):

[DEBUG] [amlb.benchmark:01:13:30.196] Using constraint definition: { 'cores': 8,
  'folds': 10,
  'max_runtime_seconds': 3600,
  'min_vol_size_mb': 1000000,
  'name': '1h8c'}.
[INFO] [amlb.benchmarks.openml:01:13:30.196] Loading openml suite 269.
[ERROR] [amlb:01:13:30.498] https://www.openml.org/api/v1/xml/study/269 returned code 107: Database connection error. Usually due to high server load. Please wait for N seconds and try again. - None
Traceback (most recent call last):
  File "runbenchmark.py", line 118, in <module>
    bench = amlb.Benchmark(args.framework, args.benchmark, args.constraint)
  File "/repo/amlb/benchmark.py", line 75, in __init__
    self.benchmark_def, self.benchmark_name, self.benchmark_path = rget().benchmark_definition(benchmark_name, self.constraint_def)
  File "/repo/amlb/resources.py", line 181, in benchmark_definition
    hard_defaults, tasks, benchmark_path, benchmark_name = benchmark_load(name, self.config.benchmarks.definition_dir)
  File "/repo/amlb/benchmarks/parser.py", line 19, in benchmark_load
    benchmark_name, benchmark_path, tasks = load_oml_benchmark(name)
  File "/repo/amlb/benchmarks/openml.py", line 38, in load_oml_benchmark
    suite = openml.study.get_suite(oml_id)
  File "/repo/venv/lib/python3.7/site-packages/openml/study/functions.py", line 29, in get_suite
    suite = cast(OpenMLBenchmarkSuite, _get_study(suite_id, entity_type="task"))
  File "/repo/venv/lib/python3.7/site-packages/openml/study/functions.py", line 71, in _get_study
    xml_string = openml._api_calls._perform_api_call(call_suffix, "get")
  File "/repo/venv/lib/python3.7/site-packages/openml/_api_calls.py", line 62, in _perform_api_call
    __check_response(response, url, file_elements)
  File "/repo/venv/lib/python3.7/site-packages/openml/_api_calls.py", line 143, in __check_response
    raise __parse_server_exception(response, url, file_elements=file_elements)
openml.exceptions.OpenMLServerException: https://www.openml.org/api/v1/xml/study/269 returned code 107: Database connection error. Usually due to high server load. Please wait for N seconds and try again. - None

This was caused by the following:

python runbenchmark.py AutoGluon openml/s/269 1h8c -m aws -f 0 -p 100

Note: This is on a fork of the automlbenchmark repo where I have raised the maximum value of -p to 100.

Running test datasets works correctly.

When I run the classification datasets, ~45 of the 66 datasets fail with similar errors, while the others succeed.

python runbenchmark.py AutoGluon openml/s/271 1h8c -m aws -f 0 -p 100

This used to work fine with an earlier version of automlbenchmark (from May 28th), where I would sometimes even run the equivalent of -p 400 on the original 39 datasets.

It seems odd that it is trying to fetch the study rather than a particular dataset. Why does an EC2 instance need to fetch the study? It is only supposed to train on a single dataset. Also, perhaps it would make sense to retry with exponential backoff when accessing the data? The current behavior seems quite brittle.
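
To make the retry suggestion concrete, here is a minimal sketch of a retry loop whose wait time doubles after each failure. It is not code from automlbenchmark or openml-python: `retry_with_backoff`, `TransientServerError`, and all parameter names and defaults are hypothetical, chosen only for illustration.

```python
import random
import time


class TransientServerError(Exception):
    """Hypothetical stand-in for a temporary server-side failure
    (for example, the code 107 error shown in the log above)."""


def retry_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=60.0,
                       retry_on=(TransientServerError,), sleep=time.sleep):
    """Run call() and retry on the given exceptions, doubling the wait each time.

    All names and default values here are illustrative assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: let the last error propagate to the caller.
                raise
            # 1s, 2s, 4s, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter so many parallel jobs do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

In the benchmark this might wrap the `openml.study.get_suite(oml_id)` call from the traceback, passing `retry_on=(openml.exceptions.OpenMLServerException,)`. Whether every such exception is safe to retry, or only code 107, is an assumption that would need checking. The jitter matters with `-p 100`, since otherwise all instances would hit the server again at the same moment.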
