
[Bug] ray cluster detection is broken on startup #90

@tabbas97

Description

root@psap-h200-au-syd-3:/workspace/auto-tuning-vllm# auto-tune-vllm optimize --config examples/kimi-k2.yaml --python-executable python3 --max-concurrent 1
Starting auto-tune-vllm optimization
Configuration: examples/kimi-k2.yaml
Backend: ray
Python executable: python3
/usr/local/lib/python3.12/dist-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
Generated study name: kimi-k2-code_13948 from prefix: kimi-k2-code
Loaded study: kimi-k2-code_13948
Saved study config to: optuna_studies/kimi-k2-code_13948/study_config.yaml
2025-10-09 05:45:48,623 INFO worker.py:1771 -- Connecting to existing Ray cluster at address: 10.245.128.4:6379...
^C[2025-10-09 05:45:53,633 W 141339 141339] gcs_rpc_client.h:155: Failed to connect to GCS at address 10.245.128.4:6379 within 5 seconds.
[2025-10-09 05:46:23,635 W 141339 141339] gcs_client.cc:184: Failed to get cluster ID from GCS server: TimedOut: Timed out while waiting for GCS to become available.

A more graceful check of Ray cluster availability is needed. As it stands, the CLI does not respond to user input (including Ctrl-C) until the full chain of GCS connection timeouts has elapsed. A sketch of a fast pre-flight check is below.
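One possible approach, not the project's current code: probe the GCS address with a short TCP timeout before calling `ray.init()`, so an unreachable cluster fails fast with a clear message instead of hanging through Ray's retry chain. The helper names (`ray_cluster_reachable`, `connect_or_exit`) and the "host:port" address format are assumptions for illustration.

```python
# Minimal sketch (hypothetical helpers, not auto-tune-vllm's actual code):
# fail fast if the Ray GCS address is unreachable, instead of blocking the
# CLI through Ray's own connection/retry timeouts.
import socket
import sys

import ray


def ray_cluster_reachable(address: str, timeout_s: float = 2.0) -> bool:
    """Return True if a TCP connection to the GCS address succeeds quickly.

    `address` is assumed to be in the "host:port" form Ray logs on startup,
    e.g. "10.245.128.4:6379".
    """
    host, _, port = address.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout_s):
            return True
    except (OSError, ValueError):
        return False


def connect_or_exit(address: str) -> None:
    if not ray_cluster_reachable(address):
        # Exit immediately with an actionable message rather than waiting
        # for the GCS client's timeout chain.
        sys.exit(
            f"Ray cluster at {address} is not reachable; "
            "start it with `ray start --head` or fix the address."
        )
    ray.init(address=address)
```

A check like this keeps the process responsive to Ctrl-C, since the socket probe returns within a couple of seconds rather than after the 5s GCS connect plus the longer cluster-ID timeout seen in the log above.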
