Fix Arrow Flight host resolution and skip PlantCAD dry-run #5064
wmoss wants to merge 1 commit into marin-community:main
Extend _resolve_advertise_host to treat .localdomain suffixes (e.g. Mac.localdomain) as localhost-only, the same way .local is already handled. gRPC's c-ares resolver resolves these to a network IP that can't reach the server bound to 0.0.0.0, causing connection refused in local tests. Mark exp1729_plantcad_eval.py with nodryrun since it imports torch at module level, which is not available in the dev environment.
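A minimal sketch of the suffix check described above, for readers without the diff open. Only the `.local` / `.localdomain` comparison and the `"localhost"` return value come from this PR; the function name `resolve_advertise_host`, its `hostname` parameter, and the fallthrough of returning the hostname unchanged are assumptions made for illustration, and the GCP metadata lookup is omitted entirely.

```python
import socket
from typing import Optional

# Suffixes treated as local-only. Both values come from the PR description;
# the constant name itself is hypothetical.
LOCAL_ONLY_SUFFIXES = (".local", ".localdomain")


def resolve_advertise_host(hostname: Optional[str] = None) -> str:
    """Hypothetical stand-in for _resolve_advertise_host (non-GCP path only)."""
    if hostname is None:
        hostname = socket.gethostname()
    # str.endswith accepts a tuple, so a single call covers both suffixes.
    if hostname.endswith(LOCAL_ONLY_SUFFIXES):
        return "localhost"
    # Assumption: any other hostname is advertised unchanged.
    return hostname
```

With this sketch, `resolve_advertise_host("Mac.localdomain")` returns `"localhost"`, while a name such as `"trainer-0.example.com"` passes through untouched.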
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7a56e0ff4d
if hostname.endswith(".local") or hostname.endswith(".localdomain"):
    return "localhost"
Do not advertise localhost for .localdomain hosts
This change makes every non-GCP machine with a hostname ending in .localdomain advertise localhost, which is only reachable from the same host. In distributed runs where metadata lookup fails (the normal non-GCP path), update_server() publishes these addresses and remote workers then connect to themselves instead of the trainer host, causing Arrow Flight weight fetches to fail. The previous behavior at least advertised a routable hostname for this case.
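One way to make this concern concrete is a guard that detects a loopback-only address before it is published to remote workers. The helper below is a hypothetical sketch and not part of this PR; `is_loopback_only` is an invented name, and how (or whether) `update_server()` would call such a check is an assumption.

```python
import ipaddress


def is_loopback_only(advertised_host: str) -> bool:
    """Return True if the address is only reachable from the same machine.

    Hypothetical helper: it inspects the string itself and performs no DNS
    lookup, so a hostname that merely resolves to 127.0.0.1 is not detected.
    """
    if advertised_host == "localhost":
        return True
    try:
        # Covers IPv4 (127.0.0.0/8) and IPv6 (::1) loopback literals.
        return ipaddress.ip_address(advertised_host).is_loopback
    except ValueError:
        # Not an IP literal; treat other hostnames as potentially routable.
        return False
```

A distributed run could, for example, refuse to publish or log a warning when this returns `True`, while single-machine tests keep the `localhost` behavior.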
Tests were failing locally, so I let Claude loose to fix them (interestingly, since I cannot add labels to PRs, Claude can't label the PR as agent-generated).