
Fix Arrow Flight host resolution and skip PlantCAD dry-run #5064

Open
wmoss wants to merge 1 commit into marin-community:main from wmoss:fix/arrow-flight-localdomain-hostname


@wmoss (Contributor) commented Apr 22, 2026

Tests were failing locally, so I let Claude loose to fix them. (Interestingly, since I cannot add labels to PRs, Claude can't label this one as agent-generated.)

Extend _resolve_advertise_host to treat .localdomain suffixes (e.g. Mac.localdomain) as localhost-only, the same way .local is already handled. gRPC's c-ares resolver maps these hostnames to a network IP that can't reach the server bound to 0.0.0.0, which causes "connection refused" errors in local tests.
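For readers who don't have the diff open, here is a minimal sketch of the suffix check described above. It is not the actual _resolve_advertise_host from the repo: the function name resolve_advertise_host, the LOCAL_ONLY_SUFFIXES constant, and the optional hostname parameter are all assumptions made for illustration, and the real function does more (for example a metadata lookup) before reaching this point.

```python
import socket
from typing import Optional

# Hypothetical constant: suffixes this sketch treats as "local-only".
# The PR adds ".localdomain" next to the already-handled ".local".
LOCAL_ONLY_SUFFIXES = (".local", ".localdomain")


def resolve_advertise_host(hostname: Optional[str] = None) -> str:
    """Illustrative stand-in for the host-resolution step, not the repo's code."""
    if hostname is None:
        # Fall back to the machine's own hostname, e.g. "Mac.localdomain".
        hostname = socket.gethostname()
    # str.endswith accepts a tuple, so both suffixes are checked in one call.
    if hostname.endswith(LOCAL_ONLY_SUFFIXES):
        return "localhost"
    return hostname
```

With this sketch, resolve_advertise_host("Mac.localdomain") and resolve_advertise_host("Mac.local") both return "localhost", while any other hostname is returned unchanged.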

Mark exp1729_plantcad_eval.py with nodryrun since it imports torch at module level, and torch is not available in the dev environment.
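To make the failure mode concrete: a module-level import of a package that isn't installed raises ImportError as soon as the file is loaded, which is what a dry-run would hit. The sketch below shows one way to check for a package without importing it, using only the standard library. The helper name module_available is hypothetical, and this is not what the PR does (the PR just marks the file with nodryrun).

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    # find_spec returns None when no top-level module of that name is installed.
    return importlib.util.find_spec(name) is not None


# Example: decide up front whether a torch-dependent script can be loaded.
if not module_available("torch"):
    print("torch is not installed; a module-level 'import torch' would fail here")
```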


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7a56e0ff4d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +99 to 100
if hostname.endswith(".local") or hostname.endswith(".localdomain"):
    return "localhost"

P1: Do not advertise localhost for .localdomain hosts

This change makes every non-GCP machine with a hostname ending in .localdomain advertise localhost, which is only reachable from the same host. In distributed runs where metadata lookup fails (the normal non-GCP path), update_server() publishes these addresses and remote workers then connect to themselves instead of the trainer host, causing Arrow Flight weight fetches to fail. The previous behavior at least advertised a routable hostname for this case.
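One possible way to reconcile the local-test fix with this concern is to make the localhost fallback opt-in, so that multi-host runs keep advertising the real hostname. The following is only a sketch under that assumption: choose_advertise_host and its allow_localhost flag are hypothetical names, not part of the PR or the repo.

```python
# Hypothetical constant: suffixes that usually indicate a workstation hostname.
LOCAL_ONLY_SUFFIXES = (".local", ".localdomain")


def choose_advertise_host(hostname: str, allow_localhost: bool) -> str:
    """Advertise "localhost" only when the caller says the run is single-host."""
    if allow_localhost and hostname.endswith(LOCAL_ONLY_SUFFIXES):
        # Safe for local tests: client and server share the same machine.
        return "localhost"
    # In distributed runs, keep the hostname so remote workers do not
    # end up connecting to themselves.
    return hostname
```

Here choose_advertise_host("Mac.localdomain", allow_localhost=True) returns "localhost", while the same hostname with allow_localhost=False is returned unchanged, which matches the previous behavior the review refers to.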

Useful? React with 👍 / 👎.
