[xla:pjrt:gpu] Pass network nodes to LocalTopologyProto #38009
Open
ezhulenev wants to merge 1 commit into openxla:main from
mwhittaker
approved these changes
Feb 20, 2026
copybara-service bot
pushed a commit
that referenced
this pull request
Feb 20, 2026
Imported from GitHub PR #38009

Don't forget to pass network nodes to local topology proto, so that the coordinator process can use them to do network-topology-optimized global device assignment.

JAX PR that allows keeping devices sorted by assigned global device id: jax-ml/jax#35178

Copybara import of the project:

--
e74b1a5 by Eugene Zhulenev <ezhulenev@openxla.org>:

[xla:pjrt:gpu] Pass network nodes to LocalTopologyProto

Merging this change closes #38009

FUTURE_COPYBARA_INTEGRATE_REVIEW=#38009 from ezhulenev:network-topology-1 e74b1a5
PiperOrigin-RevId: 872989879
copybara-service bot
pushed a commit
to tensorflow/tensorflow
that referenced
this pull request
Feb 20, 2026
Imported from GitHub PR openxla/xla#38009

Don't forget to pass network nodes to local topology proto, so that the coordinator process can use them to do network-topology-optimized global device assignment.

JAX PR that allows keeping devices sorted by assigned global device id: jax-ml/jax#35178

Copybara import of the project:

--
e74b1a5e48fdca9cc8d9ee4d437416a380af9b7a by Eugene Zhulenev <ezhulenev@openxla.org>:

[xla:pjrt:gpu] Pass network nodes to LocalTopologyProto

Merging this change closes #38009

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#38009 from ezhulenev:network-topology-1 e74b1a5e48fdca9cc8d9ee4d437416a380af9b7a
PiperOrigin-RevId: 872989879
hyeontaek
reviewed
Feb 23, 2026
xla/python/pjrt_ifrt/pjrt_client.cc
Outdated
    if (options.sort_devices_by_process_index) {
    -  absl::c_sort(devices, [](const std::unique_ptr<PjRtDevice>& a,
    -                           const std::unique_ptr<PjRtDevice>& b) {
    +  absl::c_stable_sort(devices, [](const std::unique_ptr<PjRtDevice>& a,
Contributor
Thanks Eugene!
Could you elaborate on the context of this change? It is unclear to me whether stable sorting helps. IFRT requires all IFRT device IDs to be unique within a single IFRT client, and PjRtDevice here is xla::ifrt::PjRtDevice, not xla::PjRtDevice. So I think neither of the sorts below would see any duplicate keys, and thus stable sorting is not necessary.
Contributor
Author
You are right, I forgot what my motivation was last week :) Reverted it.
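The point made in this thread can be illustrated with a small standalone sketch: when every sort key is unique, `std::sort` and `std::stable_sort` must produce the same order, because stability only affects elements that compare equal. `FakeDevice`, the `(process_index, id)` key, and all function names below are hypothetical stand-ins for illustration, not the actual `xla::ifrt::PjRtDevice` type or the comparator used in `pjrt_client.cc`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a device. The sketch assumes `id` is unique
// across all devices, mirroring the uniqueness requirement mentioned above.
struct FakeDevice {
  int id;
  int process_index;
};

// Assumed sort key: (process_index, id). Unique whenever `id` is unique.
bool KeyLess(const FakeDevice& a, const FakeDevice& b) {
  return std::tie(a.process_index, a.id) < std::tie(b.process_index, b.id);
}

std::vector<FakeDevice> SortByKey(std::vector<FakeDevice> devices) {
  std::sort(devices.begin(), devices.end(), KeyLess);
  return devices;
}

std::vector<FakeDevice> StableSortByKey(std::vector<FakeDevice> devices) {
  std::stable_sort(devices.begin(), devices.end(), KeyLess);
  return devices;
}

// Returns true if both vectors hold the same devices in the same order.
bool SameOrder(const std::vector<FakeDevice>& a,
               const std::vector<FakeDevice>& b) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i].id != b[i].id || a[i].process_index != b[i].process_index) {
      return false;
    }
  }
  return true;
}

// With unique keys there are no ties for stability to preserve, so the two
// algorithms agree on the result.
void CheckSortsAgree(const std::vector<FakeDevice>& devices) {
  assert(SameOrder(SortByKey(devices), StableSortByKey(devices)));
}
```

Calling `CheckSortsAgree` on any device list with unique IDs should pass; the two results could differ only if two devices shared the same key.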
e74b1a5 to
fb1aff8
Compare
hyeontaek
reviewed
Feb 23, 2026
    VLOG(3) << "Global topology for platform " << platform << ":\n"
            << global_topology->DebugString();

    // Because we might do global topology assignment based on network proximity
Contributor
Nit: When a loop is only doing VLOG, consider using the following pattern so that we can skip looping completely when debug logging is not enabled:
    if (VLOG_IS_ON(3)) {
      VLOG(3) << ...;
      for (...) {
        VLOG(3) << ...;
      }
    }
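As a rough illustration of why the guard in this nit matters, here is a self-contained sketch. It does not use the real `VLOG` / `VLOG_IS_ON` macros; `FakeLogger`, `IsOn`, and `LogDevices` are hypothetical stand-ins, and the iteration counter exists only to make the skipped loop observable.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a verbosity-controlled logger. This is NOT the
// real VLOG machinery; it only mimics the "is this level enabled?" check.
struct FakeLogger {
  int verbosity = 0;
  std::ostringstream out;
  bool IsOn(int level) const { return level <= verbosity; }
};

// Logs a header plus one line per device, but only when level 3 is enabled.
// Returns the number of loop iterations that actually ran.
int LogDevices(FakeLogger& log, const std::vector<std::string>& devices) {
  int iterations = 0;
  if (log.IsOn(3)) {  // One check up front; the loop is skipped when off.
    log.out << "Devices:\n";
    for (const std::string& device : devices) {
      log.out << "  " << device << "\n";
      ++iterations;
    }
  }
  return iterations;
}
```

With verbosity below 3, `LogDevices` returns 0 and writes nothing, so the cost of walking the device list is avoided entirely rather than paid once per suppressed log statement.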
fb1aff8 to
4bcbb73
Compare
hyeontaek
approved these changes
Feb 23, 2026
copybara-service bot
pushed a commit
to tensorflow/tensorflow
that referenced
this pull request
Feb 23, 2026
Imported from GitHub PR openxla/xla#38009

Don't forget to pass network nodes to local topology proto, so that the coordinator process can use them to do network-topology-optimized global device assignment.

JAX PR that allows keeping devices sorted by assigned global device id: jax-ml/jax#35178

Copybara import of the project:

--
4bcbb73be85e8c033bd3e580de2013a57bed704b by Eugene Zhulenev <ezhulenev@openxla.org>:

[xla:pjrt:gpu] Pass network nodes to LocalTopologyProto

Merging this change closes #38009

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#38009 from ezhulenev:network-topology-1 4bcbb73be85e8c033bd3e580de2013a57bed704b
PiperOrigin-RevId: 872989879
copybara-service bot
pushed a commit
that referenced
this pull request
Feb 23, 2026
Imported from GitHub PR #38009

Don't forget to pass network nodes to local topology proto, so that the coordinator process can use them to do network-topology-optimized global device assignment.

JAX PR that allows keeping devices sorted by assigned global device id: jax-ml/jax#35178

Copybara import of the project:

--
4bcbb73 by Eugene Zhulenev <ezhulenev@openxla.org>:

[xla:pjrt:gpu] Pass network nodes to LocalTopologyProto

Merging this change closes #38009

FUTURE_COPYBARA_INTEGRATE_REVIEW=#38009 from ezhulenev:network-topology-1 4bcbb73
PiperOrigin-RevId: 872989879
Don't forget to pass network nodes to local topology proto, so that the coordinator process can use them to do network-topology-optimized global device assignment.
JAX PR that allows keeping devices sorted by assigned global device id: jax-ml/jax#35178
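To make the idea of network-topology-aware assignment concrete, the following is a purely illustrative sketch. It is not the coordinator's actual algorithm, which this PR does not show. It assumes each process reports a single network node label and a device count, and it gives processes that share a network node contiguous global device IDs. `FakeLocalTopology`, its fields, and `AssignFirstGlobalIds` are all hypothetical names and do not reflect the real `LocalTopologyProto` schema.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified per-process topology report. The real
// LocalTopologyProto has a different schema; these fields are assumptions.
struct FakeLocalTopology {
  int process_index;
  std::string network_node;  // Assumed label, e.g. a switch or rack name.
  int num_devices;
};

// Groups processes by network node and hands out global device IDs in that
// grouped order. Returns process_index -> first global device ID assigned.
std::map<int, int> AssignFirstGlobalIds(
    std::vector<FakeLocalTopology> topologies) {
  // Keys repeat here (several processes per network node), so a stable sort
  // is used to keep the incoming process order within each node.
  std::stable_sort(
      topologies.begin(), topologies.end(),
      [](const FakeLocalTopology& a, const FakeLocalTopology& b) {
        return a.network_node < b.network_node;
      });
  std::map<int, int> first_id;
  int next_id = 0;
  for (const FakeLocalTopology& t : topologies) {
    first_id[t.process_index] = next_id;
    next_id += t.num_devices;
  }
  return first_id;
}
```

Under such a scheme global device IDs no longer follow process index order, which is presumably why the description points to a JAX change that keeps devices sorted by assigned global device ID.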