Optimize 1x32 Single Galaxy Topology Mapping to Reduce QSFP Usage #37549

@Riddy21

Description

Problem Statement

The topology mapper currently generates suboptimal mappings for the 1x32 Single Galaxy topology: it aggressively uses QSFP torus connections instead of prioritizing local links.

Background Information

This issue was identified while resolving folding problems with multi-host configurations. The "heinous mapping" routes traffic over longer-range QSFP links unnecessarily, which causes excessive latency and poor performance for 1x32 configurations. This is being tracked preemptively in case the Forge team encounters performance bottlenecks.

Example

The mapper assigns QSFP torus connections frequently (e.g., "every other node") without accounting for their latency cost.

Code Snippets

N/A

Expected Behavior

The Topology Solver should be updated to include specific costing and constraints that penalize the use of QSFP links for the 1x32 Single Galaxy topology. The solver must prioritize local/lower-latency paths where possible.
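
To make the costing idea concrete, here is a minimal sketch of a link-weighted mapping cost. Everything in it is an assumption for illustration, not the Topology Solver's actual API: the 4x8 chip arrangement, the `LOCAL_COST` and `QSFP_COST` weights, and the helper names `link_kind`, `mapping_cost`, `snake_path`, and `wrap_path` are all hypothetical. It models local links between grid neighbors and QSFP links as wraparound (torus) edges, then compares two ways of laying a 1x32 line onto the grid.

```python
ROWS, COLS = 4, 8   # hypothetical arrangement of the 32 chips
LOCAL_COST = 1      # assumed weight of a local link
QSFP_COST = 10      # assumed penalty weight of a QSFP torus link


def link_kind(a, b):
    """Classify the link between two (row, col) chips: 'local', 'qsfp', or None."""
    (r1, c1), (r2, c2) = a, b
    dr, dc = abs(r1 - r2), abs(c1 - c2)
    if (dr, dc) in ((0, 1), (1, 0)):
        return "local"  # direct grid neighbors
    if (dr == 0 and dc == COLS - 1) or (dc == 0 and dr == ROWS - 1):
        return "qsfp"   # wraparound edge of the torus
    return None


def mapping_cost(path):
    """Return (total_cost, qsfp_count) for a 1xN line mapped along `path`."""
    cost, qsfp = 0, 0
    for a, b in zip(path, path[1:]):
        kind = link_kind(a, b)
        if kind is None:
            raise ValueError(f"no physical link between {a} and {b}")
        if kind == "qsfp":
            qsfp += 1
            cost += QSFP_COST
        else:
            cost += LOCAL_COST
    return cost, qsfp


def snake_path():
    """Boustrophedon walk over the grid: uses local links only."""
    path = []
    for r in range(ROWS):
        cols = range(COLS) if r % 2 == 0 else range(COLS - 1, -1, -1)
        path.extend((r, c) for c in cols)
    return path


def wrap_path():
    """Walk that crosses the horizontal wraparound once per row."""
    path = []
    for r in range(ROWS):
        if r % 2 == 0:
            cols = [(4 + i) % COLS for i in range(COLS)]  # 4..7, wrap, 0..3
        else:
            cols = [(3 - i) % COLS for i in range(COLS)]  # 3..0, wrap, 7..4
        path.extend((r, c) for c in cols)
    return path
```

Under these assumed weights, `mapping_cost(snake_path())` scores lower than `mapping_cost(wrap_path())` because the snake walk never touches a wraparound edge, so a solver minimizing this cost would prefer it. The real penalty value and link model would need to come from the actual hardware description.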

Testing

Verify that the generated topology for a 1x32 Single Galaxy configuration utilizes minimal QSFP links, reserving them only for necessary long-distance hops.
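
One way such a check could look is sketched below. The edge format (tuples of `(src, dst, kind)`), the `"qsfp"` and `"local"` labels, the helper name `check_qsfp_budget`, and the budget value are all assumptions made for illustration; the mapper's real output format is not described in this issue.

```python
from collections import Counter


def check_qsfp_budget(mapped_edges, max_qsfp):
    """Return the number of QSFP links used; raise if it exceeds the budget.

    `mapped_edges` is a hypothetical list of (src, dst, kind) tuples,
    where kind is "local" or "qsfp".
    """
    counts = Counter(kind for _, _, kind in mapped_edges)
    used = counts.get("qsfp", 0)
    if used > max_qsfp:
        raise AssertionError(
            f"mapping uses {used} QSFP links, budget is {max_qsfp}"
        )
    return used


# A 1x32 line has 31 consecutive-node edges.
# Hypothetical good mapping: every hop is a local link.
good_mapping = [(i, i + 1, "local") for i in range(31)]

# Hypothetical bad mapping resembling the "every other node" pattern:
# every second hop goes over a QSFP link.
bad_mapping = [(i, i + 1, "qsfp" if i % 2 else "local") for i in range(31)]
```

With a budget of zero, `check_qsfp_budget(good_mapping, 0)` passes, while `check_qsfp_budget(bad_mapping, 0)` raises. The right budget for a real 1x32 Single Galaxy mapping depends on how many long-distance hops are genuinely unavoidable.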

Metadata

Assignees

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests