Description
Problem Statement
The topology mapper currently generates suboptimal mappings for the 1x32 Single Galaxy topology. It uses QSFP torus connections aggressively instead of prioritizing local links.
Background Information
This issue was identified while resolving folding problems with multi-host configurations. The "heinous mapping" causes excessive latency and poor performance for 1x32 configurations because it routes over longer-range QSFP links unnecessarily. It is being tracked preemptively in case the Forge team encounters performance bottlenecks.
Example
The mapper assigns QSFP torus connections frequently (e.g., "every other node") without regard for the latency cost.
Code Snippets
N/A
Expected Behavior
The Topology Solver should add link costs and constraints that penalize QSFP links for the 1x32 Single Galaxy topology, so that it prefers local, lower-latency paths wherever possible.
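As a rough illustration of the kind of costing meant here, the sketch below scores a candidate mapping by summing a per-link cost, with QSFP links weighted more heavily than local ones. It is not the Topology Solver's actual code: the names `LINK_COST`, `mapping_cost`, `physical_links`, and the cost values are all hypothetical, and a 4-node example stands in for the 1x32 case.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]

# Hypothetical relative costs; real values would come from measured link latency.
LINK_COST: Dict[str, float] = {"local": 1.0, "qsfp": 8.0}


def mapping_cost(
    logical_edges: List[Edge],
    placement: Dict[int, int],
    physical_links: Dict[Edge, str],
) -> float:
    """Sum the cost of the physical link that each logical edge lands on."""
    total = 0.0
    for a, b in logical_edges:
        pa, pb = placement[a], placement[b]
        key = (min(pa, pb), max(pa, pb))  # links are treated as undirected
        if key not in physical_links:
            raise ValueError(f"no physical link between nodes {pa} and {pb}")
        total += LINK_COST[physical_links[key]]
    return total


# Toy physical topology: a 4-node chain of local links plus one QSFP wraparound.
physical_links = {(0, 1): "local", (1, 2): "local", (2, 3): "local", (0, 3): "qsfp"}
logical_edges = [(0, 1), (1, 2), (2, 3)]  # a 1x4 logical line

local_only = {0: 0, 1: 1, 2: 2, 3: 3}  # uses only local links
uses_qsfp = {0: 0, 1: 3, 2: 2, 3: 1}   # first hop crosses the QSFP link

print(mapping_cost(logical_edges, local_only, physical_links))  # 3.0
print(mapping_cost(logical_edges, uses_qsfp, physical_links))   # 10.0
```

With a cost like this in the objective, a solver that minimizes total cost would pick the local-only placement whenever one exists, while still being free to use a QSFP link when no local path is available.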
Testing
Verify that the generated topology for a 1x32 Single Galaxy configuration uses as few QSFP links as possible, reserving them for the long-distance hops that need them.
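One way such a check might look is sketched below. It assumes the mapper's output can be reduced to a list of `(src, dst, link_type)` tuples, which is a hypothetical format, and the `max_qsfp` budget is a placeholder that would need to be chosen for the real topology.

```python
from typing import Iterable, List, Tuple

# Hypothetical record of a link used by a mapping: (src, dst, link_type).
Link = Tuple[int, int, str]


def qsfp_links(used_links: Iterable[Link]) -> List[Tuple[int, int]]:
    """Return the (src, dst) pairs of every QSFP link used by the mapping."""
    return [(src, dst) for src, dst, kind in used_links if kind == "qsfp"]


def check_qsfp_budget(used_links: Iterable[Link], max_qsfp: int) -> int:
    """Raise AssertionError if the mapping uses more QSFP links than allowed."""
    found = qsfp_links(used_links)
    if len(found) > max_qsfp:
        raise AssertionError(
            f"mapping uses {len(found)} QSFP links, budget is {max_qsfp}: {found}"
        )
    return len(found)


# Synthetic 32-node chain imitating the "every other node" pattern.
every_other = [(i, i + 1, "qsfp" if i % 2 else "local") for i in range(31)]
print(len(qsfp_links(every_other)))  # 15
```

Calling `check_qsfp_budget(every_other, max_qsfp=2)` on this synthetic mapping raises, which is the behavior a regression test for this issue would rely on.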
Metadata
- Priority: P2
- Source: Slack Thread