Closed as not planned
Summary
The topology mapper currently generates suboptimal mappings for the 1x32 Single Galaxy topology. It has been observed that the mapper aggressively uses QSFP torus connections to route between nodes, because it assigns them too low a cost relative to local links.
Context
- Reporter: Ridvan Song
- Impact: This mapping strategy results in excessive use of QSFP links (potentially on "every other node"), which increases latency and degrades performance for this topology.
- Stakeholders: This issue is being tracked preemptively in case the Forge team (e.g., Uros Males, Vladimir Jovanovic) encounters performance bottlenecks with 1x32 topologies.
Requirements
- Update the Topology Solver with cost terms and constraints that penalize the use of QSFP links in the 1x32 Single Galaxy topology.
- The solver should prioritize local/lower-latency paths where possible.
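A minimal Python sketch of the kind of link costing the requirements describe. Everything here is an assumption for illustration, not tt-metal's Topology Solver: the single Galaxy is modeled as a hypothetical 4x8 grid of chips with local links between grid neighbors plus QSFP wraparound links on every row and column, and the helper names `build_galaxy_graph` and `cheapest_route` and the cost values are made up. Routes are found with Dijkstra's algorithm over the lexicographic key (total cost, QSFP hop count), so a QSFP link is chosen only when it is strictly cheaper, and ties go to the local path.

```python
import heapq

LOCAL_COST = 1.0  # illustrative weight for an on-board (local) link


def build_galaxy_graph(rows, cols, qsfp_cost):
    """Assumed model: rows x cols mesh of local links plus QSFP torus
    wraparound links on every row and column (not the real tt-metal graph)."""
    graph = {node: [] for node in range(rows * cols)}

    def add_link(a, b, cost, kind):
        graph[a].append((b, cost, kind))
        graph[b].append((a, cost, kind))

    for r in range(rows):
        for c in range(cols):
            node = r * cols + c  # row-major chip id
            if c + 1 < cols:
                add_link(node, node + 1, LOCAL_COST, "local")
            if r + 1 < rows:
                add_link(node, node + cols, LOCAL_COST, "local")
    for r in range(rows):  # row wraparound over QSFP
        add_link(r * cols, r * cols + cols - 1, qsfp_cost, "qsfp")
    for c in range(cols):  # column wraparound over QSFP
        add_link(c, (rows - 1) * cols + c, qsfp_cost, "qsfp")
    return graph


def cheapest_route(graph, src, dst):
    """Dijkstra on the key (total cost, QSFP hops): the cheapest route wins,
    and among equally cheap routes the one with fewer QSFP hops wins."""
    best = {src: (0.0, 0)}
    heap = [(0.0, 0, src)]
    while heap:
        cost, qsfp_hops, node = heapq.heappop(heap)
        if node == dst:
            return cost, qsfp_hops
        if (cost, qsfp_hops) > best[node]:
            continue  # stale heap entry
        for nxt, link_cost, kind in graph[node]:
            cand = (cost + link_cost, qsfp_hops + (kind == "qsfp"))
            if cand < best.get(nxt, (float("inf"), 0)):
                best[nxt] = cand
                heapq.heappush(heap, (cand[0], cand[1], nxt))
    raise ValueError(f"chip {dst} is unreachable from chip {src}")


# Usage: QSFP priced like a local link vs. penalized.
flat = build_galaxy_graph(4, 8, qsfp_cost=1.0)
penalized = build_galaxy_graph(4, 8, qsfp_cost=8.0)
print(cheapest_route(flat, 0, 7), cheapest_route(penalized, 0, 7))  # prints (1.0, 1) (7.0, 0)
```

With QSFP priced like a local link, the route from chip 0 to chip 7 takes the wraparound link; with `qsfp_cost=8.0` it takes the seven local hops instead. At an intermediate value such as `qsfp_cost=4.0` the QSFP link is still used because it saves more than it costs, which is the "necessary long-distance hop" case the desired behavior keeps.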
Examples
- Current Behavior: The mapper assigns QSFP torus connections frequently (e.g., every other node) without regard to the latency cost.
- Desired Behavior: The mapper generates a topology that minimizes QSFP usage, reserving it only for necessary long-distance hops.
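To compare current and desired behavior, it can help to score a whole 1x32 mapping rather than a single route. The sketch below assumes the same hypothetical 4x8 torus model (row-major chip ids, QSFP wraparound on every row and column); `classify_hop`, `mapping_cost`, `serpentine`, and the cost weights are illustrative names and values, not part of tt-metal. It counts QSFP hops between consecutive logical devices and sums a weighted cost. A serpentine (boustrophedon) order covers all 32 chips with local links only, and closing it into a ring needs exactly one QSFP hop. The `shortcut` mapping is a made-up example, not output from the real mapper.

```python
def classify_hop(a, b, rows, cols):
    """Classify the link between physical chips a and b (row-major ids)
    under the assumed rows x cols torus model."""
    ra, ca = divmod(a, cols)
    rb, cb = divmod(b, cols)
    if abs(ra - rb) + abs(ca - cb) == 1:
        return "local"
    if ra == rb and {ca, cb} == {0, cols - 1}:
        return "qsfp"  # row wraparound
    if ca == cb and {ra, rb} == {0, rows - 1}:
        return "qsfp"  # column wraparound
    raise ValueError(f"chips {a} and {b} are not directly linked")


def mapping_cost(mapping, rows, cols, qsfp_cost, local_cost=1.0, ring=False):
    """Score a 1xN logical chain: mapping[i] is the physical chip of logical
    device i. Returns (QSFP hop count, weighted total cost)."""
    pairs = list(zip(mapping, mapping[1:]))
    if ring:
        pairs.append((mapping[-1], mapping[0]))  # close the ring
    kinds = [classify_hop(a, b, rows, cols) for a, b in pairs]
    qsfp = kinds.count("qsfp")
    return qsfp, qsfp * qsfp_cost + (len(kinds) - qsfp) * local_cost


def serpentine(rows, cols):
    """Boustrophedon order: even rows left to right, odd rows right to left."""
    order = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend(r * cols + c for c in cs)
    return order


# Hypothetical mapping that takes two QSFP row-wraparound shortcuts.
shortcut = ([0] + list(range(7, 0, -1)) + list(range(9, 16)) + [8]
            + list(range(16, 24)) + list(range(31, 23, -1)))

chain = serpentine(4, 8)
print(mapping_cost(chain, 4, 8, qsfp_cost=4.0))             # (0, 31.0)
print(mapping_cost(chain, 4, 8, qsfp_cost=4.0, ring=True))  # (1, 35.0)
print(mapping_cost(shortcut, 4, 8, qsfp_cost=4.0))          # (2, 37.0)
```

Scoring at the mapping level makes the desired behavior checkable: under these assumptions, any mapping whose QSFP count exceeds what the logical topology strictly requires (zero for an open 1x32 chain, one for a closed ring) is a candidate for the penalty described in the requirements.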
Planning / Metadata
- Repository: tenstorrent/tt-metal
- Board: Control Plane TT-Distributed
- Assignee: @Riddy21
- Priority: P2
- Labels: scale-out, topology-mapper, performance
- Status: New Issue