[BH GLX] Device Init Timeout & FW Failure Regression on Main #37553

@Riddy21

Problem Statement

Tests on BH GLX systems fail consistently on the latest main branch: device initialization times out, and firmware initialization then fails.

Background Information

  • Context: Users (Yu Gao, Allan Liu) are reporting consistent failures on Blackhole Galaxy (BH GLX) systems when running tests on the latest main branch.
  • Regression Status: This is a regression. The failure rate rose from intermittent (10-20%) to consistent (100%) over the ~100 commits preceding Jan 30, 2026.
  • Environment: BH GLX (observed on both a single-chip configuration and an 8-chip submesh).
  • Impact: High. Tests on BH GLX are currently blocked; the failure rate is effectively 100%.

Example / Logs

Device initialization fails: the runtime times out after 10000 ms waiting for physical cores to finish, then reports a firmware initialization failure.

Key Logs:

Fabric | TopologyMapper: Using 2 pinning(s) for mesh 0...
2026-01-30 15:20:35.843 | error    | Metal | Timeout detected (metal_context.cpp:1921)
2026-01-30 15:20:35.843 | critical | Always | TT_THROW: Device 2: Timeout (10000 ms) waiting for physical cores to finish: (x=29,y=25). (assert.hpp:104)
2026-01-30 15:20:35.844 | critical | Always | TT_THROW: Device 2 init: failed to initialize FW! Try resetting the board. (assert.hpp:104)
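
To compare failures across runs and machines, it may help to extract the device ID, timeout, and stuck core coordinates from logs like the ones above. The following is a minimal sketch that assumes the message format shown in this issue; the function name `summarize_init_failures` and the regular expressions are illustrative and not part of tt-metal.

```python
import re

# Patterns assume the exact message wording seen in the log excerpt above.
TIMEOUT_RE = re.compile(
    r"Device (?P<device>\d+): Timeout \((?P<ms>\d+) ms\) "
    r"waiting for physical cores to finish: (?P<cores>.*?)\. \("
)
CORE_RE = re.compile(r"\(x=(\d+),y=(\d+)\)")
FW_FAIL_RE = re.compile(r"Device (\d+) init: failed to initialize FW!")

# Sample taken from the Key Logs in this issue.
SAMPLE_LOG = """\
2026-01-30 15:20:35.843 | error    | Metal | Timeout detected (metal_context.cpp:1921)
2026-01-30 15:20:35.843 | critical | Always | TT_THROW: Device 2: Timeout (10000 ms) waiting for physical cores to finish: (x=29,y=25). (assert.hpp:104)
2026-01-30 15:20:35.844 | critical | Always | TT_THROW: Device 2 init: failed to initialize FW! Try resetting the board. (assert.hpp:104)
"""


def summarize_init_failures(log_text):
    """Collect core timeouts and FW init failures from a log string."""
    timeouts = []
    fw_failures = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            cores = [(int(x), int(y)) for x, y in CORE_RE.findall(match.group("cores"))]
            timeouts.append(
                {
                    "device": int(match.group("device")),
                    "timeout_ms": int(match.group("ms")),
                    "cores": cores,
                }
            )
        match = FW_FAIL_RE.search(line)
        if match:
            fw_failures.append(int(match.group(1)))
    return {"timeouts": timeouts, "fw_failures": fw_failures}


print(summarize_init_failures(SAMPLE_LOG))
```

Running this over logs from several failing runs would show whether the same device and core (here Device 2, x=29, y=25) is stuck every time or whether the location varies.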

Expected Behaviour

The device should initialize correctly, and tests should pass without timeouts on the main branch, restoring stability to the BH GLX CI pipeline.

Testing / Steps to Reproduce

  1. Access a BH GLX system.
  2. Check out the latest main branch of tenstorrent/tt-metal.
  3. Run standard fabric/metal tests (e.g., as observed by Yu Gao on an 8-chip submesh or Allan Liu on chip 0).
  4. Observe the timeout error during device initialization.
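
Since the regression is reported to fall within roughly 100 commits, a bisect over that range should need about 7 steps. The sketch below shows one way the reproduction steps could be turned into a classifier for `git bisect run`, which treats exit code 0 as good, 125 as skip, and other codes from 1 to 127 as bad. Because the failure was already intermittent (10-20%) before the regression, a single failing run does not prove a commit is bad, so the sketch retries. The helper names and the retry count are assumptions, and the test command to run is not specified in this issue.

```python
import subprocess

# Signature strings copied from the log excerpt in this issue.
TIMEOUT_SIGNATURE = "waiting for physical cores to finish"
FW_SIGNATURE = "failed to initialize FW"

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def run_command(cmd):
    """Run one test attempt; return (returncode, combined stdout and stderr)."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return proc.returncode, proc.stdout


def has_init_failure(output):
    """True if the output contains either failure signature from this issue."""
    return TIMEOUT_SIGNATURE in output or FW_SIGNATURE in output


def bisect_exit_code(run_once, attempts=3):
    """Classify the current commit from up to `attempts` test runs.

    `run_once` is any callable returning (returncode, output).
    """
    saw_init_failure = False
    for _ in range(attempts):
        returncode, output = run_once()
        if has_init_failure(output):
            saw_init_failure = True
            continue  # may be the older intermittent failure; try again
        if returncode == 0:
            return GOOD  # one clean pass rules out the 100% failure mode
        return SKIP  # failed for an unrelated reason, e.g. a build error
    return BAD if saw_init_failure else SKIP


def false_bad_probability(baseline_failure_rate, attempts):
    """Chance a pre-regression commit fails every attempt (independent runs)."""
    return baseline_failure_rate ** attempts


print(false_bad_probability(0.2, 3))
```

A small wrapper script could call `sys.exit(bisect_exit_code(lambda: run_command([...])))` with the actual test command filled in. With three attempts and a 20% baseline failure rate, a pre-regression commit would be misclassified as bad under 1% of the time, assuming the intermittent failures are independent between runs.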

Reference: Slack Thread
