
Commit bb5be5b

Fix incorrect core type check in Blackhole Active Erisc status print (#36851)
### Ticket
#36833

### Problem description
- Active erisc status was printed for non-aerisc cores because the core type was not being checked properly.

### What's changed
- Use the correct function (`get_core_type`) to determine whether a core is an active erisc core.

### Checklist
- [ ] [![All post-commit tests](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml/badge.svg?branch=nhuang/fix-2)](https://github.com/tenstorrent/tt-metal/actions/workflows/all-post-commit-workflows.yaml?query=branch:nhuang/fix-2)
- [ ] [![Blackhole Post commit](https://github.com/tenstorrent/tt-metal/actions/workflows/blackhole-post-commit.yaml/badge.svg?branch=nhuang/fix-2)](https://github.com/tenstorrent/tt-metal/actions/workflows/blackhole-post-commit.yaml?query=branch:nhuang/fix-2)
- [ ] [![cpp-unit-tests](https://github.com/tenstorrent/tt-metal/actions/workflows/tt-metal-l2-nightly.yaml/badge.svg?branch=nhuang/fix-2)](https://github.com/tenstorrent/tt-metal/actions/workflows/tt-metal-l2-nightly.yaml?query=branch:nhuang/fix-2)
- [ ] New/Existing tests provide coverage for changes

#### Model tests
If your changes cover model-related code, you should run tests corresponding to affected models and platforms (Single card, T3K, Galaxy). "Choose your pipeline" workflows facilitate running multiple kinds of tests in a single run. Each offers `models-mandatory` and `models-extended` presets. The former includes a minimal set of tests, to be run always. The latter extends that with additional ones; use your best judgement in deciding which is the most appropriate for your PR.

- [ ] [![(Single) Choose your pipeline](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select.yaml/badge.svg?branch=nhuang/fix-2)](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select.yaml?query=branch:nhuang/fix-2)
  - [ ] `models-mandatory` preset (runs: [Device perf regressions](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-device-models.yaml) and [Frequent model and ttnn tests](https://github.com/tenstorrent/tt-metal/actions/workflows/fast-dispatch-full-regressions-and-models.yaml))
  - [ ] `models-extended` preset (runs: the mandatory tests, plus [Demo](https://github.com/tenstorrent/tt-metal/actions/workflows/single-card-demo-tests.yaml) and [Model perf](https://github.com/tenstorrent/tt-metal/actions/workflows/perf-models.yaml) tests)
  - [ ] other selection - specify runs
- [ ] [![(T3K) Choose your pipeline](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select-t3k.yaml/badge.svg?branch=nhuang/fix-2)](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select-t3k.yaml?query=branch:nhuang/fix-2)
  - [ ] `models-mandatory` preset (runs: [Unit tests](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-unit-tests.yaml))
  - [ ] `models-extended` preset (runs: the mandatory tests, plus [Demo](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-demo-tests.yaml) and [Model perf](https://github.com/tenstorrent/tt-metal/actions/workflows/t3000-model-perf-tests.yaml) tests)
  - [ ] other selection - specify runs
- [ ] [![(Galaxy) Choose your pipeline](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select-galaxy.yaml/badge.svg?branch=nhuang/fix-2)](https://github.com/tenstorrent/tt-metal/actions/workflows/pipeline-select-galaxy.yaml?query=branch:nhuang/fix-2)
  - [ ] `models-mandatory` preset (runs: [Quick tests](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-quick.yaml))
  - [ ] `models-extended` preset (runs: the mandatory tests, plus [Demo](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-demo-tests.yaml) and [Model perf](https://github.com/tenstorrent/tt-metal/actions/workflows/galaxy-model-perf-tests.yaml) tests)
  - [ ] other selection - specify runs
1 parent 8a330e7 commit bb5be5b


1 file changed, +30 -37 lines changed


tt_metal/llrt/llrt.cpp

Lines changed: 30 additions & 37 deletions
@@ -28,32 +28,6 @@
 #include <umd/device/types/core_coordinates.hpp>
 #include <llrt/tt_cluster.hpp>
 
-namespace {
-void print_aerisc_training_status(tt::ChipId device_id, const CoreCoord& virtual_core) {
-    const auto& hal = tt::tt_metal::MetalContext::instance().hal();
-    if (!hal.get_dispatch_feature_enabled(tt::tt_metal::DispatchFeature::ETH_MAILBOX_API)) {
-        return;
-    }
-    const auto port_status_addr = hal.get_eth_fw_mailbox_val(tt::tt_metal::FWMailboxMsg::PORT_STATUS);
-    const auto retrain_count_addr = hal.get_eth_fw_mailbox_val(tt::tt_metal::FWMailboxMsg::RETRAIN_COUNT);
-    const auto rx_link_up_addr = hal.get_eth_fw_mailbox_val(tt::tt_metal::FWMailboxMsg::RX_LINK_UP);
-    uint32_t port_status = tt::tt_metal::MetalContext::instance().get_cluster().read_core(
-        device_id, virtual_core, port_status_addr, sizeof(uint32_t))[0];
-    uint32_t retrain_count = tt::tt_metal::MetalContext::instance().get_cluster().read_core(
-        device_id, virtual_core, retrain_count_addr, sizeof(uint32_t))[0];
-    uint32_t rx_link_up = tt::tt_metal::MetalContext::instance().get_cluster().read_core(
-        device_id, virtual_core, rx_link_up_addr, sizeof(uint32_t))[0];
-    log_critical(
-        tt::LogMetal,
-        "Device {}: Virtual core {}, Port status: {:#x}, Retrain count: {:#x}, Rx link up: {:#x}",
-        device_id,
-        virtual_core.str(),
-        port_status,
-        retrain_count,
-        rx_link_up);
-}
-} // namespace
-
 // llrt = lower-level runtime
 namespace tt::llrt {
 
@@ -272,12 +246,6 @@ void write_binary_to_address(const ll_api::memory& mem, tt::ChipId chip_id, cons
 
 namespace internal_ {
 
-bool is_active_eth_core(tt::ChipId chip_id, const CoreCoord& core) {
-    auto active_eth_cores =
-        tt::tt_metal::MetalContext::instance().get_control_plane().get_active_ethernet_cores(chip_id);
-    return active_eth_cores.contains(logical_core_from_ethernet_core(chip_id, core));
-}
-
 namespace {
 
 bool check_if_riscs_on_specified_core_done(tt::ChipId chip_id, const CoreCoord& core, int run_state) {
@@ -310,6 +278,33 @@ bool check_if_riscs_on_specified_core_done(tt::ChipId chip_id, const CoreCoord&
     return get_mailbox_is_done(go_msg_addr);
 }
 
+void print_aerisc_training_status(tt::ChipId device_id, const CoreCoord& virtual_core) {
+    const auto& hal = tt::tt_metal::MetalContext::instance().hal();
+    if (!hal.get_dispatch_feature_enabled(tt::tt_metal::DispatchFeature::ETH_MAILBOX_API)) {
+        return;
+    }
+    if (get_core_type(device_id, virtual_core) != tt::tt_metal::HalProgrammableCoreType::ACTIVE_ETH) {
+        return;
+    }
+    const auto port_status_addr = hal.get_eth_fw_mailbox_val(tt::tt_metal::FWMailboxMsg::PORT_STATUS);
+    const auto retrain_count_addr = hal.get_eth_fw_mailbox_val(tt::tt_metal::FWMailboxMsg::RETRAIN_COUNT);
+    const auto rx_link_up_addr = hal.get_eth_fw_mailbox_val(tt::tt_metal::FWMailboxMsg::RX_LINK_UP);
+    uint32_t port_status = tt::tt_metal::MetalContext::instance().get_cluster().read_core(
+        device_id, virtual_core, port_status_addr, sizeof(uint32_t))[0];
+    uint32_t retrain_count = tt::tt_metal::MetalContext::instance().get_cluster().read_core(
+        device_id, virtual_core, retrain_count_addr, sizeof(uint32_t))[0];
+    uint32_t rx_link_up = tt::tt_metal::MetalContext::instance().get_cluster().read_core(
+        device_id, virtual_core, rx_link_up_addr, sizeof(uint32_t))[0];
+    log_critical(
+        tt::LogMetal,
+        "Device {}: Virtual core {}, Port status: {:#x}, Retrain count: {:#x}, Rx link up: {:#x}",
+        device_id,
+        virtual_core.str(),
+        port_status,
+        retrain_count,
+        rx_link_up);
+}
+
 } // namespace
 
 void wait_until_cores_done(
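The relocated `print_aerisc_training_status` now returns early unless the Ethernet mailbox API is enabled and `get_core_type` reports `HalProgrammableCoreType::ACTIVE_ETH`. The standalone sketch below mirrors that guard order so it compiles without tt-metal. `MockCoreType`, `MockCore`, `to_hex`, and `format_training_status` are hypothetical stand-ins, not tt-metal APIs; the three mailbox words are passed in instead of being read with `read_core`, and the message is returned rather than logged.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical stand-in for tt::tt_metal::HalProgrammableCoreType.
enum class MockCoreType { TENSIX, ACTIVE_ETH, IDLE_ETH };

// Hypothetical core: its type plus the three mailbox words the real code reads with read_core().
struct MockCore {
    MockCoreType type;
    uint32_t port_status;
    uint32_t retrain_count;
    uint32_t rx_link_up;
};

// Always prepends "0x", so zero renders as "0x0" (printf's "%#x" would print a bare "0").
std::string to_hex(uint32_t value) {
    std::ostringstream os;
    os << "0x" << std::hex << value;
    return os.str();
}

// Same guard order as the patched function: feature check first, then core type.
// Returns the status text instead of logging it, so callers and tests can inspect it.
std::optional<std::string> format_training_status(bool mailbox_api_enabled, const MockCore& core) {
    if (!mailbox_api_enabled) {
        return std::nullopt;  // mailbox API unavailable: nothing to read
    }
    if (core.type != MockCoreType::ACTIVE_ETH) {
        return std::nullopt;  // not an active Ethernet core: skip silently
    }
    return "Port status: " + to_hex(core.port_status) + ", Retrain count: " + to_hex(core.retrain_count) +
           ", Rx link up: " + to_hex(core.rx_link_up);
}
```

For example, an `ACTIVE_ETH` core with words `0x1`, `0x0`, `0xff` yields `"Port status: 0x1, Retrain count: 0x0, Rx link up: 0xff"`, while a `TENSIX` core yields `std::nullopt`.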
@@ -334,9 +329,8 @@ void wait_until_cores_done(
         auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(now - start).count();
         if (elapsed > timeout_ms) {
             for (const auto& core : not_done_phys_cores) {
-                if (internal_::is_active_eth_core(device_id, core)) {
-                    print_aerisc_training_status(device_id, core);
-                }
+                // only prints if the core is an active ethernet core
+                print_aerisc_training_status(device_id, core);
             }
             std::string cores = fmt::format("{}", fmt::join(not_done_phys_cores, ", "));
 
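Because the core-type check now lives inside the callee, the timeout branch of `wait_until_cores_done` hands every entry of `not_done_phys_cores` to `print_aerisc_training_status` and lets the callee decide. The sketch below models only that call-site pattern; `MockCore`, `StatusLogger`, and `report_timed_out_cores` are hypothetical names, and the logger records core names instead of reading mailboxes.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class MockCoreType { TENSIX, ACTIVE_ETH };

// Hypothetical core record: a display name and its type.
struct MockCore {
    std::string name;
    MockCoreType type;
};

// Hypothetical logger that owns the "is this an active Ethernet core?" check,
// the way print_aerisc_training_status does after this change.
struct StatusLogger {
    std::vector<std::string> reported;

    void print_if_active_eth(const MockCore& core) {
        if (core.type != MockCoreType::ACTIVE_ETH) {
            return;  // non-Ethernet cores produce no output
        }
        reported.push_back(core.name);
    }
};

// Models the timeout branch: every unfinished core is handed to the logger without pre-filtering.
std::vector<std::string> report_timed_out_cores(const std::vector<MockCore>& not_done) {
    StatusLogger logger;
    for (const auto& core : not_done) {
        logger.print_if_active_eth(core);
    }
    return logger.reported;
}
```

Keeping the precondition in the callee means every caller gets the same filtering, at the cost of one type lookup per core even when a caller already knows the answer.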

@@ -395,9 +389,8 @@ void send_msg_to_eth_mailbox(
         TT_THROW("Ethernet mailbox API not supported on device {}", device_id);
     }
 
-    bool is_eth_core = internal_::is_active_eth_core(device_id, virtual_core);
     TT_ASSERT(
-        is_eth_core,
+        get_core_type(device_id, virtual_core) == tt_metal::HalProgrammableCoreType::ACTIVE_ETH,
         "target core for send_msg_to_eth_mailbox {} (virtual) must be an active ethernet core",
         virtual_core.str());
 
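`send_msg_to_eth_mailbox` now asserts the same `get_core_type(...) == HalProgrammableCoreType::ACTIVE_ETH` condition that gates the status print, replacing the removed `internal_::is_active_eth_core` helper. As a rough sketch, assuming a made-up chip where row 6 holds the active Ethernet cores, the block below shows a precondition driven by the core-type lookup. All names are hypothetical, and it throws `std::invalid_argument` where the real code uses `TT_ASSERT`, purely so the rejection is observable in a standalone program.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum class MockCoreType { TENSIX, ACTIVE_ETH };

struct MockCoord {
    int x;
    int y;
};

// Hypothetical stand-in for get_core_type(device_id, virtual_core). On this made-up
// chip, row 6 holds the active Ethernet cores and every other core is Tensix.
MockCoreType mock_get_core_type(const MockCoord& core) {
    return core.y == 6 ? MockCoreType::ACTIVE_ETH : MockCoreType::TENSIX;
}

// Stand-in for the send_msg_to_eth_mailbox precondition. The real code uses TT_ASSERT;
// this sketch throws so the rejection is observable in a standalone program.
void mock_send_msg_to_eth_mailbox(const MockCoord& core) {
    if (mock_get_core_type(core) != MockCoreType::ACTIVE_ETH) {
        throw std::invalid_argument("target core (" + std::to_string(core.x) + ", " + std::to_string(core.y) +
                                    ") must be an active ethernet core");
    }
    // A real implementation would write the mailbox message here.
}
```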
