[TT-Train]: GCC build failure — 5 device operation headers use self-referential using declarations #37922

@epam-iaroslav-voitovych

Description

Component / Area

TT-Train / Metal Ops / Device Operations

Issue Type (optional)

Other

Observed

tt-train fails to compile with GCC (any version) because 5 device operation headers contain self-referential type alias declarations. Inside a struct, using operation_attributes_t = operation_attributes_t; first uses the name to refer to the namespace-scope type and then declares a member with the same name, so the name changes meaning within the class. The C++ standard forbids this ([basic.scope.class]/2), and GCC rejects it:

error: declaration of 'using operation_attributes_t = struct ttml::metal::ops::layernorm_fw::device::operation_attributes_t'
       changes meaning of 'operation_attributes_t' [-fpermissive]

Clang accepts this code (the standard does not require a diagnostic for this violation, so a compiler may stay silent), but GCC 12, 13, and 14 all reject it.
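For reference, the failure can be isolated in a minimal sketch outside tt-train. The demo::device namespace, the DemoDeviceOperation struct, and the DEMO_REPRODUCE_GCC_ERROR macro are hypothetical names chosen for illustration; only the alias pattern itself is taken from the affected headers.

```cpp
#include <type_traits>

// Hypothetical namespace and type names, for illustration only.
namespace demo::device {

struct operation_attributes_t {
    float epsilon = 1e-6F;
};

struct DemoDeviceOperation {
#ifdef DEMO_REPRODUCE_GCC_ERROR
    // Unqualified form from the affected headers. The right-hand side names
    // the namespace-scope struct, then the alias redeclares the same name as
    // a class member; GCC reports "changes meaning of".
    using operation_attributes_t = operation_attributes_t;
#else
    // Qualified form. The type is named through its namespace, so the
    // unqualified name is never used inside the class before the member
    // alias is declared.
    using operation_attributes_t = demo::device::operation_attributes_t;
#endif
};

}  // namespace demo::device

// The member alias and the namespace-scope struct are the same type.
static_assert(
    std::is_same<demo::device::DemoDeviceOperation::operation_attributes_t,
                 demo::device::operation_attributes_t>::value,
    "alias must name the namespace-scope type");
```

Compiling this as-is should succeed with both compilers. Adding -DDEMO_REPRODUCE_GCC_ERROR to a g++ invocation is expected to trigger the same diagnostic as above, while Clang is expected to accept either form.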

Expected

tt-train should compile with both Clang and GCC, consistent with tt-metal's own compiler support (Clang 17+ and GCC 12+, with toolchain files for both).

1. Steps (exact commands)

# Build tt-metal with clang-20 first (succeeds)
source python_env/bin/activate
./build_metal.sh --debug --build-all --enable-ccache
deactivate

# Attempt standalone tt-train build with GCC 12
cd tt-train
rm -rf build/
cmake -DCMAKE_BUILD_TYPE=Debug \
      -DTT_TRAIN_INHERIT_COMPILER=OFF \
      -DCMAKE_C_COMPILER=gcc-12 \
      -DCMAKE_CXX_COMPILER=g++-12 \
      -B build -GNinja
cmake --build build --config Debug
# Fails with "changes meaning of" errors in 5 headers

2. Input data / link or description

No input data needed — build-time issue.

Broken pattern (5 files — unqualified, self-referential):

namespace ttml::metal::ops::layernorm_fw::device {
struct LayerNormForwardDeviceOperation {
    using operation_attributes_t = operation_attributes_t;      // BUG: self-referential
    using tensor_args_t = tensor_args_t;
    using spec_return_value_t = spec_return_value_t;
    using tensor_return_value_t = tensor_return_value_t;
};
}

Correct pattern (8 other files in the same codebase — fully qualified):

namespace ttml::metal::ops::rmsnorm_fw::device {
struct RMSNormForwardDeviceOperation {
    using operation_attributes_t = ttml::metal::ops::rmsnorm_fw::device::operation_attributes_t;
    using tensor_args_t = ttml::metal::ops::rmsnorm_fw::device::tensor_args_t;
    using spec_return_value_t = ttml::metal::ops::rmsnorm_fw::device::spec_return_value_t;
    using tensor_return_value_t = ttml::metal::ops::rmsnorm_fw::device::tensor_return_value_t;
};
}

Affected files (5):

| File | Namespace |
| --- | --- |
| tt-train/sources/ttml/metal/ops/layernorm_fw/device/layernorm_fw_device_operation.hpp | ttml::metal::ops::layernorm_fw::device |
| tt-train/sources/ttml/metal/ops/layernorm_bw/device/layernorm_bw_device_operation.hpp | ttml::metal::ops::layernorm_bw::device |
| tt-train/sources/ttml/metal/ops/swiglu_fw/device/swiglu_fw_device_operation.hpp | ttml::metal::ops::swiglu_fw::device |
| tt-train/sources/ttml/metal/optimizers/adamw/device/adamw_device_operation.hpp | ttml::metal::optimizers::adamw::device |
| tt-train/sources/ttml/metal/optimizers/sgd_fused/device/sgd_fused_device_operation.hpp | ttml::metal::optimizers::sgd_fused::device |

Already correct files (8): rmsnorm_fw, rmsnorm_bw, softmax, silu_bw, cross_entropy_fw, cross_entropy_bw, sdpa_fw, profiler_no_op.
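A quick way to confirm the split between affected and correct files is to scan each header for lines of the form using X = X;. The helpers below are a rough sketch and not part of tt-train: is_self_referential_alias and find_self_referential_aliases are hypothetical names, and the regular expression only covers the simple one-line pattern shown above (no templates, typename, or multi-line declarations).

```cpp
#include <cstddef>
#include <istream>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: true when a line has the form `using X = X;`, i.e.
// the alias name and its unqualified target are the same identifier.
inline bool is_self_referential_alias(const std::string& line) {
    // (\w+) captures the alias name; \1 requires the identical identifier on
    // the right-hand side, followed by the terminating semicolon.
    static const std::regex pattern(R"(^\s*using\s+(\w+)\s*=\s*\1\s*;)");
    return std::regex_search(line, pattern);
}

// Hypothetical helper: returns the 1-based line numbers of every
// self-referential alias found in the given stream.
inline std::vector<std::size_t> find_self_referential_aliases(std::istream& in) {
    std::vector<std::size_t> hits;
    std::string line;
    std::size_t line_number = 0;
    while (std::getline(in, line)) {
        ++line_number;
        if (is_self_referential_alias(line)) {
            hits.push_back(line_number);
        }
    }
    return hits;
}
```

Opening each *_device_operation.hpp with std::ifstream and passing it to find_self_referential_aliases should report four lines per affected header and none for the fully qualified ones, assuming the headers follow the two patterns shown above.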

3. Frequency

Always — deterministic build failure with any GCC version.

1. Software Versions

  • GCC 12.3.0 (tested in Docker container), affects all GCC versions
  • Clang 17+ accepts the code
  • tt-metal main at commit 26bdadf292 (2026-02-15)
  • Ubuntu 22.04

2. Hardware Details

N/A — build-time issue, not hardware-dependent.

Is this a regression?

No

Regression Details

The unqualified pattern was introduced when these 5 operations were added. It was not caught because tt-train CI only builds with Clang. The 8 other operation files were written correctly from the start.

Logs & Diagnostics

Full error output for layernorm_fw (same pattern repeats for all 5 files):

tt-train/sources/ttml/metal/ops/layernorm_fw/device/layernorm_fw_device_operation.hpp:16:11:
  error: declaration of 'using operation_attributes_t = struct ttml::metal::ops::layernorm_fw::device::operation_attributes_t'
         changes meaning of 'operation_attributes_t' [-fpermissive]
   16 |     using operation_attributes_t = operation_attributes_t;
      |           ^~~~~~~~~~~~~~~~~~~~~
tt-train/sources/ttml/metal/ops/layernorm_fw/device/layernorm_fw_device_operation_types.hpp:15:8:
  note: 'operation_attributes_t' declared here as 'struct ttml::metal::ops::layernorm_fw::device::operation_attributes_t'
   15 | struct operation_attributes_t {
      |        ^~~~~~~~~~~~~~~~~~~~~

Same error repeats for tensor_args_t, spec_return_value_t, and tensor_return_value_t in each of the 5 files.

Priority

P3

Impact

Prevents building tt-train with GCC. The fix is trivial — add full namespace qualification to the using declarations in 5 files, matching the pattern already used by 8 other operations in the same codebase (one-line change per type alias, 20 lines total).
