Description
Component / Area
TT-Train / Metal Ops / Device Operations
Issue Type (optional)
Other
Observed
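tt-train fails to compile with GCC (any version) because of self-referential `using` type alias declarations in 5 device operation headers. The pattern `using operation_attributes_t = operation_attributes_t;` inside a struct declares a class member with the same name as the namespace-scope type. Inside the alias's type-id, unqualified lookup still finds the namespace-scope type, but once the class is complete the same name refers to the member alias. The C++ standard does not allow a name used in a class to change meaning this way ([basic.scope.class]/2). GCC correctly rejects this: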
```
error: declaration of 'using operation_attributes_t = struct ttml::metal::ops::layernorm_fw::device::operation_attributes_t'
changes meaning of 'operation_attributes_t' [-fpermissive]
```
Clang accepts this code (the rule is "no diagnostic required", and Clang does not check it), but GCC 12, 13, and 14 all reject it.
Expected
tt-train should compile with both Clang and GCC, consistent with tt-metal's own compiler support (Clang 17+ and GCC 12+, with toolchain files for both).
1. Steps (exact commands)
```shell
# Build tt-metal with clang-20 first (succeeds)
source python_env/bin/activate
./build_metal.sh --debug --build-all --enable-ccache
deactivate

# Attempt standalone tt-train build with GCC 12
cd tt-train
rm -rf build/
cmake -DCMAKE_BUILD_TYPE=Debug \
    -DTT_TRAIN_INHERIT_COMPILER=OFF \
    -DCMAKE_C_COMPILER=gcc-12 \
    -DCMAKE_CXX_COMPILER=g++-12 \
    -B build -GNinja
cmake --build build --config Debug
# Fails with "changes meaning of" errors in 5 headers
```

2. Input data / link or description
No input data needed — build-time issue.
Broken pattern (5 files; unqualified, self-referential):
```cpp
namespace ttml::metal::ops::layernorm_fw::device {
// The namespace-scope types (e.g. operation_attributes_t) are declared in
// layernorm_fw_device_operation_types.hpp.
struct LayerNormForwardDeviceOperation {
    using operation_attributes_t = operation_attributes_t;  // BUG: self-referential
    using tensor_args_t = tensor_args_t;
    using spec_return_value_t = spec_return_value_t;
    using tensor_return_value_t = tensor_return_value_t;
};
}
```

Correct pattern (8 other files in the same codebase; fully qualified):
```cpp
namespace ttml::metal::ops::rmsnorm_fw::device {
struct RMSNormForwardDeviceOperation {
    using operation_attributes_t = ttml::metal::ops::rmsnorm_fw::device::operation_attributes_t;
    using tensor_args_t = ttml::metal::ops::rmsnorm_fw::device::tensor_args_t;
    using spec_return_value_t = ttml::metal::ops::rmsnorm_fw::device::spec_return_value_t;
    using tensor_return_value_t = ttml::metal::ops::rmsnorm_fw::device::tensor_return_value_t;
};
}
```

Affected files (5):
| File | Namespace |
|---|---|
| `tt-train/sources/ttml/metal/ops/layernorm_fw/device/layernorm_fw_device_operation.hpp` | `ttml::metal::ops::layernorm_fw::device` |
| `tt-train/sources/ttml/metal/ops/layernorm_bw/device/layernorm_bw_device_operation.hpp` | `ttml::metal::ops::layernorm_bw::device` |
| `tt-train/sources/ttml/metal/ops/swiglu_fw/device/swiglu_fw_device_operation.hpp` | `ttml::metal::ops::swiglu_fw::device` |
| `tt-train/sources/ttml/metal/optimizers/adamw/device/adamw_device_operation.hpp` | `ttml::metal::optimizers::adamw::device` |
| `tt-train/sources/ttml/metal/optimizers/sgd_fused/device/sgd_fused_device_operation.hpp` | `ttml::metal::optimizers::sgd_fused::device` |
Already correct files (8): rmsnorm_fw, rmsnorm_bw, softmax, silu_bw, cross_entropy_fw, cross_entropy_bw, sdpa_fw, profiler_no_op.
3. Frequency
Always — deterministic build failure with any GCC version.
4. Software Versions
- GCC 12.3.0 (tested in a Docker container); the error affects all GCC versions
- Clang 17+ accepts the code
- tt-metal `main` at commit 26bdadf292 (2026-02-15)
- Ubuntu 22.04
5. Hardware Details
N/A — build-time issue, not hardware-dependent.
Is this a regression?
No
Regression Details
The unqualified pattern was introduced when these 5 operations were added. It was not caught because tt-train CI only builds with Clang. The 8 other operation files were written correctly from the start.
Logs & Diagnostics
Full error output for layernorm_fw (same pattern repeats for all 5 files):
```
tt-train/sources/ttml/metal/ops/layernorm_fw/device/layernorm_fw_device_operation.hpp:16:11:
error: declaration of 'using operation_attributes_t = struct ttml::metal::ops::layernorm_fw::device::operation_attributes_t'
changes meaning of 'operation_attributes_t' [-fpermissive]
   16 |     using operation_attributes_t = operation_attributes_t;
      |           ^~~~~~~~~~~~~~~~~~~~~~
tt-train/sources/ttml/metal/ops/layernorm_fw/device/layernorm_fw_device_operation_types.hpp:15:8:
note: 'operation_attributes_t' declared here as 'struct ttml::metal::ops::layernorm_fw::device::operation_attributes_t'
   15 | struct operation_attributes_t {
      |        ^~~~~~~~~~~~~~~~~~~~~~
```
Same error repeats for tensor_args_t, spec_return_value_t, and tensor_return_value_t in each of the 5 files.
Priority
P3
Impact
Prevents building tt-train with GCC. The fix is trivial: add full namespace qualification to the type alias declarations in the 5 files, matching the pattern already used by the 8 other operations in the same codebase (a one-line change per alias, 4 aliases in each of 5 files, 20 lines total).