@shutovilyaep

Build System Modernization & PyPI Readiness

Summary

Modernizes the build system to eliminate manual environment setup and enable PyPI distribution. The extension now works without LD_LIBRARY_PATH or other environment variables. One exception: running tests against development builds still requires setting TT_METAL_HOME, because of a bug in tt-metal's current Python path discovery for source builds. PyPI wheel installations do not need it, since the runtime assets are bundled. CI validates wheel creation and tests the installed wheels to emulate the PyPI user experience. This PR also restores compatibility with tt-metal's current main branch (Nov 2025) and addresses the project freeze requirements, so that the project keeps building without manual code changes.

Key Changes

1. RPATH Configuration and Binary Bundling

Problem: Extension couldn't find PyTorch/TT-Metal libraries at runtime.

Fix: Configure RPATH and bundle ttnn binaries via CMake during wheel build.

  • BUILD_RPATH: $ORIGIN:${PYTORCH_LIB_DIR}:${TT_METAL_LIB_DIR}
  • INSTALL_RPATH: $ORIGIN:$ORIGIN/../torch/lib
  • TT-Metal libraries (libtt_metal.so, libtt_stl.so, libdevice.so, libtracy.so, _ttnncpp.so) are bundled into the wheel during CMake installation
  • Bundled libraries are placed in torch_ttnn_cpp_extension/ directory alongside the extension module
  • RPATH set to $ORIGIN allows extension to find bundled libraries in the same directory

Result: No LD_LIBRARY_PATH needed. Bundling simplifies PyPI wheel creation by including required binaries directly in the wheel.
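The effect of the INSTALL_RPATH above can be illustrated with a small sketch that mimics how the Linux loader substitutes $ORIGIN. This is only an illustration: `expand_rpath` and the install path are hypothetical names, not part of the build system.

```python
import posixpath


def expand_rpath(rpath: str, origin: str) -> list[str]:
    """Expand $ORIGIN in a colon-separated RPATH string.

    `origin` is the directory that contains the shared object being loaded.
    Empty entries are ignored in this sketch.
    """
    entries = []
    for entry in rpath.split(":"):
        if not entry:
            continue
        expanded = entry.replace("${ORIGIN}", origin).replace("$ORIGIN", origin)
        entries.append(posixpath.normpath(expanded))
    return entries


# Hypothetical location of the extension module inside site-packages.
origin = "/venv/lib/python3.10/site-packages/torch_ttnn_cpp_extension"
search_dirs = expand_rpath("$ORIGIN:$ORIGIN/../torch/lib", origin)
for directory in search_dirs:
    print(directory)
```

The first entry is the extension's own directory, where the bundled libraries live; the second resolves to the lib directory of the installed torch package.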

2. Dependency Management

Problem: pip install -e .[dev] replaced locally built ttnn with old PyPI version, causing API mismatches.

Fix: Moved ttnn to optional [pypi] extra (Python packaging standard).

# dependencies list no longer includes ttnn

[project.optional-dependencies]
pypi = ["ttnn @ <direct-url-to-wheel>"]  # Auto-updated by CI

Usage:

  • Dev: pip install -e .[dev] (uses local ttnn)
  • PyPI: pip install torch-ttnn[pypi] (downloads ttnn)

3. CI Wheel Testing (PyPI User Simulation)

CI now validates wheel creation and tests installed wheels to ensure PyPI compatibility.

Process:

  1. Run tests against development build (using submodule tt-metal)
  2. Build wheel from source
  3. Uninstall development packages (torch-ttnn and ttnn) to simulate clean PyPI environment
  4. Install wheel with [pypi] extra (downloads ttnn from PyPI)
  5. Run full test suite against PyPI-emulated package installation

Implementation:

  • build-test-release-wheel.yaml: Builds wheel, uninstalls dev installation, installs wheel with [pypi] extra, verifies installation
  • run-cpp-native-tests.yaml: Runs tests against development build, then builds wheel, uninstalls dev packages, and runs tests against PyPI-emulated installation
  • before_merge.yaml: Added build-wheel-check job that verifies wheel builds successfully before merge
  • Test jobs: test-wheel-smoke, test-wheel-lowering, test-wheel-model run comprehensive tests against installed wheels

This ensures wheels work correctly for PyPI users before release.
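As a rough sketch, the five process steps above map to a command sequence like the one below. The exact commands are assumptions (the workflows may call helper scripts instead), and the wheel filename and test directory are hypothetical. The function only builds the plan; it does not run anything.

```python
import shlex


def wheel_test_plan(wheel_path: str, test_dir: str = "tests") -> list[list[str]]:
    """Return the commands for the dev-build test and the PyPI-emulated test."""
    pip = ["python", "-m", "pip"]
    pytest = ["python", "-m", "pytest", test_dir]
    return [
        pytest,                                            # 1. test the development build
        pip + ["wheel", ".", "--no-deps", "-w", "dist"],   # 2. build the wheel from source
        pip + ["uninstall", "-y", "torch-ttnn", "ttnn"],   # 3. remove the dev packages
        pip + ["install", f"{wheel_path}[pypi]"],          # 4. install the wheel plus ttnn
        pytest,                                            # 5. test the installed wheel
    ]


plan = wheel_test_plan("dist/torch_ttnn-X.Y.Z.whl")  # hypothetical filename
for step in plan:
    print(shlex.join(step))
```

Uninstalling both torch-ttnn and ttnn in step 3 is what makes step 5 meaningful: the second test run can only pass if the wheel and its [pypi] extra provide everything.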

4. Consistent Compiler

Problem: scikit-build-core used GCC 11 instead of Clang-17.

Fix: Forced Clang-17 in pyproject.toml, removed hardcoded ABI flags (now auto-detected).

5. Source Code Updates

Code changes required due to tt-metal API changes and PyTorch 2.7.1 bump:

  • device.cpp: Updated header for tt-metal v0.64 compatibility
  • extension_utils.hpp: Added __FILE_NAME__ fallback for GCC 11 compatibility
  • ttnn_device_mode.py: Registered module as torch.ttnn (fixes import torch.ttnn due to PyTorch 2.7.1 change)
  • conftest.py: Updated to mesh_device API (tt-metal API change)

6. CI Infrastructure Updates

  • Docker images: Migrated CI workflows to use tt-metal Docker images instead of building pytorch2.0_ttnn-specific images
    • Deprecated building torch-ttnn Docker images (TODOs left in workflows for future reference)
    • Due to project freeze, pinned tt-metal Docker images to specific digests for reproducibility
    • Ensures consistent build environment and reduces maintenance overhead
  • run-cpp-native-tests.yaml:
    • Removed LD_LIBRARY_PATH exports
    • Added pyproject.toml to trigger paths (enables CI on ttnn updates)
    • Smart ttnn handling: Uses [pypi,dev] for ttnn update PRs (installs the new ttnn wheel from the direct URL), [dev] for regular PRs (uses the locally built ttnn)
    • Direct URL dependencies force pip to replace local ttnn, ensuring new version is tested
  • update-ttnn-wheel.yaml: Updated to modify pyproject.toml [pypi] section, maintains auto-approve/merge gated on passing checks

7. Documentation and Error Messages

  • BuildFlow.md: Installation Quick Reference, dev vs PyPI workflows, CI behavior explanation
  • README.md: Updated installation instructions to show [pypi] extra is required
  • torch_ttnn/__init__.py: Fixed error message to show correct pip install torch-ttnn[pypi]
  • pyproject.toml: Added hint in description about [pypi] extra

Modernizes the C++ extension build system to follow Python packaging
standards and enables PyPI distribution. The extension now works
without manual environment variable setup.

Key improvements:

* Configure RPATH to use libraries from dependency packages
  - BUILD_RPATH includes absolute paths for build-time linking
  - INSTALL_RPATH: $ORIGIN:../torch/lib:../../ttnn/build/lib:../../ttnn
  - Extension uses TT-Metal libraries from ttnn package (no duplication)
  - Wheel size reduced to 768 KB (was 21 MB, 96% reduction)
  - Eliminates LD_LIBRARY_PATH requirement

* Move ttnn to optional [pypi] dependency group
  - Prevents pip from replacing locally built ttnn during development
  - Follows Python packaging standards (PEP 621 optional dependencies)
  - Dev builds: pip install -e .[dev] (uses local ttnn)
  - PyPI users: pip install torch-ttnn[pypi] (downloads ttnn)

* Force Clang-17 compiler for consistency
  - Set CMAKE_C_COMPILER and CMAKE_CXX_COMPILER in pyproject.toml
  - Matches tt-metal build compiler
  - Remove hardcoded ABI flags (now auto-detected from PyTorch)

* Fix tt-metal v0.64 compatibility
  - Update header path: tt-metalium/assert.hpp → tt_stl/assert.hpp
  - Add __FILE_NAME__ fallback macro for GCC 11 compatibility
  - Register extension as torch.ttnn module for PyTorch backend

* Update CI workflows for smart ttnn dependency testing
  - Remove LD_LIBRARY_PATH exports (RPATH handles this)
  - Add pyproject.toml to trigger paths in run-cpp-native-tests.yaml
  - Smart ttnn handling: detect ttnn update PRs by commit message,
    use [pypi,dev] to test new version, [dev] for regular builds
  - Direct URL dependencies force pip to replace local ttnn, ensuring
    automated updates are properly validated before auto-merge
  - Update ttnn-wheel-update workflow to modify pyproject.toml [pypi]
    section with direct URL format (was requirements.txt)

* Add comprehensive documentation and clear error messages
  - Installation quick reference for dev and PyPI users
  - Development vs PyPI distribution workflows
  - Dependency strategy and troubleshooting guide
  - CI behavior explanation: how ttnn updates are tested
  - Update README.md with [pypi] installation instructions
  - Fix torch_ttnn/__init__.py error message to show correct [pypi] extra
  - Add hint in pyproject.toml description

Tests pass without environment variables. Extension is self-contained
via RPATH and ready for PyPI distribution.

Address review comment to prefer factory functions over constructors.

- Replace manual HostBuffer + Tensor constructor pattern with
  Tensor::from_borrowed_data() factory function
- Applied to both BFLOAT16 and UINT32 data type cases
- Eliminates intermediate HostBuffer variable
- Cleaner API that directly passes Span and MemoryPin to factory

Benefits:
- Follows tt-metal best practices for tensor construction
- More idiomatic API usage
- Simplified code with fewer intermediate steps

Addresses: @aliaksei-sala comment on copy.cpp
…rovements

PyTorch 2.7.1 resolved BFloat16 discrete uniform distribution limitations,
allowing direct use of torch.randint() with all dtypes including bfloat16.

Issues fixed:
- PR#1243 Comment #4: .gitmodules Version Compatibility
- PR#1243 Comment #5: TT_METAL_REF override removal

Changes:
1. Updated .gitmodules branch from v0.58.0-rc25 to v0.62.0-dev20250916
   - Matches current ttnn wheel version (0.62.0.dev20250916)
   - Added documentation explaining automated update process

2. Fixed update-ttnn-wheel.yaml sed pattern bug
   - Changed s/.dev/-/ to s/\.dev/-dev/
   - Unescaped dot was matching any character, causing incorrect tag conversion
   - Example: 0.62.0.dev20250916 now correctly converts to v0.62.0-dev20250916

3. Removed TT_METAL_REF override from run-cpp-native-tests.yaml
   - Removed TT_METAL_REF: main env variable
   - Removed forced git checkout to specific branch
   - CI now respects .gitmodules branch configuration
   - Ensures automated ttnn update workflow changes are actually used

This allows the automated dependency update workflow to function correctly:
when a new ttnn wheel is released, the workflow will update both
pyproject.toml and .gitmodules, and CI will test with the matching versions.
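The sed fix in item 2 can be reproduced with Python's `re` module. This is a sketch rather than the workflow's actual code: the function names are hypothetical, and prepending "v" inside the function is an assumption about where the prefix gets added.

```python
import re


def old_conversion(version: str) -> str:
    """Mimics s/.dev/-/ : the dot matches any character and "dev" is dropped."""
    return "v" + re.sub(r".dev", "-", version, count=1)


def wheel_version_to_tag(version: str) -> str:
    """Mimics s/\\.dev/-dev/ : only a literal ".dev" is rewritten."""
    return "v" + re.sub(r"\.dev", "-dev", version, count=1)


version = "0.62.0.dev20250916"
print(old_conversion(version))        # tag without the "dev" marker
print(wheel_version_to_tag(version))  # tag in the vX.Y.Z-devDATE form
```

With the old pattern the result is v0.62.0-20250916, which does not match the tag format used in .gitmodules; the fixed pattern yields v0.62.0-dev20250916.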
Issue fixed:
- PR#1243 Comment #1: README.md installation instructions

Changes:
- Rewrote installation section with direct copy-paste commands
- Removed environment variables (TT_METAL_HOME)
- Added prerequisite installation step (pip install --upgrade pip scikit-build-core cmake ninja)
- Added numbered steps for clarity
- Added link to BuildFlow.md for detailed documentation
- Made instructions beginner-friendly and copy-paste ready
…ents-dev.txt

Issue fixed:
- PR#1243 Comment #2: requirements-dev.txt Removal - CI/CD Impact

Changes:
Updated 3 action files to use pyproject.toml dependency specification:

1. build_cpp_extension_artifacts/action.yaml
   - Changed: pip install -r requirements-dev.txt → pip install -e .[dev]
   - Removed: redundant pip downgrade and numpy/setuptools installation

2. common_wheel_install/action.yaml
   - Changed cache dependency path from requirements files to pyproject.toml
   - Updated: pip install dist/torch_ttnn-*.whl + requirements-dev.txt
     → pip install dist/torch_ttnn-*.whl[dev]
   - Simplified installation logic

3. common_repo_setup/action.yaml
   - Changed: pip install -r requirements-dev.txt → pip install -e .[dev]
   - Enabled pip cache with pyproject.toml as cache-dependency-path
   - Removed commented requirements-dev.txt reference

All actions now follow modern Python packaging standards (PEP 517/621).

Changed: <tt_stl/assert.hpp> → <tt-metalium/assert.hpp>

The header path changed between tt-metal versions: v0.62.0-dev20250916
uses tt-metalium/assert.hpp instead of tt_stl/assert.hpp.

Issue fixed:
- PR#1243 Comment #3: TT_METAL_HOME Deprecation

Changes:
- Added auto-detection of TT_METAL_HOME from submodule path when env var not set
- Updated CMakeLists.txt to fallback to third-party/tt-metal if TT_METAL_HOME unset
- Updated documentation to mark TT_METAL_HOME as optional (was: REQUIRED)
- Provides clear error message if neither env var nor submodule are available

Benefits:
- Users don't need to manually set TT_METAL_HOME for standard builds
- Eliminates conflicts when switching between TT projects (tt-train, etc.)
- Still supports TT_METAL_HOME override for advanced use cases
- CI workflows can keep using it explicitly for clarity, but it's not required

The CMake logic now:
1. If TT_METAL_HOME env var is set → use it
2. Else → auto-detect from third-party/tt-metal submodule
3. If neither → clear error message

This addresses aliaksei-sala's concern about TT_METAL_HOME being error-prone
when switching between projects.
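The three-step lookup described above lives in CMakeLists.txt; the Python sketch below mirrors the same decision order for illustration only. The function name is hypothetical, and treating an empty submodule directory as "not available" is an assumption about how an uninitialized submodule would be detected.

```python
import os
import tempfile
from pathlib import Path


def resolve_tt_metal_home(repo_root: Path, env=None) -> Path:
    """Resolve the tt-metal location: env var first, then the submodule."""
    env = os.environ if env is None else env
    override = env.get("TT_METAL_HOME")
    if override:
        return Path(override)  # 1. explicit override wins
    submodule = repo_root / "third-party" / "tt-metal"
    if submodule.is_dir() and any(submodule.iterdir()):
        return submodule  # 2. auto-detected submodule checkout
    raise FileNotFoundError(  # 3. neither source is available
        "tt-metal not found: set TT_METAL_HOME or initialize the "
        "third-party/tt-metal submodule"
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "third-party" / "tt-metal" / "ttnn").mkdir(parents=True)
    detected = resolve_tt_metal_home(root, env={}).relative_to(root).as_posix()
    overridden = resolve_tt_metal_home(root, env={"TT_METAL_HOME": "/opt/tt-metal"})

print(detected)
print(overridden)
```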
Issue fixed:
- PR#1243 Comment: Fresh venv without C++ extension support

Changes:
- Added SKIP_CPP_EXTENSION environment variable to CMakeLists.txt
- When set to 1, CMake skips C++ extension build entirely
- Allows pip install -e .[pypi,dev] in fresh venv without tt-metal/toolchain
- Provides clear status message when skipping

Usage:
  export SKIP_CPP_EXTENSION=1
  pip install -e .[pypi,dev]

Use cases:
- Installing just Python dependencies for development
- Testing Python code without C++ compilation
- CI jobs that don't need native integration
- Quick setup without full toolchain

This restores the previous capability of installing in pure Python mode
that was available with requirements-dev.txt.
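When the extension build is skipped, Python code that depends on the native module needs some way to detect its absence. One possible guard is sketched below; the helper is hypothetical, and the module name `torch_ttnn_cpp_extension` is assumed from the install directory named in the PR description.

```python
import importlib.util


def cpp_extension_available(module_name: str = "torch_ttnn_cpp_extension") -> bool:
    """Return True if the named top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if cpp_extension_available():
    print("C++ extension found: native tests can run")
else:
    print("C++ extension not installed: running in pure Python mode")
```

A test suite could use such a check to skip native tests instead of failing on import.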
Added documentation for pure Python installation mode in README.md.

Users can now skip C++ extension build by setting SKIP_CPP_EXTENSION=1:
  export SKIP_CPP_EXTENSION=1
  pip install -e .[pypi,dev]

This is useful for:
- Installing Python dependencies only
- Testing Python code without C++ toolchain
- Quick setup without full build

Related to PR#1243 comment about supporting installation without C++ extension support.

Added verbose output to make C++ compilation logs visible:
- cmake.verbose = true (shows CMake configuration details)
- logging.level = "INFO" (shows build progress)

Benefits:
- C++ compilation output visible on both CI and local builds
- Better debugging when build issues occur
- Clear visibility of SKIP_CPP_EXTENSION when used
- CMake configuration details always shown

This makes the build process more transparent and easier to debug.

BFloat16's 7-bit mantissa limits precise integer representation to the
range [-256, 256] for discrete uniform distributions. Using
torch.randint(-1000, 1000, dtype=bfloat16) exceeds this range and
triggers PyTorch warnings that will become hard errors in future releases:

  "Due to precision limitations c10::BFloat16 can support discrete uniform
   distribution only within this range. This warning will become an error
   in version 1.7 release"
   (Triggered at pytorch/aten/src/ATen/native/DistributionTemplates.h:111-112)

Root Cause:
- BFloat16 format: [sign: 1 bit][exponent: 8 bits][mantissa: 7 bits]
- With 7 stored mantissa bits plus the implicit leading bit, BFloat16 has 8 bits of precision, so every integer is exact only up to magnitude 256
- Values outside [-256, 256] cannot all be represented exactly, violating
  the "uniform distribution" property
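The representability limit can be checked without PyTorch by emulating the bfloat16 rounding in plain Python. This is an illustrative sketch: `round_to_bfloat16` is a hypothetical helper that truncates a float32 encoding to its top 16 bits with round-to-nearest-even, not a PyTorch function.

```python
import struct


def round_to_bfloat16(value: float) -> float:
    """Round a float to the nearest bfloat16 value (round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    # bfloat16 keeps the top 16 bits of the float32 encoding:
    # 1 sign bit, 8 exponent bits, 7 stored mantissa bits.
    bias = 0x7FFF + ((bits >> 16) & 1)  # ties go to the even mantissa
    rounded = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", rounded))[0]


def is_exact(n: int) -> bool:
    """True if the integer survives a round trip through bfloat16."""
    return round_to_bfloat16(float(n)) == float(n)


all_exact_up_to_256 = all(is_exact(n) for n in range(-256, 257))
first_inexact = next(n for n in range(1, 1000) if not is_exact(n))
print(all_exact_up_to_256, first_inexact, round_to_bfloat16(257.0))
```

Every integer in [-256, 256] round-trips exactly, while 257 is the first positive integer that does not: it rounds to 256, so several source integers collapse onto the same bfloat16 value.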

Solution:
Apply dtype-conditional tensor creation (matching pattern already used in
test_cpp_extension, lines 27-36):
- BFloat16: Use .uniform_() with safe range (-256, 256)
- Int types: Use torch.randint() with full range [-1000, 1000)

This follows tt-metal best practices:
- tt-metal/tests/ttnn/unit_tests/operations/eltwise/test_binary_ng_typecast.py
  uses torch.randint(low=-50, high=50, dtype=torch.bfloat16)
- tt-metal/tests/ttnn/unit_tests/tensor/test_tensor_ranks.py
  uses torch.randint(low=0, high=100).to(torch.bfloat16)
- tt-metal uses torch.randint(-1000, 1000) ONLY with torch.int32, never BFloat16

History:
- Bug introduced: Oct 2024 (commit 1099fde) when test was first created
- First fix: Nov 4, 2025 (commit 361fea6) - changed to [-255, 256)
- Regression: Nov 13, 2025 (commit 4d73e5e) - incorrectly reverted based on
  misunderstanding that "PyTorch 2.7.1 resolved BFloat16 limitations"
  (the limitation is intrinsic to BFloat16's data format, not a PyTorch bug)

Pre-existing issue in main branch since PyTorch 2.7.1 upgrade (Nov 9, 2025).

Fixes warnings in CI:
https://github.com/tenstorrent/pytorch2.0_ttnn/actions/runs/19375082596/job/55440386669

Addresses jmalone-tt's question about secrets.GITHUB_TOKEN compatibility
with tt-metal Docker container pulls.

Added container credentials:
- username: github.actor
- password: secrets.GITHUB_TOKEN (GitHub's default token)

This uses the standard GitHub-provided token, no custom tokens needed.
Container pulls from ghcr.io/tenstorrent/tt-metal now authenticate properly.

Fixes: #1243 (comment)
… variable

Revised the installation instructions in README.md and BuildFlow.md to clarify the process for building TT-Metal as a git submodule. Emphasized that the build system automatically detects TT-Metal and actively ignores the TT_METAL_HOME environment variable to prevent conflicts. Updated related documentation to reflect these changes and ensure users follow the correct setup steps for development and installation.

- Simplified the submodule checkout process by removing redundant commands and ensuring a clean state before fetching.
- Implemented a super-clean checkout strategy to prevent stale reference errors when tt-metal is force-pushed.
- Updated environment variable handling and installation scripts for clarity and efficiency.
- Enhanced logging and error tolerance during submodule synchronization to improve CI robustness.
- Added workaround for broken auto-detection in tt-metal source builds.
…mpatibility testing

- Implemented a temporary conditional include for the assert header to ensure compatibility with both current and future versions of tt-metal.
- This change allows for smoother transitions as the tt-metal version stabilizes.
- Added build_cpp_extension.sh and run_cpp_extension_tests.sh, both supporting Release and Debug modes.
…t when repo is created, perform a clean checkout via GitHub actions
… organization

- Grouped core runtime dependencies and added comments for better understanding.
- Moved data analysis and visualization libraries to a dedicated section in the dev dependencies.
- Ensured all dependencies are clearly categorized for easier maintenance.

Add comprehensive wheel testing to all CI workflows:
- Build wheel from submodule after tests pass
- Uninstall torch-ttnn AND ttnn to simulate clean environment
- Install wheel with [pypi] extra (gets ttnn from PyPI like users)
- Re-run tests to verify wheel works for end users

Changes:
- run-cpp-native-tests.yaml: Add wheel test after tests (every PR)
- build-test-release-wheel.yaml: Fix pipeline, use build_cpp_extension.sh
- before_merge.yaml: Add build-wheel-check as merge gate

Every commit is now verified to produce a working wheel for PyPI users.

Add wheel verification to CI and fix packaging issues for PyPI distribution.

Fixes:
1. Wheel packaging (pyproject.toml):
   - Exclude submodule from wheel
   - Exclude build artifacts and scripts

2. Bundle libtracy.so (CMakeLists.txt):
   - Workaround for ttnn PyPI wheel bug (v0.62.0-dev20250916)
   - libtt_metal.so depends on libtracy but ttnn wheel doesn't include it
   - Bundle libtracy*.so* to avoid runtime errors
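A quick way to confirm that a built wheel contains the bundled libraries is to inspect it as a zip archive. The sketch below is an assumption about how such a check could look, not part of the CI in this PR; `missing_libraries` is a hypothetical helper, and the demo runs against a fake wheel created on the fly.

```python
import fnmatch
import os
import tempfile
import zipfile


def missing_libraries(wheel_path: str, patterns: list[str]) -> list[str]:
    """Return the filename patterns that match no file inside the wheel."""
    with zipfile.ZipFile(wheel_path) as whl:
        names = [name.rsplit("/", 1)[-1] for name in whl.namelist()]
    return [p for p in patterns if not fnmatch.filter(names, p)]


with tempfile.TemporaryDirectory() as tmp:
    # Fake wheel for demonstration; real wheels come from the build step.
    fake_wheel = os.path.join(tmp, "fake.whl")
    with zipfile.ZipFile(fake_wheel, "w") as whl:
        whl.writestr("torch_ttnn_cpp_extension/libtracy.so.0", b"")
        whl.writestr("torch_ttnn_cpp_extension/libtt_metal.so", b"")
    missing = missing_libraries(
        fake_wheel, ["libtracy*.so*", "libtt_metal.so", "libdevice.so"]
    )

print(missing)
```

In the fake wheel only libdevice.so is reported as missing, since the libtracy*.so* pattern also matches versioned names.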
@shutovilyaep

Created a branch in the "tenstorrent" repository with the same content as #1243.
@kevinwuTT asked for a manual run of https://github.com/tenstorrent/pytorch2.0_ttnn/actions/workflows/run-tests.yaml

@shutovilyaep shutovilyaep force-pushed the fix/tt_metal_bump branch 3 times, most recently from 813bc69 to fb5aa74 Compare November 19, 2025 21:44
@shutovilyaep

shutovilyaep commented Nov 19, 2025

@shutovilyaep shutovilyaep force-pushed the fix/tt_metal_bump branch 7 times, most recently from cd3f9ad to cd134f4 Compare November 20, 2025 10:48
- Install git-lfs
- Check out carefully so that the .github folders of tt-metal's submodules are not exposed to GitHub, since they reference actions that are not available on all runners
@shutovilyaep

shutovilyaep commented Nov 20, 2025

"The algorithm should be correct, and then be optimized"

Observations after some time working with GitHub Actions:

  • Correctness: [as done in this PR] for complete testing, especially when there are many changes, it is a good idea to verify everything on any change to the C++ code or to the pyproject.toml configuration/dependencies
    [C++ build, Python development build from source, tests against the development build, wheel creation, tests against the installation from the built wheel]
  • Optimization: such a long pipeline will overuse TT hardware if it is enabled as a CI job that runs on every PR change
