Releases: tenstorrent/tt-kmd

ttkmd-2.7.0

09 Feb 20:23

Tenstorrent PCIe Driver Version 2.7.0 Release Notes

This release introduces automatic userspace mapping invalidation during device resets, blocking lock acquisition for multi-process coordination, and a workaround for a kernel quirk affecting Blackhole Galaxy PCIe link speeds.


New Features

Userspace Mapping Invalidation on Reset

The driver now invalidates all userspace BAR and TLB mappings before device reset. Subsequent access attempts fault rather than reaching device memory in an undefined state.

Important

Applications must close file descriptors and re-open devices after reset. Applications that previously accessed mappings after reset without error were relying on undefined behavior that happened to succeed in some configurations.

Blocking Lock Acquisition

A new lock acquisition mode TENSTORRENT_LOCK_CTL_ACQUIRE_BLOCKING has been added to the TENSTORRENT_IOCTL_LOCK_CTL interface.

  • Unlike ACQUIRE which returns immediately with success or failure, ACQUIRE_BLOCKING sleeps until the lock is available.
  • Compatible with the C++ Lockable interface (e.g., std::unique_lock).
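
The scoped-ownership pattern that std::unique_lock gives C++ callers can be sketched in Python as a context manager. This is only an illustration of the pattern: acquire_blocking and release are hypothetical callables that an application would write itself on top of TENSTORRENT_IOCTL_LOCK_CTL, and the ioctl argument layout is not shown because these notes do not describe it.

```python
from contextlib import contextmanager


@contextmanager
def held_lock(acquire_blocking, release, index):
    """Hold lock `index` for the duration of a `with` block.

    `acquire_blocking` and `release` are hypothetical callables supplied by
    the application (for example, thin wrappers around
    TENSTORRENT_IOCTL_LOCK_CTL); they are not part of any real library.
    """
    acquire_blocking(index)  # expected to sleep until the lock is available
    try:
        yield index
    finally:
        release(index)  # always runs, even if the body raises
```

Because the release sits in a finally clause, the lock is dropped on every exit path, which is the same guarantee a scoped lock object provides.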

ERISC Lock Index Constants

Conventional lock indices for ERISC cores have been defined:

  • TENSTORRENT_LOCK_INDEX_ETH00 through TENSTORRENT_LOCK_INDEX_ETH15

These provide a standardized mapping for applications coordinating access to Ethernet RISC resources.

Heartbeat and Thermal Trip Telemetry

Two new sysfs telemetry attributes have been added for firmware health monitoring:

  • tt_heartbeat: Firmware heartbeat counter that changes periodically when CMFW is alive. Available on both Blackhole and Wormhole devices.
  • tt_therm_trip_count: Count of ASIC shutdowns due to critical temperature since the last power cycle. Available on Blackhole only.

Bug Fixes

  • Blackhole Galaxy PCIe Link Speed: Linux kernels 6.5 through 6.12 contain a quirk that can force the PCIe link to Gen1 (2.5GT/s) during hot-plug enumeration on Blackhole Galaxy systems. The driver now detects this condition and retrains the link to full speed. The kernel behavior was changed in 6.13.
  • Secondary Bus Reset Failure: Fixed a race condition where Secondary Bus Reset (SBR) would fail due to the driver continuing to use a stale device handle.
  • Device Removal: Fixed page faults when ioctls or hwmon/sysfs reads race with device removal.
  • Close Timeout After RESET_PCIE_LINK: Fixed a timeout when closing a power-aware file descriptor (opened with O_APPEND) after RESET_PCIE_LINK.
  • Suspend/Resume: Fixed inverted return value handling in tenstorrent_resume().
  • PID Display Across Namespaces: The mappings (debugfs) and pids (procfs) files now display correct PIDs when read from a different PID namespace.
  • Unassigned PCI BAR Handling: The driver now fails gracefully when PCI BARs are unassigned.
  • Blackhole ARC Messaging: send_arc_message() now exits early when the NOC is hung.
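
The driver handles the link-speed retraining itself. For a manual check, Linux exposes current_link_speed and max_link_speed for each PCI device in sysfs. The sketch below assumes the caller supplies the device's sysfs directory (typically under /sys/bus/pci/devices/, with a system-specific bus address); the function names are illustrative.

```python
import os


def link_speed_gts(text):
    """Parse a sysfs link-speed string such as '2.5 GT/s PCIe' into a float.

    Returns None when the leading token is not a number (e.g. 'Unknown').
    """
    parts = text.split()
    try:
        return float(parts[0])
    except (IndexError, ValueError):
        return None


def link_is_degraded(pci_device_dir):
    """Return True if the current link speed is below the reported maximum."""
    def read(name):
        with open(os.path.join(pci_device_dir, name), encoding="ascii") as f:
            return link_speed_gts(f.read())

    current, maximum = read("current_link_speed"), read("max_link_speed")
    if current is None or maximum is None:
        raise ValueError("link speed not reported")
    return current < maximum
```

A result of True is a hint rather than proof of this particular quirk: a link can also train below the device's maximum when the upstream port supports a lower speed.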

Compatibility Notes

Reset and Mapping Invalidation

Applications must close and re-open device files after reset. Accessing mappings from before reset will result in SIGBUS.

ttkmd-2.7.0-rc1

06 Feb 20:24

Pre-release

ttkmd-2.6.0

06 Jan 15:27

Tenstorrent PCIe Driver Version 2.6.0 Release Notes

This release introduces driver-side power management for Blackhole devices, which lowers the chip's idle power. It also adds improved device identification via stable symlinks and stricter file descriptor invalidation after resets.

Important

Firmware Recommendation
Power management features require CMFW version 19.4.2 or later. Blackhole Galaxy systems running firmware versions 18.12.0 through 19.4.1 will malfunction with power management enabled; either update firmware or set power_policy=0. Non-Galaxy Blackhole systems and devices running firmware without power management (pre-18.12.0) will continue to work, but updating to 19.4.2 is strongly recommended for all users.


New Features

Power Management Policy

A new kernel module parameter power_policy (enabled by default) has been introduced to manage device power states automatically.

  • Automatic Low Power: Devices are initialized to a low-power state at probe time.
  • Dynamic Adjustment: The driver tracks power requirements from all active applications. When an application closes a file descriptor, its power requirements are removed, and the device power state is updated to reflect the needs of remaining applications.
  • Idle Power Down: When the last client closes its file descriptor, the device automatically returns to a low-power state.

Setting power_policy=0 disables this feature and reverts to the legacy behavior, in which power states persist until explicitly changed or a reset occurs.

Power Management API

A new IOCTL TENSTORRENT_IOCTL_SET_POWER_STATE provides fine-grained control over device power domains. This API allows user-space applications to explicitly request enablement of specific features, such as:

  • Max AI Clock (TT_POWER_FLAG_MAX_AI_CLK)
  • MRISC PHY Wakeup (TT_POWER_FLAG_MRISC_PHY_WAKEUP)
  • Tensix Enable (TT_POWER_FLAG_TENSIX_ENABLE)
  • L2CPU Enable (TT_POWER_FLAG_L2CPU_ENABLE)

The driver aggregates requests from all active clients, ensuring that a feature remains enabled as long as at least one client requests it.
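
The aggregation rule can be modeled as a bitwise OR over the requests of all open clients. The toy model below is not driver code: the bit positions assigned to each flag are illustrative assumptions (the real values live in the driver's header), and the class and method names are hypothetical.

```python
from enum import IntFlag


class PowerFlag(IntFlag):
    # Bit positions are illustrative assumptions, not the values in the
    # driver's header.
    MAX_AI_CLK = 1 << 0
    MRISC_PHY_WAKEUP = 1 << 1
    TENSIX_ENABLE = 1 << 2
    L2CPU_ENABLE = 1 << 3


class PowerAggregator:
    """Toy model of per-client power requests combined with bitwise OR."""

    def __init__(self):
        self._requests = {}  # client id -> requested PowerFlag set

    def set_power_state(self, client, flags):
        self._requests[client] = flags
        return self.effective()

    def close(self, client):
        # Closing a file descriptor drops that client's requests.
        self._requests.pop(client, None)
        return self.effective()

    def effective(self):
        combined = PowerFlag(0)
        for flags in self._requests.values():
            combined |= flags
        return combined
```

In this model a feature stays in the effective set until the last client requesting it closes, and the set becomes empty once every client is gone, mirroring the idle power-down described above.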

Action for Application Developers

No action is required for existing applications; they will continue to function correctly in a high-performance compatibility mode. They will also benefit from idle power savings when the device is not in use (all file descriptors closed).

To further optimize power consumption during active use, applications can:

  1. Open the device with the O_APPEND flag. This signals to the driver that the application is power-aware and opts out of the default high-power compatibility mode.
  2. Use TENSTORRENT_IOCTL_SET_POWER_STATE to explicitly request only the power domains required for their workload.
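
Step 1 amounts to adding one flag to the open call. A minimal sketch, assuming the caller passes the device node path; the helper name is hypothetical, and step 2 is omitted because the ioctl argument layout is not described in these notes.

```python
import os


def open_power_aware(path):
    """Open a device node read/write with O_APPEND set.

    Per the notes above, O_APPEND marks the client as power-aware. The caller
    is responsible for calling os.close() on the returned descriptor.
    """
    return os.open(path, os.O_RDWR | os.O_APPEND)
```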

Device Invalidation After Reset

The driver now enforces stronger device invalidation semantics after a reset operation.

  • File Descriptor Invalidation: Any file descriptor opened before a reset will now permanently fail with -ENODEV for subsequent operations, even if the reset was successful and performed in-place (without hotplug). An exception is the TENSTORRENT_RESET_DEVICE_POST_RESET operation, which is allowed to complete the reset sequence.
  • Unified Reset Behavior: This unifies behavior with hotplug resets, simplifying application error handling logic. Applications must close and re-open the device after any reset.

Stable Device Identifiers

The driver now generates stable symlinks for devices based on their ASIC ID, accessible in /dev/tenstorrent/by-id/. This allows applications and system configurations to reliably reference specific devices across reboots and PCIe enumeration changes, independent of the dynamic minor number assignment (e.g., /dev/tenstorrent/0).

Tip

If by-id symlinks are not created, ensure the device firmware is up to date.

Example:

/dev/tenstorrent/by-id/wormhole-<asic_id> -> ../0
/dev/tenstorrent/by-id/blackhole-<asic_id> -> ../1
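
An application that wants to map stable names to the current device nodes could resolve the symlinks as sketched below. The function name is hypothetical; the default directory is the one named above.

```python
import os


def resolve_by_id(by_id_dir="/dev/tenstorrent/by-id"):
    """Map each by-id symlink name to the device node it points at."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        if os.path.islink(link):
            mapping[name] = os.path.realpath(link)
    return mapping
```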

Improvements

Blackhole Support

  • Small BAR4 Configurations: Added support for Blackhole devices configured with smaller BAR4 apertures, ensuring compatibility on resource-constrained systems.

Internal Modernization

  • ABI Compatibility: Added explicit padding to IOCTL structures to ensure strict ABI consistency across different compiler versions and architectures.

Tools

  • Power Utility: A simple utility (tools/power.c) is included to manually set device power states, useful for testing and as a reference for the new API.

Compatibility Notes

Device Enumeration

The introduction of the by-id directory within /dev/tenstorrent/ may impact applications that naively iterate through this directory expecting only character device nodes (e.g., 0, 1, 2). Applications scanning this directory should be updated to ignore subdirectories or non-device entries.
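
One way to make a directory scan robust is to keep only entries that are character devices. A minimal sketch using the standard library; the function name is illustrative.

```python
import os
import stat


def list_device_nodes(dev_dir="/dev/tenstorrent"):
    """Return names of character-device entries in dev_dir, skipping the rest."""
    nodes = []
    for entry in os.scandir(dev_dir):
        # follow_symlinks=False: a symlink is reported as a symlink, not as
        # the character device it may point to.
        mode = entry.stat(follow_symlinks=False).st_mode
        if stat.S_ISCHR(mode):
            nodes.append(entry.name)
    return sorted(nodes)
```

Subdirectories such as by-id, symlinks, and regular files are all filtered out by the S_ISCHR test. Note that the result is sorted as strings, so "10" sorts before "2".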

Reset Behavior

Applications performing device resets must handle the ENODEV error on existing file descriptors and re-open the device.
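
The close-and-re-open pattern could be wrapped as in the sketch below. It is a simplified illustration: run_with_reopen and operation are hypothetical names, the retry happens only once, and a real application would also need to drop any memory mappings and wait for the reset to finish before re-opening.

```python
import errno
import os


def run_with_reopen(path, operation, flags=os.O_RDWR):
    """Run operation(fd); on ENODEV, close, re-open once, and retry."""
    fd = os.open(path, flags)
    try:
        try:
            return operation(fd)
        except OSError as exc:
            if exc.errno != errno.ENODEV:
                raise
        # The old descriptor is permanently invalid: replace it.
        os.close(fd)
        fd = -1
        fd = os.open(path, flags)
        return operation(fd)
    finally:
        if fd >= 0:
            os.close(fd)
```

Errors other than ENODEV propagate unchanged, so the wrapper does not mask unrelated failures.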

ttkmd-2.6.0-rc1

04 Dec 00:50

Pre-release

ttkmd-2.5.0

03 Nov 20:52

Tenstorrent PCIe Driver Version 2.5.0 Release Notes

This release adds procfs/debugfs support for resource tracking and process visibility, RPM packaging for Red Hat-based distributions, and bug fixes.


New Features

Procfs PID Listing

Added /proc/driver/tenstorrent/<device>/pids file that lists process IDs (PIDs) of all processes with open file descriptors for each device. Useful for debugging and system administration to identify which processes are interacting with specific hardware.
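
A script could consume this file as sketched below. The exact file format is not specified in these notes, so the sketch assumes PIDs appear as whitespace-separated decimal numbers and ignores any other tokens; the function names are hypothetical.

```python
def parse_pids(text):
    """Parse whitespace-separated PIDs, ignoring tokens that are not integers."""
    return [int(tok) for tok in text.split() if tok.isdigit()]


def read_device_pids(pids_path):
    """Read and parse a pids file at the given path."""
    with open(pids_path, encoding="ascii") as f:
        return parse_pids(f.read())
```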

Debugfs Resource Tracking

Added /sys/kernel/debug/tenstorrent/<device>/mappings debugfs file that displays:

  • Open file descriptors per device with process information (PID, command name)
  • Pinned user pages with IOVA/physical addresses
  • Driver-allocated DMA buffers and their mappings
  • User-space BAR mappings with offset, size, and caching type (UC/WC)
  • Allocated inbound TLB windows with ID, location, size, and mmap reference counts

RPM Packaging

Added RPM packages for Fedora 41/42 and AlmaLinux 10 (EL10/RHEL10 compatible).

Blackhole

The driver now zeroes the strided TLB registers during configuration. This prevents stale values in TLB windows 0-31, the windows that support non-rectangular multicast patterns.


Bug Fixes

  • Fixed the DMA buffer allocation return code for TENSTORRENT_ALLOCATE_DMA_BUF_NOC_DMA. Previously, the call returned a positive iATU region index instead of 0 on success.
  • Fixed a NULL pointer dereference during telemetry initialization.

Improvements and Other Changes

Installation and Packaging

  • Updated README to prioritize package installation over source builds
  • make dkms-remove now handles all installed versions across kernels
  • Added --force flag support for DKMS version downgrades

ttkmd-2.5.0-rc1

30 Oct 22:11

Pre-release

ttkmd-2.4.1

19 Sep 18:07

Tenstorrent PCIe Driver Version 2.4.1 Release Notes

This is a patch release to improve the speed and reliability of resetting multi-chip systems. To reduce delays that could impact inter-chip communication during initialization, the Secondary Bus Reset (SBR) has been removed from the kernel driver's ASIC reset code paths. Tooling such as tt-smi should perform SBR when needed using the TENSTORRENT_RESET_DEVICE_RESET_PCIE_LINK flag.

ttkmd-2.4.0

29 Aug 00:01

Tenstorrent PCIe Driver Version 2.4.0 Release Notes

Warning

Device memory mappings created by a user-space application are not invalidated by the reset process. Accessing a stale mapping after a device has been reset results in undefined behavior and may lead to application or system instability.

Important

As part of the hot-plug safety improvements, any ioctl call on a file descriptor for a device that has been removed will now return -ENODEV. Applications may need to update their error-handling logic to account for this behavior.

This release introduces a unified device reset framework, adds new reliability features, and improves driver stability and usability.


New Features

Unified Reset Framework

This release introduces a unified, two-phase ioctl interface for resetting both Wormhole and Blackhole devices.

  • Reset Verification: A new hardware "reset marker" mechanism allows the driver and userspace to reliably verify that a reset operation has completed.
  • External Reset Coordination: The driver can now coordinate with resets initiated by an external Board Management Controller (BMC), allowing for more robust system-level error recovery.
  • Reference Utility: A reference utility, tools/reset, is provided to demonstrate how to use the new ioctl interface to perform resets from the command line. Users should continue to use the officially supported tt-smi utility for all device reset operations.

Improvements and Other Changes

Reset Reliability

The reset logic for both Wormhole and Blackhole devices includes new mechanisms to recover from hung states.

  • Wormhole: If a chip is unresponsive, the driver will now poll for the hardware's auto-reset watchdog to complete. This brings the device into a state where a subsequent driver-initiated reset can succeed.
  • Blackhole: The reset path issues a Secondary Bus Reset (SBR) to improve the success rate of recovering a deeply hung device.

Driver Stability and Telemetry

Core driver functions have been updated to prevent race conditions and improve behavior during device startup and shutdown.

  • Deferred Telemetry Initialization: The driver now waits for device firmware to be ready before initializing sysfs and hwmon telemetry. This fixes a race condition where driver probing could fail on systems with slower firmware initialization.
  • Hot-Plug Safety: The driver now prevents hardware access from any process that holds an open file handle to a device that has been removed. Any ioctl call made on a file descriptor that was opened before the device was removed will now immediately return -ENODEV.

Memory Mapping and Usability

  • Fixed a bug that incorrectly rejected memory mapping requests larger than 4GiB for Blackhole devices. Mappings up to the hardware limit of 1TiB are now supported.
  • The TENSTORRENT_PIN_PAGES_NOC_TOP_DOWN flag for ioctl_pin_pages now works independently to create a device-visible mapping, improving usability.
  • The PCI bus address in kernel log messages is now printed in hexadecimal format to match the output of system tools like lspci.

ttkmd-2.3.0

28 Jul 17:28

Tenstorrent PCIe Driver Version 2.3.0 Release Notes

Note

The new tt_asic_id attribute will only appear if the device firmware provides the necessary telemetry data.

This is a minor feature release that introduces a sysfs attribute for reading the unique ASIC ID and improves the reliability of the telemetry interface.

New Features

ASIC ID Sysfs Attribute

  • A new sysfs attribute, tt_asic_id, has been added for Wormhole devices.
  • This read-only file exposes the unique 64-bit ASIC identifier, which is useful for device tracking, inventory, and diagnostics.

Improvements and Other Changes

Conditional Sysfs Telemetry Attributes

  • Telemetry attribute files (e.g. tt_aiclk, tt_serial, tt_asic_id) are now created only if the device firmware actively reports data for them, which makes the sysfs interface more reliable.
  • Previously, files were created for all potential attributes, and reading from an unsupported one would result in an error. Now, the presence of a sysfs file guarantees that its corresponding feature is supported and its data is readable.
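
With this guarantee, feature detection reduces to checking which files exist. A minimal sketch, assuming the caller supplies the directory that holds the device's telemetry attributes (its location is not given in these notes); the function name is hypothetical and the default candidate list uses only the attribute names mentioned above.

```python
import os


def supported_attributes(device_dir, candidates=("tt_aiclk", "tt_serial", "tt_asic_id")):
    """Return the candidate attribute names that exist as files in device_dir."""
    return [name for name in candidates
            if os.path.isfile(os.path.join(device_dir, name))]
```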

ttkmd-2.2.0

25 Jul 17:41

Tenstorrent PCIe Driver Version 2.2.0 Release Notes

Important

Wormhole devices now require device firmware v18.4.0 or newer for sysfs telemetry to be functional.

Caution

Grayskull has been deprecated. The driver no longer supports Grayskull devices.

This release introduces a new feature for application cleanup, deprecates the legacy Grayskull hardware, and refactors the telemetry system. It also adds support for Debian packaging and Alpine Linux, alongside several bug fixes.

Breaking Changes

Wormhole Firmware Requirement for Sysfs Telemetry

  • Wormhole devices now require device firmware v18.4.0 or newer for sysfs telemetry attributes (e.g. tt_aiclk, tt_serial) to function. This is due to a refactoring that relies on a dynamic telemetry table not present in older firmware versions. hwmon sensor data is not affected by this change.

Grayskull Hardware Deprecation

  • Support for the Grayskull hardware generation has been removed from the driver. The driver will no longer probe or attach to Grayskull PCI devices, and all Grayskull-specific module parameters have been removed.

New Features

Automatic Cleanup on File Close

  • A new ioctl, TENSTORRENT_IOCTL_SET_NOC_CLEANUP, has been added.
  • This feature allows an application to register a write operation that the driver will automatically execute when the application's file descriptor is closed. This provides a mechanism for device-side cleanup even if the host application terminates unexpectedly.

Bug Fixes

  • Blackhole: Fixed a bug where disabling an outbound ATU region would incorrectly configure a 4KB mapping instead of disabling it.
  • Blackhole: Resolved a race condition where messages sent to the ARC firmware immediately after a reset could be lost. The driver now waits for a readiness signal from the firmware before sending messages.
  • Build: Fixed a build failure that occurred when compiling the driver on older Linux kernels (pre-v5.4).

Improvements and Other Changes

Platform & Build System

  • Alpine Linux Support: Added support for Alpine Linux via the akms system, including a new make akms target.
  • Debian Packaging: The driver now includes the necessary files to build .deb packages for Debian-based distributions like Ubuntu.
  • Makefile: Added new make dkms-remove and make akms-remove targets to simplify uninstallation.

Driver & API

  • Telemetry System: The sysfs telemetry system for both Wormhole and Blackhole has been refactored. For Wormhole, the driver now dynamically discovers telemetry data locations from a table provided by firmware, replacing the previous system of hardcoded offsets. This improves forward compatibility with firmware updates.
  • TLB Allocation: The TENSTORRENT_IOCTL_ALLOCATE_TLB ioctl now requires the requested size to exactly match a valid hardware window size. Requests for non-standard sizes will be rejected.
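
Because arbitrary sizes are rejected, an application has to round its request up to a valid window size before calling the ioctl. The sketch below shows that selection step only. The list of valid sizes must come from the driver or hardware documentation; the values in EXAMPLE_SIZES are illustrative assumptions, and the function name is hypothetical.

```python
def pick_window_size(length, window_sizes):
    """Return the smallest valid window size that can hold `length` bytes.

    Raises ValueError if `length` is not positive or no window is large enough.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    fitting = [size for size in sorted(window_sizes) if size >= length]
    if not fitting:
        raise ValueError("no hardware window is large enough")
    return fitting[0]


MIB = 1 << 20
# Illustrative sizes only; take the real list from the driver or hardware docs.
EXAMPLE_SIZES = (1 * MIB, 2 * MIB, 16 * MIB)
```

For instance, with the example sizes a 3 MiB request would be rounded up to the 16 MiB window, and a request larger than every window is reported as an error instead of being passed to the driver.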