This repo collects useful tools for getting models to work on TT hardware. It includes:
- Memory profiler ttmem: useful for looking at the memory usage of a model. A sign that you need it is an error like: Out of Memory: Not enough space to allocate <nbytes> B DRAM buffer across <nbanks> banks
- Model analyzer ttchop: analyzes PyTorch models to identify which modules/ops work on TT hardware. Generates an interactive HTML report showing pass/fail status for each module.
- Claude skills and commands: we recommend you copy these into your ~/.claude or <tt-xla-path>/.claude for easier debugging of models.
This installs both the ttmem and ttchop CLI tools and their Python packages:
pip install git+https://github.com/vkovinicTT/tt-swiss.git
Before using tt-swiss, you need to configure TT-XLA for memory logging and op by op testing:
source venv/activate
cmake -G Ninja -B build -DCMAKE_BUILD_TYPE=Debug -DTT_RUNTIME_DEBUG=ON -DTTMLIR_ENABLE_BINDINGS_PYTHON=ON
cmake --build build
export TTMLIR_RUNTIME_LOGGER_LEVEL=DEBUG
export TT_RUNTIME_MEMORY_LOG_LEVEL=operation
ttrt query --save-artifacts # --disable-eth-dispatch # add this for blackhole qb
cd /path/to/tt-xla
source venv/activate
pip install git+https://github.com/vkovinicTT/tt-swiss.git
If you want to modify tt-swiss and have changes reflected immediately:
cd /path/to/tt-swiss
pip install -e .
Note: Editable installs require setuptools>=64. If you get an error about a missing build_editable hook, upgrade setuptools first:
pip install --upgrade setuptools pip
Note: Always activate the tt-xla environment first (source venv/activate). This sets up the required paths for the model analyzer to find tt-xla's op-by-op test infrastructure.
No need to install anything. Just do the following:
- Check that you have a debug build of tt-xla: grep CMAKE_BUILD_TYPE build/CMakeCache.txt. If not, rebuild in debug mode.
- Set these environment variables when running your Python script: TTMLIR_RUNTIME_LOGGER_LEVEL=DEBUG TT_RUNTIME_MEMORY_LOG_LEVEL=operation
- Upload your logs to http://yyz2-forge-dash.local.tenstorrent.com:9000/
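The first two checks can also be scripted. The following is a minimal sketch, not part of tt-swiss: the helper names is_debug_build and memory_logging_env are hypothetical, and it assumes the usual CMake cache entry format (NAME:TYPE=VALUE, for example CMAKE_BUILD_TYPE:STRING=Debug).

```python
import os
import re
from pathlib import Path


def is_debug_build(cache_path="build/CMakeCache.txt"):
    """Return True if the CMake cache records a Debug build type."""
    text = Path(cache_path).read_text()
    # CMake cache entries have the form NAME:TYPE=VALUE, one per line.
    match = re.search(r"^CMAKE_BUILD_TYPE:[A-Z]+=(.*)$", text, re.MULTILINE)
    return match is not None and match.group(1).strip() == "Debug"


def memory_logging_env(base_env=None):
    """Return a copy of the environment with both logging variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["TTMLIR_RUNTIME_LOGGER_LEVEL"] = "DEBUG"
    env["TT_RUNTIME_MEMORY_LOG_LEVEL"] = "operation"
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when launching your model script, so the variables do not leak into the rest of your shell session.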
Example log files are included in example_logs/ so you can try ttmem without running a model first.
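Before generating a full report, a log can be scanned for the out-of-memory error quoted at the top of this README. This is a minimal sketch that assumes the message appears on a single line in exactly that form; find_oom_errors is a hypothetical helper, not a ttmem function.

```python
import re

# Matches the error text quoted in this README, capturing <nbytes> and <nbanks>.
OOM_PATTERN = re.compile(
    r"Out of Memory: Not enough space to allocate (\d+) B DRAM buffer "
    r"across (\d+) banks"
)


def find_oom_errors(log_text):
    """Return a list of (nbytes, nbanks) tuples, one per OOM error found."""
    return [
        (int(m.group(1)), int(m.group(2)))
        for m in OOM_PATTERN.finditer(log_text)
    ]
```

Call it on the contents of a log file, for example find_oom_errors(Path(log_path).read_text()); an empty list means the message was not found.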
ttmem
The interactive CLI guides you through the process with prompts:
- Asks if you have a log file ready (shows prerequisites if not)
- Prompts for the log file path with autocomplete
- Parses the log and generates the HTML report
- Optionally starts an HTTP server for remote viewing
When working on a remote machine via VS Code Remote SSH, the HTTP server option allows you to view the report in your local browser. VS Code automatically forwards the port, so http://localhost:8000/report.html will work from your local machine.
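To illustrate what serving the report over HTTP involves, here is a minimal sketch using only the Python standard library. It is not ttmem's implementation; make_report_server is a hypothetical helper, and the default port 8000 simply mirrors the URL above.

```python
import functools
import http.server


def make_report_server(report_dir, port=8000, host="0.0.0.0"):
    """Create an HTTP server that serves the files found in report_dir."""
    # Bind the handler to report_dir instead of the current working directory.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(report_dir)
    )
    return http.server.ThreadingHTTPServer((host, port), handler)
```

Calling make_report_server("path/to/run_dir").serve_forever() blocks until interrupted; while it runs, the files in that directory are reachable at http://localhost:8000/.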
# Default: run + parse + visualize (recommended)
tt-memory-profiler path/to/your_model.py
# Only capture logs (for later processing)
tt-memory-profiler --log path/to/your_model.py
# Parse existing log file
tt-memory-profiler --analyze logs/your_model_20260122_143957/your_model_profile.log
# Generate visualization from existing run
tt-memory-profiler --visualize logs/your_model_20260122_143957/
# Specify custom output directory
tt-memory-profiler --output-dir /path/to/output path/to/your_model.py
Output is stored in ./logs/ relative to your current working directory (or --output-dir if specified):
./logs/<script_name>_YYYYMMDD_HHMMSS/
├── <script_name>_memory.json # Memory stats per operation
├── <script_name>_operations.json # Operation metadata per operation
├── <script_name>_profile.log # Raw logs
└── <script_name>_report.html # Interactive visualization
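Because every run gets its own timestamped directory, a small script can locate the newest report. This is a minimal sketch based only on the naming scheme shown above (<script_name>_YYYYMMDD_HHMMSS); list_runs and latest_report are hypothetical helpers, not part of the tool.

```python
import re
from datetime import datetime
from pathlib import Path

# Directory names look like <script_name>_YYYYMMDD_HHMMSS.
RUN_DIR_PATTERN = re.compile(r"^(?P<script>.+)_(?P<stamp>\d{8}_\d{6})$")


def list_runs(logs_dir="./logs"):
    """Return (timestamp, script_name, path) tuples, oldest first."""
    runs = []
    for path in Path(logs_dir).iterdir():
        match = RUN_DIR_PATTERN.match(path.name)
        if not path.is_dir() or match is None:
            continue
        try:
            stamp = datetime.strptime(match["stamp"], "%Y%m%d_%H%M%S")
        except ValueError:
            continue  # digits that do not form a valid date/time
        runs.append((stamp, match["script"], path))
    return sorted(runs, key=lambda run: run[0])


def latest_report(logs_dir="./logs"):
    """Return the report path of the newest run, or None if no runs exist."""
    runs = list_runs(logs_dir)
    if not runs:
        return None
    _, script, path = runs[-1]
    return path / f"{script}_report.html"
```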
Option 1: Using ttmem (recommended for remote development)
- Run ttmem, select "Yes" when asked to serve via HTTP
- Open http://localhost:PORT/report.html in your browser
- VS Code Remote SSH automatically forwards the port
Option 2: Using VS Code Live Server
- Right-click on the HTML file and choose "Open with Live Server"
- Requires the Live Server extension in VS Code
If you are running inside a Docker container, the HTTP server binds to 0.0.0.0 but the port is not exposed by default. You need to forward port 8000 when starting your container:
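docker run -p 8000:8000 <your-image>
If the container is already running, the simplest fix is to restart it with the port published, since Docker cannot add published ports to a running container. Alternatively, run a second container on the same Docker network that publishes the port and forwards traffic to the first one. If ttmem picks a different port (e.g. 8001 if 8000 is busy), forward that port instead.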
Once the port is forwarded, open http://localhost:8000/report.html in your host browser to view the report.
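To see in advance which port would need publishing, you can probe for the first free port starting at 8000. This is only a sketch of the fallback idea: how ttmem actually chooses its port is not documented here, and first_free_port is a hypothetical helper.

```python
import socket


def first_free_port(start=8000, attempts=20, host="0.0.0.0"):
    """Return the first port >= start that can currently be bound on host."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is busy or not permitted, try the next one
            return port
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")
```

The result is a snapshot: another process can still take the port between this check and the moment the server starts.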
- Interactive HTML visualization with memory graphs, fragmentation analysis, peak operations
- Synchronized JSON outputs (nth element = same operation)
- Filtered data (excludes deallocate operations)
- Timestamped runs (never overwrites previous data)
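Since the two JSON outputs are synchronized by position, they can be joined by index. The sketch below assumes that each file holds a top-level JSON array (the README states the nth-element correspondence but not the exact schema); load_paired_records is a hypothetical helper.

```python
import json
from pathlib import Path


def load_paired_records(run_dir, script_name):
    """Pair each operation's metadata with its memory stats by index."""
    run_dir = Path(run_dir)
    memory = json.loads((run_dir / f"{script_name}_memory.json").read_text())
    operations = json.loads(
        (run_dir / f"{script_name}_operations.json").read_text()
    )
    # The nth entry of each file describes the same operation.
    if len(memory) != len(operations):
        raise ValueError("memory and operations files have different lengths")
    return list(zip(operations, memory))
```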
Analyze PyTorch models to identify which modules/ops work on TT hardware.
ttchop \
  --model-path path/to/model.py::load_model \
  --inputs-path path/to/model.py::get_inputs
- Extract Modules: Identifies all unique modules in the model
- Run Op-by-Op Analysis: Tests each module hierarchically on TT hardware
- Generate Report: Creates interactive HTML visualization showing pass/fail status
The tool requires two Python functions:
- load_model(): Returns the PyTorch model
- get_inputs(): Returns sample input tensors
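The file.py::function_name form used by --model-path and --inputs-path can be checked before a long run. The following is a minimal sketch of resolving such a string with the standard library; it is not necessarily how ttchop parses it, and resolve_spec is a hypothetical helper.

```python
import importlib.util
from pathlib import Path


def resolve_spec(spec):
    """Load the function named by a 'path/to/file.py::function_name' string."""
    file_part, sep, func_name = spec.rpartition("::")
    if not sep or not file_part or not func_name:
        raise ValueError(f"expected 'file.py::function', got {spec!r}")
    path = Path(file_part)
    # Import the file as a standalone module named after its stem.
    module_spec = importlib.util.spec_from_file_location(path.stem, path)
    if module_spec is None or module_spec.loader is None:
        raise ImportError(f"cannot import {path}")
    module = importlib.util.module_from_spec(module_spec)
    module_spec.loader.exec_module(module)
    func = getattr(module, func_name)
    if not callable(func):
        raise TypeError(f"{func_name} in {path} is not callable")
    return func
```

For example, resolve_spec("model.py::load_model")() imports model.py and calls load_model, which surfaces import errors and misspelled function names early.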
# Basic usage
ttchop --model-path model.py::load_model --inputs-path model.py::get_inputs
# Specify output directory
ttchop --model-path model.py::load_model --inputs-path model.py::get_inputs --dir ./output
<ModelClass>/
├── unique_modules.json # Module analysis results with status
├── analysis_report.html # Interactive tree visualization
└── module_irs/ # IR files for each module
# model.py
import torch
import torch.nn as nn

def load_model():
    # Just return the model on CPU - the tool handles device placement
    return nn.Sequential(
        nn.Conv2d(3, 64, 3),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),  # pool (N, 64, H, W) down to (N, 64, 1, 1)
        nn.Flatten(),  # flatten to (N, 64) so that Linear(64, 10) matches
        nn.Linear(64, 10),
    )

def get_inputs():
    # Just return CPU tensors - the tool handles device placement
    return torch.randn(1, 3, 224, 224)

Note: Your functions should return CPU models/tensors. The tool automatically handles moving them to the TT device.
ttchop --model-path model.py::load_model --inputs-path model.py::get_inputs