Memory Debugging Guide: TT-Metal Bank Manager #4689
bmalesevicTT started this conversation in Show and tell
🐞 Background
While running a benchmark model, I ran into an L1 memory leak between two loop iterations. By enabling Bank Manager logging in TT-Metal, dumping the logs, and comparing runs, I was able to locate the issue:
➡️ the output of the entire model was allocated in L1 but never deallocated.
Since this logging approach might help in similar situations, here’s a rough guide with snippets and notes.
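Comparing dumped logs by hand gets tedious, so here is a minimal sketch of how the pairing could be automated. It assumes a hypothetical log format (`ALLOC <address> <size>` and `DEALLOC <address>`), which is not TT-Metal's actual output; `find_live_allocations` is an illustrative name. Anything still in the returned map after an iteration is a candidate leak.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical log format (an assumption, not TT-Metal's actual output):
//   ALLOC <address> <size_bytes>
//   DEALLOC <address>
// Lines starting with anything else are ignored.
// Returns address -> size for every allocation without a matching deallocation.
std::map<std::uint64_t, std::uint64_t> find_live_allocations(
    const std::vector<std::string>& log_lines) {
    std::map<std::uint64_t, std::uint64_t> live;
    for (const std::string& line : log_lines) {
        std::istringstream in(line);
        std::string op, addr_text, size_text;
        in >> op >> addr_text;
        if (op == "ALLOC" && (in >> size_text)) {
            // Base 0 lets stoull accept both decimal and 0x-prefixed hex.
            live[std::stoull(addr_text, nullptr, 0)] = std::stoull(size_text, nullptr, 0);
        } else if (op == "DEALLOC" && !addr_text.empty()) {
            live.erase(std::stoull(addr_text, nullptr, 0));
        }
    }
    return live;
}
```

Running this over the log of a single loop iteration and printing the remaining entries would point directly at buffers like the model output described above.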
🔧 Setup
To see TT-Metal Bank Manager logs from MLIR / Forge-FE, you may need to set the environment variable:
📋 Overview
The Bank Manager in TT-Metal provides the foundation for tracking memory allocation, deallocation, and usage patterns across different memory types (L1, L1_SMALL, DRAM). This guide shows you how to add comprehensive logging capabilities for effective memory debugging.
🎯 Available Memory Information
Basic Allocation Details
When implementing allocation logging, you can capture:
- Buffer size and type (L1, L1_SMALL, DRAM)
- Number of banks and bytes per bank
- Allocated address
- Sharding status (sharded vs interleaved)
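The fields above could be gathered into one record and written as a single log line. This is a standalone sketch: `AllocationRecord`, `BufferType`, and `format_allocation` are hypothetical stand-ins defined here so the example compiles by itself, not TT-Metal types.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Stand-in enum; the real buffer type enum in TT-Metal may have more members.
enum class BufferType { L1, L1_SMALL, DRAM };

// Hypothetical record holding the details listed above.
struct AllocationRecord {
    BufferType type;
    std::uint64_t size_bytes;
    std::uint32_t num_banks;
    std::uint64_t bytes_per_bank;
    std::uint64_t address;
    bool sharded;
};

const char* to_string(BufferType type) {
    switch (type) {
        case BufferType::L1: return "L1";
        case BufferType::L1_SMALL: return "L1_SMALL";
        case BufferType::DRAM: return "DRAM";
    }
    return "UNKNOWN";
}

// Formats one allocation as a single greppable line.
std::string format_allocation(const AllocationRecord& record) {
    std::ostringstream out;
    out << "ALLOC type=" << to_string(record.type)
        << " size=" << record.size_bytes
        << " banks=" << record.num_banks
        << " bytes_per_bank=" << record.bytes_per_bank
        << " addr=0x" << std::hex << record.address << std::dec
        << " layout=" << (record.sharded ? "sharded" : "interleaved");
    return out.str();
}
```

Keeping every field as `key=value` on one line makes it easy to filter by buffer type or address with `grep` when comparing runs.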
Memory Statistics via Allocator
The underlying allocator provides access to:
- Total allocated bytes (stats.total_allocated_bytes)
- Total free bytes (stats.total_free_bytes)
- Largest free block (stats.largest_free_block_bytes)
- Memory block table (allocator_->get_memory_block_table())
- Block-level dumps (allocator_->dump_blocks())
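To show how the three statistics relate to the block table, here is a sketch that derives them from a list of blocks. `MemoryBlock`, `summarize_blocks`, and `fails_due_to_fragmentation` are hypothetical names, and the `Statistics` struct here only mirrors the three fields listed above; the real types may carry more.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; the real block table and statistics types may differ.
struct MemoryBlock {
    std::uint64_t address;
    std::uint64_t size_bytes;
    bool allocated;
};

struct Statistics {
    std::uint64_t total_allocated_bytes = 0;
    std::uint64_t total_free_bytes = 0;
    std::uint64_t largest_free_block_bytes = 0;
};

// Sums allocated and free bytes and tracks the largest single free block.
Statistics summarize_blocks(const std::vector<MemoryBlock>& table) {
    Statistics stats;
    for (const MemoryBlock& block : table) {
        if (block.allocated) {
            stats.total_allocated_bytes += block.size_bytes;
        } else {
            stats.total_free_bytes += block.size_bytes;
            stats.largest_free_block_bytes =
                std::max(stats.largest_free_block_bytes, block.size_bytes);
        }
    }
    return stats;
}

// True when enough bytes are free in total but no single free block fits the request.
bool fails_due_to_fragmentation(const Statistics& stats, std::uint64_t request_bytes) {
    return request_bytes <= stats.total_free_bytes &&
           request_bytes > stats.largest_free_block_bytes;
}
```

Logging the largest free block next to the total free bytes helps separate a genuine leak (free bytes shrink every iteration) from fragmentation (free bytes stay stable but the largest block shrinks).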
Bank-Level Information
Through BankManager methods:
- Bank count (num_banks())
- Bank size (bank_size())
- Bank offsets (bank_offset(bank_id))
- Lowest occupied address per bank (lowest_occupied_address())
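The bank-level values can be folded into one line per bank. The sketch below takes them as plain arguments rather than calling `BankManager`, so the parameter types (a signed offset, an optional lowest address that is empty when nothing is allocated) are assumptions, and `describe_bank` is a hypothetical helper.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Formats the per-bank values listed above. The argument types are assumptions;
// in real code they would come from bank_size(), bank_offset(bank_id), and
// lowest_occupied_address().
std::string describe_bank(std::uint32_t bank_id,
                          std::uint64_t bank_size,
                          std::int64_t bank_offset,
                          std::optional<std::uint64_t> lowest_occupied) {
    std::ostringstream out;
    out << "bank " << bank_id
        << ": size=" << bank_size
        << " offset=" << bank_offset
        << " lowest_occupied=";
    if (lowest_occupied) {
        out << *lowest_occupied;
    } else {
        out << "none";  // nothing allocated in this bank
    }
    return out.str();
}
```

Printing this for every bank before and after a loop iteration shows whether the lowest occupied address returns to its starting value.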
🔧 Implementing Memory Debugging
Step 1: Add Statistics Helper Method
Add a helper method to BankManager to format comprehensive statistics:
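A minimal sketch of such a helper, written as a free function over a stand-in `Statistics` struct so that it compiles by itself. In the real code this would presumably be a `BankManager` member reading the allocator's statistics; the function name `format_statistics` and the `label` parameter are hypothetical.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Stand-in with the three fields mentioned in this guide.
struct Statistics {
    std::uint64_t total_allocated_bytes;
    std::uint64_t total_free_bytes;
    std::uint64_t largest_free_block_bytes;
};

// Builds a one-line summary that can be appended to any allocation or
// deallocation log message. `label` would typically be the buffer type name.
std::string format_statistics(const std::string& label, const Statistics& stats) {
    std::ostringstream out;
    out << label << " stats: allocated=" << stats.total_allocated_bytes << " B"
        << ", free=" << stats.total_free_bytes << " B"
        << ", largest_free_block=" << stats.largest_free_block_bytes << " B";
    return out.str();
}
```

Returning a string instead of logging directly keeps the helper reusable from the allocation, deallocation, and bulk cleanup paths.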
Step 2: Enhanced Allocation Logging
Add comprehensive logging to the allocate_buffer method:
Step 3: Deallocation Logging
Add logging to track memory cleanup:
Step 4: Bulk Cleanup Logging
Add logging for bulk operations: