v0.46.0
📦 Uncategorized
- user-triggerable C++ post-commit suite
- PR: #6626
- #6406: add missing position_ids/attention_mask to bert demo
- PR: #6617
- #6282: Add AdamW
- PR: #6333
- #6315: Fix dprint tests for T3000
- PR: #6599
- FD2: prefetch stall, dispatch wait, linear read, delay and cleanup
- PR: #6620
- #6609: update wording in demo section of main README.md
- PR: #6639
- #6364: Autocomplete for pybinded types
- PR: #6440
- Asarje/ttnn rn50 b20
- PR: #6629
- FD2.0 Test - Fix l1 buffer not page-size aligned after FD-on-eth changes to L1_UNRESERVED_BASE
- PR: #6646
- #6593: Add resharding to Llama2 model when possible.
- PR: #6595
- #6572: Fix `ttnn.repeat_interleave` example in documentation
- PR: #6574
- #5780: Re-enable 100K enqueue program stress test on grayskull
- PR: #6648
- Enable basic width sharding support in all-gather
- PR: #6642
- Alex/metal/remove cb wait markers
- PR: #6628
- #6657: Use sysmem manager cq size instead of recomputing it each time…
- PR: #6658
- #0: (MINOR) Add Grayskull purchase link and update version to 0.46.0
- PR: #6667
- #5063: add TopK API to metal
- PR: #6563
- #5480: FD2.0 Test - Fix test_prefetcher for dram paged read test (-t 3) on whb0
- PR: #6663
- Fix logit low pcc
- PR: #6538
- Backward op - Fixed ldexp, hardsigmoid and asin
- PR: #6542
- #6598: Fix softplus
- PR: #6675
- Add support for BFP4_B tensor serialization
- PR: #6545
- Eltwise mul for different batch size
- PR: #6587
- #6575: Split docs into separate Metalium and nn docs
- PR: #6666
- #0: Add two separate links for documentation (tt-metalium/ttnn) on README
- PR: #6697
- #6361: Update ttnn repeat to use correct shapes when formatting output
- PR: #6526
- #0: Sayonaraaaaaaa
- PR: #6702
- FD2.0 Test fix test_prefetcher add_paged_dram_data_to_worker_data dropping start_page
- PR: #6703
- #5785: Watcher ringbuffer implementation
- PR: #6652
- Add FD 2.0 WriteHost Command
- PR: #6614
- #0: Put back frequent api tests because I'm an idiot
- PR: #6698
- Optimize All Gather Interleaved Worker send/receive
- PR: #6706
- #0: changing all `#include common/*` to `#include tt_metal/common/*`
- PR: #6669
- #6676: Fix issues related to unary lte and gte
- PR: #6685
- #5817: Fix lerp
- PR: #6630
- #6589: Fix for relu_bw
- PR: #6631
- #6633: Backward test update
- PR: #6679
- #0: Skip logit, logiteps test
- PR: #6714
- #0: Testing CI fix
- PR: #6708
- #5480: Update test_prefetcher to pass added hugepage args to dispatch kernel
- PR: #6717
- Fix l1 acc, add whb0 optimized conv tests
- PR: #6668
- Alignment fix for eth core kernels
- PR: #6696
- Add data parallel (multi-chip) for Falcon7b (prefill/decode) model and corresponding tests
- PR: #6656
- CQ_DISPATCH_CMD_WRITE_PAGED support in test_dispatcher and passing tests
- PR: #6641
- #6647: disable failing ci cpp tests and reenable cpp pipeline on CI
- PR: #6704
- Backward test updates
- PR: #6692
- Ngrujic/check bugs
- PR: #6688
- Add Llama matmul perf tests to main
- PR: #6690
- TTLIB: removing working tests from broken
- PR: #6718
- #6443: Update backward asin and addcdiv logic
- PR: #6715
- #0: Fix output cb size calculation in reshard op for bfp8b
- PR: #6739
- #0: use smart ptrs in allocator
- PR: #6719
- Jvasilje docs 0322
- PR: #6745
- DRAM based device profiler with Tracy support
- PR: #6460
- #6553: Fix ttnn.reshape(..) handling for bfloat16, TILE_LAYOUT
- PR: #6746
- Add Llama2 demo to tt-metal docs
- PR: #6682
- Mistral-7B WH demo
- PR: #6501
- Revert "#0: Put back frequent api tests because I'm an idiot"
- PR: #6755
- FP32 support
- PR: #6747
- #0: Add back frequent api tests to run.sh
- PR: #6756
- Bteng/watcher ci3
- PR: #6530
- Remove cpuprof
- PR: #6758
- logo update
- PR: #6762
- #6184: sharded row major silu support.
- PR: #6643
- #6443: Update div_bw and backward ops test file
- PR: #6742
- #6705: Relax forcing of keyword argument in ttnn.open_device
- PR: #6707
- Forward op tests
- PR: #6730
- #6691: Allow blocking of inner dim within a core for sharded in0 for 2d and 1d systolic matmuls
- PR: #6640
- #6662: Width Sharding support for eltwise OP
- PR: #6671
- Stable diffusion python API level perf improvements
- PR: #6681
- Add get_compute_kernel_config_args function
- PR: #6768
- #0: Add fd-2/main triggers for pull_request and push for post-commit
- PR: #6709
- #5480: FD2 refactor for pre/dis patch variants
- PR: #6655
- #6654: Add perf tests for ttnn ResNet50
- PR: #6673
- #5480: Fix fd gtest unit test test_write_host
- PR: #6778
- #0: Set myself as setup.py owner
- PR: #6779
- #6780: Add mistral7b to demos list in getting started
- PR: #6781
- #4003: re-added TTNN_ENABLE_LOGGING as runtime flag
- PR: #6750
- #0: Fix semaphore address gen bug
- PR: #6233
- #6769: Disable program caching for failing Llama tests.
- PR: #6770
- #5480: Fix zero sized write transaction request that could occur in write_linear_host
- PR: #6784
- #6077: Fix unet pcc issues
- PR: #6660
- Remove DstSync from llk api templates
- PR: #6753
- FP32 Support
- PR: #6785
- #6680: Reverting move op change
- PR: #6811
- #6443: Update asinh and softsign backward
- PR: #6773
- Backward tests with updated test modules
- PR: #6765
- Ngrujic/check bugs 1
- PR: #6734
- #6654: Moving init for self.compute_kernel_config
- PR: #6782
- #6805: reproduce the bug with sharded split_query_key_value_and_split_heads
- PR: #6806
- #6832: Account for tile-padding in softmax for mistral 7B
- PR: #6833
- Enable support for uint32 format to be consumed by SFPU (issue #4624)
- PR: #6796
- #4252: fix clang build error since std::log2 only constexpr in gcc
- PR: #6835
- #4003: log, debug and add pre- and post- hooks only for top-level ttnn ops
- PR: #6841
- #6823: Fix core count to not include dispatch cores in op report
- PR: #6831
- #6197: Align pages for interleaved <-> sharded.
- PR: #6828
- METALIUM_GUIDE
- PR: #6846
- Bteng/watcher post commit
- PR: #6760
- #6443: update backward test file for relational ops and concat op
- PR: #6817
- Revert "Bteng/watcher post commit"
- PR: #6866
- #6443: Update backward ops
- PR: #6826
- Backward test updates
- PR: #6822
- #0: Add the dim 0 support repeat backward
- PR: #5596
- Update hard related test ops
- PR: #6816
- #6757: Remove set_profiler_location
- PR: #6824
- #6443: Update backward ops erfinv elu hypot cos sin
- PR: #6827
- #6861: Enable Watcher/dprint tests on T3000 CI
- PR: #6869
- Update Mistral perf regression for CI, until issue is resolved
- PR: #6883
- Mamba/perf v1
- PR: #6744
- #0: remove data movement ops related to silu in SD
- PR: #6798
- #4003: added proper fallback for getitem of ttnn.Tensor. Slice the tensor only on the tile boundary but set the shape based on whatever user provided
- PR: #6886
- #4003: added proper fallbacks for every op that falls back to torch
- PR: #6888
- #6731: add fix to LN width sharding
- PR: #6891
- #5797: add back sweep test for ln
- PR: #6893
- Integrate GroupNorm V2 to SD model
- PR: #6862
- METALIUM_GUIDE.md updates
- PR: #6863
- [Falcon7b] Fix bugs with inference throughput measurements in demo
- PR: #6884
- #0: shallow unet add perf_mode
- PR: #6904
- #6154: 2d matmul in0 height, in1 width sharding
- PR: #6821
- #5249: Various Falcon40b test and demo cleanup
- PR: #6764
- #0: fix incremental build
- PR: #6914
- #0: remove upsample spill to DRAM
- PR: #6905
- [Llama2 Prefill] Model Functionality completed
- PR: #6800
- Watcher alignment checking for PCIe/DRAM <-> L1
- PR: #6901
- #6920: fixed the error in whisper
- PR: #6921
- Update METALIUM_GUIDE.md
- PR: #6902
- #6644: save l1 buffers to database
- PR: #6856
- Update usage.rst
- PR: #6929
- #6804: fix ttnn falcon7b demo regression + add to CI regressions
- PR: #6924
- #6285: Add backward support for floor round and div_no_nan
- PR: #6290
- [skip ci] Update INSTALLING.md
- PR: #6936
- #6873: Add more test combinations to tt_lib sweeps add, add_unary, su…
- PR: #6887
- Ngrujic/check bugs 3
- PR: #6951
- #6882: Updated Mistral-7b perf estimate
- PR: #6892
- #6850: Update install links in Sphinx docs to point directly to INSTALLING.md
- PR: #6953
- #6619: Fix per op profiler sum
- PR: #6955
- #6644: sync before calling print l1 buffers
- PR: #6958
- Barsic/ttlib ops check
- PR: #6772
- Barsic/ttlib params fix
- PR: #6944
- #6962: Move cd tt-metal earlier in the command list of INSTALLING.md
- PR: #6966
- #6819: Add support for CreateKernel absolute file paths
- PR: #6922
- #6356: Remove half-half grid logic for bmms
- PR: #6968
- #4003: added a flag to disable ttnn fallbacks. Don't throw an error w…
- PR: #6961
- #0: Correct FW versions, tt-smi versions, and add note about tt-topology
- PR: #6971
- #0: Capitalize tt to TT consistently for marketing
- PR: #6973
- #0: Add myself as CODEOWNER for INSTALLING.md
- PR: #6974
- #6644: ttnn visualizer
- PR: #6935
- #6847: Allow disabling individual watcher features
- PR: #6855
- #6889: Support printing/padding/tilizing multi-device tensors
- PR: #6976
- #4003: removed ttnn.print_l1_buffers and consolidated all ttnn flags into a CONFIG class
- PR: #6980
- #6217: tt_lib async mode support (single-chip tensors supported)
- PR: #6700
- Reshard With Ranges
- PR: #6919
- #4003: updated buffer report to show the input/output tensors, buffer report of the previous operation and the buttons to go to the reports of previous/next operations. Load ttnn.CONFIG from a json file and override it using a single environment variable
- PR: #6994
- #4003: disable all tests in test_reports
- PR: #6997
- New TTNN sweeps
- PR: #6632
- #0: Put sfpi/ CODEOWNERS directive back on separate line because I'm an idiot and broke it
- PR: #7002
- #6957: Upload artifacts regardless of the device perf results
- PR: #6975
- #5592: Optimize Falcon 7b lm head matmul
- PR: #6956
- #4003: set delete_reports_on_start to false in the visualizer
- PR: #7005
- #6969: Split watcher noc alignment checks for reads vs writes
- PR: #6979
- #7012: Add support for sharding in Mamba model
- PR: #7011
- #6217: Async Mode Changes
- PR: #7010
- #6886: ttnn slicing bug for padded input
- PR: #6999
- #7023: Use `bfloat8` weights in Mamba block MLPs
- PR: #7024
- #6937: Silu fix for multiple calls. Bug fix. Some name changes.
- PR: #7022
- #6306: Enable N150,N300 ttnn unit tests in CI Regressions; disable failing ones
- PR: #7016
- Fix minor grammatical errors in METALIUM-GUIDE.md
- PR: #7027
- #4003: ttnn visualizer
- PR: #7025
- #4003: re-enabled test_reports
- PR: #7034
- Sharded attention in stable diffusion.
- PR: #7013
- #7041: GS watcher error
- PR: #7042
- #7041: GS watcher error
- PR: #7043
- #0: update path to watcher.log
- PR: #7046
- Ngrujic/check bugs
- PR: #7001
- build C++ tests in release mode
- PR: #7053
- #6443: Update backward ops
- PR: #6877
- #6443: Update backward ops
- PR: #6946
- #6443: Update backward ops
- PR: #6912
- [skip ci] Update CODEOWNERS
- PR: #7029
- frequent pipeline updates
- PR: #7055
- Clean up Mamba unit tests and configs
- PR: #7062
- #6873: TTLIB modified sweeps GS and WH
- PR: #7004
- #6443: Update Unary Div backward
- PR: #6878
- More aggressive deallocation, fewer spills to DRAM.
- PR: #7076
- #4003: use reports_path instead of tmp_path
- PR: #7074
- #6838: Add tracy timeout for op reports
- PR: #6852
- #6873: Add more sweep combinations for tt_lib bcast and sum operations
- PR: #7060
- #0: Add link to programming guide (METALIUM_GUIDE.md) instead of the bad paragraph we had before
- PR: #7093
- #5489: re-enable profiler regression on N300
- PR: #7079
- TTNN sweep tests - zeros, zeros like, nextafter, empty, attention softmax inplace
- PR: #6551