v0.43.0
📦 Uncategorized
- #4668: Yolov5 GS Demo Benchmarking
- PR: #4776
- #0: Uplift UMD; pick up fix for N150 cluster
- PR: #4881
- #3178: Fix for wormhole b0 reduce w
- PR: #4882
- #4489: Fixed bugs in the program caching of eltwise unary and eltwise binary; updated Bloom to use the L1 memory config
- PR: #4842
- #4821: Add cumsum op to tt_dnn
- PR: #4824
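
The cumsum op computes an inclusive prefix sum along one dimension. A minimal reference for the expected semantics, using PyTorch; the tt_dnn binding name and signature in the comment below are assumptions, not confirmed by this entry:

```python
import torch

x = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])

# Inclusive prefix sum along the last dimension.
ref = torch.cumsum(x, dim=-1)
# tensor([[ 1.,  3.,  6.],
#         [ 4.,  9., 15.]])

# Hypothetical tt_dnn call (name and signature assumed):
# out = tt_lib.tensor.cumsum(tt_input, dim=-1)
```
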
- Dispatch/Bandwidth tests
- PR: #4783
- #4003: fixed test_eltwise_unary_op
- PR: #4901
- Argmax and Argmin Support
- PR: #4779
- #3212: Softmax works after the reduce fix for max, sum, etc. on WHB0
- PR: #4907
- #0: (MINOR) Update version to v0.43.0
- PR: #4910
- #4761: Add call to ttl repeat_interleave and also provide script for …
- PR: #4891
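
For context, repeat_interleave repeats each element of a tensor along a dimension. A short reference using PyTorch, which the ttl version is presumably expected to match (the ttl call itself is not shown in this entry):

```python
import torch

x = torch.tensor([[1, 2],
                  [3, 4]])

# Each row is repeated twice, consecutively, along dim 0.
ref = torch.repeat_interleave(x, repeats=2, dim=0)
# tensor([[1, 2],
#         [1, 2],
#         [3, 4],
#         [3, 4]])
```
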
- #4003: fixed the bug with printing the compile-time attributes
- PR: #4918
- Support moreh arange
- PR: #4921
- Remove skip_for_wormhole_b0 for test_moreh_softmax and test_moreh_softmin
- PR: #4924
- #4541: remove unpad start at 0 limitation
- PR: #4566
- Agrebenisan/restart cmd fix
- PR: #4922
- Support moreh SGD
- PR: #4929
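
For reference, this is the standard SGD-with-momentum update a moreh SGD kernel would be expected to implement. The sketch below is plain Python for the math only; the actual kernel interface is not shown in this entry:

```python
def sgd_step(param, grad, velocity, lr=0.1, momentum=0.9, weight_decay=0.0):
    """One standard SGD-with-momentum update (reference math only)."""
    if weight_decay != 0.0:
        grad = grad + weight_decay * param   # L2 regularization term
    velocity = momentum * velocity + grad    # momentum accumulation
    param = param - lr * velocity            # parameter update
    return param, velocity
```
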
- #0: Use fetch-depth: 0 instead of fetch-tags because otherwise git complains of commit SHA/tag conflict
- PR: #4934
- #0: Add code owners for primary operations api binding
- PR: #4936
- #4547: Add 2x2 window unit tests to ttnn maxpool
- PR: #4909
- #4003: restructure ttnn
- PR: #4902
- #4889: Change TileSlice printing to only print tile data
- PR: #4912
- #4836: Add support for blocking conv activation in 2d systolic conv v…
- PR: #4837
- #0: Update unicast cycles lower bound
- PR: #4937
- #4904: Add support for 1d width sharded LN
- PR: #4905
- #4941: Convert command header to struct for easier maintainability
- PR: #4942
- #4823: Enable sum_0 operation that fails with low PCC [Wormhole, Grayskull]
- PR: #4955
- Fix sharded buffers for one core in fast dispatch
- PR: #4944
- #4906: Added global reduce sum, mean, max, and min operations
- PR: #4908
- Revert "#4823: enable sum_0 operation fails with low PCC [Wormhole,GS]
- PR: #4963
- #0: Change codeowners from specific op binding files/dirs to all tt_lib bindings
- PR: #4938
- #4003: split unary sweep into per op sweeps
- PR: #4952
- #4232: Added support for converting numpy arrays to ttnn tensors; borrow data whenever possible when converting from numpy/torch
- PR: #4893
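
Borrowing means the host-side conversion can avoid a copy when the source buffer is already compatible. A minimal sketch of the idea; ttnn.from_torch is the usual entry point, but the exact conditions under which data is borrowed rather than copied are an assumption here:

```python
import numpy as np
import torch
import ttnn  # assumes a tt-metal environment

arr = np.ones((2, 4), dtype=np.float32)

# torch.from_numpy shares memory with the numpy array (no copy).
t = torch.from_numpy(arr)

# Per this entry, the conversion borrows the host data when possible
# instead of copying it.
tt_tensor = ttnn.from_torch(t)
```
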
- Uplift AttnMatmul to support GroupAttnMatmul
- PR: #4913
- Add watcher-specific CI tests
- PR: #4919
- #4916: Add avg pool to ttnn
- PR: #4917
- #0: Add a lock on DPRINT server raise/wait structures
- PR: #4920
- #4967: added validation for input tensors
- PR: #4977
- #4971: Update documentation with a new doc hierarchy
- PR: #4983
- #0: Leftover decorate_operation replacement for avg pool
- PR: #4987
- #4899: fix the permute to operate on the intended shape
- PR: #4951
- #4730: Add tt_lib.tensor.concat
- PR: #4990
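
tt_lib.tensor.concat joins tensors along one dimension, analogous to torch.cat. Reference semantics below; the device-side signature in the comment is an assumption:

```python
import torch

a = torch.zeros(1, 2, 32, 32)
b = torch.ones(1, 2, 32, 32)

ref = torch.cat([a, b], dim=1)   # shape: (1, 4, 32, 32)

# Hypothetical device-side equivalent (signature assumed):
# out = tt_lib.tensor.concat([tt_a, tt_b], dim=1)
```
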
- Aliu/enqueue eth
- PR: #4845
- #4003: Updating functional performance from changes in ttnn.permute w…
- PR: #4991
- #4984: Remove dead OP_INFO and graph interpreter
- PR: #4985
- #4878: initial commit to add Conv parameters to ttnn.preprocess_model_parameters
- PR: #4966
- Update Program Hashes for Ops using Mem config
- PR: #4953
- #4984: Remove unused dprint functionality
- PR: #5000
- Aliu/ci fix
- PR: #5001
- #4215: Add Argmax and Argmin Fallback
- PR: #4928
- #4999: added input tensor validation to add, sub and mul operations.
- PR: #5004
- Support for softmax row-major sharding and causal mask sharding
- PR: #5006
- #0: provide API for where() to support scalar True/False branches
- PR: #4988
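
With scalar branches, where() selects between a constant and a tensor per element instead of requiring two full tensors. PyTorch shows the intended semantics; how the tt_lib API spells this is an assumption:

```python
import torch

cond = torch.tensor([[True, False],
                     [False, True]])
x = torch.full((2, 2), 7.0)

# A scalar stands in for one branch instead of a full tensor.
ref = torch.where(cond, x, 0.0)
# tensor([[7., 0.],
#         [0., 7.]])
```
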
- #5003: Update expected compile and runtimes for perf regression on VM
- PR: #5008
- Revert "Update Program Hashes for Ops using Mem config"
- PR: #5021
- #4931: add apis to get ethernet by socket ids
- PR: #4932
- #4786: Add upsample_nearest2d to functional stable diffusion
- PR: #4870
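
upsample_nearest2d duplicates each input pixel into a scale_factor x scale_factor block. A PyTorch reference for the expected behavior:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[[[1., 2.],
                    [3., 4.]]]])   # NCHW: (1, 1, 2, 2)

up = F.interpolate(x, scale_factor=2, mode="nearest")
# (1, 1, 4, 4): each pixel becomes a 2x2 block:
# [[1, 1, 2, 2],
#  [1, 1, 2, 2],
#  [3, 3, 4, 4],
#  [3, 3, 4, 4]]
```
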
- #4986: deploy docs only to main and enable devs to run docs build on different pages
- PR: #5020
- Deploy ttnn sweeps results to docs
- PR: #5019
- #4958: Move all python api unit tests to frequent in order to reduce SD pipeline length
- PR: #4981
- #4999: Added input validation for ttnn.matmul and ttnn.linear, added a unit test for the linear operation, updated input tensor validation in binary.py, and fixed compute_output_shapes in bmm_op.cpp
- PR: #5010
- #4620: Fix+improve bw test
- PR: #5029
- #4852: Add unit tests for functional bloom
- PR: #5013
- #5032: scalar argument versions for relops
- PR: #5018
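
Scalar relop variants compare every element against a single number rather than a second tensor. Reference semantics via PyTorch; the device binding names are assumptions:

```python
import torch

x = torch.tensor([0.5, 1.0, 1.5])

# Elementwise comparison against a scalar instead of a tensor.
ref = torch.gt(x, 1.0)   # tensor([False, False,  True])
```
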
- #0: Add README recommendations from MCW to clarify access to the internal workflows VM installation page
- PR: #5034
- #4790: Implement GEGLU using ttnn for stable_diffusion model
- PR: #4869
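
GEGLU is the GELU-gated linear unit from "GLU Variants Improve Transformer": the input is projected twice, and one projection gates the other through GELU. A minimal host-side reference sketch (sizes are illustrative; the ttnn implementation is not shown here):

```python
import torch
import torch.nn.functional as F

def geglu(x, w, v):
    """GEGLU(x) = (x @ w) * GELU(x @ v) -- bias terms omitted for brevity."""
    return (x @ w) * F.gelu(x @ v)

x = torch.randn(2, 8)
w = torch.randn(8, 16)
v = torch.randn(8, 16)
out = geglu(x, w, v)   # shape: (2, 16)
```
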
- #4999: Adding validation checks
- PR: #5011
- #4791: Implement Feedforward sub-module using ttnn for stable_diffusi…
- PR: #4868
- Npetrovic/bw ops sweeps
- PR: #5009
- #4999: update documentation of ttnn operations to include the validation schema
- PR: #5031
- #0: Remove model run from frequent_api_pipeline per @tt-rkim
- PR: #5043
- Minor dprint/watcher cleanup
- PR: #5030
- #4858: Add support for typecast
- PR: #4840
- #0: Disable dprint tests because they're flaky at the moment
- PR: #5026
- #4946: Add trig ops to ttnn
- PR: #5041
- Nshanker/convs split by 2
- PR: #5042
- #4946: Add inv trig ops to ttnn
- PR: #5038
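
The #4946 entries add forward and inverse trigonometric ops to ttnn. Reference semantics via PyTorch; the exact set of ttnn names added is not listed in this changelog:

```python
import torch

x = torch.tensor([0.0, 0.5, 1.0])

torch.sin(x)    # forward trig
torch.atan(x)   # inverse trig, defined everywhere
torch.asin(x)   # inverse trig; inputs must lie in [-1, 1]
```
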
- #4003: fixed circular dependency in decorators
- PR: #5052
- #5054: Removed asserts from conv op host code that are not required. …
- PR: #5055
- #4003: fixed circular dependencies in ttnn
- PR: #5061
- #4852: Fix CI pipeline by re-enabling functional bloom for causal LM
- PR: #5060
- GroupNorm sharded support
- PR: #4945
- #4972: Decouple is_sharded and memory_config from tensor
- PR: #4980
- #0: Eltwise ops/activation operator tracking for GS and WHB0
- PR: #5074
- Aliu/fd tunneling pr
- PR: #4725
- #4642: Converted 14 old C++ tests to use gtest, with the ability to switch between FD/SD where possible
- PR: #5050
- #4852: Add tests for functional ttnn bloom implementation.
- PR: #5078
- #4003: correctly convert all parameters of torch module to ttnn parameters
- PR: #5100
- #5082: Pow gradient calculation method differs from PyTorch
- PR: #5106
- Argmax/Argmin support for channel, batch, and all dimensions
- PR: #5040
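
Argmax/argmin over a specific dimension return the index of the extremal element along that dimension; without a dimension they reduce over all elements. Reference semantics:

```python
import torch

x = torch.tensor([[1., 9., 3.],
                  [7., 2., 8.]])

torch.argmax(x)          # tensor(1): flattened index of the max (9.)
torch.argmax(x, dim=0)   # tensor([1, 0, 1]): per-column (batch-style) reduce
torch.argmin(x, dim=1)   # tensor([0, 1]): per-row (channel-style) reduce
```
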
- #4420: switch to shared_ptr
- PR: #5123
- #4420: return shared_future from taskflow async wrapper
- PR: #5121
- Minor DPrint fixes
- PR: #5108
- #0: Enable/disable clearing L1 from env var
- PR: #5107
- #4003: Started moving ttnn operations to C++
- PR: #5111
- #4003: Add script to help with finding issues that we need approval for
- PR: #5129
- #5044: Adding support for optional output tensors
- PR: #5104
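
Optional output tensors follow the familiar out= pattern: the caller may preallocate the destination, and the op writes into it instead of allocating a new tensor. PyTorch reference for the pattern; how the tt ops expose it is an assumption:

```python
import torch

a = torch.ones(2, 2)
b = torch.ones(2, 2)

out = torch.empty(2, 2)
torch.add(a, b, out=out)   # result written into the preallocated buffer
```
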
- #4003: Adding the open flag to show only open PRs
- PR: #5134
- #5048: Add CreateDevices and CloseDevices APIs to detail
- PR: #5118
- decouple ClearProgramCache from CommandQueue
- PR: #5124
- Conv fixes for padding input channels. Shallow conv fixes. Conv input/output autoformatting. Cleanup
- PR: #5109
- Asarje/mp unpack tilize fused
- PR: #5033
- Update CreateBuffer to return shared_ptr, and Enqueue R/W buffer to accept std::shared_ptr
- PR: #5125
- #5137: Cleanups for newer Linux distro / toolchains
- PR: #5114
- Revert "#5137: Cleanups for newer Linux distro / toolchains"
- PR: #5139
- Revert "Update CreateBuffer to return shared_ptr, and Enqueue R/W buffer to accept std::shared_ptr"
- PR: #5138
- #4793: Implement ResnetBlock2D using ttnn for stable_diffusion model
- PR: #5084
- #4788: Implement Downsample2D using ttnn for stable_diffusion model
- PR: #5090
- #4792: Implement CrossAttention sub-module using ttnn for stable_diff…
- PR: #4927
- #4747: Reduce amount of samples in bert sweeps
- PR: #5140
- #4789: Add upsample2d to functional_stable_diffusion model
- PR: #5080
- #0: Add fix for lamb optimizer
- PR: #5144
- #5057: Add relational ops support to TTNN
- PR: #5120
- skip eth test suite on GS
- PR: #5155
- #4003: Updated ttnn.Tensor to be derived from ttl.tensor.Tensor
- PR: #5130
- Asarje/shwetank upsample
- PR: #5105
- #5082: Power gradient is erroneous when the exponent is in the range (0, 1)
- PR: #5158
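
Context for the two #5082 pow-gradient entries: for y = x^p the gradient is dy/dx = p * x^(p-1). When 0 < p < 1 the exponent p - 1 is negative, so the gradient grows without bound as x approaches 0 (for p = 0.5 it is 0.5 / sqrt(x)), a classic source of inf/NaN mismatches against PyTorch:

```python
import torch

p = 0.5
x = torch.tensor([1e-6, 0.25, 1.0], requires_grad=True)

torch.pow(x, p).sum().backward()

# Analytic gradient p * x**(p - 1) blows up near zero for 0 < p < 1:
print(x.grad)   # ~tensor([500.0000, 1.0000, 0.5000])
```
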