v0.43.0
📦 Uncategorized
- #4668: Yolov5 GS Demo Benchmarking
- PR: #4776
- #0: Uplift UMD; pick up fix for N150 cluster
- PR: #4881
- #3178: Fix for wormhole b0 reduce w
- PR: #4882
- #4489: Fixed bugs in the program caching of eltwise unary and eltwise binary; updated Bloom to use the L1 memory config
- PR: #4842
- #4821: Add cumsum op to tt_dnn
- PR: #4824
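
The cumsum op computes an inclusive prefix sum along one dimension. A minimal reference for the expected semantics, using PyTorch; the tt_dnn binding name and signature in the comment below are assumptions, not confirmed by this entry:

```python
import torch

x = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])

# Inclusive prefix sum along the last dimension.
ref = torch.cumsum(x, dim=-1)
# tensor([[ 1.,  3.,  6.],
#         [ 4.,  9., 15.]])

# Hypothetical tt_dnn call (name and signature assumed):
# out = tt_lib.tensor.cumsum(tt_input, dim=-1)
```
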
- Dispatch/Bandwidth tests
- PR: #4783
- #4003: fixed test_eltwise_unary_op
- PR: #4901
- Argmax and Argmin Support
- PR: #4779
- #3212: Softmax works after the reduce fix for max, sum, etc. on WHB0
- PR: #4907
- #0: (MINOR) Update version to v0.43.0
- PR: #4910
- #4761: Add call to ttl repeat_interleave and also provide script for …
- PR: #4891
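
For context, repeat_interleave repeats each element of a tensor along a dimension. A short reference using PyTorch, which the ttl version is presumably expected to match (the ttl call itself is not shown in this entry):

```python
import torch

x = torch.tensor([[1, 2],
                  [3, 4]])

# Each row is repeated twice, consecutively, along dim 0.
ref = torch.repeat_interleave(x, repeats=2, dim=0)
# tensor([[1, 2],
#         [1, 2],
#         [3, 4],
#         [3, 4]])
```
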
- #4003: fixed the bug with printing the compile-time attributes
- PR: #4918
- Support moreh arange
- PR: #4921
- Remove skip_for_wormhole_b0 for test_moreh_softmax and test_moreh_softmin
- PR: #4924
- #4541: remove unpad start at 0 limitation
- PR: #4566
- Agrebenisan/restart cmd fix
- PR: #4922
- Support moreh SGD
- PR: #4929
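
For reference, this is the standard SGD-with-momentum update a moreh SGD kernel would be expected to implement. The sketch below is plain Python for the math only; the actual kernel interface is not shown in this entry:

```python
def sgd_step(param, grad, velocity, lr=0.1, momentum=0.9, weight_decay=0.0):
    """One standard SGD-with-momentum update (reference math only)."""
    if weight_decay != 0.0:
        grad = grad + weight_decay * param   # L2 regularization term
    velocity = momentum * velocity + grad    # momentum accumulation
    param = param - lr * velocity            # parameter update
    return param, velocity
```
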
- #0: Use fetch-depth: 0 instead of fetch-tags because otherwise git complains of commit SHA/tag conflict
- PR: #4934
- #0: Add code owners for primary operations api binding
- PR: #4936
- #4547: Add 2x2 window unit tests to ttnn maxpool
- PR: #4909
- #4003: restructure ttnn
- PR: #4902
- #4889: Change TileSlice printing to only print tile data
- PR: #4912
- #4836: Add support for blocking conv activation in 2d systolic conv v…
- PR: #4837
- #0: Update unicast cycles lower bound
- PR: #4937
- #4904: Add support for 1d width sharded LN
- PR: #4905
- #4941: Convert command header to struct for easier maintainability
- PR: #4942
- #4823: Enable sum_0 operation that fails with low PCC [Wormhole, Grayskull]
- PR: #4955
- Fix sharded buffers for one core in fast dispatch
- PR: #4944
- #4906: Added global reduce sum, mean, max, and min operations
- PR: #4908
- Revert "#4823: enable sum_0 operation fails with low PCC [Wormhole,GS]
- PR: #4963
- #0: Change codeowners from specific op binding files/dirs to all tt_lib bindings
- PR: #4938
- #4003: split unary sweep into per op sweeps
- PR: #4952
- #4232: Added support for converting numpy arrays to ttnn tensors; borrow data whenever possible when converting from numpy/torch
- PR: #4893
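
Borrowing means the host-side conversion can avoid a copy when the source buffer is already compatible. A minimal sketch of the idea; ttnn.from_torch is the usual entry point, but the exact conditions under which data is borrowed rather than copied are an assumption here:

```python
import numpy as np
import torch
import ttnn  # assumes a tt-metal environment

arr = np.ones((2, 4), dtype=np.float32)

# torch.from_numpy shares memory with the numpy array (no copy).
t = torch.from_numpy(arr)

# Per this entry, the conversion borrows the host data when possible
# instead of copying it.
tt_tensor = ttnn.from_torch(t)
```
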
- Uplift AttnMatmul to support GroupAttnMatmul
- PR: #4913
- Add watcher-specific CI tests
- PR: #4919
- #4916: Add avg pool to ttnn
- PR: #4917
- #0: Add a lock on DPRINT server raise/wait structures
- PR: #4920
- #4967: added validation for input tensors
- PR: #4977
- #4971: Update documentation with a new doc hierarchy
- PR: #4983
- #0: Leftover decorate_operation replacement for avg pool
- PR: #4987
- #4899: fix the permute to operate on the intended shape
- PR: #4951
- #4730: Add tt_lib.tensor.concat
- PR: #4990
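
tt_lib.tensor.concat joins tensors along one dimension, analogous to torch.cat. Reference semantics below; the device-side signature in the comment is an assumption:

```python
import torch

a = torch.zeros(1, 2, 32, 32)
b = torch.ones(1, 2, 32, 32)

ref = torch.cat([a, b], dim=1)   # shape: (1, 4, 32, 32)

# Hypothetical device-side equivalent (signature assumed):
# out = tt_lib.tensor.concat([tt_a, tt_b], dim=1)
```
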
- Aliu/enqueue eth
- PR: #4845
- #4003: Updating functional performance from changes in ttnn.permute w…
- PR: #4991
- #4984: Remove dead OP_INFO and graph interpreter
- PR: #4985
- #4878: initial commit to add Conv parameters to ttnn.preprocess_model_parameters
- PR: #4966
- Update Program Hashes for Ops using Mem config
- PR: #4953
- #4984: Remove unused dprint functionality
- PR: #5000
- Aliu/ci fix
- PR: #5001
- #4215: Add Argmax and Argmin Fallback
- PR: #4928
- #4999: added input tensor validation to add, sub and mul operations.
- PR: #5004
- Support for softmax row-major sharding and causal mask sharding
- PR: #5006
- #0: provide API for where() to support scalar True/False branches
- PR: #4988
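
With scalar branches, where() selects between a constant and a tensor per element instead of requiring two full tensors. PyTorch shows the intended semantics; how the tt_lib API spells this is an assumption:

```python
import torch

cond = torch.tensor([[True, False],
                     [False, True]])
x = torch.full((2, 2), 7.0)

# A scalar stands in for one branch instead of a full tensor.
ref = torch.where(cond, x, 0.0)
# tensor([[7., 0.],
#         [0., 7.]])
```
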
- #5003: Update expected compile and runtimes for perf regression on VM
- PR: #5008
- Revert "Update Program Hashes for Ops using Mem config"
- PR: #5021
- #4931: add apis to get ethernet by socket ids
- PR: #4932
- #4786: Add upsample_nearest2d to functional stable diffusion
- PR: #4870
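
upsample_nearest2d duplicates each input pixel into a scale_factor x scale_factor block. A PyTorch reference for the expected behavior:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[[[1., 2.],
                    [3., 4.]]]])   # NCHW: (1, 1, 2, 2)

up = F.interpolate(x, scale_factor=2, mode="nearest")
# (1, 1, 4, 4): each pixel becomes a 2x2 block:
# [[1, 1, 2, 2],
#  [1, 1, 2, 2],
#  [3, 3, 4, 4],
#  [3, 3, 4, 4]]
```
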
- #4986: deploy docs only to main and enable devs to run docs build on different pages
- PR: #5020
- Deploy ttnn sweeps results to docs
- PR: #5019
- #4958: Move all python api unit tests to frequent in order to reduce SD pipeline length
- PR: #4981
- #4999: Added input validation for ttnn.matmul and ttnn.linear, added a unit test for the linear operation, updated input tensor validation in binary.py, and fixed compute_output_shapes in bmm_op.cpp
- PR: #5010
- #4620: Fix+improve bw test
- PR: #5029
- #4852: Add unit tests for functional bloom
- PR: #5013
- #5032: scalar argument versions for relops
- PR: #5018
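
Scalar relop variants compare every element against a single number rather than a second tensor. Reference semantics via PyTorch; the device binding names are assumptions:

```python
import torch

x = torch.tensor([0.5, 1.0, 1.5])

# Elementwise comparison against a scalar instead of a tensor.
ref = torch.gt(x, 1.0)   # tensor([False, False,  True])
```
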
- #0: Add README recommendations from MCW to clarify access to the internal workflows VM installation page
- PR: #5034
- #4790: Implement GEGLU using ttnn for stable_diffusion model
- PR: #4869
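
GEGLU is the GELU-gated linear unit from "GLU Variants Improve Transformer": the input is projected twice, and one projection gates the other through GELU. A minimal host-side reference sketch (sizes are illustrative; the ttnn implementation is not shown here):

```python
import torch
import torch.nn.functional as F

def geglu(x, w, v):
    """GEGLU(x) = (x @ w) * GELU(x @ v) -- bias terms omitted for brevity."""
    return (x @ w) * F.gelu(x @ v)

x = torch.randn(2, 8)
w = torch.randn(8, 16)
v = torch.randn(8, 16)
out = geglu(x, w, v)   # shape: (2, 16)
```
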
- #4999: Adding validation checks
- PR: #5011
- #4791: Implement Feedforward sub-module using ttnn for stable_diffusi…
- PR: #4868
- Npetrovic/bw ops sweeps
- PR: #5009
- #4999: update documentation of ttnn operations to include the validation schema
- PR: #5031
- #0: Remove model run from frequent_api_pipeline per @tt-rkim
- PR: #5043
- Minor dprint/watcher cleanup
- PR: #5030
- #4858: Add support for typecast
- PR: #4840
- #0: Disable dprint tests because they're flaky at the moment
- PR: #5026
- #4946: Add trig ops to ttnn
- PR: #5041
- Nshanker/convs split by 2
- PR: #5042
- #4946: Add inv trig ops to ttnn
- PR: #5038
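
The #4946 entries add forward and inverse trigonometric ops to ttnn. Reference semantics via PyTorch; the exact set of ttnn names added is not listed in this changelog:

```python
import torch

x = torch.tensor([0.0, 0.5, 1.0])

torch.sin(x)    # forward trig
torch.atan(x)   # inverse trig, defined everywhere
torch.asin(x)   # inverse trig; inputs must lie in [-1, 1]
```
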
- #4003: fixed circular dependency in decorators
- PR: #5052
- #5054: Removed asserts from conv op host code that are not required. …
- PR: #5055
- #4003: fixed circular dependencies in ttnn
- PR: #5061
- #4852: Fix CI pipeline by re-enabling functional bloom for causal LM
- PR: #5060
- GroupNorm sharded support
- PR: #4945
- #4972: Decouple is_sharded and memory_config from tensor
- PR: #4980
- #0: Eltwise ops/activation operator tracking for GS and WHB0
- PR: #5074
- Aliu/fd tunneling pr
- PR: #4725
- #4642: Converted 14 old C++ tests to use gtest, with the ability to switch between FD/SD where possible
- PR: #5050
- #4852: Add tests for functional ttnn bloom implementation.
- PR: #5078
- #4003: correctly convert all parameters of torch module to ttnn parameters
- PR: #5100
- #5082: Pow gradient calculation method differs from PyTorch
- PR: #5106
- Argmax/Argmin support for channel, batch, and all dimensions
- PR: #5040
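
Argmax/argmin over a specific dimension return the index of the extremal element along that dimension; without a dimension they reduce over all elements. Reference semantics:

```python
import torch

x = torch.tensor([[1., 9., 3.],
                  [7., 2., 8.]])

torch.argmax(x)          # tensor(1): flattened index of the max (9.)
torch.argmax(x, dim=0)   # tensor([1, 0, 1]): per-column (batch-style) reduce
torch.argmin(x, dim=1)   # tensor([0, 1]): per-row (channel-style) reduce
```
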
- #4420: switch to shared_ptr
- PR: #5123
- #4420: return shared_future from taskflow async wrapper
- PR: #5121
- Minor DPrint fixes
- PR: #5108
- #0: Enable/disable clearing L1 from env var
- PR: #5107
- #4003: Started moving ttnn operations to C++
- PR: #5111
- #4003: Add script to help with finding issues that we need approval for
- PR: #5129
- #5044: Adding support for optional output tensors
- PR: #5104
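
Optional output tensors follow the familiar out= pattern: the caller may preallocate the destination, and the op writes into it instead of allocating a new tensor. PyTorch reference for the pattern; how the tt ops expose it is an assumption:

```python
import torch

a = torch.ones(2, 2)
b = torch.ones(2, 2)

out = torch.empty(2, 2)
torch.add(a, b, out=out)   # result written into the preallocated buffer
```
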
- #4003: Adding the open flag to show only open PRs
- PR: #5134
- #5048: Add CreateDevices and CloseDevices APIs to detail
- PR: #5118
- decouple ClearProgramCache from CommandQueue
- PR: #5124
- Conv fixes for padding input channels. Shallow conv fixes. Conv input/output autoformatting. Cleanup
- PR: #5109
- Asarje/mp unpack tilize fused
- PR: #5033
- Update CreateBuffer to return shared_ptr, and Enqueue R/W buffer to accept std::shared_ptr
- PR: #5125
- #5137: Cleanups for newer Linux distro / toolchains
- PR: #5114
- Revert "#5137: Cleanups for newer Linux distro / toolchains"
- PR: #5139
- Revert "Update CreateBuffer to return shared_ptr, and Enqueue R/W buffer to accept std::shared_ptr"
- PR: #5138
- #4793: Implement ResnetBlock2D using ttnn for stable_diffusion model
- PR: #5084
- #4788: Implement Downsample2D using ttnn for stable_diffusion model
- PR: #5090
- #4792: Implement CrossAttention sub-module using ttnn for stable_diff…
- PR: #4927
- #4747: Reduce amount of samples in bert sweeps
- PR: #5140
- #4789: Add upsample2d to functional_stable_diffusion model
- PR: #5080
- #0: Add fix for lamb optimizer
- PR: #5144
- #5057: Add relational ops support to TTNN
- PR: #5120
- skip eth test suite on GS
- PR: #5155
- #4003: Updated ttnn.Tensor to be derived from ttl.tensor.Tensor
- PR: #5130
- Asarje/shwetank upsample
- PR: #5105
- #5082: Power gradient is erroneous when the exponent is in the range (0, 1)
- PR: #5158
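
Context for the two #5082 pow-gradient entries: for y = x^p the gradient is dy/dx = p * x^(p-1). When 0 < p < 1 the exponent p - 1 is negative, so the gradient grows without bound as x approaches 0 (for p = 0.5 it is 0.5 / sqrt(x)), a classic source of inf/NaN mismatches against PyTorch:

```python
import torch

p = 0.5
x = torch.tensor([1e-6, 0.25, 1.0], requires_grad=True)

torch.pow(x, p).sum().backward()

# Analytic gradient p * x**(p - 1) blows up near zero for 0 < p < 1:
print(x.grad)   # ~tensor([500.0000, 1.0000, 0.5000])
```
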