v0.50.0
📦 Uncategorized
- Fix issue with Mamba SSM A weight preprocessing
- PR: #9443
- Make build key unique for mmio and remote devices with same harvest mask
- PR: #9435
- #5337: Removed eth_dispatch yaml flag from mistral tests
- PR: #9421
- New workflow for custom test dispatch on CI runners
- PR: #9536
- #9312: Add single-header boost-ext/reflect library as dependency
- PR: #9328
- Opt LayerNorm/RMSNorm with 2D reduce
- PR: #9603
- Revert "#8630: support uint8 data type"
- PR: #9649
- #0: Fix codeowners for metal bert
- PR: #9635
- Revert "Revert "#8630: support uint8 data type""
- PR: #9651
- #9642: fix matmul2d in1 sharded with batch>1
- PR: #9655
- #0: add tile layout support for GN
- PR: #9645
- FD2 packed binary commands
- PR: #9572
- #9082: t3k demo with slack notifications for owners. split jobs
- PR: #9625
- Rtawfik/issue 9142
- PR: #9674
- #9688: Remove redundant left shift in DEBUG_SANITIZE_NOC_READ_TRANSACTION_FROM_STATE
- PR: #9689
- #9500: Update eth_interface include in tt_cluster to not be hardcoded for WH
- PR: #9501
- #9578: Add WITH_PYTHON_BINDINGS option to allow build w/o python
- PR: #9662
- #9587: Update CB and worker Go signals to respect max sub cmd limit introduced by dispatch packed write local copy change
- PR: #9670
- Add support for bfloat4 weights in Mamba
- PR: #8869
- Use in-place binary operations in Mamba block
- PR: #9726
- #5337: Relaxed Mistral expected compilation time in CI by 1 sec
- PR: #9731
- Mo/9406 profiler build flags
- PR: #9549
- Add support for single col/row/core output grid for matmul 2D
- PR: #9683
- #9725: Set release candidate releases on GitHub to pre-release, not draft, to enable downstream users
- PR: #9729
- add tagged docker image with releases
- PR: #9693
- Rtawfik/issue 9164
- PR: #9700
- #5562: resolve reduce scatter issues (nd hang and correctness)
- PR: #9423
- Create benchmarking tools for saving run/measurement data (with Falcon7b example) and model-demo utilities for verifying tokens/perf
- PR: #9659
- #0: Fix bug with var name in single-chip falcon7b demo tests
- PR: #9740
- #9735: fix issues with including reflect library
- PR: #9737
- #9527: Remove usage of bcast where multiply is used
- PR: #9717
- Mchiou/9082 slack notification owners
- PR: #9690
- #9681: set name attribute for ttnn operations when fast runtime m…
- PR: #9730
- #9553: Add prefix scan op for Mamba prefill
- PR: #9554
- #9628: Merge Binary backward ops from tt_eager to TTNN
- PR: #9570
- Namhyeong kim/support fp32 dest acc in moreh adam
- PR: #9135
- #0: Update t3k workflow timeouts (except freq pipeline)
- PR: #9772
- Temporarily update Mixtral perf times to pass CI
- PR: #9673
- #9479: fix cpu core worker bug
- PR: #9739
- #4858: add typecast fp32 <-> int32
- PR: #9736
- #0: ViT demo fix
- PR: #9768
- #9389: Add support for integer type in sum operation
- PR: #9548
- Transfer llama2/3 from experimental to demo folder.
- PR: #9716
- #9657: add topk multicore to support larger dimension sizes
- PR: #9718
- #4858: add typecast bfp8_b
- PR: #9779
- #9082: t3k model perf split tests with slack notifications, disabled cnn
- PR: #9761
- #0: Add ttnn/cpp to packages to enable using ttnn kernels in tt_eager ops
- PR: #9784
- #9741: Set stricter pytest timeouts
- PR: #9742
- #9492: Change models matmul usage to ttnn
- PR: #9727
- #9778: test prefetcher hanging with changes to test
- PR: #9795
- #9490: TTNN eltwise/unary migration
- PR: #9732
- Update timeout for falcon40b t3k demo test
- PR: #9777
- #0: Remove extra t3k falcon40b matrix test group
- PR: #9802
- #9044: Move dispatch core x y to be part of launch msg
- PR: #9743
- Modify rot mat each iteration to avoid allocating 10k tensors upfront
- PR: #9809
- Optimize bcast sharded op
- PR: #9822
- Start using reflect library
- PR: #9780
- #0: Properly delete source folders for wheel testing
- PR: #9829
- #9479: Update Mixtral perf estimates
- PR: #9803
- #0: Added github community issue workflow
- PR: #9833
- #8729: Pytest multiprocess reset infrastructure
- PR: #9677
- Enable switching between 1 and 2 cqs in the same process
- PR: #9832
- Fixed failing tests for SD Conv tests for WH using new conv
- PR: #9799
- #0: Switch org-membership check to an authenticated call
- PR: #9840
- #0: Decrease num loops in trace stress tests
- PR: #9724
- #9628: Support optional return tensor
- PR: #9769
- #0: Use CV to wait for cq_reader in production mode. Remove enqueue_record_event for NB calls
- PR: #9793
- #9628: Merge second set of binary backward op from tt_eager to TTNN
- PR: #9771
- #0: Bump bert compile time threshold since it's been intermittently failing on ci
- PR: #9844
- Mchiou/9792 t3k runner management
- PR: #9847
- #0: Bump up Bert inference time due to instability on ci
- PR: #9850
- #8865: For host dispatch time measuring increase failing reference t…
- PR: #9438
- #9484: Add output_tensor queue_id to dependency ops
- PR: #9494
- Adding the new op: Flash Decode!
- PR: #9794
- #0: Add missing permissions to issue notification job
- PR: #9863
- #9275: Fix Falcon7b demo failing to run by default on a Grayskull e75
- PR: #9859
- #9801: Account for 64B BH PCIe alignment in cq cmd sizing
- PR: #9862
- #0: Make prefetcher early exit after fetching/reading exec_buf
- PR: #9856
- #8683: Add Unary bitwise AND, OR
- PR: #9437
- Ngrujic/profiling
- PR: #9875
- #9628: Merge third set of binary backward op from tt_eager to TTNN
- PR: #9846
- #4858: add typecast uint32
- PR: #9843
- Migrate Pad Host Code, Bindings, C++ Usages from TT Eager to TTNN
- PR: #9816
- Support longer sequence lengths in ssm_prefix_scan
- PR: #9776
- #9709: Add optional transpose_a and transpose_b to ttnn matmul and linear
- PR: #9836
- #0: Only run batch 12 bert for GS profiling and tighten some bert/resnet thresholds
- PR: #9851
- Asarje/resnet highres 20240624
- PR: #9660
- #9492: replace falcon specific matmul calls
- PR: #9810
- Extend ssm_eltwise_mul for num_users > 32
- PR: #9867
- Update documentation for adding new ttnn operation
- PR: #9841
- Extend ssm_1d_reduce for the batch>32
- PR: #9881
- #0: rn50 fix add api
- PR: #9890
- #9123: Add support for optional output tensors to run in the worker t…
- PR: #9894
- #9861: support check_tensor helper_function
- PR: #9869
- Fix syntax issues in custom test dispatch workflow
- PR: #9567
- Add Mixtral accuracy tests and cleanup its other tests (CI-friendly)
- PR: #9864
- #9876: Increase timeout on falcon7b perplexity tests.
- PR: #9880
- #9492: Remove bmm/resnet_matmul from models
- PR: #9896
- #9410: enable fp32 precision unpacking for interm. CBs
- PR: #9885
- #9903: Fix conditional statements and indexing of y values in CoreRange::diff
- PR: #9915
- #9860: fix test create device apis
- PR: #9919
- #0: delete unused code
- PR: #9921
- #9719: fixed l1 clear issue on nlp create qkv heads decode test case
- PR: #9924
- Fixing typo in llama demo readme
- PR: #9927
- #9892: Device only op report
- PR: #9914
- #8704: define consts for registers that hold x-y coordinates and amount to shift address to get x-y coord
- PR: #9897
- CODEOWNERS update
- PR: #9930
- Abhullar/bh misc fix
- PR: #9899
- Auto-register C++ ttnn operations in python
- PR: #9900
- #9788: Remove TopK from TTLib and replace all references with the TTNN api
- PR: #9884
- #0: add owners for resnet demo
- PR: #9937
- 7-way split of eager tests
- PR: #9950
- #9910: Improve Softplus kernel accuracy
- PR: #9893
- #9818: Add cache check to op info V2
- PR: #9826
- #0: update noc test bound
- PR: #9922
- Fix branching bug in softplus kernel
- PR: #9955
- propagate error upwards for tests in falcon 40b suite
- PR: #9957
- #0: Fix falcon40b softmax import failure
- PR: #9958
- #9755: move ttnn.concat to match the new file structure
- PR: #9923
- #9837: Assign workers after performing ref count cleanup in async mode
- PR: #9944
- #0: Make event_synchronize API safer
- PR: #9965
- #0: Update buffer asserts to account for trace buffers
- PR: #9918
- Clean up ttnn operation registration on python side
- PR: #9961
- #9164: [Blackhole bringup] Add fix for unpack untilize
- PR: #9967
- Aliu/no l1 clear
- PR: #9931
- Restructure ttnn::permute to match the new standard format
- PR: #9917
- #9815: Update host to pass packed write max unicast sub cmds to cq dispatch
- PR: #9868
- Distributed layernorm op
- PR: #9382
- #9831: re-enable test
- PR: #9976
- #8835: cleaned up ttnn operation registration on C++ side
- PR: #9975
- #9941: update dram/l1 to noc xy header to do the appropriate shift
- PR: #9948
- #9336: Refactoring moreh layernorm
- PR: #9636
- #9745: move unpad to slice ttnn cpp references
- PR: #9970
- #9980: Update falcon updated outputs
- PR: #9981
- Fix Main after Pad Merge
- PR: #9988
- Update eltwise bcast unary ops to use memory_config and fix PCC issue for interleaved output
- PR: #9939
- Update FD cmds to be PCIe aligned
- PR: #9929
- Fix N150 product name to nebula_x1 even if it's unharvested.
- PR: #9925
- #0: add a second codeowner for conv
- PR: #9990
- #0: Get tt-metal to compile with gcc-12
- PR: #9943
- #9492: Change to ttnn matmul in tests and tt_eager
- PR: #9928
- #9441: add typecast uint16->uint32
- PR: #9991
- Move ttnn::embedding to match new pybind structure and replace C++ ttlib embeddings usage with it
- PR: #9969
- #0: fix corerangeset for semaphore and CB to use good ranges
- PR: #9997
- #9490: Migrate unary ops to TTNN
- PR: #9916
- Moving Device Side Code for Unpad from TT Lib to TTNN
- PR: #9972
- #9871: Merge ternary backward ops to TTNN
- PR: #9904
- #0: Fix maybe uninitialized warnings
- PR: #9998
- #9759: Move UpSample to ttnn
- PR: #9879
- #9978: Refactoring moreh_logsoftmax for support of large input value
- PR: #10001
- Ngrujic/profiling
- PR: #9954
- #9971: Support time sharding in ssm_prefix_scan op
- PR: #9960
- Combined prefill decode for reference model
- PR: #9989
- Move Softmax to ttnn
- PR: #9820
- #9767: updated tt::stl::reflection library to print structs using boost reflect
- PR: #9994
- #9767: removed attributes method as it's no longer needed because of reflect library
- PR: #8758
- #0: Add Moreh representatives to CODEOWNERS
- PR: #10017
- use unpadded tensor for l4m1 on wormhole to fix PCC on WHB0 for B16
- PR: #9936
- #0: Skip RN50 large tests on GS/WH for certain shapes
- PR: #9942
- #6430: Fix reset-based hangs for WH
- PR: #9766
- #9849: Move checks on batch dims for matmul to validate
- PR: #10013
- #9492: move matmul code to ttnn directory hierarchy
- PR: #10015