Release v0.50.0 · tenstorrent/tt-metal

📦 Uncategorized

Fix issue with Mamba SSM A weight preprocessing
- PR: #9443
Make build key unique for mmio and remote devices with same harvest mask
- PR: #9435
#5337: Removed eth_dispatch yaml flag from mistral tests
- PR: #9421
New workflow for custom test dispatch on CI runners
- PR: #9536
#9312: Add single-header boost-ext/reflect library as dependency
- PR: #9328
Opt LayerNorm/RMSNorm with 2D reduce
- PR: #9603
Revert "#8630: support uint8 data type"
- PR: #9649
#0: Fix codeowners for metal bert
- PR: #9635
Revert "Revert "#8630: support uint8 data type""
- PR: #9651
#9642: fix matmul2d in1 sharded with batch>1
- PR: #9655
#0: add tile layout support for GN
- PR: #9645
FD2 packed binary commands
- PR: #9572
#9082: t3k demo with slack notifications for owners. split jobs
- PR: #9625
Rtawfik/issue 9142
- PR: #9674
#9688: Remove redundant left shift in DEBUG_SANITIZE_NOC_READ_TRANSACTION_FROM_STATE
- PR: #9689
#9500: Update eth_interface include in tt_cluster to not be hardcoded for WH
- PR: #9501
#9578: Add WITH_PYTHON_BINDINGS option to allow build w/o python
- PR: #9662
#9587: Update CB and worker Go signals to respect max sub cmd limit introduced by dispatch packed write local copy change
- PR: #9670
Add support for bfloat4 weights in Mamba
- PR: #8869
Use in-place binary operations in Mamba block
- PR: #9726
#5337: Relaxed Mistral expected compilation time in CI by 1 sec
- PR: #9731
Mo/9406 profiler build flags
- PR: #9549
Add support for single col/row/core output grid for matmul 2D
- PR: #9683
#9725: Set release candidate releases on GitHub to pre-release, not draft, to enable downstream users
- PR: #9729
add tagged docker image with releases
- PR: #9693
Rtawfik/issue 9164
- PR: #9700
#5562: resolve reduce scatter issues (nd hang and correctness)
- PR: #9423
Create benchmarking tools for saving run/measurement data (with Falcon7b example) and model-demo utilities for verifying tokens/perf
- PR: #9659
#0: Fix bug with var name in single-chip falcon7b demo tests
- PR: #9740
#9735: fix issues with including reflect library
- PR: #9737
#9527: Remove usage of bcast where multiply is used
- PR: #9717
Mchiou/9082 slack notification owners
- PR: #9690
#9681: set name attribute for ttnn operations when fast runtime m…
- PR: #9730
#9553: Add prefix scan op for Mamba prefill
- PR: #9554
#9628: Merge Binary backward ops from tt_eager to TTNN
- PR: #9570
Namhyeong kim/support fp32 dest acc in moreh adam
- PR: #9135
#0: Update t3k workflow timeouts (except freq pipeline)
- PR: #9772
Temporarily update Mixtral perf times to pass CI
- PR: #9673
#9479: fix cpu core worker bug
- PR: #9739
#4858: add typecast fp32 <-> int32
- PR: #9736
#0: ViT demo fix
- PR: #9768
#9389: Add support for integer type in sum operation
- PR: #9548
Transfer llama2/3 from experimental to demo folder.
- PR: #9716
#9657: add topk multicore to support larger dimension sizes
- PR: #9718
#4858: add typecast bfp8_b
- PR: #9779
#9082: t3k model perf split tests with slack notifications, disabled cnn
- PR: #9761
#0: Add ttnn/cpp to packages to enable using ttnn kernels in tt_eager ops
- PR: #9784
#9741: Set stricter pytest timeouts
- PR: #9742
#9492: Change models matmul usage to ttnn
- PR: #9727
#9778: test prefetcher hanging with changes to test
- PR: #9795
#9490: TTNN eltwise/unary migration
- PR: #9732
Update timeout for falcon40b t3k demo test
- PR: #9777
#0: Remove extra t3k falcon40b matrix test group
- PR: #9802
#9044: Move dispatch core x y to be part of launch msg
- PR: #9743
Modify rot mat each iteration to avoid allocating 10k tensors upfront
- PR: #9809
Optimize bcast sharded op
- PR: #9822
Start using reflect library
- PR: #9780
#0: Properly delete source folders for wheel testing
- PR: #9829
#9479: Update Mixtral perf estimates
- PR: #9803
#0: Added github community issue workflow
- PR: #9833
#8729: Pytest multiprocess reset infrastructure
- PR: #9677
Enable switching between 1 and 2 cqs in the same process
- PR: #9832
Fixed failing tests for SD Conv tests for WH using new conv
- PR: #9799
#0: Switch org-membership check to an authenticated call
- PR: #9840
#0: Decrease num loops in trace stress tests
- PR: #9724
#9628: Support optional return tensor
- PR: #9769
#0: Use CV to wait for cq_reader in production mode. Remove enqueue_record_event for NB calls
- PR: #9793
#9628: Merge second set of binary backward op from tt_eager to TTNN
- PR: #9771
#0: Bump bert compile time threshold since it's been intermittently failing on ci
- PR: #9844
Mchiou/9792 t3k runner management
- PR: #9847
#0: Bump up Bert inference time due to instability on ci
- PR: #9850
#8865: For host dispatch time measuring increase failing reference t…
- PR: #9438
#9484: Add output_tensor queue_id to dependency ops
- PR: #9494
Adding the new op: Flash Decode!
- PR: #9794
#0: Add missing permissions to issue notification job
- PR: #9863
#9275: Fix Falcon7b demo failing to run by default on a Grayskull e75
- PR: #9859
#9801: Account for 64B BH PCIe alignment in cq cmd sizing
- PR: #9862
#0: Make prefetcher early exit after fetching/reading exec_buf
- PR: #9856
#8683: Add Unary bitwise AND, OR
- PR: #9437
Ngrujic/profiling
- PR: #9875
#9628: Merge third set of binary backward op from tt_eager to TTNN
- PR: #9846
#4858: add typecast uint32
- PR: #9843
Migrate Pad Host Code, Bindings, C++ Usages from TT Eager to TTNN
- PR: #9816
Support longer sequence lengths in ssm_prefix_scan
- PR: #9776
#9709: Add optional transpose_a and transpose_b to ttnn matmul and linear
- PR: #9836
#0: Only run batch 12 bert for GS profiling and tighten some bert/resnet thresholds
- PR: #9851
Asarje/resnet highres 20240624
- PR: #9660
#9492: replace falcon specific matmul calls
- PR: #9810
Extend ssm_eltwise_mul for num_users > 32
- PR: #9867
Update documentation for adding new ttnn operation
- PR: #9841
Extend ssm_1d_reduce for the batch>32
- PR: #9881
#0: rn50 fix add api
- PR: #9890
#9123: Add support for optional output tensors to run in the worker t…
- PR: #9894
#9861: support check_tensor helper_function
- PR: #9869
Fix syntax issues in custom test dispatch workflow
- PR: #9567
Add Mixtral accuracy tests and cleanup its other tests (CI-friendly)
- PR: #9864
#9876: Increase timeout on falcon7b perplexity tests.
- PR: #9880
#9492: Remove bmm/resnet_matmul from models
- PR: #9896
#9410: enable fp32 precision unpacking for interm. CBs
- PR: #9885
#9903: Fix conditional statements and indexing of y values in CoreRange::diff
- PR: #9915
#9860: fix test create device apis
- PR: #9919
#0: delete unused code
- PR: #9921
#9719: fixed l1 clear issue on nlp create qkv heads decode test case
- PR: #9924
Fixing typo in llama demo readme
- PR: #9927
#9892: Device only op report
- PR: #9914
#8704: define consts for registers that hold x-y coordinates and amount to shift address to get x-y coord
- PR: #9897
CODEOWNERS update
- PR: #9930
Abhullar/bh misc fix
- PR: #9899
Auto-register C++ ttnn operations in python
- PR: #9900
#9788: Remove TopK from TTLib and replace all references with the TTNN api
- PR: #9884
#0: add owners for resnet demo
- PR: #9937
7-way split of eager tests
- PR: #9950
#9910: Improve Softplus kernel accuracy
- PR: #9893
#9818: Add cache check to op info V2
- PR: #9826
#0: update noc test bound
- PR: #9922
Fix branching bug in softplus kernel
- PR: #9955
propagate error upwards for tests in falcon 40b suite
- PR: #9957
#0: Fix falcon40b softmax import failure
- PR: #9958
#9755: move ttnn.concat to match the new file structure
- PR: #9923
#9837: Assign workers after performing ref count cleanup in async mode
- PR: #9944
#0: Make event_synchronize API safer
- PR: #9965
#0: Update buffer asserts to account for trace buffers
- PR: #9918
Clean up ttnn operation registration on python side
- PR: #9961
#9164: [Blackhole bringup] Add fix for unpack untilize
- PR: #9967
Aliu/no l1 clear
- PR: #9931
Restructure ttnn::permute to match the new standard format
- PR: #9917
#9815: Update host to pass packed write max unicast sub cmds to cq dispatch
- PR: #9868
Distributed layernorm op
- PR: #9382
#9831: re-enable test
- PR: #9976
#8835: cleaned up ttnn operation registration on C++ side
- PR: #9975
#9941: update dram/l1 to noc xy header to do the appropriate shift
- PR: #9948
#9336: Refactoring moreh layernorm
- PR: #9636
#9745: move unpad to slice ttnn cpp references
- PR: #9970
#9980: Update falcon updated outputs
- PR: #9981
Fix Main after Pad Merge
- PR: #9988
Update eltwise bcast unary ops to use memory_config and fix PCC issue for interleaved output
- PR: #9939
Update FD cmds to be PCIe aligned
- PR: #9929
Fix N150 product name to nebula_x1 even if it's unharvested.
- PR: #9925
#0: add a second codeowner for conv
- PR: #9990
#0: Get tt-metal to compile with gcc-12
- PR: #9943
#9492: Change to ttnn matmul in tests and tt_eager
- PR: #9928
#9441: add typecast uint16->uint32
- PR: #9991
Move ttnn::embedding to match new pybind structure and replace C++ ttlib embeddings usage with it
- PR: #9969
#0: fix corerangeset for semaphore and CB to use good ranges
- PR: #9997
#9490: Migrate unary ops to TTNN
- PR: #9916
Moving Device Side Code for Unpad from TT Lib to TTNN
- PR: #9972
#9871: Merge ternary backward ops to TTNN
- PR: #9904
#0: Fix maybe uninitialized warnings
- PR: #9998
#9759: Move UpSample to ttnn
- PR: #9879
#9978: Refactoring moreh_logsoftmax for support of large input value
- PR: #10001
Ngrujic/profiling
- PR: #9954
#9971: Support time sharding in ssm_prefix_scan op
- PR: #9960
Combined prefill decode for reference model
- PR: #9989
Move Softmax to ttnn
- PR: #9820
#9767: updated tt::stl::reflection library to print structs using boost reflect
- PR: #9994
#9767: removed attributes method as it's no longer needed because of reflect library
- PR: #8758
#0: Add Moreh representatives to CODEOWNERS
- PR: #10017
use unpadded tensor for l4m1 on wormhole to fix PCC on WHB0 for B16
- PR: #9936
#0: Skip RN50 large tests on GS/WH for certain shapes
- PR: #9942
#6430: Fix reset-based hangs for WH
- PR: #9766
#9849: Move checks on batch dims for matmul to validate
- PR: #10013
#9492: move matmul code to ttnn directory hierarchy
- PR: #10015
