
v0.50.0

Released by @github-actions on 10 Jul 22:04 · f7c10a2

📦 Uncategorized

  • Fix issue with Mamba SSM A weight preprocessing
  • Make build key unique for mmio and remote devices with same harvest mask
  • #5337: Removed eth_dispatch yaml flag from mistral tests
  • New workflow for custom test dispatch on CI runners
  • #9312: Add single-header boost-ext/reflect library as dependency
  • Opt LayerNorm/RMSNorm with 2D reduce
  • Revert "#8630: support uint8 data type"
  • #0: Fix codeowners for metal bert
  • Revert "Revert "#8630: support uint8 data type""
  • #9642: fix matmul2d in1 sharded with batch>1
  • #0: add tile layout support for GN
  • FD2 packed binary commands
  • #9082: t3k demo with slack notifications for owners. split jobs
  • Rtawfik/issue 9142
  • #9688: Remove redundant left shift in DEBUG_SANITIZE_NOC_READ_TRANSACTION_FROM_STATE
  • #9500: Update eth_interface include in tt_cluster to not be hardcoded for WH
  • #9578: Add WITH_PYTHON_BINDINGS option to allow build w/o python
  • #9587: Update CB and worker Go signals to respect max sub cmd limit introduced by dispatch packed write local copy change
  • Add support for bfloat4 weights in Mamba
  • Use in-place binary operations in Mamba block
  • #5337: Relaxed Mistral expected compilation time in CI by 1 sec
  • Mo/9406 profiler build flags
  • Add support for single col/row/core output grid for matmul 2D
  • #9725: Set release candidate releases on GitHub to pre-release, not draft, to enable downstream users
  • add tagged docker image with releases
  • Rtawfik/issue 9164
  • #5562: resolve reduce scatter issues (nd hang and correctness)
  • Create benchmarking tools for saving run/measurement data (with Falcon7b example) and model-demo utilities for verifying tokens/perf
  • #0: Fix bug with var name in single-chip falcon7b demo tests
  • #9735: fix issues with including reflect library
  • #9527: Remove usage of bcast where multiply is used
  • Mchiou/9082 slack notification owners
  • #9681: set name attribute for ttnn operations when fast runtime m…
  • #9553: Add prefix scan op for Mamba prefill
  • #9628: Merge Binary backward ops from tt_eager to TTNN
  • Namhyeong kim/support fp32 dest acc in moreh adam
  • #0: Update t3k workflow timeouts (except freq pipeline)
  • Temporarily update Mixtral perf times to pass CI
  • #9479: fix cpu core worker bug
  • #4858: add typecast fp32 <-> int32
  • #0: ViT demo fix
  • #9389: Add support for integer type in sum operation
  • Transfer llama2/3 from experimental to demo folder.
  • #9657: add topk multicore to support larger dimension sizes
  • #4858: add typecast bfp8_b
  • #9082: t3k model perf split tests with slack notifications, disabled cnn
  • #0: Add ttnn/cpp to packages to enable using ttnn kernels in tt_eager ops
  • #9741: Set stricter pytest timeouts
  • #9492: Change models matmul usage to ttnn
  • #9778: test prefetcher hanging with changes to test
  • #9490: TTNN eltwise/unary migration
  • Update timeout for falcon40b t3k demo test
  • #0: Remove extra t3k falcon40b matrix test group
  • #9044: Move dispatch core x y to be part of launch msg
  • Modify rot mat each iteration to avoid allocating 10k tensors upfront
  • Optimize bcast sharded op
  • Start using reflect library
  • #0: Properly delete source folders for wheel testing
  • #9479: Update Mixtral perf estimates
  • #0: Added github community issue workflow
  • #8729: Pytest multiprocess reset infrastructure
  • Enable switching between 1 and 2 cqs in the same process
  • Fixed failing tests for SD Conv tests for WH using new conv
  • #0: Switch org-membership check to an authenticated call
  • #0: Decrease num loops in trace stress tests
  • #9628: Support optional return tensor
  • #0: Use CV to wait for cq_reader in production mode. Remove enqueue_record_event for NB calls
  • #9628: Merge second set of binary backward op from tt_eager to TTNN
  • #0: Bump bert compile time threshold since it's been intermittently failing on ci
  • Mchiou/9792 t3k runner management
  • #0: Bump up Bert inference time due to instability on ci
  • #8865: For host dispatch time measuring increase failing reference t…
  • #9484: Add output_tensor queue_id to dependency ops
  • Adding the new op: Flash Decode!
  • #0: Add missing permissions to issue notification job
  • #9275: Fix Falcon7b demo failing to run by default on a Grayskull e75
  • #9801: Account for 64B BH PCIe alignment in cq cmd sizing
  • #0: Make prefetcher early exit after fetching/reading exec_buf
  • #8683: Add Unary bitwise AND, OR
  • Ngrujic/profiling
  • #9628: Merge third set of binary backward op from tt_eager to TTNN
  • #4858: add typecast uint32
  • Migrate Pad Host Code, Bindings, C++ Usages from TT Eager to TTNN
  • Support longer sequence lengths in ssm_prefix_scan
  • #9709: Add optional transpose_a and transpose_b to ttnn matmul and linear (see the usage sketch after this list)
  • #0: Only run batch 12 bert for GS profiling and tighten some bert/resnet thresholds
  • Asarje/resnet highres 20240624
  • #9492: replace falcon specific matmul calls
  • Extend ssm_eltwise_mul for num_users > 32
  • Update documentation for adding new ttnn operation
  • Extend ssm_1d_reduce for the batch>32
  • #0: rn50 fix add api
  • #9123: Add support for optional output tensors to run in the worker t…
  • #9861: support check_tensor helper_function
  • Fix syntax issues in custom test dispatch workflow
  • Add Mixtral accuracy tests and cleanup its other tests (CI-friendly)
  • #9876: Increase timeout on falcon7b perplexity tests.
  • #9492: Remove bmm/resnet_matmul from models
  • #9410: enable fp32 precision unpacking for intermediate CBs
  • #9903: Fix conditional statements and indexing of y values in CoreRange::diff
  • #9860: fix test create device apis
  • #0: delete unused code
  • #9719: fixed l1 clear issue on nlp create qkv heads decode test case
  • Fixing typo in llama demo readme
  • #9892: Device only op report
  • #8704: define consts for registers that hold x-y coordinates and amount to shift address to get x-y coord
  • CODEOWNERS update
  • Abhullar/bh misc fix
  • Auto-register C++ ttnn operations in python
  • #9788: Remove TopK from TTLib and replace all references with the TTNN api
  • #0: add owners for resnet demo
  • 7-way split of eager tests
  • #9910: Improve Softplus kernel accuracy
  • #9818: Add cache check to op info V2
  • #0: update noc test bound
  • Fix branching bug in softplus kernel
  • propagate error upwards for tests in falcon 40b suite
  • #0: Fix falcon40b softmax import failure
  • #9755: move ttnn.concat to match the new file structure
  • #9837: Assign workers after performing ref count cleanup in async mode
  • #0: Make event_synchronize API safer
  • #0: Update buffer asserts to account for trace buffers
  • Clean up ttnn operation registration on python side
  • #9164: [Blackhole bringup] Add fix for unpack untilize
  • Aliu/no l1 clear
  • Restructure ttnn::permute to match the new standard format
  • #9815: Update host to pass packed write max unicast sub cmds to cq dispatch
  • Distributed layernorm op
  • #9831: re-enable test
  • #8835: cleaned up ttnn operation registration on C++ side
  • #9941: update dram/l1 to noc xy header to do the appropriate shift
  • #9336: Refactoring moreh layernorm
  • #9745: move unpad to slice ttnn cpp references
  • #9980: Update falcon updated outputs
  • Fix Main after Pad Merge
  • Update eltwise bcast unary ops to use memory_config and fix PCC issue for interleaved output
  • Update FD cmds to be PCIe aligned
  • Fix N150 product name to nebula_x1 even if it's unharvested.
  • #0: add a second codeowner for conv
  • #0: Get tt-metal to compile with gcc-12
  • #9492: Change to ttnn matmul in tests and tt_eager
  • #9441: add typecast uint16->uint32
  • Move ttnn::embedding to match new pybind structure and replace C++ ttlib embeddings usage with it
  • #0: fix corerangeset for semaphore and CB to use good ranges
  • #9490: Migrate unary ops to TTNN
  • Moving Device Side Code for Unpad from TT Lib to TTNN
  • #9871: Merge ternary backward ops to TTNN
  • #0: Fix maybe uninitialized warnings
  • #9759: Move UpSample to ttnn
  • #9978: Refactoring moreh_logsoftmax to support large input values
  • Ngrujic/profiling
  • #9971: Support time sharding in ssm_prefix_scan op
  • Combined prefill decode for reference model
  • Move Softmax to ttnn
  • #9767: updated tt::stl::reflection library to print structs using boost reflect
  • #9767: removed attributes method as it's no longer needed because of reflect library
  • #0: Add Moreh representatives to CODEOWNERS
  • use unpadded tensor for l4m1 on wormhole to fix PCC on WHB0 for B16
  • #0: Skip RN50 large tests on GS/WH for certain shapes
  • #6430: Fix reset-based hangs for WH
  • #9849: Move checks on batch dims for matmul to validate
  • #9492: move matmul code to ttnn directory hierarchy
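
The transpose flags mentioned in the #9709 entry above can be exercised directly from Python. The following is a minimal sketch, not canonical documentation: it assumes the keyword arguments are named exactly as in the PR title (`transpose_a`, `transpose_b`) and that a supported device is attached; verify defaults and behaviour against the `ttnn.matmul` / `ttnn.linear` docs for this release.

```python
# Illustrative sketch of the optional transpose flags from #9709.
# Keyword names are taken from the PR title and may differ in detail.
import torch
import ttnn

device = ttnn.open_device(device_id=0)

a = ttnn.from_torch(torch.randn(32, 64), dtype=ttnn.bfloat16,
                    layout=ttnn.TILE_LAYOUT, device=device)
b = ttnn.from_torch(torch.randn(32, 64), dtype=ttnn.bfloat16,
                    layout=ttnn.TILE_LAYOUT, device=device)

# (32, 64) x (64, 32): the second operand is transposed by the op itself,
# so no explicitly transposed copy has to be created on the host side.
out = ttnn.matmul(a, b, transpose_b=True)
print(ttnn.to_torch(out).shape)  # expected: torch.Size([32, 32])

ttnn.close_device(device)
```

Per the same PR, `ttnn.linear` is expected to accept the same flags, which is convenient when weights are stored in a transposed layout.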