v0.39.0
📦 Uncategorized
- #0: Add extra sentence about use cases in somewhat vague terms
- PR: #3975
- #3824: cache weight tensors for mistral
- PR: #3973
- Npetrovic/power fp sweep
- PR: #3959
- #3918: Fix falcon7b perf profiling & add support to load weights from HF when weka is not mounted
- PR: #3863
- Rename KernelID -> KernelHandle and CircularBufferID -> CBHandle
- PR: #3939
- Aliu/erisc cleanup
- PR: #3989
- #3003: ttnn program logging
- PR: #3987
- Watcher output/doc tweaks
- PR: #3998
- #4014: added support for uint16 datatype
- PR: #4015
- #4000: Add links to demo folders in note in first 5 things
- PR: #4012
- #3751: Fix sfpu load/store of ints
- PR: #4016
- enable watcher for stress test actions
- PR: #4021
- #3058: Give first pass at flattening build by getting rid of tt-metal intermediate libs
- PR: #4011
- Revert "#3058: Give first pass at flattening build by getting rid of …
- PR: #4042
- #3219: Added host functions which tilize and untilize bfloat16 vectors
- PR: #4038
- stress test machine config update
- PR: #4025
- #0: update to use concat on device
- PR: #4010
- #3895: ttnn functional optimized Bert
- PR: #4020
- #4014: Fix bug with packing uint16 datatype
- PR: #4050
- #3824: move mistral embedding weights to weka
- PR: #4028
- #3978: Fix readme to instruct running pytest without warnings
- PR: #3984
- Dma/3467 dprint cleanup
- PR: #4018
- #0: identity operator for comparison of SFPU ops
- PR: #4019
- #3058: Add tracy back into build and test with ENABLE_TRACY=1
- PR: #4047
- #3979: Add support for ResNet for weka unmounted machines to download ImageNet
- PR: #4066
- #3990: Remove DPRINT SETW sticky bit
- PR: #4081
- #4041: Add moreh_layernorm op
- PR: #4045
- #4044: Add moreh_softmax, moreh_softmin ops
- PR: #4060
- #3103: profile the SFPU operators
- PR: #4075
- #0: function typo fix
- PR: #4100
- #3211: bug in WH B0 - sum along dim3
- PR: #4099
- Implementation for Bert Sharded Batch 12
- PR: #4093
- #4069: Avoid reading out of bounds in the hugepage
- PR: #4098
- #4014: Add testing for uint16 and uint32 on device
- PR: #4094
- #0: Disable TestPrintRaiseWait gtest until a fix for nondet issue is in
- PR: #4123
- Move hugepages section and refer to public syseng instructions for accelerator-level dependencies
- PR: #4124
- #4055: non-deterministic test_pow_fractional PCC error with watcher enabled
- PR: #4129
- #0: update test_sfpu and profiling conflict
- PR: #4128
- #4043: Add discord link to docs support page + README
- PR: #4134
- Noc on erisc
- PR: #4046
- #3894: backward ops for tt-metal
- PR: #4054
- #3972: Update tracy and device-side profiler docs
- PR: #4138
- #4085: update seed value and re-verify the reported bug
- PR: #4139
- #2860: Init one UMD per MMIO device ID and the remote devices it controls
- PR: #4080
- #4074: Add opened, reopened, synchronize pull_request triggers (default) for static checks pipeline
- PR: #4152
- #0: Ignore /device, not device/ in .gitignore
- PR: #4153
- #4074: Add wording to CONTRIBUTING.md to be open to future forks + to discourage clogging up pipelines with too many PRs
- PR: #4155
- #4053: Upgrade driver from 1.23 to 1.26 in release assets from syseng
- PR: #4133
- #4065: Update pinned python3.8-venv to 20.04.9 because 20.04.8 is gone
- PR: #4135
- #4096: Fix issue with DPRINT server closing too early for some WAITs
- PR: #4130
- #4053: Add chmod ugo+x step in ansible scripts for copying over script assets
- PR: #4167
- #4109: ttnn examples.rst needs update
- PR: #4149
- #4158: support full repeat interleave developed for Mistral
- PR: #4113
- #4076: Add instructions for execution for programming_examples and fix one typo
- PR: #4168
- #0: (MINOR) Bump minor to v0.39.0
- PR: #4175
- #4053: Get rid of FW labels for silicon runner targets
- PR: #4169
- #3752: update ttnn tutorials and make them more descriptive
- PR: #4178
- #3994: Add bfloat16 dtype to sweep tests
- PR: #4090
- #0: update ownership for SFPU ops profiler, and Backward ops code
- PR: #4179
- #3420: move init erisc info to clear l1 call
- PR: #4166
- #3918: Add falcon caching support
- PR: #4185
- #4125: Refactor tests for backward ops
- PR: #4180
- Perf bloom
- PR: #4095
- #4121: Unset TT_METAL_SLOW_DISPATCH_MODE when empty string in yaml. R…
- PR: #4182
- #4079: Remove dprints from op kernels
- PR: #4191
- #4176: uplift umd to include create-eth-map fixes
- PR: #4195
- #4017: Replace static device APIs to query num available devices and num available pcie devices with standalone host APIs
- PR: #4190
- Fixup some error messages
- PR: #4209
- Rework build system
- PR: #4192
- #4228: Revert umd change to see if seg faults go away
- PR: #4229
- #4003: use if-else instead of try-except in ttnn.reshape and ttnn.permute
- PR: #4235
- #4003: updated ttnn.model_preprocessing to keep the structure of the model weights
- PR: #4196
- #0: Changing name for major places from Metal to Metalium
- PR: #4239
- #4186: Move all assets except for setup_hugepages.py to internal workflows
- PR: #4189
- #4003: run test_performance_of_bloom_for_question_answering using L1 Config and assuming fused softmax
- PR: #4238
- #3003: updated ttnn tests
- PR: #4242