v0.40.0
📦 Uncategorized
- Opt LN_sharded and SMX_sharded
- PR: #4147
- #1919: Turn existing allocator tests into gtests
- PR: #4218
- Agrebenisan/fd perf opt
- PR: #4219
- #3932: Rename unary op args from input_a -> input, and binary op args from input, other -> input_a, input_b
- PR: #4194
- #3971: Fix TSLICE printing truncation when hitting MAX_COUNT
- PR: #4159
- #0: Fix undefined variable error when running with watcher
- PR: #4256
- #4141: Add GetPreferredNOCForDRAMRead, GetPreferredNOCForDRAMWrite and update all ops to use these apis
- PR: #4184
- #3420: fix eth core init L1 bug
- PR: #4262
- #0: Add ttnn founding engineers as CODEOWNERS of functional models
- PR: #4265
- #0: Commonize logic between E2E and device perf functions/scripts. Enable assertions for device perf scripts/ci
- PR: #4248
- Issue 4073: Fix for host-side hanging when an invalid DPRINT WAIT command is running on the device.
- PR: #4103
- #0: Add tt-rkim as CODEOWNERS for setup_hugepages.py
- PR: #4266
- #4003: implemented functional t5 model
- PR: #4241
- #3003: commonized variable names across ttnn tests. Removed ttnn.experimental. Added ttnn.unary and commonized the import of ttl unary ops
- PR: #4268
- #0: Delete extra text in first docs page about being added to repo
- PR: #4295
- write watcher log to built/ folder rather than kernel subfolder
- PR: #4291
- Add Batch>1 fix for matmul blocking API
- PR: #4296
- #4231: improve unary add, sub, mul and div implementation in SFPU. Add complex polar operator
- PR: #4257
- #3493: sharded tensor support
- PR: #3790
- REVERT #4231: Fine-tune the unary ops to improve performance
- PR: #4312
- #0: Move setup_hugepages.py to release assets
- PR: #4264
- #0: (MINOR) Update VERSION to 0.40.0
- PR: #4315
- #4301: Fix link to announcements in README
- PR: #4317
- #4301: Replace some more instances of Metal w/ Metalium in docs
- PR: #4320
- Llk refactor uplift
- PR: #3908
- #0: Fix TT-Metalium docs link in get_performance.rst
- PR: #4323
- #0: uplift in device code
- PR: #4299
- #4176: uplift umd plus tt_metal changes
- PR: #4333
- init fw once
- PR: #4335
- Merge v2 of untilize_with_halo, maxpool, and conv ops for Resnet-50
- PR: #4325
- Backward ops for Metalium - part-2
- PR: #4322
- #4211: Assert that hugepages number is greater than or equal to required, rather than equal to
- PR: #4381
- Update resnet readme
- PR: #4367
- Add Run Instructions for BERT_large sharded in readme
- PR: #4366
- Add batch 20 for resnet-50
- PR: #4371
- #4376: Support mixed precision for eltwise binary with prescaling
- PR: #4387
- Increase timeout of slow dispatch unit tests and switch to Y_M_D format for ops logs
- PR: #4397
- #0: point umd to main, cosmetic change
- PR: #4396
- New tilize and straightforward vec gen in matmul kernel examples
- PR: #4261
- #4216: Enable DPrint slow dispatch testing
- PR: #4326
- #4376: Call llk reconfig functions in compute kernel apis for WH
- PR: #4393
- #4336: #4386: Fix interleaved_to_sharded writer waiting on incorrect amount of data for uneven shards
- PR: #4402
- #1433: removed Device* and MemoryConfig from DeviceStorage
- PR: #4411
- #0: Increase fast dispatch post commit timeout and shorten full regressions because we no longer need that much time
- PR: #4412
- #4003: added ttnn.mean, ttnn.rsqrt and ttnn.pow and removed ttl use from ttnn_functional_t5. Updated ttnn.Tensor to store shape as ttnn.Shape
- PR: #4383
- Aliu/load base erisc
- PR: #4394
- #4399: add spell checker script for docs spellchecking
- PR: #4398
- #2134: Uplift UMD
- PR: #4400
- #0: fix memory leaks found in test_sfpu via valgrind
- PR: #4419
- Revert "#4399: add spell checker script spellcheck.sh should be read…
- PR: #4424
- #0: update llk.rst for minor ReST syntax
- PR: #4423
- #2934: Make one CommandQueue and one HW CommandQueue (SysmemWriter) per device
- PR: #4077
- #4003: convert ttl.tensor.Shape to tuple when using it in torch functions
- PR: #4426
- #4211: Fix HP targeting issues in main from cq-per-device changes
- PR: #4447