v0.41.0
Metal
API Changes
- `tt::tt_metal::detail::GLOBAL_CQ` replaced with `tt::tt_metal::detail::GetCommandQueue(Device *device)`
- New `num_hw_cqs` parameter to specify the underlying number of HW CQs for a given `Device`: `Device *CreateDevice(chip_id_t device_id, const uint8_t num_hw_cqs = 1, const std::vector<uint32_t>& l1_bank_remap = {});`
Tools
Profiler
- Integrated Tracy host-side CLI capture and CSV report generation with Metal's profiler infrastructure
- Added support for device profiling on ethernet cores for Wormhole systems
ttNN
Infrastructure
- Updated ttnn documentation with visualizations and examples
- Added padded shape to ttnn
- Renamed `ttnn.nlp` to `ttnn.transformer`
- Updated `ttnn.transformer.split_query_key_value_and_split_heads` to handle most shapes, multi-head query, and cases where key_value_states are used to compute key and value
- Added `ttnn.rms_norm`
- Added `ttnn.Shape` and exposed support for padded shape; simplified broadcasting and reduction operations
- Moved `ttnn.Tensor` to C++
- Added debug decorator for ttnn operations
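For reference, `rms_norm` conventionally normalizes by the root-mean-square of the last axis rather than by mean and variance. A minimal NumPy sketch of that standard definition, assuming `ttnn.rms_norm` follows it (the actual ttnn signature and defaults may differ):

```python
import numpy as np

def rms_norm_reference(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Standard RMSNorm over the last axis: x / sqrt(mean(x^2) + eps) * weight."""
    rms = np.sqrt(np.mean(np.square(x), axis=-1, keepdims=True) + eps)
    return x / rms * weight

x = np.random.randn(2, 8).astype(np.float32)
y = rms_norm_reference(x, np.ones(8, dtype=np.float32))
```

Unlike layernorm, no mean is subtracted, so only one reduction pass over the row is needed.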
Operations
- Layer operators `layernorm`, `conv`, and `softmax` were optimized for multi-core computation; model-specific operators for Falcon7B were also added.
- The operator `normalize_global` was added to the `tt_lib.tensor` namespace; it transforms the tensor by normalizing elements to the mean and standard deviation of the entire tensor.
- The operator `lamb_optimizer` was added to the `tt_lib.tensor` namespace to help with computing the back-propagation algorithm and weight updates for DNNs in the training loop.
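As described, `normalize_global` normalizes every element against the mean and standard deviation of the whole tensor (not per-row or per-channel). A NumPy sketch of those semantics; the function name, `eps` guard, and signature here are illustrative, not the tt_lib kernel's:

```python
import numpy as np

def normalize_global_reference(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Normalize all elements to the mean and std of the entire tensor.

    The reductions run over every element (no axis argument), which is what
    distinguishes this from per-axis normalization like layernorm.
    """
    return (x - x.mean()) / (x.std() + eps)

x = np.random.randn(4, 4).astype(np.float32)
y = normalize_global_reference(x)
```

After the transform the tensor as a whole has approximately zero mean and unit standard deviation.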
The following backward operators, for use in the back-propagation training loop, have been added to the tt_dnn library; they are accessible with the suffix `_bw` in the `tt_lib.tensor` namespace.
1. abs
2. add
3. addalpha
4. addcdiv
5. addcmul
6. binary_assign
7. binary_le
8. clamp
9. clamp_max
10. clamp_min
11. div
12. exp
13. fill
14. fill_zero
15. gt
16. log
17. lt
18. max
19. min
20. mul
21. ne
22. neg
23. relu
24. rsqrt
25. rsub
26. sigmoid
27. sqrt
28. sub
29. tan
30. tanh
31. unary_add
32. unary_assign
33. unary_div
34. unary_mul
35. unary_pow
36. unary_sub
37. where
Models
- Added ttnn implementations for RoBERTa, Whisper, T5-small, and flan-T5-small
- Updated ttnn implementation of Bloom to work with L1 memory, and cleaned up ttnn implementation of BERT
- Updated Mistral implementation to use tilized tensors and operations
- Updated VGG model to load pre-tilized weight tensors and use tilized tensors
- Added benchmarking demos for DistilBERT and T5 using the SQuAD dataset for question answering