v0.41.0

Released by github-actions on 13 Jan 21:15 · 18299 commits to main since this release

Metal

API Changes

  • Replaced tt::tt_metal::detail::GLOBAL_CQ with tt::tt_metal::detail::GetCommandQueue(Device *device)
  • Added a num_hw_cqs parameter to specify the number of underlying hardware command queues for a given Device: Device *CreateDevice(chip_id_t device_id, const uint8_t num_hw_cqs = 1, const std::vector<uint32_t>& l1_bank_remap = {});

Tools

Profiler

  • Integrated Tracy host-side CLI capture and CSV report generation with Metal's profiler infrastructure.
  • Added support for device profiling on ethernet cores for Wormhole systems.

ttNN

Infrastructure

  • Updated ttnn documentation with visualizations and examples
  • Added padded shape to ttnn
  • Renamed ttnn.nlp to ttnn.transformer
  • Updated ttnn.transformer.split_query_key_value_and_split_heads to handle most shapes, multi-head queries, and cases where key_value_states are used to compute the key and value
  • Added ttnn.rms_norm
  • Added ttnn.Shape and exposed support for padded shape. Simplified broadcasting and reduction operations
  • Moved ttnn.Tensor to C++
  • Added debug decorator for ttnn operations
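
The new ttnn.rms_norm follows the standard RMSNorm formulation. A minimal NumPy sketch of the reference semantics (the function name and eps default here are illustrative, not the actual ttnn signature):

```python
import numpy as np

def rms_norm_reference(x, weight, eps=1e-6):
    # Normalize over the last dimension by the root-mean-square of the
    # elements, then scale by a learned weight: x / RMS(x) * weight.
    rms = np.sqrt(np.mean(np.square(x), axis=-1, keepdims=True) + eps)
    return x / rms * weight
```

Unlike layernorm, RMSNorm does not subtract the mean, which saves a reduction pass.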

Operations

  • The layernorm, conv, and softmax operators were optimized for multi-core computation; model-specific operators for Falcon7B were also added.
  • The operator normalize_global was added to the tt_lib.tensor namespace; it normalizes each element using the mean and standard deviation of the entire tensor.
  • The operator lamb_optimizer was added to the tt_lib.tensor namespace to help compute the back-propagation and weight-update steps for DNNs in the training loop.
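
A NumPy sketch of what normalize_global computes, per the description above (the function name and eps guard are illustrative; the tt_lib version runs on device):

```python
import numpy as np

def normalize_global_reference(x, eps=1e-12):
    # Standardize using statistics computed over ALL elements of the
    # tensor (not per-row or per-channel), as "global" implies.
    return (x - x.mean()) / (x.std() + eps)
```

After this transform the whole tensor has approximately zero mean and unit standard deviation.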

The following backward operators, for use in the back-propagation training loop, have been added to the tt_dnn library; they are accessible with the _bw suffix in the tt_lib.tensor namespace.

 1. abs
 2. add
 3. addalpha
 4. addcdiv
 5. addcmul
 6. binary_assign
 7. binary_le
 8. clamp
 9. clamp_max
10. clamp_min
11. div
12. exp
13. fill
14. fill_zero
15. gt
16. log
17. lt
18. max
19. min
20. mul
21. ne
22. neg
23. relu
24. rsqrt
25. rsub
26. sigmoid
27. sqrt
28. sub
29. tan
30. tanh
31. unary_add
32. unary_assign
33. unary_div
34. unary_mul
35. unary_pow
36. unary_sub
37. where
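
As an illustration of what these _bw operators compute, here is a NumPy sketch of the backward pass for an elementwise mul (the function name is hypothetical; the tt_lib version runs on device):

```python
import numpy as np

def mul_bw_reference(grad, a, b):
    # For out = a * b, the chain rule gives d(out)/da = b and
    # d(out)/db = a, so each input's gradient is the upstream
    # gradient `grad` times the other input.
    grad_a = grad * b
    grad_b = grad * a
    return grad_a, grad_b
```

Each _bw operator takes the upstream gradient plus the forward inputs and returns the gradient(s) with respect to those inputs.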

Models

  • Added ttnn implementations for RoBERTa, Whisper, T5-small, and Flan-T5-small
  • Updated ttnn implementation of Bloom to work with L1 memory, and cleaned up ttnn implementation of BERT
  • Updated Mistral implementation to use tilized tensors and operations
  • Updated VGG model to load pre-tilized weight tensors and use tilized tensors
  • Added benchmarking demo for DistilBert and T5 using SQuAD dataset for question answering