v0.41.0
Metal
API Changes
- `tt::tt_metal::detail::GLOBAL_CQ` replaced with `tt::tt_metal::detail::GetCommandQueue(Device *device)`
- New `num_hw_cqs` parameter to specify the underlying number of HW CQs for a given `Device`: `Device *CreateDevice(chip_id_t device_id, const uint8_t num_hw_cqs = 1, const std::vector<uint32_t>& l1_bank_remap = {});`
Tools
Profiler
- Integrated Tracy host-side CLI capture and CSV report generation with Metal's profiler infrastructure
- Added support for device profiling on ethernet cores for Wormhole systems
ttNN
Infrastructure
- Updated ttnn documentation with visualizations and examples
- Added padded shape to ttnn
- Renamed `ttnn.nlp` to `ttnn.transformer`
- Updated `ttnn.transformer.split_query_key_value_and_split_heads` to handle most shapes, multi-head query, and cases where key_value_states are used to compute key and value
- Added `ttnn.rms_norm`
- Added `ttnn.Shape` and exposed support for padded shape; simplified broadcasting and reduction operations
- Moved `ttnn.Tensor` to C++
- Added debug decorator for ttnn operations
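For reference, `rms_norm` conventionally normalizes by the root-mean-square of the last axis rather than by mean and variance. A minimal NumPy sketch of that standard definition, assuming `ttnn.rms_norm` follows it (the actual ttnn signature and defaults may differ):

```python
import numpy as np

def rms_norm_reference(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Standard RMSNorm over the last axis: x / sqrt(mean(x^2) + eps) * weight."""
    rms = np.sqrt(np.mean(np.square(x), axis=-1, keepdims=True) + eps)
    return x / rms * weight

x = np.random.randn(2, 8).astype(np.float32)
y = rms_norm_reference(x, np.ones(8, dtype=np.float32))
```

Unlike layernorm, no mean is subtracted, so only one reduction pass over the row is needed.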
Operations
- Layer operators `layernorm`, `conv`, and `softmax` were optimized for multi-core computation; model-specific operators for Falcon7B were also added.
- The operator `normalize_global` was added to the `tt_lib.tensor` namespace; it transforms the tensor by normalizing elements to the mean and standard deviation of the entire tensor.
- The operator `lamb_optimizer` was added to the `tt_lib.tensor` namespace to help with computing the back-propagation algorithm and weight updates for DNNs in the training loop.
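As described, `normalize_global` normalizes every element against the mean and standard deviation of the whole tensor (not per-row or per-channel). A NumPy sketch of those semantics; the function name, `eps` guard, and signature here are illustrative, not the tt_lib kernel's:

```python
import numpy as np

def normalize_global_reference(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Normalize all elements to the mean and std of the entire tensor.

    The reductions run over every element (no axis argument), which is what
    distinguishes this from per-axis normalization like layernorm.
    """
    return (x - x.mean()) / (x.std() + eps)

x = np.random.randn(4, 4).astype(np.float32)
y = normalize_global_reference(x)
```

After the transform the tensor as a whole has approximately zero mean and unit standard deviation.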
The following backward operators, for use in the back-propagation training loop, have been added to the tt_dnn library; they are accessible with the suffix `_bw` in the `tt_lib.tensor` namespace.
1. abs
2. add
3. addalpha
4. addcdiv
5. addcmul
6. binary_assign
7. binary_le
8. clamp
9. clamp_max
10. clamp_min
11. div
12. exp
13. fill
14. fill_zero
15. gt
16. log
17. lt
18. max
19. min
20. mul
21. ne
22. neg
23. relu
24. rsqrt
25. rsub
26. sigmoid
27. sqrt
28. sub
29. tan
30. tanh
31. unary_add
32. unary_assign
33. unary_div
34. unary_mul
35. unary_pow
36. unary_sub
37. where
Models
- Added ttnn implementations for RoBERTa, Whisper, T5-small, and flan-T5-small
- Updated ttnn implementation of Bloom to work with L1 memory, and cleaned up ttnn implementation of BERT
- Updated Mistral implementation to use tilized tensors and operations
- Updated VGG model to load pre-tilized weight tensors and use tilized tensors
- Added benchmarking demos for DistilBERT and T5 using the SQuAD dataset for question answering