v0.34.0
Metal
API Changes
- CreateDevice: The device_id type has changed from int to chip_id_t.
- CreateCircularBuffer: The three previous variants, which differed only in taking a CoreCoord, CoreRange, or CoreRangeSet parameter, have been consolidated into one user-facing CreateCircularBuffer function that is parameterized with std::variant<CoreCoord, CoreRange, CoreRangeSet>. It now accepts a CircularBufferConfig, which specifies size, data format, and page size per buffer index. The return type has changed from a CircularBuffer object to CircularBufferID (uintptr_t).
- GetCircularBufferConfig: New function that returns a reference to the configuration of a CircularBuffer. This allows the CircularBuffer config to be updated. Updates take effect on the next call to LaunchProgram.
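The std::variant consolidation described for CreateCircularBuffer can be illustrated with a minimal, self-contained sketch. Everything below is an assumption made for illustration: the CoreCoord, CoreRange, and CoreRangeSet structs are simplified stand-ins rather than the real tt_metal types, and CoreSpec, to_core_range_set, and count_cores are hypothetical names that do not exist in the library. The sketch only shows how one function taking a variant can replace three overloads.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <variant>
#include <vector>

// Simplified stand-ins for illustration only; NOT the real tt_metal types.
struct CoreCoord { std::size_t x; std::size_t y; };
struct CoreRange { CoreCoord start; CoreCoord end; };  // inclusive rectangle
struct CoreRangeSet { std::vector<CoreRange> ranges; };

// Hypothetical alias: one parameter type that accepts any of the three forms.
using CoreSpec = std::variant<CoreCoord, CoreRange, CoreRangeSet>;

// Hypothetical helper: normalize whichever alternative is held into a
// CoreRangeSet, so the rest of the function has a single code path.
CoreRangeSet to_core_range_set(const CoreSpec& spec) {
    return std::visit(
        [](const auto& value) -> CoreRangeSet {
            using T = std::decay_t<decltype(value)>;
            if constexpr (std::is_same_v<T, CoreCoord>) {
                // A single core is a 1x1 range.
                return CoreRangeSet{{CoreRange{value, value}}};
            } else if constexpr (std::is_same_v<T, CoreRange>) {
                return CoreRangeSet{{value}};
            } else {
                return value;  // already a CoreRangeSet
            }
        },
        spec);
}

// Hypothetical single entry point standing in for the three old overloads.
// Counts the cores covered, assuming the ranges do not overlap.
std::size_t count_cores(const CoreSpec& spec) {
    std::size_t total = 0;
    for (const CoreRange& r : to_core_range_set(spec).ranges) {
        total += (r.end.x - r.start.x + 1) * (r.end.y - r.start.y + 1);
    }
    return total;
}
```

A caller can pass any of the three forms directly, for example count_cores(CoreCoord{1, 2}) or count_cores(CoreRange{{0, 0}, {1, 3}}), because each converts implicitly to the variant.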
Tools - Profiler
Tracy Python Support: Profile Python-side code with Tracy. As with cProfile, the standard Python profiler module, all Python function calls are captured in Tracy. TT's bound C++ calls are also captured automatically. The entire Python script, or only selected parts of it, can be profiled at either function or line level.
Extra features
Runtime Compute Args: Arguments can now be sent to compute kernels at runtime. The kernel retrieves them with the same get_arg_val<type>(<index>) API, and the host sets them with the same tt_metal::SetRuntimeArgs(<program>, <compute_kernel_id>, <Core, CoreRange>, <vector of u32 runtime args>) call that is used for data movement kernels.
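The host-writes, kernel-reads flow for runtime arguments can be modelled in plain C++ as a rough sketch. This is not the tt_metal implementation: RuntimeArgStore, set_runtime_args, and get_arg_val_model are hypothetical names invented here, and the model assumes each argument occupies exactly one 32-bit word, as the "vector of u32" wording suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the per-core area that holds runtime arguments.
struct RuntimeArgStore {
    std::vector<std::uint32_t> words;
};

// Hypothetical host-side model: store a vector of 32-bit arguments.
void set_runtime_args(RuntimeArgStore& store,
                      const std::vector<std::uint32_t>& args) {
    store.words = args;
}

// Hypothetical kernel-side model: read the argument at `index` as type T.
// Assumption: T is exactly 32 bits wide and trivially copyable.
template <typename T>
T get_arg_val_model(const RuntimeArgStore& store, std::size_t index) {
    static_assert(sizeof(T) == sizeof(std::uint32_t),
                  "this model only handles 32-bit argument types");
    static_assert(std::is_trivially_copyable_v<T>,
                  "argument type must be trivially copyable");
    T value;
    // at() throws std::out_of_range if the index was never set.
    std::memcpy(&value, &store.words.at(index), sizeof(T));
    return value;
}

// Hypothetical helper: pack a float into a 32-bit word for the host side.
std::uint32_t float_to_word(float f) {
    std::uint32_t word;
    std::memcpy(&word, &f, sizeof(word));
    return word;
}
```

In this model the host would call set_runtime_args(store, {7u, float_to_word(1.5f)}), and the kernel side would read the values back with get_arg_val_model<std::uint32_t>(store, 0) and get_arg_val_model<float>(store, 1).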
Eager (Ops)
Notes not yet available.
Models
- metal_BERT_large_15: The model implementation now uses the tt-DNN embedding operation, which executes on the GS device. Previously, this model used the PyTorch embedding operation, which executed on the CPU.
- Falcon7b: Added an end-to-end demo that runs on the GS device. The demo takes a text prompt and returns text generated by the model to complete the prompt. It works by pre-filling the cache with decoded input prompts and then running decode for all users in parallel.