v0.37.0
Metal
API Changes
- Top-level API to create a Program:

  ```cpp
  Program CreateProgram();
  ```
- `GetRuntimeArgs` now returns a reference to the underlying runtime args to allow for in-place updates. This results in noticeably better performance for host-bound workloads:

  ```cpp
  std::vector<uint32_t>& GetRuntimeArgs(const Program &program, KernelID kernel_id, const CoreCoord &logical_core);
  ```
- Two other variants for updating runtime arguments that give better host-side performance in certain situations:

  ```cpp
  void UpdateRuntimeArg(const Program &program, KernelID kernel, const std::variant<CoreCoord, CoreRange, CoreRangeSet> &core_spec, size_t offset, uint32_t value);
  void SetRuntimeArgs(const Program &program, KernelID kernel, const std::vector<CoreCoord> &core_spec, const std::vector<std::vector<uint32_t>> &runtime_args);
  ```

  (NOTE: `UpdateRuntimeArg` will be removed in the next release, as its use has been superseded by the other functions)
- `GetCircularBufferConfig` now returns a const reference:

  ```cpp
  const CircularBufferConfig &GetCircularBufferConfig(Program &program, CircularBufferID cb_handle);
  ```
- Circular buffer config parameters are now updated through three separate functions:

  ```cpp
  void UpdateCircularBufferTotalSize(Program &program, CircularBufferID cb_handle, uint32_t total_size);
  void UpdateCircularBufferPageSize(Program &program, CircularBufferID cb_handle, uint8_t buffer_index, uint32_t page_size);
  void UpdateDynamicCircularBufferAddress(Program &program, CircularBufferID cb_handle, const Buffer &buffer);
  ```
- Moved slow/host dispatch APIs to the `detail` namespace:

  ```cpp
  void LaunchProgram(Device *device, Program &program);
  void ReadFromBuffer(const Buffer &buffer, std::vector<uint32_t> &host_buffer);
  void WriteToBuffer(const Buffer &buffer, const std::vector<uint32_t> &host_buffer);
  ```
Tools - Profiler
- Updated the path for all profiler artifacts to be under the `generated/profiler` folder
ttNN
Infrastructure
- Introduced `ttnn.embedding` to facilitate word embeddings
- Added `preprocess_parameters` for generic conversion of torch parameters with caching
- Added `ttnn.experimental.gelu`
- Added `ttnn.experimental.layer_norm`
- Updated program hash to be `std::size_t` and significantly sped up its computation
Operations
- Splitting a tensor of shape [W, Z, Y, X] into two now supports the Y dimension in addition to the existing X dimension
- The trunc function has fallback support equivalent to torch.trunc
- Support for the power function with a non-integral exponent: `tt_lib.tensor.power_fp()`
- Support for the reshape operator on host for `ROW_MAJOR` layout
Models
Notes not available.