v0.37.0

@github-actions github-actions released this 17 Nov 22:42
· 18923 commits to main since this release

Metal

API Changes

  • Top-level API to create a Program:
    Program CreateProgram();

  • GetRuntimeArgs now returns a reference to underlying runtime args to allow for in-place updates. This results in noticeably better performance for host-bound workloads:
    std::vector<uint32_t>& GetRuntimeArgs(const Program &program, KernelID kernel_id, const CoreCoord &logical_core);

  • Two other variants for updating runtime arguments that result in better host-side performance in certain situations:

    • void UpdateRuntimeArg(const Program &program, KernelID kernel, const std::variant<CoreCoord, CoreRange, CoreRangeSet> &core_spec, size_t offset, uint32_t value);
    • void SetRuntimeArgs(const Program &program, KernelID kernel, const std::vector< CoreCoord > & core_spec, const std::vector< std::vector<uint32_t> > &runtime_args);

    (NOTE: UpdateRuntimeArg will be removed in the next release, as its use has been superseded by the other functions.)

  • GetCircularBufferConfig now returns a const reference:
    const CircularBufferConfig &GetCircularBufferConfig(Program &program, CircularBufferID cb_handle);

  • Circular buffer config parameters are now updated through three separate functions:

    • void UpdateCircularBufferTotalSize(Program &program, CircularBufferID cb_handle, uint32_t total_size);
    • void UpdateCircularBufferPageSize(Program &program, CircularBufferID cb_handle, uint8_t buffer_index, uint32_t page_size);
    • void UpdateDynamicCircularBufferAddress(Program &program, CircularBufferID cb_handle, const Buffer &buffer);
  • Moved slow/host dispatch APIs to the detail namespace:

    • void LaunchProgram(Device *device, Program &program);
    • void ReadFromBuffer(const Buffer &buffer, std::vector<uint32_t> &host_buffer);
    • void WriteToBuffer(const Buffer &buffer, const std::vector<uint32_t> &host_buffer);
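
The reference-returning GetRuntimeArgs pattern can be illustrated with a minimal, self-contained sketch. Everything below uses hypothetical stand-in types (MockProgram, MockKernelID, MockCoreCoord, MockSetRuntimeArgs, MockGetRuntimeArgs); it does not include or call the real Metal headers, and it only shows why handing back a reference lets the host patch a single argument without rebuilding the whole vector.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT the Metal types.
using MockKernelID = int;
using MockCoreCoord = std::pair<int, int>;

struct MockProgram {
    // Runtime args stored per (kernel, logical core).
    std::map<std::pair<MockKernelID, MockCoreCoord>, std::vector<uint32_t>> args;
};

// Full-vector style: every call copies a complete argument vector into the program.
void MockSetRuntimeArgs(MockProgram &program, MockKernelID kernel,
                        const MockCoreCoord &core, const std::vector<uint32_t> &values) {
    program.args[{kernel, core}] = values;
}

// Reference style: the caller receives the stored vector itself and edits it in place.
std::vector<uint32_t> &MockGetRuntimeArgs(MockProgram &program, MockKernelID kernel,
                                          const MockCoreCoord &core) {
    return program.args.at({kernel, core});
}

// Sets three args once, then patches only the first one (e.g. a buffer address)
// through the returned reference. Returns a copy of the stored args for inspection.
std::vector<uint32_t> UpdateAddressInPlace() {
    MockProgram program;
    const MockKernelID reader = 0;
    const MockCoreCoord core{0, 0};

    MockSetRuntimeArgs(program, reader, core, {0x1000, 32, 64});

    auto &args = MockGetRuntimeArgs(program, reader, core);
    args[0] = 0x2000;  // in-place update, no new vector is built

    return MockGetRuntimeArgs(program, reader, core);
}
```

In a loop that relaunches the same program with only a few arguments changing, this style avoids one vector construction and copy per kernel per core on each iteration.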

Tools - Profiler

  • Updated the path for all profiler artifacts to be under the generated/profiler folder

ttNN

Infrastructure

  • Introduced ttnn.embedding to facilitate word embeddings
  • Added preprocess_parameters for generic conversion of torch parameters with caching
  • Added ttnn.experimental.gelu
  • Added ttnn.experimental.layer_norm
  • Updated program hash to be std::size_t and significantly sped up its computation

Operations

  • Splitting a tensor into two now supports splitting a tensor of shape [W, Z, Y, X] along Y, in addition to the existing X.
  • Added fallback support for a trunc function equivalent to torch.trunc
  • Added support for a power function with a non-integral exponent: tt_lib.tensor.power_fp()
  • Added support for the reshape operator on host for ROW_MAJOR layout
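
As a rough host-side reference for the element-wise semantics of the trunc and non-integral power operations, the sketch below uses only the C++ standard library. TruncReference and PowerReference are hypothetical helper names, not part of tt_lib, and the sketch assumes plain float vectors instead of device tensors.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical reference helper: truncates each element toward zero,
// which is the rounding behavior torch.trunc documents.
std::vector<float> TruncReference(const std::vector<float> &input) {
    std::vector<float> out;
    out.reserve(input.size());
    for (float v : input) {
        out.push_back(std::trunc(v));
    }
    return out;
}

// Hypothetical reference helper: raises each element to a possibly
// non-integral exponent using std::pow.
std::vector<float> PowerReference(const std::vector<float> &input, float exponent) {
    std::vector<float> out;
    out.reserve(input.size());
    for (float v : input) {
        out.push_back(std::pow(v, exponent));
    }
    return out;
}
```

A reference like this can be handy when comparing device results against expected values, for example checking that -2.7 truncates to -2 and not -3.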

Models

Notes not available.