Due by May 31, 2025
Getting ResNet 50 to 10% of the hand-tuned tt-metal peak performance - about 470 FPS on Wormhole.
Due by June 28, 2025 • 4/5 issues closed
Coordination layer responsible for identifying, delegating, and tracking problematic ops within the compiler.
No due date • 55/64 issues closed
With previous milestones we've implemented basic runtime functionality, enabling running both inference and training. The goal for this milestone is to redesign and refactor the existing runtime into a more robust and performant solution. Some of the specific goals in mind:
- support for runtime stitching, i.e. leaving tensors on device and reusing them when executing the next iteration of a program or as inputs to a different program
- introduce our own tensor abstraction, which will be used to abstract host/device tensors and also track info related to running training loops
- move more of the code to C++
- add sanity functional tests targeting specific features
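The sketch below is a rough Python illustration of the two runtime ideas above - a wrapper that tracks where a tensor lives, and reuse of device-resident tensors across program invocations. The `RuntimeTensor`, `FakeDevice`, and `run_program` names are hypothetical and do not reflect the actual runtime API (which is moving to C++).

```python
import numpy as np


class RuntimeTensor:
    """Hypothetical wrapper that tracks whether data lives on host or device."""

    def __init__(self, host_data: np.ndarray):
        self.host_data = host_data
        self.device_handle = None  # opaque handle once uploaded

    def to_device(self, device: "FakeDevice") -> "RuntimeTensor":
        if self.device_handle is None:  # upload only once, then reuse
            self.device_handle = device.upload(self.host_data)
        return self

    def to_host(self, device: "FakeDevice") -> np.ndarray:
        return device.download(self.device_handle)


class FakeDevice:
    """Stand-in device: 'uploaded' buffers are just kept in a dict."""

    def __init__(self):
        self._buffers = {}

    def upload(self, data: np.ndarray) -> int:
        handle = len(self._buffers)
        self._buffers[handle] = data.copy()
        return handle

    def download(self, handle: int) -> np.ndarray:
        return self._buffers[handle]

    def run_program(self, weights: RuntimeTensor, activations: RuntimeTensor) -> RuntimeTensor:
        # A real program would run compiled kernels; a matmul stands in here.
        out = self._buffers[weights.device_handle] @ self._buffers[activations.device_handle]
        result = RuntimeTensor(out)
        result.device_handle = self.upload(out)  # output stays on "device"
        return result


device = FakeDevice()
weights = RuntimeTensor(np.random.rand(4, 4)).to_device(device)  # uploaded once
x = RuntimeTensor(np.random.rand(4, 1)).to_device(device)

# "Runtime stitching": weights are reused across iterations and each output
# feeds the next program invocation without a host round-trip.
for _ in range(3):
    x = device.run_program(weights, x)
print(x.to_host(device).shape)  # (4, 1)
```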
This milestone focuses on cleaning up the TVM repository by removing `forge-fe`-specific compile logic and shifting it back into `forge-fe`. This decoupling will enable us to uplift TVM to the latest version that still supports RELAY, as the current main branch has deprecated RELAY in favor of RELAX (Relay Next). This is a preparatory step for a future transition from RELAY to RELAX, ensuring a smoother migration path while maintaining compatibility in the interim.
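For context, here is a minimal sketch of the Relay-based import path this milestone keeps working, assuming a TVM version that still ships the Relay PyTorch frontend; the toy model, input name, and shapes are illustrative, and the exact entry point `forge-fe` uses may differ.

```python
import torch
from tvm import relay  # requires a TVM build that still includes Relay

# Illustrative model only; forge-fe's real import path through TVM may differ.
model = torch.nn.Sequential(torch.nn.Linear(32, 10), torch.nn.ReLU()).eval()
example_input = torch.randn(1, 32)

# The Relay PyTorch frontend expects a traced module plus (name, shape) pairs.
scripted = torch.jit.trace(model, example_input)
mod, params = relay.frontend.from_pytorch(scripted, [("input0", (1, 32))])

print(mod)  # Relay IRModule; a later migration would target RELAX instead
```

A future RELAY-to-RELAX move would swap this frontend call for the Relax importer while keeping the surrounding `forge-fe` flow intact.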
Single-chip training of a Llama-like small LLM using the LoRA technique.
No due date • 2/2 issues closed
Backward pass through a Llama model. OpenLLaMA 3B or Llama 3.2 1B - TBD.
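As a rough illustration of the LoRA idea referenced in these milestones (not the actual tt-forge training flow), the sketch below adds a low-rank adapter to a linear layer and runs a backward pass in which only the adapter parameters receive gradients; the class and parameter names are made up for the example.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # base weights stay frozen
        self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))  # zero-init so training starts at W
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)


layer = LoRALinear(in_features=64, out_features=64)
loss = layer(torch.randn(4, 64)).pow(2).mean()
loss.backward()  # backward pass populates gradients only for the LoRA params

print(layer.lora_a.grad is not None)   # True
print(layer.base.weight.grad is None)  # True - frozen base gets no gradient
```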
Core operations support is required for the ResNet 50 model. List of ops that are currently lowered through tt-forge (up to emit to TTIR):
- add - already supported
- hstack - potentially not needed
- matmul - potentially not needed
- narrow - potentially not needed
- pad_tile - potentially not needed
- reduce_avg - required support on Forge, MLIR has it
- reduce_max - required support on Forge and MLIR
- relu - required support on Forge, MLIR has it
- sparse_matmul - potentially not needed
- squeeze - potentially not needed, used after avg pool to reduce dim to the appropriate output
- transpose - currently WIP
- vslice - potentially not needed

Most of the ops come from conv2d decompositions, so bringup for them is probably redundant. Some of them are:
- hstack
- matmul
- narrow
- pad_tile
- vslice
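For context on where this op set comes from (a hedged PyTorch illustration, not the compiler's actual decomposition), a simplified residual block plus classifier head already exercises most of the list: conv2d, whose decomposition yields the hstack/narrow/pad_tile/sparse_matmul/vslice family, along with add, relu, reduce_avg for the global average pool, squeeze/transpose-style reshapes, and a final matmul.

```python
import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    """Simplified ResNet-style block: conv -> relu -> conv -> residual add -> relu."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = torch.relu(self.conv1(x))  # conv2d (source of the decomposition ops) + relu
        y = self.conv2(y)
        return torch.relu(y + x)       # add + relu


block = BasicBlock(channels=8)
head = nn.Linear(8, 10)               # final classifier matmul

x = torch.randn(1, 8, 14, 14)
y = block(x)
y = y.mean(dim=(2, 3))                # reduce_avg (global average pool), drops H/W dims
logits = head(y)                      # matmul
print(logits.shape)                   # torch.Size([1, 10])
```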
No due date • 16/16 issues closed
Core operations support is required for the Llama 3B model. List of ops that are currently lowered through tt-forge (up to emit to TTIR):
- Add - already supported e2e
- Concatenate - required support on Forge and MLIR
- Embedding - required support on Forge and MLIR
- Hslice - should be removed from the model
- Hstack - should be removed from the model
- Matmul - required support on Forge, MLIR has it
- Multiply - already supported e2e
- Narrow - required via reshape op for both Forge and MLIR
- Pad_tile - potentially redundant
- Reciprocal - required support on Forge and MLIR
- Reduce_avg - required support on Forge, MLIR has it
- Sigmoid - required support on Forge and MLIR
- Softmax - already supported e2e
- Sparse_matmul - should be removed from the model
- Sqrt - required support on Forge and MLIR
- Squeeze - required via reshape op for both Forge and MLIR
- Tile_broadcast - potentially redundant
- Transpose - currently WIP
- Unsqueeze - required via reshape op for both Forge and MLIR

Also, some of the basic Llama 3B building blocks that should be supported:
- Embeddings
- Self-attention
- MLP
- RMS Norm
- LM head
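To connect the op list to the building blocks (a rough PyTorch sketch following common Llama conventions, not tt-forge code), RMS Norm alone exercises multiply, reduce_avg, sqrt, and reciprocal, while a gated MLP exercises matmul, sigmoid, and multiply.

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """RMS normalization: x * 1/sqrt(mean(x^2) + eps) * weight."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mean_sq = (x * x).mean(dim=-1, keepdim=True)                # multiply + reduce_avg
        inv_rms = torch.reciprocal(torch.sqrt(mean_sq + self.eps))  # sqrt + reciprocal
        return x * inv_rms * self.weight                            # multiply


class GatedMLP(nn.Module):
    """Llama-style MLP: down( silu(gate(x)) * up(x) )."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)  # matmul
        self.up = nn.Linear(dim, hidden, bias=False)    # matmul
        self.down = nn.Linear(hidden, dim, bias=False)  # matmul

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.gate(x)
        g = g * torch.sigmoid(g)          # sigmoid + multiply (SiLU activation)
        return self.down(g * self.up(x))  # multiply + matmul


x = torch.randn(1, 16, 64)                # (batch, sequence, hidden dim)
y = GatedMLP(dim=64, hidden=128)(RMSNorm(64)(x))
print(y.shape)                            # torch.Size([1, 16, 64])
```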
No due date • 45/50 issues closed
Inference bringup of a Llama model (exact variant TBD). After the initial inference PoC with the linear MNIST model, the next step is to provide a PoC for one of the larger NLP/vision models. Out of the list of proposed models:
- Llama 3B
- Falcon 7B
- T5
- ResNet 50
- ViT

the potential next candidate is the LLM generative model, *Llama*. The main reasons for choosing this model are:
- General points:
  - Good baseline for building other transformer-based LLM generative models after this PoC (common decoder-based architecture)
  - Availability of smaller model sizes - 3B/8B parameters - giving more flexibility around initial single-chip bringup
  - Contains a valid manual tt-metal-based implementation ([ref link](https://github.com/tenstorrent/tt-metal/tree/main/models/demos/wormhole/llama31_8b)) as a working PoC
- Compared to other models:
  - Compared to T5, Llama is a decoder-based architecture (T5 is encoder-decoder based). This makes it simpler/quicker to develop as the first generative model, as there are no encoder layers or additional cross-attention blocks we need to run
  - Compared to Falcon, it has more streamlined shapes that are easier for our backend to handle, e.g. it doesn't contain (if any) problematic prime-number dimensions
  - Also compared to Falcon, it uses standard positional embeddings (Falcon uses rotary embeddings), which makes our development of the PoC a bit simpler
  - Compared to ResNet and ViT, Llama doesn't contain any convolution layers, which removes one more constraint that we need to care about during the first generative model bringup

All in all, the main goal for the initial PoC is to reduce unwanted complexity and focus on the main building blocks for generative models. Of course, we need to keep flexibility in mind in order to support other components that are part of more complex models.

_Note:_ The exact variant is still to be determined. Two candidates are:
- [Llama v2 3B](https://huggingface.co/openlm-research/open_llama_3b_v2)
- [Llama v3.1 8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) - has WH B0 single-chip support on [tt-metal](https://github.com/tenstorrent/tt-metal/tree/main/models/demos/wormhole/llama31_8b)
No due date • 4/4 issues closed
Bootstrap e2e testing infrastructure and setup. For each new model we bring up, we should have stable e2e testing through the whole compiler and metal backend. We should run only the ops/models/etc. that are supported in each compiler and backend component.
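A minimal sketch of what one such e2e check could look like, assuming a hypothetical `forge.compile(model, inputs)` entry point that returns a runnable compiled module - the real tt-forge API, module path, and tolerances may differ. The compiled output is compared against the framework golden.

```python
import pytest
import torch

# `forge` is used as a hypothetical compile entry point here; skip if unavailable.
forge = pytest.importorskip("forge")


@pytest.mark.parametrize("batch_size", [1, 8])
def test_linear_relu_e2e(batch_size):
    model = torch.nn.Sequential(torch.nn.Linear(32, 10), torch.nn.ReLU()).eval()
    inputs = torch.randn(batch_size, 32)

    golden = model(inputs)                   # framework reference output
    compiled = forge.compile(model, inputs)  # hypothetical: compiler + metal backend
    output = compiled(inputs)                # hypothetical: run the compiled module

    assert torch.allclose(output, golden, rtol=1e-2, atol=1e-2)
```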
No due date • 2/2 issues closed
TT-Forge has a number of optimization steps that are based on the older backend. Therefore, we should plan corresponding changes depending on model priorities/requirements. That is, during model bringup, when we hit a certain issue with a defined optimization pass, we should have context around which pass is expected to go where.
No due date
Bootstrap tt-forge testing infrastructure and setup. The goal of this milestone is to define how tt-forge tests should be defined and organized, as well as to create an initial setup for key components. A few samples of key sections:
- Component tests (cover tt-forge as a standalone component):
  - Op-specific tests (e.g. matmul)
  - Feature-focus tests (e.g. data formats)
  - Model-based tests (e.g. MNIST Linear)

_Note:_ E2E tests (which cover functionality of other tt-forge related components, e.g. tt-mlir) should be part of the Inference vertical, as they'll be used for broader testing, releasing, uplifting, etc.
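One possible way to encode that organization (an illustrative pytest layout with made-up marker names, not an adopted convention) is to register one marker per component-test category, so op, feature, and model suites can be selected independently.

```python
# --- conftest.py (illustrative) ---
def pytest_configure(config):
    for category in ("op", "feature", "model"):
        config.addinivalue_line("markers", f"{category}: tt-forge component test category")


# --- test_ops_matmul.py (illustrative op-specific test) ---
import pytest
import torch


@pytest.mark.op
@pytest.mark.parametrize("m,k,n", [(32, 32, 32), (64, 128, 32)])
def test_matmul_against_golden(m, k, n):
    a, b = torch.randn(m, k), torch.randn(k, n)
    golden = a @ b  # torch reference
    # The compiled-path result would be compared here; the golden stands in for the sketch.
    assert torch.allclose(golden, torch.matmul(a, b))
```

Feature-focus and model-based tests would get their own markers in the same way, and a single category could then be run with e.g. `pytest -m op`.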
No due date • 6/6 issues closed
Bootstrap tt-forge docs. The goal of this milestone is to set up the initial version of the tt-forge documentation that will help internal teams quickly onboard with the new Forge compiler. Doc topics include (but are not limited to):
- Build/Setup
- First steps for running ops/models e2e
- Quick links to the main dependency components and corresponding docs
No due date • 4/4 issues closed
Focus on moving from private to public GitHub repo for the tt-forge project.
No due date • 17/17 issues closed
Get the training basics built out.
No due date • 9/9 issues closed
Get inference working e2e on MNIST.