Due by May 31, 2025
Getting ResNet 50 to 10% of the hand-tuned tt-metal peak performance - about 470 FPS on Wormhole.
Due by June 28, 2025 • 4/5 issues closed
Coordination layer responsible for identifying, delegating, and tracking problematic ops within the compiler.
No due date • 55/64 issues closed
With previous milestones we've implemented basic runtime functionality, enabling running both inference and training. The goal for this milestone is to redesign and refactor the existing runtime into a more robust and performant solution. Some of the specific goals in mind:
- support for runtime stitching, i.e. leaving tensors on device and reusing them when executing the next iteration of a program or as inputs to a different program
- introduce our own tensor abstraction, which will be used to abstract host/device tensors and also track info related to running training loops
- move more of the code to C++
- add sanity functional tests targeting specific features
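The sketch below is a rough Python illustration of the two runtime ideas above - a wrapper that tracks where a tensor lives, and reuse of device-resident tensors across program invocations. The `RuntimeTensor`, `FakeDevice`, and `run_program` names are hypothetical and do not reflect the actual runtime API (which is moving to C++).

```python
import numpy as np


class RuntimeTensor:
    """Hypothetical wrapper that tracks whether data lives on host or device."""

    def __init__(self, host_data: np.ndarray):
        self.host_data = host_data
        self.device_handle = None  # opaque handle once uploaded

    def to_device(self, device: "FakeDevice") -> "RuntimeTensor":
        if self.device_handle is None:  # upload only once, then reuse
            self.device_handle = device.upload(self.host_data)
        return self

    def to_host(self, device: "FakeDevice") -> np.ndarray:
        return device.download(self.device_handle)


class FakeDevice:
    """Stand-in device: 'uploaded' buffers are just kept in a dict."""

    def __init__(self):
        self._buffers = {}

    def upload(self, data: np.ndarray) -> int:
        handle = len(self._buffers)
        self._buffers[handle] = data.copy()
        return handle

    def download(self, handle: int) -> np.ndarray:
        return self._buffers[handle]

    def run_program(self, weights: RuntimeTensor, activations: RuntimeTensor) -> RuntimeTensor:
        # A real program would run compiled kernels; a matmul stands in here.
        out = self._buffers[weights.device_handle] @ self._buffers[activations.device_handle]
        result = RuntimeTensor(out)
        result.device_handle = self.upload(out)  # output stays on "device"
        return result


device = FakeDevice()
weights = RuntimeTensor(np.random.rand(4, 4)).to_device(device)  # uploaded once
x = RuntimeTensor(np.random.rand(4, 1)).to_device(device)

# "Runtime stitching": weights are reused across iterations and each output
# feeds the next program invocation without a host round-trip.
for _ in range(3):
    x = device.run_program(weights, x)
print(x.to_host(device).shape)  # (4, 1)
```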
This milestone focuses on cleaning up the TVM repository by removing `forge-fe`-specific compile logic and shifting it back into `forge-fe`. This decoupling will enable us to uplift TVM to the latest version that still supports RELAY, as the current main branch has deprecated RELAY in favor of RELAX (Relay Next). This is a preparatory step for a future transition from RELAY to RELAX, ensuring a smoother migration path while maintaining compatibility in the interim.
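For context, here is a minimal sketch of the Relay-based import path this milestone keeps working, assuming a TVM version that still ships the Relay PyTorch frontend; the toy model, input name, and shapes are illustrative, and the exact entry point `forge-fe` uses may differ.

```python
import torch
from tvm import relay  # requires a TVM build that still includes Relay

# Illustrative model only; forge-fe's real import path through TVM may differ.
model = torch.nn.Sequential(torch.nn.Linear(32, 10), torch.nn.ReLU()).eval()
example_input = torch.randn(1, 32)

# The Relay PyTorch frontend expects a traced module plus (name, shape) pairs.
scripted = torch.jit.trace(model, example_input)
mod, params = relay.frontend.from_pytorch(scripted, [("input0", (1, 32))])

print(mod)  # Relay IRModule; a later migration would target RELAX instead
```

A future RELAY-to-RELAX move would swap this frontend call for the Relax importer while keeping the surrounding `forge-fe` flow intact.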
Single-chip training of a Llama-like small LLM using the LoRA technique.
No due date • 2/2 issues closed
Backward pass through a Llama model. OpenLLaMA 3B or Llama 3.2 1B - TBD.
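As a rough illustration of the LoRA idea referenced in these milestones (not the actual tt-forge training flow), the sketch below adds a low-rank adapter to a linear layer and runs a backward pass in which only the adapter parameters receive gradients; the class and parameter names are made up for the example.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # base weights stay frozen
        self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))  # zero-init so training starts at W
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)


layer = LoRALinear(in_features=64, out_features=64)
loss = layer(torch.randn(4, 64)).pow(2).mean()
loss.backward()  # backward pass populates gradients only for the LoRA params

print(layer.lora_a.grad is not None)   # True
print(layer.base.weight.grad is None)  # True - frozen base gets no gradient
```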
Core operations support is required for the ResNet 50 model. List of ops that are currently lowered through tt-forge (up to emit to TTIR):
- add - already supported
- hstack - potentially not needed
- matmul - potentially not needed
- narrow - potentially not needed
- pad_tile - potentially not needed
- reduce_avg - required support on Forge, MLIR has it
- reduce_max - required support on Forge and MLIR
- relu - required support on Forge, MLIR has it
- sparse_matmul - potentially not needed
- squeeze - potentially not needed, used after avg pool to reduce dim to the appropriate output
- transpose - currently WIP
- vslice - potentially not needed

Most of the ops come from conv2d decompositions, so bringup for them is probably redundant. Some of them are:
- hstack
- matmul
- narrow
- pad_tile
- vslice
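For context on where this op set comes from (a hedged PyTorch illustration, not the compiler's actual decomposition), a simplified residual block plus classifier head already exercises most of the list: conv2d, whose decomposition yields the hstack/narrow/pad_tile/sparse_matmul/vslice family, along with add, relu, reduce_avg for the global average pool, squeeze/transpose-style reshapes, and a final matmul.

```python
import torch
import torch.nn as nn


class BasicBlock(nn.Module):
    """Simplified ResNet-style block: conv -> relu -> conv -> residual add -> relu."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = torch.relu(self.conv1(x))  # conv2d (source of the decomposition ops) + relu
        y = self.conv2(y)
        return torch.relu(y + x)       # add + relu


block = BasicBlock(channels=8)
head = nn.Linear(8, 10)               # final classifier matmul

x = torch.randn(1, 8, 14, 14)
y = block(x)
y = y.mean(dim=(2, 3))                # reduce_avg (global average pool), drops H/W dims
logits = head(y)                      # matmul
print(logits.shape)                   # torch.Size([1, 10])
```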
No due date • 16/16 issues closed
Core operations support is required for the Llama 3B model. List of ops that are currently lowered through tt-forge (up to emit to TTIR):
- Add - already supported e2e
- Concatenate - required support on Forge and MLIR
- Embedding - required support on Forge and MLIR
- Hslice - should be removed from the model
- Hstack - should be removed from the model
- Matmul - required support on Forge, MLIR has it
- Multiply - already supported e2e
- Narrow - required via reshape op for both Forge and MLIR
- Pad_tile - potentially redundant
- Reciprocal - required support on Forge and MLIR
- Reduce_avg - required support on Forge, MLIR has it
- Sigmoid - required support on Forge and MLIR
- Softmax - already supported e2e
- Sparse_matmul - should be removed from the model
- Sqrt - required support on Forge and MLIR
- Squeeze - required via reshape op for both Forge and MLIR
- Tile_broadcast - potentially redundant
- Transpose - currently WIP
- Unsqueeze - required via reshape op for both Forge and MLIR

Also, some of the basic Llama 3B building blocks that should be supported:
- Embeddings
- Self-attention
- MLP
- RMS Norm
- LM head
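To connect the op list to the building blocks (a rough PyTorch sketch following common Llama conventions, not tt-forge code), RMS Norm alone exercises multiply, reduce_avg, sqrt, and reciprocal, while a gated MLP exercises matmul, sigmoid, and multiply.

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """RMS normalization: x * 1/sqrt(mean(x^2) + eps) * weight."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mean_sq = (x * x).mean(dim=-1, keepdim=True)                # multiply + reduce_avg
        inv_rms = torch.reciprocal(torch.sqrt(mean_sq + self.eps))  # sqrt + reciprocal
        return x * inv_rms * self.weight                            # multiply


class GatedMLP(nn.Module):
    """Llama-style MLP: down( silu(gate(x)) * up(x) )."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)  # matmul
        self.up = nn.Linear(dim, hidden, bias=False)    # matmul
        self.down = nn.Linear(hidden, dim, bias=False)  # matmul

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.gate(x)
        g = g * torch.sigmoid(g)          # sigmoid + multiply (SiLU activation)
        return self.down(g * self.up(x))  # multiply + matmul


x = torch.randn(1, 16, 64)                # (batch, sequence, hidden dim)
y = GatedMLP(dim=64, hidden=128)(RMSNorm(64)(x))
print(y.shape)                            # torch.Size([1, 16, 64])
```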
No due date • 45/50 issues closed
Inference bringup of a Llama model (exact variant TBD). After the initial inference PoC with the linear MNIST model, the next step is to provide a PoC for one of the larger NLP/vision models. Out of the list of proposed models:
- Llama 3B
- Falcon 7B
- T5
- ResNet 50
- ViT

the potential next candidate is the LLM generative model, *Llama*. The main reasons for choosing this model are:
- General points:
  - Good baseline for building other transformer-based LLM generative models after this PoC (common decoder-based architecture)
  - Availability of smaller model sizes - 3B/8B parameters - giving more flexibility around initial single-chip bringup
  - Contains a valid manual tt-metal-based implementation ([ref link](https://github.com/tenstorrent/tt-metal/tree/main/models/demos/wormhole/llama31_8b)) as a working PoC
- Compared to other models:
  - Compared to T5, Llama is a decoder-based architecture (T5 is encoder-decoder based). This makes it simpler/quicker to develop as the first generative model, as there are no encoder layers or additional cross-attention blocks we need to run
  - Compared to Falcon, it has more streamlined shapes that are easier for our backend to handle, e.g. it doesn't contain (if any) problematic prime-number dimensions
  - Also compared to Falcon, it uses standard positional embeddings (Falcon uses rotary embeddings), which makes our development of the PoC a bit simpler
  - Compared to ResNet and ViT, Llama doesn't contain any convolution layers, which removes one more constraint that we need to care about during the first generative model bringup

All in all, the main goal for the initial PoC is to reduce unwanted complexity and focus on the main building blocks for generative models. Of course, we need to keep flexibility in mind in order to support other components that are part of more complex models.

_Note:_ The exact variant is still to be determined. Two candidates are:
- [Llama v2 3B](https://huggingface.co/openlm-research/open_llama_3b_v2)
- [Llama v3.1 8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) - has WH B0 single-chip support on [tt-metal](https://github.com/tenstorrent/tt-metal/tree/main/models/demos/wormhole/llama31_8b)
No due date • 4/4 issues closed
Bootstrap e2e testing infrastructure and setup. For each new model we bring up, we should have stable e2e testing through the whole compiler and metal backend. We should run only the ops/models/etc. that are supported in each compiler and backend component.
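A minimal sketch of what one such e2e check could look like, assuming a hypothetical `forge.compile(model, inputs)` entry point that returns a runnable compiled module - the real tt-forge API, module path, and tolerances may differ. The compiled output is compared against the framework golden.

```python
import pytest
import torch

# `forge` is used as a hypothetical compile entry point here; skip if unavailable.
forge = pytest.importorskip("forge")


@pytest.mark.parametrize("batch_size", [1, 8])
def test_linear_relu_e2e(batch_size):
    model = torch.nn.Sequential(torch.nn.Linear(32, 10), torch.nn.ReLU()).eval()
    inputs = torch.randn(batch_size, 32)

    golden = model(inputs)                   # framework reference output
    compiled = forge.compile(model, inputs)  # hypothetical: compiler + metal backend
    output = compiled(inputs)                # hypothetical: run the compiled module

    assert torch.allclose(output, golden, rtol=1e-2, atol=1e-2)
```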
No due date • 2/2 issues closed
TT-Forge has a number of optimization steps that are based on the older backend. Therefore, we should plan corresponding changes depending on model priorities/requirements. That is, during model bringup, when we hit a certain issue with a defined optimization pass, we should have context around which pass is expected to go where.
No due date
Bootstrap tt-forge testing infrastructure and setup. The goal of this milestone is to define how tt-forge tests should be defined and organized, as well as to create an initial setup for key components. A few samples of key sections:
- Component tests (cover tt-forge as a standalone component):
  - Op-specific tests (e.g. matmul)
  - Feature-focus tests (e.g. data formats)
  - Model-based tests (e.g. MNIST Linear)

_Note:_ E2E tests (which cover functionality of other tt-forge related components, e.g. tt-mlir) should be part of the Inference vertical, as they'll be used for broader testing, releasing, uplifting, etc.
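One possible way to encode that organization (an illustrative pytest layout with made-up marker names, not an adopted convention) is to register one marker per component-test category, so op, feature, and model suites can be selected independently.

```python
# --- conftest.py (illustrative) ---
def pytest_configure(config):
    for category in ("op", "feature", "model"):
        config.addinivalue_line("markers", f"{category}: tt-forge component test category")


# --- test_ops_matmul.py (illustrative op-specific test) ---
import pytest
import torch


@pytest.mark.op
@pytest.mark.parametrize("m,k,n", [(32, 32, 32), (64, 128, 32)])
def test_matmul_against_golden(m, k, n):
    a, b = torch.randn(m, k), torch.randn(k, n)
    golden = a @ b  # torch reference
    # The compiled-path result would be compared here; the golden stands in for the sketch.
    assert torch.allclose(golden, torch.matmul(a, b))
```

Feature-focus and model-based tests would get their own markers in the same way, and a single category could then be run with e.g. `pytest -m op`.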
No due date • 6/6 issues closed
Bootstrap tt-forge docs. The goal of this milestone is to set up the initial version of the tt-forge documentation that will help internal teams quickly onboard with the new Forge compiler. Doc topics include (but are not limited to):
- Build/Setup
- First steps for running ops/models e2e
- Quick links to the main dependency components and corresponding docs
No due date • 4/4 issues closed
Focus on moving from private to public GitHub repo for the tt-forge project.
No due date • 17/17 issues closed
Get the training basics built out.
No due date • 9/9 issues closed
Get inference working e2e on MNIST.