This issue is part of the ongoing models-team project to revamp our CI testing:
- Make tests easier to understand, contribute to, and fix
- Fill in any gaps in our test coverage
- Optimize the CI load (no redundant tests)
- And above all, make tests green.
Description
With the new 3-tier model system, we need to ensure that some tier 1 models have up-to-date OP device perf tests. To do this as cleanly as possible, we will move those tests into a new pipeline, since they require a Tracy build, CSV generation, script manipulation and upstream data.
Proposal
- Create new [Arch] Model OP Device Perf Tests pipelines
- Create the tests for some tier 1 models and add them to the new pipeline
- Jobs must be split by model
  - Different flavours of the same model (same codebase, different weights) should be split, e.g. Llama3-1B, 3B, 8B, etc.
    - This ensures that a failing model does not prevent the models scheduled after it from running
- Models must be organised in a 3-tier system
  - Tier 1, the most important models, will always contain all the mandatory tests
  - Tier 3, the least important models, will contain only a demo without any performance validation, to ensure that the model still runs on the latest SW version
- Filter system for every tier and model
  - A user should be able to quickly choose to run the whole pipeline, a specific tier group of models, or just a single model, depending on their needs.
    - This keeps CI from getting as clogged
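
As a rough illustration of the job split and filter system proposed above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `ModelJob` type, the `REGISTRY` contents, the tier assigned to each entry, and the `select_jobs` helper are hypothetical and do not reflect the actual pipeline implementation or the real model list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelJob:
    model: str    # model family (hypothetical identifier)
    flavour: str  # weight variant sharing the same codebase
    tier: int     # 1 = most important, 3 = least important


# Hypothetical registry: the real models and their tiers are not listed in this issue.
REGISTRY: List[ModelJob] = [
    ModelJob("llama3", "1B", 1),
    ModelJob("llama3", "3B", 1),
    ModelJob("llama3", "8B", 1),
    ModelJob("example_model", "base", 3),
]


def select_jobs(
    registry: List[ModelJob],
    tier: Optional[int] = None,
    model: Optional[str] = None,
    flavour: Optional[str] = None,
) -> List[ModelJob]:
    """Return one independent job per (model, flavour) that matches the filters.

    With no filters set, every job is returned (the whole pipeline).
    """
    return [
        job
        for job in registry
        if (tier is None or job.tier == tier)
        and (model is None or job.model == model)
        and (flavour is None or job.flavour == flavour)
    ]


def job_names(jobs: List[ModelJob]) -> List[str]:
    """Build a readable job name for each selected entry."""
    return [f"{job.model}-{job.flavour}" for job in jobs]


if __name__ == "__main__":
    print(job_names(select_jobs(REGISTRY)))            # whole pipeline
    print(job_names(select_jobs(REGISTRY, tier=1)))    # one tier group
    print(job_names(select_jobs(REGISTRY, model="llama3", flavour="8B")))  # one model
```

Because each flavour is its own entry, a CI system could turn the returned list into separate jobs, so that one failing flavour does not stop the others from running.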
Progress
- Present the proposal to the Infra team to get the green light
- Create the new pipelines (start with WH pipelines, then move to BH)
- Create the relevant model tests
- Add tests to the pipeline in the correct tier
Relevant tests
- WH Single card https://github.com/tenstorrent/tt-metal/actions/workflows/perf-device-models.yaml
- For WH Galaxy, these tests tend to live somewhere in the model demos pipelines; this needs to be double-checked