
Conversation

@lanchongyizu commented Nov 2, 2025

Ticket

#1045

Problem description

See the details in the above ticket.

What's changed

* Add a `disable_load_params_once` option for model testing, since Falcon 7B
  is incompatible with `load_params_once`, which causes OOM.
* Convert the following torch ops to ttnn ops:

  ```json
  [
    {
      "op_name": "aten.squeeze.default",
      "op_schema": "Tensor<[1, 7]> self = ?"
    },
    {
      "op_name": "aten.native_layer_norm.default",
      "op_schema": "Tensor<[1, 7, 4544]> input = ?, List[int] normalized_shape = [4544], Optional[Tensor]<[4544]> weight = ?, Optional[Tensor]<[4544]> bias = ?, float eps = 1e-05"
    },
    {
      "op_name": "aten.index.Tensor",
      "op_schema": "Tensor<[1, 7, 73, 64]> self = ?, List[Optional[Tensor]] indices = [None, None, _folded_lift_fresh_copy]"
    },
    {
      "op_name": "aten.mul.Scalar",
      "op_schema": "Tensor<[1, 71, 7, 64]> self = ?, number other = 0.3535533905932738"
    }
  ]
  ```

* Enable `batch_size` 32 for Falcon 7B E2E testing
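Conceptually, each conversion above is a lookup from an aten op name to a ttnn counterpart applied during graph lowering. A minimal pure-Python sketch of that mapping; note the ttnn target names here are illustrative assumptions, not the project's actual lowering tables:

```python
# Hypothetical aten -> ttnn op-name mapping. The real conversions live in
# torch_ttnn's lowering passes; the ttnn names below are assumptions
# chosen to illustrate the lookup, not verified against the converter.
ATEN_TO_TTNN = {
    "aten.squeeze.default": "ttnn.squeeze",
    "aten.native_layer_norm.default": "ttnn.layer_norm",
    "aten.index.Tensor": "ttnn.embedding",  # placeholder target, for illustration
    "aten.mul.Scalar": "ttnn.multiply",
}

def lower_op(op_name: str) -> str:
    """Return the ttnn op a given aten op lowers to, or raise if unsupported."""
    try:
        return ATEN_TO_TTNN[op_name]
    except KeyError:
        raise NotImplementedError(f"no ttnn lowering for {op_name}") from None
```

An op left out of the table stays as a torch op (a "Torch Ops Remain" entry in the results below), which is why driving this count to 0 is the goal of the PR.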

OOM issue:
As mentioned by @kevinwuTT in another issue, Falcon 7B is incompatible with load_params_once. After load_params_once is disabled, the OOM issue disappears for both batch sizes 1 and 32.
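One plausible mechanism for the incompatibility (our assumption, not a diagnosis confirmed in the thread): caching the weights on device across runs can leave a second copy of the ~7B parameters resident alongside the copy a compiled run materializes. A toy memory model makes the arithmetic concrete, with invented sizes:

```python
# Toy model of peak device memory. This is a sketch under an assumed
# mechanism (a cached weight copy coexisting with the run's own copy);
# the real load_params_once behavior lives in the test harness.
def peak_device_bytes(param_bytes: int, activation_bytes: int,
                      load_params_once: bool) -> int:
    """Return assumed peak device bytes for one inference run."""
    # With caching on, two weight copies coexist at peak; with it off,
    # only the run's own copy plus activations.
    weight_copies = 2 if load_params_once else 1
    return weight_copies * param_bytes + activation_bytes
```

With ~14 GB of bf16 weights and a couple of GB of activations, the cached variant would peak around 30 GB versus 16 GB without caching, enough to tip a device over its limit regardless of batch size.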

| Model | Status | Batch | Compiled First Run (ms) | Original Throughput (Inferences Per Second) | Compiled Throughput (Inferences Per Second) | Accuracy (%) | Torch Ops Before (Unique Ops) | Torch Ops Remain (Unique Ops) | To/From Device Ops |
|---|---|---|---|---|---|---|---|---|---|
| Falcon-7B | | 1 | 44009.8 | 0.384783 | 0.13878 | 98 | 2600 (27) | 0 (0) | 193 |
| Falcon-7B | | 32 | 50098.8 | 5.07062 | 4.27947 | 98 | 2696 (27) | 0 (0) | 193 |

@lanchongyizu force-pushed the falcon_7b branch 5 times, most recently from 6b12a16 to be97d01 on November 2, 2025 at 13:01
@marty1885 commented Nov 7, 2025

Ping @ayerofieiev-tt, could you get someone to look at this one?

@marty1885 marty1885 linked an issue Nov 18, 2025 that may be closed by this pull request
@jmalone-tt (Collaborator) left a comment

These changes look reasonable. Happy to merge if tests pass (kicked off a run here: https://github.com/tenstorrent/pytorch2.0_ttnn/actions/runs/19867910005)

@jmalone-tt (Collaborator) left a comment

Looks like it still hits OOM in the latest run. Have you verified that it runs on your end?
https://github.com/tenstorrent/pytorch2.0_ttnn/actions/runs/19867910005/job/57557971201

@lanchongyizu (Author) replied

> Looks like it still hits OOM in the latest run. Have you verified that it runs on your end? https://github.com/tenstorrent/pytorch2.0_ttnn/actions/runs/19867910005/job/57557971201

Sure. As mentioned in #1265 (comment), Falcon 7B is incompatible with `load_params_once`, so please add `--disable_load_params_once` to the pytest command line, as shown below.

```shell
python3 -m pytest --github-report tests/models/falcon/test_falcon.py --disable_load_params_once --report_nth_iteration=$num_iterations --export_code=accuracy --splits 57 --group 10 -s
```

@jmalone-tt (Collaborator) commented

We just merged another PR that was hitting the same issue. If you look at tests/models/mamba/test_mamba, you should see a new pytest fixture, load_params_once. Please rebase your changes and add that fixture; if tests pass, we should be good to merge.
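A minimal sketch of what such a fixture override might look like; this is hypothetical, and the real fixture shape should be copied from tests/models/mamba/test_mamba when rebasing:

```python
import pytest

# Hypothetical per-model override of the load_params_once fixture.
# Mirror the actual fixture in tests/models/mamba/test_mamba; the name
# comes from the thread, but this return-a-flag shape is an assumption.
@pytest.fixture
def load_params_once():
    # Reload parameters on every run instead of caching them on device,
    # sidestepping the Falcon 7B OOM.
    return False
```

Defining the fixture in the Falcon test module would shadow any repo-wide default for just these tests, so other models keep the caching behavior.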



Development

Successfully merging this pull request may close these issues.

[Bounty $1500] Get Falcon running E2E

3 participants