[perf] feat: GPT-OSS MFU compute support #4750
base: main
Conversation
Add the `_estimate_gpt_oss_flops` function to calculate the FLOPs of GPT-OSS models, supporting standard attention, sliding-window attention, and MoE layers. Update the test cases to verify the calculation accuracy for GPT-OSS models.
…T-OSS model: revise comments and configuration parameters to more accurately reflect the FLOPs calculation of the GPT-OSS model; update configurations and comments in test cases to match the implementation logic.
Code Review
This pull request adds support for calculating MFU for GPT-OSS models by introducing the _estimate_gpt_oss_flops function and corresponding test cases. The implementation for the new model appears correct. However, I found an issue in the updated test data for the existing gemma3_text model, where the seqlen_square_sum and subsequent FLOPs calculations seem incorrect for one of the test cases. I've provided a detailed comment and a suggestion to fix it. Please review this finding.
```python
# seqlen_square_sum: 1373634560 (calculated with sliding window logic)
# attn flops: 12 * 1373634560 * 256 * 16 = 67515029389312
# total: 1009005293453312 / 1e12 = 1009.005293453312
"expected_flops_tuple": (283517065887744 / 1e12, 1009005293453312 / 1e12),
```
There seems to be a miscalculation in the updated test values for gemma3_text for the [4096, 4096, 4096] batch case.
While the dense_flops value was correctly fixed, the seqlen_square_sum appears incorrect. My calculation shows it should be 905,969,664, which matches the original value in the file, not the new value of 1,373,634,560.
Here's a breakdown of the calculation for seqlen_square_sum:
- The model has 48 layers with a sliding-window pattern of 6, resulting in 8 full-attention layers and 40 sliding-window layers.
- The sliding window size is 1024.
- For the 40 sliding-window layers: 40 * (3 * 4096 * 1024) = 503,316,480
- For the 8 full-attention layers: 8 * (3 * 4096 * 4096) = 402,653,184
- Total seqlen_square_sum: 503,316,480 + 402,653,184 = 905,969,664
This error propagates to attn_flops, total, and expected_flops_tuple. The suggested change below corrects these values.
Suggested change:
```diff
- # seqlen_square_sum: 1373634560 (calculated with sliding window logic)
- # attn flops: 12 * 1373634560 * 256 * 16 = 67515029389312
- # total: 1009005293453312 / 1e12 = 1009.005293453312
- "expected_flops_tuple": (283517065887744 / 1e12, 1009005293453312 / 1e12),
+ # seqlen_square_sum: 905969664 (calculated with sliding window logic)
+ # attn flops: 12 * 905969664 * 256 * 16 = 44530220924928
+ # total: 941490264064000 + 44530220924928 = 986020484988928
+ "expected_flops_tuple": (283517065887744 / 1e12, 986020484988928 / 1e12),
```
What does this PR do?
Add the `_estimate_gpt_oss_flops` function to calculate the FLOPs of GPT-OSS models, supporting standard attention, sliding-window attention, and MoE layers. Update the test cases to verify the calculation accuracy for GPT-OSS models.
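To make the description concrete, below is a minimal sketch of what such an estimator can look like. It is not the PR's implementation: the config field names are assumed to follow the Hugging Face GPT-OSS config, the half-sliding/half-full layer split and the 6x-params / 12x-attention-score accounting are common MFU conventions, and the real function may differ in signature and detail.

```python
# Hypothetical sketch of a GPT-OSS FLOPs estimator; not the PR's actual code.
# Accounting convention: 6 * active_params * tokens for dense matmuls
# (forward + backward), plus 12 * sum(seqlen * effective_seqlen) * head_dim
# * num_heads for the attention-score matmuls.
from dataclasses import dataclass

@dataclass
class GptOssConfig:  # subset of fields, assumed from the HF GPT-OSS config
    hidden_size: int
    num_hidden_layers: int
    num_attention_heads: int
    num_key_value_heads: int
    head_dim: int
    intermediate_size: int
    num_experts_per_tok: int
    sliding_window: int
    vocab_size: int

def estimate_gpt_oss_flops(cfg: GptOssConfig, seqlens: list[int]) -> float:
    tokens = sum(seqlens)
    h, d = cfg.hidden_size, cfg.head_dim

    # Per-layer matmul params: Q/K/V/O projections plus the active MoE experts
    # (only num_experts_per_tok experts run per token; gate/up/down matrices).
    qkvo = h * d * (2 * cfg.num_attention_heads + 2 * cfg.num_key_value_heads)
    moe = cfg.num_experts_per_tok * 3 * h * cfg.intermediate_size
    # LM head contributes vocab_size * h; embedding lookups are ~free.
    dense_params = cfg.num_hidden_layers * (qkvo + moe) + cfg.vocab_size * h
    dense_flops = 6 * dense_params * tokens

    # GPT-OSS alternates sliding-window and full-attention layers;
    # assume an even split here.
    full_layers = cfg.num_hidden_layers // 2
    sliding_layers = cfg.num_hidden_layers - full_layers
    seqlen_square_sum = (
        full_layers * sum(s * s for s in seqlens)
        + sliding_layers * sum(s * min(s, cfg.sliding_window) for s in seqlens)
    )
    attn_flops = 12 * seqlen_square_sum * d * cfg.num_attention_heads

    return (dense_flops + attn_flops) / 1e12  # TFLOPs
```

For example, `estimate_gpt_oss_flops(cfg, [4096, 4096])` would return the estimated TFLOPs for one forward-plus-backward pass over two 4096-token sequences; the sketch only covers the dominant matmul terms.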
Checklist Before Starting
- Title format: `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`, like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If the PR breaks any API, add `[BREAKING]` to the beginning of the title, e.g. `[BREAKING][fsdp, megatron] feat: dynamic batching`
Test
The following is the output of running the test file.

Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- Apply pre-commit checks: `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- Send a message in the `ci-request` channel in the `verl` Slack workspace. (If not accessible, please try the Feishu group (飞书群).)