[WebGPU] Add --enable-subgroups flag for optional subgroup support#3431

Open
ksgr5566 wants to merge 6 commits into mlc-ai:main from ksgr5566:enable-subgroups
Conversation

@ksgr5566

Summary

Adds --enable-subgroups CLI flag to mlc_llm compile command, enabling WebGPU subgroup shuffle primitives on supported devices.

Changes

  • Add --enable-subgroups argument to compile.py
  • Pass flag through to TVM target via _apply_webgpu_subgroups()
  • Set the supports_subgroups boolean on the target dict for TVM to handle

Details

When the flag is enabled, TVM's target parser automatically sets thread_warp_size=32
(the default is 1), which triggers warp reduction lowering. Without the flag, reductions
use shared memory barriers, which are universally compatible.

Related

Companion to the TVM PR (CharlieFRuan/tvm#1) that adds the gating logic. Works with both
the baseline (no flag) and subgroup-enabled (with flag) WASM variants.

Testing

  • Baseline: mlc_llm compile ... --device webgpu (no subgroups)
  • Enabled: mlc_llm compile ... --device webgpu --enable-subgroups (with subgroups)

Tested with Llama-3.2-1B-q4f16_1. The baseline (no flag) uses shared memory reductions;
with the flag, the compiler generates subgroupShuffle* ops.
Both the generated WGSLs here: https://gist.github.com/ksgr5566/301664a5dda3e46f44092be4d09b2d4f

@gemini-code-assist

Summary of Changes

Hello @ksgr5566, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new --enable-subgroups CLI flag to the mlc_llm compile command, allowing users to explicitly enable WebGPU subgroup shuffle primitives during compilation. This enhancement aims to improve performance on compatible WebGPU devices by leveraging warp reduction lowering in TVM, which otherwise defaults to shared memory barriers for broader compatibility.

Highlights

  • CLI Flag Addition: Introduced a new --enable-subgroups command-line interface flag to the mlc_llm compile command.
  • WebGPU Subgroup Support: Enabled WebGPU subgroup shuffle primitives on supported devices when the --enable-subgroups flag is used.
  • TVM Integration: Integrated the flag into the TVM target detection logic, setting supports_subgroups and triggering warp reduction lowering for WebGPU targets.

Changelog
  • python/mlc_llm/cli/compile.py
    • Added --enable-subgroups argument to the command-line parser.
    • Modified the call to detect_target_and_host to pass the enable_subgroups argument.
  • python/mlc_llm/interface/help.py
    • Added a new help entry for the --enable-subgroups flag, explaining its purpose for WebGPU targets.
  • python/mlc_llm/support/auto_target.py
    • Updated the detect_target_and_host function signature to accept an enable_subgroups boolean.
    • Introduced a new private helper function _apply_webgpu_subgroups to conditionally set the supports_subgroups property on the TVM target for WebGPU.
    • Integrated the call to _apply_webgpu_subgroups within detect_target_and_host.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a new --enable-subgroups command-line flag to allow enabling WebGPU subgroup shuffle primitives. The changes are well-implemented, adding the argument to the CLI, updating help text, and correctly passing the flag to the TVM target configuration. My review includes one suggestion to simplify the logic for applying this new setting, making it more concise and focused on the enabled case.


Copilot AI left a comment

Pull request overview

This pull request adds optional WebGPU subgroup support to the MLC LLM compilation pipeline through a new --enable-subgroups CLI flag. When enabled for WebGPU targets, this flag sets the supports_subgroups attribute on the TVM target, which triggers TVM to use subgroup shuffle primitives instead of shared memory barriers for reductions.

Changes:

  • Add --enable-subgroups CLI argument to the compile command
  • Thread the flag through detect_target_and_host() to apply WebGPU-specific target modifications
  • Add help documentation for the new flag

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
python/mlc_llm/cli/compile.py Adds --enable-subgroups argument and passes it to detect_target_and_host()
python/mlc_llm/interface/help.py Adds help text explaining the new flag's purpose
python/mlc_llm/support/auto_target.py Extends detect_target_and_host() signature and implements _apply_webgpu_subgroups() helper
Comments suppressed due to low confidence (1)

python/mlc_llm/support/auto_target.py:46

  • The docstring is missing documentation for the new enable_subgroups parameter. Please add a Parameters section entry describing this parameter, its type (Optional[bool]), default value (None), and what it does (e.g., "Whether to enable WebGPU subgroup support. When True, sets supports_subgroups on WebGPU targets. When None, no modification is made to the target.").
    """Detect the configuration for the target device and its host, for example, target GPU and
    the host CPU.

    Parameters
    ----------
    target_hint : str
        The hint for the target device.

    host_hint : str
        The hint for the host CPU, default is "auto".
    """


@MasterJH5574
Member

Hi @ksgr5566 thanks! The WASM CI is failing https://ci.mlc.ai/blue/organizations/jenkins/mlc-llm/detail/PR-3431/1/pipeline, could you check the failures?

Also cc @akaashrp for visibility.

@ksgr5566
Author

ksgr5566 commented Feb 25, 2026

Hi @MasterJH5574, this would require a TVM version change. I have raised a TVM PR against Charlie's subgroup branch (CharlieFRuan/tvm#1). Once that is merged, the TVM version can be updated here and the build should succeed.

@MasterJH5574
Member

@ksgr5566 I see. Should we upstream to https://github.com/apache/tvm instead of Charlie's fork?

return target, build_func


def _apply_webgpu_subgroups(target: Target, enable_subgroups: Optional[bool]) -> Target:
Contributor

Is it possible to add a test for this behavior (enable_subgroups flag causes target_dict to have a supports_subgroups key that is set to True)?

Contributor

Also, a non-WebGPU target with enable_subgroups=True should leave the target unchanged, and any target with enable_subgroups=False should leave the target unchanged.
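The cases requested above could be covered by a small pytest-style sketch. Plain dicts stand in for TVM targets and the helper is restated inline so the snippet reads standalone; a real test would import _apply_webgpu_subgroups from mlc_llm.support.auto_target and build actual Target objects:

```python
from typing import Optional


def _apply_webgpu_subgroups(target: dict, enable_subgroups: Optional[bool]) -> dict:
    # Stand-in for the real helper, which operates on tvm.target.Target.
    if enable_subgroups and target.get("kind") == "webgpu":
        return {**target, "supports_subgroups": True}
    return target


def test_webgpu_flag_sets_supports_subgroups():
    # enable_subgroups=True on a WebGPU target sets the key to True.
    out = _apply_webgpu_subgroups({"kind": "webgpu"}, True)
    assert out["supports_subgroups"] is True


def test_non_webgpu_target_unchanged():
    # enable_subgroups=True on a non-WebGPU target is a no-op.
    assert _apply_webgpu_subgroups({"kind": "cuda"}, True) == {"kind": "cuda"}


def test_disabled_flag_leaves_target_unchanged():
    # enable_subgroups=False (or None) is a no-op on any target.
    assert _apply_webgpu_subgroups({"kind": "webgpu"}, False) == {"kind": "webgpu"}
    assert _apply_webgpu_subgroups({"kind": "webgpu"}, None) == {"kind": "webgpu"}
```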
