[WebGPU] Add --enable-subgroups flag for optional subgroup support#3431

Open
ksgr5566 wants to merge 6 commits into mlc-ai:main from ksgr5566:enable-subgroups
Conversation

@ksgr5566

Summary

Adds --enable-subgroups CLI flag to mlc_llm compile command, enabling WebGPU subgroup shuffle primitives on supported devices.

Changes

  • Add --enable-subgroups argument to compile.py
  • Pass flag through to TVM target via _apply_webgpu_subgroups()
  • Set the supports_subgroups boolean on the target dict for TVM to handle

Details

When the flag is enabled, TVM's target parser automatically sets thread_warp_size=32
(the default is 1), which triggers warp reduction lowering. Without the flag, reductions
use shared memory barriers, which are universally compatible.

Related

Companion to the TVM PR (CharlieFRuan/tvm#1) that adds the gating logic. Works with both
the baseline (no flag) and subgroup-enabled (with flag) WASM variants.

Testing

  • Baseline: mlc_llm compile ... --device webgpu (no subgroups)
  • Enabled: mlc_llm compile ... --device webgpu --enable-subgroups (with subgroups)

Tested with Llama-3.2-1B-q4f16_1. The baseline (no flag) uses shared memory reductions;
with the flag, the compiler generates subgroupShuffle* ops.
Both the generated WGSLs here: https://gist.github.com/ksgr5566/301664a5dda3e46f44092be4d09b2d4f

@gemini-code-assist

Summary of Changes

Hello @ksgr5566, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new --enable-subgroups CLI flag to the mlc_llm compile command, allowing users to explicitly enable WebGPU subgroup shuffle primitives during compilation. This enhancement aims to improve performance on compatible WebGPU devices by leveraging warp reduction lowering in TVM, which otherwise defaults to shared memory barriers for broader compatibility.

Highlights

  • CLI Flag Addition: Introduced a new --enable-subgroups command-line interface flag to the mlc_llm compile command.
  • WebGPU Subgroup Support: Enabled WebGPU subgroup shuffle primitives on supported devices when the --enable-subgroups flag is used.
  • TVM Integration: Integrated the flag into the TVM target detection logic, setting supports_subgroups and triggering warp reduction lowering for WebGPU targets.

Changelog
  • python/mlc_llm/cli/compile.py
    • Added --enable-subgroups argument to the command-line parser.
    • Modified the call to detect_target_and_host to pass the enable_subgroups argument.
  • python/mlc_llm/interface/help.py
    • Added a new help entry for the --enable-subgroups flag, explaining its purpose for WebGPU targets.
  • python/mlc_llm/support/auto_target.py
    • Updated the detect_target_and_host function signature to accept an enable_subgroups boolean.
    • Introduced a new private helper function _apply_webgpu_subgroups to conditionally set the supports_subgroups property on the TVM target for WebGPU.
    • Integrated the call to _apply_webgpu_subgroups within detect_target_and_host.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a new --enable-subgroups command-line flag to allow enabling WebGPU subgroup shuffle primitives. The changes are well-implemented, adding the argument to the CLI, updating help text, and correctly passing the flag to the TVM target configuration. My review includes one suggestion to simplify the logic for applying this new setting, making it more concise and focused on the enabled case.


Copilot AI left a comment

Pull request overview

This pull request adds optional WebGPU subgroup support to the MLC LLM compilation pipeline through a new --enable-subgroups CLI flag. When enabled for WebGPU targets, this flag sets the supports_subgroups attribute on the TVM target, which triggers TVM to use subgroup shuffle primitives instead of shared memory barriers for reductions.

Changes:

  • Add --enable-subgroups CLI argument to the compile command
  • Thread the flag through detect_target_and_host() to apply WebGPU-specific target modifications
  • Add help documentation for the new flag

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
python/mlc_llm/cli/compile.py Adds --enable-subgroups argument and passes it to detect_target_and_host()
python/mlc_llm/interface/help.py Adds help text explaining the new flag's purpose
python/mlc_llm/support/auto_target.py Extends detect_target_and_host() signature and implements _apply_webgpu_subgroups() helper
Comments suppressed due to low confidence (1)

python/mlc_llm/support/auto_target.py:46

  • The docstring is missing documentation for the new enable_subgroups parameter. Please add a Parameters section entry describing this parameter, its type (Optional[bool]), default value (None), and what it does (e.g., "Whether to enable WebGPU subgroup support. When True, sets supports_subgroups on WebGPU targets. When None, no modification is made to the target.").
    """Detect the configuration for the target device and its host, for example, target GPU and
    the host CPU.

    Parameters
    ----------
    target_hint : str
        The hint for the target device.

    host_hint : str
        The hint for the host CPU, default is "auto".
    """


@MasterJH5574
Member

Hi @ksgr5566 thanks! The WASM CI is failing https://ci.mlc.ai/blue/organizations/jenkins/mlc-llm/detail/PR-3431/1/pipeline, could you check the failures?

Also cc @akaashrp for visibility.

@ksgr5566
Author

ksgr5566 commented Feb 25, 2026

Hi @MasterJH5574, this would require a TVM version change. I have raised a TVM PR against Charlie's subgroup branch (CharlieFRuan/tvm#1). Once that is merged, the TVM version can be updated here and the build should succeed.

@MasterJH5574
Member

@ksgr5566 I see. Should we upstream to https://github.com/apache/tvm instead of Charlie's fork?

return target, build_func


def _apply_webgpu_subgroups(target: Target, enable_subgroups: Optional[bool]) -> Target:
Contributor

Is it possible to add a test for this behavior (enable_subgroups flag causes target_dict to have a supports_subgroups key that is set to True)?

Contributor

Also, a non-WebGPU target with enable_subgroups=True should leave the target unchanged, and any target with enable_subgroups=False should leave the target unchanged.
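The cases requested above could be covered by a small pytest-style sketch. Plain dicts stand in for TVM targets and the helper is restated inline so the snippet reads standalone; a real test would import _apply_webgpu_subgroups from mlc_llm.support.auto_target and build actual Target objects:

```python
from typing import Optional


def _apply_webgpu_subgroups(target: dict, enable_subgroups: Optional[bool]) -> dict:
    # Stand-in for the real helper, which operates on tvm.target.Target.
    if enable_subgroups and target.get("kind") == "webgpu":
        return {**target, "supports_subgroups": True}
    return target


def test_webgpu_flag_sets_supports_subgroups():
    # enable_subgroups=True on a WebGPU target sets the key to True.
    out = _apply_webgpu_subgroups({"kind": "webgpu"}, True)
    assert out["supports_subgroups"] is True


def test_non_webgpu_target_unchanged():
    # enable_subgroups=True on a non-WebGPU target is a no-op.
    assert _apply_webgpu_subgroups({"kind": "cuda"}, True) == {"kind": "cuda"}


def test_disabled_flag_leaves_target_unchanged():
    # enable_subgroups=False (or None) is a no-op on any target.
    assert _apply_webgpu_subgroups({"kind": "webgpu"}, False) == {"kind": "webgpu"}
    assert _apply_webgpu_subgroups({"kind": "webgpu"}, None) == {"kind": "webgpu"}
```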
