Conversation

winglian commented Dec 5, 2025

This PR makes it possible to use a snippet like the one below as a drop-in replacement for all the Qwen2-based MoEs in transformers v5. The original ScatterMoEGatedMLP isn't directly usable there, since it expects a router attribute instead of gate, and input_linear/output_linear instead of the gate_up_proj/down_proj used in the v5 modeling code.

# Imports assume the Hugging Face `kernels` package and transformers v5.
from kernels import LayerRepository, Mode, register_kernel_mapping, replace_kernel_forward_from_hub
from transformers.models.qwen2_moe.modeling_qwen2_moe import Qwen2MoeSparseMoeBlock

# Point the "HFScatterMoEParallelExperts" kernel name at the Hub layer for both training and inference.
register_kernel_mapping({
    "HFScatterMoEParallelExperts": {
        "cuda": {
            Mode.TRAINING: LayerRepository(
                repo_id="axolotl-ai-co/scattermoe",
                layer_name="HFScatterMoEGatedMLP",
            ),
            Mode.INFERENCE: LayerRepository(
                repo_id="axolotl-ai-co/scattermoe",
                layer_name="HFScatterMoEGatedMLP",
            ),
        },
    }
})

# Mark transformers' Qwen2 MoE block so its forward is replaced by the kernel above when kernels are enabled.
replace_kernel_forward_from_hub(Qwen2MoeSparseMoeBlock, "HFScatterMoEParallelExperts")
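
As a rough end-to-end sketch (the checkpoint name here is illustrative, not something tested in this PR), the mapping is then picked up by loading the model with use_kernels=True:

from transformers import AutoModelForCausalLM

# Any qwen2_moe-based checkpoint should behave the same; this name is just an example.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-MoE-A2.7B",
    use_kernels=True,  # forwards of Qwen2MoeSparseMoeBlock now go through HFScatterMoEGatedMLP
)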

winglian requested a review from MekkCyber as a code owner on December 5, 2025, 14:45
MekkCyber (Collaborator) commented

cc @shawntan

danieldk (Member) commented Dec 5, 2025

Hi, @shawntan, any chance you could review this PR?

winglian (Author) commented Dec 5, 2025

Here's a quick SFT test of olmoe.

[screenshot: results of the OLMoE SFT test run]

shawntan (Contributor) commented Dec 8, 2025

Yeah, this looks good and is a good idea, but have you tested it end-to-end with use_kernels=True?

I have an open issue with another community kernel: #76

winglian (Author) commented

Yes, this was tested with use_kernels=True.

shawntan (Contributor) commented Dec 12, 2025

Okay. I'm trying to test GraniteMoEHybrid with use_kernels=True to make sure these kernels work, but the SiLU kernel does not appear to work with non-contiguous tensors, and both GraniteMoEHybrid and Qwen2MoE seem to be affected. See this comment: #76 (comment)

The decision seems to be to assert hidden_states.is_contiguous(), which, as far as I understand, breaks both models.

UPDATE: confirmed that it breaks Qwen2MoE as well.
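
For illustration, a minimal sketch of the kind of input that trips the assert (the shapes and slicing pattern here are illustrative, not taken from the actual kernel):

import torch

# Slicing the last dimension of a fused projection output yields a non-contiguous view.
hidden_states = torch.randn(4, 16, 2 * 64)   # e.g. fused gate/up projection output
gate = hidden_states[..., :64]                # a view, not a copy
print(gate.is_contiguous())                   # False -> a kernel asserting contiguity rejects it

# An explicit copy restores contiguity, at the cost of extra memory traffic.
gate = gate.contiguous()
print(gate.is_contiguous())                   # True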

@winglian I'm not entirely sure how the error didn't show up when testing with use_kernels=True, but I'm interested to know how you got around it.

@MekkCyber FYI.
