Soft-RFDETR: An optional Mixture of Experts Module for RF-DETR #245
LeosCtrt wants to merge 5 commits into roboflow:develop
Conversation
We are leveraging a pretrained backbone, so without robust pretrained weights we can't use a different architecture.

Additionally, have you looked at TensorRT (TRT) speed for this? Many of these types of architectures actually end up being meaningfully slower in terms of latency when compiled, due to memory bottlenecks.
Pull request overview
This PR adds an optional Soft Mixture of Experts (MoE) architecture to RF-DETR, based on the paper "From Sparse to Soft Mixtures of Experts". The implementation allows users to optionally enable MoE in the transformer decoder layers to potentially improve model accuracy, though with increased computational cost.
Changes:
- Added a new `FFNBlock` class to encapsulate the feedforward network logic needed by the MoE layer
- Modified `TransformerDecoderLayer` to conditionally use either a standard FFN or a MoE-wrapped FFN based on configuration
- Added `MoE` and `MoE_params` configuration fields to `ModelConfig`
- Added `soft_moe` as a new required dependency
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 19 comments.
| File | Description |
|---|---|
| rfdetr/models/transformer.py | Adds FFNBlock class and modifies TransformerDecoderLayer to support optional MoE architecture with conditional FFN replacement |
| rfdetr/config.py | Adds MoE configuration fields (MoE boolean flag and MoE_params list) to ModelConfig |
| pyproject.toml | Adds soft_moe library as a required dependency |
```python
from soft_moe import SoftMoELayerWrapper
import torch.nn.functional as F
from torch import nn, Tensor
```
The import statement for soft_moe should be placed after the standard library imports and before the local imports, following PEP 8 import order conventions. Currently it's between torch and torch.nn.functional imports. Additionally, since this is a conditional dependency (only used when MoE=True), consider either: 1) Making it an optional dependency in pyproject.toml and importing it conditionally within the init method where it's used, or 2) Documenting that soft_moe is required for MoE functionality.
Suggested change:
```diff
-from soft_moe import SoftMoELayerWrapper
 import torch.nn.functional as F
 from torch import nn, Tensor
+from soft_moe import SoftMoELayerWrapper
```
```python
class FFNBlock(nn.Module):
    def __init__(self, d_model, dim_feedforward, dropout):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, dim_feedforward),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(dim_feedforward, d_model),
        )

    def forward(self, x):
        return self.net(x)
```
The FFNBlock class is missing mandatory type hints for all parameters and the return type, as required by the coding guidelines. Additionally, it lacks a Google-style docstring explaining its purpose, parameters, and return value. This is required for all new classes according to CONTRIBUTING.md.
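A minimal sketch of what that could look like, with type hints and a Google-style docstring added to the FFNBlock shown above (this rewrite is illustrative, not part of the PR):

```python
from torch import nn, Tensor


class FFNBlock(nn.Module):
    """Feedforward block used as the expert layer inside the Soft MoE wrapper.

    Args:
        d_model: Dimension of the input and output embeddings.
        dim_feedforward: Hidden dimension of the feedforward network.
        dropout: Dropout probability applied after the activation.
    """

    def __init__(self, d_model: int, dim_feedforward: int, dropout: float) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, dim_feedforward),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(dim_feedforward, d_model),
        )

    def forward(self, x: Tensor) -> Tensor:
        """Apply the feedforward network and return a tensor of the same shape as `x`."""
        return self.net(x)
```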
```python
                 skip_self_attn=False, MoE=False, MoE_params=[32,1]):
        super().__init__()
```
Using mutable default arguments (list) is a Python anti-pattern that can lead to unexpected behavior. The default value MoE_params=[32,1] should be replaced with None and then initialized inside the function.
Suggested change:
```diff
-                 skip_self_attn=False, MoE=False, MoE_params=[32,1]):
+                 skip_self_attn=False, MoE=False, MoE_params=None):
         super().__init__()
+        if MoE_params is None:
+            MoE_params = [32, 1]
```
```python
        tgt = tgt + self.dropout2(tgt2)
        tgt = self.norm2(tgt)
        tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt))))
        if self.MoE == True:
```
Comparing boolean values with == True is not Pythonic. Use if self.MoE: instead of if self.MoE == True:.
Suggested change:
```diff
-        if self.MoE == True:
+        if self.MoE:
```
```python
    group_detr: int = 13
    gradient_checkpointing: bool = False
    MoE: bool = False
    MoE_params: List[int] = [32, 1]
```
Using a mutable default value (list) in Pydantic model fields can lead to unexpected behavior. Use Field(default_factory=lambda: [32, 1]) instead of MoE_params: List[int] = [32, 1] to ensure each instance gets a fresh list.
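A minimal sketch of the relevant fields under that suggestion, assuming ModelConfig is a Pydantic BaseModel as the PR overview indicates (other fields omitted):

```python
from typing import List

from pydantic import BaseModel, Field


class ModelConfig(BaseModel):
    group_detr: int = 13
    gradient_checkpointing: bool = False
    MoE: bool = False
    # default_factory ensures every ModelConfig instance gets its own fresh list
    MoE_params: List[int] = Field(default_factory=lambda: [32, 1])
```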
```python
from typing import Optional

import torch
from soft_moe import SoftMoELayerWrapper
```
The soft_moe dependency is imported unconditionally at the module level, but it's only used when MoE is enabled. This means the library will be required even for users who don't use the MoE feature. Consider making this import conditional to avoid forcing all users to install this dependency. One approach is to import it inside the __init__ method of TransformerDecoderLayer when MoE=True, with a helpful error message if the import fails directing users to install it.
Suggested change:
```diff
-from soft_moe import SoftMoELayerWrapper
+try:
+    from soft_moe import SoftMoELayerWrapper
+except ImportError:  # soft_moe is optional and only needed for MoE
+    class SoftMoELayerWrapper:  # type: ignore[no-redef]
+        """Placeholder wrapper used when soft_moe is not installed.
+
+        This is instantiated only if MoE functionality is enabled. If you hit this
+        error, install the optional `soft_moe` dependency:
+
+            pip install soft-moe
+        """
+
+        def __init__(self, *args, **kwargs) -> None:
+            raise ImportError(
+                "soft_moe is required for MoE support in TransformerDecoderLayer. "
+                "Please install the optional dependency with:\n\n"
+                "    pip install soft-moe\n"
+            )
```
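An alternative sketch of the lazy-import approach mentioned in the comment above, as a hypothetical helper (not part of the PR) that defers the import until MoE is actually requested:

```python
def _import_soft_moe_wrapper():
    """Hypothetical helper: import SoftMoELayerWrapper only when MoE is enabled."""
    try:
        from soft_moe import SoftMoELayerWrapper
    except ImportError as exc:
        raise ImportError(
            "MoE=True requires the optional `soft_moe` package. "
            "Install it with: pip install soft-moe"
        ) from exc
    return SoftMoELayerWrapper
```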
```python
        if self.MoE == True:
            print("\n" + "="*80)
            print("Loading Mixture of Expert (MoE) Architecture")
            print("="*80)
            print(f"Experts Count : {MoE_params[0]}")
            print(f"Slots per Expert : {MoE_params[1]}")
            print("-"*80)
            print("Warning: This custom architecture prevents loading full pretrained weights.")
            print("Note : It may be slightly slower but could improve accuracy.")
            print("="*80 + "\n")

            self.moe_layer = SoftMoELayerWrapper(
                dim=d_model,
                num_experts=MoE_params[0],
                slots_per_expert=MoE_params[1],
                layer=FFNBlock,
                d_model=d_model,
                dim_feedforward=dim_feedforward,
                dropout=dropout
            )
```
There's no validation for the MoE_params parameter. The code assumes it's a list with exactly 2 elements and directly accesses MoE_params[0] and MoE_params[1]. If the user provides an invalid value (e.g., empty list, single element, non-list), this will raise an IndexError. Add validation to ensure MoE_params contains exactly 2 positive integers before using them.
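A minimal validation sketch along those lines, as a hypothetical helper (not part of the PR):

```python
def _validate_moe_params(MoE_params) -> None:
    """Hypothetical check: MoE_params must be [num_experts, slots_per_expert]."""
    if (
        not isinstance(MoE_params, (list, tuple))
        or len(MoE_params) != 2
        or not all(isinstance(v, int) and v > 0 for v in MoE_params)
    ):
        raise ValueError(
            "MoE_params must contain exactly two positive integers "
            f"[num_experts, slots_per_expert], e.g. [32, 1]; got {MoE_params!r}"
        )
```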
```diff
                 decoder_norm_type='LN',
-                 bbox_reparam=False):
+                 bbox_reparam=False,
+                 MoE=False, MoE_params=[32,1]):
```
Inconsistent spacing in list literal. The default value [32,1] should have spaces after commas as per PEP 8 style guide: [32, 1]. This applies to all occurrences in the code.
Suggested change:
```diff
-                 MoE=False, MoE_params=[32,1]):
+                 MoE=False, MoE_params=[32, 1]):
```
```python
            self.moe_layer = SoftMoELayerWrapper(
                dim=d_model,
                num_experts=MoE_params[0],
                slots_per_expert=MoE_params[1],
                layer=FFNBlock,
                d_model=d_model,
                dim_feedforward=dim_feedforward,
                dropout=dropout
            )
```
The MoE implementation will significantly increase memory usage and model parameters. With the default num_experts=32, the FFN layer is replicated 32 times per decoder layer. For a 3-layer decoder, this means 96 expert FFN blocks instead of 3 regular FFN blocks. This should be documented in the docstring or config to help users understand the memory/compute tradeoffs. Consider adding a warning or documentation about the expected memory increase based on the number of experts.
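A rough sketch of the kind of warning that could accompany this, as a hypothetical helper (parameter counts cover only the two Linear layers of one FFNBlock, ignoring the Soft MoE routing parameters):

```python
import warnings


def warn_moe_param_cost(d_model: int, dim_feedforward: int, num_experts: int) -> None:
    """Hypothetical helper: estimate the per-decoder-layer parameter cost of MoE."""
    # One FFNBlock = Linear(d_model, dim_feedforward) + Linear(dim_feedforward, d_model),
    # counting both weights and biases.
    ffn_params = (d_model * dim_feedforward + dim_feedforward) + (dim_feedforward * d_model + d_model)
    warnings.warn(
        f"MoE enabled: each decoder layer holds {num_experts} expert FFNs "
        f"(~{num_experts * ffn_params:,} FFN parameters vs ~{ffn_params:,} for a single FFN)."
    )
```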
| "pydantic", | ||
| "supervision", | ||
| "matplotlib", | ||
| "soft_moe", |
The new dependency soft_moe is added without any version pinning, meaning builds will always pull the latest mutable release from an external third-party package, which increases the risk of a malicious or compromised update being pulled into your supply chain. An attacker who gains control over the soft_moe package or its distribution channel could execute arbitrary code in your environment with the application's privileges. To reduce this risk, pin soft_moe to a specific, vetted version (or hash) and update it intentionally after review.
Description
Hello, I propose here an optional Soft Mixture of Experts implementation for RF-DETR. The idea comes from the article From Sparse to Soft Mixtures of Experts, and it usually seems to improve the overall performance of a model, so I thought it would be a nice option to add.
As the article suggests, I set the default number of slots per expert to 1.
In order to add this option we would need to add the Python library `soft_moe`. This library offers a good Soft MoE wrapper; the source code is available here.
Type of change
How has this change been tested? Please provide a test case or an example of how you tested the change.
I tested it on a small private dataset and it improved accuracy a bit (roughly a 1% mAP gain, so a small gain but still a gain). I am not able to test it on the full COCO dataset, but I expect it would also improve the overall accuracy somewhat.
As expected, the MoE RF-DETR model is slightly slower than the basic one, but still very fast.