
Soft-RFDETR: An optional Mixture of Experts Module for RFDETR #245

Open
LeosCtrt wants to merge 5 commits into roboflow:develop from LeosCtrt:soft_moe

Conversation

@LeosCtrt

@LeosCtrt LeosCtrt commented Jun 19, 2025

Description

Hello, I propose an optional Soft Mixture of Experts implementation for RF-DETR. The idea comes from the paper From Sparse to Soft Mixtures of Experts, and it tends to improve a model's overall performance, so I thought it would be a nice option to add.

As the paper suggests, I set the default number of slots per expert to 1.

Adding this option requires the Python library "soft_moe", which provides a good Soft MoE wrapper; its source code is available here.
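For context, here is a minimal sketch of how the wrapper would be used, based only on the call signature visible in this PR's diff; the TinyFFN module, the dimensions, and the tensor shapes are illustrative placeholders, not part of the PR.

import torch
from torch import nn
from soft_moe import SoftMoELayerWrapper

class TinyFFN(nn.Module):
    # Illustrative expert module; the PR defines its own FFNBlock instead.
    def __init__(self, d_model: int, dim_feedforward: int, dropout: float):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, dim_feedforward),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(dim_feedforward, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# dim is the token dimension; the remaining keyword arguments are forwarded
# to the expert class when each expert is instantiated.
moe_ffn = SoftMoELayerWrapper(
    dim=256,
    num_experts=32,
    slots_per_expert=1,
    layer=TinyFFN,
    d_model=256,
    dim_feedforward=1024,
    dropout=0.1,
)

tokens = torch.randn(2, 300, 256)  # (batch, num_queries, d_model)
out = moe_ffn(tokens)              # output keeps the input shape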

Type of change


  • New feature (non-breaking change which adds functionality)

How has this change been tested? Please provide a test case or example of how you tested the change.

I tested it on a small private dataset and it slightly improved accuracy: roughly +1% mAP, a small gain but still a gain. I am not able to test it on the full COCO dataset, but I expect it would also improve overall accuracy somewhat.

As expected, the MoE RF-DETR model is slightly slower than the base model, but still very fast.

@isaacrob-roboflow

We are leveraging a pretrained backbone, so without robust pretrained weights we can't use a different architecture.

@isaacrob-roboflow

Additionally, have you looked at TRT speed for this? Many of these types of architectures actually end up being meaningfully slower in terms of latency when compiled, due to memory bottlenecks.

@Borda Borda added the enhancement (New feature or request) label on Jan 22, 2026
@Borda Borda requested a review from isaacrob as a code owner February 11, 2026 15:57
@Borda Borda force-pushed the develop branch 4 times, most recently from 60b16c1 to 523f9df on February 14, 2026 06:46
@Borda Borda requested a review from Copilot February 19, 2026 13:18
Contributor

Copilot AI left a comment


Pull request overview

This PR adds an optional Soft Mixture of Experts (MoE) architecture to RF-DETR, based on the paper "From Sparse to Soft Mixtures of Experts". The implementation allows users to optionally enable MoE in the transformer decoder layers to potentially improve model accuracy, though with increased computational cost.

Changes:

  • Added a new FFNBlock class to encapsulate the feedforward network logic needed by the MoE layer
  • Modified TransformerDecoderLayer to conditionally use either standard FFN or MoE-wrapped FFN based on configuration
  • Added MoE and MoE_params configuration fields to ModelConfig
  • Added soft_moe as a new required dependency

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 19 comments.

  • rfdetr/models/transformer.py: Adds FFNBlock class and modifies TransformerDecoderLayer to support optional MoE architecture with conditional FFN replacement
  • rfdetr/config.py: Adds MoE configuration fields (MoE boolean flag and MoE_params list) to ModelConfig
  • pyproject.toml: Adds soft_moe library as a required dependency

Comment on lines +23 to 26
from soft_moe import SoftMoELayerWrapper
import torch.nn.functional as F
from torch import nn, Tensor


Copilot AI Feb 19, 2026


The import statement for soft_moe should be placed after the standard library imports and before the local imports, following PEP 8 import order conventions. Currently it's between torch and torch.nn.functional imports. Additionally, since this is a conditional dependency (only used when MoE=True), consider either: 1) Making it an optional dependency in pyproject.toml and importing it conditionally within the init method where it's used, or 2) Documenting that soft_moe is required for MoE functionality.

Suggested change
- from soft_moe import SoftMoELayerWrapper
- import torch.nn.functional as F
- from torch import nn, Tensor
+ import torch.nn.functional as F
+ from torch import nn, Tensor
+ from soft_moe import SoftMoELayerWrapper

Comment on lines +43 to +54
class FFNBlock(nn.Module):
    def __init__(self, d_model, dim_feedforward, dropout):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, dim_feedforward),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(dim_feedforward, d_model),
        )

    def forward(self, x):
        return self.net(x)

Copilot AI Feb 19, 2026


The FFNBlock class is missing mandatory type hints for all parameters and the return type, as required by the coding guidelines. Additionally, it lacks a Google-style docstring explaining its purpose, parameters, and return value. This is required for all new classes according to CONTRIBUTING.md.
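As a sketch of what that might look like (the docstring wording is only a suggestion, not required text):

import torch
from torch import nn

class FFNBlock(nn.Module):
    """Feedforward block used as the expert module for the Soft MoE wrapper.

    Args:
        d_model (int): Input and output embedding dimension.
        dim_feedforward (int): Hidden dimension of the feedforward network.
        dropout (float): Dropout probability applied after the activation.
    """

    def __init__(self, d_model: int, dim_feedforward: int, dropout: float) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, dim_feedforward),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(dim_feedforward, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Apply the feedforward network to x of shape (..., d_model)."""
        return self.net(x)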

Comment on lines +460 to 461
skip_self_attn=False, MoE=False, MoE_params=[32,1]):
super().__init__()

Copilot AI Feb 19, 2026


Using mutable default arguments (list) is a Python anti-pattern that can lead to unexpected behavior. The default value MoE_params=[32,1] should be replaced with None and then initialized inside the function.

Suggested change
- skip_self_attn=False, MoE=False, MoE_params=[32,1]):
- super().__init__()
+ skip_self_attn=False, MoE=False, MoE_params=None):
+ super().__init__()
+ if MoE_params is None:
+     MoE_params = [32, 1]

tgt = tgt + self.dropout2(tgt2)
tgt = self.norm2(tgt)
tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt))))
if self.MoE == True:

Copilot AI Feb 19, 2026


Comparing boolean values with == True is not Pythonic. Use if self.MoE: instead of if self.MoE == True:.

Suggested change
- if self.MoE == True:
+ if self.MoE:

group_detr: int = 13
gradient_checkpointing: bool = False
MoE: bool = False
MoE_params: List[int] = [32, 1]

Copilot AI Feb 19, 2026


Using a mutable default value (list) in Pydantic model fields can lead to unexpected behavior. Use Field(default_factory=lambda: [32, 1]) instead of MoE_params: List[int] = [32, 1] to ensure each instance gets a fresh list.
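A minimal sketch of that fix, assuming standard Pydantic field usage (the surrounding ModelConfig fields are omitted here):

from typing import List
from pydantic import BaseModel, Field

class ModelConfig(BaseModel):
    # ... other fields omitted ...
    MoE: bool = False
    # default_factory gives each instance its own fresh list instead of a
    # shared mutable default
    MoE_params: List[int] = Field(default_factory=lambda: [32, 1])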

from typing import Optional

import torch
from soft_moe import SoftMoELayerWrapper

Copilot AI Feb 19, 2026


The soft_moe dependency is imported unconditionally at the module level, but it's only used when MoE is enabled. This means the library will be required even for users who don't use the MoE feature. Consider making this import conditional to avoid forcing all users to install this dependency. One approach is to import it inside the __init__ method of TransformerDecoderLayer when MoE=True, with a helpful error message if the import fails directing users to install it.

Suggested change
- from soft_moe import SoftMoELayerWrapper
+ try:
+     from soft_moe import SoftMoELayerWrapper
+ except ImportError:  # soft_moe is optional and only needed for MoE
+     class SoftMoELayerWrapper:  # type: ignore[no-redef]
+         """Placeholder wrapper used when soft_moe is not installed.
+         This is instantiated only if MoE functionality is enabled. If you hit this
+         error, install the optional `soft_moe` dependency:
+             pip install soft-moe
+         """
+         def __init__(self, *args, **kwargs) -> None:
+             raise ImportError(
+                 "soft_moe is required for MoE support in TransformerDecoderLayer. "
+                 "Please install the optional dependency with:\n\n"
+                 "    pip install soft-moe\n"
+             )

Comment on lines +475 to +494
if self.MoE == True:
    print("\n" + "="*80)
    print("Loading Mixture of Expert (MoE) Architecture")
    print("="*80)
    print(f"Experts Count : {MoE_params[0]}")
    print(f"Slots per Expert : {MoE_params[1]}")
    print("-"*80)
    print("Warning: This custom architecture prevents loading full pretrained weights.")
    print("Note : It may be slightly slower but could improve accuracy.")
    print("="*80 + "\n")

    self.moe_layer = SoftMoELayerWrapper(
        dim=d_model,
        num_experts=MoE_params[0],
        slots_per_expert=MoE_params[1],
        layer=FFNBlock,
        d_model=d_model,
        dim_feedforward=dim_feedforward,
        dropout=dropout
    )

Copilot AI Feb 19, 2026


There's no validation for the MoE_params parameter. The code assumes it's a list with exactly 2 elements and directly accesses MoE_params[0] and MoE_params[1]. If the user provides an invalid value (e.g., empty list, single element, non-list), this will raise an IndexError. Add validation to ensure MoE_params contains exactly 2 positive integers before using them.
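One possible shape for such a check, shown only as a sketch (the helper name and the error wording are illustrative):

def validate_moe_params(MoE_params) -> None:
    # Ensure MoE_params is [num_experts, slots_per_expert] with two positive ints.
    if (
        not isinstance(MoE_params, (list, tuple))
        or len(MoE_params) != 2
        or not all(isinstance(p, int) and p > 0 for p in MoE_params)
    ):
        raise ValueError(
            "MoE_params must contain exactly two positive integers "
            "[num_experts, slots_per_expert], e.g. [32, 1]; "
            f"got {MoE_params!r}"
        )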

decoder_norm_type='LN',
- bbox_reparam=False):
+ bbox_reparam=False,
+ MoE=False, MoE_params=[32,1]):

Copilot AI Feb 19, 2026


Inconsistent spacing in list literal. The default value [32,1] should have spaces after commas as per PEP 8 style guide: [32, 1]. This applies to all occurrences in the code.

Suggested change
- MoE=False, MoE_params=[32,1]):
+ MoE=False, MoE_params=[32, 1]):

Comment on lines +486 to +494
self.moe_layer = SoftMoELayerWrapper(
    dim=d_model,
    num_experts=MoE_params[0],
    slots_per_expert=MoE_params[1],
    layer=FFNBlock,
    d_model=d_model,
    dim_feedforward=dim_feedforward,
    dropout=dropout
)

Copilot AI Feb 19, 2026


The MoE implementation will significantly increase memory usage and model parameters. With the default num_experts=32, the FFN layer is replicated 32 times per decoder layer. For a 3-layer decoder, this means 96 expert FFN blocks instead of 3 regular FFN blocks. This should be documented in the docstring or config to help users understand the memory/compute tradeoffs. Consider adding a warning or documentation about the expected memory increase based on the number of experts.
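As a rough back-of-the-envelope illustration of that tradeoff (the d_model and dim_feedforward values below are placeholders, not RF-DETR's actual configuration):

# Approximate parameters of one FFN expert (two linear layers plus biases),
# ignoring the wrapper's comparatively small dispatch/combine parameters.
d_model, dim_feedforward = 256, 2048  # placeholder values
ffn_params = 2 * d_model * dim_feedforward + dim_feedforward + d_model

num_experts, decoder_layers = 32, 3
standard = decoder_layers * ffn_params                # 3 regular FFN blocks
with_moe = decoder_layers * num_experts * ffn_params  # 96 expert FFN blocks

print(f"standard FFN params: {standard:,}")
print(f"soft-MoE FFN params: {with_moe:,} (~{with_moe / standard:.0f}x)")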

"pydantic",
"supervision",
"matplotlib",
"soft_moe",

Copilot AI Feb 19, 2026


The new dependency soft_moe is added without any version pinning, meaning builds will always pull the latest mutable release from an external third-party package, which increases the risk of a malicious or compromised update being pulled into your supply chain. An attacker who gains control over the soft_moe package or its distribution channel could execute arbitrary code in your environment with the application's privileges. To reduce this risk, pin soft_moe to a specific, vetted version (or hash) and update it intentionally after review.


Labels

enhancement (New feature or request), has conflicts

3 participants
