[moe] Add num_dense_layers to Grug MoE for first-k-dense ablation #4058
claude[bot] wants to merge 1 commit into main from
Conversation
Add num_dense_layers and dense_intermediate_dim fields to GrugModelConfig so that the first K layers use a dense FFN (DenseBlock) instead of MoE routing. This enables first-k-dense ablation experiments for the 10T gate. Updates FLOP accounting for mixed dense/MoE architectures and adds tests. Fixes #4022 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4fbf80ad3a
```python
    else:
        dense_flops_per_token = 0.0

    flops_per_token = moe_flops_per_token + dense_flops_per_token
```
Avoid double-counting LM-head FLOPs in mixed dense/MoE runs
When num_dense_layers > 0, this function sums two separate lm_flops_per_token(...) results, but lm_flops_per_token already includes the LM-head term each time, so the sum overcounts by one extra LM-head pass. This inflates throughput/flops_per_*_analytic and can skew throughput/utilization comparisons for first-k-dense ablations.
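A minimal sketch of the suggested fix: accumulate per-layer FLOPs for both block types, then add the LM-head term exactly once. The function names (`lm_head_flops_per_token`, `mixed_flops_per_token`) and signatures are assumptions for illustration, not the repo's actual API.

```python
# Hypothetical sketch: avoid double-counting the LM head in mixed dense/MoE runs.

def lm_head_flops_per_token(hidden_dim: int, vocab_size: int) -> float:
    """FLOPs for the final vocabulary projection: 2 * D * V per token."""
    return 2.0 * hidden_dim * vocab_size

def mixed_flops_per_token(
    moe_layer_flops: float,    # per-layer FLOPs for one MoE block
    dense_layer_flops: float,  # per-layer FLOPs for one dense block
    num_layers: int,
    num_dense_layers: int,
    hidden_dim: int,
    vocab_size: int,
) -> float:
    """Sum layer contributions for both block types; add the LM head once."""
    num_moe_layers = num_layers - num_dense_layers
    body = num_moe_layers * moe_layer_flops + num_dense_layers * dense_layer_flops
    return body + lm_head_flops_per_token(hidden_dim, vocab_size)
```

Calling the per-type helpers separately and summing them would instead fold in two LM-head passes, which is the overcount the review flags.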
```python
def test_grug_moe_num_dense_layers_validation():
    """Validate that num_dense_layers must be <= num_layers."""
    from experiments.grug.moe.model import GrugModelConfig
```
Move local import to module top-level per repo contract
This function-scoped import violates the repository rule in /workspace/marin/AGENTS.md ("All imports at the top of the file. No local imports except …"). Keeping it local hides the dependency and defers import failures to this one test path; it should be moved into the module-level import block.
|
This pull request has been inactive for 23 days and is marked as stale.
Add num_dense_layers and dense_intermediate_dim to GrugModelConfig so the first K transformer layers use a dense FFN (DenseBlock) instead of MoE routing. DenseBlock returns zero router stats so Transformer aggregation is unchanged. FLOP accounting splits dense and MoE layer contributions. Defaults preserve current behavior (num_dense_layers=0).
Fixes #4022
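The first-k-dense behavior described above can be sketched as follows. This is a hypothetical illustration: the `GrugModelConfig` field names match the PR text, but the `__post_init__` validation and the `build_blocks` helper are assumptions, not the repo's actual implementation.

```python
# Hypothetical sketch of first-k-dense layer construction.
from dataclasses import dataclass

@dataclass
class GrugModelConfig:
    num_layers: int = 12
    num_dense_layers: int = 0           # first K layers use a dense FFN
    dense_intermediate_dim: int = 4096  # FFN width for the dense blocks

    def __post_init__(self) -> None:
        # Validation exercised by test_grug_moe_num_dense_layers_validation.
        if self.num_dense_layers > self.num_layers:
            raise ValueError("num_dense_layers must be <= num_layers")

def build_blocks(cfg: GrugModelConfig) -> list[str]:
    """First num_dense_layers blocks are dense; the rest route through MoE."""
    return [
        "DenseBlock" if i < cfg.num_dense_layers else "MoEBlock"
        for i in range(cfg.num_layers)
    ]
```

With the default `num_dense_layers=0`, every layer is an MoE block, so existing configs are unaffected.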