
Agent MoE Experiment: MHA (no GQA) #5151

@ClassicLarry


TL;DR

Test full multi-head attention (num_kv_heads = num_heads) against the baseline GQA, which uses a single KV head per layer (4:1 at d=512, 6:1 at d=768). The goal is to measure the quality cost of GQA at current compute budgets with the current recipe.
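For reference, a minimal pure-Python sketch of how the two variants differ. It assumes the common layout in which consecutive query heads share a K/V head (query head `h` reads K/V head `h // (num_heads // num_kv_heads)`). The helper names are hypothetical and not taken from the repo, and masking, batching and projections are omitted. MHA is simply the case `num_kv_heads == num_heads`, where the mapping becomes the identity.

```python
import math


def kv_head_for(q_head: int, num_heads: int, num_kv_heads: int) -> int:
    """K/V head used by query head `q_head` (hypothetical helper).

    Assumes consecutive query heads share one K/V head; with
    num_kv_heads == num_heads (MHA) this is the identity map.
    """
    assert num_heads % num_kv_heads == 0, "num_heads must be a multiple of num_kv_heads"
    group_size = num_heads // num_kv_heads
    return q_head // group_size


def attend(q, keys, values):
    """Scaled dot-product softmax attention of one query over a sequence."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def grouped_attention(queries, keys, values, num_kv_heads):
    """queries: [num_heads][head_dim]; keys/values: [num_kv_heads][seq][head_dim]."""
    num_heads = len(queries)
    out = []
    for h in range(num_heads):
        g = kv_head_for(h, num_heads, num_kv_heads)
        out.append(attend(queries[h], keys[g], values[g]))
    return out


if __name__ == "__main__":
    print([kv_head_for(h, 4, 1) for h in range(4)])  # baseline d=512: every head shares KV head 0
    print([kv_head_for(h, 4, 4) for h in range(4)])  # MHA d=512: one KV head per query head
```

With a single KV head, all query heads attend over the same keys and values. MHA gives each query head its own, at the cost of larger K/V projections and KV cache.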

User prompt

follow agent.md and implement multi-head-attention, (no gqa)

Scope

Head counts

| d | GQA baseline (heads / KV heads) | MHA (heads / KV heads) |
|---|---|---|
| 512 | 4 / 1 | 4 / 4 |
| 768 | 6 / 1 | 6 / 6 |
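A rough sketch of what switching to MHA changes per layer for the head counts above. It assumes `head_dim = d / num_heads` (128 for both widths), bias-free Q/K/V/O projections, and K and V each cached at `num_kv_heads * head_dim` values per token. None of these details are stated in the issue, and `AttnConfig` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttnConfig:
    """Hypothetical per-layer attention config (assumes head_dim = d_model / num_heads)."""
    d_model: int
    num_heads: int
    num_kv_heads: int

    @property
    def head_dim(self) -> int:
        return self.d_model // self.num_heads

    def proj_params(self) -> int:
        """Weights in the Q, K, V and output projections of one layer (no biases)."""
        q = self.d_model * self.num_heads * self.head_dim
        kv = 2 * self.d_model * self.num_kv_heads * self.head_dim  # K and V
        o = self.num_heads * self.head_dim * self.d_model
        return q + kv + o

    def kv_cache_per_token(self) -> int:
        """Number of K and V values cached per token per layer."""
        return 2 * self.num_kv_heads * self.head_dim


configs = {
    "d512 GQA": AttnConfig(512, 4, 1),
    "d512 MHA": AttnConfig(512, 4, 4),
    "d768 GQA": AttnConfig(768, 6, 1),
    "d768 MHA": AttnConfig(768, 6, 6),
}

if __name__ == "__main__":
    for name, cfg in configs.items():
        print(f"{name}: head_dim={cfg.head_dim} "
              f"proj_params={cfg.proj_params():,} "
              f"kv_cache/token={cfg.kv_cache_per_token()}")
```

Under these assumptions, MHA raises attention projection weights per layer from 655,360 to 1,048,576 at d=512 and from 1,376,256 to 2,359,296 at d=768. At a fixed compute budget, the extra per-token FLOPs mean slightly fewer training tokens, which is part of what the comparison measures.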

Gate 1 runs (2 total)

| Dim | Compute budget | Status |
|---|---|---|
| 512 | 2.19e17 | pending |
| 768 | 1.70e18 | pending |

Decision log

empty

Conclusion

pending
