(WIP) DeepseekV3 (and Multi-Head Latent Attention) #2012
base: main
Conversation
let's add a test :)
Looks good overall. +1 on the test though, and it would be nice if we could share more with the base attention implementation instead of duplicating things. Maybe for a follow-up. Thank you.
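A minimal sketch of the kind of sharing this review suggests, assuming a hypothetical base class that owns the attention math while subclasses supply only the q/k/v projections. None of these names are litgpt's actual API; this is just one way the duplication could be factored out:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionCore(nn.Module):
    """Hypothetical shared core: subclasses define only the projections."""

    def project_qkv(self, x: torch.Tensor) -> tuple:
        raise NotImplementedError

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.project_qkv(x)  # each (B, n_heads, T, dim)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return y.transpose(1, 2).flatten(2)  # (B, T, n_heads * dim)

class VanillaAttention(AttentionCore):
    """Standard multi-head self-attention on top of the shared core."""

    def __init__(self, d_model: int = 256, n_heads: int = 4) -> None:
        super().__init__()
        self.n_heads, self.head_dim = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)

    def project_qkv(self, x: torch.Tensor) -> tuple:
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        split = lambda t: t.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        return split(q), split(k), split(v)
```

An MLA variant would then subclass the same core and override only `project_qkv` with its low-rank projections, instead of re-implementing the softmax-attention path.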
Follow-up from #1945.
Closer alignment with the Hugging Face implementation, allowing custom values for q_lora_rank, v_dim, etc. (sketched below).
Cleaned up some lines that had been added for debugging purposes.
Also, the errors in #1945 occurred simply because Pythia did not have the necessary parameters for MLA.
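For context, here is a minimal, self-contained sketch of the low-rank projections that parameters like q_lora_rank, kv_lora_rank, and v_dim control in Multi-Head Latent Attention. Names and default values are illustrative, RoPE and the decoupled key path are omitted for brevity, and this is not litgpt's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLASketch(nn.Module):
    """Illustrative Multi-Head Latent Attention (no RoPE, no decoupled keys)."""

    def __init__(self, d_model=512, n_heads=8, q_lora_rank=128,
                 kv_lora_rank=64, head_dim=64, v_dim=64):
        super().__init__()
        self.n_heads, self.head_dim, self.v_dim = n_heads, head_dim, v_dim
        # Queries go through a low-rank bottleneck of width q_lora_rank.
        self.q_down = nn.Linear(d_model, q_lora_rank, bias=False)
        self.q_up = nn.Linear(q_lora_rank, n_heads * head_dim, bias=False)
        # Keys and values share one compressed latent of width kv_lora_rank;
        # at inference only this latent would need to be cached.
        self.kv_down = nn.Linear(d_model, kv_lora_rank, bias=False)
        self.k_up = nn.Linear(kv_lora_rank, n_heads * head_dim, bias=False)
        self.v_up = nn.Linear(kv_lora_rank, n_heads * v_dim, bias=False)
        self.out = nn.Linear(n_heads * v_dim, d_model, bias=False)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.q_up(self.q_down(x)).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        latent = self.kv_down(x)  # (B, T, kv_lora_rank)
        k = self.k_up(latent).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        v = self.v_up(latent).view(B, T, self.n_heads, self.v_dim).transpose(1, 2)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(y.transpose(1, 2).flatten(2))

# Quick shape check: (batch=2, seq=16, d_model=512) in and out.
x = torch.randn(2, 16, 512)
assert MLASketch()(x).shape == (2, 16, 512)
```

This also illustrates why a config without these fields fails: a checkpoint like Pythia defines only standard attention hyperparameters, so there are no values for the q_lora_rank/kv_lora_rank bottlenecks that MLA requires.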