
Support Medusa speculative decoding #2859

Closed
AllentDan wants to merge 25 commits into InternLM:main from AllentDan:medusa

Conversation

@AllentDan (Collaborator)

No description provided.

@AllentDan AllentDan added the WIP label Dec 5, 2024
@AllentDan AllentDan removed the WIP label Dec 11, 2024
@Tushar-ml

Can we have a docs section for using this feature? @AllentDan

@lvhan028 lvhan028 added the enhancement New feature or request label Dec 12, 2024
@snippetzero

Is there any plan for the Turbomind engine to support speculative sampling?
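For context, the core of Medusa-style speculative decoding is a draft-and-verify loop: extra decoding heads propose several future tokens, and the base model checks all of them in a single forward pass, committing the longest prefix it agrees with plus one token of its own. A minimal sketch of that verification step, assuming greedy decoding (`verify_draft` is a hypothetical illustration, not code from this PR):

```python
def verify_draft(draft_tokens, base_argmax):
    """Accept or reject tokens proposed by the draft (Medusa) heads.

    draft_tokens: tokens proposed by the draft heads for the next positions.
    base_argmax: the base model's greedy choice at each of those positions,
        plus one extra entry (len(draft_tokens) + 1 total), all obtained
        from a single verification forward pass.
    Returns the tokens actually committed this step.
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        if tok == base_argmax[i]:
            # Base model agrees with the draft: keep the proposed token.
            accepted.append(tok)
        else:
            # Disagreement: take the base model's token and stop accepting.
            accepted.append(base_argmax[i])
            return accepted
    # Every draft token matched; the base model still yields one bonus token.
    accepted.append(base_argmax[len(draft_tokens)])
    return accepted
```

Because at least one token is committed per verification pass, the loop never falls below ordinary autoregressive decoding, and each accepted draft token is a decoding step saved.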

Conflicts:
	lmdeploy/pytorch/engine/engine.py
	lmdeploy/pytorch/engine/model_agent.py
	lmdeploy/pytorch/kernels/cuda/flashattention.py
	lmdeploy/pytorch/model_inputs.py
	requirements/runtime_maca.txt
	tests/pytorch/kernel/test_flash_attention.py
@lvhan028 lvhan028 closed this Sep 8, 2025

Labels

enhancement New feature or request

4 participants