
Support Medusa speculative decoding #2859

Closed
AllentDan wants to merge 25 commits into InternLM:main from AllentDan:medusa

Conversation

@AllentDan (Collaborator)

No description provided.

@AllentDan AllentDan added the WIP label Dec 5, 2024
@AllentDan AllentDan removed the WIP label Dec 11, 2024
@Tushar-ml

Can we have a docs section for using this feature? @AllentDan

@lvhan028 lvhan028 added the enhancement New feature or request label Dec 12, 2024
@snippetzero

Is there any plan for the Turbomind engine to support speculative sampling?
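For context, the core of Medusa-style speculative decoding is a draft-and-verify loop: extra decoding heads propose several future tokens, and the base model checks all of them in a single forward pass, committing the longest prefix it agrees with plus one token of its own. A minimal sketch of that verification step, assuming greedy decoding (`verify_draft` is a hypothetical illustration, not code from this PR):

```python
def verify_draft(draft_tokens, base_argmax):
    """Accept or reject tokens proposed by the draft (Medusa) heads.

    draft_tokens: tokens proposed by the draft heads for the next positions.
    base_argmax: the base model's greedy choice at each of those positions,
        plus one extra entry (len(draft_tokens) + 1 total), all obtained
        from a single verification forward pass.
    Returns the tokens actually committed this step.
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        if tok == base_argmax[i]:
            # Base model agrees with the draft: keep the proposed token.
            accepted.append(tok)
        else:
            # Disagreement: take the base model's token and stop accepting.
            accepted.append(base_argmax[i])
            return accepted
    # Every draft token matched; the base model still yields one bonus token.
    accepted.append(base_argmax[len(draft_tokens)])
    return accepted
```

Because at least one token is committed per verification pass, the loop never falls below ordinary autoregressive decoding, and each accepted draft token is a decoding step saved.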

Conflicts:
	lmdeploy/pytorch/engine/engine.py
	lmdeploy/pytorch/engine/model_agent.py
	lmdeploy/pytorch/kernels/cuda/flashattention.py
	lmdeploy/pytorch/model_inputs.py
	requirements/runtime_maca.txt
	tests/pytorch/kernel/test_flash_attention.py
@lvhan028 lvhan028 closed this Sep 8, 2025

Labels

enhancement New feature or request

4 participants