Commit a5903e5

Update docs/source/user_guide/feature_guide/speculative_decoding.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: zhaomingyu13 <[email protected]>
1 parent 84def8f commit a5903e5

1 file changed, +1 -1 lines changed

docs/source/user_guide/feature_guide/speculative_decoding.md

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ A variety of EAGLE draft models are available on the Hugging Face hub:
 
 ## Speculating using MTP speculators
 
-The following code configures vLLM Ascend to use speculative decoding where proposals are generated by MTP(Multi Token Prediction),boosting inference performance by parallelizing the prediction of multiple tokens.For more information about MTP see [this doc](https://docs.vllm.ai/projects/ascend/en/latest/developer_guide/feature_guide/Multi_Token_Prediction.html)
+The following code configures vLLM Ascend to use speculative decoding where proposals are generated by MTP (Multi Token Prediction), boosting inference performance by parallelizing the prediction of multiple tokens. For more information about MTP see [this doc](https://docs.vllm.ai/projects/ascend/en/latest/developer_guide/feature_guide/Multi_Token_Prediction.html)
 
 - Offline inference
 
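For readers unfamiliar with the feature the changed paragraph documents, the propose-then-verify loop behind speculative decoding can be sketched in a few lines. This is a toy illustration in plain Python, not vLLM Ascend code: `target_next`, `draft_next`, `speculative_step`, and `generate` are hypothetical names, the "models" are simple functions over integer tokens, and acceptance is greedy exact match. With MTP, the proposals would come from the model's multi-token prediction module rather than from a separate draft function as assumed here.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def speculative_step(target_next: NextTokenFn, draft_next: NextTokenFn,
                     context: List[Token], k: int) -> List[Token]:
    """Run one propose-then-verify step and return the tokens to append."""
    # Propose: the cheap drafter guesses k tokens autoregressively.
    proposal: List[Token] = []
    for _ in range(k):
        proposal.append(draft_next(context + proposal))

    # Verify: the target checks each guess in order. A real engine would
    # score all positions in one batched forward pass; this loop only
    # simulates that.
    accepted: List[Token] = []
    for tok in proposal:
        expected = target_next(context + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # Every guess matched, so the target contributes one extra token.
    accepted.append(target_next(context + accepted))
    return accepted


def generate(target_next: NextTokenFn, draft_next: NextTokenFn,
             prompt: List[Token], max_new_tokens: int,
             k: int) -> Tuple[List[Token], int]:
    """Decode max_new_tokens tokens; also report verification steps used."""
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(target_next, draft_next, out, k))
        steps += 1
    return out[:len(prompt) + max_new_tokens], steps


# Toy stand-ins (assumptions for illustration only).
def target_next(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10  # the target counts upward modulo 10


def draft_next(ctx: List[Token]) -> Token:
    # The drafter agrees with the target except right after token 4.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


tokens, steps = generate(target_next, draft_next, prompt=[0],
                         max_new_tokens=8, k=3)
print(tokens, steps)
```

Because every emitted token is either confirmed or supplied by the target, the output matches plain greedy decoding with the target alone. The benefit is that several tokens can be accepted per verification step, so fewer target passes are needed than tokens generated.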