What does the --max-length parameter actually control when training a draft model? If I set --max-length 2048, does that mean the draft model's maximum context length, including during speculative decoding/acceleration, is always 2048 tokens? In other words, even if the target model supports longer sequences, would the draft model's context window for acceleration still be capped at 2048? The sketch below shows how I'm currently picturing it; I'd like to confirm whether this understanding is correct. Thanks!
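
For concreteness, here's a minimal sketch of how I'm assuming --max-length is applied when preparing the draft model's training data. The function and variable names are purely illustrative, not taken from the actual training code:

```python
# Rough sketch of my current mental model (names are illustrative only):
# I assume --max-length simply truncates each training sample, so the draft
# model never sees more than max_length tokens during training.

def prepare_draft_training_example(token_ids: list[int], max_length: int = 2048) -> list[int]:
    """Truncate a tokenized training sample to max_length tokens."""
    return token_ids[:max_length]

# What I want to confirm is the inference side: during speculative decoding,
# is the draft model's usable context also capped at 2048 tokens, even when
# the target model accepts longer prompts?
```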