[Feature] move the waiting queue to tokenizer #60

@hogura99

Description

Currently, the waiting queue is implemented inside the DPScheduler. However, this implementation causes requests to be sent one by one, which may reduce the effective batch size at the first attention layer.

Moving the waiting queue into the tokenizer and gathering the tokens into a full batch may ease this issue.
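
To make the proposal concrete, here is a rough sketch of what a tokenizer-side waiting queue could look like. It is not the project's code: `TokenizerWaitingQueue`, `TokenizedRequest`, `add_request`, `pop_batch`, and the `max_batch_tokens` budget are all hypothetical names chosen for illustration, and how the resulting batch would be handed to the DPScheduler is left out because the issue does not specify it.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class TokenizedRequest:
    # Hypothetical container for one tokenized request.
    request_id: str
    token_ids: List[int]


class TokenizerWaitingQueue:
    """Hypothetical waiting queue that lives next to the tokenizer.

    Requests are tokenized on arrival and held here, so that they can be
    forwarded as one batch instead of one request at a time.
    """

    def __init__(self, tokenize: Callable[[str], List[int]], max_batch_tokens: int):
        self._tokenize = tokenize
        self._max_batch_tokens = max_batch_tokens  # assumed per-batch token budget
        self._waiting: Deque[TokenizedRequest] = deque()

    def add_request(self, request_id: str, prompt: str) -> None:
        # Tokenize immediately and park the result in FIFO order.
        self._waiting.append(TokenizedRequest(request_id, self._tokenize(prompt)))

    def pop_batch(self) -> List[TokenizedRequest]:
        # Gather waiting requests in FIFO order until the token budget is hit.
        batch: List[TokenizedRequest] = []
        used = 0
        while self._waiting:
            n = len(self._waiting[0].token_ids)
            # Always admit at least one request so an oversized prompt
            # cannot block the queue forever.
            if batch and used + n > self._max_batch_tokens:
                break
            batch.append(self._waiting.popleft())
            used += n
        return batch


def toy_tokenize(text: str) -> List[int]:
    # Stand-in tokenizer for the sketch: one "token" per whitespace-separated word.
    return [len(word) for word in text.split()]
```

In this sketch the scheduler would call `pop_batch()` once per step and receive every request that fits the budget, rather than receiving requests individually. A real implementation would probably also need a timeout or similar policy so that a lone request is not held back while waiting for a fuller batch.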
