
Conversation

@iforgetmyname
Collaborator

…raph(npu) & support dsv3_2 mtp

Motivation

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Collaborator

@sglang-bot sglang-bot left a comment


  1. Upload a torch profile to show that the overlap actually happens and that there is no CPU overhead (see the sketch after this list).
    • Rule: if you change any logic in the (overlap) scheduler, attach a torch profile showing that the overlap actually happens.
  2. Compare the speed and acceptance length of overlap vs. non-overlap.
  3. Get someone to verify on GPU.
  4. Add a test case.
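
A minimal sketch of capturing such a profile with torch.profiler, assuming a standard setup; this is not the project's benchmark harness, and run_a_few_decode_steps is a hypothetical placeholder for the overlap-scheduler decode loop:

from torch.profiler import profile, ProfilerActivity

# Capture CPU + CUDA activity around a few decode steps, then export a trace
# that can be attached to the PR and inspected in chrome://tracing or Perfetto.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    run_a_few_decode_steps()  # hypothetical placeholder for the decode loop under test
prof.export_chrome_trace("overlap_trace.json")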

Collaborator

@sglang-bot sglang-bot left a comment


Comment on lines 257 to 260
if _is_npu:
    device = "npu"
else:
    device = "cuda"

Ways to remove the if/else:

  • Put all if/else logic in a single place (ideally in the initialization).
  • Try to reuse batch.input_ids.device or batch.sampling_info (see the sketch below).
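
A minimal sketch of the second idea, assuming the batch tensors already live on the target backend (batch.input_ids comes from the surrounding code; new_buf is a hypothetical allocation):

import torch

device = batch.input_ids.device  # already device(type='npu') or device(type='cuda'); no _is_npu branch needed
new_buf = torch.empty(batch.input_ids.shape, device=device)  # any new allocation follows the batch's device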

self.cuda_graph_runner_for_draft_extend = (
    EAGLEDraftExtendCudaGraphRunner(self)
    if not _is_npu
    else EAGLEDraftExtendNpuGraphRunner(self)
)

Device2ExtendCudaGraphRunner = {
    "npu": EAGLEDraftExtendNpuGraphRunner,
    "cuda": EAGLEDraftExtendCudaGraphRunner,
}
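
A minimal sketch of using this mapping, assuming the device string ("npu" or "cuda") is resolved once at initialization and stored as self.device (the attribute name is an assumption):

# Look up the runner class by device instead of branching inline.
self.cuda_graph_runner_for_draft_extend = Device2ExtendCudaGraphRunner[self.device](self)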
