[NPU] support mtp(beta) pd disaggregation and dp attention #12443
base: main
…raph(npu) & support dsv3_2 mtp
- Upload a torch profile to show that the overlap actually happens and that there is no CPU overhead.
- Rule: if you change any logic in the (overlap) scheduler, attach a torch profile showing that the overlap actually happens.
- Compare the speed and acceptance length of overlap vs. non-overlap.
- Get someone to verify on GPU.
- Add a test case.
 
    if _is_npu:
        device = "npu"
    else:
        device = "cuda"
Ways to remove the if/else:
- Put all if/else branches in a single place (ideally in the initialization).
- Try to reuse `batch.input_ids.device` or `batch.sampling_info`.
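The two suggestions above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `FakeTensor`, `FakeBatch`, `Worker`, `resolve_device`, and `buffer_device` are hypothetical stand-ins, and the real batch object carries actual tensors rather than device strings.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor; it only carries a device string."""
    device: str


@dataclass
class FakeBatch:
    """Hypothetical stand-in for the batch object mentioned in the review."""
    input_ids: FakeTensor


def resolve_device(is_npu: bool) -> str:
    """The single place that branches on the platform flag."""
    return "npu" if is_npu else "cuda"


class Worker:
    def __init__(self, is_npu: bool):
        # Suggestion 1: decide once at initialization and store the result.
        self.device = resolve_device(is_npu)

    def buffer_device(self, batch: FakeBatch) -> str:
        # Suggestion 2: reuse the device already attached to the batch,
        # so the call site needs no platform check at all.
        return batch.input_ids.device


worker = Worker(is_npu=True)
batch = FakeBatch(input_ids=FakeTensor(device="npu"))
print(worker.device, worker.buffer_device(batch))
```

Either way, code further down the call path reads a stored or derived device instead of re-testing `_is_npu`.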
    self.cuda_graph_runner_for_draft_extend = (
        EAGLEDraftExtendCudaGraphRunner(self)
        if not _is_npu
        else EAGLEDraftExtendNpuGraphRunner(self)
    )
Device2ExtendCudaGraphRunner = {
    "npu": EAGLEDraftExtendNpuGraphRunner,
    "cuda": EAGLEDraftExtendCudaGraphRunner,
}
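One way the mapping above might be used is sketched below. The two runner classes here are empty placeholders so the snippet runs on its own (the real classes live in the project), and `make_draft_extend_runner` is a hypothetical helper name, not something taken from the PR.

```python
class EAGLEDraftExtendCudaGraphRunner:
    """Placeholder for the real CUDA graph runner."""

    def __init__(self, worker):
        self.worker = worker


class EAGLEDraftExtendNpuGraphRunner:
    """Placeholder for the real NPU graph runner."""

    def __init__(self, worker):
        self.worker = worker


# Mapping from device name to runner class, as suggested in the review.
Device2ExtendCudaGraphRunner = {
    "npu": EAGLEDraftExtendNpuGraphRunner,
    "cuda": EAGLEDraftExtendCudaGraphRunner,
}


def make_draft_extend_runner(worker, device: str):
    """Hypothetical helper: look up the runner class and instantiate it."""
    try:
        runner_cls = Device2ExtendCudaGraphRunner[device]
    except KeyError:
        # Fail loudly on an unsupported device instead of silently
        # falling back to one branch of an if/else.
        raise ValueError(f"unsupported device: {device!r}") from None
    return runner_cls(worker)


runner = make_draft_extend_runner(worker=object(), device="npu")
print(type(runner).__name__)
```

With a table lookup, supporting another backend means adding one dictionary entry rather than another conditional at each call site.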
c9f6e76 to 58f2de8
a4834dd to a92651c
Motivation
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist