Low performance when running DeepSeek in MTP mode with xinference's built-in executor #176

### Is your feature request related to a problem? Please describe
Running DeepSeek in MTP mode with xinference's built-in executor is currently about 25% slower than running it with vLLM's native built-in executor.

<img width="1125" height="493" alt="Image" src="https://github.com/user-attachments/assets/fbdf2c7b-f6a6-41b2-8d84-13fb4db1e578" />

### Describe the solution you'd like
A possible cause is that xoscar does not apply zero-copy serialization to PyTorch tensors internally. This still needs to be investigated.
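To make the zero-copy hypothesis concrete, here is a minimal sketch of the difference between in-band and out-of-band serialization. It uses only the standard library's pickle protocol 5. It is not xoscar's code, and the `bytearray` is a hypothetical stand-in for a tensor's underlying memory. Whether this is the actual bottleneck has not been confirmed.

```python
import pickle

# Hypothetical stand-in for the memory backing a tensor (1 MiB of zeros).
payload = bytearray(1024 * 1024)

# In-band: the payload bytes are copied into the pickle stream itself.
in_band = pickle.dumps(payload, protocol=5)

# Out-of-band: the stream holds only a small placeholder; the buffer is
# handed to buffer_callback so a transport could send it without copying.
buffers = []
header = pickle.dumps(
    pickle.PickleBuffer(payload),
    protocol=5,
    buffer_callback=buffers.append,
)

# Deserialize by supplying the same buffers back to the unpickler.
restored = pickle.loads(header, buffers=buffers)
view = memoryview(restored)

# Writing through the original is visible through the restored view,
# which shows that no copy of the payload was made.
payload[0] = 7
shares_memory = view[0] == 7

print(len(in_band), len(header), shares_memory)
```

If the executor's messages carry large tensors, an in-band path like the first one would copy every byte on each call, while the out-of-band path only serializes a small header. Profiling the actual message path would be needed to confirm which case applies here.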

### Describe alternatives you've considered
Add the handling to the serialization logic: https://github.com/xorbitsai/xoscar/tree/main/python/xoscar/serialization

### Additional context