[Serve.llm] Refactor LLMServer and LLMEngine to not diverge too much from vllm chat formatting logic #52597
Merged: kouroshHakha merged 30 commits into ray-project:master from kouroshHakha:kh/fix-vlm-chat-template on Apr 30, 2025.
Commits (30, all by kouroshHakha):

b670ce5 refactor
28cd858 wip
37d624d lint
1799bf5 wip
6594f34 wip
ff2bb07 wip
be35a11 wip
e1045f0 Merge branch 'master' into kh/fix-vlm-chat-template
9da0c2a fixed tests
f5b4958 fixed release tests
7ccd0d4 wip
5804ba5 Update python/ray/llm/_internal/serve/deployments/llm/vllm/vllm_engin…
3101e81 Update python/ray/llm/_internal/serve/deployments/llm/vllm/vllm_engin…
b6eed94 removed serve context stuff
2ed151e Merge branch 'kh/fix-vlm-chat-template' of https://github.com/kourosh…
d74febc wip
93ee153 wip
a4b34f3 wip
d4d2a81 wip
be06a5c fixed test
db40d54 Fixed tests
fd43abd wip
c1c4f2b wip
0afb5f6 wip
f473e27 wip
d1e4164 wip
89e61c5 wip
84f39fe wip
830a4b8 wip
916acea Merge branch 'master' into kh/fix-vlm-chat-template
python/ray/llm/_internal/serve/deployments/llm/llm_engine.py (new file, 59 additions, 0 deletions)
from typing import AsyncGenerator, Optional

from ray.llm._internal.serve.configs.server_models import (
    Prompt,
    LLMRawResponse,
    LLMConfig,
    GenerationRequest,
    DiskMultiplexConfig,
)


import abc


class LLMEngine(abc.ABC):
    """Base class for all LLM engines"""

    def __init__(self, llm_config: LLMConfig):
        self._llm_config = llm_config

    @abc.abstractmethod
    async def start(self):
        """Start the engine"""
        pass

    @abc.abstractmethod
    async def prepare_request(
        self,
        request_id: str,
        prompt: Prompt,
        stream: bool,
        disk_lora_model: Optional[DiskMultiplexConfig] = None,
        **kwargs,
    ) -> GenerationRequest:
        """Prepare an EngineRequest for the engine"""
        pass

    @abc.abstractmethod
    async def generate(
        self, request: GenerationRequest
    ) -> AsyncGenerator[LLMRawResponse, None]:
        """Generate an LLMRawResponse stream"""
        pass

    async def check_health(self):
        """Check the health of the engine"""
        pass

    async def sleep(self):
        """Puts the engine to sleep"""
        pass

    async def wakeup(self):
        """Wakes up the engine"""
        pass

    def shutdown(self):
        """Shuts down the engine"""
        pass
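For orientation, here is a minimal, self-contained sketch of the pattern this base class encodes: an async abstract engine whose generation path (start, prepare_request, generate) is required, while lifecycle hooks default to no-ops. Every name in the sketch (SimpleEngine, SimpleRequest, SimpleResponse, EchoEngine) is a hypothetical stand-in for the Ray-internal types, chosen so the example runs without Ray installed; it is not the Ray Serve LLM API.

```python
# Hypothetical illustration of the LLMEngine interface shape; the stand-in
# types below replace GenerationRequest / LLMRawResponse so the demo is
# runnable on its own.
import abc
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator


@dataclass
class SimpleRequest:
    """Stand-in for GenerationRequest: just enough state for the demo."""
    request_id: str
    prompt: str
    stream: bool = True


@dataclass
class SimpleResponse:
    """Stand-in for LLMRawResponse: one chunk of generated text."""
    generated_text: str


class SimpleEngine(abc.ABC):
    """Mirrors the LLMEngine shape: abstract generation path, optional hooks."""

    @abc.abstractmethod
    async def start(self) -> None:
        """Start the engine."""

    @abc.abstractmethod
    async def prepare_request(
        self, request_id: str, prompt: str, stream: bool
    ) -> SimpleRequest:
        """Build an engine request from a raw prompt."""

    @abc.abstractmethod
    async def generate(
        self, request: SimpleRequest
    ) -> AsyncGenerator[SimpleResponse, None]:
        """Yield response chunks for the request."""

    async def check_health(self) -> None:
        """Optional hook with a no-op default, like check_health/sleep/wakeup above."""


class EchoEngine(SimpleEngine):
    """Toy concrete engine that streams the prompt back one word at a time."""

    async def start(self) -> None:
        self._started = True

    async def prepare_request(
        self, request_id: str, prompt: str, stream: bool
    ) -> SimpleRequest:
        return SimpleRequest(request_id=request_id, prompt=prompt, stream=stream)

    async def generate(
        self, request: SimpleRequest
    ) -> AsyncGenerator[SimpleResponse, None]:
        # An async generator: each word of the prompt becomes one streamed chunk.
        for word in request.prompt.split():
            yield SimpleResponse(generated_text=word + " ")


async def main() -> None:
    engine = EchoEngine()
    await engine.start()
    request = await engine.prepare_request("req-0", "hello from the echo engine", True)
    async for chunk in engine.generate(request):
        print(chunk.generated_text, end="")
    print()


if __name__ == "__main__":
    asyncio.run(main())
```

The design point worth noting in the actual diff is that only start, prepare_request, and generate are marked @abc.abstractmethod; check_health, sleep, wakeup, and shutdown ship with no-op default bodies, so a concrete engine only overrides the lifecycle hooks it actually supports.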