[DRAFT] /streaming_chat endpoint PoC #1527
base: main
Conversation
Force-pushed dae29f3 to 91c8daa
Force-pushed e833b89 to d251b56
Force-pushed 2d9bdbf to eed802b
I had a peek @TamiTakamiya. I don't think we need the `async_invoke` function, as you've now moved streaming support to a new type of pipeline (`ModelPipelineStreamingChatBot`).
"inference_url": chatbot_service_url or "http://localhost:8000", | ||
"model_id": chatbot_service_model_id or "granite3-8b", | ||
"verify_ssl": model_service_verify_ssl, | ||
"stream": False, |
Thanks for the comment.

With this streaming support, AI Connect Service will have two chat endpoints: `/chat` and `/streaming_chat`, and naturally `ANSIBLE_AI_MODEL_MESH_CONFIG` is going to have two pipeline definitions: `ModelPipelineChatBot` and `ModelPipelineStreamingChatBot`.

I started thinking that the `stream` property in the `config` of `ModelPipelineChatBot` should always be `false` and the `stream` property in the `config` of `ModelPipelineStreamingChatBot` should always be `true`, because that's how each pipeline is written.

The `CHATBOT_STREAMING` envvar sets the `stream` hidden form field, which lets the Chatbot UI know which endpoint (streaming or non-streaming) to call. That's my intention. Realistically, when `ANSIBLE_AI_MODEL_MESH_CONFIG` contains `ModelPipelineStreamingChatBot`, I think we can assume the streaming mode is used and we can eliminate that envvar. I will implement it that way, and when we see a requirement to override the default, we can add an envvar or some other option.
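To make this concrete, here is a minimal sketch of how `ANSIBLE_AI_MODEL_MESH_CONFIG` could hold the two pipeline definitions. The surrounding structure (the `provider` key and nesting) is an assumption based on the snippet above, not the PR's final schema; only the `stream` values follow the comment:

```python
# Hypothetical sketch only: key names mirror the earlier snippet; the real
# schema of ANSIBLE_AI_MODEL_MESH_CONFIG may differ.
model_mesh_config = {
    "ModelPipelineChatBot": {
        "provider": "http",  # assumed provider name
        "config": {
            "inference_url": "http://localhost:8000",
            "model_id": "granite3-8b",
            "verify_ssl": True,
            "stream": False,  # non-streaming pipeline: always False
        },
    },
    "ModelPipelineStreamingChatBot": {
        "provider": "http",  # assumed provider name
        "config": {
            "inference_url": "http://localhost:8000",
            "model_id": "granite3-8b",
            "verify_ssl": True,
            "stream": True,  # streaming pipeline: always True
        },
    },
}
```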
```python
)

StreamingChatBotResponse = Any
```
IMO this should be `StreamingChatBotResponse = AsyncGenerator`.
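A minimal sketch of the suggested alias; the generic type parameters are an assumption, since the comment names only `AsyncGenerator`:

```python
from typing import Any, AsyncGenerator

# Constrain the alias to an async generator instead of Any.
# The yield/send parameters (Any, None) are assumptions.
StreamingChatBotResponse = AsyncGenerator[Any, None]
```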
@@ -274,6 +301,9 @@ def alias() -> str:
```python
def invoke(self, params: PIPELINE_PARAMETERS) -> PIPELINE_RETURN:
    raise NotImplementedError

def async_invoke(self, params: PIPELINE_PARAMETERS) -> AsyncGenerator:
```
This method isn't needed (yes, yes, I know I proposed it.. but I found a better way.. I think)
I will update my code today :-) Thanks.
```python
def __init__(self, config: HttpConfiguration):
    super().__init__(config=config)

def invoke(self, params: StreamingChatBotParameters) -> StreamingHttpResponse:
```
Change L#222-L#225 to simply be:

```python
async def invoke(self, params: StreamingChatBotParameters) -> StreamingChatBotResponse:
```

i.e. there is no need for an `invoke` AND an `async_invoke` function; just an `invoke` function.
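For illustration, a sketch of what this proposal could look like end to end, assuming an aiohttp-based HTTP pipeline. The class name, endpoint path, and parameter fields are illustrative, not the PR's actual code:

```python
from typing import Any, AsyncGenerator

import aiohttp

StreamingChatBotResponse = AsyncGenerator[Any, None]


class HttpStreamingChatBotPipeline:  # illustrative name
    def __init__(self, config):
        self.config = config

    async def invoke(self, params) -> StreamingChatBotResponse:
        # A single async-generator invoke(): no separate async_invoke needed.
        payload = {"query": params.query, "model": self.config.model_id}
        async with aiohttp.ClientSession() as session:
            async with session.post(
                f"{self.config.inference_url}/v1/streaming_query",  # assumed path
                json=payload,
                ssl=self.config.verify_ssl,
            ) as response:
                # Yield raw chunks as they arrive from the model service.
                async for chunk in response.content.iter_any():
                    yield chunk
```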
@TamiTakamiya it looks like you misunderstood some of my proposals in this latest commit. Happy to go through them with you tomorrow.
`ansible_ai_connect/ai/api/views.py` (outdated)
```python
    },
    summary="Streaming chat request",
)
def post(self, request) -> Response:
```
Can the return type be `StreamingHttpResponse`?
`ansible_ai_connect/ai/api/views.py` (outdated)
```python
self.event.modelName = self.req_model_id or self.llm.config.model_id

return StreamingHttpResponse(
    self.llm.async_invoke(
```
If my other changes are integrated, this can simply be `self.llm.invoke(....)`.
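A hedged sketch of the view once the pipeline exposes a single async-generator `invoke()`; the view class name, the hypothetical `build_params` helper, `self.llm`, and the content type are all assumptions for illustration:

```python
from django.http import StreamingHttpResponse
from rest_framework.views import APIView


class StreamingChat(APIView):  # illustrative class name
    def post(self, request) -> StreamingHttpResponse:
        # Django 4.2+ can consume an async iterator directly, so the
        # pipeline's async-generator invoke() can be passed straight through.
        return StreamingHttpResponse(
            self.llm.invoke(self.build_params(request)),  # build_params is hypothetical
            content_type="text/event-stream",  # assumed media type for SSE
        )
```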
```diff
@@ -159,7 +159,7 @@ def assert_basic_data(
     self.assert_common_data(data, expected_status, deployed_region)
     timestamp = data["timestamp"]
     dependencies = data.get("dependencies", [])
-    self.assertEqual(10, len(dependencies))
+    self.assertEqual(11, len(dependencies))
```
JFYI, there's also a test in `ansible-wisdom-testing` that will break.
Thanks for the information.

I thought I should introduce a constant for this instead of replacing `10` with `11` (a long time ago I was told that literal values other than 0 should not appear in source code), but I took the easy way :-)
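For what it's worth, a minimal sketch of the named-constant alternative alluded to above; the constant name is made up here:

```python
# Hypothetical constant: bump this when a new dependency is registered.
EXPECTED_DEPENDENCY_COUNT = 11

self.assertEqual(EXPECTED_DEPENDENCY_COUNT, len(dependencies))
```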
Force-pushed c661a8d to da5c188
Jira Issue: https://issues.redhat.com/browse/AAP-39044
Description

A PoC for the `/streaming_chat` endpoint.

Testing

Steps to test
Scenarios tested
Production deployment