
[DRAFT] /streaming_chat endpoint PoC #1527

Draft · wants to merge 12 commits into main

Conversation

TamiTakamiya
Contributor

Jira Issue: https://issues.redhat.com/browse/AAP-39044

Description

A PoC for the /streaming_chat endpoint.

Testing

Steps to test

  1. Pull down the PR
  2. ...
  3. ...

Scenarios tested

Production deployment

  • This code change is ready for production on its own
  • This code change requires the following considerations before going to production:

@TamiTakamiya TamiTakamiya changed the base branch from main to TamiTakamiya/AAP-39043/chatbot-streaming-ui February 10, 2025 21:12
Base automatically changed from TamiTakamiya/AAP-39043/chatbot-streaming-ui to main February 10, 2025 21:21
@TamiTakamiya TamiTakamiya force-pushed the TamiTakamiya/AAP-39044/streaming-chat-endpoint-poc branch 6 times, most recently from dae29f3 to 91c8daa Compare February 14, 2025 21:35
@TamiTakamiya TamiTakamiya force-pushed the TamiTakamiya/AAP-39044/streaming-chat-endpoint-poc branch from e833b89 to d251b56 Compare February 18, 2025 18:28
@TamiTakamiya TamiTakamiya force-pushed the TamiTakamiya/AAP-39044/streaming-chat-endpoint-poc branch 2 times, most recently from 2d9bdbf to eed802b Compare February 18, 2025 20:06
Contributor

@manstis manstis left a comment


I had a peek @TamiTakamiya

I don't think we need the async_invoke function as you've now moved streaming support to a new type of pipeline (ModelPipelineStreamingChatBot).

"inference_url": chatbot_service_url or "http://localhost:8000",
"model_id": chatbot_service_model_id or "granite3-8b",
"verify_ssl": model_service_verify_ssl,
"stream": False,
Contributor


IMO use of environment variables is deprecated.

ANSIBLE_AI_MODEL_MESH_CONFIG should be considered the master configuration.

I'd therefore propose:

  • Add a new line after L#182: chatbot_streaming = os.getenv("CHATBOT_STREAMING")
  • Change L#201 to "stream": chatbot_streaming
  • Revert this change.
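
The proposal above can be sketched roughly as follows. This is a hedged illustration, not the PR's actual code: CHATBOT_STREAMING is the real envvar name from this thread, but the surrounding dict is a simplified stand-in for the pipeline config snippet quoted earlier.

```python
import os

# Hedged sketch of the proposal: read CHATBOT_STREAMING once and feed it
# into the pipeline config instead of hard-coding "stream": False.
# The fallback values mirror the config snippet quoted earlier in this thread.
chatbot_streaming = os.getenv("CHATBOT_STREAMING", "False").lower() == "true"

config = {
    "inference_url": "http://localhost:8000",
    "model_id": "granite3-8b",
    "stream": chatbot_streaming,  # was: "stream": False
}
```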

Contributor Author


Thanks for the comment.

With this streaming support, AI Connect Service will have two chat endpoints, /chat and /streaming_chat, and naturally ANSIBLE_AI_MODEL_MESH_CONFIG is going to have two pipeline definitions: ModelPipelineChatBot and ModelPipelineStreamingChatBot.

I started thinking that the stream property in the config of ModelPipelineChatBot should always be false, and the stream property in the config of ModelPipelineStreamingChatBot should always be true, because that's how each pipeline is written.

The CHATBOT_STREAMING envvar sets the hidden stream form field, which lets the Chatbot UI know which endpoint (streaming or non-streaming) to call. That's my intention.

Realistically, when ANSIBLE_AI_MODEL_MESH_CONFIG contains ModelPipelineStreamingChatBot, I think we can assume streaming mode is used and we can eliminate that envvar. I will implement it that way; if we later see a requirement to override the default, we can add an envvar or some other option.
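
That plan can be sketched as follows. The helper name streaming_enabled and the config shapes are hypothetical; only the pipeline class names come from this thread.

```python
# Hedged sketch: infer streaming support from the presence of a
# ModelPipelineStreamingChatBot entry in ANSIBLE_AI_MODEL_MESH_CONFIG,
# rather than from a separate CHATBOT_STREAMING envvar.
def streaming_enabled(model_mesh_config: dict) -> bool:
    return "ModelPipelineStreamingChatBot" in model_mesh_config

# Non-streaming deployment: only ModelPipelineChatBot is configured.
non_streaming = {"ModelPipelineChatBot": {"config": {"stream": False}}}

# Streaming deployment: both pipelines are configured.
streaming = {
    "ModelPipelineChatBot": {"config": {"stream": False}},
    "ModelPipelineStreamingChatBot": {"config": {"stream": True}},
}
```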

)


StreamingChatBotResponse = Any
Contributor


IMO this should be StreamingChatBotResponse = AsyncGenerator
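
A minimal illustration of why AsyncGenerator is the tighter alias: consumers can then rely on `async for`. The functions stream_tokens and collect are invented names for this sketch, not code from the PR.

```python
import asyncio
from typing import AsyncGenerator

# Proposed alias: the streaming response contract becomes explicit,
# instead of the opaque StreamingChatBotResponse = Any.
StreamingChatBotResponse = AsyncGenerator

async def stream_tokens() -> StreamingChatBotResponse:
    """Toy stand-in for a streaming chat pipeline yielding tokens."""
    for token in ("Hello", ", ", "world"):
        yield token

async def collect() -> str:
    # With AsyncGenerator as the declared type, `async for` is guaranteed to work.
    return "".join([t async for t in stream_tokens()])

result = asyncio.run(collect())
```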

@@ -274,6 +301,9 @@ def alias() -> str:
def invoke(self, params: PIPELINE_PARAMETERS) -> PIPELINE_RETURN:
raise NotImplementedError

def async_invoke(self, params: PIPELINE_PARAMETERS) -> AsyncGenerator:
Contributor


This method isn't needed (yes, yes, I know I proposed it.. but I found a better way.. I think)

Contributor Author


I will update my code today :-) Thanks.

def __init__(self, config: HttpConfiguration):
super().__init__(config=config)

def invoke(self, params: StreamingChatBotParameters) -> StreamingHttpResponse:
Contributor


Change L#222-L#225 to simply be:

async def invoke(self, params: StreamingChatBotParameters) -> StreamingChatBotResponse:

i.e. there is no need for an invoke AND async_invoke function; just an invoke function.
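
The suggestion can be sketched like this (a simplified stand-in class, not the PR's actual pipeline): declaring invoke() as an async generator function removes the need for a separate async_invoke().

```python
import asyncio
from typing import AsyncGenerator

class ModelPipelineStreamingChatBot:
    """Hedged sketch: invoke() is itself an async generator, so one
    method serves both roles; the parameter shape here is invented."""

    async def invoke(self, params: dict) -> AsyncGenerator:
        for chunk in params["query"].split():
            yield chunk

async def run() -> list:
    pipeline = ModelPipelineStreamingChatBot()
    # Calling an async-generator method returns the generator directly;
    # no await is needed before iterating over it.
    return [c async for c in pipeline.invoke({"query": "hello streaming world"})]

chunks = asyncio.run(run())
```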

Contributor


@TamiTakamiya it looks like you misunderstood some of my proposals in this latest commit.

Happy to go through it with you tomorrow.

},
summary="Streaming chat request",
)
def post(self, request) -> Response:
Contributor


Can the return type be StreamingHttpResponse?

self.event.modelName = self.req_model_id or self.llm.config.model_id

return StreamingHttpResponse(
self.llm.async_invoke(
Contributor


If my other changes are integrated, this can simply be self.llm.invoke(....)
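
Putting the two review points together, the view would hand the async generator straight to a StreamingHttpResponse. A minimal sketch, with a stand-in class so it runs without Django (the real django.http.StreamingHttpResponse also handles headers, status, and sync iterables, and invoke() here is a toy stand-in for self.llm.invoke(...)):

```python
from typing import AsyncIterator

class StreamingHttpResponse:
    """Minimal stand-in for django.http.StreamingHttpResponse (sketch only)."""

    def __init__(self, streaming_content, content_type=None):
        self.streaming_content = streaming_content
        self.content_type = content_type

async def invoke() -> AsyncIterator[bytes]:
    """Toy stand-in for the pipeline's async-generator invoke()."""
    yield b'data: {"token": "hello"}\n\n'

# The view returns the response wrapping the async generator directly:
response = StreamingHttpResponse(invoke(), content_type="text/event-stream")
```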

@@ -159,7 +159,7 @@ def assert_basic_data(
self.assert_common_data(data, expected_status, deployed_region)
timestamp = data["timestamp"]
dependencies = data.get("dependencies", [])
self.assertEqual(10, len(dependencies))
self.assertEqual(11, len(dependencies))
Contributor


JFYI, there's also a test in ansible-wisdom-testing that will break.

Contributor Author


Thanks for the information.

I thought I should introduce a constant for this instead of replacing 10 with 11 (a long time ago I was told that literal values other than 0 should not appear in source code), but I took the easy way :-)
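
The constant the author considered might look like this; the name EXPECTED_DEPENDENCY_COUNT is hypothetical, not from the PR.

```python
# Hedged sketch: name the magic number once, so the next dependency
# addition changes a single line instead of scattered literals.
EXPECTED_DEPENDENCY_COUNT = 11  # bumped from 10 when the streaming pipeline was added

def assert_basic_data(data: dict) -> None:
    dependencies = data.get("dependencies", [])
    assert EXPECTED_DEPENDENCY_COUNT == len(dependencies)

assert_basic_data({"dependencies": ["dep"] * EXPECTED_DEPENDENCY_COUNT})
```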

@TamiTakamiya TamiTakamiya force-pushed the TamiTakamiya/AAP-39044/streaming-chat-endpoint-poc branch from c661a8d to da5c188 Compare February 19, 2025 21:48