Support for asynchronous requests for watsonx.ai chat #1666
Conversation
Signed-off-by: Paweł Knes <[email protected]>
Looks good. I've merged it into my testing branch to check. The unit tests are failing, but that might not be related. @elronbandel and @eladven are looking at it.
Signed-off-by: Paweł Knes <[email protected]>
@yoavkatz one error is related to the obsolete version of
I see the catalog consistency test failing as well, but it seems to be an issue with HF.
Added support for asynchronous requests in the `WMLInferenceEngineChat`. The default concurrency limit is set to be the same as in the case of the `WMLInferenceEngineGeneration`: 10.

Small performance test (for the dataset at the bottom, averaged over 3 runs each):
- (`concurrency_limit=10`): 6.4 seconds
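The bounded-concurrency pattern described above can be sketched with a plain `asyncio` semaphore. This is a minimal illustration only: `fake_chat_request` and `run_all` are hypothetical stand-ins, not the actual `WMLInferenceEngineChat` internals or the watsonx.ai SDK API.

```python
import asyncio

async def fake_chat_request(prompt: str) -> str:
    """Hypothetical stand-in for a single watsonx.ai chat call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"response to {prompt}"

async def run_all(prompts, concurrency_limit: int = 10):
    # The semaphore caps the number of in-flight requests, mirroring
    # the concurrency_limit=10 default mentioned in the description.
    sem = asyncio.Semaphore(concurrency_limit)

    async def bounded(prompt):
        async with sem:
            return await fake_chat_request(prompt)

    # gather preserves input order, so results align with prompts.
    return await asyncio.gather(*(bounded(p) for p in prompts))

results = asyncio.run(run_all([f"q{i}" for i in range(25)]))
print(len(results))  # 25
```

With a limit of 10, at most ten requests are awaiting the (simulated) network at any moment; the remaining fifteen queue on the semaphore.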