Ollama openvino integration #953
Conversation
Will it be beneficial in the future to reuse the OpenVINO GenAI Continuous Batching pipeline, which is faster when handling multiple requests?
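For reference, a minimal sketch of what serving several prompts through ov::genai::ContinuousBatchingPipeline looks like. This only illustrates the batched API being discussed, not code from this PR; the model path, device, and cache size are placeholder values.

```cpp
#include <iostream>
#include <string>
#include <vector>

#include "openvino/genai/continuous_batching_pipeline.hpp"

int main(int argc, char* argv[]) {
    // Directory with an OpenVINO IR LLM exported by optimum-cli (placeholder path).
    const std::string models_path = argv[1];

    // Scheduler settings control how concurrent requests are batched together.
    ov::genai::SchedulerConfig scheduler_config;
    scheduler_config.cache_size = 2;  // KV-cache budget in GB (example value)

    ov::genai::ContinuousBatchingPipeline pipe(models_path, scheduler_config, "CPU");

    ov::genai::GenerationConfig config = ov::genai::greedy();
    config.max_new_tokens = 64;

    // Several client prompts handled in one batched generate call.
    std::vector<std::string> prompts = {"What is OpenVINO?", "What is Ollama?"};
    std::vector<ov::genai::GenerationConfig> configs(prompts.size(), config);

    auto results = pipe.generate(prompts, configs);
    for (const auto& res : results) {
        std::cout << res.m_generation_ids.front() << "\n";
    }
    return 0;
}
```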
Could you please also add a single GitHub Actions workflow to check that the integration is working?
Our current integration only adopts the basic ov::genai::LLMPipeline API; streaming generation with a callback function is still WIP. Both workloads run in single-client, single-server mode. I am not sure under what conditions Continuous Batching would be invoked in this mode, nor whether that optimization is necessary; the behavior should be fully aligned with the current GenAI benchmark. For multiple requests, yes, there are potential benefits. It would require a wrapper interface around ov::genai::ContinuousBatchingPipeline, likely in a similar way to OVMS. I would not expect this feature in the upcoming 25.1 FRC release unless there is a customer request directly related to it.
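A minimal sketch of the single-request ov::genai::LLMPipeline path with a token-streaming callback, roughly the shape of what is described above (model path, device, and prompt are placeholders; the exact streamer signature may vary between GenAI releases):

```cpp
#include <iostream>
#include <string>

#include "openvino/genai/llm_pipeline.hpp"

int main(int argc, char* argv[]) {
    // Directory with an OpenVINO IR LLM exported by optimum-cli (placeholder path).
    const std::string models_path = argv[1];
    ov::genai::LLMPipeline pipe(models_path, "CPU");

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 128;

    // Streamer callback: invoked for each decoded chunk; returning false lets
    // generation continue, returning true stops it early.
    auto streamer = [](std::string subword) {
        std::cout << subword << std::flush;
        return false;
    };

    pipe.generate("Why is the sky blue?", config, streamer);
    return 0;
}
```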
I don't think it makes sense to add a .github action inside this folder, because GitHub would treat it as a sub-repo that needs to be hyperlinked to an official repo, which means we would have to maintain another official repo. If you really want an action check, do you think adding the submodule's source compilation and tests here is enough? I don't think we need to download all the models for testing; that would be too heavy.
Disable cgocheck for runtime as well.
Hi @ilya-lavrenov, we have added an Actions workflow and verified that the integration is working; please review it.
Is it possible to merge it now?
@alvoron I learned that you are one of the owners of openvino_contrib. Could you please help review this PR, and approve the merge if no further changes are needed?
Add a new Ollama-OV module that integrates OpenVINO GenAI as the backend engine of Ollama to accelerate LLM inference on Intel platforms (CPU/iGPU/dGPU/NPU).