Conversation

@FionaZZ92 FionaZZ92 commented Mar 21, 2025

Add a new module, Ollama-OV, which integrates OpenVINO GenAI as a backend engine for Ollama to accelerate LLM inference on Intel platforms (CPU/iGPU/dGPU/NPU).

  • Current feature: LLM serving for models supported by OpenVINO GenAI 2025RC2, with single request/response token inference.
  • To be added:
  1. Streaming token generation - Done.
  2. Update to 2025RC2 (currently 2025.2.0.0.dev20250320) - Done.

@FionaZZ92 FionaZZ92 requested a review from a team as a code owner March 21, 2025 11:24
@github-actions github-actions bot added the category: build OpenVINO cmake script / infra label Mar 21, 2025
@ilya-lavrenov (Contributor)

current feature: LLM serving with OV genAI 2025RC1 supported model with single request/response token inference

Would it be beneficial in the future to reuse the OpenVINO GenAI Continuous Batching pipeline, which is faster in the case of multiple requests?

@ilya-lavrenov (Contributor)

could you please also add a single github actions workflow to check that integration is working?

@FionaZZ92 (Author)

current feature: LLM serving with OV genAI 2025RC1 supported model with single request/response token inference

Would it be beneficial in the future to reuse the OpenVINO GenAI Continuous Batching pipeline, which is faster in the case of multiple requests?

Our current integration only adopts the basic ov::genai::LLMPipeline API, and streaming generation with a callback function is a work in progress. Both of these workloads run in single-client, single-server mode. I am not sure under what conditions Continuous Batching would be invoked in this mode, or whether that optimization is necessary; I expect the situation to be aligned with the current GenAI benchmarks.

For multiple requests, yes, there should be potential benefits. It would require a wrapper interface for ov::genai::ContinuousBatchingPipeline, likely done in a similar way to OVMS. I would not expect this feature in the near-term version with the 25.1 FRC unless a customer request directly relevant to it comes in.
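For context, the two usage modes discussed above can be sketched as follows. This is a minimal illustration based on the public OpenVINO GenAI C++ API (ov::genai::LLMPipeline with a streamer callback); the model path, device, and prompt are placeholders, not taken from this PR.

```cpp
// Hedged sketch of the integration's current scope: single request/response
// generation via ov::genai::LLMPipeline, plus the streaming-callback mode
// that was listed as work in progress. Model path and device are illustrative.
#include <iostream>
#include <string>
#include "openvino/genai/llm_pipeline.hpp"

int main() {
    // Path to an OpenVINO-converted model directory (placeholder).
    ov::genai::LLMPipeline pipe("TinyLlama-ov", "CPU");

    // 1) Single request/response token inference (the current feature):
    std::string result = pipe.generate("What is OpenVINO?",
                                       ov::genai::max_new_tokens(100));
    std::cout << result << '\n';

    // 2) Streaming token generation with a callback (the WIP feature):
    auto streamer = [](std::string token) {
        std::cout << token << std::flush;
        return false;  // returning false tells the pipeline to keep generating
    };
    pipe.generate("What is OpenVINO?",
                  ov::genai::max_new_tokens(100),
                  ov::genai::streamer(streamer));
    return 0;
}
```

A ContinuousBatchingPipeline-based wrapper, as suggested for the multi-request case, would replace the single-request pipeline above with ov::genai::ContinuousBatchingPipeline and schedule multiple concurrent requests, similar to how OVMS serves them.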

@FionaZZ92 (Author) commented Mar 24, 2025

could you please also add a single github actions workflow to check that integration is working?

I do not think it makes sense to add a .github action in this folder, because GitHub would treat it as a sub-repository requiring a hyperlink to an official repo, which means we would have to maintain another official repo. If you really want an action check, do you think it is enough to add compilation and testing of this submodule's source here? I do not think we need to download all the models for testing; that would be too heavy.

@FionaZZ92 FionaZZ92 requested a review from a team as a code owner March 26, 2025 05:57
@github-actions github-actions bot added the category: CI OpenVINO public CI label Mar 26, 2025
@zhaohb (Contributor) commented Mar 26, 2025

could you please also add a single github actions workflow to check that integration is working?

Hi @ilya-lavrenov, we have added an action workflow and verified that the integration is working; please review it.

@FionaZZ92 (Author)

Can this be merged now?

@FionaZZ92 (Author)

@alvoron I learned that you are one of the owners of openvino_contrib. Could you please help review this PR, and approve the merge if no further changes are needed?

@ilya-lavrenov ilya-lavrenov merged commit 0e838a4 into openvinotoolkit:master Mar 27, 2025
12 checks passed