[GPU] Make singleton OCL Context #32738
Conversation
Pull request overview
This PR implements a singleton OCL context for the Intel GPU plugin to enable buffer sharing across multiple ov::Core instances. The change ensures that a single OCL context is created per process rather than per Core instance, allowing GenAI applications to use multiple Core instances while sharing GPU resources efficiently.
Key changes:
- Moved OCL context initialization from instance level to a static singleton in Plugin::get_default_contexts()
- Updated documentation across C++, Python, JavaScript, and Node.js bindings to reflect the new multi-Core behavior
- Added test to verify singleton OCL context behavior across multiple Core instances
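The key change above can be sketched in miniature. This is an illustrative model, not the plugin's actual code: `Plugin`, `RemoteContext`, and the member names are stand-ins for the real `Plugin::get_default_contexts()` machinery, and it uses a plain static shared_ptr to show the singleton behavior (the actual PR releases the context through a weak_ptr scheme described in the Details section).

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for the GPU plugin's remote context object.
struct RemoteContext {
    int native_handle;  // stands in for the underlying cl_context
};

class Plugin {
public:
    // Before this PR, the context and its once-flag were member variables,
    // so each Plugin instance (one per ov::Core) built its own OCL context.
    // As function-local statics they are created once per process and
    // shared by every Core instance.
    std::shared_ptr<RemoteContext> get_default_context() {
        static std::shared_ptr<RemoteContext> default_context;
        static std::once_flag default_context_once;
        std::call_once(default_context_once, [] {
            default_context = std::make_shared<RemoteContext>(RemoteContext{42});
        });
        return default_context;
    }
};
```

With this shape, two Plugin instances (as if owned by two different ov::Core objects) hand out the same context pointer, which is what the added functional test checks.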
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/plugins/intel_gpu/tests/functional/behavior/infer_request.cpp | Added test case verifying that OCL contexts from different Core instances are identical |
| src/plugins/intel_gpu/src/plugin/plugin.cpp | Moved m_default_contexts and m_default_contexts_once from member variables to static variables for singleton behavior |
| src/plugins/intel_gpu/include/intel_gpu/plugin/plugin.hpp | Removed member variable declarations that are now static in the implementation |
| src/inference/include/openvino/runtime/core.hpp | Updated documentation to clarify that multiple Core instances now share underlying device resources |
| src/bindings/python/src/pyopenvino/core/core.cpp | Updated Python binding documentation to reflect resource sharing behavior |
| src/bindings/python/src/openvino/_ov_api.pyi | Updated Python type stub documentation for consistency |
| src/bindings/js/node/lib/addon.ts | Updated JavaScript/TypeScript documentation to reflect resource sharing |
| docs/sphinx_setup/api/nodejs_api/openvino-node/interfaces/Core.rst | Updated Node.js API documentation for consistency |
Please check the behavior of the sample app.
This is not required for 2026.0 because GenAI is still using a singleton Core (openvinotoolkit/openvino.genai#2952). This can be merged in the 2026.1 timeframe.
src/inference/src/cpp/core.cpp (Outdated)

```cpp
ov::DeviceIDParser parser(device_name);
std::string devName = parser.get_device_name();

_impl->get_plugin(devName).cleanup();
```
@praasz could you review this part? I added a new plugin API, cleanup(), to ensure context cleanup before unloading the plugin. Without this, it caused a hang on Windows.
src/inference/src/cpp/core.cpp (Outdated)

```diff
@@ -265,17 +271,17 @@ void Core::unload_plugin(const std::string& device_name) {
     OV_CORE_CALL_STATEMENT({
         ov::DeviceIDParser parser(device_name);
         std::string devName = parser.get_device_name();
+        _impl->get_plugin(devName).cleanup();
```
What is the reason to add this new method to the plugins API? Can you perform the same steps inside the ov::intel_gpu::Plugin dtor?
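For context, the motivation behind an explicit cleanup() hook in this (outdated) revision can be sketched with a toy model. All names here are hypothetical, not the real plugin API: the point is that a process-wide static reference to the context must be dropped while the plugin library is still loaded, because a static destroyed only at process exit would run its destructor against already-unloaded code, consistent with the hang seen on Windows.

```cpp
#include <cassert>
#include <memory>

// Hypothetical context wrapper; the flag stands in for clReleaseContext
// actually being reached.
struct OclContext {
    explicit OclContext(bool* flag) : released_flag(flag) {}
    ~OclContext() { *released_flag = true; }  // "release the cl_context"
    bool* released_flag;
};

class Plugin {
public:
    void init_context(bool* flag) { context() = std::make_shared<OclContext>(flag); }
    // Explicit cleanup: drop the process-wide static reference now, before
    // dlclose/FreeLibrary, instead of waiting for static destruction at
    // process exit.
    void cleanup() { context().reset(); }

private:
    static std::shared_ptr<OclContext>& context() {
        static std::shared_ptr<OclContext> ctx;
        return ctx;
    }
};
```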
### Details:
- Make the OCL context a singleton.
- GenAI tries to change its behavior to multiple ov::Core instances, while buffers can be shared across ov::Core.
- We had two choices:
  - explicit context sharing among multiple ov::Core instances
  - a singleton OCL context
- The expectation is that a single OCL context is created per process, so it should be fine to use a singleton OCL context.
- The singleton OCL context was implemented with a weak_ptr. This was a design choice, to release the cl_context before plugin unload.
- As the Plugin class owns the cl_context through a shared_ptr, the singleton cl_context will be freed when all Core instances are destructed.

### Tickets:
- CVS-178537
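The weak_ptr design described in the details can be sketched as follows. This is a simplified model with hypothetical names (`acquire_context`, `OclContext`), not the plugin's real code: the process-wide slot is a weak_ptr and therefore does not own the context; each ov::Core (via its Plugin) holds a shared_ptr, so the context is released as soon as the last Core goes away.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for the wrapper that owns the cl_context.
struct OclContext {};

// Process-wide slot held through a weak_ptr: the slot itself does not keep
// the context alive. Every caller shares one context while at least one
// owner exists; once all owners are destructed, the cl_context is freed --
// before the plugin library is unloaded.
std::shared_ptr<OclContext> acquire_context() {
    static std::weak_ptr<OclContext> slot;
    static std::mutex slot_mutex;
    std::lock_guard<std::mutex> lock(slot_mutex);
    auto ctx = slot.lock();
    if (!ctx) {                    // first caller, or every previous owner is gone
        ctx = std::make_shared<OclContext>();
        slot = ctx;                // weak assignment: does not extend lifetime
    }
    return ctx;
}
```

Two concurrent acquirers get the same object; once both release it, the slot expires and a later acquirer would get a freshly created context.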