[CORE] Make sure callback finishes execution before running it the second time for the next run in separate thread#36304
Conversation
e6bebcd to
be3872c
Compare
There was a problem hiding this comment.
Pull request overview
This PR addresses an async callback re-entrancy race in ov::IAsyncInferRequest where a callback that restarts inference and blocks execution could cause subsequent callback delivery to be missed, leading to accounting/pending-request hangs. It also adds an Intel NPU functional test intended to deterministically catch callback-miss regressions under callback re-entry + delay stress.
Changes:
- Serialize async callback execution by introducing a new mutex (
m_callback_invoke_mutex) and using it during callback assignment/invocation/clearing inIAsyncInferRequest. - Adjust callback handling in the final pipeline stage to avoid the previous “swap-out callback” behavior that could cause missed callbacks under re-entrancy.
- Add an Intel NPU functional test that restarts async inference from within the callback and sleeps to stress callback delivery ordering.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/plugins/intel_npu/tests/functional/behavior/ov_infer_request/infer_request_run.hpp |
Adds a regression test for callback re-entry with sleep to detect missed/empty callback delivery. |
src/inference/src/dev/iasync_infer_request.cpp |
Changes callback locking/invocation behavior to prevent overlapping callback executions. |
src/inference/dev_api/openvino/runtime/iasync_infer_request.hpp |
Adds m_callback_invoke_mutex to support new callback synchronization. |
9336030 to
89ebe0b
Compare
89ebe0b to
1d2c44d
Compare
1d2c44d to
6e9c728
Compare
6e9c728 to
5881d3d
Compare
| mutable std::mutex m_mutex; | ||
| mutable std::mutex m_callback_invoke_mutex; | ||
| std::function<void(std::exception_ptr)> m_callback; | ||
| std::exception_ptr m_pending_callback_exception = nullptr; |
There was a problem hiding this comment.
Let's first agree on the changes, and after that, update the doc if needed.
5881d3d to
e100d1d
Compare
| mutable std::mutex m_mutex; | ||
| mutable std::recursive_mutex m_callback_invoke_mutex; | ||
| std::function<void(std::exception_ptr)> m_callback; | ||
| // Exception thrown by user callback is captured and rethrown on the next start_async()/infer() call. | ||
| // This avoids deadlocks when callbacks re-enter inference on the same request. | ||
| std::exception_ptr m_pending_callback_exception = nullptr; |
There was a problem hiding this comment.
Let's first agree on the changes, and after that, update the doc if needed.
| mutable std::mutex m_mutex; | ||
| mutable std::recursive_mutex m_callback_invoke_mutex; | ||
| std::function<void(std::exception_ptr)> m_callback; |
There was a problem hiding this comment.
Not sure, but I think recursive_mutex is better to keep it more appropriate to the older approach as well.
…infer Signed-off-by: Bogdan Pereanu <bogdan.pereanu@intel.com>
Signed-off-by: Bogdan Pereanu <bogdan.pereanu@intel.com>
Signed-off-by: Bogdan Pereanu <bogdan.pereanu@intel.com>
Signed-off-by: Bogdan Pereanu <bogdan.pereanu@intel.com>
19ed7bf to
2aac8d1
Compare
Signed-off-by: Bogdan Pereanu <bogdan.pereanu@intel.com>
125dbd4 to
3fa22d2
Compare
Details:
This change fixes an async callback race where callback delivery could be missed for a request when the callback restarts inference and holds execution. In stressed scenarios, this could leave accounting's checkout inconsistent and cause pending-request hangs.
Expected outcome:
No unexpected “empty/missed callback” symptom under callback re-entry stress.
No pending-request stall caused by callback/accounting mismatch.
Deterministic failure if callback delivery regresses again. (NPU test case)
Tickets:
AI Assistance: