[GPU] Defer allocations of inputs #35126
intbf wants to merge 15 commits into openvinotoolkit:master
Conversation
build_jenkins
Force-pushed 57b2748 to 45351f8
Lyamin-Roman left a comment:
Have there been any measurements of the impact of this change on performance? And if there is an impact, maybe add a property like "disable_input_preallocation" that you could use in your application.
I haven't observed any performance drop/changes in the phi_silica_app test (on dummy weights). @Kotomi-Du also checked the app on real weights, and both performance and conformance were good.

I confirmed the TPS for 2nd+ tokens is the same as before; the PR could potentially impact first-token latency because memory allocation for all nodes is deferred to the very first inference.
@intbf @Kotomi-Du please take a look at the dynamic tests errors:
Force-pushed e838f77 to f73832f
SIGABRT for 14 tests...

Yes, the main issue is that the memory is allocated/updated lazily in set_data, but some tests don't call it. In those cases I will try to update the tests and allocate memory so that no access violation (AV) happens.
Correction: this change already applies to Parameters, since during compilation ov::op::v0::Parameter converts to/creates a cldnn::input_layout primitive.
```cpp
std::vector<event::ptr> set_output_memory(const primitive_id& id, memory::ptr mem, bool is_remote = false);
```

```cpp
std::vector<std::shared_ptr<primitive_inst>> const& get_outputs() { return _outputs; }
```
(random spot)

- Based on the description, I think a performance impact might be observed in the warm-up phase of a static-shape model. Could you check that, especially for large-input models such as detection models?
- How does this interact with memory-pool reuse? Does it impact memory usage? Or was the input_layout memory simply never reused from the memory pool, so there is no memory-usage impact?
I updated the description with some benchmark runs for psd2, psd7, psr, yolo... Is that good enough, or should I add more tests?

As for the second question: I'll try to prepare a unit test that shows whether the memory is reused (following our discussion in chat).
Force-pushed ab418f8 to 5382f0b
Several tests are still failing on DG2 GPU: [2026-04-08T15:40:49.481Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [7010/24717] quantize_smoke/quantize_random_test.random/22 (596 ms)
Force-pushed 43b6fc1 to d209e8b
More tests are failing: smoke/DynamicShapeStatefulModelDefault.smoke_Run_Stateful_Dynamic_Default/0. I'll work on them today.
Force-pushed 481e216 to 1ef0611
```cpp
auto& eng = get_engine();
```

```cpp
if (p_inst->output_memory_ptr())
    _in_out_shared_mem_types.push_back(p_inst->output_memory_ptr()->get_internal_params().mem_type);
```
Why does it need to store the mem_type information in this object?
_in_out_shared_mem_types is the cached vector of shared_mem_type enum values for all inputs/outputs. It is used in network.cpp execute() to check whether GPU surface locking is required before execution. The data is stored in network.cpp allocate_primitive_instance, which must be called before execution. Why does deferred allocation have an impact on this?
You're right, those two locations were wrong. The original motivation was: lazy input_layout nodes have a null output_memory_ptr() at allocate_primitive_instance() time, so the normal push_back there is skipped. If the user later provides a shared surface (VA/DX11) via set_input_data(), _in_out_shared_mem_types would never record it. I moved it to set_input_data.
build_jenkins
```cpp
for (auto const& input : _inputs) ret.push_back(input->output_memory_ptr()->get_layout());
```

```cpp
for (auto const& input : _inputs) {
    if (input->output_memory_ptr())
        _in_out_shared_mem_types.push_back(input->output_memory_ptr()->get_internal_params().mem_type);
```
Force-pushed 84c5fce to 7a6c73b
build_jenkins

In the description I added some results from benchmark runs; is that good enough? Since there's not much perf impact, maybe there's no need to introduce this extra ov flag?
Shouldn't we extend the unit test scope, verifying that the proper input memory address is indeed set to the next node's input?

- Check if it is correctly propagated through a chain of optimized ops (again verifying the address).
- Rebinding external memory (setting a new input memory multiple times).
- Possible implicit memory binding, when the output memory of the previous run is fed into the input of the second run. We can end up in a situation where the input and the output of the same primitive are the same memory address, so the primitive can overwrite its own input. This should be covered by the existing tests, but we need to double-check that those checks validate this new code path.
- Multi-output, when the lazily allocated memory is reused across several consumers.
Thanks for the comment. I addressed those scenarios in memory_test.cpp; please have a look at the latest commit/update.
Force-pushed d6d78be to 51dab1b
build_jenkins
Force-pushed 1141b51 to 8660895
build_jenkins
In typed_primitive_inst, force "allocate mem" to false so that we can avoid allocations of large inputs. Handle cases where the inputs are expected to be present (check for null, or allocate a temp buffer for simplicity).
…y allocated (like loop primitive)
Previously the tests expected that memory would be preallocated for the network's input layouts; now it isn't, so the tests were adjusted accordingly.
…roperly update dependencies and internals, ensure _reset_arguments is called on set_input_data
The primitive_inst::set_output_memory function has short-circuit logic that may do nothing when the pointers are the same, but in some cases the same pointer can be set with a different layout, and in those cases the old state would not be properly updated.
…ed, code style and small refactor for unit tests
…mem, simplify handling of _in_out_shared_mem_types
…ccessing input_memory_ptr in update_output_memory functions
…the push_back so that the new type is recorded even for mem change
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Cover the reviewer's checklist for the lazy input allocation PR:
- Buffer propagation to direct consumers after set_input_data
- Multi-consumer fan-out from a single lazy input
- Chain of optimized ops aliasing through the lazy buffer
- Rebinding external memory across multiple inferences
- Feeding previous output back as next inference input
Force-pushed 8660895 to 7be2342
In input_layout_node, try to skip early mem allocations so that we can avoid memory increase for large inputs. This optimization reduces the total memory peak of the phi silica application from 10 GB down to 6 GB.
Details:
Before this change, allocate_mem was skipped (set to false) only in some cases, for example for dynamic shapes and internal networks. The PR forces it to always be false and also handles cases where the inputs are expected to be present (check for null, or allocate a temp buffer for simplicity).
See the early version of the presentation: https://intel-my.sharepoint.com/:p:/p/bartlomiej_filipek/IQCJ4tTQG0XHQYjMm_FAUlyIAf2FeLPfsthPN6xxJt4TD-I?e=DOG6DL
Tickets:
CVS-178139
AI Assistance:
Perf/mem Comparison:
Using benchmark_app.exe, LunarLake 5 236V, 16GB, iGPU:
PR - binaries compiled with this PR
Master - OpenVINO master, as of 14th April, 2075ff4