
[GPU] Defer allocations of inputs #35126

Open
intbf wants to merge 15 commits into openvinotoolkit:master from intbf:gpu_defer_input_allocations


@intbf
Contributor

@intbf intbf commented Apr 2, 2026

In input_layout_node, try to skip early memory allocations so that we avoid a memory increase for large inputs.

This optimization reduces the total peak memory usage of the Phi Silica application from 10 GB to 6 GB.

Details:

Before this change, allocate_mem was skipped (set to false) only in some cases, for example for dynamic shapes and internal networks. This PR forces it to always be false and also handles the cases where the inputs are expected to be present (by checking for null or, for simplicity, by allocating a temporary buffer).
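The deferred-allocation idea can be sketched as an input slot that owns no memory until data is bound, plus the two fallbacks mentioned above for consumers that expect the input to exist. This is a simplified illustration with hypothetical names (LazyInput, Buffer), not the actual cldnn input_layout code:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device buffer (not cldnn::memory).
struct Buffer {
    explicit Buffer(std::size_t bytes) : data(bytes) {}
    std::vector<unsigned char> data;
};

// Hypothetical input slot: nothing is allocated when the graph is built.
class LazyInput {
public:
    explicit LazyInput(std::size_t expected_bytes) : expected_bytes_(expected_bytes) {}

    // Memory appears only when the user binds data (analogous to set_data).
    void set_data(std::shared_ptr<Buffer> mem) { mem_ = std::move(mem); }

    // Fallback 1: consumers that can cope with a missing input check for null.
    std::shared_ptr<Buffer> memory_or_null() const { return mem_; }

    // Fallback 2: consumers that require memory get a temporary buffer.
    // The temporary is not cached, so it is released once the caller drops it.
    std::shared_ptr<Buffer> memory_or_temp() const {
        if (mem_) return mem_;
        return std::make_shared<Buffer>(expected_bytes_);
    }

    bool is_allocated() const { return mem_ != nullptr; }

private:
    std::size_t expected_bytes_;
    std::shared_ptr<Buffer> mem_;  // stays null until set_data() is called
};
```

Because the slot never allocates on its own, the peak footprint before the first inference only includes buffers the user actually binds; the trade-off is that every reader of the slot must handle the unbound case.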

See the early version of the presentation: https://intel-my.sharepoint.com/:p:/p/bartlomiej_filipek/IQCJ4tTQG0XHQYjMm_FAUlyIAf2FeLPfsthPN6xxJt4TD-I?e=DOG6DL

Tickets:

CVS-178139

AI Assistance:

  • AI assistance used: yes
  • If yes, summarize how AI was used: AI generated most of the code over several iterations. Manually tested and debugged with the Phi Silica script app.

Perf/mem Comparison:

Measured with benchmark_app.exe on Lunar Lake 5 236V, 16 GB RAM, iGPU:

| Model | PR Avg FPS | Master Avg FPS | Δ FPS | PR Compile RAM | Master Compile RAM | Δ RAM |
|---|---|---|---|---|---|---|
| YOLOv3 | ~115.5 | ~115.0 | ~+0.4% | ~256 MB | ~256 MB | ≈0 |
| PSD2 | ~55.3 | ~55.9 | ~-1.1% | ~841 MB | ~825 MB | ~+16 MB (PR) |
| PSD7 | ~5.28 | ~5.27 | ~+0.2% | ~1362 MB | ~1361 MB | ≈0 |
| PSR | ~5.45 | ~5.47 | ~-0.4% | ~5633 MB | ~5633 MB | ≈0 |
| ResNet-50 | ~1160 | ~1158 | ~+0.2% | ~1070 MB | ~1066 MB | ≈0 |

PR: binaries built from this PR
Master: OpenVINO master as of April 14, commit 2075ff4
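The Δ FPS column appears to be the relative difference against master; this reading is inferred from the numbers rather than stated in the PR, but it reproduces the table values (e.g. YOLOv3 and PSD2):

```math
\Delta_{\text{FPS}} = \frac{\text{FPS}_{\text{PR}} - \text{FPS}_{\text{Master}}}{\text{FPS}_{\text{Master}}} \times 100\%,
\qquad
\frac{115.5 - 115.0}{115.0} \times 100\% \approx +0.43\%,
\qquad
\frac{55.3 - 55.9}{55.9} \times 100\% \approx -1.07\%
```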

@github-actions github-actions Bot added the category: GPU OpenVINO GPU plugin label Apr 2, 2026
@Kotomi-Du
Contributor

build_jenkins

@intbf intbf marked this pull request as ready for review April 5, 2026 18:51
@intbf intbf requested review from a team as code owners April 5, 2026 18:51
@intbf intbf force-pushed the gpu_defer_input_allocations branch from 57b2748 to 45351f8 Compare April 5, 2026 18:52
@p-durandin p-durandin added this to the 2026.2 milestone Apr 6, 2026
Comment thread src/plugins/intel_gpu/src/graph/input_layout.cpp Outdated
@intbf intbf changed the title [GPU Plugin] Defer allocations of inputs [GPU] Defer allocations of inputs Apr 6, 2026
Contributor

@Lyamin-Roman Lyamin-Roman left a comment


Have there been any measurements of the impact of this change on performance?
And if there is an impact, maybe add a property like "disable_input_preallocation" that you could use in your application.

@intbf
Contributor Author

intbf commented Apr 6, 2026

> Have there been any measurements of the impact of this change on performance? And if there is an impact, maybe add a property like "disable_input_preallocation", that you will use in your application

I haven't observed any performance drop or change in the phi_silica_app test (with dummy weights). @Kotomi-Du also checked the app with real weights, and both performance and conformance were good.

@Kotomi-Du
Contributor

Kotomi-Du commented Apr 6, 2026

I confirmed that TPS for the 2nd and later tokens is the same as before; however, the PR could affect first-token latency, because memory allocation for all nodes is deferred to the very first inference.
@intbf Why does the feature have to be applied to the inputs of all operators? Could it be narrowed down to Parameters only?

@p-durandin
Contributor

@intbf @Kotomi-Du please take a look at the dynamic test errors:
12:49:26 [2026-04-07 08:49:26,294] [138309791061568] ov_gpu_func_tests-0 INFO: 1886 ms: /home/jenkins/agent/workspace/private-ci/ie/ie-tests-linux-ubuntu22-gpu/b/install/dldt_cpack/tests/ov_gpu_func_tests smoke_DynamicShapeLoop_dynamic_exit/DynamicShapeLoopTest.Inference/static_iter_num=1_static_continue_cond=1_max_iter_num=5_dynamic_exit=3_axis=1_start_value=0_max_iter_num=5_IS=([?.1.2]{(4.1.2)}{(10.1.2)}{(12.1.2)})netType=f32_targetDevice=GPU
12:49:26 [2026-04-07 08:49:26,294] [138309791061568] ov_gpu_func_tests-0 INFO: 1820 ms: /home/jenkins/agent/workspace/private-ci/ie/ie-tests-linux-ubuntu22-gpu/b/install/dldt_cpack/tests/ov_gpu_func_tests smoke_DynamicShapeLoop_dynamic_exit/DynamicShapeLoopTest.Inference/static_iter_num=1_static_continue_cond=1_max_iter_num=5_dynamic_exit=7_axis=1_start_value=0_max_iter_num=5_IS=([?.1.2]{(4.1.2)}{(10.1.2)}{(12.1.2)})netType=i32_targetDevice=GPU
12:49:26 [2026-04-07 08:49:26,294] [138309791061568] ov_gpu_func_tests-0 INFO: 1517 ms: /home/jenkins/agent/workspace/private-ci/ie/ie-tests-linux-ubuntu22-gpu/b/install/dldt_cpack/tests/ov_gpu_func_tests smoke_DynamicShapeLoop_dynamic/DynamicShapeLoopDynamicInputTest.Inference/static_iter_num=1_static_continue_cond=1_static_input_shape=0_max_iter_num=5_dynamic_exit=3_axis=1_start_value=0_max_iter_num=5_IS=([?.1.?]{(4.1.2)}{(10.1.2)}{(12.1.2)})netType=f32_targetDevice=GPU
12:49:26 [2026-04-07 08:49:26,294] [138309791061568] ov_gpu_func_tests-0 INFO: 1669 ms: /home/jenkins/agent/workspace/private-ci/ie/ie-tests-linux-ubuntu22-gpu/b/install/dldt_cpack/tests/ov_gpu_func_tests smoke_DynamicShapeLoop_dynamic_exit/DynamicShapeLoopTest.Inference/static_iter_num=1_static_continue_cond=1_max_iter_num=1_dynamic_exit=5_axis=1_start_value=0_max_iter_num=1_IS=([?.1.2]{(4.1.2)}{(10.1.2)}{(12.1.2)})netType=i32_targetDevice=GPU
12:49:26 [2026-04-07 08:49:26,294] [138309791061568] ov_gpu_func_tests-0 INFO: 1077 ms: /home/jenkins/agent/workspace/private-ci/ie/ie-tests-linux-ubuntu22-gpu/b/install/dldt_cpack/tests/ov_gpu_func_tests smoke_DynamicShapeLoop_dynamic/DynamicShapeLoopDynamicInputTest.Inference/static_iter_num=1_static_continue_cond=1_static_input_shape=0_max_iter_num=1_dynamic_exit=5_axis=1_start_value=0_max_iter_num=1_IS=([?.1.?]{(4.1.2)}{(10.1.2)}{(12.1.2)})netType=f32_targetDevice=GPU
12:49:26 [2026-04-07 08:49:26,294] [138309791061568] ov_gpu_func_tests-0 INFO: 2029 ms: /home/jenkins/agent/workspace/private-ci/ie/ie-tests-linux-ubuntu22-gpu/b/install/dldt_cpack/tests/ov_gpu_func_tests smoke_DynamicShapeLoop_dynamic_exit/DynamicShapeLoopTest.Inference/static_iter_num=1_static_continue_cond=1_max_iter_num=5_dynamic_exit=3_axis=1_start_value=0_max_iter_num=5_IS=([?.1.2]{(4.1.2)}{(10.1.2)}{(12.1.2)})netType=i32_targetDevice=GPU
12:49:26 [2026-04-07 08:49:26,294] [138309791061568] ov_gpu_func_tests-0 INFO: 2062 ms: /home/jenkins/agent/workspace/private-ci/ie/ie-tests-linux-ubuntu22-gpu/b/install/dldt_cpack/tests/ov_gpu_func_tests smoke_DynamicShapeLoop_dynamic_exit/DynamicShapeLoopTest.Inference/static_iter_num=1_static_continue_cond=1_max_iter_num=5_dynamic_exit=7_axis=1_start_value=0_max_iter_num=5_IS=([?.1.2]{(4.1.2)}{(10.1.2)}{(12.1.2)})netType=f32_targetDevice=GPU
12:49:26 [2026-04-07 08:49:26,294] [138309791061568] ov_gpu_func_tests-0 INFO: 2070 ms: /home/jenkins/agent/workspace/private-ci/ie/ie-tests-linux-ubuntu22-gpu/b/install/dldt_cpack/tests/ov_gpu_func_tests smoke_DynamicShapeLoop_dynamic_exit/DynamicShapeLoopTest.Inference/static_iter_num=1_static_continue_cond=1_max_iter_num=1_dynamic_exit=5_axis=1_start_value=0_max_iter_num=1_IS=([?.1.2]{(4.1.2)}{(10.1.2)}{(12.1.2)})netType=f32_targetDevice=GPU
12:49:26 [2026-04-07 08:49:26,294] [138309791061568] ov_gpu_func_tests-0 INFO: 1575 ms: /home/jenkins/agent/workspace/private-ci/ie/ie-tests-linux-ubuntu22-gpu/b/install/dldt_cpack/tests/ov_gpu_func_tests smoke_DynamicShapeLoop_dynamic/DynamicShapeLoopDynamicInputTest.Inference/static_iter_num=1_static_continue_cond=1_static_input_shape=0_max_iter_num=5_dynamic_exit=3_axis=1_start_value=0_max_iter_num=5_IS=([?.1.?]{(4.1.2)}{(10.1.2)}{(12.1.2)})netType=i32_targetDevice=GPU
12:49:26 [2026-04-07 08:49:26,294] [138309791061568] ov_gpu_func_tests-0 INFO: 1417 ms: /home/jenkins/agent/workspace/private-ci/ie/ie-tests-linux-ubuntu22-gpu/b/install/dldt_cpack/tests/ov_gpu_func_tests smoke_DynamicShapeLoop_dynamic/DynamicShapeLoopDynamicInputTest.Inference/static_iter_num=1_static_continue_cond=1_static_input_shape=0_max_iter_num=1_dynamic_exit=5_axis=1_start_value=0_max_iter_num=1_IS=([?.1.?]{(4.1.2)}{(10.1.2)}{(12.1.2)})netType=i32_targetDevice=GPU
12:49:26 [2026-04-07 08:49:26,294] [138309791061568] ov_gpu_func_tests-0 INFO: 1374 ms: /home/jenkins/agent/workspace/private-ci/ie/ie-tests-linux-ubuntu22-gpu/b/install/dldt_cpack/tests/ov_gpu_func_tests smoke_DynamicShapeLoop_conflict_dynamic/DynamicShapeLoopDynamicInputTest.Inference/static_iter_num=1_static_continue_cond=1_static_input_shape=1_max_iter_num=5_dynamic_exit=3_axis=1_start_value=0_max_iter_num=5_IS=([?.1.?]{(4.1.2)}{(10.1.2)}{(12.1.2)})netType=i32_targetDevice=GPU
12:49:26 [2026-04-07 08:49:26,294] [138309791061568] ov_gpu_func_tests-0 INFO: 1575 ms: /home/jenkins/agent/workspace/private-ci/ie/ie-tests-linux-ubuntu22-gpu/b/install/dldt_cpack/tests/ov_gpu_func_tests smoke_DynamicShapeLoop_conflict_dynamic/DynamicShapeLoopDynamicInputTest.Inference/static_iter_num=1_static_continue_cond=1_static_input_shape=1_max_iter_num=5_dynamic_exit=3_axis=1_start_value=0_max_iter_num=5_IS=([?.1.?]{(4.1.2)}{(10.1.2)}{(12.1.2)})netType=f32_targetDevice=GPU
12:49:26 [2026-04-07 08:49:26,294] [138309791061568] ov_gpu_func_tests-0 INFO: 1529 ms: /home/jenkins/agent/workspace/private-ci/ie/ie-tests-linux-ubuntu22-gpu/b/install/dldt_cpack/tests/ov_gpu_func_tests smoke_DynamicShapeLoop_conflict_dynamic/DynamicShapeLoopDynamicInputTest.Inference/static_iter_num=1_static_continue_cond=1_static_input_shape=1_max_iter_num=1_dynamic_exit=5_axis=1_start_value=0_max_iter_num=1_IS=([?.1.?]{(4.1.2)}{(10.1.2)}{(12.1.2)})netType=i32_targetDevice=GPU
12:49:26 [2026-04-07 08:49:26,294] [138309791061568] ov_gpu_func_tests-0 INFO: 1555 ms: /home/jenkins/agent/workspace/private-ci/ie/ie-tests-linux-ubuntu22-gpu/b/install/dldt_cpack/tests/ov_gpu_func_tests smoke_DynamicShapeLoop_conflict_dynamic/DynamicShapeLoopDynamicInputTest.Inference/static_iter_num=1_static_continue_cond=1_static_input_shape=1_max_iter_num=1_dynamic_exit=5_axis=1_start_value=0_max_iter_num=1_IS=([?.1.?]{(4.1.2)}{(10.1.2)}{(12.1.2)})netType=f32_targetDevice=GPU

@intbf intbf force-pushed the gpu_defer_input_allocations branch from e838f77 to f73832f Compare April 7, 2026 13:20
@p-durandin
Contributor

SIGABRT for 14 tests...

@intbf
Contributor Author

intbf commented Apr 7, 2026

> SIGABRT for 14 tests...

Yes. The main issue is that memory is now allocated/updated lazily in set_data, but some tests don't call it. In those cases I will update the tests to allocate the memory so that no access violation (AV) occurs.
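A complementary way to turn such crashes into readable failures is to validate bindings before executing. The sketch below uses a hypothetical TinyNetwork with string-keyed inputs; it is not the cldnn::network API, and the comment above describes fixing the tests themselves rather than adding a guard like this:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Buffer = std::vector<float>;

// Hypothetical network with lazily bound inputs (not the cldnn::network API).
class TinyNetwork {
public:
    // Registering an input allocates nothing; the slot stays empty.
    void add_input(const std::string& id) { inputs_[id] = nullptr; }

    // Binding is the only place an input receives a buffer (cf. set_data).
    void set_input_data(const std::string& id, std::shared_ptr<Buffer> mem) {
        auto it = inputs_.find(id);
        if (it == inputs_.end()) throw std::invalid_argument("unknown input: " + id);
        it->second = std::move(mem);
    }

    // Report the unbound input by name instead of dereferencing a null pointer.
    void execute() const {
        for (const auto& kv : inputs_) {
            if (!kv.second)
                throw std::runtime_error("input '" + kv.first + "' has no memory bound");
        }
        // Kernels would run here.
    }

private:
    std::map<std::string, std::shared_ptr<Buffer>> inputs_;
};
```

In a test that previously relied on eager allocation, the fix is to call set_input_data() for every input before execute().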

@intbf
Contributor Author

intbf commented Apr 8, 2026

> I confirmed the TPS for 2nd+ token is same as before; the PR should potentially impact first token latency because memory allocation for all nodes are deferred to the very first inference. @intbf Why the feature has to be applied for the input of all operators? Could it be narrowed down only for parameters.

Correction: this change already applies to Parameters, since during compilation each ov::op::v0::Parameter is converted into a cldnn::input_layout primitive.

std::vector<event::ptr> set_output_memory(const primitive_id& id, memory::ptr mem, bool is_remote = false);

std::vector<std::shared_ptr<primitive_inst>> const& get_outputs() { return _outputs; }

Contributor


(random spot)

  • Based on the description, I think a performance impact might show up in the warm-up phase of static-shape models. Could you check that, especially for large-input models such as detection models?
  • How does this interact with memory pool reuse? Does it affect memory usage, or was the input_layout memory simply never taken from the memory pool, so there is no memory usage impact?
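One way to check the warm-up concern is to time the first inference separately from the steady state, since deferred allocation moves work into the first call. A minimal harness sketch; measure_warmup and WarmupStats are hypothetical names, and `infer` would wrap whatever runs one inference (for example an ov::InferRequest::infer() call):

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

struct WarmupStats {
    double first_ms = 0.0;          // cold run, includes any deferred allocation
    double steady_median_ms = 0.0;  // median of the following runs (upper median if even)
};

// Times `infer` once as the cold run, then `iters` more times as the steady state.
WarmupStats measure_warmup(const std::function<void()>& infer, std::size_t iters) {
    assert(iters > 0 && "need at least one steady-state run");
    using clock = std::chrono::steady_clock;
    auto time_once = [&infer] {
        const auto t0 = clock::now();
        infer();
        return std::chrono::duration<double, std::milli>(clock::now() - t0).count();
    };

    WarmupStats s;
    s.first_ms = time_once();

    std::vector<double> rest;
    rest.reserve(iters);
    for (std::size_t i = 0; i < iters; ++i) rest.push_back(time_once());

    // nth_element places the median at `mid` without fully sorting.
    auto mid = rest.begin() + static_cast<std::ptrdiff_t>(rest.size() / 2);
    std::nth_element(rest.begin(), mid, rest.end());
    s.steady_median_ms = *mid;
    return s;
}
```

Comparing first_ms between the PR build and master, on the same model and device, would isolate the first-inference cost from the steady-state FPS numbers in the description.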

Contributor Author


I updated the description with some benchmark runs (PSD2, PSD7, PSR, YOLO, ...). Is that sufficient, or should I add more tests?

As for the second question: I'll try to prepare a unit test that shows whether the memory is reused (following our discussion on chat).

@intbf intbf force-pushed the gpu_defer_input_allocations branch from ab418f8 to 5382f0b Compare April 8, 2026 14:08
@maxnick maxnick self-assigned this Apr 9, 2026
@maxnick
Contributor

maxnick commented Apr 9, 2026

Several tests are still failing on DG2 GPU:

[2026-04-08T15:40:49.481Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [7010/24717] quantize_smoke/quantize_random_test.random/22 (596 ms)
[2026-04-08T15:40:49.481Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO: ERROR: ld.so: object 'libSegFault.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
[2026-04-08T15:40:49.481Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO: Running main() from src/plugins/intel_gpu/tests/unit/gtest_main_gpu.cpp
[2026-04-08T15:40:49.481Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO: WARNING: cl_cache_dir is not set. Test will take longer than expected
[2026-04-08T15:40:49.481Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO: Note: Google Test filter = quantize_smoke/quantize_random_test.random/22
[2026-04-08T15:40:49.481Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [==========] Running 1 test from 1 test suite.
[2026-04-08T15:40:49.481Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [----------] Global test environment set-up.
[2026-04-08T15:40:49.481Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [ PG INFO ] PostgreSQL Reporting is disabled due to missing environment settings
[2026-04-08T15:40:49.481Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [----------] 1 test from quantize_smoke/quantize_random_test
[2026-04-08T15:40:49.482Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [ RUN ] quantize_smoke/quantize_random_test.random/22
[2026-04-08T15:40:49.482Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO: src/plugins/intel_gpu/tests/unit/test_cases/quantize_gpu_test.cpp:1054: Failure
[2026-04-08T15:40:49.482Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO: The difference between opt_out_val and ref_out_val is 255, which exceeds 1, where
[2026-04-08T15:40:49.482Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO: opt_out_val evaluates to 0,
[2026-04-08T15:40:49.482Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO: ref_out_val evaluates to 255, and
[2026-04-08T15:40:49.482Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO: 1 evaluates to 1.
[2026-04-08T15:40:49.482Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO: index = 1536
[2026-04-08T15:40:49.483Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [ FAILED ] quantize_smoke/quantize_random_test.random/22, where GetParam() = 176-byte object <03-00 00-00 10-00 00-00 38-19 CF-7A AB-61 00-00 09-00 00-00 00-00 00-00 38-19 CF-7A AB-61 00-00 01-00 00-00 00-00 00-00 40-19 CF-7A AB-61 00-00 01-00 00-00 00-00 00-00 48-19 CF-7A AB-61 00-00 ... 0A-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 1A-00 00-00 1A-00 00-00 05-00 00-00 00-00 00-00> (44 ms)
[2026-04-08T15:40:49.483Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [----------] 1 test from quantize_smoke/quantize_random_test (44 ms total)
[2026-04-08T15:40:49.484Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO:
[2026-04-08T15:40:49.484Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [----------] Global test environment tear-down
[2026-04-08T15:40:49.484Z] [2026-04-08 15:40:49,080] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [==========] 1 test from 1 test suite ran. (46 ms total)
[2026-04-08T15:40:49.484Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [ PASSED ] 0 tests.
[2026-04-08T15:40:49.484Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [ FAILED ] 1 test, listed below:
[2026-04-08T15:40:49.484Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [ FAILED ] quantize_smoke/quantize_random_test.random/22, where GetParam() = 176-byte object <03-00 00-00 10-00 00-00 38-19 CF-7A AB-61 00-00 09-00 00-00 00-00 00-00 38-19 CF-7A AB-61 00-00 01-00 00-00 00-00 00-00 40-19 CF-7A AB-61 00-00 01-00 00-00 00-00 00-00 48-19 CF-7A AB-61 00-00 ... 0A-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 1A-00 00-00 1A-00 00-00 05-00 00-00 00-00 00-00>
[2026-04-08T15:40:49.484Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO:
[2026-04-08T15:40:49.484Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: 1 FAILED TEST
[2026-04-08T15:40:49.485Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [7010/24717] quantize_smoke/quantize_random_test.random/22 returned/aborted with exit code 1 (596 ms)
[2026-04-08T15:40:49.485Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [7011/24717] quantize_smoke/quantize_random_test.random/16 (1254 ms)
[2026-04-08T15:40:49.485Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [7012/24717] quantize_smoke/quantize_random_test.random/24 (557 ms)
[2026-04-08T15:40:49.485Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: ERROR: ld.so: object 'libSegFault.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
[2026-04-08T15:40:49.485Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: Running main() from src/plugins/intel_gpu/tests/unit/gtest_main_gpu.cpp
[2026-04-08T15:40:49.485Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: WARNING: cl_cache_dir is not set. Test will take longer than expected
[2026-04-08T15:40:49.485Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: Note: Google Test filter = quantize_smoke/quantize_random_test.random/24
[2026-04-08T15:40:49.485Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [==========] Running 1 test from 1 test suite.
[2026-04-08T15:40:49.485Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [----------] Global test environment set-up.
[2026-04-08T15:40:49.486Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [ PG INFO ] PostgreSQL Reporting is disabled due to missing environment settings
[2026-04-08T15:40:49.486Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [----------] 1 test from quantize_smoke/quantize_random_test
[2026-04-08T15:40:49.486Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [ RUN ] quantize_smoke/quantize_random_test.random/24
[2026-04-08T15:40:49.486Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: src/plugins/intel_gpu/tests/unit/test_cases/quantize_gpu_test.cpp:1054: Failure
[2026-04-08T15:40:49.486Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: The difference between opt_out_val and ref_out_val is 255, which exceeds 1, where
[2026-04-08T15:40:49.486Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: opt_out_val evaluates to 0,
[2026-04-08T15:40:49.486Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: ref_out_val evaluates to 255, and
[2026-04-08T15:40:49.486Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: 1 evaluates to 1.
[2026-04-08T15:40:49.486Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: index = 0
[2026-04-08T15:40:49.486Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [ FAILED ] quantize_smoke/quantize_random_test.random/24, where GetParam() = 176-byte object <03-00 00-00 10-00 00-00 38-69 72-1C 9F-55 00-00 09-00 00-00 00-00 00-00 38-69 72-1C 9F-55 00-00 01-00 00-00 00-00 00-00 40-69 72-1C 9F-55 00-00 01-00 00-00 00-00 00-00 48-69 72-1C 9F-55 00-00 ... 0A-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 1A-00 00-00 1A-00 00-00 05-00 00-00 00-00 00-00> (38 ms)
[2026-04-08T15:40:49.487Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [----------] 1 test from quantize_smoke/quantize_random_test (38 ms total)
[2026-04-08T15:40:49.487Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO:
[2026-04-08T15:40:49.487Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [----------] Global test environment tear-down
[2026-04-08T15:40:49.487Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [==========] 1 test from 1 test suite ran. (40 ms total)
[2026-04-08T15:40:49.487Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [ PASSED ] 0 tests.
[2026-04-08T15:40:49.487Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [ FAILED ] 1 test, listed below:
[2026-04-08T15:40:49.487Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [ FAILED ] quantize_smoke/quantize_random_test.random/24, where GetParam() = 176-byte object <03-00 00-00 10-00 00-00 38-69 72-1C 9F-55 00-00 09-00 00-00 00-00 00-00 38-69 72-1C 9F-55 00-00 01-00 00-00 00-00 00-00 40-69 72-1C 9F-55 00-00 01-00 00-00 00-00 00-00 48-69 72-1C 9F-55 00-00 ... 0A-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 1A-00 00-00 1A-00 00-00 05-00 00-00 00-00 00-00>
[2026-04-08T15:40:49.488Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO:
[2026-04-08T15:40:49.488Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: 1 FAILED TEST
[2026-04-08T15:40:49.488Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [7012/24717] quantize_smoke/quantize_random_test.random/24 returned/aborted with exit code 1 (557 ms)
[2026-04-08T15:40:49.488Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [7013/24717] quantize_smoke/quantize_random_test.random/25 (663 ms)
[2026-04-08T15:40:49.488Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [7014/24717] quantize_smoke/quantize_random_test.random/27 (599 ms)
[2026-04-08T15:40:49.488Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: ERROR: ld.so: object 'libSegFault.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
[2026-04-08T15:40:49.488Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: Running main() from src/plugins/intel_gpu/tests/unit/gtest_main_gpu.cpp
[2026-04-08T15:40:49.488Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: WARNING: cl_cache_dir is not set. Test will take longer than expected
[2026-04-08T15:40:49.488Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: Note: Google Test filter = quantize_smoke/quantize_random_test.random/27
[2026-04-08T15:40:49.489Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [==========] Running 1 test from 1 test suite.
[2026-04-08T15:40:49.489Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [----------] Global test environment set-up.
[2026-04-08T15:40:49.489Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [ PG INFO ] PostgreSQL Reporting is disabled due to missing environment settings
[2026-04-08T15:40:49.490Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [----------] 1 test from quantize_smoke/quantize_random_test
[2026-04-08T15:40:49.490Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [ RUN ] quantize_smoke/quantize_random_test.random/27
[2026-04-08T15:40:49.490Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: src/plugins/intel_gpu/tests/unit/test_cases/quantize_gpu_test.cpp:1054: Failure
[2026-04-08T15:40:49.490Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: The difference between opt_out_val and ref_out_val is 45, which exceeds 1, where
[2026-04-08T15:40:49.490Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: opt_out_val evaluates to 204,
[2026-04-08T15:40:49.490Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: ref_out_val evaluates to 249, and
[2026-04-08T15:40:49.490Z] [2026-04-08 15:40:49,081] [124175102043712] cldnn_unit_tests_dg2-0 INFO: 1 evaluates to 1.
[2026-04-08T15:40:49.490Z] [2026-04-08 15:40:49,082] [124175102043712] cldnn_unit_tests_dg2-0 INFO: index = 3584
[2026-04-08T15:40:49.490Z] [2026-04-08 15:40:49,082] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [ FAILED ] quantize_smoke/quantize_random_test.random/27, where GetParam() = 176-byte object <03-00 00-00 10-00 00-00 38-49 50-70 F3-5C 00-00 09-00 00-00 00-00 00-00 38-49 50-70 F3-5C 00-00 01-00 00-00 00-00 00-00 40-49 50-70 F3-5C 00-00 01-00 00-00 00-00 00-00 48-49 50-70 F3-5C 00-00 ... 05-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 1A-00 00-00 1A-00 00-00 05-00 00-00 00-00 00-00> (36 ms)
[2026-04-08T15:40:49.491Z] [2026-04-08 15:40:49,082] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [----------] 1 test from quantize_smoke/quantize_random_test (36 ms total)
[2026-04-08T15:40:49.491Z] [2026-04-08 15:40:49,082] [124175102043712] cldnn_unit_tests_dg2-0 INFO:
[2026-04-08T15:40:49.491Z] [2026-04-08 15:40:49,082] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [----------] Global test environment tear-down
[2026-04-08T15:40:49.491Z] [2026-04-08 15:40:49,082] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [==========] 1 test from 1 test suite ran. (37 ms total)
[2026-04-08T15:40:49.491Z] [2026-04-08 15:40:49,082] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [ PASSED ] 0 tests.
[2026-04-08T15:40:49.491Z] [2026-04-08 15:40:49,082] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [ FAILED ] 1 test, listed below:
[2026-04-08T15:40:49.491Z] [2026-04-08 15:40:49,082] [124175102043712] cldnn_unit_tests_dg2-0 INFO: [ FAILED ] quantize_smoke/quantize_random_test.random/27, where GetParam() = 176-byte object <03-00 00-00 10-00 00-00 38-49 50-70 F3-5C 00-00 09-00 00-00 00-00 00-00 38-49 50-70 F3-5C 00-00 01-00 00-00 00-00 00-00 40-49 50-70 F3-5C 00-00 01-00 00-00 00-00 00-00 48-49 50-70 F3-5C 00-00 ... 05-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 01-00 00-00 00-00 00-00 1A-00 00-00 1A-00 00-00 05-00 00-00 00-00 00-00>
[2026-04-08T15:40:49.493Z] [2026-04-08 15:40:49,082] [124175102043712] cldnn_unit_tests_dg2-0 INFO:
[2026-04-08T15:40:49.493Z] [2026-04-08 15:40:49,082] [124175102043712] cldnn_unit_tests_dg2-0 INFO: 1 FAILED TEST

@intbf intbf force-pushed the gpu_defer_input_allocations branch from 43b6fc1 to d209e8b Compare April 9, 2026 21:19
@intbf
Contributor Author

intbf commented Apr 10, 2026

More tests are failing:

smoke/DynamicShapeStatefulModelDefault.smoke_Run_Stateful_Dynamic_Default/0
smoke_Transpose_7D_Infer_Twice/Transpose7DInferTwiceTest.infer_twice_diff_shapes_same_request/0
KVCacheTests.smoke_multipleIterations_stateful_gather_with_initializer_batch_1_5
KVCacheTests.smoke_multipleIterations_stateful_gather_with_initializer
KVCacheTests.smoke_multipleIterations_stateful_with_set_state
KVCacheTests.smoke_multipleIterations_stateful_no_gather_no_initializer_cached
smoke/SDPAWithKVCacheTest.MultipleIterationStateful/with_rearrange=1_batch=1_et=f16_num_iter=10_num_groups=4_initial_batch=1_qkv_order=(0.2.1.3)_mask=1_scale=0_causal=1_compressed=0k_head=128v_head=64
smoke/SDPAWithKVCacheTest.MultipleIterationStateful/with_rearrange=1_batch=2_et=f16_num_iter=10_num_groups=4_initial_batch=1_qkv_order=(0.1.2.3)_mask=0_scale=0_causal=1_compressed=0k_head=96v_head=64

I'll work on them today.

@intbf intbf force-pushed the gpu_defer_input_allocations branch from 481e216 to 1ef0611 Compare April 10, 2026 14:02
Comment thread src/plugins/intel_gpu/src/graph/broadcast.cpp
Comment thread src/plugins/intel_gpu/tests/unit/test_cases/memory_test.cpp Outdated
Comment thread src/plugins/intel_gpu/tests/unit/test_cases/memory_test.cpp Outdated
Comment thread src/plugins/intel_gpu/src/graph/network.cpp Outdated
auto& eng = get_engine();

if (p_inst->output_memory_ptr())
    _in_out_shared_mem_types.push_back(p_inst->output_memory_ptr()->get_internal_params().mem_type);
Contributor


Why does it need to store the mem_type information in this object?

Contributor

@Kotomi-Du Kotomi-Du Apr 11, 2026


_in_out_shared_mem_types is the cached vector of shared_mem_type enum values for all inputs/outputs. It is used in execute() in network.cpp to check whether GPU surface locking is required before execution. The data is stored in allocate_primitive_instance() in network.cpp, which must be called before execution. Why does deferred allocation have an impact on this?

Contributor Author


You're right, those two locations were wrong. The original motivation was that lazy input_layout nodes have a null output_memory_ptr() at allocate_primitive_instance() time, so the normal push_back there is skipped. If the user later provides a shared surface (VA/DX11) via set_input_data(), _in_out_shared_mem_types would never record it.

So I moved it to set_input_data().
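The idea of recording the memory type at bind time rather than at instance-allocation time can be sketched as follows. MemKind and SurfaceTracker are hypothetical simplifications (the plugin's real shared_mem_type enum and _in_out_shared_mem_types vector differ, and also cover outputs); the sketch assumes VA surfaces and DX buffers are the kinds that need locking:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical, simplified memory kinds (not the plugin's real enum).
enum class MemKind { plain_buffer, va_surface, dx_buffer };

class SurfaceTracker {
public:
    // Called when the user binds input memory (the role of set_input_data()).
    // Keyed by input index, so rebinding an input replaces its entry.
    void on_input_bound(std::size_t input_idx, MemKind kind) { kinds_[input_idx] = kind; }

    // Checked before execution: lock only if a bound input is a shared surface.
    bool needs_surface_lock() const {
        return std::any_of(kinds_.begin(), kinds_.end(),
                           [](const std::pair<const std::size_t, MemKind>& kv) {
                               return kv.second == MemKind::va_surface ||
                                      kv.second == MemKind::dx_buffer;
                           });
    }

private:
    std::map<std::size_t, MemKind> kinds_;
};
```

A design note on the choice of container: a plain push_back at bind time would append a new entry on every rebind of the same input, whereas keying by input index keeps exactly one entry per input and lets a rebind from a surface to a plain buffer clear the locking requirement.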

@Kotomi-Du
Contributor

build_jenkins

for (auto const& input : _inputs) ret.push_back(input->output_memory_ptr()->get_layout());
for (auto const& input : _inputs) {
    if (input->output_memory_ptr())
        _in_out_shared_mem_types.push_back(input->output_memory_ptr()->get_internal_params().mem_type);
Contributor


same question as above.

@intbf intbf force-pushed the gpu_defer_input_allocations branch 2 times, most recently from 84c5fce to 7a6c73b Compare April 17, 2026 13:27
@p-durandin
Contributor

build_jenkins

@intbf
Contributor Author

intbf commented Apr 20, 2026

> Have there been any measurements of the impact of this change on performance? And if there is an impact, maybe add a property like "disable_input_preallocation" that you would use in your application.

I added benchmark results to the PR description; is that sufficient? Since the performance impact is small (average FPS stays within about ±1% of master for every tested model), maybe there's no need to introduce this extra OV property?
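To make "not much perf impact" concrete, the relative FPS change per model can be recomputed from the benchmark table in the PR description as (PR FPS - master FPS) / master FPS × 100. The helper below only illustrates that arithmetic (fps_delta_percent, bench_row and max_abs_delta are hypothetical names); the table values are themselves rounded ("~"), so the results are approximate.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Relative FPS change of the PR build vs. master, in percent.
double fps_delta_percent(double pr_fps, double master_fps) {
    assert(master_fps > 0.0);
    return (pr_fps - master_fps) / master_fps * 100.0;
}

struct bench_row {
    std::string model;
    double pr_fps;
    double master_fps;
};

// Largest absolute FPS change (in percent) across all benchmark rows.
double max_abs_delta(const std::vector<bench_row>& rows) {
    double worst = 0.0;
    for (const auto& r : rows)
        worst = std::fmax(worst, std::fabs(fps_delta_percent(r.pr_fps, r.master_fps)));
    return worst;
}
```

With the five rows from the table, the largest absolute change is PSD2 at roughly -1.1%; every other model stays within about ±0.5%.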

maxnick requested a review from Kotomi-Du on April 21, 2026 11:06
Comment thread: src/plugins/intel_gpu/src/graph/dynamic_quantize.cpp (outdated)
Comment thread: src/plugins/intel_gpu/src/graph/strided_slice.cpp (outdated)
Contributor

maxnick commented Apr 21, 2026

Shouldn't we extend the unit test scope to verify that the proper input memory address is indeed set on the next node's input? For example:
- Check that it is correctly propagated through a chain of optimized ops (again verifying the address).
- Rebinding external memory (setting a new input memory multiple times).
- Possible implicit memory binding when the output memory of the previous run is fed into the input of the second run. We can then end up in a situation where the input and the output of the same primitive share the same memory address, so the primitive can overwrite its own input. This should already be covered by the existing tests, but we need to double-check that those checks exercise this new code path.
- Multi-output: the lazily allocated memory is reused across several consumers.
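The checklist above boils down to address-identity assertions. The sketch below expresses it on a toy graph in which optimized-out nodes alias their input and regular nodes own their output; node, buffer, propagate, set_input_data and ensure_no_self_overwrite are hypothetical stand-ins, not the plugin's types, and the real tests for this PR use the actual plugin classes.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical toy model, not the GPU plugin's types.
struct buffer {
    std::vector<float> data;
};

struct node {
    std::vector<node*> users;            // downstream consumers
    bool optimized_out = false;          // in-place op (e.g. a reshape) that aliases its input
    std::shared_ptr<buffer> input_mem;   // buffer this node reads
    std::shared_ptr<buffer> output_mem;  // buffer this node writes
};

// Bind mem as n's output and push it to consumers; optimized-out
// consumers alias it and forward it further down the chain.
void propagate(node& n, const std::shared_ptr<buffer>& mem) {
    n.output_mem = mem;
    for (node* u : n.users) {
        u->input_mem = mem;
        if (u->optimized_out)
            propagate(*u, mem);
    }
}

// Lazy input: no buffer exists until the user binds one (taken by value to avoid aliasing surprises).
void set_input_data(node& input, std::shared_ptr<buffer> mem) {
    propagate(input, mem);
}

// A regular (non-optimized) node must not write into the buffer it reads.
void ensure_no_self_overwrite(node& n) {
    if (!n.optimized_out && n.input_mem && n.input_mem == n.output_mem)
        n.output_mem = std::make_shared<buffer>(buffer{std::vector<float>(n.input_mem->data.size())});
}

void run_checklist() {
    node input, reshape, conv_a, conv_b;
    reshape.optimized_out = true;
    input.users = {&reshape, &conv_a};  // fan-out: two consumers of one lazy input
    reshape.users = {&conv_b};          // chain through an optimized op
    conv_a.output_mem = std::make_shared<buffer>();
    conv_b.output_mem = std::make_shared<buffer>();

    // Direct consumers, fan-out and the optimized chain all see the bound address.
    auto first = std::make_shared<buffer>(buffer{{1.f, 2.f}});
    set_input_data(input, first);
    assert(conv_a.input_mem == first);
    assert(reshape.output_mem == first);
    assert(conv_b.input_mem == first);

    // Rebinding replaces the address everywhere.
    auto second = std::make_shared<buffer>(buffer{{3.f}});
    set_input_data(input, second);
    assert(conv_a.input_mem == second && conv_b.input_mem == second);

    // Feeding the previous output back in must not let conv_a overwrite its own input.
    set_input_data(input, conv_a.output_mem);
    ensure_no_self_overwrite(conv_a);
    assert(conv_a.input_mem != conv_a.output_mem);
}
```

Comparing shared_ptr identity rather than contents is the point: each check asserts that the consumer reads exactly the buffer that was bound, which is what the reviewer asks to verify.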

Contributor Author

Thanks for the comment. I covered those scenarios in memory_test.cpp; please have a look at the latest commit.

intbf force-pushed the gpu_defer_input_allocations branch 2 times, most recently from d6d78be to 51dab1b on April 29, 2026 19:31
@Kotomi-Du
Contributor

build_jenkins

intbf force-pushed the gpu_defer_input_allocations branch from 1141b51 to 8660895 on May 1, 2026 18:37
@Kotomi-Du
Contributor

build_jenkins

intbf and others added 15 commits May 4, 2026 11:30
In typed_primitive_inst, force "allocate mem" to false so that allocations of large inputs can be avoided. Handle cases where the inputs are expected to be present (check for null, or allocate a temporary buffer for simplicity).
Previously the tests expected memory to be preallocated for the network's input layouts; now it is not, so the tests were adjusted accordingly.
…roperly update dependencies and internals, ensure _reset_arguments is called on set_input_data
The primitive_inst::set_output_memory function has short-circuit logic that returns early without doing anything when the pointers are the same. However, in some cases the same pointer can be set with a different layout, and in those cases the old state would not be updated properly.
…ed, code style and small refactor for unit tests
…mem, simplify handling of _in_out_shared_mem_types
…ccessing input_memory_ptr in update_output_memory functions
…the push_back so that the new type is recorded even for mem change
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Cover the reviewer's checklist for the lazy input allocation PR:
- Buffer propagation to direct consumers after set_input_data
- Multi-consumer fan-out from a single lazy input
- Chain of optimized ops aliasing through the lazy buffer
- Rebinding external memory across multiple inferences
- Feeding previous output back as next inference input
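Two of the commits above describe small mechanisms: falling back to a temporary buffer when a lazily allocated input is still missing, and not short-circuiting set_output_memory when the same pointer arrives with a different layout. The sketch below illustrates both under stated assumptions; primitive_inst_sketch, layout, memory, output_or_temp and state_updates are hypothetical stand-ins, not the real primitive_inst interface.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real layout/memory/primitive_inst types are far richer.
struct layout {
    std::vector<std::size_t> shape;
    bool operator==(const layout& o) const { return shape == o.shape; }
};

struct memory {
    std::vector<float> data;
};

class primitive_inst_sketch {
public:
    explicit primitive_inst_sketch(layout l) : _layout(std::move(l)) {}

    // Returns true when the cached state was refreshed.
    bool set_output_memory(std::shared_ptr<memory> mem, const layout& l) {
        assert(mem);
        // Skip the update only if BOTH the pointer and the layout are unchanged:
        // the same buffer re-bound with a new layout must still refresh the state.
        if (mem == _output && l == _layout)
            return false;
        _output = std::move(mem);
        _layout = l;
        ++_state_updates;  // stands in for resetting kernel arguments, shape info, etc.
        return true;
    }

    // Fallback for a lazily allocated input that is still unbound: allocate a
    // temporary buffer sized from the current layout instead of dereferencing null.
    std::shared_ptr<memory> output_or_temp() {
        if (!_output) {
            std::size_t count = 1;
            for (std::size_t d : _layout.shape)
                count *= d;
            _output = std::make_shared<memory>(memory{std::vector<float>(count)});
        }
        return _output;
    }

    int state_updates() const { return _state_updates; }

private:
    std::shared_ptr<memory> _output;
    layout _layout;
    int _state_updates = 0;
};
```

Comparing the layout as well as the pointer keeps the cheap early return for the common case while avoiding stale state when only the shape changes.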
intbf force-pushed the gpu_defer_input_allocations branch from 8660895 to 7be2342 on May 4, 2026 18:30
Labels

category: GPU OpenVINO GPU plugin

7 participants