Skip to content

fixes for events in rf hist_random#3667

Open
Alexandr-Solovev wants to merge 2 commits into
uxlfoundation:mainfrom
Alexandr-Solovev:dev/asolovev_hist_batch_random_fix
Open

fixes for events in rf hist_random#3667
Alexandr-Solovev wants to merge 2 commits into
uxlfoundation:mainfrom
Alexandr-Solovev:dev/asolovev_hist_batch_random_fix

Conversation

@Alexandr-Solovev

@Alexandr-Solovev Alexandr-Solovev commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes a host/device memory ordering hazard in the GPU histogram-based Random Forest training kernel. The two helpers that produce per-node random data for split selection (gen_feature_list and gen_random_thresholds) were filling shared-USM buffers on the host and returning an empty sycl::event, then the caller blocked the host on event.wait_and_throw() before launching the consumer kernel. With shared USM, host writes that race a device kernel on the same allocation can be observed inconsistently — and the empty event provided no actual ordering guarantee against subsequent device work.

What changed

File: cpp/oneapi/dal/algo/decision_forest/backend/gpu/train_kernel_hist_impl_dpc.cpp

gen_feature_list

  • Allocate the staging buffer as a plain host ndarray (selected_features_host) instead of shared USM.
  • Allocate a device-only buffer (selected_features_com, alloc::device) and copy host → device with assign_from_host, returning the resulting sycl::event (was previously an empty event).

gen_random_thresholds

  • Same pattern: host ndarray for the RNG fill, then a device buffer + assign_from_host whose event is returned.

Completeness and readability

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with updates and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least a summary table with measured data, if performance change is expected.
  • I have provided justification why performance and/or quality metrics have changed or why changes are not expected.
  • I have extended the benchmarking suite and provided a corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

@Alexandr-Solovev Alexandr-Solovev changed the title fixes for rf random fixes for events in rf hist_random Jun 23, 2026
@Alexandr-Solovev Alexandr-Solovev added the dpc++ Issue/PR related to DPC++ functionality label Jun 23, 2026
@Alexandr-Solovev Alexandr-Solovev marked this pull request as ready for review June 23, 2026 19:04
@Alexandr-Solovev

Copy link
Copy Markdown
Contributor Author

/intelci: run

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes host/device ordering for per-node random data generation in the GPU histogram-based Random Forest training path by ensuring the generated feature lists and thresholds are properly transferred to device memory and ordered via SYCL events.

Changes:

  • gen_feature_list: generate selected features on host, then copy into a device allocation and return the copy event.
  • gen_random_thresholds: generate random thresholds on host, then copy into a device allocation and return the copy event.
  • Training loop: stop waiting on empty events; instead, pass the returned copy events as dependencies into compute_best_split.

Comment on lines +512 to +520
auto selected_features_com =
pr::ndarray<Index, 1>::empty(queue_,
{ node_count * ctx.selected_ftr_count_ },
alloc::device);
auto copy_event = selected_features_com.assign_from_host(queue_,
selected_features_host_ptr,
selected_features_host.get_count());

return std::tuple{ selected_features_com, copy_event };
Comment on lines +546 to +553
auto random_bins_com = pr::ndarray<Float, 1>::empty(queue_,
{ node_count * ctx.selected_ftr_count_ },
alloc::device);
auto copy_event = random_bins_com.assign_from_host(queue_,
random_bins_host_ptr,
random_bins_host.get_count());

return std::tuple{ random_bins_com, copy_event };
Comment on lines 534 to 535
auto node_vs_tree_map_list_host = node_vs_tree_map.to_host(queue_);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dpc++ Issue/PR related to DPC++ functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants