
bugprone-misplaced-widening-cast #9506

Open
Jacobfaib wants to merge 2 commits into NVIDIA:main from Jacobfaib:jacobf/2026-06-17/bugprone-misplaced-widening-cast

Conversation

@Jacobfaib

Contributor

Description

closes

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@Jacobfaib Jacobfaib self-assigned this Jun 17, 2026
@Jacobfaib Jacobfaib requested review from a team as code owners June 17, 2026 17:36
@Jacobfaib Jacobfaib requested a review from alliepiper June 17, 2026 17:36
@Jacobfaib Jacobfaib requested review from caugonnet and gonidelis June 17, 2026 17:36
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Jun 17, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 17, 2026
@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Contributor


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f78850fe-0a49-4cfb-9347-aec8f85246b3

📥 Commits

Reviewing files that changed from the base of the PR and between f412ebc and f30e5e9.

📒 Files selected for processing (75)
  • .clang-tidy
  • cub/cub/agent/agent_batch_memcpy.cuh
  • cub/cub/agent/agent_for.cuh
  • cub/cub/agent/agent_merge.cuh
  • cub/cub/agent/agent_radix_sort_downsweep.cuh
  • cub/cub/agent/agent_radix_sort_histogram.cuh
  • cub/cub/agent/agent_radix_sort_onesweep.cuh
  • cub/cub/agent/agent_reduce.cuh
  • cub/cub/agent/agent_reduce_by_key.cuh
  • cub/cub/agent/agent_rle.cuh
  • cub/cub/agent/agent_select_if.cuh
  • cub/cub/agent/agent_three_way_partition.cuh
  • cub/cub/agent/agent_topk.cuh
  • cub/cub/agent/agent_unique_by_key.cuh
  • cub/cub/block/block_load.cuh
  • cub/cub/block/block_store.cuh
  • cub/cub/detail/rfa.cuh
  • cub/cub/device/device_histogram.cuh
  • cub/cub/device/dispatch/dispatch_batch_memcpy.cuh
  • cub/cub/device/dispatch/dispatch_for.cuh
  • cub/cub/device/dispatch/dispatch_merge.cuh
  • cub/cub/device/dispatch/dispatch_radix_sort.cuh
  • cub/cub/device/dispatch/dispatch_reduce.cuh
  • cub/cub/device/dispatch/dispatch_reduce_deterministic.cuh
  • cub/cub/device/dispatch/dispatch_rle.cuh
  • cub/cub/device/dispatch/dispatch_segmented_reduce.cuh
  • cub/cub/device/dispatch/dispatch_segmented_sort.cuh
  • cub/cub/device/dispatch/dispatch_select_if.cuh
  • cub/cub/device/dispatch/dispatch_streaming_reduce_by_key.cuh
  • cub/cub/device/dispatch/kernels/kernel_merge_sort.cuh
  • cub/cub/device/dispatch/kernels/kernel_segmented_scan.cuh
  • cub/cub/device/dispatch/kernels/kernel_segmented_sort.cuh
  • cub/cub/device/dispatch/kernels/kernel_three_way_partition.cuh
  • cub/cub/device/dispatch/kernels/kernel_transform.cuh
  • cub/cub/device/dispatch/kernels/kernel_unique_by_key.cuh
  • cub/cub/grid/grid_even_share.cuh
  • cub/cub/thread/thread_reduce.cuh
  • cub/test/catch2_segmented_sort_helper.cuh
  • cub/test/catch2_test_block_adjacent_difference.cu
  • cub/test/catch2_test_device_for.cu
  • cub/test/catch2_test_device_reduce.cuh
  • cub/test/catch2_test_device_run_length_encode_non_trivial_runs.cu
  • cub/test/catch2_test_device_scan_alignment.cu
  • cub/test/catch2_test_device_segmented_scan.cu
  • cub/test/catch2_test_device_segmented_scan_multi_segment.cu
  • cub/test/catch2_test_device_segmented_scan_noncommutative.cu
  • cub/test/catch2_test_device_segmented_topk_pairs.cu
  • cub/test/catch2_test_device_topk_common.cuh
  • cub/test/catch2_test_device_transform_reduce.cu
  • cub/test/catch2_test_warp_load.cu
  • cub/test/catch2_test_warp_scan_api.cu
  • cub/test/catch2_test_warp_store.cu
  • cub/test/warp/catch2_test_warp_reduce.cu
  • cub/test/warp/catch2_test_warp_reduce_batched.cu
  • cub/test/warp/catch2_test_warp_segmented_reduce.cu
  • cudax/benchmarks/bench/cuco/hashers.cu
  • cudax/test/copy/copy.cu
  • cudax/test/copy/copy_edge_cases.cu
  • cudax/test/copy_bytes/mdspan_d2h_h2d.cu
  • cudax/test/copy_bytes/mdspan_d2h_h2d_relaxed.cu
  • libcudacxx/test/libcudacxx/cuda/ccclrt/hierarchy/hierarchy_smoke.cu
  • libcudacxx/test/libcudacxx/std/algorithms/alg.modifying/alg.shift/pstl_shift_left.cu
  • libcudacxx/test/libcudacxx/std/algorithms/alg.modifying/alg.shift/pstl_shift_right.cu
  • libcudacxx/test/libcudacxx/std/algorithms/alg.sorting/alg.merge/pstl_merge.cu
  • libcudacxx/test/libcudacxx/std/numerics/numeric.ops/transform.reduce/pstl_transform_reduce_unary.cu
  • thrust/examples/bucket_sort2d.cu
  • thrust/examples/discrete_voronoi.cu
  • thrust/examples/monte_carlo_disjoint_sequences.cu
  • thrust/examples/padded_grid_reduction.cu
  • thrust/examples/scan_matrix_by_rows.cu
  • thrust/testing/counting_iterator.cu
  • thrust/thrust/system/cuda/detail/reduce.h
  • thrust/thrust/system/cuda/detail/set_operations.h
  • thrust/thrust/system/detail/generic/shuffle.inl
  • thrust/thrust/system/detail/sequential/stable_radix_sort.h
✅ Files skipped from review due to trivial changes (53)
  • cub/cub/device/dispatch/dispatch_reduce_deterministic.cuh
  • cub/cub/device/dispatch/dispatch_streaming_reduce_by_key.cuh
  • cub/cub/device/dispatch/dispatch_batch_memcpy.cuh
  • cub/cub/agent/agent_three_way_partition.cuh
  • cub/cub/detail/rfa.cuh
  • cub/cub/device/dispatch/dispatch_merge.cuh
  • cub/test/catch2_test_device_run_length_encode_non_trivial_runs.cu
  • cub/test/catch2_segmented_sort_helper.cuh
  • thrust/testing/counting_iterator.cu
  • libcudacxx/test/libcudacxx/std/numerics/numeric.ops/transform.reduce/pstl_transform_reduce_unary.cu
  • cub/cub/device/dispatch/kernels/kernel_unique_by_key.cuh
  • cub/cub/grid/grid_even_share.cuh
  • cub/cub/agent/agent_unique_by_key.cuh
  • cub/cub/device/dispatch/dispatch_reduce.cuh
  • cub/cub/device/dispatch/kernels/kernel_three_way_partition.cuh
  • cub/cub/agent/agent_radix_sort_downsweep.cuh
  • cub/cub/block/block_load.cuh
  • cub/test/catch2_test_device_scan_alignment.cu
  • cub/cub/thread/thread_reduce.cuh
  • cub/test/catch2_test_device_for.cu
  • cub/cub/agent/agent_for.cuh
  • cub/cub/agent/agent_radix_sort_onesweep.cuh
  • cub/cub/agent/agent_reduce_by_key.cuh
  • cub/test/catch2_test_block_adjacent_difference.cu
  • cub/test/catch2_test_device_segmented_scan_noncommutative.cu
  • cub/test/catch2_test_device_segmented_topk_pairs.cu
  • libcudacxx/test/libcudacxx/std/algorithms/alg.sorting/alg.merge/pstl_merge.cu
  • cub/cub/agent/agent_merge.cuh
  • cub/cub/device/dispatch/kernels/kernel_transform.cuh
  • cub/test/catch2_test_warp_load.cu
  • cub/cub/device/dispatch/dispatch_segmented_reduce.cuh
  • cub/cub/agent/agent_select_if.cuh
  • thrust/examples/discrete_voronoi.cu
  • cub/cub/block/block_store.cuh
  • cub/test/catch2_test_device_topk_common.cuh
  • cub/cub/agent/agent_batch_memcpy.cuh
  • cub/cub/agent/agent_reduce.cuh
  • libcudacxx/test/libcudacxx/std/algorithms/alg.modifying/alg.shift/pstl_shift_right.cu
  • cub/cub/device/dispatch/kernels/kernel_segmented_scan.cuh
  • thrust/examples/scan_matrix_by_rows.cu
  • cub/cub/device/dispatch/kernels/kernel_segmented_sort.cuh
  • thrust/examples/monte_carlo_disjoint_sequences.cu
  • cub/cub/device/dispatch/kernels/kernel_merge_sort.cuh
  • libcudacxx/test/libcudacxx/cuda/ccclrt/hierarchy/hierarchy_smoke.cu
  • thrust/thrust/system/detail/sequential/stable_radix_sort.h
  • cub/cub/agent/agent_rle.cuh
  • thrust/thrust/system/cuda/detail/set_operations.h
  • cub/test/catch2_test_device_segmented_scan_multi_segment.cu
  • cub/test/catch2_test_warp_store.cu
  • cub/cub/device/dispatch/dispatch_radix_sort.cuh
  • cub/test/warp/catch2_test_warp_segmented_reduce.cu
  • cub/test/catch2_test_device_segmented_scan.cu
  • cub/test/catch2_test_warp_scan_api.cu
🚧 Files skipped from review as they are similar to previous changes (20)
  • thrust/thrust/system/detail/generic/shuffle.inl
  • cub/cub/device/dispatch/dispatch_for.cuh
  • cub/test/catch2_test_device_transform_reduce.cu
  • libcudacxx/test/libcudacxx/std/algorithms/alg.modifying/alg.shift/pstl_shift_left.cu
  • cub/cub/agent/agent_radix_sort_histogram.cuh
  • cub/test/warp/catch2_test_warp_reduce_batched.cu
  • cub/cub/device/dispatch/dispatch_rle.cuh
  • thrust/examples/bucket_sort2d.cu
  • thrust/thrust/system/cuda/detail/reduce.h
  • cub/cub/agent/agent_topk.cuh
  • thrust/examples/padded_grid_reduction.cu
  • cudax/test/copy_bytes/mdspan_d2h_h2d_relaxed.cu
  • cudax/test/copy/copy_edge_cases.cu
  • cub/cub/device/dispatch/dispatch_select_if.cuh
  • cudax/benchmarks/bench/cuco/hashers.cu
  • cub/test/warp/catch2_test_warp_reduce.cu
  • cub/test/catch2_test_device_reduce.cuh
  • cudax/test/copy_bytes/mdspan_d2h_h2d.cu
  • cub/cub/device/device_histogram.cuh
  • cudax/test/copy/copy.cu

Note: CodeRabbit is enabled on this repository as a convenience for maintainers
and contributors. Use your best judgment when considering its review comments and
suggestions — a suggested change may be inadequate, unnecessary, or safe to ignore.
Contributors are not expected to address every comment. Human reviews are what
ultimately matter for merging.

Summary

This PR simplifies the .clang-tidy configuration file by removing extensive comments and complex settings, while enabling the bugprone-misplaced-widening-cast check across the codebase.

Configuration Changes

.clang-tidy:

  • Simplified the configuration format: converted the verbose multi-line checks list with extensive inline documentation (326 lines) into a compact folded-scalar format (23 lines total)
  • Removed all comments documenting rationale for specific check exclusions
  • Enabled bugprone-misplaced-widening-cast by removing its disable entry from the bugprone checks
  • Added -bugprone-signed-char-misuse to the disabled checks list
  • Changed WarningsAsErrors from '*' (treat all warnings as errors) to '' (disabled)
  • Removed extensive configuration sections:
    • HeaderFileExtensions, ImplementationFileExtensions, and SystemHeaders settings
    • HeaderFilterRegex and compiler argument specifications (ExtraArgsBefore, ExtraArgs)
    • All CheckOptions customizations for various checks (100+ lines of configuration)
  • Retained all functional check enable/disable logic in a more concise format

Impact

  • The configuration is now more maintainable with reduced file size (303 line reduction)
  • Warnings are no longer treated as errors by default, allowing for a transitional period as the codebase addresses the newly enabled bugprone-misplaced-widening-cast check
  • The disabled checks list is significantly streamlined while preserving project-specific customizations

Walkthrough

Enables the bugprone-misplaced-widening-cast clang-tidy check by removing it from .clang-tidy's disabled list, then resolves all resulting violations across CUB, Thrust, cudax, and libcudacxx. Fixes are either arithmetic order corrections (casting individual operands to the wider type before multiplying) or NOLINT suppressions where the existing pattern is intentional.
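A minimal sketch of the two fix styles described above. This is not code from the PR: `OffsetT`, `tile_size_misplaced`, `tile_size_widened`, and `kTileSize` are hypothetical names. The operands are assumed to be 32-bit unsigned so that the narrow product wraps deterministically; with signed `int` operands the same overflow would be undefined behavior.

```cpp
#include <cstdint>

using OffsetT = std::uint64_t; // assumed 64-bit offset type (hypothetical)

// Misplaced: threads * items is computed in 32 bits and can wrap *before*
// the cast widens it. This is the shape bugprone-misplaced-widening-cast reports.
constexpr OffsetT tile_size_misplaced(std::uint32_t threads, std::uint32_t items)
{
  return static_cast<OffsetT>(threads * items);
}

// Fixed: widen each operand first, so the multiplication happens in OffsetT.
constexpr OffsetT tile_size_widened(std::uint32_t threads, std::uint32_t items)
{
  return static_cast<OffsetT>(threads) * static_cast<OffsetT>(items);
}

// Alternative: keep intentional narrow arithmetic and suppress the warning,
// which is only safe when the operands are known to be small (here, a
// compile-time tile shape).
constexpr std::uint32_t kThreads = 256;
constexpr std::uint32_t kItems   = 16;
constexpr OffsetT kTileSize      = static_cast<OffsetT>(kThreads * kItems); // NOLINT(bugprone-misplaced-widening-cast)

static_assert(kTileSize == 4096, "a 256 x 16 tile cannot wrap in 32 bits");
```

Both helpers agree for realistic tile shapes. They diverge only when the narrow product exceeds 32 bits, which is exactly the case the check is meant to catch.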

Changes

Enable check and fix all violations

Layer / File(s) Summary
clang-tidy configuration
.clang-tidy
Removes bugprone-misplaced-widening-cast from disabled checks.
Dispatch-layer tile-size and offset arithmetic fixes
cub/cub/device/dispatch/dispatch_for.cuh, dispatch_reduce.cuh, dispatch_reduce_deterministic.cuh, dispatch_rle.cuh, dispatch_select_if.cuh, dispatch_streaming_reduce_by_key.cuh, dispatch_radix_sort.cuh, dispatch_merge.cuh, dispatch_segmented_sort.cuh, dispatch_segmented_reduce.cuh, dispatch_batch_memcpy.cuh, cub/cub/grid/grid_even_share.cuh
Changes static_cast<OffsetT>(threads * items) patterns to static_cast<OffsetT>(threads) * static_cast<OffsetT>(items) so multiplication is performed in the wider type, eliminating overflow before widening.
Agent-level NOLINT suppressions and cast-order fixes
cub/cub/agent/agent_batch_memcpy.cuh, agent_for.cuh, agent_merge.cuh, agent_radix_sort_downsweep.cuh, agent_radix_sort_histogram.cuh, agent_radix_sort_onesweep.cuh, agent_reduce.cuh, agent_reduce_by_key.cuh, agent_rle.cuh, agent_select_if.cuh, agent_three_way_partition.cuh, agent_topk.cuh, agent_unique_by_key.cuh
Applies NOLINT suppressions and cast-before-multiply fixes across all CUB agent headers, covering partial-tile guards, scatter offsets, load/store indexing, and tile-base calculations.
Kernel-level NOLINT suppressions and histogram cast fix
cub/cub/device/dispatch/kernels/kernel_merge_sort.cuh, kernel_segmented_scan.cuh, kernel_segmented_sort.cuh, kernel_three_way_partition.cuh, kernel_transform.cuh, kernel_unique_by_key.cuh, cub/cub/device/device_histogram.cuh
Adds NOLINT/NOLINTNEXTLINE annotations to kernel offset expressions where the existing pattern is intentional; updates device_histogram overflow check from C-style cast to static_cast<unsigned long long>.
Block and warp primitive NOLINT suppressions
cub/cub/block/block_load.cuh, block_store.cuh, cub/cub/thread/thread_reduce.cuh, cub/cub/detail/rfa.cuh
Annotates element-index expressions in block load/store variants and thread-level reduction with NOLINT comments.
Thrust internal and example fixes
thrust/thrust/system/cuda/detail/reduce.h, set_operations.h, thrust/thrust/system/detail/generic/shuffle.inl, thrust/thrust/system/detail/sequential/stable_radix_sort.h, thrust/examples/bucket_sort2d.cu, discrete_voronoi.cu, padded_grid_reduction.cu, scan_matrix_by_rows.cu, monte_carlo_disjoint_sequences.cu, thrust/testing/counting_iterator.cu
Fixes cast order for ITEMS_PER_TILE/blockIdx.x products in reduce and set-operations; adds iter_difference_t cast in shuffle's inclusive_scan range; annotates or rewrites index expressions in Thrust examples.
CUB test arithmetic and NOLINT fixes
cub/test/catch2_test_device_reduce.cuh, catch2_test_device_for.cu, catch2_test_device_scan_alignment.cu, catch2_test_device_segmented_scan.cu, catch2_test_device_segmented_scan_multi_segment.cu, catch2_test_device_segmented_scan_noncommutative.cu, catch2_test_device_run_length_encode_non_trivial_runs.cu, catch2_test_device_transform_reduce.cu, catch2_test_block_adjacent_difference.cu, catch2_test_device_segmented_topk_pairs.cu, catch2_test_device_topk_common.cuh, catch2_test_warp_load.cu, catch2_test_warp_scan_api.cu, catch2_test_warp_store.cu, catch2_segmented_sort_helper.cuh, warp/catch2_test_warp_reduce.cu, warp/catch2_test_warp_reduce_batched.cu, warp/catch2_test_warp_segmented_reduce.cu
Adds explicit long/size_t/uint64_t casts to segment iterator and index arithmetic in host-reference computations; annotates remaining instances with NOLINT comments.
cudax and libcudacxx test index arithmetic fixes
cudax/benchmarks/bench/cuco/hashers.cu, cudax/test/copy/copy.cu, cudax/test/copy/copy_edge_cases.cu, cudax/test/copy_bytes/mdspan_d2h_h2d.cu, cudax/test/copy_bytes/mdspan_d2h_h2d_relaxed.cu, libcudacxx/test/libcudacxx/cuda/ccclrt/hierarchy/hierarchy_smoke.cu, libcudacxx/test/libcudacxx/std/algorithms/alg.modifying/alg.shift/pstl_shift_left.cu, pstl_shift_right.cu, alg.sorting/alg.merge/pstl_merge.cu, std/numerics/numeric.ops/transform.reduce/pstl_transform_reduce_unary.cu
Adds static_cast<std::size_t> or static_cast<unsigned long> to multidimensional index computations and algorithm test expressions to prevent int overflow before widening.
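A sketch of the multidimensional-index fix under stated assumptions: a row-major layout, non-negative `int` indices, and a 64-bit `std::size_t`. `linear_index` is a hypothetical helper, not taken from the cudax or libcudacxx tests. Converting only the final result would still multiply in `int`, which is undefined behavior on overflow, so each operand is converted before the arithmetic.

```cpp
#include <cstddef>

// Row-major linear index of element (row, col) in a matrix with num_cols columns.
// Misplaced form (not used): static_cast<std::size_t>(row * num_cols + col)
// evaluates row * num_cols in int, which overflows for large extents.
// Converting the operands first makes the whole expression evaluate in std::size_t.
constexpr std::size_t linear_index(int row, int col, int num_cols)
{
  return static_cast<std::size_t>(row) * static_cast<std::size_t>(num_cols) + static_cast<std::size_t>(col);
}
```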

Possibly related PRs

  • NVIDIA/cccl#9467: Updates clang-tidy bugprone check configuration related to bugprone-misplaced-widening-cast handling.

Suggested reviewers

  • bernhardmgruber
  • gonidelis
  • caugonnet

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cub/cub/device/device_histogram.cuh (1)

753-786: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

important: Mirror this cast-before-multiply fix in the env overloads.

These branches are fine, but the EnvT overloads later in cub/cub/device/device_histogram.cuh still do the cast after multiplying num_rows * row_stride_bytes, so the same clang-tidy hit and overflow risk remain in this header.

Also applies to: 1453-1484
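The overflow risk in an `int`-range guard can be sketched as follows. This is not the `device_histogram.cuh` code: `fits_in_int_misplaced` and `fits_in_int` are hypothetical helpers, and both operands are assumed to be 32-bit unsigned so that the narrow product wraps deterministically. Casting after the multiply lets a wrapped product slip under the limit, while casting each operand first checks the true size.

```cpp
#include <climits>
#include <cstdint>

// Misplaced: num_rows * row_stride_bytes wraps in 32 bits, so a very large
// image can appear small enough to pass the check.
constexpr bool fits_in_int_misplaced(std::uint32_t num_rows, std::uint32_t row_stride_bytes)
{
  return static_cast<unsigned long long>(num_rows * row_stride_bytes) < static_cast<unsigned long long>(INT_MAX);
}

// Fixed: widen both operands, then multiply. (2^32 - 1)^2 still fits in 64 bits.
constexpr bool fits_in_int(std::uint32_t num_rows, std::uint32_t row_stride_bytes)
{
  return static_cast<unsigned long long>(num_rows) * static_cast<unsigned long long>(row_stride_bytes)
       < static_cast<unsigned long long>(INT_MAX);
}
```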

cub/cub/agent/agent_rle.cuh (1)

363-369: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

critical: tile_successor_item/tile_predecessor_item are only written by one lane, but every lane still passes the local variable by value into FlagHeadsAndTails. Most lanes therefore hand the callee an indeterminate value here, which makes the head/tail classification undefined. Broadcast the neighbor item before the call instead of just suppressing the lint.

Also applies to: 377-383, 389-402


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ca2cbe9d-1d29-425b-833e-2ac72191e0ff

📥 Commits

Reviewing files that changed from the base of the PR and between a4ffbcf and f412ebc.

💤 Files with no reviewable changes (1)
  • .clang-tidy

Comment thread cub/cub/device/dispatch/dispatch_merge.cuh
Comment thread cub/cub/device/dispatch/kernels/kernel_merge_sort.cuh
Comment thread cub/cub/device/dispatch/kernels/kernel_segmented_sort.cuh
Comment thread thrust/testing/counting_iterator.cu
 using Difference = thrust::detail::it_difference_t<Iterator>;

-Difference diff = std::numeric_limits<std::uint32_t>::max() + 1;
+Difference diff = std::numeric_limits<std::uint32_t>::max() + 1; // NOLINT(bugprone-misplaced-widening-cast)

Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

In C++, for std::uint32_t x = std::numeric_limits<std::uint32_t>::max(); auto y = x + 1;, does y overflow in 32-bit before assignment to a wider signed type? Please cite the usual arithmetic conversion rules from cppreference.

💡 Result:

In the expression auto y = x + 1; where x is std::uint32_t initialized to std::numeric_limits<std::uint32_t>::max(), the operation does not "overflow" in the sense of undefined behavior [1][2]. Instead, it performs well-defined unsigned integer arithmetic [1]. According to the C++ standard's rules for arithmetic operators [1][3]:

1. Integral promotion: Before the addition, the operand x (type std::uint32_t) and the literal 1 (type int) undergo integral promotion [3][4]. If int could represent all values of std::uint32_t, which is only possible where int is wider than 32 bits, x would be promoted to int [5][4]. Where int is 32 bits, which includes mainstream 32-bit and 64-bit targets, it cannot, so x keeps its type (typically unsigned int) [5].
2. Usual arithmetic conversions: The operands now have different types (unsigned int and int), so the usual arithmetic conversions bring them to a common type [6][3]. Because the unsigned operand's rank is greater than or equal to the signed operand's rank, the signed operand is converted to the unsigned type [6], and the addition is performed in unsigned arithmetic [7][8].
3. Unsigned wrap-around: Unsigned integer arithmetic is performed modulo 2^n, where n is the number of bits [1]. Adding 1 to the maximum value of a 32-bit unsigned integer therefore yields 0 [1][2]. The wrap-around happens while x + 1 is evaluated, before any assignment takes place [8].

Consequently, y is 0. Assigning x + 1 to a wider signed type would likewise store 0, because the value has already wrapped before the conversion [8][9].


🏁 Script executed:

cd /tmp && git clone --depth 1 https://github.com/NVIDIA/cccl.git 2>&1 | head -20



🏁 Script executed:

cat -n cccl/thrust/testing/counting_iterator.cu | sed -n '270,290p'



🏁 Script executed:

# Find the Difference type definition in the test file
rg -A 5 -B 5 "Difference" cccl/thrust/testing/counting_iterator.cu | head -40



🏁 Script executed:

find /tmp/cccl -name "counting_iterator.cu" -type f



🏁 Script executed:

find /tmp/cccl -name "*counting_iterator*" -type f



🏁 Script executed:

cat -n /tmp/cccl/thrust/testing/counting_iterator.cu | sed -n '270,290p'



🏁 Script executed:

# Get more context around line 280 to understand the Difference type
cat -n /tmp/cccl/thrust/testing/counting_iterator.cu | sed -n '260,300p'



🏁 Script executed:

# Search for Difference typedef or using declaration
rg -B 20 "std::numeric_limits<std::uint32_t>::max\(\) \+ 1" /tmp/cccl/thrust/testing/counting_iterator.cu



🏁 Script executed:

rg -A 10 "it_difference_t" /tmp/cccl/thrust/thrust/iterator/detail/counting_iterator.inl | head -30



🏁 Script executed:

# Look for the difference_type in counting_iterator definition
rg -B 5 -A 5 "difference_type" /tmp/cccl/thrust/thrust/iterator/counting_iterator.h | head -50



🏁 Script executed:

# Find the definition of it_difference_t
find /tmp/cccl -name "*.h" -o -name "*.hpp" | xargs rg "template.*it_difference_t|using it_difference_t" | head -20



🏁 Script executed:

cat /tmp/cccl/thrust/thrust/iterator/iterator_traits.h | grep -A 5 "it_difference_t"



🏁 Script executed:

# Verify the difference_type for counting_iterator with uint64_t
cat /tmp/cccl/thrust/thrust/iterator/counting_iterator.h | sed -n '1,150p' | grep -A 20 "counting_iterator_difference_type"



🏁 Script executed:

# Confirm ptrdiff_t is typically 64-bit on 64-bit systems and larger than uint32_t
python3 << 'PY'
# In C++, ptrdiff_t is typically signed long on 64-bit systems
# The issue is that:
# std::numeric_limits<std::uint32_t>::max() = 4294967295 (0xFFFFFFFF)
# std::numeric_limits<std::uint32_t>::max() + 1
# Both operands are uint32_t, so addition happens in uint32_t
# Result: 0xFFFFFFFF + 1 = 0x100000000, but wraps to 0x00000000 in uint32_t
# Then assigns 0 to ptrdiff_t

# The fix would cast first:
# static_cast<Difference>(std::numeric_limits<std::uint32_t>::max()) + 1
# This casts 0xFFFFFFFF to ptrdiff_t first (value 4294967295)
# Then adds 1 in ptrdiff_t, resulting in 4294967296

print("Original expression: std::numeric_limits<std::uint32_t>::max() + 1")
print("Operand types: uint32_t + int -> uint32_t + uint32_t")
print("Arithmetic: 0xFFFFFFFF + 0x00000001 = 0x100000000 (wraps to 0x00000000 in uint32_t)")
print("Result assigned to ptrdiff_t: 0")
print()
print("Fixed expression: static_cast<Difference>(std::numeric_limits<std::uint32_t>::max()) + 1")
print("Operand types: ptrdiff_t + int -> ptrdiff_t + ptrdiff_t")
print("Arithmetic: 4294967295 + 1 = 4294967296 (in ptrdiff_t)")
print("Result assigned to ptrdiff_t: 4294967296 ✓")
PY



important: the expression std::numeric_limits<std::uint32_t>::max() + 1 wraps to 0 in 32-bit unsigned arithmetic before it is converted to Difference (which is ::cuda::std::ptrdiff_t for counting_iterator<std::uint64_t>). The test therefore runs with diff = 0, which defeats its purpose of exercising iterator differences larger than 32 bits. Cast before the addition instead: static_cast<Difference>(std::numeric_limits<std::uint32_t>::max()) + 1.
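A standalone check of the two expressions, assuming `Difference` is a 64-bit signed type. It is modeled here with `std::int64_t` as a stand-in for `::cuda::std::ptrdiff_t` on a 64-bit target; the variable names are illustrative only.

```cpp
#include <cstdint>
#include <limits>

using Difference = std::int64_t; // stand-in for the iterator's difference type

// As written in the test: max() + 1 is evaluated in 32-bit unsigned arithmetic
// and wraps to 0 before the result is converted to Difference.
constexpr Difference diff_wrapped = std::numeric_limits<std::uint32_t>::max() + 1;

// Suggested fix: convert to Difference first, then add, so the sum is 2^32.
constexpr Difference diff_widened = static_cast<Difference>(std::numeric_limits<std::uint32_t>::max()) + 1;

static_assert(diff_wrapped == 0, "32-bit unsigned addition wraps before widening");
static_assert(diff_widened == (Difference{1} << 32), "widening first preserves the carry");
```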

@github-actions

This comment has been minimized.

@miscco miscco left a comment

Contributor


This is just too much noise

@github-project-automation github-project-automation Bot moved this from In Review to In Progress in CCCL Jun 22, 2026
@Jacobfaib

Contributor Author

This is just too much noise

@miscco this check is arguably even more valuable than the narrowing cast check, and I strongly urge us to keep it. It warns on explicit conversions that still lose precision.

That is, a developer has considered that x * y may overflow and written static_cast<BiggerType>(x * y), thinking they are safe. They did not realize that the cast still loses precision, because the inner computation is performed in the smaller type. They should have written static_cast<BiggerType>(x) * y instead.

@fbusato

fbusato commented Jun 22, 2026

Contributor

Noisy, but not too noisy: it affects ~1 LoC in a subset of files. We could also restrict this warning, and the narrowing one, to library code, which requires higher quality.

@Jacobfaib Jacobfaib force-pushed the jacobf/2026-06-17/bugprone-misplaced-widening-cast branch from f412ebc to f30e5e9 June 22, 2026 16:54
@github-actions

Contributor

😬 CI Workflow Results

🟥 Finished in 6h 04m: Pass: 99%/443 | Total: 18d 12h | Max: 2h 42m | Hits: 29%/2218283



Labels

None yet

Projects

Status: In Progress


3 participants