Skip to content

feat(autoware_ptv3): source cloud reconstruction & entropy#12547

Open
amadeuszsz wants to merge 7 commits intoautowarefoundation:mainfrom
amadeuszsz:feat/ptv3-reconstruction
Open

feat(autoware_ptv3): source cloud reconstruction & entropy#12547
amadeuszsz wants to merge 7 commits intoautowarefoundation:mainfrom
amadeuszsz:feat/ptv3-reconstruction

Conversation

@amadeuszsz
Copy link
Copy Markdown
Contributor

@amadeuszsz amadeuszsz commented May 6, 2026

Description

This PR refines the PTv3 runtime point cloud handling and output reconstruction behavior. It adds configurable source reconstruction modes so PTv3 can optionally publish predictions back on the original input point layout instead of only voxel representatives. Changes:

  • Added source_reconstruction node parameter with modes:
    • none: existing behavior, publish voxelized output only.
    • partial: unvoxelize predictions back to cropped in-range input points.
    • full: unvoxelize and uncrop predictions back to the full input cloud.
      • Out-of-range or clipped points receive unknown label 255 and 0.0 class probabilities.
  • Updated segmentation output to include normalized Shannon entropy for uncertainty inspection.
  • Updated visualization and filtered point cloud generation to work correctly with reconstructed outputs.
  • Added schema/config support for the new reconstruction mode.
  • Extra:
    • Remap blackboard's (cuda) topics with dummy suffix. This let's topics negotiation fail explicitly which is expected outcome for nodes in separate process.
    • Fixes an out-of-bounds initialization of hash index buffers by matching the Thrust sequence range to the allocated cloud capacity.

Related links

Parent Issue:

  • Link

How was this PR tested?

Average latency in milliseconds for sample rosbag input with RTX 3060 laptop.

Metric Original Mode none Mode partial Mode full
Preprocessing 1.23 1.43 2.92 2.56
Inference 31.11 32.55 29.63 31.01
Postprocessing 1.19 1.38 3.88 4.05
Total 33.55 35.38 36.44 37.63

Original implementation (before this PR) and mode none seem to have equivalent latency. partial and full come with expected overhead.

I would suggest to keep mode full as default, because intuitively we expect same output points sequence as input.

Notes for reviewers

None.

Interface changes

None.

Effects on system behavior

None.

Signed-off-by: Amadeusz Szymko <amadeusz.szymko.2@tier4.jp>
@amadeuszsz amadeuszsz self-assigned this May 6, 2026
@amadeuszsz amadeuszsz added the run:build-and-test-differential Mark to enable build-and-test-differential workflow. (used-by-ci) label May 6, 2026
@amadeuszsz amadeuszsz requested review from knzo25 and manato as code owners May 6, 2026 07:59
@github-actions github-actions Bot added the component:perception Advanced sensor data processing and environment understanding. (auto-assigned) label May 6, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 6, 2026

Thank you for contributing to the Autoware project!

🚧 If your pull request is in progress, switch it to draft mode.

Please ensure:

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 55caa03d08

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread perception/autoware_ptv3/lib/preprocess/preprocess_kernel.cu Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 6, 2026

Codecov Report

❌ Patch coverage is 0% with 108 lines in your changes missing coverage. Please review.
✅ Project coverage is 0.00%. Comparing base (b1df772) to head (2d0f101).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
perception/autoware_ptv3/lib/ptv3_trt.cpp 0.00% 62 Missing ⚠️
.../autoware_ptv3/lib/preprocess/preprocess_kernel.cu 0.00% 22 Missing ⚠️
...utoware_ptv3/lib/postprocess/postprocess_kernel.cu 0.00% 16 Missing ⚠️
...utoware_ptv3/include/autoware/ptv3/ptv3_config.hpp 0.00% 6 Missing ⚠️
perception/autoware_ptv3/src/ptv3_node.cpp 0.00% 2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (b1df772) and HEAD (2d0f101). Click for more details.

HEAD has 6 uploads less than BASE
Flag BASE (b1df772) HEAD (2d0f101)
full-suite 2 1
daily 5 0
Additional details and impacted files
@@             Coverage Diff             @@
##             main   #12547       +/-   ##
===========================================
- Coverage   18.65%    0.00%   -18.66%     
===========================================
  Files        1917       11     -1906     
  Lines      131368      759   -130609     
  Branches    44506        0    -44506     
===========================================
- Hits        24501        0    -24501     
+ Misses      86756      759    -85997     
+ Partials    20111        0    -20111     
Flag Coverage Δ
daily ?
full-suite 0.00% <0.00%> (-18.65%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the autoware_ptv3 segmentation pipeline to optionally reconstruct PTv3 voxel predictions back onto the original (cropped or full) source point layout, and augments the segmented output with a per-point normalized Shannon entropy field for uncertainty inspection.

Changes:

  • Added source_reconstruction parameter (none/partial/full) and implemented partial/full reconstruction paths (inverse mapping + reconstruction kernels).
  • Extended segmented pointcloud output with an entropy field and updated visualization/filtered outputs to operate on reconstructed outputs.
  • Updated schema/config/launch to support the new parameter and adjust CUDA topic remappings.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
perception/autoware_ptv3/src/ptv3_node.cpp Declares source_reconstruction ROS parameter and forwards it into PTv3Config.
perception/autoware_ptv3/schema/ptv3.schema.json Adds schema support for source_reconstruction (enum/default/required).
perception/autoware_ptv3/lib/ptv3_trt.cpp Allocates reconstruction buffers, selects output capacity based on reconstruction mode, clears buffers, and switches postprocess outputs to reconstructed sources.
perception/autoware_ptv3/lib/preprocess/preprocess_kernel.cu Adds inverse-map scatter kernel and expands feature generation to optionally emit source features/cropped source points/inverse mapping.
perception/autoware_ptv3/lib/postprocess/postprocess_kernel.cu Adds entropy to segmentation output and introduces partial/full reconstruction CUDA kernels + safer visualization color lookup.
perception/autoware_ptv3/launch/ptv3.launch.xml Remaps CUDA topics to .../cuda/dummy to force explicit negotiation failures across processes.
perception/autoware_ptv3/include/autoware/ptv3/ptv3_trt.hpp Adds state for source/cropped counts, raw input pointer, and reconstruction buffers.
perception/autoware_ptv3/include/autoware/ptv3/ptv3_config.hpp Introduces SourceReconstruction enum + parsing, and threads the new config through the constructor.
perception/autoware_ptv3/include/autoware/ptv3/preprocess/preprocess_kernel.hpp Updates generateFeatures() signature and exposes crop mask/indices for full reconstruction.
perception/autoware_ptv3/include/autoware/ptv3/postprocess/postprocess_kernel.hpp Updates visualization API and adds reconstruction method declarations.
perception/autoware_ptv3/config/ptv3.param.yaml Sets a default source_reconstruction mode in the shipped param file.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread perception/autoware_ptv3/src/ptv3_node.cpp
Comment thread perception/autoware_ptv3/schema/ptv3.schema.json
Comment thread perception/autoware_ptv3/config/ptv3.param.yaml
Comment thread perception/autoware_ptv3/lib/preprocess/preprocess_kernel.cu Outdated
Comment thread perception/autoware_ptv3/lib/ptv3_trt.cpp Outdated
amadeuszsz added 4 commits May 6, 2026 17:20
Signed-off-by: Amadeusz Szymko <amadeusz.szymko.2@tier4.jp>
Signed-off-by: Amadeusz Szymko <amadeusz.szymko.2@tier4.jp>
Signed-off-by: Amadeusz Szymko <amadeusz.szymko.2@tier4.jp>
Signed-off-by: Amadeusz Szymko <amadeusz.szymko.2@tier4.jp>
@amadeuszsz amadeuszsz requested a review from mojomex May 7, 2026 06:06
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temporary changes made it into the PR

float z;
std::uint8_t class_id;
float probability;
float entropy;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this expected to be used by downstream modules? If it's only for debug/evaluation it should ideally be disabled/excluded in a production context.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ktro2828 do you plan to use ~/ptv3/segmentation topic? Are you ok with this change?

Anyway, if segmentation topic will be used in any downstream module, I think entropy can be useful as probability can be overconfident. So far PTv3 does not support real confidence output, therefore maybe we can keep entropy as extra measurement. Please note we do not use this publisher if not subscribed, so the overhead of this change is negligible.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amadeuszsz From the perspective of the downstream module, it's fine to change the field unless the existing fields are not deleted.
Currently, the downstream module refers to x/y/z and optionally class_id/probability by field name using PointCloud2ConstIterator, so adding some other field does not affect iteration order.

Copy link
Copy Markdown
Contributor

@manato manato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amadeuszsz
Thank you for your contribution!
I left a few comments on the CUDA code. Your consideration would be appreciated!

Comment thread perception/autoware_ptv3/lib/postprocess/postprocess_kernel.cu Outdated
Comment on lines +92 to +95
for (std::size_t class_idx = 0; class_idx < num_classes; ++class_idx) {
output_probs[idx * num_classes + class_idx] =
has_valid_voxel ? voxel_probs[voxel_idx * num_classes + class_idx] : 0.0f;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a glance, it looks like this loop still can be parallelized because each iteration is independent of the others. Do you think we can add one more (i.e., y) block dimension for more parallelization?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it and postprocessing went up from 4 ms to 7 ms:

  • preprocessing: 2.65 ms
  • inference: 35.48 ms
  • postprocessing: 7.03 ms
  • total: 45.18 ms

Each block does trivially little work, so the GPU block scheduler overhead far exceeds the cost of the original 20~ iteration for loop.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for testing. If you have spare time, it would be great if you also try adding y dimension to the grid instead of the block (i.e., auto y = blockIdx.y; instead of auto y = threadIdx.y;) because this generates a separate executable unit that would be executed on the different SMs, which requires less overhead.

This is not mandatory. I can approve this PR if there are no other unhandled comments from other reviews.

amadeuszsz and others added 2 commits May 7, 2026 19:15
Signed-off-by: Amadeusz Szymko <amadeusz.szymko.2@tier4.jp>

Co-authored-by: Manato Hirabayashi <3022416+manato@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component:perception Advanced sensor data processing and environment understanding. (auto-assigned) run:build-and-test-differential Mark to enable build-and-test-differential workflow. (used-by-ci)

Projects

Status: To Triage

Development

Successfully merging this pull request may close these issues.

5 participants