feat(autoware_ptv3): source cloud reconstruction & entropy#12547
feat(autoware_ptv3): source cloud reconstruction & entropy#12547amadeuszsz wants to merge 7 commits intoautowarefoundation:mainfrom
Conversation
Signed-off-by: Amadeusz Szymko <amadeusz.szymko.2@tier4.jp>
|
Thank you for contributing to the Autoware project! 🚧 If your pull request is in progress, switch it to draft mode. Please ensure:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 55caa03d08
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #12547 +/- ##
===========================================
- Coverage 18.65% 0.00% -18.66%
===========================================
Files 1917 11 -1906
Lines 131368 759 -130609
Branches 44506 0 -44506
===========================================
- Hits 24501 0 -24501
+ Misses 86756 759 -85997
+ Partials 20111 0 -20111
☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR extends the autoware_ptv3 segmentation pipeline to optionally reconstruct PTv3 voxel predictions back onto the original (cropped or full) source point layout, and augments the segmented output with a per-point normalized Shannon entropy field for uncertainty inspection.
Changes:
- Added
source_reconstructionparameter (none/partial/full) and implemented partial/full reconstruction paths (inverse mapping + reconstruction kernels). - Extended segmented pointcloud output with an
entropyfield and updated visualization/filtered outputs to operate on reconstructed outputs. - Updated schema/config/launch to support the new parameter and adjust CUDA topic remappings.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| perception/autoware_ptv3/src/ptv3_node.cpp | Declares source_reconstruction ROS parameter and forwards it into PTv3Config. |
| perception/autoware_ptv3/schema/ptv3.schema.json | Adds schema support for source_reconstruction (enum/default/required). |
| perception/autoware_ptv3/lib/ptv3_trt.cpp | Allocates reconstruction buffers, selects output capacity based on reconstruction mode, clears buffers, and switches postprocess outputs to reconstructed sources. |
| perception/autoware_ptv3/lib/preprocess/preprocess_kernel.cu | Adds inverse-map scatter kernel and expands feature generation to optionally emit source features/cropped source points/inverse mapping. |
| perception/autoware_ptv3/lib/postprocess/postprocess_kernel.cu | Adds entropy to segmentation output and introduces partial/full reconstruction CUDA kernels + safer visualization color lookup. |
| perception/autoware_ptv3/launch/ptv3.launch.xml | Remaps CUDA topics to .../cuda/dummy to force explicit negotiation failures across processes. |
| perception/autoware_ptv3/include/autoware/ptv3/ptv3_trt.hpp | Adds state for source/cropped counts, raw input pointer, and reconstruction buffers. |
| perception/autoware_ptv3/include/autoware/ptv3/ptv3_config.hpp | Introduces SourceReconstruction enum + parsing, and threads the new config through the constructor. |
| perception/autoware_ptv3/include/autoware/ptv3/preprocess/preprocess_kernel.hpp | Updates generateFeatures() signature and exposes crop mask/indices for full reconstruction. |
| perception/autoware_ptv3/include/autoware/ptv3/postprocess/postprocess_kernel.hpp | Updates visualization API and adds reconstruction method declarations. |
| perception/autoware_ptv3/config/ptv3.param.yaml | Sets a default source_reconstruction mode in the shipped param file. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: Amadeusz Szymko <amadeusz.szymko.2@tier4.jp>
Signed-off-by: Amadeusz Szymko <amadeusz.szymko.2@tier4.jp>
Signed-off-by: Amadeusz Szymko <amadeusz.szymko.2@tier4.jp>
Signed-off-by: Amadeusz Szymko <amadeusz.szymko.2@tier4.jp>
There was a problem hiding this comment.
Temporary changes made it into the PR
| float z; | ||
| std::uint8_t class_id; | ||
| float probability; | ||
| float entropy; |
There was a problem hiding this comment.
Is this expected to be used by downstream modules? If it's only for debug/evaluation it should ideally be disabled/excluded in a production context.
There was a problem hiding this comment.
@ktro2828 do you plan to use ~/ptv3/segmentation topic? Are you ok with this change?
Anyway, if segmentation topic will be used in any downstream module, I think entropy can be useful as probability can be overconfident. So far PTv3 does not support real confidence output, therefore maybe we can keep entropy as extra measurement. Please note we do not use this publisher if not subscribed, so the overhead of this change is negligible.
There was a problem hiding this comment.
@amadeuszsz From the perspective of the downstream module, it's fine to change the field unless the existing fields are not deleted.
Currently, the downstream module refers to x/y/z and optionally class_id/probability by field name using PointCloud2ConstIterator, so adding some other field does not affect iteration order.
manato
left a comment
There was a problem hiding this comment.
@amadeuszsz
Thank you for your contribution!
I left a few comments on the CUDA code. Your consideration would be appreciated!
| for (std::size_t class_idx = 0; class_idx < num_classes; ++class_idx) { | ||
| output_probs[idx * num_classes + class_idx] = | ||
| has_valid_voxel ? voxel_probs[voxel_idx * num_classes + class_idx] : 0.0f; | ||
| } |
There was a problem hiding this comment.
At a glance, it looks like this loop still can be parallelized because each iteration is independent of the others. Do you think we can add one more (i.e., y) block dimension for more parallelization?
There was a problem hiding this comment.
I tested it and postprocessing went up from 4 ms to 7 ms:
- preprocessing: 2.65 ms
- inference: 35.48 ms
- postprocessing: 7.03 ms
- total: 45.18 ms
Each block does trivially little work, so the GPU block scheduler overhead far exceeds the cost of the original 20~ iteration for loop.
There was a problem hiding this comment.
Thank you for testing. If you have spare time, it would be great if you also try adding y dimension to the grid instead of the block (i.e., auto y = blockIdx.y; instead of auto y = threadIdx.y;) because this generates a separate executable unit that would be executed on the different SMs, which requires less overhead.
This is not mandatory. I can approve this PR if there are no other unhandled comments from other reviews.
Signed-off-by: Amadeusz Szymko <amadeusz.szymko.2@tier4.jp> Co-authored-by: Manato Hirabayashi <3022416+manato@users.noreply.github.com>
Description
This PR refines the PTv3 runtime point cloud handling and output reconstruction behavior. It adds configurable source reconstruction modes so PTv3 can optionally publish predictions back on the original input point layout instead of only voxel representatives. Changes:
Related links
Parent Issue:
How was this PR tested?
Average latency in milliseconds for sample rosbag input with RTX 3060 laptop.
nonepartialfullOriginal implementation (before this PR) and mode
noneseem to have equivalent latency.partialandfullcome with expected overhead.I would suggest to keep mode
fullas default, because intuitively we expect same output points sequence as input.Notes for reviewers
None.
Interface changes
None.
Effects on system behavior
None.