perf(autoware_tensorrt_plugins): remove Thrust from sort kernels #12554

Draft
mojomex wants to merge 8 commits into autowarefoundation:main from mojomex:perf/trt-plugins-no-thrust

mojomex (Contributor) commented May 7, 2026

Stack

This PR is stacked on #12561, which adds the reference kernel tests and a minimal fix for the pre-existing unique-counts bug that those tests expose. Review #12561 first; this PR contains the no-Thrust implementation and cleanup on top.
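
For orientation, a CPU reference of the kind such kernel tests might compare against could look like the sketch below. The function names, the `int64_t` key type, and the choice of a stable ordering are assumptions made for illustration; the actual tests in #12561 are not shown in this conversation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical CPU reference: returns the permutation that sorts `keys`
// in ascending order. Equal keys keep their input order (stable).
std::vector<std::size_t> reference_argsort(const std::vector<std::int64_t> & keys)
{
  std::vector<std::size_t> indices(keys.size());
  std::iota(indices.begin(), indices.end(), std::size_t{0});
  std::stable_sort(
    indices.begin(), indices.end(),
    [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
  return indices;
}

struct UniqueResult
{
  std::vector<std::int64_t> values;  // distinct keys, ascending
  std::vector<std::int64_t> counts;  // counts[i] = occurrences of values[i]
};

// Hypothetical CPU reference: run-length encodes an already sorted key array.
// Precondition: `sorted_keys` is sorted ascending.
UniqueResult reference_unique_counts(const std::vector<std::int64_t> & sorted_keys)
{
  assert(std::is_sorted(sorted_keys.begin(), sorted_keys.end()));
  UniqueResult result;
  for (const std::int64_t key : sorted_keys) {
    if (result.values.empty() || result.values.back() != key) {
      result.values.push_back(key);  // start a new run
      result.counts.push_back(1);
    } else {
      ++result.counts.back();  // extend the current run
    }
  }
  return result;
}
```

A test would run the GPU kernel and this reference on the same input and compare the outputs element by element.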

Summary

Removes Thrust from the TensorRT plugin sort kernels and keeps the mutable unique temp-storage follow-up used in the benchmarked variant.

This draft PR corresponds to the benchmarked variant ptv3-t18-no-thrust-c8f76ed-20260506.
All PRs in this cohort target main; each later PR contains the changes benchmarked in the earlier ones.

Cohort

Benchmarks

Source report: reports/2026-05-07_22-20-12/report.md

Total Latency Summary

| Variant | Measurement | CPU mean (ms) | CPU p95 (ms) | CPU faster vs baseline | GPU mean (ms) | GPU p95 (ms) | GPU faster vs baseline | Mean voxels |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ptv3-t18 | series (7 x 50) | 29.117 | 31.755 | +0.0% | 28.020 | 30.598 | +0.0% | 120572 |
| ptv3-t18-no-thrust-c8f76ed-20260506 | series (7 x 50) | 27.559 | 28.502 | +5.7% | 26.416 | 27.306 | +6.1% | 120572 |
| ptv3-t18-no-thrust-no-alloc-e9515b790-20260506 | series (7 x 50) | 26.874 | 28.863 | +8.3% | 25.741 | 27.665 | +8.9% | 120572 |
| ptv3-t18-no-thrust-no-alloc-no-sync-13f3672a0-20260506 | series (7 x 50) | 26.278 | 27.471 | +10.8% | 25.140 | 26.288 | +11.5% | 120572 |
| ptv3-t18-no-thrust-no-alloc-no-sync-maxnumel-47bf5656f-20260506 | series (7 x 50) | 26.214 | 26.642 | +11.1% | 25.084 | 25.488 | +11.7% | 120572 |
| ptv3-t18-no-thrust-no-alloc-no-sync-maxnumel-maxauxstreams1 | series (7 x 50) | 25.378 | 25.984 | +14.7% | 24.230 | 24.792 | +15.6% | 120572 |
| ptv3-t18-no-thrust-no-alloc-no-sync-maxnumel-maxauxstreams3 | series (7 x 50) | 26.809 | 28.083 | +8.6% | 25.614 | 26.849 | +9.4% | 120572 |
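
The report does not state how the "faster vs baseline" columns are derived. The printed percentages appear consistent with the baseline mean divided by the variant mean, minus one; that reading is inferred from the numbers, not confirmed by the source. A minimal sketch under that assumption (both function names are hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Assumed definition: speedup relative to baseline, in percent.
// A positive value means the variant has a lower mean latency.
double faster_vs_baseline_percent(double baseline_mean_ms, double variant_mean_ms)
{
  assert(baseline_mean_ms > 0.0 && variant_mean_ms > 0.0);
  return (baseline_mean_ms / variant_mean_ms - 1.0) * 100.0;
}

// Rounds a percentage to tenths, the precision used in the table.
// Returns the value in units of 0.1% (e.g. 57 means 5.7%).
long to_tenths_of_percent(double percent)
{
  return std::lround(percent * 10.0);
}
```

For the CPU means of this PR's variant, `faster_vs_baseline_percent(29.117, 27.559)` rounds to 5.7%, which matches the table. Note that this differs from the relative latency reduction `(baseline - variant) / baseline`, which would give a smaller figure for the same row.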

Relative Performance

Relative performance graph

github-actions bot added the labels type:documentation, component:perception, component:sensing, component:planning, component:control, component:system, component:vehicle, type:ci, component:common, component:simulation, and component:evaluator (all auto-assigned) May 7, 2026
github-actions bot commented May 7, 2026

Thank you for contributing to the Autoware project!

🚧 If your pull request is in progress, switch it to draft mode.

Please ensure:

@mojomex mojomex force-pushed the perf/trt-plugins-no-thrust branch from ff5ad6d to 23b3f79 Compare May 7, 2026 14:24
github-actions bot removed the labels type:documentation, component:sensing, component:planning, component:control, component:system, component:vehicle, type:ci, component:common, component:simulation, and component:evaluator (all auto-assigned) May 7, 2026
mojomex (Contributor, Author) commented May 7, 2026

@codex review. Is the implementation memory-safe and free of race conditions? Do the unit tests guarantee correct behavior? Is the implementation as concise and readable as possible?

chatgpt-codex-connector commented

Codex Review: Didn't find any major issues. Can't wait for the next one!

mojomex (Contributor, Author) commented May 8, 2026

@codex The code isn't documented and its behavior is not immediately obvious to the reader, e.g. write_unique_offset_sentinel. Document function contracts (pre- and postconditions) concisely, and annotate the argsort and unique flows so that readers can follow them easily. For design choices, e.g. where to put sentinels, explain the "why".
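
As an illustration of the kind of contract documentation requested here, the sketch below documents a hypothetical CPU helper that turns per-segment counts into offsets with a trailing sentinel. It is not the PR's `write_unique_offset_sentinel`, whose implementation is not shown in this conversation; the name, signature, and sentinel placement are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

/// Hypothetical helper: converts per-segment counts into start offsets.
///
/// Precondition:  every entry of `counts` is positive.
/// Postcondition: the result has counts.size() + 1 entries;
///                result[i] is the start of segment i, and the final entry
///                (the sentinel) equals the total number of elements.
///
/// Why a trailing sentinel: segment i is always the half-open range
/// [result[i], result[i + 1]), so consumers need no special case for the
/// last segment.
std::vector<std::int64_t> counts_to_offsets_with_sentinel(
  const std::vector<std::int64_t> & counts)
{
  std::vector<std::int64_t> offsets(counts.size() + 1);
  std::int64_t running_total = 0;
  for (std::size_t i = 0; i < counts.size(); ++i) {
    assert(counts[i] > 0);
    offsets[i] = running_total;  // exclusive prefix sum
    running_total += counts[i];
  }
  offsets[counts.size()] = running_total;  // sentinel: one past the last element
  return offsets;
}
```

With counts `{2, 1, 2}` this yields offsets `{0, 2, 3, 5}`; the length of segment `i` can then be recovered as `offsets[i + 1] - offsets[i]`.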

chatgpt-codex-connector commented

To use Codex here, create an environment for this repo.

@mojomex mojomex force-pushed the perf/trt-plugins-no-thrust branch from 4d19d0e to 2165fc5 Compare May 8, 2026 05:39
pre-commit-ci-lite Bot and others added 7 commits May 8, 2026 05:42
Co-authored-by: Copilot <copilot@github.com>
Signed-off-by: Max SCHMELLER <max.schmeller@tier4.jp>
Co-authored-by: Copilot <copilot@github.com>
Signed-off-by: Max SCHMELLER <max.schmeller@tier4.jp>
Signed-off-by: Max SCHMELLER <max.schmeller@tier4.jp>
@mojomex mojomex force-pushed the perf/trt-plugins-no-thrust branch from 2165fc5 to 0372bf9 Compare May 8, 2026 06:56
Labels

component:perception Advanced sensor data processing and environment understanding. (auto-assigned)

Projects

Status: To Triage

1 participant