Skip to content

fix(cie): fix cancel_executor race condition#1278

Open
atsushi421 wants to merge 1 commit intomainfrom
fix/cancel-executor-race
Open

fix(cie): fix cancel_executor race condition#1278
atsushi421 wants to merge 1 commit intomainfrom
fix/cancel-executor-race

Conversation

@atsushi421
Copy link
Copy Markdown
Collaborator

@atsushi421 atsushi421 commented Apr 22, 2026

Description

Port race condition fix from upstream callback_isolated_executor#52.

The thread_initialized_ flag was set to true before spin() internally sets spinning=true. If cancel_executor observed thread_initialized_==true and skipped the is_spinning() wait, it could call cancel() (which sets spinning=false) before spin() set spinning=true. The subsequent spin() would override the cancellation and run indefinitely, blocking thread.join() forever.

The fix:

  • Remove the thread_initialized_ fast-path and always wait for is_spinning() before calling cancel().
  • Remove the thread_initialized_ field from ExecutorWrapper entirely.
  • Add a joinable() guard to avoid undefined behavior if the thread was never started.

Related links

How was this PR tested?

  • Autoware (required)
  • bash scripts/test/e2e_test_1to1.bash (required)
  • bash scripts/test/e2e_test_2to2.bash (required)
  • kunit tests (required when modifying the kernel module)
  • bash scripts/test/run_requires_kernel_module_tests.bash (required)
  • sample application

Notes for reviewers

Version Update Label (Required)

Please add exactly one of the following labels to this PR:

  • need-major-update: User API breaking changes
  • need-minor-update: Internal API breaking changes (heaphook/kmod/agnocastlib compatibility)
  • need-patch-update: Bug fixes and other changes

Important notes:

  • If you need need-major-update or need-minor-update, please include this in the PR title as well.
    • Example: fix(foo)[needs major version update]: bar or feat(baz)[needs minor version update]: qux
  • After receiving approval from reviewers, add the run-build-test label. The PR can only be merged after the build tests pass.

See CONTRIBUTING.md for detailed versioning rules.

The `thread_initialized` flag was set before `spin()` internally sets
`spinning=true`. If `cancel_executor` observed `thread_initialized==true`
and skipped the `is_spinning()` wait, it could call `cancel()` before
`spin()` set `spinning=true`, causing `spin()` to run indefinitely and
block `join()` forever.

Remove the `thread_initialized` flag entirely and always wait for
`is_spinning()` before calling `cancel()`. Also add a `joinable()` guard
to avoid UB if the thread was never started.

Signed-off-by: atsushi421 <atsushi.yano.2@tier4.jp>
@atsushi421 atsushi421 added the need-patch-update Bug fixes and other changes - requires PATCH version update label Apr 22, 2026
@atsushi421 atsushi421 requested a review from Copilot April 22, 2026 21:49
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Ports an upstream fix to prevent a race in cancel_executor() where cancellation could be lost if it happened before the executor’s spin() marks itself as spinning, potentially causing join() to block indefinitely.

Changes:

  • Remove the thread_initialized_ fast-path and always wait for executor_->is_spinning() before calling cancel().
  • Remove thread_initialized_ from ExecutorWrapper.
  • Add a thread_.joinable() guard to avoid undefined behavior when joining.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@atsushi421 atsushi421 marked this pull request as ready for review April 22, 2026 21:59
@kobayu858 kobayu858 added the run-build-test Run build-test in CI label Apr 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

need-patch-update Bug fixes and other changes - requires PATCH version update run-build-test Run build-test in CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants