Skip to content

refactor(scheduler): use Own and RawRef for TaskQueue#514

Closed
Paraworker wants to merge 1 commit into
compio-rs:masterfrom
Paraworker:raw-ref
Closed

refactor(scheduler): use Own and RawRef for TaskQueue#514
Paraworker wants to merge 1 commit into
compio-rs:masterfrom
Paraworker:raw-ref

Conversation

@Paraworker
Copy link
Copy Markdown
Contributor

Use the Own + RawRef model to avoid SendWrapper being dropped on another thread via a temporary upgraded Arc.

Supersedes #513, Closes #512

Copy link
Copy Markdown
Member

@George-Miao George-Miao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need some debug-gated failsafe mechanism.....

@Paraworker Paraworker force-pushed the raw-ref branch 2 times, most recently from 5462d0b to 2a801eb Compare November 17, 2025 03:05
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the task scheduler to use a new Own<T> and RawRef<T> pattern instead of the previous Arc/Weak reference counting approach. The primary goal is to ensure that TaskQueue (which contains SendWrapper) is always dropped on its creator thread, avoiding potential panics from cross-thread drops.

  • Introduces new raw_ref.rs module with Own (ownership wrapper) and RawRef (raw pointer wrapper) types
  • Replaces Arc<TaskQueue> with Own<TaskQueue> and captures RawRef<TaskQueue> in the scheduler closure
  • Removes sync::Arc from imports and updates the schedule closure to use raw pointer dereferencing

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
compio-runtime/src/runtime/scheduler/raw_ref.rs New module introducing Own<T> and RawRef<T> primitives for unsafe lifetime-erased references
compio-runtime/src/runtime/scheduler/mod.rs Refactors from Arc<TaskQueue> to Own<TaskQueue> and updates the schedule closure to use RawRef::as_ref()
compio-runtime/Cargo.toml Version bump from 0.9.4 to 0.9.5

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +156 to +161
// SAFETY:
// `Scheduler` drains and drops all `Runnable`s before it is dropped.
// If this closure is still invoked and can push a `Runnable`, the
// `Scheduler` is necessarily still alive, so the `task_queue` reference
// obtained from `raw_ref()` is valid.
unsafe { task_queue.as_ref() }.push(runnable, &notify);
Copy link

Copilot AI Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The safety reasoning here has a potential soundness issue. While the comment states that "Scheduler drains and drops all Runnables before it is dropped", this doesn't account for Wakers that may have been cloned and stored outside of the scheduler's control (e.g., in user code or external futures).

If a Waker is cloned and stored elsewhere, then used after the Runtime/Scheduler is dropped, the schedule closure would dereference the dangling RawRef, causing undefined behavior. The previous implementation using Arc::downgrade() and upgrade() was safer because it would fail gracefully (panic on unwrap) rather than cause UB.

Consider either:

  1. Documenting that Wakers must not outlive the Runtime/Scheduler, or
  2. Adding runtime checks to detect use-after-drop, or
  3. Reverting to a reference-counted approach with better drop ordering to avoid the SendWrapper issue

Copilot uses AI. Check for mistakes.
Comment on lines +157 to +161
// `Scheduler` drains and drops all `Runnable`s before it is dropped.
// If this closure is still invoked and can push a `Runnable`, the
// `Scheduler` is necessarily still alive, so the `task_queue` reference
// obtained from `raw_ref()` is valid.
unsafe { task_queue.as_ref() }.push(runnable, &notify);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this closure is still invoked and can push a Runnable, the Scheduler is necessarily still alive

How about the scheduler is being dropped while the waker is kept on another thread? Scheduler::drop and Waker::wake could be executed simultaneously.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a Waker::wake() is called from another thread and the schedule closure is successfully invoked, then the Scheduler cannot be in the process of being dropped. The latest possible state is that Scheduler::clear() is running (the Scheduler’s documentation requires calling Scheduler::clear() before drop, and the Scheduler must always be dropped with an empty TaskQueue). Once clear() finishes, any Waker::wake() becomes a no-op, and the schedule closure will no longer be invoked.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no such guarantee. Here's a possible case:

  1. Waker::wake() is called and enters the schedule closure.
  2. TaskQueue::push enters.
  3. The runtime is going to drop, and Scheduler::clear().
  4. The runtime is dropped, so is the task queue.
  5. TaskQueue::push tries to push a runnable. BOOM.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There’s an important detail that was overlooked. In Scheduler::clear(), it first explicitly wakes all the Wakers in active_tasks. So the latest state at which a Waker from another thread can invoke the schedule closure is before Scheduler::clear() wakes the corresponding Waker in active_tasks (async-task avoids duplicate scheduling).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the given case says that the waker is waked before Scheduler::clear. There's no duplicate scheduling.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain about the memory leaks? Especially in #513 .

At least memory leaks are safe and sound. Or we might need to rethink about writing our own Task.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean by “memory leak” refers to the case where the closure captures an Arc instead of a Weak, because it seems that using Weak has a chance of failing to upgrade if the closure can run at the same time as the runtime drop.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weak is atomic so the upgrade is always safe. But yes, if it fails, the only safe solution is to panic, I think...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that’s the choice right now: a memory leak or a panic. As a temporary workaround, the small chance of a memory leak is still the better option. Later we can look for a cleaner solution (maybe rewrite the task).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's focus on #513, then. I'll push a commit to forget instead of panic.

@Berrysoft
Copy link
Copy Markdown
Member

Closing because #513 has been merged.

@Berrysoft Berrysoft closed this Nov 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

wake_cross_thread still randomly crashes

4 participants