refactor(scheduler): use Own and RawRef for TaskQueue#514
Conversation
George-Miao
left a comment
There was a problem hiding this comment.
I think we need some debug-gated failsafe mechanism.....
5462d0b to
2a801eb
Compare
There was a problem hiding this comment.
Pull Request Overview
This PR refactors the task scheduler to use a new Own<T> and RawRef<T> pattern instead of the previous Arc/Weak reference counting approach. The primary goal is to ensure that TaskQueue (which contains SendWrapper) is always dropped on its creator thread, avoiding potential panics from cross-thread drops.
- Introduces new
raw_ref.rsmodule withOwn(ownership wrapper) andRawRef(raw pointer wrapper) types - Replaces
Arc<TaskQueue>withOwn<TaskQueue>and capturesRawRef<TaskQueue>in the scheduler closure - Removes
sync::Arcfrom imports and updates the schedule closure to use raw pointer dereferencing
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| compio-runtime/src/runtime/scheduler/raw_ref.rs | New module introducing Own<T> and RawRef<T> primitives for unsafe lifetime-erased references |
| compio-runtime/src/runtime/scheduler/mod.rs | Refactors from Arc<TaskQueue> to Own<TaskQueue> and updates the schedule closure to use RawRef::as_ref() |
| compio-runtime/Cargo.toml | Version bump from 0.9.4 to 0.9.5 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // SAFETY: | ||
| // `Scheduler` drains and drops all `Runnable`s before it is dropped. | ||
| // If this closure is still invoked and can push a `Runnable`, the | ||
| // `Scheduler` is necessarily still alive, so the `task_queue` reference | ||
| // obtained from `raw_ref()` is valid. | ||
| unsafe { task_queue.as_ref() }.push(runnable, ¬ify); |
There was a problem hiding this comment.
The safety reasoning here has a potential soundness issue. While the comment states that "Scheduler drains and drops all Runnables before it is dropped", this doesn't account for Wakers that may have been cloned and stored outside of the scheduler's control (e.g., in user code or external futures).
If a Waker is cloned and stored elsewhere, then used after the Runtime/Scheduler is dropped, the schedule closure would dereference the dangling RawRef, causing undefined behavior. The previous implementation using Arc::downgrade() and upgrade() was safer because it would fail gracefully (panic on unwrap) rather than cause UB.
Consider either:
- Documenting that Wakers must not outlive the Runtime/Scheduler, or
- Adding runtime checks to detect use-after-drop, or
- Reverting to a reference-counted approach with better drop ordering to avoid the SendWrapper issue
| // `Scheduler` drains and drops all `Runnable`s before it is dropped. | ||
| // If this closure is still invoked and can push a `Runnable`, the | ||
| // `Scheduler` is necessarily still alive, so the `task_queue` reference | ||
| // obtained from `raw_ref()` is valid. | ||
| unsafe { task_queue.as_ref() }.push(runnable, ¬ify); |
There was a problem hiding this comment.
If this closure is still invoked and can push a
Runnable, theScheduleris necessarily still alive
How about the scheduler is being dropped while the waker is kept on another thread? Scheduler::drop and Waker::wake could be executed simultaneously.
There was a problem hiding this comment.
If a Waker::wake() is called from another thread and the schedule closure is successfully invoked, then the Scheduler cannot be in the process of being dropped. The latest possible state is that Scheduler::clear() is running (the Scheduler’s documentation requires calling Scheduler::clear() before drop, and the Scheduler must always be dropped with an empty TaskQueue). Once clear() finishes, any Waker::wake() becomes a no-op, and the schedule closure will no longer be invoked.
There was a problem hiding this comment.
There's no such guarantee. Here's a possible case:
Waker::wake()is called and enters the schedule closure.TaskQueue::pushenters.- The runtime is going to drop, and
Scheduler::clear(). - The runtime is dropped, so is the task queue.
TaskQueue::pushtries to push a runnable. BOOM.
There was a problem hiding this comment.
There’s an important detail that was overlooked. In Scheduler::clear(), it first explicitly wakes all the Wakers in active_tasks. So the latest state at which a Waker from another thread can invoke the schedule closure is before Scheduler::clear() wakes the corresponding Waker in active_tasks (async-task avoids duplicate scheduling).
There was a problem hiding this comment.
Yes, but the given case says that the waker is waked before Scheduler::clear. There's no duplicate scheduling.
There was a problem hiding this comment.
Could you explain about the memory leaks? Especially in #513 .
At least memory leaks are safe and sound. Or we might need to rethink about writing our own Task.
There was a problem hiding this comment.
What I mean by “memory leak” refers to the case where the closure captures an Arc instead of a Weak, because it seems that using Weak has a chance of failing to upgrade if the closure can run at the same time as the runtime drop.
There was a problem hiding this comment.
Weak is atomic so the upgrade is always safe. But yes, if it fails, the only safe solution is to panic, I think...
There was a problem hiding this comment.
Yes, that’s the choice right now: a memory leak or a panic. As a temporary workaround, the small chance of a memory leak is still the better option. Later we can look for a cleaner solution (maybe rewrite the task).
There was a problem hiding this comment.
OK, let's focus on #513, then. I'll push a commit to forget instead of panic.
|
Closing because #513 has been merged. |
Use the
Own+RawRefmodel to avoidSendWrapperbeing dropped on another thread via a temporary upgradedArc.Supersedes #513, Closes #512