2525// It behaves similarly to a typical thread.
2626//
2727// M (Machine)
28- // A real operating system thread.
29- // It is responsible for executing a coroutine.
28+ // A real operating system thread, aka a worker thread.
29+ // It is responsible for executing coroutines, timers, and other work.
3030// Only as many M instances may be created as permitted by COMAXPROCS.
3131//
3232// P (Processor)
4444// When a parked coroutine is woken up, it must be enqueued into the runnable
4545// coroutine queue. The scheduler decides when it will actually execute.
4646//
47+ // WORKER THREAD PARKING/UNPARKING
48+ //
49+ // We need to balance between keeping enough running worker threads to utilize
50+ // available hardware parallelism and parking excessive running worker threads
51+ // to conserve CPU resources and power. This is not simple for two reasons:
52+ // (1) scheduler state is intentionally distributed (in particular, per-P work
53+ // queues), so it is not possible to compute global predicates on fast paths;
54+ // (2) for optimal thread management we would need to know the future (don't park
55+ // a worker thread when a new coroutine will be readied in near future).
56+ //
57+ // The current approach applies to two primary sources of potential work:
58+ // readying a coroutine, and new/modified-earlier timers.
59+ // See below for additional details.
60+ //
61+ // We unpark an additional thread when we submit work if (this is wakep()):
62+ // 1. There is an idle P, and
63+ // 2. There are no "spinning" worker threads.
64+ //
65+ // A worker thread is considered spinning if it is out of local work and did
66+ // not find work in the global run queue or eventpoller; the spinning state is
67+ // denoted in m.spinning and in sched.nmspinning. Threads unparked this way are
68+ // also considered spinning; we don't do coroutine handoff so such threads are
69+ // out of work initially. Spinning threads spin on looking for work in per-P
70+ // run queues and timer heaps or from the GC before parking. If a spinning
71+ // thread finds work it takes itself out of the spinning state and proceeds to
72+ // execution. If it does not find work it takes itself out of the spinning
73+ // state and then parks.
74+ //
75+ // If there is at least one spinning thread (sched.nmspinning>0), we don't
76+ // unpark new threads when submitting work. To compensate for that, if the last
77+ // spinning thread finds work and stops spinning, it must unpark a new spinning
78+ // thread. This approach smooths out unjustified spikes of thread unparking,
79+ // but at the same time guarantees eventual maximal CPU parallelism
80+ // utilization.
81+ //
82+ // The main implementation complication is that we need to be very careful
83+ // during spinning->non-spinning thread transition. This transition can race
84+ // with submission of new work, and either one part or another needs to unpark
85+ // another worker thread. If they both fail to do that, we can end up with
86+ // semi-persistent CPU underutilization.
87+ //
88+ // The general pattern for submission is:
89+ // 1. Submit work to the local or global run queue, or timer heap.
90+ // 2. #StoreLoad-style memory barrier.
91+ // 3. Check sched.nmspinning.
92+ //
93+ // The general pattern for spinning->non-spinning transition is:
94+ // 1. Decrement nmspinning.
95+ // 2. #StoreLoad-style memory barrier.
96+ // 3. Check all per-P work queues for new work.
97+ //
98+ // Note that all this complexity does not apply to the global run queue, as we
99+ // are not sloppy about thread unparking when submitting to the global queue.
100+ // Also see comments for nmspinning manipulation.
101+ //
102+ // How these different sources of work behave varies, though it doesn't affect
103+ // the synchronization approach:
104+ // * Ready coroutine: this is an obvious source of work; the coroutine is
105+ // immediately ready and must run on some thread eventually.
106+ // * New/modified-earlier timer: The current timer implementation uses eventpoll
107+ // in a thread with no work available to wait for the soonest timer.
108+ // If there is no thread waiting, we want a new spinning thread to go wait.
109+ //
47110// STACK-ORIENTED COROUTINE HANDLING
48111//
49112// Jule runtime prioritizes avoiding heap allocations for coroutines.
50113// Each coroutine instance is used from the stack. When necessary,
51114// it can be copied or passed around via references/pointers.
52115// Since it is stack-oriented, it must be handled carefully.
53116// Otherwise, a stale coroutine copy may be used and cause critical issues.
117+ //
118+ // SYSCALLS/BLOCKING OPERATIONS
119+ //
120+ // Syscalls that cannot be scheduled by the scheduler (for example, blocking
121+ // read/write operations or waiting for a process) and other blocking tasks
122+ // block/occupy the worker thread (M) executing them. The scheduler cannot
123+ // detect this situation, and therefore assumes that the worker thread is still
124+ // executing work, so no new M is spawned to replace it. This can lead to a loss
125+ // of parallelism.
126+ //
127+ // Neither the scheduler nor the standard library explicitly tries to prevent
128+ // this; the responsibility is delegated to the developer. The scheduler
129+ // always assumes that coroutines are schedulable. This follows a "force correct
130+ // usage" philosophy rather than a "save everything" philosophy. Blocking tasks
131+ // must be handled by the developer outside of the scheduler.
132+ //
133+ // Go addresses this at the scheduler level: when a worker thread
134+ // is blocked (for example, in a syscall), it releases its P, allowing the
135+ // scheduler to pair the released P with another M and continue executing ready
136+ // work. While this solves the lost worker thread problem, it can lead to a
137+ // significant increase in thread creation. In such cases, an approach similar
138+ // to "one thread per blocking task" is not sufficiently efficient or
139+ // performant.
140+ //
141+ // The Jule runtime does not attempt to automatically tolerate blocking tasks,
142+ // but it does not completely ignore them either. A runtime-managed blocking task
143+ // pool is provided for developers. This pool creates a reasonable number of
144+ // worker threads and distributes all blocking tasks among them. Developers are
145+ // expected to intentionally manage blocking tasks using this pool.
146+ // The implementation is located at: `runtime/blocking.jule`
147+ //
148+ // THREAD SYNCHRONIZATION
149+ //
150+ // Various mechanisms are used together to synchronize threads. For example,
151+ // eventpoll uses operating-system–specific facilities such as kqueue, epoll,
152+ // or IOCP in the background. This ensures that the thread does not consume CPU
153+ // while waiting in eventpoll, and it is also used to wait for the next timer, if any.
154+ //
155+ // In cases that require thread park/unpark, a `parker` is used. The parker
156+ // relies on synchronization primitives such as futex, depending
157+ // on the underlying operating system.
158+ //
159+ // CREDITS
160+ //
161+ // The overall structure and several scheduling concepts of this scheduler are
162+ // heavily inspired by the Go runtime scheduler. In particular, the C:M:P model,
163+ // work distribution strategy, and worker thread parking/unparking logic follow
164+ // the same high-level design philosophy.
165+ //
166+ // This is NOT a reimplementation of Go's scheduler, nor a line-by-line port.
167+ // The implementation is purpose-built for the Jule runtime, adapted to its
168+ // coroutine model, ABI, and constraints.
169+ //
170+ // See Go's runtime scheduler source code: `runtime/proc.go`
54171
55172use "std/internal/runtime"
56173use "std/internal/runtime/atomic"
@@ -279,7 +396,8 @@ fn pidleget(): &p {
279396fn pidlegetSpinning(): &p {
280397 mut pp := pidleget()
281398 if pp == nil {
282- // We found work that we cannot take, we must synchronize with non-spinning
399+ // See "Delicate dance" comment in findRunnable. We found work
400+ // that we cannot take, we must synchronize with non-spinning
283401 // Ms that may be preparing to drop their P.
284402 atomic::Store(&sched.needspinning, 1, atomic::Release)
285403 ret nil
@@ -522,8 +640,20 @@ fn injectclist(mut &batch: *[prunqsize]c, batchStart: u32, bsize: u32) {
522640 runqputbatch(m.pp, batch, batchStart+n, bsize)
523641 }
524642
643+ // Some P's might have become idle after we loaded `sched.npidle`
644+ // but before any coroutines were added to the queue, which could
645+ // lead to idle P's when there is work available in the global queue.
646+ // That could potentially last until other coroutines become ready
647+ // to run. That said, we need to find a way to hedge our bets.
648+ //
649+ // Calling wakep() here is the best bet, it will do nothing in the
650+ // common case (no racing on `sched.npidle`), while it could wake one
651+ // more P to execute C's, which might end up with >1 P's: the first one
652+ // wakes another P and so forth until there is no more work, but this
653+ // ought to be an extremely rare case.
654+ //
655+ // Also see "Worker thread parking/unparking" comment at the top of the file for details.
525656 wakep()
526- ret
527657}
528658
529659// Gets a coroutine from local runnable queue and writes to cp.
@@ -849,11 +979,13 @@ top:
849979 // behalf. If we are not racing and the system is truly fully loaded
850980 // then no spinning threads are required, and the next thread to
851981 // naturally become spinning will clear the flag.
982+ //
983+ // Also see "Worker thread parking/unparking" comment at the top of the file.
852984 wasSpinning := m.spinning
853985 if m.spinning {
854986 m.spinning = false
855987 if atomic::Add(&sched.nmspinning, -1, atomic::Relaxed) < 0 {
856- panic("findrunnable : negative nmspinning")
988+ panic("findRunnable: negative nmspinning")
857989 }
858990
859991 // Note that for correctness, only the last M transitioning from
@@ -1286,9 +1418,10 @@ fn resetspinning() {
12861418 m.spinning = false
12871419 nmspinning := atomic::Add(&sched.nmspinning, -1, atomic::Release)
12881420 if nmspinning < 0 {
1289- panic("findrunnable : negative nmspinning")
1421+ panic("findRunnable: negative nmspinning")
12901422 }
12911423 // M wakeup policy is deliberately somewhat conservative, so check if we
1292- // need to wakeup another P here.
1424+ // need to wakeup another P here. See "Worker thread parking/unparking"
1425+ // comment at the top of the file for details.
12931426 wakep()
12941427}