RFC 2: ExecutionContext #15302

Draft: wants to merge 9 commits into master


ysbaddaden
Contributor

Here finally comes the first draft pull request for RFC 2 extracted from the execution_context shard.

Status

All three execution contexts have been implemented:

  • ExecutionContext::SingleThreaded to create a concurrent-only context (its fibers run on a single thread, but run in parallel to fibers in other contexts); this is the default context (unless you specify -Dmt).

  • ExecutionContext::MultiThreaded to create a concurrent+parallel context with work stealing (fibers in the context may be resumed by any of its threads); this is the default context if you specify -Dmt.

  • ExecutionContext::Isolated to run a single fiber in a dedicated thread (no concurrency, no parallelism), while still being able to communicate normally with fibers in other contexts (Channel, Mutex, ...), do IO operations, or spawn fibers (transparently to another context).
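
As a rough illustration of the three contexts listed above, here is a minimal sketch. The constructor arguments (a name, plus a thread count for the multi-threaded context), the `#spawn` method on a context, and the context names are assumptions taken from my reading of RFC 2 and may not match this branch exactly. It needs the compile-time flags described under Usage.

```crystal
# Sketch only: constructor signatures and #spawn are assumptions based on RFC 2.
results = Channel(String).new

# Concurrent-only context: all of its fibers share one thread.
st = ExecutionContext::SingleThreaded.new("st-workers")
st.spawn { results.send("from the single-threaded context") }

# Concurrent + parallel context with work stealing (assumed: 4 threads).
mt = ExecutionContext::MultiThreaded.new("mt-workers", 4)
mt.spawn { results.send("from the multi-threaded context") }

# A single fiber on a dedicated thread; it still uses Channel normally.
isolated = ExecutionContext::Isolated.new("isolated") do
  results.send("from the isolated context")
end

# The main fiber (default context) collects one message per context.
messages = Array.new(3) { results.receive }
messages.each { |message| puts message }
```

The order in which the three messages arrive is not deterministic, since each context runs on its own thread(s).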

Both the single-threaded and multi-threaded contexts share the same queues and overall logic, but with different optimizations. The isolated context doesn't need any queues and relies on a special loop. It's the only context that can shut down for the time being (we may implement cooperative shutdown or shrinking/growing the MT context).

Alongside the execution contexts, a monitoring thread is running. For the moment it is limited to collecting fiber stacks regularly, but it shall evolve (in subsequent pull requests) to handle many more situations by monitoring the execution contexts. For example, the shard has a proof-of-concept for a cooperative yield for fibers that have been running for too long (it may be checked at cancellation points or manually in CPU-heavy loops). See the TODO in monitor.cr for more exciting ideas.
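
To give an idea of what a manual yield in a CPU-heavy loop looks like, here is a minimal sketch that uses the existing `Fiber.yield` as a stand-in. The shard's proof-of-concept presumably yields only when the monitor decides the fiber has run for too long; the fixed interval below (`YIELD_EVERY`) is a hypothetical simplification, not the shard's mechanism.

```crystal
# Hypothetical interval: yield unconditionally every 10_000 iterations.
YIELD_EVERY = 10_000

sum = 0_u64
1_000_000.times do |i|
  sum += i.to_u64

  # Give other fibers in the same context a chance to run.
  Fiber.yield if i % YIELD_EVERY == 0
end

puts sum # => 499999500000
```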

Stability

Over the development of the shard, race conditions in the schedulers proved hard to fix, but all known races have been squashed, and the schedulers have proved to be quite stable (and fast).

So far both the ST and MT contexts can run the crystal std specs... save for:

  • MT: some sporadic segfaults 😭 maybe because of the same-thread fiber assumption being broken, or maybe an issue with threads starting up or shutting down in parallel to GC collections.
  • ST: a GC bug where the GC will sometimes enter an infinite loop trying to allocate a large object (see ivmai/bdwgc#691, "Infinite loop when trying to allocate large object (v8.2.8)"); the issue can't be reproduced with GC 8.3 (unreleased).

Usage

The feature is opt-in for the time being.

You must compile your application with both the -Dexecution_context flag (to use the ExecutionContext schedulers) and the -Dpreview_mt flag.
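
A small sketch for checking at compile time whether both flags were passed, using the standard `flag?` macro. The file name `app.cr` in the comment is a placeholder, and the messages are illustrative only.

```crystal
# Build with (app.cr is a placeholder name):
#   crystal build -Dpreview_mt -Dexecution_context app.cr

{% if flag?(:execution_context) && flag?(:preview_mt) %}
  scheduler = "ExecutionContext schedulers"
{% else %}
  scheduler = "default scheduler"
{% end %}

puts "Compiled with: #{scheduler}"
```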

Notes

A number of individual commits peripheral to the current feature have already been extracted into individual pull requests. Those that haven't been extracted may still change as this PR continues to evolve, or may be dropped (e.g. Thread::WaitGroup).

@Qard Qard left a comment

Looks very cool. I’m excited for this! 🚀

@ysbaddaden ysbaddaden force-pushed the feature/execution-contexts branch 2 times, most recently from bb5e112 to f131156 Compare January 7, 2025 13:38
@ysbaddaden ysbaddaden linked an issue Jan 14, 2025 that may be closed by this pull request

Introduces the first EC scheduler that runs in a single thread. Uses the
same queues (Runnables, GlobalQueue) as the multi-threaded scheduler
that will come next. The Runnables local queue could be simplified (no
parallel accesses, hence no need for atomics) at the expense of
duplicating the implementation.

The scheduler doesn't need to actively park the thread, since the event
loops always block (when told to), even when there are no events, which
acts as parking the thread.

Introduces the second EC scheduler that runs in multiple threads. Uses
the thread-safe queues (Runnables, GlobalQueue).

Contrary to the ST scheduler, the MT scheduler needs to actively park
the thread in addition to waiting on the event loop, because only one
thread is allowed to run the event loop.

Introduces the last EC scheduler that runs a single fiber in a single
thread. Contrary to the other schedulers, concurrency is disabled.

Like the ST scheduler, the scheduler doesn't need to actively park the
thread and merely waits on the event loop.

@ysbaddaden ysbaddaden force-pushed the feature/execution-contexts branch from f131156 to 21aba17 Compare February 11, 2025 16:00
Contributor Author

ysbaddaden commented Feb 11, 2025

Rebased from master with the latest #15345 and #15350 squashed in.

I can run the std specs with the ST context, with an occasional hang of Boehm GC.

I can also run the std specs with the MT context, with an occasional segfault when GC collects memory. So far I suspect the specs have some MT issues with fibers jumping threads... or something related to libxml or something else?

Invalid memory access (signal 11) at address 0x2f00000001
[0x5b988942b616] print_backtrace at /home/julien/work/crystal-lang/crystal/src/exception/call_stack/libunwind.cr:106:5
[0x5b98887bc0d3] -> at /home/julien/work/crystal-lang/crystal/src/crystal/system/unix/signal.cr:200:5
[0x7689d3842520] ?? +130334331249952 in /lib/x86_64-linux-gnu/libc.so.6
[0x5b988a39427d] GC_generic_malloc_many at /home/julien/src/gc-8.2.8/mallocx.c:434:32
[0x5b988a3a366b] GC_malloc_kind at /home/julien/src/gc-8.2.8/thread_local_alloc.c:187:5
[0x5b988959ed36] malloc_atomic at /home/julien/work/crystal-lang/crystal/src/gc/boehm.cr:191:7
[0x5b988959ecde] malloc_atomic at /home/julien/work/crystal-lang/crystal/src/gc.cr:88:5
[0x5b988942de79] interpolation at /home/julien/work/crystal-lang/crystal/src/string.cr:254:5
[0x5b98898f5907] report at /home/julien/work/crystal-lang/crystal/src/spec/context.cr:352:29
[0x5b98898f5898] report at /home/julien/work/crystal-lang/crystal/src/spec/context.cr:351:15
[0x5b98898b99d4] internal_run at /home/julien/work/crystal-lang/crystal/src/spec/example.cr:51:7
[0x5b988880265e] -> at /home/julien/work/crystal-lang/crystal/src/spec/example.cr:37:73
[0x5b98898b8771] run at /home/julien/work/crystal-lang/crystal/src/spec/example/procsy.cr:16:15
[0x5b98888026d6] -> at /home/julien/work/crystal-lang/crystal/src/spec/context.cr:374:11
[0x5b98898b8771] run at /home/julien/work/crystal-lang/crystal/src/spec/example/procsy.cr:16:15
[0x5b98887bf30f] -> at /home/julien/work/crystal-lang/crystal/spec/support/mt_abort_timeout.cr:10:7
[0x5b988973c832] run at /home/julien/work/crystal-lang/crystal/src/fiber.cr:170:11
[0x5b98887bc736] -> at /home/julien/work/crystal-lang/crystal/src/fiber.cr:105:3
[0x0] ???
make: *** [Makefile:118: std_spec] Error 11

Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

Implement RFC 0002: ExecutionContext [EPIC]
2 participants