
Use Base.Semaphore to control test execution parallelism #119

Open

giordano wants to merge 5 commits into main from mg/semaphore

Conversation

@giordano (Collaborator)

This is a refactoring of the code to enable #77 in a follow-up PR (CC @christiangnrd).

The idea of using a semaphore is mine and the initial implementation is from Claude, but I made a few manual refinements afterwards (some of them folded into the first commit). Overall summary of the changes:

Replace the fixed worker-task-per-slot model with a semaphore-based
approach: one task per test, with a Base.Semaphore(jobs) limiting
concurrency and a Channel-based worker pool for reuse. This decouples
the number of tasks from the parallelism level and simplifies the
control flow (no inner while loop, tests array is immutable).

I'm mostly happy with the result: it's very close to what I had in mind, and the net diff is relatively small (+48/−22), so this shouldn't be too hard to review.
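For readers unfamiliar with the pattern, here is a minimal self-contained sketch of the model described above: one task per test, a `Base.Semaphore` bounding concurrency, and a `Channel` acting as a pool of reusable workers. The names and the integer "worker ids" are illustrative stand-ins, not the actual ParallelTestRunner internals (the real code passes `PTRWorker` objects through the pool).

```julia
# Illustrative sketch only: semaphore-bounded tasks plus a Channel worker pool.
jobs = 2
sem = Base.Semaphore(jobs)
worker_pool = Channel{Union{Nothing, Int}}(jobs)
for _ in 1:jobs
    put!(worker_pool, nothing)  # `nothing` marks a slot whose worker hasn't been spawned yet
end

next_worker_id = Threads.Atomic{Int}(0)  # stand-in for spawning real workers
done = String[]
lk = ReentrantLock()

@sync for test in ["a", "b", "c", "d"]
    Threads.@spawn Base.acquire(sem) do
        w = take!(worker_pool)   # grab a slot; blocks if all `jobs` slots are in use
        if w === nothing         # lazily "spawn" a worker the first time a slot is used
            w = Threads.atomic_add!(next_worker_id, 1) + 1
        end
        lock(lk) do
            push!(done, test)    # stand-in for running `test` on worker `w`
        end
        put!(worker_pool, w)     # recycle the worker for the next test
    end
end

@show sort(done) next_worker_id[]
```

Because the `Channel` is FIFO and there are more tests than slots, both `nothing` slots get converted, so exactly `jobs` workers are ever created no matter how many tests run: the worker count is decoupled from the task count.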

#118 was useful because it detected that the case of no tests to run wasn't handled correctly, so yay for the extra tests.

giordano and others added 4 commits March 25, 2026 18:49
Replace the fixed worker-task-per-slot model with a semaphore-based
approach: one task per test, with a Base.Semaphore(jobs) limiting
concurrency and a Channel-based worker pool for reuse. This decouples
the number of tasks from the parallelism level and simplifies the
control flow (no inner while loop, tests array is immutable).

Co-authored-by: Claude <noreply@anthropic.com>
Made-with: Cursor
@giordano giordano requested review from maleadt and vchuravy March 26, 2026 01:19
Comment on lines +1020 to +1027
for p in workers
tests_to_start = Threads.Atomic{Int}(length(tests))
for test in tests

The core change I was interested in was turning this for loop from an iteration over the workers into an iteration over the tests: the idea for #77 is to have different semaphores for different subsets of tests, and here we would iterate over the different subsets (and their associated semaphores). The whole execution cycle would then require only this change on this single line; the rest would remain as is.

I still need to fully work out the user-facing changes for designating the serial tests for #77. I have some bad ideas in mind, but I want to decouple them from this refactoring, which I believe is still worthwhile in its own right for making the code easier to follow.


giordano commented Mar 26, 2026

Uhm, the test failure

default workers stopped at end: Test Failed at /Users/runner/work/ParallelTestRunner.jl/ParallelTestRunner.jl/test/runtests.jl:472
  Expression: after == before
   Evaluated: 1 == 2
  Stacktrace:
   [1] top-level scope
     @ ~/work/ParallelTestRunner.jl/ParallelTestRunner.jl/test/runtests.jl:10
   [2] macro expansion
     @ ~/hostedtoolcache/julia/nightly/aarch64/share/julia/stdlib/v1.14/Test/src/Test.jl:2243 [inlined]
   [3] macro expansion
     @ ~/work/ParallelTestRunner.jl/ParallelTestRunner.jl/test/runtests.jl:424 [inlined]
   [4] macro expansion
     @ ~/hostedtoolcache/julia/nightly/aarch64/share/julia/stdlib/v1.14/Test/src/Test.jl:2243 [inlined]
   [5] macro expansion
     @ ~/work/ParallelTestRunner.jl/ParallelTestRunner.jl/test/runtests.jl:472 [inlined]
   [6] macro expansion
     @ ~/hostedtoolcache/julia/nightly/aarch64/share/julia/stdlib/v1.14/Test/src/Test.jl:781 [inlined]

is interesting:

  1. it happened only on macOS with Julia nightly, but that's precisely what I use locally, and tests passed for me several times 😅
  2. the fact that there were fewer subprocesses still alive after the tests finished than before is quite surprising: if something had gone wrong with the termination of the subprocesses I'd expect more processes to be still around, not fewer.

The error disappeared after a rerun; I'd say it was a glitch? 😬 Edit: tests on all platforms passed in the following 3 full reruns.


giordano commented Mar 26, 2026

I pushed a change to use a Threads.@spawn + @sync mechanism, which more closely matches what I had in mind for addressing #77.

As a standalone demo of my idea for running the serial tests, followed by the fully concurrent tests, start Julia with 4 threads and run the following code:

@time "outer loop" for (tests, semaphore) in ((1:4, Base.Semaphore(1)), (5:12, Base.Semaphore(4)))
    @info "running batch" tests semaphore
    @time "inner loop - semaphore size $(semaphore.sem_size)" @sync for test in tests
        Threads.@spawn Base.acquire(semaphore) do
            @show test
            sleep(1)
        end
    end
end

You should observe something like

┌ Info: running batch
│   tests = 1:4
└   semaphore = Base.Semaphore(1, 0, Base.GenericCondition(ReentrantLock()))
test = 4
test = 3
test = 2
test = 1
inner loop - semaphore size 1: 4.016979 seconds (15.27 k allocations: 845.766 KiB, 0.73% compilation time)
┌ Info: running batch
│   tests = 5:12
└   semaphore = Base.Semaphore(4, 0, Base.GenericCondition(ReentrantLock()))
test = 5
test = 8
test = 6
test = 12
test = 9
test = 11
test = 10
test = 7
inner loop - semaphore size 4: 2.005876 seconds (163 allocations: 8.047 KiB, 7 lock conflicts)
outer loop: 6.088579 seconds (73.02 k allocations: 3.853 MiB, 7 lock conflicts, 1.54% compilation time)

The first batch of "tests" (1 to 4), associated with the semaphore of size 1, runs serially and takes 4 seconds; the second batch of "tests" (5 to 12), associated with the semaphore of size 4, runs concurrently and takes 2 seconds, for a total of 6 seconds for the overall "test suite".

With this design, we could even have multiple semaphores of intermediate sizes between 1 and njobs, but I'm not planning to do that at the moment.
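Just to illustrate what such intermediate levels would look like (this is not part of the PR, and the batch boundaries below are made up), an extra concurrency tier is simply another `(tests, semaphore)` pair in the outer iteration:

```julia
# Illustrative only: three batches with increasing concurrency limits.
batches = (
    (1:2,  Base.Semaphore(1)),  # strictly serial
    (3:6,  Base.Semaphore(2)),  # at most 2 at a time
    (7:10, Base.Semaphore(4)),  # fully parallel, up to njobs = 4
)

order = Int[]
lk = ReentrantLock()
for (tests, sem) in batches
    @sync for t in tests
        Threads.@spawn Base.acquire(sem) do
            lock(lk) do
                push!(order, t)  # stand-in for running test `t`
            end
        end
    end
end

# Each @sync completes a whole batch before the next one starts, so
# batches finish in order even though tests within a batch may interleave.
@show order
```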

Edit: as a complete demo of my proposal for #77, with the following hacky change

diff --git a/src/ParallelTestRunner.jl b/src/ParallelTestRunner.jl
index 31f77f9..106c428 100644
--- a/src/ParallelTestRunner.jl
+++ b/src/ParallelTestRunner.jl
@@ -826,11 +826,6 @@ function runtests(mod::Module, args::ParsedArgs;
     jobs = clamp(jobs, 1, length(tests))
     println(stdout, "Running $(length(tests)) tests using $jobs parallel jobs. If this is too many concurrent jobs, specify the `--jobs=N` argument to the tests, or set the `JULIA_CPU_THREADS` environment variable.")
     !isnothing(args.verbose) && println(stdout, "Available memory: $(Base.format_bytes(available_memory()))")
-    sem = Base.Semaphore(max(1, jobs))
-    worker_pool = Channel{Union{Nothing, PTRWorker}}(jobs)
-    for _ in 1:jobs
-        put!(worker_pool, nothing)
-    end
 
     t0 = time()
     results = []
@@ -1022,9 +1017,15 @@ function runtests(mod::Module, args::ParsedArgs;
     #
     # execution
     #
+    worker_pool = Channel{Union{Nothing, PTRWorker}}(jobs)
+    for _ in 1:jobs
+        put!(worker_pool, nothing)
+    end
 
     tests_to_start = Threads.Atomic{Int}(length(tests))
-    @sync for test in tests
+    tests_semaphores = ((tests[1:4], Base.Semaphore(1)), (tests[5:end], Base.Semaphore(max(1, jobs))))
+    for (batch, sem) in tests_semaphores
+    @sync for test in batch
         push!(worker_tasks, Threads.@spawn begin
             local p = nothing
             acquired = false
@@ -1123,6 +1124,7 @@ function runtests(mod::Module, args::ParsedArgs;
             end
         end)
     end
+    end
 
     #
     # finalization

run

using ParallelTestRunner
testsuite = Dict(string(l) => :(sleep(1)) for l in 'a':'l');
runtests(ParallelTestRunner, ["--verbose", "--jobs=4"]; testsuite);

you should get

Running 12 tests using 4 parallel jobs. If this is too many concurrent jobs, specify the `--jobs=N` argument to the tests, or set the `JULIA_CPU_THREADS` environment variable.
Available memory: 8.825 GiB
               │   Test   │   Init   │ Compile │ ──────────────── CPU ──────────────── │
Test  (Worker) │ time (s) │ time (s) │   (%)   │ GC (s) │ GC % │ Alloc (MB) │ RSS (MB) │
l          (1) │        started at 2026-03-26T12:06:56.175
l          (1) │     1.06 │     2.09 │    5.80 │   0.00 │  0.0 │       2.01 │   321.14 │
k          (2) │        started at 2026-03-26T12:06:59.336
k          (2) │     1.08 │     1.88 │    7.09 │   0.00 │  0.0 │       2.01 │   319.59 │
i          (3) │        started at 2026-03-26T12:07:02.369
i          (3) │     1.08 │     1.82 │    7.36 │   0.00 │  0.0 │       2.01 │   321.47 │
j          (4) │        started at 2026-03-26T12:07:05.431
j          (4) │     1.08 │     1.82 │    7.10 │   0.00 │  0.0 │       2.01 │   321.00 │
d          (1) │        started at 2026-03-26T12:07:07.880
h          (2) │        started at 2026-03-26T12:07:07.880
e          (3) │        started at 2026-03-26T12:07:07.880
g          (4) │        started at 2026-03-26T12:07:07.881
e          (3) │     1.00 │     0.23 │    0.00 │   0.00 │  0.0 │       0.00 │   322.14 │
c          (3) │        started at 2026-03-26T12:07:09.210
d          (1) │     1.00 │     0.23 │    0.00 │   0.00 │  0.0 │       0.00 │   322.28 │
b          (1) │        started at 2026-03-26T12:07:09.210
h          (2) │     1.00 │     0.23 │    0.00 │   0.00 │  0.0 │       0.00 │   323.28 │
a          (2) │        started at 2026-03-26T12:07:09.210
g          (4) │     1.00 │     0.23 │    0.00 │   0.00 │  0.0 │       0.00 │   322.50 │
f          (4) │        started at 2026-03-26T12:07:09.211
f          (4) │     1.00 │     0.22 │    0.00 │   0.00 │  0.0 │       0.00 │   322.66 │
b          (1) │     1.00 │     0.22 │    0.00 │   0.00 │  0.0 │       0.00 │   322.98 │
a          (2) │     1.00 │     0.23 │    0.00 │   0.00 │  0.0 │       0.00 │   325.30 │
c          (3) │     1.00 │     0.22 │    0.00 │   0.00 │  0.0 │       0.00 │   323.34 │

Test Summary: | Total   Time
  Overall     |     0  15.5s
    l         |     0   1.1s
    k         |     0   1.1s
    i         |     0   1.1s
    j         |     0   1.1s
    e         |     0   1.0s
    d         |     0   1.0s
    h         |     0   1.0s
    g         |     0   1.0s
    f         |     0   1.0s
    b         |     0   1.0s
    a         |     0   1.0s
    c         |     0   1.0s
    SUCCESS

The first batch of 4 tests was run serially; the rest of the tests were run concurrently. This also shows that the workers are still recycled correctly.
