
Commit 407e74a

leifericf and claude committed
tests: adapt new async tests to tight-CPU CI runners
Two of the v0.255.27 regression tests (added in the BC safepoint poll commit) failed on macos-14 and ubuntu-24.04 runners with MTH001 thread-limit-exceeded.

Root cause: CI runners have ~3 CPU allocations (vs 12 on dev), so the host thread grant is small; combined with the test order (busy-spin's N workers immediately followed by ex-info-data-preserved's 1 future), the prior workers were still in worker_run cleanup when the next spawn ran -- their thread_count slots not yet released.

Fixes:
- async-busy-spin-does-not-starve-siblings: clamp N to (max 2 (min 4 (- mino-thread-limit 2))) so the test fits any 3+ thread budget. Save the writer futures and deref each one after the assertions, then thread-sleep 200ms so post-publish cleanup completes before the next test.
- async-future-cancel-interrupts-cpu-bound: bumped its existing cleanup wait from 100ms to 200ms for the same reason.

Local: 1290 / 4654 green; release-gate green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
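To see how the clamp in the commit message behaves across thread budgets, here is a minimal sketch. `writer-count` is a hypothetical helper that is not part of the commit, and its `thread-limit` argument stands in for whatever `(mino-thread-limit)` returns on a given host. It uses only `max`, `min`, and `-`, so it should evaluate the same way in any Clojure dialect that provides them.

```clojure
;; Sketch only: `writer-count` is a hypothetical name, not code from the commit.
;; `thread-limit` is assumed to be the host's thread grant as an integer.
(defn writer-count
  "Number of writer futures for a given thread budget, clamped to [2, 4]."
  [thread-limit]
  (max 2 (min 4 (- thread-limit 2))))

;; Print the writer count for a few representative budgets.
(doseq [limit [3 4 5 6 12]]
  (println limit "->" (writer-count limit)))
;; prints:
;; 3 -> 2
;; 4 -> 2
;; 5 -> 3
;; 6 -> 4
;; 12 -> 4
```

The lower bound decides the result for budgets of 3 and 4, and the upper bound decides it from 6 upward, which matches the commit's distinction between tight CI runners and a 12-allocation dev machine.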
1 parent 66341a3 commit 407e74a

3 files changed

Lines changed: 48 additions & 16 deletions

CHANGELOG.md

Lines changed: 17 additions & 0 deletions
@@ -1,5 +1,22 @@
 # Changelog
 
+## v0.255.28 — Fix: tighten new async tests for tight-CPU CI runners
+
+Follow-up to v0.255.27: two of the new regression tests
+(`async-busy-spin-does-not-starve-siblings` and
+`async-future-cancel-interrupts-cpu-bound`) failed on macos-14 and
+ubuntu-24.04 GHA runners with MTH001 `thread-limit-exceeded`. CI
+runners get ~3 CPU allocations vs 12 on dev; combined with the
+test order, prior workers were still in worker_run cleanup when
+the next spawn ran -- their thread_count slots not yet released.
+
+- busy-spin: clamp N to the host's thread grant, save the writer
+  futures and deref each after assertions, then thread-sleep 200ms
+  so post-publish cleanup completes before the next test.
+- future-cancel: existing 100ms cleanup wait bumped to 200ms.
+
+No runtime code change; just test hygiene.
+
 ## v0.255.27 — Bug-fix sweep: deref/regex/location/concurrency/cleanup
 
 Nine fixes landed in this patch, covering Clojure-canon correctness

src/mino.h

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@
  */
 #define MINO_VERSION_MAJOR 0
 #define MINO_VERSION_MINOR 255
-#define MINO_VERSION_PATCH 27
+#define MINO_VERSION_PATCH 28
 
 /*
  * Human-readable version string of the *linked* runtime, e.g. "0.48.0".

tests/async_smoke_test.clj

Lines changed: 30 additions & 15 deletions
@@ -116,13 +116,15 @@
   ;; Fix: BC VM safepoint poll at every backward jump reads through
   ;; a TLS pointer to the worker's owning impl->cancel_flag and throws
   ;; :mino/cancelled when set. The worker unwinds, worker_run
-  ;; publishes CANCELLED, quiesce's join returns.
+  ;; publishes CANCELLED, quiesce's join completes.
   (testing "future-cancel breaks a tight recur loop and lets script exit"
     (let [f (future (loop [i 0] (recur (inc i))))]
       (thread-sleep 50)
       (future-cancel f)
-      ;; Give the worker a moment to observe the cancel and unwind.
-      (thread-sleep 100)
+      ;; Give the worker a moment to observe the cancel, unwind, and
+      ;; release its thread_count slot. CI runners with tight CPU
+      ;; budgets fail the next test's spawn otherwise.
+      (thread-sleep 200)
       (is (future-cancelled? f)))))
 
 (deftest async-busy-spin-does-not-starve-siblings
@@ -132,18 +134,31 @@
   ;; auto-yields state_lock periodically (~64K backward jumps) so
   ;; siblings get scheduling time.
   (testing "busy-spin reader doesn't block writer futures from delivering"
-    (let [n 4
-          ps (vec (repeatedly n promise))]
-      (dotimes [i n]
-        (future (dotimes [_ 200] :work)
-                (deliver (nth ps i) :done)))
-      (let [reader (future
-                     (loop [it 0]
-                       (if (every? realized? ps)
-                         :done
-                         (recur (inc it)))))]
-        (doseq [p ps] (is (= :done @p)))
-        (is (= :done @reader))))))
+    ;; Adapt n to the host's thread grant. Reserve one slot for the
+    ;; reader future + main thread; clamp [2, 4]. CI runners with
+    ;; 3-4 CPU allocations need the lower bound; high-core dev
+    ;; machines don't need the upper.
+    (let [n (max 2 (min 4 (- (mino-thread-limit) 2)))
+          ps (vec (repeatedly n promise))
+          writers (vec (for [i (range n)]
+                         (future (dotimes [_ 200] :work)
+                                 (deliver (nth ps i) :done))))
+          reader (future
+                   (loop [it 0]
+                     (if (every? realized? ps)
+                       :done
+                       (recur (inc it)))))]
+      (doseq [p ps] (is (= :done @p)))
+      (is (= :done @reader))
+      ;; Deref each writer so its body has returned (publish complete),
+      ;; then sleep briefly so the worker thread can finish its post-
+      ;; publish cleanup and release its thread_count slot. Without
+      ;; this two-step, CI runners with tight CPU budgets can hit
+      ;; MTH001 on the following test's spawn even though the test
+      ;; itself only spawns one future -- the prior workers are still
+      ;; in the worker_run cleanup path with their slots reserved.
+      (doseq [w writers] (deref w))
+      (thread-sleep 200))))
 
 (deftest async-future-ex-info-data-preserved
   ;; Regression: when a future body throws (ex-info "..." {:k :v}),
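
The two-step wait that the busy-spin test adopts (deref every writer, then pause) could be factored into a helper along these lines. This is a sketch, not code from the commit: `await-then-settle` and `settle-ms` are hypothetical names, and `Thread/sleep` is the JVM Clojure call, used here as a stand-in for mino's `thread-sleep`.

```clojure
;; Sketch only: `await-then-settle` and `settle-ms` are hypothetical names.
;; Thread/sleep is JVM Clojure; under mino the equivalent is assumed to be
;; (thread-sleep settle-ms).
(defn await-then-settle
  "Deref every future so each body has returned, then pause so the worker
  threads can finish any cleanup that runs after the result is published."
  [futures settle-ms]
  (doseq [f futures] (deref f))
  (Thread/sleep (long settle-ms))
  :settled)

;; Usage: spawn two writers, wait for them, then read the promises.
(let [ps      (vec (repeatedly 2 promise))
      writers (vec (for [p ps] (future (deliver p :done))))]
  (await-then-settle writers 200)
  (println (mapv deref ps)))
;; prints: [:done :done]
```

A fixed pause is a heuristic: it assumes cleanup finishes within `settle-ms`, which is the same assumption the commit makes when it picks 200ms.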
