tests: adapt new async tests to tight-CPU CI runners

leifericf · claude · leifericf · commit 407e74a3a38e · 2026-05-17T09:08:53.000+02:00
Two of the v0.255.27 regression tests (added in the BC safepoint
poll commit) failed on macos-14 and ubuntu-24.04 runners with
MTH001 thread-limit-exceeded. Root cause: CI runners get ~3 CPU
allocations (vs 12 on dev), so the host thread grant is small.
Combined with the test order (busy-spin's N workers immediately
followed by ex-info-data-preserved's 1 future), this left the prior
workers still in worker_run cleanup when the next spawn ran, with
their thread_count slots not yet released.

Fixes:

- async-busy-spin-does-not-starve-siblings: clamp N to
  (max 2 (min 4 (- (mino-thread-limit) 2))) so the test fits any
  3+ thread budget. Save the writer futures and deref each one
  after the assertions, then thread-sleep 200ms so post-publish
  cleanup completes before the next test.

- async-future-cancel-interrupts-cpu-bound: bump its existing
  cleanup wait from 100ms to 200ms for the same reason.
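The table below works the busy-spin clamp through a few thread
grants. It is a Python mirror of the mino expression, for
illustration only. `busy_spin_writer_count` is a hypothetical name,
and the extra future counted is the reader.

```python
def busy_spin_writer_count(thread_limit: int) -> int:
    """Python mirror of (max 2 (min 4 (- (mino-thread-limit) 2)))."""
    # Reserve two slots, then clamp the writer count to [2, 4].
    return max(2, min(4, thread_limit - 2))


for limit in (3, 4, 5, 6, 12):
    n = busy_spin_writer_count(limit)
    # n writer futures plus the single reader future
    print(f"limit={limit:2d}  writers={n}  futures={n + 1}")
```

A grant of 3 yields 2 writers (3 futures including the reader). In
that case the floor of 2 overrides the two-slot reservation. From a
grant of 6 upward, n stays at 4.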

Local: 1290 / 4654 green; release-gate green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,22 @@
 # Changelog
 
+## v0.255.28 — Fix: tighten new async tests for tight-CPU CI runners
+
+Follow-up to v0.255.27: two of the new regression tests
+(`async-busy-spin-does-not-starve-siblings` and
+`async-future-cancel-interrupts-cpu-bound`) failed on macos-14 and
+ubuntu-24.04 GHA runners with MTH001 `thread-limit-exceeded`. CI
+runners get ~3 CPU allocations vs 12 on dev; combined with the
+test order, prior workers were still in worker_run cleanup when
+the next spawn ran -- their thread_count slots not yet released.
+
+- busy-spin: clamp N to the host's thread grant, save the writer
+  futures and deref each after assertions, then thread-sleep 200ms
+  so post-publish cleanup completes before the next test.
+- future-cancel: existing 100ms cleanup wait bumped to 200ms.
+
+No runtime code change; just test hygiene.
+
 ## v0.255.27 — Bug-fix sweep: deref/regex/location/concurrency/cleanup
 
 Nine fixes landed in this patch, covering Clojure-canon correctness
diff --git a/src/mino.h b/src/mino.h
@@ -28,7 +28,7 @@
  */
 #define MINO_VERSION_MAJOR 0
 #define MINO_VERSION_MINOR 255
-#define MINO_VERSION_PATCH 27
+#define MINO_VERSION_PATCH 28
 
 /*
  * Human-readable version string of the *linked* runtime, e.g. "0.48.0".
diff --git a/tests/async_smoke_test.clj b/tests/async_smoke_test.clj
@@ -116,13 +116,15 @@
   ;; Fix: BC VM safepoint poll at every backward jump reads through
   ;; a TLS pointer to the worker's owning impl->cancel_flag and throws
   ;; :mino/cancelled when set. The worker unwinds, worker_run
-  ;; publishes CANCELLED, quiesce's join returns.
+  ;; publishes CANCELLED, quiesce's join completes.
   (testing "future-cancel breaks a tight recur loop and lets script exit"
     (let [f (future (loop [i 0] (recur (inc i))))]
       (thread-sleep 50)
       (future-cancel f)
-      ;; Give the worker a moment to observe the cancel and unwind.
-      (thread-sleep 100)
+      ;; Give the worker a moment to observe the cancel, unwind, and
+      ;; release its thread_count slot. CI runners with tight CPU
+      ;; budgets fail the next test's spawn otherwise.
+      (thread-sleep 200)
       (is (future-cancelled? f)))))
 
 (deftest async-busy-spin-does-not-starve-siblings
@@ -132,18 +134,31 @@
   ;; auto-yields state_lock periodically (~64K backward jumps) so
   ;; siblings get scheduling time.
   (testing "busy-spin reader doesn't block writer futures from delivering"
-    (let [n 4
-          ps (vec (repeatedly n promise))]
-      (dotimes [i n]
-        (future (dotimes [_ 200] :work)
-                (deliver (nth ps i) :done)))
-      (let [reader (future
-                     (loop [it 0]
-                       (if (every? realized? ps)
-                         :done
-                         (recur (inc it)))))]
-        (doseq [p ps] (is (= :done @p)))
-        (is (= :done @reader))))))
+    ;; Adapt n to the host's thread grant: reserve one slot each for
+    ;; the reader future and the main thread, then clamp to [2, 4].
+    ;; CI runners with 3-4 CPU allocations need the lower bound;
+    ;; high-core dev machines need no more than the upper bound.
+    (let [n       (max 2 (min 4 (- (mino-thread-limit) 2)))
+          ps      (vec (repeatedly n promise))
+          writers (vec (for [i (range n)]
+                         (future (dotimes [_ 200] :work)
+                                 (deliver (nth ps i) :done))))
+          reader  (future
+                    (loop [it 0]
+                      (if (every? realized? ps)
+                        :done
+                        (recur (inc it)))))]
+      (doseq [p ps] (is (= :done @p)))
+      (is (= :done @reader))
+      ;; Deref each writer so its body has returned (publish complete),
+      ;; then sleep briefly so the worker thread can finish its post-
+      ;; publish cleanup and release its thread_count slot. Without
+      ;; this two-step, CI runners with tight CPU budgets can hit
+      ;; MTH001 on the following test's spawn even though the test
+      ;; itself only spawns one future -- the prior workers are still
+      ;; in the worker_run cleanup path with their slots reserved.
+      (doseq [w writers] (deref w))
+      (thread-sleep 200))))
 
 (deftest async-future-ex-info-data-preserved
   ;; Regression: when a future body throws (ex-info "..." {:k :v}),