Skip to content

Commit b9603f9

Browse files
leifericfclaude
andcommitted
gc: finish in-flight major before nested minor in MINO_GC_FULL
Root cause of the CI hang at transient-survives-gc-yield across all four matrix entries: mino_gc_collect(MINO_GC_FULL) ran the nested minor BEFORE finishing the in-flight major. The minor freed a YOUNG header that was still on major's mark stack from a prior incremental slice; the subsequent gc_force_finish_major chased the freed pointer through gc_mark_child_push. gc_tick_during_major (driver.c:141-147) documents the correct ordering -- "finish-then-minor rather than nest" -- and honours it for the auto-tick path. The public-API path simply had the calls reversed. Auto-tick is the common case, so the bug stayed unnoticed locally; CI's test ordering puts transient-survives-gc-yield (which explicitly calls (gc!)) inside an in-flight major on every push, making the hang deterministic. Symptom shape varied with environment: * GHA matrix: 8-min timeout in the Test step at the identical post- io_test point on macos-14, ubuntu-24.04, ubuntu-24.04-arm, and windows-2022. The v0.255.8 trace artifact pinned it to transient-survives-gc-yield. * Local sans sanitizer: intermittent SIGSEGV in tiny_free_no_lock (sweep double-free) or error[MTY001] "inc expects a number" (corrupted value via the reused freed header). * Local under ASan: heap-use-after-free at gc_mark_child_push driver.c:414, freer site gc_minor_collect minor.c:472, both inside the same mino_gc_collect call. Fix: reorder MINO_GC_FULL to finish the major BEFORE the minor. MINO_GC_MINOR and MINO_GC_MAJOR unchanged. Regression: tests/gc_test.clj gains gc-bang-during-incremental-major. Warms up enough OLDs to start an incremental major, then calls (gc!) in the transient shape that surfaced the bug. Fails under the old ordering, passes under the fix. Verified: tests/run.clj 1274/4557 green; mino_asan 1274/4557 green; release-gate 17/17 green; 5x consecutive reproducer runs all green with no flakiness. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent da33681 commit b9603f9

4 files changed

Lines changed: 94 additions & 4 deletions

File tree

CHANGELOG.md

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,55 @@
11
# Changelog
22

3+
## v0.255.9 — Fix: `(gc!)` During In-Flight Major Mark Use-After-Free
4+
5+
Root cause of the v0.255.6 / .7 / .8 CI hang: `mino_gc_collect(MINO_GC_FULL)`
6+
(the `(gc!)` primitive) ran the nested minor BEFORE finishing the
7+
in-flight major. The minor freed a YOUNG header that was still on
8+
major's mark stack from a prior incremental slice; the subsequent
9+
`gc_force_finish_major` chased the freed pointer through
10+
`gc_mark_child_push`.
11+
12+
`gc_tick_during_major` (driver.c:141-147) already documents the
13+
correct ordering -- "finish-then-minor rather than nest" -- and
14+
honours it for the auto-tick path; the public-API path simply had
15+
the calls reversed. Auto-tick is the common case so the bug went
16+
unnoticed locally; CI's test ordering puts `transient-survives-gc-
17+
yield` (which explicitly calls `(gc!)`) inside an in-flight major
18+
on every push, surfacing the bug deterministically.
19+
20+
Symptom shape under each environment:
21+
22+
* GitHub Actions matrix (macos-14 / ubuntu-24.04{,-arm} /
23+
windows-2022): 8-min timeout in the Test step, identical hang
24+
point across all four OSes -- the v0.255.8 diagnostic trace
25+
pinned the hanging deftest to `transient-survives-gc-yield`.
26+
* Local without sanitizer: intermittent SIGSEGV in
27+
`tiny_free_no_lock` (sweep double-free) or
28+
`error[MTY001] inc expects a number` (corrupted value via the
29+
reused freed header).
30+
* Local under ASan: clean repro --
31+
`heap-use-after-free at gc_mark_child_push driver.c:414`
32+
with the freer site at `gc_minor_collect minor.c:472` inside the
33+
same `mino_gc_collect` call.
34+
35+
Fix: reorder `MINO_GC_FULL` to finish the major BEFORE the minor,
36+
mirroring `gc_tick_during_major`. Behaviour for `MINO_GC_MINOR` and
37+
`MINO_GC_MAJOR` unchanged.
38+
39+
Regression: `tests/gc_test.clj` gains
40+
`gc-bang-during-incremental-major` -- warms up enough OLDs to start
41+
an incremental major, then calls `(gc!)` mid-mark in the transient
42+
shape that surfaced the bug. Fails (SIGSEGV / inc-error / hang)
43+
under the old ordering, passes under the fix. Full suite climbs
44+
from 1273 / 4555 to 1274 / 4557 (one deftest, two assertions).
45+
46+
Local verification:
47+
* `./mino tests/run.clj`: 1274 / 4557 green in 1.6s.
48+
* `./mino_asan tests/run.clj`: 1274 / 4557 green, ASan clean.
49+
* `./mino task release-gate`: 17/17 probes green.
50+
* 5x consecutive `./mino /tmp/repro.clj` (35 transient deftests):
51+
all green, no flakiness.
52+
353
## v0.255.8 — Diagnostic: Per-Deftest Trace for CI Hang Investigation
454

555
A diagnostic-only release: surfaces which test is running at the

src/mino.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -28,7 +28,7 @@
2828
*/
2929
#define MINO_VERSION_MAJOR 0
3030
#define MINO_VERSION_MINOR 255
31-
#define MINO_VERSION_PATCH 8
31+
#define MINO_VERSION_PATCH 9
3232

3333
/*
3434
* Human-readable version string of the *linked* runtime, e.g. "0.48.0".

src/public/gc.c

Lines changed: 9 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -35,12 +35,18 @@ void mino_gc_collect(mino_state_t *S, mino_gc_kind_t kind)
3535
if (mino_current_ctx(S)->gc_stack_bottom == NULL) {
3636
return;
3737
}
38-
if (S->gc_bytes_young > 0) {
39-
gc_minor_collect(S);
40-
}
38+
/* Finish any in-flight major BEFORE running the minor. A nested
39+
* minor while major's mark stack still holds OLD entries could
40+
* free a YOUNG object reachable only through an OLD pointer
41+
* pending on major's stack; major's next gc_trace_children
42+
* would then chase the freed pointer. Same invariant the
43+
* auto-tick path (gc_tick_during_major) honours. */
4144
if (S->gc_phase == GC_PHASE_MAJOR_MARK) {
4245
gc_force_finish_major(S);
4346
}
47+
if (S->gc_bytes_young > 0) {
48+
gc_minor_collect(S);
49+
}
4450
if (S->gc_phase == GC_PHASE_IDLE) {
4551
gc_major_collect(S);
4652
}

tests/gc_test.clj

Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -29,3 +29,37 @@
2929
acc
3030
(build__gc (- n 1) (list acc)))))
3131
(is (cons? (build__gc 200 42))))
32+
33+
;; Regression for gc! during mid-major-mark: mino_gc_collect(MINO_GC_FULL)
34+
;; used to run the minor BEFORE finishing the in-flight major; the minor
35+
;; would free a YOUNG header still on the major's mark stack, and the
36+
;; subsequent major step would chase the freed pointer. Surfaces on CI
37+
;; runners as a hang in tests/transient_test (test-suite ordering puts
38+
;; transient-survives-gc-yield inside an in-flight major). Locally
39+
;; reproduced via ASan as heap-use-after-free in gc_mark_child_push.
40+
;;
41+
;; The trigger is: heat enough OLD objects to start an incremental
42+
;; major, then call gc! while the major's mark stack is still
43+
;; non-empty. The fixed mino_gc_collect now finish-majors first, then
44+
;; runs the minor, matching the auto-tick path's invariant.
45+
(deftest gc-bang-during-incremental-major
46+
;; Warm up: promote many objects so the next minor will start a
47+
;; major. Each loop iteration grows a vec; the vec survives long
48+
;; enough to be promoted.
49+
(let [warmup (loop [i 0 acc []]
50+
(if (= i 5000)
51+
acc
52+
(recur (inc i) (conj acc i))))]
53+
(is (= 5000 (count warmup))))
54+
;; Now run the pattern that used to corrupt: transient + gc!
55+
;; sequence under an active major mark. If the bug regresses we
56+
;; get a SIGSEGV in gc_sweep / gc_mark_child_push, or a stray
57+
;; "inc expects a number" caught by the test framework.
58+
(let [t (transient [])]
59+
(conj! t 1)
60+
(gc!)
61+
(conj! t 2)
62+
(gc!)
63+
(conj! t 3)
64+
(gc!)
65+
(is (= [1 2 3] (persistent! t)))))

0 commit comments

Comments
 (0)