Skip to content

Commit 06f2a5b

Browse files
leifericfclaude
andcommitted
Fix mino_pcall catch arm: drop publish to last_error
The previous attempt at the lock-leak fix routed pcall's catch arm through a non-throwing record_eval_diag variant. That fixed the immediate longjmp problem, but it left a stale diag in last_error that downstream code misread as a fresh error -- specifically eval_impl's "evaled == NULL && mino_last_error != NULL" gate would fire on a subsequent successful call's NULL arg-list and propagate the stale diag, breaking the test runner mid-suite. Drop the publish entirely. pcall now restores the eval bookkeeping that was active at entry and returns -1 without touching last_error / last_diag. Callers that want a diag set after pcall returns -1 call set_eval_diag explicitly. record_eval_diag is removed (no callers). The lock-leak regression test from the previous attempt still passes -- the throw is caught, the lock is released, the second dosync completes -- it just doesn't observe a stale diag in last_error anymore. Internal suite 1513 / 7169 / 0. External baseline unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 1dba1da commit 06f2a5b

5 files changed

Lines changed: 45 additions & 77 deletions

File tree

CHANGELOG.md

Lines changed: 16 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -377,23 +377,22 @@ would longjmp out of `run_ref_validator` past `stm_unlock`,
377377
leaving `S->stm_commit_lock` permanently held. The next
378378
`dosync` deadlocked.
379379

380-
Fix: extracted a non-throwing variant `record_eval_diag` from
381-
`set_eval_diag` and routed `mino_pcall`'s catch arm through it.
382-
The throw-conversion path of `set_eval_diag` is preserved for
383-
its primary use case (turning eval-phase diagnostics into
384-
catchable exceptions).
385-
386-
`tests/stm_test.clj`'s new
387-
`validator-throw-does-not-deadlock-stm-lock` regression test
388-
proves the lock is released after a validator throw (the first
389-
tx errors via retry exhaustion; the second tx completes
390-
normally instead of hanging).
391-
392-
The agent code's `agent_try_call` (added in E.4 to work around
393-
the same pcall bug) is left in place. It's not strictly
394-
necessary now but documents the historical pcall contract; a
395-
follow-up commit can fold it back into `mino_pcall` if the
396-
agent maintainers prefer one mechanism.
380+
Fix: `mino_pcall`'s catch arm no longer publishes anything to
381+
`last_error` / `last_diag`. Callers that want a diag set after
382+
`pcall` returns -1 do it explicitly. (An interim attempt routed
383+
the publish through a non-throwing `record_eval_diag` variant,
384+
but that left a stale diag in `last_error` that `eval_impl`'s
385+
`evaled == NULL && mino_last_error != NULL` check would then
386+
misread as a fresh error during a later call — flushed via the
387+
follow-up commit that drops the publish entirely.)
388+
389+
`tests/stm_test.clj`'s `validator-throw-does-not-deadlock-stm-lock`
390+
regression test proves the lock is released after a validator
391+
throw.
392+
393+
The agent code's `agent_try_call` workaround (added in E.4) is
394+
left in place by this commit; the follow-up replaces it with a
395+
direct `mino_pcall` call now that the catch arm is well-behaved.
397396

398397
### Agents (MVP)
399398

src/runtime/error.c

Lines changed: 13 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -87,33 +87,6 @@ void set_error_at(mino_state_t *S, const mino_val_t *form, const char *msg)
8787
}
8888
}
8989

90-
/* Store an eval-phase diagnostic without converting to a thrown
91-
* exception. Use this from places that need to record an error
92-
* without disturbing control flow -- in particular from a catch
93-
* arm that has already absorbed a thrown exception and now needs
94-
* to publish a message via mino_last_error / mino_last_error_map.
95-
*
96-
* set_eval_diag's longjmp path is the right behavior for primitive
97-
* error reporting (an error inside a try block becomes a catchable
98-
* exception). It is the wrong behavior in a catch arm: a second
99-
* longjmp would unwind further than the catcher intends, leaving
100-
* locks held and bookkeeping unrun. */
101-
void record_eval_diag(mino_state_t *S, const mino_val_t *form,
102-
const char *kind, const char *code, const char *msg)
103-
{
104-
mino_diag_t *d = diag_new(kind, code, "eval", msg);
105-
if (d != NULL && form != NULL && form->type == MINO_CONS
106-
&& form->as.cons.file != NULL && form->as.cons.line > 0) {
107-
mino_span_t span;
108-
memset(&span, 0, sizeof(span));
109-
span.file = form->as.cons.file;
110-
span.line = form->as.cons.line;
111-
span.column = form->as.cons.column;
112-
diag_set_span(d, span);
113-
}
114-
set_diag(S, d);
115-
}
116-
11790
/* Classified eval-phase diagnostic from a form with source info. */
11891
void set_eval_diag(mino_state_t *S, const mino_val_t *form,
11992
const char *kind, const char *code, const char *msg)
@@ -125,7 +98,19 @@ void set_eval_diag(mino_state_t *S, const mino_val_t *form,
12598
mino_current_ctx(S)->try_stack[mino_current_ctx(S)->try_depth - 1].exception = ex;
12699
longjmp(mino_current_ctx(S)->try_stack[mino_current_ctx(S)->try_depth - 1].buf, 1);
127100
}
128-
record_eval_diag(S, form, kind, code, msg);
101+
{
102+
mino_diag_t *d = diag_new(kind, code, "eval", msg);
103+
if (d != NULL && form != NULL && form->type == MINO_CONS
104+
&& form->as.cons.file != NULL && form->as.cons.line > 0) {
105+
mino_span_t span;
106+
memset(&span, 0, sizeof(span));
107+
span.file = form->as.cons.file;
108+
span.line = form->as.cons.line;
109+
span.column = form->as.cons.column;
110+
diag_set_span(d, span);
111+
}
112+
set_diag(S, d);
113+
}
129114
}
130115

131116
/* Return a short human-readable label for a value's type. */

src/runtime/internal.h

Lines changed: 0 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -891,13 +891,6 @@ const char *source_cache_get_line(mino_state_t *S, const char *file,
891891
void set_eval_diag(mino_state_t *S, const mino_val_t *form,
892892
const char *kind, const char *code,
893893
const char *msg);
894-
/* Non-throwing variant: stores the diagnostic without converting it
895-
* to a thrown exception. Use from catch arms that have already
896-
* absorbed a throw and just need to publish the message via
897-
* mino_last_error / mino_last_error_map. */
898-
void record_eval_diag(mino_state_t *S, const mino_val_t *form,
899-
const char *kind, const char *code,
900-
const char *msg);
901894
const char *type_tag_str(const mino_val_t *v); /* static string */
902895
void push_frame(mino_state_t *S, const char *name, /* name: borrowed */
903896
const char *file, int line, /* file: borrowed */

src/runtime/state.c

Lines changed: 10 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -761,30 +761,20 @@ int mino_pcall(mino_state_t *S, mino_val_t *fn, mino_val_t *args, mino_env_t *en
761761
mino_current_ctx(S)->try_stack[mino_current_ctx(S)->try_depth].saved_load_len = S->load_stack_len;
762762
if (setjmp(mino_current_ctx(S)->try_stack[mino_current_ctx(S)->try_depth].buf) != 0) {
763763
/* Landed here from longjmp -- error was thrown. */
764-
mino_val_t *ex = mino_current_ctx(S)->try_stack[saved_try].exception;
765764
S->current_ns = mino_current_ctx(S)->try_stack[saved_try].saved_ns;
766765
S->fn_ambient_ns = mino_current_ctx(S)->try_stack[saved_try].saved_ambient;
767766
load_stack_truncate(S, mino_current_ctx(S)->try_stack[saved_try].saved_load_len);
768767
mino_current_ctx(S)->try_depth = saved_try;
769-
/* Populate last_error from the exception value so the host
770-
* can inspect it via mino_last_error(). Use record_eval_diag
771-
* (non-throwing) -- set_eval_diag would re-throw to any
772-
* outer try frame, leaking out of pcall's catch arm and
773-
* skipping bookkeeping the caller depends on (e.g. the STM
774-
* commit lock). */
775-
if (mino_last_error(S) == NULL || mino_last_error(S)[0] == '\0') {
776-
const char *s = NULL;
777-
size_t slen = 0;
778-
if (ex != NULL && mino_to_string(ex, &s, &slen)) {
779-
(void)slen;
780-
record_eval_diag(S, mino_current_ctx(S)->eval_current_form, "eval/contract", "MCT001", s);
781-
} else {
782-
record_eval_diag(S, mino_current_ctx(S)->eval_current_form, "eval/contract", "MCT001", "unhandled exception");
783-
}
784-
}
785-
if (out != NULL) {
786-
*out = NULL;
787-
}
768+
/* pcall does NOT publish to last_error / last_diag on a caught
769+
* throw. The eval loop has read sites that treat
770+
* mino_last_error as "an error happened during my call" (see
771+
* eval_impl's "evaled == NULL && mino_last_error != NULL"
772+
* gate), so leaving a stale diag from a successfully-caught
773+
* pcall would mislead the next failing call into thinking
774+
* its own error had already been reported. Callers that
775+
* want a diag published after pcall returns -1 call
776+
* set_eval_diag explicitly. */
777+
if (out != NULL) *out = NULL;
788778
return -1;
789779
}
790780
mino_current_ctx(S)->try_depth++;

tests/stm_test.clj

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -157,11 +157,12 @@
157157

158158
(deftest validator-throw-does-not-deadlock-stm-lock
159159
;; Regression: a validator that throws would previously leak the
160-
;; STM commit lock because mino_pcall's catch arm called set_eval_diag,
161-
;; which longjmps to any enclosing try frame -- bypassing tx_commit's
162-
;; stm_unlock. The next dosync would then hang on the leaked lock.
163-
;; Fixed by routing pcall's catch arm through record_eval_diag (the
164-
;; non-throwing variant). The first tx errors via the retry path; the
160+
;; STM commit lock because mino_pcall's catch arm called
161+
;; set_eval_diag, which longjmps to any enclosing try frame --
162+
;; bypassing tx_commit's stm_unlock. The next dosync would then
163+
;; hang on the leaked lock. Fixed by removing the publish-to-
164+
;; last_error step from pcall's catch arm; pcall now just unwinds
165+
;; to the caller without re-throwing. The first tx errors; the
165166
;; second tx must complete instead of hanging.
166167
(let [r (ref 1)]
167168
(set-validator! r (fn [v] (if (zero? v)

0 commit comments

Comments
 (0)