Skip to content

server, session: interrupt autocommit DML after disconnect#68237

Merged
ti-chi-bot[bot] merged 5 commits into
pingcap:masterfrom
King-Dylan:cover-insert-connection-alive
May 19, 2026
Merged

server, session: interrupt autocommit DML after disconnect#68237
ti-chi-bot[bot] merged 5 commits into
pingcap:masterfrom
King-Dylan:cover-insert-connection-alive

Conversation

@King-Dylan
Copy link
Copy Markdown
Member

@King-Dylan King-Dylan commented May 8, 2026

What problem does this PR solve?

Issue Number: close #68236

Problem Summary:

Autocommit write statements such as INSERT do not write a result set, so the connection-liveness check added for the result-set path does not cover the whole statement lifecycle. If the client disconnects after TiDB has read the request packet, the statement can keep running until expression evaluation, commit, prewrite, or retry/backoff finishes naturally.

What changed and how does it work?

  • Install a SQLKiller connection-liveness callback while executing INSERT, UPDATE, and DELETE, including text protocol EXECUTE and binary prepared execution.
  • Start a lightweight connection monitor for those write statements. When the underlying TCP connection is gone, it sends QueryInterrupted and cancels the current dispatch context, so prewrite/backoff can exit promptly.
  • Keep the existing result-set liveness behavior for queries that return rows.
  • Check connection liveness immediately before committing INSERT / UPDATE / DELETE, so a disconnect observed after execution but before commit is converted into an interrupt.
  • Reuse the dispatch-context cancellation helper for KILL QUERY.
  • Add regressions for autocommit INSERT disconnect before commit and during prewrite/backoff.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Validated locally:

GOTOOLCHAIN=go1.25.9 GOFLAGS=-mod=mod ./tools/check/failpoint-go-test.sh pkg/server/tests/commontest -run 'Test(ClientDisconnectKillsAutocommitInsert|ClientDisconnectCancelsAutocommitInsertPrewrite|Issue57531)$' -count=1
GOTOOLCHAIN=go1.25.9 GOFLAGS=-mod=mod ./tools/check/failpoint-go-test.sh pkg/session -run '^$' -count=1
GOTOOLCHAIN=go1.25.9 GOFLAGS=-mod=mod go test ./pkg/util/sqlkiller -run '^$' -count=1
GOFLAGS=-mod=mod make lint
make bazel_ci_simple_prepare
git diff --check

make bazel_prepare was also attempted, but the current checkout's go.mod contains a local replacement github.com/pingcap/tidb/pkg/parser => ./pkg/parser, and Gazelle failed with:

gazelle: go_repository does not support file path replacements for github.com/pingcap/tidb/pkg/parser -> ./pkg/parser

make bazel_ci_simple_prepare passed and did not generate additional Bazel diffs.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

No expected performance regression. The extra monitor is scoped to INSERT / UPDATE / DELETE execution while the statement is active. It checks once per second in normal builds and exits when the statement finishes.

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

Fix an issue that autocommit write statements might not be interrupted promptly after the client disconnects.

Summary by CodeRabbit

  • New Features

    • Targeted in-execution liveness monitoring for autocommit writes and an added pre-commit liveness check for write statements.
  • Bug Fixes

    • More reliable and faster cancellation/cleanup when clients disconnect, including server-side dispatch cancellation and unified kill-signal handling.
  • Tests

    • New tests and helpers validating client disconnect behavior during autocommit writes (prepared and non-prepared) and processlist/connection checks.

Review Change Stack

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 8, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot Bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels May 8, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds bufReadConn-driven liveness monitoring wired into SQLKiller, cancels in-flight dispatch on client disconnect for eligible autocommit INSERT/UPDATE/DELETE, performs a pre-commit liveness check, refactors killQuery to use cancelDispatch, and adds integration tests and helpers.

Changes

Autocommit Write Statement Interruption on Client Disconnect

Layer / File(s) Summary
SQLKiller API
pkg/util/sqlkiller/sqlkiller.go
Adds CheckConnectionAlive() and routes periodic liveness failure through sendKillSignal(QueryInterrupted) instead of direct CAS.
Connection Monitoring Infrastructure
pkg/server/conn.go
Imports pkg/util/sqlkiller; adds setSQLKillerConnectionAlive() with cleanup, monitorConnectionAlive() ticker that calls bufReadConn.IsAlive() and sends QueryInterrupted and cancelDispatch(), adds cancelDispatch(), and shouldMonitorConnectionAliveDuringExecute().
Statement Execution Monitoring
pkg/server/conn.go, pkg/server/conn_stmt.go
Conditionally enables monitoring around ExecuteStmt for eligible prepared and direct statements, clears monitoring on early exit (nil result or error), removes unconditional plan-cache monitoring, and sets SQLKiller finish func and InWriteResultSet for successful write result sets.
Pre-Commit Connection Verification
pkg/session/tidb.go
Adds pre-commit CheckConnectionAlive() call in finishStmt for non-read-only, error-free statements and helper shouldCheckConnectionAliveBeforeCommit() restricting to resolved INSERT/UPDATE/DELETE.
Server Kill Signal
pkg/server/server.go
Refactors killQuery to call conn.cancelDispatch() to cancel in-flight dispatch instead of direct conn.mu access; still calls sessVars.SQLKiller.FinishResultSet().
Test Coverage
pkg/server/tests/commontest/tidb_test.go
Adds TestClientDisconnectKillsAutocommitInsert and TestClientDisconnectCancelsAutocommitInsertPrewrite, refactors TestIssue57531 to use getRawNetConn, and introduces helpers enableFastConnectionAliveMonitor, getRawNetConn, processlistCountByInfo, processlistIDByInfo, cleanupProcessByID, and runClientDisconnectAutocommitInsert.

Sequence Diagram

sequenceDiagram
  participant Client
  participant TiDBHandler
  participant Monitor
  participant BufConn
  participant SQLKiller
  Client->>TiDBHandler: SEND INSERT request
  TiDBHandler->>Monitor: setSQLKillerConnectionAlive(bufReadConn.IsAlive)
  Monitor->>BufConn: periodic IsAlive()
  BufConn-->>Monitor: alive=false
  Monitor->>SQLKiller: sendKillSignal(QueryInterrupted)
  Monitor->>TiDBHandler: cancelDispatch()
  TiDBHandler->>SQLKiller: FinishResultSet()
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • pingcap/tidb#67655: Both PRs modify finishStmt in pkg/session/tidb.go and touch commit/finish handling.
  • pingcap/tidb#65499: Both PRs modify server.go's killQuery/dispatch cancellation behavior.

Suggested labels

size/M, approved, lgtm

Suggested reviewers

  • xzhangxian1008
  • YangKeao
  • xhebox

Poem

🐰 I twitch my whiskers by the line,

When clients vanish, I sound the chime.
A gentle nudge: QueryInterrupted sings,
The dispatcher stops — no lingering things.
Hops, a carrot, and cleared processlists.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'server, session: interrupt autocommit DML after disconnect' directly and clearly describes the main change: interrupting autocommit DML statements upon client disconnect.
Description check ✅ Passed The description includes the required Issue Number (close #68236), a comprehensive problem summary, detailed explanation of what changed and how, filled checkboxes for testing with validation commands, release notes, and side-effect/documentation assessments.
Linked Issues check ✅ Passed All coding requirements from issue #68236 are met: connection-liveness monitoring for autocommit DML [#68236], dispatch-context cancellation for disconnects and KILL QUERY [#68236], and regression tests for disconnect scenarios [#68236].
Out of Scope Changes check ✅ Passed All changes are directly scoped to the objectives in issue #68236: connection monitoring for autocommit DML, pre-commit liveness checks, dispatch cancellation, and related test coverage for client disconnects.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ti-chi-bot ti-chi-bot Bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 8, 2026
@tiprow
Copy link
Copy Markdown

tiprow Bot commented May 8, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@King-Dylan King-Dylan requested a review from bb7133 May 8, 2026 22:12
@King-Dylan King-Dylan marked this pull request as ready for review May 8, 2026 22:13
@ti-chi-bot ti-chi-bot Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 8, 2026
@pantheon-ai
Copy link
Copy Markdown

pantheon-ai Bot commented May 8, 2026

@King-Dylan I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (6)
pkg/util/sqlkiller/sqlkiller.go (1)

237-243: ⚡ Quick win

Consider routing through sendKillSignal for consistent kill logging.

CheckConnectionAlive directly CASes killer.Signal and skips the "kill initiated" log emitted by sendKillSignal (line 126). When a disconnect-triggered interrupt fires from the pre-commit path, no audit log records that the kill was initiated, which makes diagnosing post-disconnect aborts harder. The existing HandleSignal periodic check has the same gap, but adding a new code path is a good time to fix it.

♻️ Suggested change
 // CheckConnectionAlive checks whether the connection is alive immediately.
 func (killer *SQLKiller) CheckConnectionAlive() {
 	fn := killer.IsConnectionAlive.Load()
 	if fn != nil && !(*fn)() {
-		atomic.CompareAndSwapUint32(&killer.Signal, 0, QueryInterrupted)
+		killer.sendKillSignal(QueryInterrupted)
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/util/sqlkiller/sqlkiller.go` around lines 237 - 243, CheckConnectionAlive
currently updates killer.Signal directly via atomic.CompareAndSwapUint32 which
bypasses the "kill initiated" audit log; change CheckConnectionAlive to call the
existing sendKillSignal method on SQLKiller instead of performing the CAS itself
(i.e., replace the direct atomic.CompareAndSwapUint32(&killer.Signal, 0,
QueryInterrupted) with killer.sendKillSignal(QueryInterrupted)). Ensure
sendKillSignal continues to perform the atomic CAS and emit the kill-initiated
log so both disconnect-triggered and periodic checks use the same logging path;
reference CheckConnectionAlive, sendKillSignal, SQLKiller, Signal,
QueryInterrupted and HandleSignal when locating the change.
pkg/server/conn.go (3)

2181-2189: ⚡ Quick win

Same defer-cleanup hardening recommended as conn_stmt.go.

This handleStmt site mirrors the lifecycle in executePreparedStmtAndWriteResult. If cc.ctx.ExecuteStmt (line 2186) panics with monitoring set up, the eager cleanup at line 2188 is skipped and the deferred cleanup at line 2231 only runs on the rs != nil branch. The cleanup function is sync.Once-wrapped, so a defer clearConnectionAlive() immediately after setSQLKillerConnectionAlive() is safe and would close the panic-leak window in one place.

See the matching comment on pkg/server/conn_stmt.go lines 313-330 for the suggested refactor — applying the same change here would keep the two execution paths symmetric.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/server/conn.go` around lines 2181 - 2189, The cleanup function
clearConnectionAlive can be skipped if cc.ctx.ExecuteStmt panics when
monitoringConnectionAlive is true; mirror the fix used in
executePreparedStmtAndWriteResult by calling clearConnectionAlive =
cc.setSQLKillerConnectionAlive() and immediately deferring
clearConnectionAlive() (i.e., add defer clearConnectionAlive() right after
setSQLKillerConnectionAlive() inside handleStmt) so the sync.Once-protected
cleanup runs on panic as well; refer to the symbols clearConnectionAlive,
monitoringConnectionAlive, setSQLKillerConnectionAlive, cc.ctx.ExecuteStmt and
the handleStmt path to apply the same hardening as in conn_stmt.go.

2113-2132: ⚖️ Poor tradeoff

Consider per-connection monitor reuse instead of per-statement goroutine churn.

Every monitored autocommit DML spawns a fresh goroutine + ticker, runs for the duration of the statement, and tears down. On a busy TiDB instance with many short-lived INSERTs, this adds measurable scheduler/allocation overhead (goroutine stack + ticker timer per statement). A single per-clientConn monitor that toggles between active/idle states on a chan bool (or driven from dispatch) would avoid the churn.

Not blocking for this PR — the current design is simple and correct — but worth tracking as a follow-up if perf telemetry shows up. Also, the 1ms test interval (line 2116) is fine for unit tests but will hot-loop under heavy parallel test runs; acceptable trade-off for fast disconnect detection.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/server/conn.go` around lines 2113 - 2132, monitorConnectionAlive
currently spawns a new goroutine + ticker per statement which causes
goroutine/ticker churn; refactor to a single long-lived per-clientConn monitor
that lives on clientConn and accepts activation messages (e.g., an
activate/deactivate chan bool or context-driven signal) instead of creating a
new goroutine for each autocommit DML. Move monitorConnectionAlive to be started
once for each clientConn (e.g., in clientConn initialization), have it read from
the new activation channel and only run the ticker loop while active, call
cc.ctx.GetSessionVars().SQLKiller.SendKillSignal(sqlkiller.QueryInterrupted) and
cc.cancelDispatch() on isAlive failure as before, and update callers that
currently spawn this goroutine to instead send activate/deactivate signals to
the clientConn monitor.

2093-2111: ⚡ Quick win

Add a brief lifecycle comment to the new helper.

setSQLKillerConnectionAlive does three coupled things — install an IsConnectionAlive callback on the SQLKiller, fork a poller goroutine, and return a once-wrapped cleanup. None of those are obvious from the name alone, and a future reader has to walk through monitorConnectionAlive and Reset() semantics (which does not clear IsConnectionAlive) to understand why the cleanup contract matters.

Per the coding guidelines, "Comments SHOULD explain non-obvious intent, constraints, invariants, concurrency guarantees…". A short doc comment naming the contract (caller MUST invoke the returned cleanup; idempotent; stops the monitor and clears the callback) would help.

♻️ Suggested doc comment
+// setSQLKillerConnectionAlive installs a connection-liveness probe on the
+// session's SQLKiller and starts a background goroutine that interrupts the
+// statement when the underlying TCP connection is detected as dead.
+// Callers MUST invoke the returned cleanup once the statement is done; it is
+// idempotent and stops the monitor and clears the SQLKiller probe.
 func (cc *clientConn) setSQLKillerConnectionAlive() func() {

As per coding guidelines: "Comments SHOULD explain non-obvious intent, constraints, invariants, concurrency guarantees".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/server/conn.go` around lines 2093 - 2111, Add a short doc comment above
setSQLKillerConnectionAlive describing its lifecycle and contract: explain that
the function installs an IsConnectionAlive callback on the session SQLKiller,
starts a background poller goroutine via monitorConnectionAlive, and returns a
cleanup function that the caller MUST invoke; note the cleanup is idempotent
(uses sync.Once), it stops the monitor (closes stopMonitor) and clears the
IsConnectionAlive store, and callers are expected to call the returned func to
avoid leaving the callback/goroutine running.
pkg/server/conn_stmt.go (1)

313-330: ⚡ Quick win

Defer cleanup right after setup to harden the panic path.

If ExecuteStmt (line 321) panics with monitoringConnectionAlive == true, neither the eager cleanup at line 322-324 nor the deferred cleanup at line 330 runs, and the monitor goroutine plus the IsConnectionAlive function pointer leak until the underlying connection finally closes. SQLKiller.Reset() does not clear IsConnectionAlive, so subsequent statements on the same session can observe a stale liveness function until the leaked goroutine self-exits.

Since setSQLKillerConnectionAlive returns a sync.Once-wrapped cleanup, deferring it immediately is idempotent with the eager call below — recommend pulling the lifecycle into a single deferred call:

♻️ Proposed change
-	clearConnectionAlive := func() {}
-	monitoringConnectionAlive := false
-	if planCacheStmt != nil && planCacheStmt.PreparedAst != nil {
-		monitoringConnectionAlive = shouldMonitorConnectionAliveDuringExecute(planCacheStmt.PreparedAst.Stmt, vars)
-		if monitoringConnectionAlive {
-			clearConnectionAlive = cc.setSQLKillerConnectionAlive()
-		}
-	}
+	clearConnectionAlive := func() {}
+	monitoringConnectionAlive := false
+	if planCacheStmt != nil && planCacheStmt.PreparedAst != nil &&
+		shouldMonitorConnectionAliveDuringExecute(planCacheStmt.PreparedAst.Stmt, vars) {
+		monitoringConnectionAlive = true
+		clearConnectionAlive = cc.setSQLKillerConnectionAlive()
+		defer clearConnectionAlive()
+	}
 	rs, err := (&cc.ctx).ExecuteStmt(ctx, execStmt)
-	if rs == nil || err != nil {
-		clearConnectionAlive()
-	}
 	var lazy bool
 	if rs != nil {
 		if !monitoringConnectionAlive {
 			clearConnectionAlive = cc.setSQLKillerConnectionAlive()
+			defer clearConnectionAlive()
 		}
-		defer clearConnectionAlive()

(Same pattern applies to handleStmt in conn.go.)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/server/conn_stmt.go` around lines 313 - 330, The current setup can leak
the connection-alive monitor if ExecuteStmt panics because clearConnectionAlive
is only deferred after ExecuteStmt; change the lifecycle so you call
cc.setSQLKillerConnectionAlive() as soon as monitoringConnectionAlive is true
and immediately defer the returned cleanup (clearConnectionAlive) before calling
(&cc.ctx).ExecuteStmt, relying on the idempotent sync.Once cleanup to allow the
eager nil-check later to call the same cleanup safely; update the block that
uses monitoringConnectionAlive, clearConnectionAlive,
setSQLKillerConnectionAlive and ExecuteStmt so the defer is installed
immediately when setSQLKillerConnectionAlive is called (and remove redundant
separate defers/eager calls), ensuring IsConnectionAlive/monitor goroutine
cannot leak on panic.
pkg/server/tests/commontest/tidb_test.go (1)

3507-3602: ⚡ Quick win

Cover the prepared/binary DML path as well.

Both new regressions use conn.ExecContext(...), so they only exercise the COM_QUERY path. This PR also changes prepared execution behavior, and that branch can still regress unnoticed without a PrepareContext(... INSERT ...) / Stmt.ExecContext(...) variant here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/server/tests/commontest/tidb_test.go` around lines 3507 - 3602, Add a
variant of each failing test that exercises the prepared/binary DML path: in
TestClientDisconnectKillsAutocommitInsert and
TestClientDisconnectCancelsAutocommitInsertPrewrite, after creating the table
obtain conn via dbt.GetDB().Conn(...), call conn.PrepareContext(...) for the
INSERT SQL to get a *Stmt, then run stmt.ExecContext(...) inside the goroutine
(instead of conn.ExecContext), and follow the same netConn.Close(), processlist
checks (processlistCountByInfo), error assertions, and final row-count
assertions; ensure you Close() the prepared Stmt and the Conn in deferred
cleanup.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/server/tests/commontest/tidb_test.go`:
- Around line 3521-3595: Capture the server process id for the long-running
INSERT before closing the raw network connection in both
TestClientDisconnectCancelsInsert and
TestClientDisconnectCancelsAutocommitInsert (use the same query logic you use in
processlistCountByInfo to find the matching "insert into issue57531_*" entry),
then register a t.Cleanup that opens a separate DB session (dbt.GetDB().Conn)
and issues a best-effort KILL for that PID if the processlist entry still
exists; ensure you perform the PID capture just after the require.Eventually
that confirms the statement is present and before netConn.Close(), and make the
cleanup tolerant of errors so it never fails the test teardown.

---

Nitpick comments:
In `@pkg/server/conn_stmt.go`:
- Around line 313-330: The current setup can leak the connection-alive monitor
if ExecuteStmt panics because clearConnectionAlive is only deferred after
ExecuteStmt; change the lifecycle so you call cc.setSQLKillerConnectionAlive()
as soon as monitoringConnectionAlive is true and immediately defer the returned
cleanup (clearConnectionAlive) before calling (&cc.ctx).ExecuteStmt, relying on
the idempotent sync.Once cleanup to allow the eager nil-check later to call the
same cleanup safely; update the block that uses monitoringConnectionAlive,
clearConnectionAlive, setSQLKillerConnectionAlive and ExecuteStmt so the defer
is installed immediately when setSQLKillerConnectionAlive is called (and remove
redundant separate defers/eager calls), ensuring IsConnectionAlive/monitor
goroutine cannot leak on panic.

In `@pkg/server/conn.go`:
- Around line 2181-2189: The cleanup function clearConnectionAlive can be
skipped if cc.ctx.ExecuteStmt panics when monitoringConnectionAlive is true;
mirror the fix used in executePreparedStmtAndWriteResult by calling
clearConnectionAlive = cc.setSQLKillerConnectionAlive() and immediately
deferring clearConnectionAlive() (i.e., add defer clearConnectionAlive() right
after setSQLKillerConnectionAlive() inside handleStmt) so the
sync.Once-protected cleanup runs on panic as well; refer to the symbols
clearConnectionAlive, monitoringConnectionAlive, setSQLKillerConnectionAlive,
cc.ctx.ExecuteStmt and the handleStmt path to apply the same hardening as in
conn_stmt.go.
- Around line 2113-2132: monitorConnectionAlive currently spawns a new goroutine
+ ticker per statement which causes goroutine/ticker churn; refactor to a single
long-lived per-clientConn monitor that lives on clientConn and accepts
activation messages (e.g., an activate/deactivate chan bool or context-driven
signal) instead of creating a new goroutine for each autocommit DML. Move
monitorConnectionAlive to be started once for each clientConn (e.g., in
clientConn initialization), have it read from the new activation channel and
only run the ticker loop while active, call
cc.ctx.GetSessionVars().SQLKiller.SendKillSignal(sqlkiller.QueryInterrupted) and
cc.cancelDispatch() on isAlive failure as before, and update callers that
currently spawn this goroutine to instead send activate/deactivate signals to
the clientConn monitor.
- Around line 2093-2111: Add a short doc comment above
setSQLKillerConnectionAlive describing its lifecycle and contract: explain that
the function installs an IsConnectionAlive callback on the session SQLKiller,
starts a background poller goroutine via monitorConnectionAlive, and returns a
cleanup function that the caller MUST invoke; note the cleanup is idempotent
(uses sync.Once), it stops the monitor (closes stopMonitor) and clears the
IsConnectionAlive store, and callers are expected to call the returned func to
avoid leaving the callback/goroutine running.

In `@pkg/server/tests/commontest/tidb_test.go`:
- Around line 3507-3602: Add a variant of each failing test that exercises the
prepared/binary DML path: in TestClientDisconnectKillsAutocommitInsert and
TestClientDisconnectCancelsAutocommitInsertPrewrite, after creating the table
obtain conn via dbt.GetDB().Conn(...), call conn.PrepareContext(...) for the
INSERT SQL to get a *Stmt, then run stmt.ExecContext(...) inside the goroutine
(instead of conn.ExecContext), and follow the same netConn.Close(), processlist
checks (processlistCountByInfo), error assertions, and final row-count
assertions; ensure you Close() the prepared Stmt and the Conn in deferred
cleanup.

In `@pkg/util/sqlkiller/sqlkiller.go`:
- Around line 237-243: CheckConnectionAlive currently updates killer.Signal
directly via atomic.CompareAndSwapUint32 which bypasses the "kill initiated"
audit log; change CheckConnectionAlive to call the existing sendKillSignal
method on SQLKiller instead of performing the CAS itself (i.e., replace the
direct atomic.CompareAndSwapUint32(&killer.Signal, 0, QueryInterrupted) with
killer.sendKillSignal(QueryInterrupted)). Ensure sendKillSignal continues to
perform the atomic CAS and emit the kill-initiated log so both
disconnect-triggered and periodic checks use the same logging path; reference
CheckConnectionAlive, sendKillSignal, SQLKiller, Signal, QueryInterrupted and
HandleSignal when locating the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 36d77292-d723-4dc8-b5a2-fc303967de25

📥 Commits

Reviewing files that changed from the base of the PR and between 84548db and 2c6874f.

📒 Files selected for processing (6)
  • pkg/server/conn.go
  • pkg/server/conn_stmt.go
  • pkg/server/server.go
  • pkg/server/tests/commontest/tidb_test.go
  • pkg/session/tidb.go
  • pkg/util/sqlkiller/sqlkiller.go

Comment thread pkg/server/tests/commontest/tidb_test.go Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 8, 2026

Codecov Report

❌ Patch coverage is 70.58824% with 35 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.7098%. Comparing base (296420e) to head (60060bd).

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #68237        +/-   ##
================================================
- Coverage   77.2806%   75.7098%   -1.5709%     
================================================
  Files          2010       2008         -2     
  Lines        555350     563442      +8092     
================================================
- Hits         429178     426581      -2597     
- Misses       125252     136804     +11552     
+ Partials        920         57       -863     
Flag Coverage Δ
integration 41.5480% <70.5882%> (+1.7550%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 60.4679% <ø> (ø)
parser ∅ <ø> (∅)
br 49.9534% <ø> (-13.0557%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
pkg/server/conn.go (1)

2104-2132: ⚡ Quick win

Cleanup does not synchronize with goroutine exit.

close(stopMonitor) returns immediately; the goroutine may have already entered the case <-ticker.C arm (Go's select picks randomly when both cases are ready), so SendKillSignal/cancelDispatch can still fire after the caller treats monitoring as cleared. In practice the dispatch loop resets SQLKiller between commands so it's mostly benign, but a stray cancel landing on the next dispatch's context (Line 1373-1376) is observable. Consider re-checking stop after the isAlive() call, or have the cleanup wait for the goroutine via a done channel / sync.WaitGroup.

♻️ Sketch — re-check stop after liveness check
 	for {
 		select {
 		case <-ticker.C:
+			select {
+			case <-stop:
+				return
+			default:
+			}
 			if !isAlive() {
 				cc.ctx.GetSessionVars().SQLKiller.SendKillSignal(sqlkiller.QueryInterrupted)
 				cc.cancelDispatch()
 				return
 			}
 		case <-stop:
 			return
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/server/conn.go` around lines 2104 - 2132, The monitor cleanup races with
monitorConnectionAlive: close(stopMonitor) can return before the goroutine
exits, so SQLKiller.SendKillSignal / cc.cancelDispatch may run after cleanup.
Fix by adding a done channel (or sync.WaitGroup) that monitorConnectionAlive
closes/signals before it returns and have the clearOnce closure wait for that
signal after closing stopMonitor; also ensure monitorConnectionAlive checks stop
non-blockingly after detecting !isAlive() (e.g., select { case <-stop: return
default: } ) before calling SQLKiller.SendKillSignal and cc.cancelDispatch.
Update the symbols clearOnce, stopMonitor, monitorConnectionAlive, isAlive,
SQLKiller.SendKillSignal, and cancelDispatch accordingly so cleanup waits for
the goroutine to finish.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/server/conn.go`:
- Around line 2220-2225: The shutdown short-circuit in the
ExecuteStmt/connection handling branch can return before registering defer
clearConnectionAlive(), leaking the monitor goroutine and
SQLKiller.IsConnectionAlive pointer; modify the logic in the conn handling
around ExecuteStmt/rs so that when monitoringConnectionAlive is true and rs !=
nil you always call cc.setSQLKillerConnectionAlive() and register defer
clearConnectionAlive() (or call clearConnectionAlive() just before any early
return caused by cc.getStatus() == connStatusShutdown) to ensure the SQLKiller
monitor is cleaned up; update the code paths that currently return on
connStatusShutdown so they either set/defer clearConnectionAlive first or invoke
clearConnectionAlive synchronously before returning, ensuring no dangling
SendKillSignal/cancelDispatch targets remain.
- Around line 2093-2099: In clientConn.setSQLKillerConnectionAlive add a brief
clarifying comment next to the cc.bufReadConn.IsAlive() != 0 check explaining
that IsAlive() returns 1 for alive, 0 for dead and -1 for an error/unknown state
(e.g., SetReadDeadline failure), and that the code intentionally treats any
non-zero value (1 or -1) as “not dead” to conservatively avoid terminating
queries when liveness is uncertain; reference cc.bufReadConn.IsAlive() and the
setSQLKillerConnectionAlive closure in the comment so future readers understand
the safety rationale.

---

Nitpick comments:
In `@pkg/server/conn.go`:
- Around line 2104-2132: The monitor cleanup races with monitorConnectionAlive:
close(stopMonitor) can return before the goroutine exits, so
SQLKiller.SendKillSignal / cc.cancelDispatch may run after cleanup. Fix by
adding a done channel (or sync.WaitGroup) that monitorConnectionAlive
closes/signals before it returns and have the clearOnce closure wait for that
signal after closing stopMonitor; also ensure monitorConnectionAlive checks stop
non-blockingly after detecting !isAlive() (e.g., select { case <-stop: return
default: } ) before calling SQLKiller.SendKillSignal and cc.cancelDispatch.
Update the symbols clearOnce, stopMonitor, monitorConnectionAlive, isAlive,
SQLKiller.SendKillSignal, and cancelDispatch accordingly so cleanup waits for
the goroutine to finish.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7aa8e686-da73-41a6-8533-fa3ce3f6ef73

📥 Commits

Reviewing files that changed from the base of the PR and between 2c6874f and 75448a0.

📒 Files selected for processing (2)
  • pkg/server/conn.go
  • pkg/session/tidb.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/session/tidb.go

Comment thread pkg/server/conn.go
Comment thread pkg/server/conn.go
@King-Dylan
Copy link
Copy Markdown
Member Author

/retest

1 similar comment
@King-Dylan
Copy link
Copy Markdown
Member Author

/retest

Comment thread pkg/server/tests/commontest/tidb_test.go Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
pkg/server/tests/commontest/tidb_test.go (1)

3604-3612: ⚡ Quick win

Prefer select with time.After over polling a buffered channel.

done is a buffered, single-shot channel; using require.Eventually with a non-blocking select polls every 50 ms instead of just blocking on the receive. A plain select with a timeout is more idiomatic, fails faster on a real hang, and makes the timeout message explicit.

♻️ Proposed simplification
-	var execErr error
-	require.Eventually(t, func() bool {
-		select {
-		case execErr = <-done:
-			return true
-		default:
-			return false
-		}
-	}, 5*time.Second, 50*time.Millisecond)
-	require.Error(t, execErr)
+	var execErr error
+	select {
+	case execErr = <-done:
+	case <-time.After(5 * time.Second):
+		t.Fatal("timeout waiting for autocommit insert to return after client disconnect")
+	}
+	require.Error(t, execErr)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/server/tests/commontest/tidb_test.go` around lines 3604 - 3612, The
current test polls the buffered single-shot channel done using
require.Eventually and a non-blocking select; replace that polling with a
blocking select that waits on the done channel and a timeout via time.After so
the test blocks until the goroutine finishes or the timeout fires. Modify the
code that assigns execErr from <-done (the done channel and execErr variable in
tidb_test.go) to use: select { case execErr = <-done: /* success */ case
<-time.After(5 * time.Second): t.Fatalf("timed out waiting for done") } so it
fails fast and reports an explicit timeout instead of polling.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/server/tests/commontest/tidb_test.go`:
- Around line 3604-3612: The current test polls the buffered single-shot channel
done using require.Eventually and a non-blocking select; replace that polling
with a blocking select that waits on the done channel and a timeout via
time.After so the test blocks until the goroutine finishes or the timeout fires.
Modify the code that assigns execErr from <-done (the done channel and execErr
variable in tidb_test.go) to use: select { case execErr = <-done: /* success */
case <-time.After(5 * time.Second): t.Fatalf("timed out waiting for done") } so
it fails fast and reports an explicit timeout instead of polling.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4cae3642-411d-485c-b3f3-b9c3e2f21040

📥 Commits

Reviewing files that changed from the base of the PR and between 2a464cc and 76574ec.

📒 Files selected for processing (1)
  • pkg/server/tests/commontest/tidb_test.go

@King-Dylan
Copy link
Copy Markdown
Member Author

/retest

1 similar comment
@King-Dylan
Copy link
Copy Markdown
Member Author

/retest

Copy link
Copy Markdown
Member

@bb7133 bb7133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ti-chi-bot ti-chi-bot Bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels May 14, 2026
@King-Dylan King-Dylan force-pushed the cover-insert-connection-alive branch from 76574ec to 60060bd Compare May 18, 2026 21:21
@King-Dylan
Copy link
Copy Markdown
Member Author

/retest

@King-Dylan King-Dylan requested a review from Defined2014 May 18, 2026 21:42
@Defined2014
Copy link
Copy Markdown
Contributor

/retest

@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 19, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bb7133, Defined2014

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels May 19, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 19, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-05-14 20:55:19.119239342 +0000 UTC m=+385487.652018671: ☑️ agreed by bb7133.
  • 2026-05-19 01:18:41.489000156 +0000 UTC m=+227050.993130832: ☑️ agreed by Defined2014.

@ti-chi-bot ti-chi-bot Bot merged commit a62a6d1 into pingcap:master May 19, 2026
34 of 36 checks passed
@King-Dylan King-Dylan added the needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. label May 19, 2026
@ti-chi-bot
Copy link
Copy Markdown
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #68481.
But this PR has conflicts, please resolve them!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved lgtm needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

server: autocommit INSERT is not interrupted promptly after client disconnect

4 participants