Skip to content

fix: serialize killchannel map access to prevent concurrent-map crash#318

Closed
ThiagoBauken wants to merge 2 commits into
asternic:mainfrom
ThiagoBauken:fix/killchannel-map-race
Closed

fix: serialize killchannel map access to prevent concurrent-map crash#318
ThiagoBauken wants to merge 2 commits into
asternic:mainfrom
ThiagoBauken:fix/killchannel-map-race

Conversation

@ThiagoBauken

Copy link
Copy Markdown
Contributor

Problem

killchannel (map[string]chan bool, declared in main.go) is accessed from two kinds of goroutines with no synchronization:

  • HTTP request goroutinesConnect writes killchannel[txtid] = …; Disconnect, Logout and the admin delete path send on killchannel[txtid].
  • The per-session startClient goroutine — reads killchannel[userID] in its loop and deletes the entry on kill.

Concurrent connects/disconnects therefore read, write and delete the same map at the same time. Go's runtime detects this and aborts the whole process with fatal error: concurrent map read and map write (or concurrent map writes). This is consistent with the crash reports in this area (#191, #153, #212).

This is separate from #242, which fixed the channel buffering; the map itself was never guarded.

Fix

Introduce a dedicated sync.Mutex (killchannelMu) and route every access through four small helpers:

  • setKillChannel, getKillChannel, deleteKillChannel — lock only around the map op;
  • signalKill — looks up the channel under the lock, then does the existing non-blocking send outside the lock.

The lock is never held while sending on or receiving from a channel, so a slow or absent receiver cannot block another session or deadlock. The startClient loop reads its channel via getKillChannel each iteration instead of indexing the map directly. No lifecycle behaviour changes — it purely makes the existing operations thread-safe, complementing the buffered-channel fix from #242.

Testing

  • TestKillChannelHelpers — set/get/signal/delete cycle + safe no-op signal on a missing entry.
  • TestKillChannelConcurrent — 100 users hammered from concurrent goroutines; under go test -race (CI) the old bare-map access reports the race, the guarded helpers are clean.
  • go build ./... ✅ · go vet ./... ✅ · go test ./...

killchannel (map[string]chan bool) was read, written and deleted from both HTTP
request goroutines (Connect/Disconnect/logout/admin-delete) and the per-session
startClient goroutine with no synchronization, which can crash the process with
"concurrent map read and map write" / "concurrent map writes" under simultaneous
connects.

Guard every map operation with a dedicated mutex via four helpers
(setKillChannel / getKillChannel / deleteKillChannel / signalKill). The lock is
held only around the map access, never while sending on or receiving from a
channel, so a slow or absent receiver cannot block another session. The
startClient loop now reads its channel through getKillChannel each iteration
instead of indexing the map directly. Behaviour is otherwise unchanged; this
complements the buffered-channel fix from asternic#242.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request replaces direct, unguarded access to the killchannel map with mutex-guarded helper functions to prevent data races, and adds concurrent unit tests. The review feedback highlights a critical concurrency bug and performance issue in wmiau.go where retrieving the kill channel inside a for loop on every iteration can lead to an infinite busy-loop, resource leaks, and high mutex contention. The reviewer suggests retrieving the channel once before entering the loop.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread wmiau.go
Comment on lines 658 to +661
for {
kill, _ := getKillChannel(userID)
select {
case <-killchannel[userID]:
case <-kill:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Critical Concurrency Bug & Performance Overhead

By calling getKillChannel(userID) inside the for loop on every iteration, two major issues are introduced:

  1. Resource Leak & Infinite Busy-Loop (Bug): If a new connection is established for the same userID while an older startClient goroutine is still running, the older goroutine will retrieve the new connection's kill channel from the map. When a disconnect signal is sent, only one of the goroutines (non-deterministically) will receive the signal and exit. The remaining goroutine will continue running, and on its next iteration, it will find that the channel has been deleted from the map (getKillChannel returns nil). Reading from a nil channel blocks forever, causing the select to always fall through to the default branch and sleep for 1 second. This results in an infinite busy-loop that leaks CPU and memory indefinitely.
  2. High Mutex Contention (Performance): Calling getKillChannel every second for every active session constantly locks and unlocks the global killchannelMu mutex, leading to severe CPU overhead and lock contention under high load.

Solution

Retrieve the kill channel once before entering the loop. This ensures the goroutine always listens to its own dedicated channel, preventing the leak/race bug and completely eliminating the repeated mutex locking overhead.

Suggested change
for {
kill, _ := getKillChannel(userID)
select {
case <-killchannel[userID]:
case <-kill:
kill, _ := getKillChannel(userID)
for {
select {
case <-kill:

Reading getKillChannel(userID) on every loop iteration re-locked killchannelMu
once per second per session (contention under load) and could switch to a
replaced channel after a reconnect. Capture the channel once before the loop so
each session goroutine listens on its own channel without re-locking.
Addresses Gemini Code Assist review on asternic#318.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@ThiagoBauken

Copy link
Copy Markdown
Contributor Author

Closing to consolidate my recent PRs into a smaller, focused set that's easier to review — I'll resubmit the changes grouped together. Sorry for the churn, and thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant