fix: data races that can crash the process under load by ThiagoBauken · Pull Request #325 · asternic/wuzapi

ThiagoBauken · 2026-06-02T05:25:20Z

Summary

Three related data-race fixes, each of which can crash the whole process with fatal error: concurrent map ... (or cause torn reads) under concurrent load. They're the same class of issue, so I've grouped them into one PR.

These supersede #317, #318 and #319 — I closed those to consolidate into a single, easier-to-review change.

1. `updateUserInfo` mutated a shared map in place — `helpers.go`

updateUserInfo wrote straight into the map held by userinfocache (values.(Values).m[field] = value). That same map is handed to request goroutines through the request context, so the in-place write races with concurrent readers (Values.Get), producing fatal error: concurrent map read and map write.

Now it's copy-on-write: build a fresh map, copy the old entries, set the new value, and return a new Values. Callers already persist the result via userinfocache.Set, so no call sites change.
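As a rough illustration of the copy-on-write idea described above (not the PR's actual code), the sketch below uses a simplified `Values` type with a `map[string]string` field and a simplified `updateUserInfo` signature. Both are assumptions made for the example; the real types in `helpers.go` may differ.

```go
package main

import "fmt"

// Values is a hypothetical stand-in for the cached user-info type:
// a wrapper around a string map that is treated as read-only once shared.
type Values struct {
	m map[string]string
}

// Get only reads the map, so concurrent calls are safe as long as
// nobody writes to m in place.
func (v Values) Get(key string) string {
	return v.m[key]
}

// updateUserInfo returns a new Values with field set to value.
// The map inside old is only read, never written, so goroutines that
// still hold old keep seeing a stable snapshot.
func updateUserInfo(old Values, field, value string) Values {
	next := make(map[string]string, len(old.m)+1)
	for k, v := range old.m {
		next[k] = v
	}
	next[field] = value
	return Values{m: next}
}

func main() {
	before := Values{m: map[string]string{"Jid": "old"}}
	after := updateUserInfo(before, "Jid", "new")
	fmt.Println(before.Get("Jid"), after.Get("Jid")) // prints "old new"
}
```

The cost is one map copy per update, which is usually acceptable for small, rarely updated records such as per-user settings.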

2. `killchannel` map accessed without synchronization — `main.go`, `wmiau.go`, `handlers.go`

The global killchannel (map[string]chan bool) was read, written and deleted from HTTP request goroutines (Connect / Disconnect / logout) and from the per-session startClient goroutine, with no lock — so two simultaneous connects could crash with fatal error: concurrent map writes.

It's now guarded by a dedicated mutex behind small helpers (setKillChannel / getKillChannel / deleteKillChannel / signalKill). The lock is held only around the map operation, never while sending/receiving on a channel. As a small bonus, the keep-alive loop now blocks on <-kill instead of polling the map every second.
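A minimal sketch of what such mutex-guarded helpers could look like follows. The helper names and `killchannelMu` come from this PR, but the bodies, the boolean return value of `signalKill`, and its non-blocking send are assumptions for illustration. Removal is left out here to keep the sketch short.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	killchannelMu sync.Mutex
	killchannel   = make(map[string]chan bool)
)

// setKillChannel registers ch for userID; the lock covers only the map write.
func setKillChannel(userID string, ch chan bool) {
	killchannelMu.Lock()
	killchannel[userID] = ch
	killchannelMu.Unlock()
}

// getKillChannel looks up the channel for userID under the lock.
func getKillChannel(userID string) (chan bool, bool) {
	killchannelMu.Lock()
	ch, ok := killchannel[userID]
	killchannelMu.Unlock()
	return ch, ok
}

// signalKill releases the lock before touching the channel, so a slow
// receiver can never stall other map operations. The non-blocking send and
// the bool result are assumptions of this sketch.
func signalKill(userID string) bool {
	ch, ok := getKillChannel(userID)
	if !ok {
		return false // missing entry: safe no-op
	}
	select {
	case ch <- true:
	default: // a signal is already pending in the buffer
	}
	return true
}

func main() {
	kill := make(chan bool, 1)
	setKillChannel("user-1", kill)

	done := make(chan struct{})
	go func() {
		<-kill // parked until signalled, no polling
		close(done)
	}()

	fmt.Println(signalKill("user-1")) // true
	<-done
	fmt.Println(signalKill("missing")) // false
}
```

Keeping channel sends outside the critical section matters because a send can block, and blocking while holding the mutex would serialize every Connect and Disconnect behind it.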

3. Removed the write-only `MyClient.subscriptions` field — `clients.go`, `wmiau.go`, `handlers.go`

This field was written from two goroutines (the event handler and the request path) but never read — a data race on dead state. Subscriptions are already re-read from users.events / userinfocache on every event (updateAndGetUserSubscriptions), so the field was redundant. Removed the field, its assignments, and the now-unused subscribedEvents parameter of startClient.

Testing

New unit tests covering each fix:

TestUpdateUserInfoCopyOnWrite — proves the original Values is untouched after an update.
TestUpdateUserInfoConcurrent — hammers updateUserInfo + Get from many goroutines.
TestKillChannelHelpers — set/get/delete/signal round-trip.
TestKillChannelConcurrent — concurrent set/get/delete across many goroutines.
TestUpdateAndGetUserSubscriptionsFromCache — subscriptions still resolve from the cache after removing the field.

The two *Concurrent tests rely on Go's built-in concurrent-map detector: on the unfixed code they fatal-error with concurrent map writes / concurrent map read and map write (that was the failing "before" state); they pass on the fixed code.
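The shape of such a concurrent hammer can be sketched as below. This is not the PR's test code: `store` is a hypothetical stand-in for `userinfocache`, and `Values` and `updateUserInfo` are simplified as assumptions. With copy-on-write the program completes cleanly; an in-place map write inside `updateUserInfo` would instead be likely to trip the runtime's best-effort concurrent map check.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// Values and updateUserInfo are simplified stand-ins (assumptions).
type Values struct{ m map[string]string }

func (v Values) Get(k string) string { return v.m[k] }

func updateUserInfo(old Values, field, value string) Values {
	next := make(map[string]string, len(old.m)+1)
	for k, v := range old.m {
		next[k] = v
	}
	next[field] = value
	return Values{m: next}
}

// store is a hypothetical cache that swaps whole Values under a lock.
type store struct {
	mu  sync.RWMutex
	cur Values
}

func (s *store) load() Values {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.cur
}

func (s *store) set(v Values) {
	s.mu.Lock()
	s.cur = v
	s.mu.Unlock()
}

// hammer starts one writer and one reader per goroutine slot and waits
// for all of them. Readers call Get on snapshots while writers publish
// fresh copies; concurrent writers may overwrite each other's updates,
// but no map is ever written after it has been shared.
func hammer(s *store, goroutines, rounds int) {
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(2)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < rounds; i++ {
				s.set(updateUserInfo(s.load(), "field"+strconv.Itoa(id), strconv.Itoa(i)))
			}
		}(g)
		go func() {
			defer wg.Done()
			for i := 0; i < rounds; i++ {
				_ = s.load().Get("Id")
			}
		}()
	}
	wg.Wait()
}

func main() {
	s := &store{cur: Values{m: map[string]string{"Id": "42"}}}
	hammer(s, 16, 500)
	fmt.Println(s.load().Get("Id")) // prints "42"
}
```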

go build ./...                       # ok
go vet ./...                         # ok
go test ./...                        # ok
GOOS=linux GOARCH=amd64 go build .   # ok (deploys run on Linux)
GOOS=linux GOARCH=amd64 go vet ./... # ok

No behaviour changes for API consumers — these are internal correctness/stability fixes.

gemini-code-assist

Code Review

This pull request introduces thread-safe helpers (setKillChannel, getKillChannel, deleteKillChannel, signalKill) guarded by a mutex to replace direct, concurrent map access to killchannel, resolving potential data races. It also implements a copy-on-write mechanism in updateUserInfo to prevent concurrent map read/write fatal errors, removes the unused event subscription field, and adds tests for each fix. The review feedback focuses on two further thread-safety improvements: passing the kill channel directly to startClient as a parameter, which avoids a redundant map lookup, and making deleteKillChannel accept the channel as an argument so that a slow-cleanup goroutine cannot delete a newly established session's channel.

gemini-code-assist · 2026-06-02T05:27:03Z

+func deleteKillChannel(userID string) {
+	killchannelMu.Lock()
+	delete(killchannel, userID)
+	killchannelMu.Unlock()
+}


To prevent a race condition where a slow-cleanup goroutine from an old session deletes the kill channel of a newly established session for the same user, deleteKillChannel should accept the channel as an argument and only delete it if it matches the current channel in the map.

Suggested change

-func deleteKillChannel(userID string) {
-	killchannelMu.Lock()
-	delete(killchannel, userID)
-	killchannelMu.Unlock()
-}
+func deleteKillChannel(userID string, ch chan bool) {
+	killchannelMu.Lock()
+	if current, ok := killchannel[userID]; ok && current == ch {
+		delete(killchannel, userID)
+	}
+	killchannelMu.Unlock()
+}

gemini-code-assist · 2026-06-02T05:27:03Z

 }

-func (s *server) startClient(userID string, textjid string, token string, subscriptions []string) {
+func (s *server) startClient(userID string, textjid string, token string) {


Pass the kill channel directly as a parameter to startClient to avoid any potential race conditions or lookups from the global map during startup.

Suggested change

-func (s *server) startClient(userID string, textjid string, token string) {
+func (s *server) startClient(userID string, textjid string, token string, kill chan bool) {

gemini-code-assist · 2026-06-02T05:27:03Z

+		setKillChannel(txtid, make(chan bool, 1))
+		go s.startClient(txtid, jid, token)


Create the kill channel and pass it directly to startClient to avoid looking it up from the global map inside the goroutine.

Suggested change

-		setKillChannel(txtid, make(chan bool, 1))
-		go s.startClient(txtid, jid, token)
+		ch := make(chan bool, 1)
+		setKillChannel(txtid, ch)
+		go s.startClient(txtid, jid, token, ch)

gemini-code-assist · 2026-06-02T05:27:04Z

+			setKillChannel(txtid, make(chan bool, 1))
+			go s.startClient(txtid, jid, token)


Create the kill channel and pass it directly to startClient to avoid looking it up from the global map inside the goroutine.

Suggested change

-			setKillChannel(txtid, make(chan bool, 1))
-			go s.startClient(txtid, jid, token)
+			ch := make(chan bool, 1)
+			setKillChannel(txtid, ch)
+			go s.startClient(txtid, jid, token, ch)

gemini-code-assist · 2026-06-02T05:27:04Z

+	// Keep the session goroutine alive until a kill signal arrives. Block on the
+	// channel (captured once via the mutex-guarded helper) instead of polling —
+	// this parks the goroutine with zero CPU and no per-second mutex access.
+	kill, ok := getKillChannel(userID)
+	if !ok {
+		log.Error().Str("userid", userID).Msg("no kill channel registered for session; goroutine exiting")
+		return
+	}
+	<-kill
+	log.Info().Str("userid", userID).Msg("Received kill signal")
+	client.Disconnect()
+	clientManager.DeleteWhatsmeowClient(userID)
+	clientManager.DeleteMyClient(userID)
+	clientManager.DeleteHTTPClient(userID)
+	if _, err := s.db.Exec(`UPDATE users SET qrcode='', connected=0 WHERE id=$1`, userID); err != nil {
+		log.Error().Err(err).Msg("failed to mark user disconnected on kill")
 	}
+	deleteKillChannel(userID)


Since the kill channel is now passed directly as a parameter, we can eliminate the map lookup and the error handling. Also, pass the kill channel to deleteKillChannel to ensure we only delete the correct channel from the map.

Suggested change

-	// Keep the session goroutine alive until a kill signal arrives. Block on the
-	// channel (captured once via the mutex-guarded helper) instead of polling —
-	// this parks the goroutine with zero CPU and no per-second mutex access.
-	kill, ok := getKillChannel(userID)
-	if !ok {
-		log.Error().Str("userid", userID).Msg("no kill channel registered for session; goroutine exiting")
-		return
-	}
-	<-kill
-	log.Info().Str("userid", userID).Msg("Received kill signal")
-	client.Disconnect()
-	clientManager.DeleteWhatsmeowClient(userID)
-	clientManager.DeleteMyClient(userID)
-	clientManager.DeleteHTTPClient(userID)
-	if _, err := s.db.Exec(`UPDATE users SET qrcode='', connected=0 WHERE id=$1`, userID); err != nil {
-		log.Error().Err(err).Msg("failed to mark user disconnected on kill")
-	}
-	deleteKillChannel(userID)
+	// Keep the session goroutine alive until a kill signal arrives. Block on the
+	// channel (passed directly as a parameter) instead of polling —
+	// this parks the goroutine with zero CPU and no per-second mutex access.
+	<-kill
+	log.Info().Str("userid", userID).Msg("Received kill signal")
+	client.Disconnect()
+	clientManager.DeleteWhatsmeowClient(userID)
+	clientManager.DeleteMyClient(userID)
+	clientManager.DeleteHTTPClient(userID)
+	if _, err := s.db.Exec("UPDATE users SET qrcode='', connected=0 WHERE id=$1", userID); err != nil {
+		log.Error().Err(err).Msg("failed to mark user disconnected on kill")
+	}
+	deleteKillChannel(userID, kill)

gemini-code-assist · 2026-06-02T05:27:04Z

+	}
+
+	// delete removes the entry; signalKill on a missing entry is a safe no-op.
+	deleteKillChannel(u)


Update the test to pass the channel to deleteKillChannel.

Suggested change

-	deleteKillChannel(u)
+	deleteKillChannel(u, ch)

gemini-code-assist · 2026-06-02T05:27:04Z

+			setKillChannel(uid, make(chan bool, 1))
+			signalKill(uid)
+			_, _ = getKillChannel(uid)
+			deleteKillChannel(uid)


Update the concurrent test to use a local channel variable and pass it to deleteKillChannel.

			ch := make(chan bool, 1)
			setKillChannel(uid, ch)
			signalKill(uid)
			_, _ = getKillChannel(uid)
			deleteKillChannel(uid, ch)

…elete

A slow-cleanup goroutine from an old session could delete the kill channel of a newer session for the same user (reconnect), leaving the new session unkillable. deleteKillChannel now takes the channel the caller owns and removes the entry only if the map still holds that exact channel. Addresses Gemini review on asternic#325.

New test TestDeleteKillChannelStaleSession fails (unkillable session) on the old unconditional delete and passes now.

Each startClient goroutine now receives the channel its caller registered instead of looking it up from the global map at startup. This removes a lookup-ownership race (a concurrent reconnect could replace the map entry between setKillChannel and the goroutine's getKillChannel, so two sessions could end up blocked on the same buffered channel and one would leak). The goroutine now deterministically owns its own channel, which also pairs with the compare-and-delete in deleteKillChannel. Addresses the remaining Gemini review items on asternic#325.

ThiagoBauken · 2026-06-02T05:41:08Z

Thanks for the thorough review — addressed all of it:

deleteKillChannel compare-and-delete (1ab2435): it now takes the channel the caller owns and deletes only if the map still holds that exact channel, so a slow-cleanup goroutine from an old session can't drop a newer session's channel. Added TestDeleteKillChannelStaleSession, which fails (session is now unkillable) on the old unconditional delete and passes with the fix.
Pass kill as an explicit parameter to startClient (4d9b622): both callers (Connect and connectOnStartup) now create the channel, register it via setKillChannel, and hand it to the goroutine. This removes the map lookup + error path inside startClient and closes the lookup-ownership race (a reconnect could replace the map entry between setKillChannel and the goroutine's getKillChannel, leaving two sessions blocked on one buffered channel). Each goroutine now deterministically owns its own channel, which also pairs cleanly with the compare-and-delete above.
Tests updated to the new deleteKillChannel signature.

go build ./..., go vet ./... and go test ./... all pass, including a GOOS=linux cross-build (deploys run on Linux).
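To make the stale-session scenario concrete, here is a minimal sketch of the two changes working together. `deleteKillChannel` follows the suggested change from the review; `startSession` is a hypothetical, stripped-down stand-in for `startClient`, and the other helper bodies are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	killchannelMu sync.Mutex
	killchannel   = make(map[string]chan bool)
)

func setKillChannel(userID string, ch chan bool) {
	killchannelMu.Lock()
	killchannel[userID] = ch
	killchannelMu.Unlock()
}

func getKillChannel(userID string) (chan bool, bool) {
	killchannelMu.Lock()
	defer killchannelMu.Unlock()
	ch, ok := killchannel[userID]
	return ch, ok
}

// deleteKillChannel removes the entry only if the map still holds the
// caller's own channel (compare-and-delete).
func deleteKillChannel(userID string, ch chan bool) {
	killchannelMu.Lock()
	defer killchannelMu.Unlock()
	if current, ok := killchannel[userID]; ok && current == ch {
		delete(killchannel, userID)
	}
}

// startSession (hypothetical) owns the channel it was handed: it blocks
// on it and, on exit, cleans up only its own map entry.
func startSession(userID string, kill chan bool, wg *sync.WaitGroup) {
	defer wg.Done()
	<-kill
	deleteKillChannel(userID, kill)
}

func main() {
	var wg sync.WaitGroup

	oldCh := make(chan bool, 1)
	setKillChannel("u1", oldCh)
	wg.Add(1)
	go startSession("u1", oldCh, &wg)

	// Reconnect: a newer session replaces the map entry before the old
	// session has finished cleaning up.
	newCh := make(chan bool, 1)
	setKillChannel("u1", newCh)

	oldCh <- true // the old session exits and runs its cleanup
	wg.Wait()

	current, ok := getKillChannel("u1")
	fmt.Println(ok && current == newCh) // true: the newer entry survived
}
```

With an unconditional delete, the old session's cleanup would have removed the new session's entry, and a later kill signal for "u1" would have found nothing to signal.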

Thiago Bauken added 3 commits June 2, 2026 02:19

fix: copy-on-write in updateUserInfo to avoid shared-map data race

08e7c51

fix: serialize killchannel map access with a mutex; block instead of polling

51cac6e

refactor: remove write-only MyClient.subscriptions field (data race)

da8325f

gemini-code-assist Bot reviewed Jun 2, 2026

Thiago Bauken added 2 commits June 2, 2026 02:36

Merge branch 'main' into fix/data-races

39a307e

asternic merged commit d9eaeaf into asternic:main Jun 2, 2026
1 check passed

asternic had a problem deploying to DOCKER with GitHub Actions on June 2, 2026 20:27 (Failure)

ThiagoBauken mentioned this pull request Jun 4, 2026

fix: data race and nil deref reading S3 bucket config #331

Merged

asternic merged 6 commits into asternic:main from ThiagoBauken:fix/data-races
