Bug: Data race in ws.Client — pingLoop reads fields concurrently modified by configure #188

@Acosmi

Labels: bug, websocket, concurrency
Description
Summary
ws.Client has a data race between the pingLoop goroutine and the configure method. configure writes to c.pingInterval, c.reconnectCount, c.reconnectInterval, and c.reconnectNonce without holding any lock, while pingLoop reads c.pingInterval and c.serviceID from a separate goroutine, also without a lock.

This is detected by Go's race detector (go test -race).

Environment
- SDK version: v3.5.3
- Go version: go1.23+
- OS: macOS (arm64), also reproducible on Linux
How to Reproduce
1. Create a Feishu app with WebSocket event subscription enabled.
2. Initialize and start the WebSocket client:

```go
cli := larkws.NewClient(appID, appSecret,
    larkws.WithAutoReconnect(true),
    larkws.WithEventHandler(dispatcher),
)
go cli.Start(context.Background())
```

3. Run your application with the race detector enabled:

```bash
go test -race ./...
```

4. The race detector reports a data race as shown below.
Race Detector Output

```
WARNING: DATA RACE
Write at 0x00c0000ec468 by goroutine 141:
  github.com/larksuite/oapi-sdk-go/v3/ws.(*Client).configure()
      .../ws/client.go:510 +0x300
  github.com/larksuite/oapi-sdk-go/v3/ws.(*Client).handleControlFrame()
      .../ws/client.go:393 +0x31c
  github.com/larksuite/oapi-sdk-go/v3/ws.(*Client).handleMessage()
      .../ws/client.go:371 +0x1ec
  github.com/larksuite/oapi-sdk-go/v3/ws.(*Client).receiveMessageLoop.gowrap1()
      .../ws/client.go:352 +0x64

Previous read at 0x00c0000ec468 by goroutine 130:
  github.com/larksuite/oapi-sdk-go/v3/ws.(*Client).pingLoop()
      .../ws/client.go:318 +0xa0
  github.com/larksuite/oapi-sdk-go/v3/ws.(*Client).Start.gowrap1()
      .../ws/client.go:124 +0x48
```

Root Cause Analysis
In ws/client.go:

Writer — configure() (line ~510): Called from handleControlFrame() → handleMessage() goroutine. Writes to shared fields without any synchronization:

```go
func (c *Client) configure(conf *ClientConfig) {
    c.reconnectCount = conf.ReconnectCount                                    // ⚠️ unsynchronized write
    c.reconnectInterval = time.Duration(conf.ReconnectInterval) * time.Second // ⚠️
    c.reconnectNonce = conf.ReconnectNonce                                    // ⚠️
    c.pingInterval = time.Duration(conf.PingInterval) * time.Second           // ⚠️
}
```
Reader — pingLoop() (line ~318): Runs in a separate goroutine started by Start(). Reads the same fields without synchronization:

```go
func (c *Client) pingLoop(ctx context.Context) {
    for {
        if c.conn != nil {
            i, _ := strconv.ParseInt(c.serviceID, 10, 32) // ⚠️ unsynchronized read
            // ...
        }
        time.Sleep(c.pingInterval) // ⚠️ unsynchronized read — races with configure()
    }
}
```
The existing c.mu mutex is used only in writeMessage() and disconnect(), not in configure() or pingLoop().

Suggested Fix
Add c.mu lock protection to configure(), and read c.pingInterval / c.serviceID under the same lock in
pingLoop()
:

```diff
 func (c *Client) configure(conf *ClientConfig) {
+    c.mu.Lock()
+    defer c.mu.Unlock()
     c.reconnectCount = conf.ReconnectCount
     c.reconnectInterval = time.Duration(conf.ReconnectInterval) * time.Second
     c.reconnectNonce = conf.ReconnectNonce
     c.pingInterval = time.Duration(conf.PingInterval) * time.Second
 }

 func (c *Client) pingLoop(ctx context.Context) {
     // ...
     for {
         if c.conn != nil {
+            c.mu.Lock()
             i, _ := strconv.ParseInt(c.serviceID, 10, 32)
+            interval := c.pingInterval
+            c.mu.Unlock()
             frame := NewPingFrame(int32(i))
             // ...
+            time.Sleep(interval)
+        } else {
+            c.mu.Lock()
+            interval := c.pingInterval
+            c.mu.Unlock()
+            time.Sleep(interval)
         }
-        time.Sleep(c.pingInterval)
     }
 }
```
Additionally, getConnURL() also calls c.configure() (line ~305); that call path needs the same protection.

Impact
- Programs run under go test -race report the race and may fail CI.
- In production, unsynchronized reads and writes of the time.Duration fields can hand time.Sleep a corrupted value (e.g. an extremely long or a negative duration).
- The race also covers c.serviceID; Go strings are not safe for unsynchronized concurrent access.
Workaround
Users can set OPENACOSMI_SKIP_CHANNELS or avoid starting the WebSocket client during go test -race runs until this is fixed upstream.
