Bug: panic: send on closed channel in receiveLoop on EOF #46

@naufalandika

Summary

When the remote FreeSWITCH server closes the TCP connection (EOF), receiveLoop panics with send on closed channel because it attempts to send to a response channel that has already been closed by a concurrent close() call.

Environment

  • Library: github.com/percipia/eslgo v1.5.0
  • Go version: 1.21+

Panic Output

panic: send on closed channel

goroutine 3110 [running]:
github.com/percipia/eslgo.(*Conn).receiveLoop(0xc0006803c0)
        .../vendor/github.com/percipia/eslgo/connection.go:299 +0x345
created by github.com/percipia/eslgo.newConnection in goroutine 1219
        .../vendor/github.com/percipia/eslgo/connection.go:92 +0x5ff

How to Reproduce

Real-world scenario

  1. Establish an inbound or outbound ESL connection to FreeSWITCH
  2. Have FreeSWITCH terminate the connection from its side (e.g., fs_cli -x "reload mod_event_socket", a FreeSWITCH restart, or a network drop)
  3. Simultaneously call conn.ExitAndClose() from the application side (e.g., triggered by a session timeout or context cancellation racing with the EOF)
  4. The panic occurs non-deterministically because it depends on goroutine scheduling. Running under load or with -race makes it easier to reproduce.

Minimal reproducer

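The following test attempts to trigger the race by simulating an abrupt server-side close concurrent with a client ExitAndClose() call. Because the outcome depends on scheduling, it may need several runs before the panic appears: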

package eslgo_test

import (
    "net"
    "testing"
    "time"

    "github.com/percipia/eslgo"
)

// fakeFreeSWITCH writes a minimal ESL auth/response handshake then immediately
// closes the connection, simulating a server-side EOF.
func fakeFreeSWITCH(t *testing.T) net.Listener {
    t.Helper()
    ln, err := net.Listen("tcp", "127.0.0.1:0")
    if err != nil {
        t.Fatal(err)
    }
    go func() {
        conn, err := ln.Accept()
        if err != nil {
            return
        }
        // Minimal ESL handshake: send auth request, accept any auth reply, then drop
        conn.Write([]byte("Content-Type: auth/request\r\n\r\n"))
        buf := make([]byte, 256)
        conn.Read(buf) // consume auth reply
        conn.Write([]byte("Content-Type: command/reply\r\nReply-Text: +OK accepted\r\n\r\n"))
        time.Sleep(50 * time.Millisecond)
        conn.Close() // triggers EOF in receiveLoop
    }()
    return ln
}

func TestReceiveLoopPanicOnEOF(t *testing.T) {
    ln := fakeFreeSWITCH(t)
    defer ln.Close()

    // The third argument to Dial is the onDisconnect callback; a no-op is enough here.
    conn, err := eslgo.Dial(ln.Addr().String(), "ClueCon", func() {})
    if err != nil {
        t.Fatal(err)
    }

    // Race: server closes (EOF) vs client calling ExitAndClose concurrently
    go func() {
        time.Sleep(60 * time.Millisecond)
        conn.ExitAndClose() // may race with receiveLoop's EOF handling
    }()

    time.Sleep(200 * time.Millisecond) // give enough time for the panic to surface
}

Run with the race detector to confirm:

go test -race -run TestReceiveLoopPanicOnEOF -count=10 ./...

Root Cause

There is a race between receiveLoop and close() on the responseChannels map.

close() closes all response channels under a write lock:

func (c *Conn) close() {
    c.stopFunc()
    c.responseChanMutex.Lock()
    defer c.responseChanMutex.Unlock()
    for key, responseChan := range c.responseChannels {
        close(responseChan)                  // TypeDisconnect channel is closed here
        delete(c.responseChannels, key)
    }
    c.conn.Close()
}

receiveLoop sends to TypeDisconnect on EOF — without holding the mutex:

func (c *Conn) receiveLoop() {
    for c.runningContext.Err() == nil {
        err := c.doMessage()
        if err != nil {
            if err.Error() == "EOF" {
                c.logger.Warn("Connection closed, stopping receive loop\n")
                select {
                // No mutex held — channel may already be closed by close()
                case c.responseChannels[TypeDisconnect] <- &RawResponse{...}:
                default:
                }
                return
            }
            break
        }
    }
}

Note that doMessage() correctly holds responseChanMutex.RLock() when writing to response channels. The EOF path in receiveLoop skips this protection.

Race Sequence

  1. FreeSWITCH drops the TCP connection → receiveLoop receives EOF from doMessage()
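  2. Simultaneously, a caller invokes ExitAndClose() → close() acquires the write lock and starts closing and deleting all channels, including TypeDisconnect
  3. receiveLoop evaluates c.responseChannels[TypeDisconnect] without the lock and obtains the channel either just before close() closes it, or after it is closed but before it is deleted (once the entry is deleted, the lookup yields a nil channel and select falls through to default)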
  4. select attempts to send to the closed channel → panic
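
The window in step 3 matters because a non-blocking send behaves differently on a closed channel than on a nil one. The following is a minimal standalone sketch, not eslgo code: trySend and the "disconnect" key are hypothetical names used only for illustration.

```go
package main

import "fmt"

// trySend performs a non-blocking send, like the select/default in receiveLoop.
// It returns "sent", "skipped" (default branch taken), or "panic: ..." if the
// send itself panicked.
func trySend(ch chan int) (result string) {
	defer func() {
		if r := recover(); r != nil {
			result = fmt.Sprintf("panic: %v", r)
		}
	}()
	select {
	case ch <- 1:
		return "sent"
	default:
		return "skipped"
	}
}

func main() {
	channels := map[string]chan int{"disconnect": make(chan int, 1)}

	// Case 1: the channel is fetched from the map, then closed before the send.
	stale := channels["disconnect"]
	close(stale)
	fmt.Println(trySend(stale)) // prints "panic: send on closed channel"

	// Case 2: the entry is already deleted, so the lookup yields a nil channel.
	delete(channels, "disconnect")
	fmt.Println(trySend(channels["disconnect"])) // prints "skipped"
}
```

The default branch does not protect against a closed channel: the send case still panics. Only the nil-channel case, after the map entry is gone, is harmless.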

Proposed Fix

Mirror the same responseChanMutex.RLock() pattern used by doMessage() around the disconnect send in receiveLoop:

func (c *Conn) receiveLoop() {
    for c.runningContext.Err() == nil {
        err := c.doMessage()
        if err != nil {
            if err.Error() == "EOF" {
                c.logger.Warn("Connection closed, stopping receive loop\n")
                c.responseChanMutex.RLock()
                disconnectCh, ok := c.responseChannels[TypeDisconnect]
                if ok {
                    select {
                    case disconnectCh <- &RawResponse{
                        Headers: textproto.MIMEHeader{
                            "Content-Type": []string{TypeDisconnect},
                            "Error":        []string{err.Error()},
                        },
                        Body: []byte("connection closed: " + err.Error()),
                    }:
                    default:
                    }
                }
                c.responseChanMutex.RUnlock()
                return
            }
            break
        }
    }
}

With the read lock held, close() cannot close or delete the channel between the map lookup and the send. If close() has already run, the lookup reports ok == false and the send is skipped. This is consistent with how doMessage() handles the same channels.
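
The locking pattern can be exercised in isolation with a toy model. This is a sketch under stated assumptions, not the library's implementation: registry, closeAll, notify, and stress are hypothetical stand-ins for Conn, close(), the EOF path in receiveLoop, and a test harness.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for the part of Conn that owns
// responseChannels. It is not eslgo code.
type registry struct {
	mu       sync.RWMutex
	channels map[string]chan string
}

func newRegistry() *registry {
	return &registry{channels: map[string]chan string{"disconnect": make(chan string, 1)}}
}

// closeAll mirrors close(): close and delete every channel under the write lock.
func (r *registry) closeAll() {
	r.mu.Lock()
	defer r.mu.Unlock()
	for key, ch := range r.channels {
		close(ch)
		delete(r.channels, key)
	}
}

// notify mirrors the proposed EOF path: map lookup and send both happen under
// the read lock. It reports whether the message was delivered.
func (r *registry) notify(key, msg string) bool {
	r.mu.RLock()
	defer r.mu.RUnlock()
	ch, ok := r.channels[key]
	if !ok {
		return false // already closed and removed by closeAll
	}
	select {
	case ch <- msg:
		return true
	default:
		return false // no buffer space or receiver; drop rather than block
	}
}

// stress races notify against closeAll and returns the number of recovered panics.
func stress(rounds int) int {
	var countMu sync.Mutex
	panics := 0
	for i := 0; i < rounds; i++ {
		r := newRegistry()
		var wg sync.WaitGroup
		wg.Add(2)
		go func() {
			defer wg.Done()
			defer func() {
				if recover() != nil {
					countMu.Lock()
					panics++
					countMu.Unlock()
				}
			}()
			r.notify("disconnect", "EOF")
		}()
		go func() {
			defer wg.Done()
			r.closeAll()
		}()
		wg.Wait()
	}
	return panics
}

func main() {
	fmt.Println("panics:", stress(10000)) // prints "panics: 0"
}
```

Removing the RLock/RUnlock pair from notify turns this back into the unprotected version, and running it with go run -race should then report the data race on the map.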
